CN105930379A - Method and system for collecting log data by means of interceptor

Info

Publication number: CN105930379A
Authority: CN (China)
Prior art keywords: interceptor, module, relevant information, data, log data
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201610230525.3A
Other languages: Chinese (zh)
Inventor: 金晓飞
Current Assignee: Beijing Si Tech Information Technology Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Beijing Si Tech Information Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/14: Details of searching files based on file metadata
    • G06F16/144: Query formulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/1805: Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815: Journaling file systems

Abstract

The invention relates to a method and system for collecting log data by means of an interceptor. The method comprises the following steps: S1. a custom interceptor is built and placed under the Flume root directory; S2. the interceptor receives data and filters out unneeded data to obtain the log data that needs to be collected. A large application system usually generates log data with dozens or even hundreds of fields. When Flume's built-in interceptors are used to collect logs, the collected logs still contain all of those fields, although many of them are of no interest and have no practical value, which slows data transmission and increases storage consumption. With the method and system, a custom Flume interceptor extracts only the fields that are needed and can encrypt selected fields, so the amount of data transmitted and the storage consumption are both reduced.

Description

A method and system for collecting log data by means of an interceptor
Technical field
The present invention relates to a method and system for collecting log data by means of an interceptor, and belongs to the field of computer software.
Background art
Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive volumes of log data. Flume supports customizing various data senders in a logging system in order to collect data. Flume also provides the ability to perform simple processing on the data and to write it to various (customizable) data receivers. Flume ships with a number of built-in interceptors, such as TimestampInterceptor, HostInterceptor, and RegexExtractorInterceptor, and using different interceptors achieves different functions. However, none of these interceptors can change the content of the original log data. When a log record has dozens or even hundreds of fields, the logs collected under conventional Flume processing still contain all of those fields.
Summary of the invention
The technical problem to be solved by the present invention is to provide, according to the needs of actual business and in order to better support data processing at the application layer, a method and system for collecting log data by means of an interceptor. A custom Flume interceptor filters out unneeded fields, encrypts fields, and preprocesses the source data, which reduces the amount of data transmitted and lowers storage overhead.
The technical solution of the present invention to the above technical problem is as follows: a method for collecting log data by means of an interceptor, comprising the following steps:
Step 1: build a custom interceptor and place it under the Flume root directory;
Step 2: the interceptor receives data and filters out unneeded data to obtain the log data that needs to be collected.
The beneficial effects of the invention are as follows. The log data generated by a large application system often has dozens or even hundreds of fields. When Flume's built-in interceptors are used for log collection, the collected logs still contain all of those fields, yet many of them are of no interest and have no practical value, which slows data transmission and increases storage overhead. With a custom Flume interceptor, only the fields that are needed are extracted, and certain fields can be encrypted. This reduces the amount of data transmitted and lowers storage overhead.
On the basis of the above technical solution, the present invention may also be improved as follows.
Further, step 1 specifically comprises the following steps:
Step 1.1: define a custom interceptor class;
Step 1.2: define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
Step 1.3: define an internal builder interface within the interceptor interface and configure its parameters, completing the setup of the interceptor;
Step 1.4: place the interceptor under the Flume root directory.
Further, the relevant information includes a regular expression, the separator between the fields of each line, the indices of the required column fields, and the separator used between multiple indices.
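How the four pieces of relevant information might work together can be sketched in plain Java without any Flume dependency. The class name FieldSelector and its method are hypothetical, and the role given to the regular expression (deciding which lines are kept at all) is an assumption, since the text does not say how the expression is applied. Field and index separators are treated as literal strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps only the configured columns of one log line. */
final class FieldSelector {
    private final Pattern lineFilter;      // "regex": assumed here to select which lines are kept
    private final String fieldsSeparator;  // "fields_separator": literal separator between fields
    private final int[] indexes;           // parsed from "indexs" using "indexs_separator"

    FieldSelector(String regex, String fieldsSeparator, String indexs, String indexsSeparator) {
        this.lineFilter = Pattern.compile(regex);
        this.fieldsSeparator = fieldsSeparator;
        String[] parts = indexs.split(Pattern.quote(indexsSeparator));
        this.indexes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            this.indexes[i] = Integer.parseInt(parts[i].trim());
        }
    }

    /** Returns the selected fields joined by the same separator, or null if the line is rejected. */
    String select(String line) {
        if (!lineFilter.matcher(line).matches()) {
            return null;
        }
        // A limit of -1 keeps trailing empty fields so that indices stay stable.
        String[] fields = line.split(Pattern.quote(fieldsSeparator), -1);
        List<String> kept = new ArrayList<>();
        for (int index : indexes) {
            // Indices outside the line are skipped rather than treated as errors.
            if (index >= 0 && index < fields.length) {
                kept.add(fields[index]);
            }
        }
        return String.join(fieldsSeparator, kept);
    }
}
```

With the regular expression `.*`, the field separator `|`, the indices `0,2`, and the index separator `,`, the line `a|b|c|d` would be reduced to `a|c`.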
Further, step 1.2 specifically comprises the following steps:
Step 1.2.1: define variables in the custom interceptor class and configure the relevant information;
Step 1.2.2: add a parameterized constructor to the custom interceptor class and process the relevant information;
Step 1.2.3: write the processing logic, and set up the interceptor interface according to the processing logic and the processed relevant information.
Further, processing the relevant information in step 1.2.2 means converting the Unicode encoding in the relevant information into a character string.
Further, the processing logic includes single processing and batch processing.
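Flume's Interceptor interface declares two intercept methods, one taking a single Event and one taking a List of events, which matches the single and batch processing named here. The sketch below mirrors that shape with plain strings instead of Flume events so that it runs on its own. LineInterceptor and its transform function are hypothetical stand-ins, not the Flume API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Stand-in for an interceptor with the two processing paths; not the Flume API itself. */
final class LineInterceptor {
    private final UnaryOperator<String> transform; // returns null to drop a record

    LineInterceptor(UnaryOperator<String> transform) {
        this.transform = transform;
    }

    /** Single processing: transform one record, or return null to drop it. */
    String intercept(String record) {
        return transform.apply(record);
    }

    /** Batch processing: reuse the single path and keep only the surviving records. */
    List<String> intercept(List<String> records) {
        List<String> kept = new ArrayList<>(records.size());
        for (String record : records) {
            String result = intercept(record);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }
}
```

Routing the batch path through the single path keeps the filtering rule in one place, so the two paths cannot drift apart.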
Further, step 2 filters out unneeded fields and encrypts fields.
The beneficial effect of the above further solution is that the amount of data transmitted is reduced and storage overhead is lowered.
A further technical solution of the present invention to the above technical problem is as follows: a system for collecting log data by means of an interceptor, comprising an interceptor module and a data filtering module;
The interceptor module is configured to build a custom interceptor and place it under the Flume root directory;
The data filtering module is configured to control the interceptor to receive data and filter out unneeded data, obtaining the log data that needs to be collected.
On the basis of the above technical solution, the present invention may also be improved as follows.
Further, the interceptor module comprises a class definition module, a configuration module, a parameter setting module, and a directory module;
The class definition module is configured to define a custom interceptor class;
The configuration module is configured to define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
The parameter setting module is configured to define an internal builder interface within the interceptor interface and configure its parameters, completing the setup of the interceptor;
The directory module is configured to place the interceptor under the Flume root directory.
Further, the relevant information includes a regular expression, the separator between the fields of each line, the indices of the required column fields, and the separator used between multiple indices.
Brief description of the drawings
Fig. 1 is a flowchart of the method for collecting log data by means of an interceptor according to Embodiment 1 of the present invention;
Fig. 2 is a structural block diagram of the system for collecting log data by means of an interceptor according to Embodiment 1 of the present invention.
In the drawings, the parts denoted by the reference numerals are as follows:
1, interceptor module; 2, data filtering module.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the drawings. The examples serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, the method for collecting log data by means of an interceptor according to Embodiment 1 of the present invention comprises the following steps:
Step 1: build a custom interceptor and place it under the Flume root directory;
Step 2: the interceptor receives data and filters out unneeded data to obtain the log data that needs to be collected.
In the method for collecting log data by means of an interceptor according to Embodiment 2 of the present invention, on the basis of Embodiment 1, step 1 specifically comprises the following steps:
Step 1.1: define a custom interceptor class;
Step 1.2: define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
Step 1.3: define an internal builder interface within the interceptor interface and configure its parameters, completing the setup of the interceptor;
Step 1.4: place the interceptor under the Flume root directory.
In the method for collecting log data by means of an interceptor according to Embodiment 3 of the present invention, on the basis of Embodiment 2, the relevant information includes a regular expression, the separator between the fields of each line, the indices of the required column fields, the separator used between multiple indices, and the like.
In the method for collecting log data by means of an interceptor according to Embodiment 4 of the present invention, on the basis of Embodiment 3, step 1.2 specifically comprises the following steps:
Step 1.2.1: define variables in the custom interceptor class and configure the relevant information;
Step 1.2.2: add a parameterized constructor to the custom interceptor class and process the relevant information;
Step 1.2.3: write the processing logic, and set up the interceptor interface according to the processing logic and the processed relevant information.
In the method for collecting log data by means of an interceptor according to Embodiment 5 of the present invention, on the basis of Embodiment 4, processing the relevant information in step 1.2.2 means converting the Unicode encoding in the relevant information into a character string.
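The text does not say which Unicode notation the configuration values use. A minimal sketch, assuming a separator is written as an escape such as `\u0001` and has to become the actual character before it can be used for splitting, might look like this. UnicodeEscapes and decode are hypothetical names.

```java
/** Hypothetical helper that turns escaped code units in a configuration value into real characters. */
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Replaces each backslash, letter u, four hex digits sequence with its character; other text is copied unchanged. */
    static String decode(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            if (raw.startsWith("\\u", i) && i + 6 <= raw.length() && isHex(raw, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(raw.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** True when every character in [from, to) is a hexadecimal digit. */
    private static boolean isHex(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            if (Character.digit(s.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Malformed sequences are left as they are instead of raising an error, which is a design choice of this sketch and not something the text prescribes.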
In the method for collecting log data by means of an interceptor according to Embodiment 6 of the present invention, on the basis of Embodiment 4 or 5, the processing logic includes single processing and batch processing.
In the method for collecting log data by means of an interceptor according to Embodiment 7 of the present invention, on the basis of any one of Embodiments 1 to 6, step 2 filters out unneeded fields and encrypts fields.
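The text does not name a cipher or say how keys are handled, so the sketch below substitutes a different technique: the sensitive field is replaced by its one-way SHA-256 digest. This is masking, not reversible encryption, and it is shown only to illustrate where such a step would sit next to field filtering. FieldMasker and its parameters are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps selected fields and masks one of them with a SHA-256 digest. */
final class FieldMasker {
    private FieldMasker() {}

    /**
     * Keeps the fields at keepIndexes, in that order. The field at maskIndex, if kept,
     * is replaced by its hex digest. All indices are assumed to be valid for the line.
     */
    static String filterAndMask(String line, String separator, int[] keepIndexes, int maskIndex) {
        String[] fields = line.split(Pattern.quote(separator), -1);
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < keepIndexes.length; k++) {
            int index = keepIndexes[k];
            if (k > 0) {
                out.append(separator);
            }
            out.append(index == maskIndex ? sha256Hex(fields[index]) : fields[index]);
        }
        return out.toString();
    }

    /** Lower-case hexadecimal SHA-256 digest of the UTF-8 bytes of value. */
    static String sha256Hex(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

If the original value must be recoverable downstream, a keyed cipher from javax.crypto would be needed in place of the digest, together with a key management scheme that the text does not cover.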
As shown in Fig. 2, the system for collecting log data by means of an interceptor according to Embodiment 1 of the present invention comprises an interceptor module 1 and a data filtering module 2;
The interceptor module 1 is configured to build a custom interceptor and place it under the Flume root directory;
The data filtering module 2 is configured to control the interceptor to receive data and filter out unneeded data, obtaining the log data that needs to be collected.
In the system for collecting log data by means of an interceptor according to Embodiment 2 of the present invention, on the basis of Embodiment 1, the interceptor module 1 comprises a class definition module, a configuration module, a parameter setting module, and a directory module;
The class definition module is configured to define a custom interceptor class;
The configuration module is configured to define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
The parameter setting module is configured to define an internal builder interface within the interceptor interface and configure its parameters, completing the setup of the interceptor;
The directory module is configured to place the interceptor under the Flume root directory.
In the system for collecting log data by means of an interceptor according to Embodiment 3 of the present invention, on the basis of Embodiment 2, the relevant information includes a regular expression, the separator between the fields of each line, the indices of the required column fields, the separator used between multiple indices, and the like.
The core of this technical solution consists of two parts:
1) Write the Java code of the custom interceptor, which specifically comprises the following steps:
a) Define a class CustomInterceptor that implements the Interceptor interface.
b) Define variables in the CustomInterceptor class; these variables are set through the Flume configuration file. They are the regular expression (regex), the separator between the fields of each line (fields_separator), the indices of the required column fields after the line is split by that separator (indexs), and the separator used between multiple indices (indexs_separator).
c) Add a parameterized constructor to CustomInterceptor and process the corresponding variables in it, converting the Unicode-encoded values passed in from the configuration file into character strings.
d) Write the intercept() methods that hold the actual processing logic: one performs single processing, the other performs batch processing.
e) Define an inner Builder in the interface and perform parameter configuration in its configure method, supplying a default value for any parameter that is not set in Flume's conf. Its build method returns an interceptor object.
f) With the above steps, the code of the custom interceptor is complete. Package it as a jar and place it in the lib directory under the Flume root directory.
2) Modify the Flume configuration:
Go to the conf directory under the Flume installation directory, configure the source, channel, and sink, reference the custom interceptor in the source, and set the variables defined earlier in the code.
In summary, the finished custom interceptor is packaged as a jar and placed in the lib directory under the Flume root directory, and the source, channel, and sink are configured in the conf directory, with the source referencing the custom interceptor and setting its variables. This completes the development of the custom Flume interceptor.
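The builder step, where unset parameters fall back to defaults, can be sketched as follows. In Flume the builder implements Interceptor.Builder, whose configure method receives a Context; here a plain Map stands in so that the example runs without Flume on the classpath. The class InterceptorSettings and all default values are illustrative assumptions and are not taken from the patent; the four key names follow the variable names listed in step b).

```java
import java.util.Map;

/** Immutable settings object standing in for the configured interceptor. */
final class InterceptorSettings {
    final String regex;
    final String fieldsSeparator;
    final String indexs;
    final String indexsSeparator;

    private InterceptorSettings(Builder b) {
        this.regex = b.regex;
        this.fieldsSeparator = b.fieldsSeparator;
        this.indexs = b.indexs;
        this.indexsSeparator = b.indexsSeparator;
    }

    /** Mirrors the configure-then-build flow of an interceptor builder. */
    static final class Builder {
        // Default values are illustrative assumptions only.
        private String regex = ".*";
        private String fieldsSeparator = ",";
        private String indexs = "0";
        private String indexsSeparator = ",";

        /** Reads the four keys, keeping the default for any key that is absent. */
        Builder configure(Map<String, String> conf) {
            regex = conf.getOrDefault("regex", regex);
            fieldsSeparator = conf.getOrDefault("fields_separator", fieldsSeparator);
            indexs = conf.getOrDefault("indexs", indexs);
            indexsSeparator = conf.getOrDefault("indexs_separator", indexsSeparator);
            return this;
        }

        /** Returns the configured object, as the builder's build method does for an interceptor. */
        InterceptorSettings build() {
            return new InterceptorSettings(this);
        }
    }
}
```

In an actual agent configuration, the source lists the interceptor name under its interceptors property and sets that interceptor's type property to the fully qualified name of the builder class. The custom keys are then given as further properties of the same interceptor.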
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A method for collecting log data by means of an interceptor, characterized by comprising the following steps:
Step 1: build a custom interceptor and place it under the Flume root directory;
Step 2: the interceptor receives data and filters out unneeded data to obtain the log data that needs to be collected.
2. The method for collecting log data by means of an interceptor according to claim 1, characterized in that step 1 specifically comprises the following steps:
Step 1.1: define a custom interceptor class;
Step 1.2: define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
Step 1.3: define an internal builder interface within the interceptor interface and configure its parameters, completing the setup of the interceptor;
Step 1.4: place the interceptor under the Flume root directory.
3. The method for collecting log data by means of an interceptor according to claim 2, characterized in that the relevant information includes a regular expression, the separator between the fields of each line, the indices of the required column fields, and the separator used between multiple indices.
4. The method for collecting log data by means of an interceptor according to claim 3, characterized in that step 1.2 specifically comprises the following steps:
Step 1.2.1: define variables in the custom interceptor class and configure the relevant information;
Step 1.2.2: add a parameterized constructor to the custom interceptor class and process the relevant information;
Step 1.2.3: write the processing logic, and set up the interceptor interface according to the processing logic and the processed relevant information.
5. The method for collecting log data by means of an interceptor according to claim 4, characterized in that processing the relevant information in step 1.2.2 means converting the Unicode encoding in the relevant information into a character string.
6. The method for collecting log data by means of an interceptor according to claim 4 or 5, characterized in that the processing logic includes single processing and batch processing.
7. The method for collecting log data by means of an interceptor according to claim 1, characterized in that step 2 filters out unneeded fields and encrypts fields.
8. A system for collecting log data by means of an interceptor, characterized by comprising an interceptor module and a data filtering module;
The interceptor module is configured to build a custom interceptor and place it under the Flume root directory;
The data filtering module is configured to control the interceptor to receive data and filter out unneeded data, obtaining the log data that needs to be collected.
9. The system for collecting log data by means of an interceptor according to claim 8, characterized in that the interceptor module comprises a class definition module, a configuration module, a parameter setting module, and a directory module;
The class definition module is configured to define a custom interceptor class;
The configuration module is configured to define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
The parameter setting module is configured to define an internal builder interface within the interceptor interface and configure its parameters, completing the setup of the interceptor;
The directory module is configured to place the interceptor under the Flume root directory.
10. The system for collecting log data by means of an interceptor according to claim 9, characterized in that the relevant information includes a regular expression, the separator between the fields of each line, the indices of the required column fields, and the separator used between multiple indices.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610230525.3A (CN105930379A) | 2016-04-14 | 2016-04-14 | Method and system for collecting log data by means of interceptor

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201610230525.3A (CN105930379A) | 2016-04-14 | 2016-04-14 | Method and system for collecting log data by means of interceptor

Publications (1)

Publication Number | Publication Date
CN105930379A | 2016-09-07

Family

ID=56838120

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201610230525.3A (Pending, CN105930379A) | Method and system for collecting log data by means of interceptor | 2016-04-14 | 2016-04-14

Country Status (1)

Country | Link
CN (1) | CN105930379A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103150155A (en) * | 2011-12-07 | 2013-06-12 | 金蝶软件(中国)有限公司 | Data interception method and device
CN105005549A (en) * | 2015-07-31 | 2015-10-28 | 山东蚁巡网络科技有限公司 | User-defined chained log analysis device and method
US20160077932A1 (en) * | 2014-01-20 | 2016-03-17 | International Business Machines Corporation | High availability cache in server cluster

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107872437A (en) * | 2016-09-27 | 2018-04-03 | 阿里巴巴集团控股有限公司 | Method, device and server for service request
CN107872437B (en) * | 2016-09-27 | 2021-07-09 | 阿里巴巴集团控股有限公司 | Method, device and server for service request
CN106855837A (en) * | 2016-12-15 | 2017-06-16 | 咪咕文化科技有限公司 | Data processing method and device based on Flume
CN108829879A (en) * | 2018-06-26 | 2018-11-16 | 天津城建大学 | Charging pile data monitoring method
CN109327336A (en) * | 2018-10-10 | 2019-02-12 | 武汉思普崚技术有限公司 | Method and equipment for quickly analyzing large amount of firewall log data
CN109327336B (en) * | 2018-10-10 | 2022-04-26 | 武汉思普崚技术有限公司 | Method and equipment for quickly analyzing large amount of firewall log data
CN109460412A (en) * | 2018-11-14 | 2019-03-12 | 北京锐安科技有限公司 | Data aggregation method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160907