CN105930379A - Method and system for collecting log data by means of interceptor - Google Patents
Method and system for collecting log data by means of interceptor
- Publication number
- CN105930379A (application number CN201610230525.3A)
- Authority
- CN
- China
- Prior art keywords
- interceptor
- module
- relevant information
- data
- log data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/144—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
Abstract
The invention relates to a method and system for collecting log data by means of an interceptor. The method comprises the following steps: S1, a custom interceptor is constructed and placed under the Flume root directory; S2, the interceptor receives data and filters out unwanted data to obtain the log data that needs to be collected. A large application system often generates log data with dozens or even hundreds of fields. When Flume's built-in interceptors are used to collect logs, the collected logs still carry all of those fields, yet many of them have no practical value and need no attention, so they slow data transmission and increase storage overhead. With a custom Flume interceptor, the required fields can be extracted and selected fields can be encrypted, which reduces the volume of data transmitted and the storage overhead.
Description
Technical field
The present invention relates to a method and system for collecting log data by means of an interceptor, and belongs to the field of computer software.
Background art
Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transmitting massive volumes of logs. Flume supports customizing various data senders in a logging system in order to collect data; it also provides the ability to perform simple processing on the data and to write it to various (customizable) data receivers. Flume ships with a variety of built-in interceptors, such as TimestampInterceptor, HostInterceptor, and RegexExtractorInterceptor, and different interceptors implement different functions. However, none of these interceptors can change the content of the original log data: when a log record has dozens or even hundreds of fields, the logs collected under conventional Flume processing still carry all of those fields.
Summary of the invention
The technical problem to be solved by the present invention is to provide, according to actual business needs and to better support data processing at the application layer, a method and system for collecting log data by means of an interceptor, in which a custom Flume interceptor filters out unneeded fields, encrypts fields, and preprocesses the source data, thereby reducing the volume of data transmitted and the storage overhead.
The technical solution of the present invention is as follows: a method for collecting log data by means of an interceptor, specifically including the following steps:
Step 1: construct a custom interceptor and place the interceptor under the Flume root directory;
Step 2: the interceptor receives data and filters out unwanted data to obtain the log data that needs to be collected.
The beneficial effects of the present invention are as follows. The log data produced by a large application system often has dozens or even hundreds of fields. When Flume's built-in interceptors are used for log collection, the collected logs still carry all of those fields, yet many of them need no attention and have no practical value, so they slow data transmission and add storage overhead. With a custom Flume interceptor, the fields that are needed can be extracted and certain fields can be encrypted. This reduces the volume of data transmitted and the storage overhead.
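The field extraction described above can be sketched in plain Java. This is a minimal illustration rather than the patent's code: the class name `FieldFilterSketch`, the method `keepFields`, the sample log line, and the choice of zero-based positions are all assumptions, and the sketch works on strings instead of Flume events so that it runs without Flume on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps only the selected columns of one delimited log line. */
final class FieldFilterSketch {

    /**
     * Splits a line on a literal separator and keeps the fields at the given
     * zero-based positions (an assumed convention), in the order requested.
     * Positions outside the line are skipped.
     */
    static String keepFields(String line, String separator, int[] indexes) {
        // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        List<String> kept = new ArrayList<>();
        for (int i : indexes) {
            if (i >= 0 && i < fields.length) {
                kept.add(fields[i]);
            }
        }
        return String.join(separator, kept);
    }

    public static void main(String[] args) {
        // Made-up sample line with six fields; only three are kept.
        String line = "2016-04-14|10.0.0.1|GET|/index.html|200|unused";
        System.out.println(keepFields(line, "|", new int[] {0, 3, 4}));
        // prints 2016-04-14|/index.html|200
    }
}
```

Dropping fields before the record leaves the source is what shrinks both the transmitted volume and the stored volume; the downstream channel and sink never see the discarded columns.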
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, step 1 specifically includes the following steps:
Step 1.1: define a custom interceptor class;
Step 1.2: define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
Step 1.3: define an inner Builder interface within the interceptor interface and perform parameter configuration, completing the setup of the interceptor;
Step 1.4: place the interceptor under the Flume root directory.
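Steps 1.1 to 1.3 can be sketched as follows. To stay runnable without Flume, the sketch declares its own stand-in interface `LineInterceptor` with an inner `Builder`; it only mirrors the shape of Flume's interceptor-plus-builder arrangement and is not the real `org.apache.flume.interceptor.Interceptor`, which operates on events. The class names, the `Map`-based settings, and the default separator are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class InterceptorBuilderSketch {

    /** Stand-in contract (hypothetical): transforms one log line. */
    interface LineInterceptor {
        String intercept(String line);

        /** Inner builder interface: receives settings, then creates the interceptor. */
        interface Builder {
            void configure(Map<String, String> settings);
            LineInterceptor build();
        }
    }

    /** Step 1.1: the custom interceptor class. */
    static final class CustomLineInterceptor implements LineInterceptor {
        // Step 1.2: a variable whose value comes from configuration.
        private final String fieldsSeparator;

        private CustomLineInterceptor(String fieldsSeparator) {
            this.fieldsSeparator = fieldsSeparator;
        }

        @Override
        public String intercept(String line) {
            // Placeholder logic: keep only the first field of the line.
            int cut = line.indexOf(fieldsSeparator);
            return cut < 0 ? line : line.substring(0, cut);
        }

        /** Step 1.3: the builder applies configuration and falls back to a default. */
        static final class Builder implements LineInterceptor.Builder {
            static final String DEFAULT_FIELDS_SEPARATOR = ","; // assumed default
            private String fieldsSeparator = DEFAULT_FIELDS_SEPARATOR;

            @Override
            public void configure(Map<String, String> settings) {
                fieldsSeparator =
                    settings.getOrDefault("fields_separator", DEFAULT_FIELDS_SEPARATOR);
            }

            @Override
            public LineInterceptor build() {
                return new CustomLineInterceptor(fieldsSeparator);
            }
        }
    }

    public static void main(String[] args) {
        LineInterceptor.Builder builder = new CustomLineInterceptor.Builder();
        Map<String, String> settings = new HashMap<>();
        settings.put("fields_separator", "|");
        builder.configure(settings);
        System.out.println(builder.build().intercept("a|b|c")); // prints a
    }
}
```

Keeping the constructor private and routing construction through the builder means every interceptor instance has passed through `configure`, so defaults are applied in exactly one place.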
Further, the relevant information includes a regular expression, the separator between fields in each line, the indexes of the required column fields, and the separator used between multiple indexes.
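As a rough sketch of how these four pieces of relevant information might be held and parsed, the class below compiles the regular expression and turns an index list such as `0,3,4` into integers. The class name and field names are hypothetical, and the role given to the regular expression here (deciding whether a line should be processed at all) is an assumption, since the text does not say how the expression is applied.

```java
import java.util.regex.Pattern;

/** Hypothetical holder for the four configured values. */
final class InterceptorSettingsSketch {
    final Pattern regex;
    final String fieldsSeparator;
    final int[] indexes;

    InterceptorSettingsSketch(String regex, String fieldsSeparator,
                              String indexes, String indexesSeparator) {
        this.regex = Pattern.compile(regex);
        this.fieldsSeparator = fieldsSeparator;
        // Split the index list on its own separator, treated literally.
        String[] parts = indexes.split(Pattern.quote(indexesSeparator));
        this.indexes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            this.indexes[i] = Integer.parseInt(parts[i].trim());
        }
    }

    /** Assumed use of the regex: true when the whole line matches it. */
    boolean accepts(String line) {
        return regex.matcher(line).matches();
    }

    public static void main(String[] args) {
        InterceptorSettingsSketch s =
            new InterceptorSettingsSketch("^\\d+\\|.*", "|", "0,3,4", ",");
        System.out.println(s.indexes.length);      // prints 3
        System.out.println(s.accepts("12|a|b"));   // prints true
    }
}
```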
Further, step 1.2 specifically includes the following steps:
Step 1.2.1: define variables in the custom interceptor class and configure the relevant information;
Step 1.2.2: add a parameterized constructor to the custom interceptor class and process the relevant information;
Step 1.2.3: set the processing logic, and set up the interceptor interface according to the processing logic and the processed relevant information.
Further, in step 1.2.2, processing the relevant information consists of converting the unicode encoding in the relevant information into a character string.
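One plausible reading of this conversion is that a separator may be written in the configuration as escape text such as `\u0001` (six visible characters) and must become the single character it names before it can be used for splitting. The sketch below shows that reading; the class and method names are hypothetical, and the exact escape format handled by the original code is not stated in the text.

```java
/** Hypothetical helper for turning escape text into the characters it names. */
final class UnicodeEscapeSketch {

    /**
     * Replaces each escape made of a backslash, the letter u, and four hex
     * digits with the corresponding character; everything else is copied as is.
     */
    static String unescape(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            boolean isEscape = raw.startsWith("\\u", i)
                && i + 6 <= raw.length()
                && isHex(raw, i + 2, i + 6);
            if (isEscape) {
                out.append((char) Integer.parseInt(raw.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    /** True when every character in [from, to) is a hexadecimal digit. */
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The argument holds six characters of escape text, not a control character.
        String separator = unescape("\\u0009");
        System.out.println(separator.length()); // prints 1
    }
}
```

Non-printing separators are common in log formats, which is a likely reason the configuration carries them in escaped form.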
Further, the processing logic includes single processing and batch processing.
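The two kinds of processing logic can be sketched as a pair of overloaded methods, with the batch form delegating to the single form. The sketch uses strings in place of Flume events, and the convention that a `null` result means "drop this record" is an assumption of the example; the class name and the trimming logic are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SingleAndBatchSketch {

    /** Single processing: transforms one record, or returns null to drop it. */
    static String intercept(String line) {
        if (line == null || line.isEmpty()) {
            return null;
        }
        return line.trim(); // placeholder transformation
    }

    /** Batch processing: applies the single form to each record and omits dropped ones. */
    static List<String> intercept(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            String result = intercept(line);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(intercept(Arrays.asList(" a ", "", "b"))); // prints [a, b]
    }
}
```

Delegating the batch form to the single form keeps the filtering rule in one place, so both paths always agree.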
Further, step 2 filters out unneeded fields and encrypts fields.
The beneficial effect of the above further scheme is that the volume of data transmitted and the storage overhead are reduced.
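The text does not name an encryption algorithm, so the following sketch picks one purely for illustration: AES in GCM mode from the standard `javax.crypto` API, with a random IV prepended to the ciphertext and the result Base64-encoded so that it stays a single text field. The class name, method names, key handling, and sample line are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.regex.Pattern;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class FieldEncryptionSketch {
    private static final int IV_BYTES = 12;   // usual IV length for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    /** Generates a fresh 128-bit AES key (real deployments would load a managed key). */
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts one field value; output is Base64 of IV followed by ciphertext. */
    static String encryptField(String value, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + sealed.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(sealed, 0, out, iv.length, sealed.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encryptField; included so the round trip can be checked. */
    static String decryptField(String encoded, SecretKey key) {
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts the field at the given zero-based position; other fields pass through. */
    static String encryptColumn(String line, String separator, int index, SecretKey key) {
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (index >= 0 && index < fields.length) {
            fields[index] = encryptField(fields[index], key);
        }
        return String.join(separator, fields);
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        // Made-up sample line; the middle field is treated as sensitive.
        System.out.println(encryptColumn("alice|555-0100|ok", "|", 1, key));
    }
}
```

Base64 output uses only letters, digits, `+`, `/`, and `=`, so it cannot collide with separators such as `|` or `,`; a different separator would need a check.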
A further technical solution of the present invention is as follows: a system for collecting log data by means of an interceptor, including an interceptor module and a data filtering module;
The interceptor module is used to construct a custom interceptor and place the interceptor under the Flume root directory;
The data filtering module is used to control the interceptor to receive data and filter out unwanted data, obtaining the log data that needs to be collected.
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, the interceptor module includes a class definition module, a configuration module, a parameter setting module, and a directory module;
The class definition module is used to define a custom interceptor class;
The configuration module is used to define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
The parameter setting module is used to define an inner Builder interface within the interceptor interface and perform parameter configuration, completing the setup of the interceptor;
The directory module is used to place the interceptor under the Flume root directory.
Further, the relevant information includes a regular expression, the separator between fields in each line, the indexes of the required column fields, and the separator used between multiple indexes.
Brief description of the drawings
Fig. 1 is a flowchart of the method for collecting log data by means of an interceptor according to embodiment 1 of the present invention;
Fig. 2 is a structural block diagram of the system for collecting log data by means of an interceptor according to embodiment 1 of the present invention.
In the drawings, the parts denoted by the reference numerals are as follows:
1, interceptor module; 2, data filtering module.
Detailed description
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, the method for collecting log data by means of an interceptor according to embodiment 1 of the present invention specifically includes the following steps:
Step 1: construct a custom interceptor and place the interceptor under the Flume root directory;
Step 2: the interceptor receives data and filters out unwanted data to obtain the log data that needs to be collected.
In the method for collecting log data by means of an interceptor according to embodiment 2 of the present invention, on the basis of embodiment 1, step 1 specifically includes the following steps:
Step 1.1: define a custom interceptor class;
Step 1.2: define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
Step 1.3: define an inner Builder interface within the interceptor interface and perform parameter configuration, completing the setup of the interceptor;
Step 1.4: place the interceptor under the Flume root directory.
In the method for collecting log data by means of an interceptor according to embodiment 3 of the present invention, on the basis of embodiment 2, the relevant information includes a regular expression, the separator between fields in each line, the indexes of the required column fields, the separator used between multiple indexes, and the like.
In the method for collecting log data by means of an interceptor according to embodiment 4 of the present invention, on the basis of embodiment 3, step 1.2 specifically includes the following steps:
Step 1.2.1: define variables in the custom interceptor class and configure the relevant information;
Step 1.2.2: add a parameterized constructor to the custom interceptor class and process the relevant information;
Step 1.2.3: set the processing logic, and set up the interceptor interface according to the processing logic and the processed relevant information.
In the method for collecting log data by means of an interceptor according to embodiment 5 of the present invention, on the basis of embodiment 4, processing the relevant information in step 1.2.2 consists of converting the unicode encoding in the relevant information into a character string.
In the method for collecting log data by means of an interceptor according to embodiment 6 of the present invention, on the basis of embodiment 4 or 5, the processing logic includes single processing and batch processing.
In the method for collecting log data by means of an interceptor according to embodiment 7 of the present invention, on the basis of any one of embodiments 1 to 6, step 2 filters out unneeded fields and encrypts fields.
As shown in Fig. 2, the system for collecting log data by means of an interceptor according to embodiment 1 of the present invention includes an interceptor module 1 and a data filtering module 2;
The interceptor module 1 is used to construct a custom interceptor and place the interceptor under the Flume root directory;
The data filtering module 2 is used to control the interceptor to receive data and filter out unwanted data, obtaining the log data that needs to be collected.
In the system for collecting log data by means of an interceptor according to embodiment 2 of the present invention, on the basis of embodiment 1, the interceptor module 1 includes a class definition module, a configuration module, a parameter setting module, and a directory module;
The class definition module is used to define a custom interceptor class;
The configuration module is used to define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
The parameter setting module is used to define an inner Builder interface within the interceptor interface and perform parameter configuration, completing the setup of the interceptor;
The directory module is used to place the interceptor under the Flume root directory.
In the system for collecting log data by means of an interceptor according to embodiment 3 of the present invention, on the basis of embodiment 2, the relevant information includes a regular expression, the separator between fields in each line, the indexes of the required column fields, the separator used between multiple indexes, and the like.
The core of this technical solution includes two parts:
1) Write Java code to define the custom interceptor, specifically including the following steps:
a) Define a class CustomInterceptor that implements the Interceptor interface.
b) Define variables in the CustomInterceptor class; these are the variables that are configured in the Flume configuration file. They cover the regular expression (regex), the separator between fields in each line (fields_separator), the indexes of the required column fields after splitting by the separator (indexs), and the separator used between multiple indexes (indexs_separator).
c) Add a parameterized constructor to CustomInterceptor and process the corresponding variables: the unicode encoding passed in from the configuration file is converted into a character string.
d) Write the intercept() methods that carry the concrete processing logic: one for single processing and one for batch processing.
e) Define an inner interface Builder within the interface and perform the parameter configuration in its configure method, supplying default values for any parameters that are not configured in Flume's conf. Its build method returns an interceptor object.
f) With the above steps, the code of the custom interceptor is complete; package it into a jar and put it into the lib directory under the Flume root directory.
2) Modify the Flume configuration:
Go to the conf directory under the Flume installation directory, configure the source, channel, and sink, reference the custom interceptor in the source, and configure the variables defined earlier in the code.
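To make the configuration step concrete while staying in Java, the sketch below embeds an example agent configuration as text and reads it with `java.util.Properties`, then collects the settings that belong to the interceptor. The four interceptor keys (regex, fields_separator, indexs, indexs_separator) come from the description; the agent and component names (`a1`, `r1`, `c1`, `k1`, `i1`), the log path, the package `com.example`, and the chosen values are assumptions. An interceptor is referenced by the fully qualified name of its builder class, hence the `$Builder` suffix.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class FlumeConfigSketch {

    /** Example configuration text; names, path, and values are hypothetical. */
    static final String CONF = String.join("\n",
        "a1.sources = r1",
        "a1.channels = c1",
        "a1.sinks = k1",
        "a1.sources.r1.type = exec",
        "a1.sources.r1.command = tail -F /var/log/app.log",
        "a1.sources.r1.channels = c1",
        "a1.sources.r1.interceptors = i1",
        "a1.sources.r1.interceptors.i1.type = com.example.CustomInterceptor$Builder",
        "a1.sources.r1.interceptors.i1.regex = .*",
        "a1.sources.r1.interceptors.i1.fields_separator = |",
        "a1.sources.r1.interceptors.i1.indexs = 0,3,4",
        "a1.sources.r1.interceptors.i1.indexs_separator = ,",
        "a1.channels.c1.type = memory",
        "a1.sinks.k1.type = logger",
        "a1.sinks.k1.channel = c1");

    /** Returns the settings under the given key prefix, with the prefix removed. */
    static Map<String, String> interceptorSettings(String conf, String prefix) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(conf));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> out = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                out.put(name.substring(prefix.length()), props.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> settings =
            interceptorSettings(CONF, "a1.sources.r1.interceptors.i1.");
        settings.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

In a real deployment the same lines would live in a properties file under Flume's conf directory, and Flume itself would hand the interceptor's settings to the builder's configure method.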
In summary, the finished custom interceptor is packaged into a jar and put into the lib directory under the Flume root directory; then, in the conf directory under the Flume installation directory, the source, channel, and sink are configured, the custom interceptor is referenced in the source, and the variables defined earlier in the code are configured. This completes the development of the custom Flume interceptor.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A method for collecting log data by means of an interceptor, characterized in that it specifically includes the following steps:
Step 1: construct a custom interceptor and place the interceptor under the Flume root directory;
Step 2: the interceptor receives data and filters out unwanted data to obtain the log data that needs to be collected.
2. The method for collecting log data by means of an interceptor according to claim 1, characterized in that step 1 specifically includes the following steps:
Step 1.1: define a custom interceptor class;
Step 1.2: define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
Step 1.3: define an inner Builder interface within the interceptor interface and perform parameter configuration, completing the setup of the interceptor;
Step 1.4: place the interceptor under the Flume root directory.
3. The method for collecting log data by means of an interceptor according to claim 2, characterized in that the relevant information includes a regular expression, the separator between fields in each line, the indexes of the required column fields, and the separator used between multiple indexes.
4. The method for collecting log data by means of an interceptor according to claim 3, characterized in that step 1.2 specifically includes the following steps:
Step 1.2.1: define variables in the custom interceptor class and configure the relevant information;
Step 1.2.2: add a parameterized constructor to the custom interceptor class and process the relevant information;
Step 1.2.3: set the processing logic, and set up the interceptor interface according to the processing logic and the processed relevant information.
5. The method for collecting log data by means of an interceptor according to claim 4, characterized in that processing the relevant information in step 1.2.2 consists of converting the unicode encoding in the relevant information into a character string.
6. The method for collecting log data by means of an interceptor according to claim 4 or 5, characterized in that the processing logic includes single processing and batch processing.
7. The method for collecting log data by means of an interceptor according to claim 1, characterized in that step 2 filters out unneeded fields and encrypts fields.
8. A system for collecting log data by means of an interceptor, characterized in that it includes an interceptor module and a data filtering module;
The interceptor module is used to construct a custom interceptor and place the interceptor under the Flume root directory;
The data filtering module is used to control the interceptor to receive data and filter out unwanted data, obtaining the log data that needs to be collected.
9. The system for collecting log data by means of an interceptor according to claim 8, characterized in that the interceptor module includes a class definition module, a configuration module, a parameter setting module, and a directory module;
The class definition module is used to define a custom interceptor class;
The configuration module is used to define variables in the custom interceptor class, configure the relevant information, and set up the interceptor interface according to the configured relevant information;
The parameter setting module is used to define an inner Builder interface within the interceptor interface and perform parameter configuration, completing the setup of the interceptor;
The directory module is used to place the interceptor under the Flume root directory.
10. The system for collecting log data by means of an interceptor according to claim 9, characterized in that the relevant information includes a regular expression, the separator between fields in each line, the indexes of the required column fields, and the separator used between multiple indexes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610230525.3A CN105930379A (en) | 2016-04-14 | 2016-04-14 | Method and system for collecting log data by means of interceptor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610230525.3A CN105930379A (en) | 2016-04-14 | 2016-04-14 | Method and system for collecting log data by means of interceptor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105930379A (en) | 2016-09-07 |
Family
ID=56838120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610230525.3A Pending CN105930379A (en) | 2016-04-14 | 2016-04-14 | Method and system for collecting log data by means of interceptor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105930379A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150155A (en) * | 2011-12-07 | 2013-06-12 | 金蝶软件(中国)有限公司 | Data interception method and device |
US20160077932A1 (en) * | 2014-01-20 | 2016-03-17 | International Business Machines Corporation | High availability cache in server cluster |
CN105005549A (en) * | 2015-07-31 | 2015-10-28 | 山东蚁巡网络科技有限公司 | User-defined chained log analysis device and method |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107872437A (en) * | 2016-09-27 | 2018-04-03 | 阿里巴巴集团控股有限公司 | Method, apparatus and server for service request |
CN107872437B (en) * | 2016-09-27 | 2021-07-09 | 阿里巴巴集团控股有限公司 | Method, device and server for service request |
CN106855837A (en) * | 2016-12-15 | 2017-06-16 | 咪咕文化科技有限公司 | Data processing method and device based on Flume |
CN108829879A (en) * | 2018-06-26 | 2018-11-16 | 天津城建大学 | Charging pile data monitoring method |
CN109327336A (en) * | 2018-10-10 | 2019-02-12 | 武汉思普崚技术有限公司 | Method and equipment for quickly analyzing large amount of firewall log data |
CN109327336B (en) * | 2018-10-10 | 2022-04-26 | 武汉思普崚技术有限公司 | Method and equipment for quickly analyzing large amount of firewall log data |
CN109460412A (en) * | 2018-11-14 | 2019-03-12 | 北京锐安科技有限公司 | Data aggregation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160907 |