CN107590182A

CN107590182A - A distributed log collection method

Info

Publication number: CN107590182A
Application number: CN201710654304.3A
Authority: CN
Inventors: 向友君; 何家成; 张勰; 朱叶; 吴宗泽
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2017-08-03
Filing date: 2017-08-03
Publication date: 2018-01-16
Anticipated expiration: 2037-08-03
Also published as: CN107590182B

Abstract

The invention discloses a distributed log collection method comprising: collecting log data using a double-buffering mechanism; forwarding the log data by class using a master/slave Reactor pattern; distributing concurrent service requests across the service nodes using LVS+Keepalived; cleaning and filtering the log data using a composite dual channel based on Flume NG; and finally reading out the log data. Compared with traditional log collection methods, this method proposes a non-blocking log collection scheme based on double buffering, a classified log forwarding scheme based on the master/slave Reactor pattern, and a log system and data cleaning platform based on a Flume NG composite dual channel. As a result, the system collects logs more efficiently, the log data that is collected and persisted better matches the needs of analysts, and the time cost of later data analysis is reduced.

Description

A distributed log collection method

Technical field

The present invention relates to the field of big data collection, in particular to log data collection in the e-commerce industry, and specifically to a distributed log collection method.

Background art

In recent decades the domestic economy has flourished. As living standards have improved, people's spending power has grown and consumption channels have broadened, online shopping in particular.

When people use an e-commerce platform, they browse products, bookmark products, add products to the shopping cart, pay for orders, and review orders. This personalized behavioral data is stored in the platform's service back end in the form of user behavior logs. To improve the user experience and increase sales, major e-commerce platforms collect user data for data mining and analysis; precision marketing and personalized product recommendation have become hot research topics in recent years. The rapid development of artificial intelligence and deep learning has also opened new avenues for user behavior analysis and data mining. As the number of online shoppers, the shopping frequency, and the number and variety of products keep growing, the volume of user behavior logs is also increasing sharply; for example, Taobao collects up to 50 TB of data per day. How to collect massive user data efficiently is therefore a key concern for big data analysis, and making efficient use of monitoring logs is an important way to improve the service quality of an e-commerce platform cluster.

Summary of the invention

The purpose of the invention is to overcome the above drawbacks of the prior art concerning log data in the traditional e-commerce industry by providing a distributed log collection method.

The purpose of the invention can be achieved by the following technical scheme:

A distributed log collection method, comprising the following steps:

The business system generates log data;

Log data is collected using a double-buffering mechanism;

Concurrent service requests are distributed across the service nodes using LVS+Keepalived;

Log data is forwarded using the master/slave Reactor pattern;

Log data is cleaned and filtered using a composite dual channel based on Flume NG;

The log data is read out.

Further, the process of collecting log data using the double-buffering mechanism is as follows:

Double buffering is one pattern of cache design. A well-designed cache solves the problem of business threads blocking.

In the non-blocking log collection scheme based on double buffering, the producer of log messages is logically called the front end, and the log consumer that performs the actual I/O operations is called the back end. In the e-commerce platform log collection scenario, each business thread can be regarded as a front end, and the log thread belongs to the back end. The scheme prepares one buffer for each of the front end and the back end, called buffer A and buffer B. At initialization, each buffer is allocated two buffer blocks, one primary and one standby; the front end's primary block receives the current log data while the standby block waits. The log data in buffer B is read and reported by the back-end thread. The scheme uses a condition variable as the synchronization mechanism between threads. The back-end log reporting thread waiting on the condition variable is woken in either of two cases: the front-end business threads have filled the buffer, or the wait has exceeded 5 seconds. The execution timeline of the main logic is shown in Figure 2.

At initialization, four buffer blocks B1, B2, B3, and B4 are allocated, two each for the front end and the back end. The front-end business threads write log data into buffer A; the back-end log thread reads the log data in buffer B and performs the actual I/O, writing the data to a specific file descriptor. When the buffer is empty, the back-end log thread waits on the condition variable.

As shown in Figure 2, at second 3 the front-end primary block B1 is full; the standby block B2 is brought into use and the full block B1 is submitted to the buffer A container. At second 4, block B2 is also full, the condition-variable trigger condition is met, and the back-end log thread is woken from the kernel. The buffer blocks of buffer A and buffer B are swapped, so that just after second 4 the two full blocks B1 and B2 have been handed to buffer B, where the back-end log thread reports their log data, while the two previously empty blocks B3 and B4 have been handed to buffer A for the business threads to use. From this point on, front-end log collection and back-end log reporting run fully in parallel, and the front-end business threads never block on I/O. When reporting completes, the front end and the back end each hold two empty buffer blocks again, matching the initial state.

The front-end and back-end threads manage two or more buffer blocks through a buffer container, which makes block scheduling more flexible and widens the scheme's applicability. For example, when the front-end write demand is small and logs are collected slowly, a single pair of blocks can be exchanged between buffers A and B. When business threads write a large volume of logs in a short time, or back-end reporting takes a long time, business threads can allocate buffer blocks dynamically, and the background thread eventually releases them.

To guarantee that data in the buffer is always output within a bounded time, the scheme additionally sets a time threshold as a wake-up condition for the back-end log thread, as described above. In other business scenarios the threshold can be sized according to actual business requirements.

With double buffering, the scheme decouples business logic from the actual I/O logic. A business thread only has to generate log messages and need not care when the write actually happens; the asynchronous design preserves the real-time behavior of the business processing flow. Whereas other common log collection systems send every message to the back end individually, the non-blocking double-buffering scheme combines multiple log messages into one large buffer and sends it to the back end in one go, which avoids waking the back-end log thread frequently, achieves a batching effect, and reduces overhead. Because logging logic is removed from the critical path of the business flow, the front end is non-blocking with low latency and low overhead, while the back end sustains high log throughput with modest system resources.
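
The double-buffering logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class name DoubleBufferLogger, the sink callable standing in for the actual I/O, and measuring block capacity in messages rather than bytes are all hypothetical choices. The back-end thread is woken when two blocks are full or when the time threshold expires, takes over the filled blocks while holding the lock, and performs the write outside the lock so that business threads are not blocked by I/O.

```python
import threading


class DoubleBufferLogger:
    """Business threads fill blocks in buffer A; one back-end thread swaps and flushes."""

    def __init__(self, sink, block_size=4, blocks_per_buffer=2, flush_interval=5.0):
        self._sink = sink                        # callable(list_of_messages): the real I/O
        self._block_size = block_size            # messages per block (illustrative unit)
        self._blocks_per_buffer = blocks_per_buffer
        self._flush_interval = flush_interval    # time threshold; 5 s in the text
        self._current = []                       # front-end primary block
        self._full_blocks = []                   # buffer A container of full blocks
        self._cond = threading.Condition()
        self._running = True
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, message):
        """Front end: append only, never perform I/O."""
        with self._cond:
            self._current.append(message)
            if len(self._current) >= self._block_size:
                self._full_blocks.append(self._current)   # submit the full block
                self._current = []                        # continue on a fresh block
                if len(self._full_blocks) >= self._blocks_per_buffer:
                    self._cond.notify()                   # buffer A is full

    def _run(self):
        """Back end: wait, swap buffer A into buffer B, then write outside the lock."""
        while True:
            with self._cond:
                if self._running and len(self._full_blocks) < self._blocks_per_buffer:
                    self._cond.wait(timeout=self._flush_interval)
                buffer_b = self._full_blocks              # swap: take the filled blocks
                if self._current:
                    buffer_b.append(self._current)        # include a partial block
                self._full_blocks, self._current = [], []
                stopping = not self._running
            batch = [msg for block in buffer_b for msg in block]
            if batch:
                self._sink(batch)                         # one batched write
            if stopping:
                return

    def close(self):
        """Flush whatever is left and stop the back-end thread."""
        with self._cond:
            self._running = False
            self._cond.notify()
        self._thread.join()
```

A caller would construct the logger with a sink such as a function that writes a list of lines to a file, call log() from any number of business threads, and call close() on shutdown. Because full blocks simply accumulate in the container while the back end is busy, the sketch also covers the dynamic allocation case mentioned above.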

Further, the process of distributing concurrent service requests across the service nodes using LVS+Keepalived is as follows:

Highly available load balancing is implemented with LVS (Linux Virtual Server) + Keepalived, as shown in Figure 1.

The load-balancing service uses at least two servers, one Master node and the rest Slave nodes, to build an LVS virtual server cluster, and the Keepalived component is deployed on every load-balancing node. The Master and Slave nodes communicate by multicast over the Virtual Router Redundancy Protocol (VRRP); through VRRP, its core protocol, Keepalived isolates failed load-balancing nodes and fails over automatically between nodes. Through the load-balancing service, the system distributes service requests across the service nodes of the agent layer, raising the overall capacity of the system for concurrent requests.

Further, the process of forwarding log data using the master/slave Reactor pattern is as follows:

Reactor and Proactor are both common design patterns for handling concurrent network I/O. Reactor monitors multiple event sources on one port through I/O multiplexing, separating low-level network monitoring and event dispatch from the application-level processing logic. The native Reactor pattern stays performant for concurrent events whose handlers are cheap. However, Reactor is essentially synchronous I/O: the set of ready events is processed serially inside the system, so it is unsuited to time-consuming event handlers; precisely because it is synchronous, a time-consuming operation can block other work that should proceed in parallel. By comparison, Proactor relies on asynchronous I/O and does not wait for a given asynchronous operation to finish; this overcomes the Reactor drawback above, allows multiple service requests to be handled at once, and makes it a suitable choice for concurrent, time-consuming event handling.

In view of the concurrent processing requirements of e-commerce logs with different priorities and the characteristics of existing I/O multiplexing models, the present invention designs a classified log forwarding scheme based on the master/slave Reactor pattern. In practice it is implemented as a component of the data forwarding server, providing both synchronous event handling and an asynchronous event handling capability similar to that of the Proactor pattern.

The scheme uses two layers of Reactors, structured as shown in Figure 3:

MainReactor (the master Reactor) is the first layer. It listens for new-connection ready events from the nodes of the business cluster and, according to log priority, assigns the network I/O of each established connection to a different SubReactor.

SubReactor is the second layer and comprises two slave Reactors, SubReactorA and SubReactorB. SubReactorA acquires and forwards high-priority logs: it offers the front-end log sources concurrent handling of synchronous events and supplies the back-end log system with a steady log stream. SubReactorB acquires and forwards low-priority logs: it offers the front-end log sources concurrent handling of asynchronous events and supplies the back-end log system with a data source in the form of cached files. The SubReactor layer thus gives the scheme both real-time stream processing and the ability to handle time-consuming events.
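
The two-layer structure can be sketched with Python's standard selectors module. This is an illustrative sketch only. In particular, the text does not say how MainReactor learns the priority of a new connection; the sketch assumes, hypothetically, one listening port per priority. The class names, the handler signature, and the use of loopback ports are likewise assumptions. MainReactor only accepts connections, and each SubReactor runs its own event loop over the connections assigned to it.

```python
import queue
import selectors
import socket
import threading


class SubReactor:
    """Second layer: one event loop multiplexing reads over its assigned connections."""

    def __init__(self, name, handler):
        self.name = name
        self._handler = handler                  # called as handler(name, data)
        self._selector = selectors.DefaultSelector()
        self._pending = queue.Queue()            # connections handed over by MainReactor
        self._stop_flag = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def assign(self, conn):
        self._pending.put(conn)

    def stop(self):
        self._stop_flag.set()
        self._thread.join()

    def _loop(self):
        while not self._stop_flag.is_set():
            while not self._pending.empty():     # register inside the loop thread
                conn = self._pending.get()
                conn.setblocking(False)
                self._selector.register(conn, selectors.EVENT_READ)
            if not self._selector.get_map():
                self._stop_flag.wait(0.01)       # nothing to watch yet
                continue
            for key, _ in self._selector.select(timeout=0.01):
                data = key.fileobj.recv(4096)
                if data:
                    self._handler(self.name, data)
                else:                            # peer closed the connection
                    self._selector.unregister(key.fileobj)
                    key.fileobj.close()
        for key in list(self._selector.get_map().values()):
            key.fileobj.close()
        self._selector.close()


class MainReactor:
    """First layer: accepts connections and assigns each to a SubReactor by priority."""

    def __init__(self, sub_by_priority):
        self._subs = sub_by_priority             # e.g. {"high": subA, "low": subB}
        self._selector = selectors.DefaultSelector()
        self._stop_flag = threading.Event()
        self.ports = {}                          # priority -> listening port (assumption)
        for priority in sub_by_priority:
            listener = socket.socket()
            listener.bind(("127.0.0.1", 0))
            listener.listen()
            listener.setblocking(False)
            self._selector.register(listener, selectors.EVENT_READ, priority)
            self.ports[priority] = listener.getsockname()[1]
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_flag.set()
        self._thread.join()
        for key in list(self._selector.get_map().values()):
            self._selector.unregister(key.fileobj)
            key.fileobj.close()
        self._selector.close()

    def _loop(self):
        while not self._stop_flag.is_set():
            for key, _ in self._selector.select(timeout=0.01):
                conn, _addr = key.fileobj.accept()   # new-connection ready event
                self._subs[key.data].assign(conn)    # dispatch by priority
```

Connections are handed over through a queue and registered inside the SubReactor's own loop thread, so each selector is only ever touched by one thread. Adding a further SubReactor for another log class would only require another entry in the mapping passed to MainReactor.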

The basic logic of the classified log forwarding scheme based on the master/slave Reactor pattern is shown in Figure 4:

In this scheme the log forwarding mechanism of SubReactorA is essentially the native Reactor pattern; the difficulty lies in implementing concurrent asynchronous event handling in SubReactorB. The invention designs SubReactorB as a Reactor plus a thread pool, following the half-sync/half-async idea, so that SubReactorB presents concurrent asynchronous event handling to the outside. Internally, this logic is implemented in three layers: an asynchronous layer, a queue layer, and a synchronous layer.

(1) The asynchronous layer is implemented by the reactor and an event demultiplexer. It handles low-level events, monitoring readable events produced by event sources in the business system that have established connections, binds each readable event to its event handler, and puts it on the task queue.

(2) The queue layer is implemented by a blocking queue, which serves as the task queue holding a set of configured, pending event-handling tasks.

(3) The synchronous layer is implemented by a thread pool, which handles log parsing, log file caching, and the actual log forwarding logic.

SubReactorB hands the actual event-handling logic to the thread pool in the synchronous layer, so all time-consuming operations are performed by pool threads, which call back SubReactorB asynchronously when an event has been handled. SubReactorB therefore behaves asynchronously to the outside. As in the Proactor pattern, receiving the asynchronous callback for an event means that the event has been fully handled. Through the half-sync/half-async pattern, an optimization at the application software design level replaces asynchronous I/O's dependence on underlying operating-system support.
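
The three layers of SubReactorB can be sketched as follows. This is a minimal illustration, assuming a hypothetical class name HalfSyncHalfAsyncProcessor and caller-supplied handler and on_done callables; the event detection of the asynchronous layer is reduced here to a submit() call that a reactor loop would make for each readable event.

```python
import queue
import threading


class HalfSyncHalfAsyncProcessor:
    """Asynchronous layer enqueues tasks; a thread pool (synchronous layer) runs them."""

    def __init__(self, handler, on_done, workers=4):
        self._handler = handler          # time-consuming work: parse, cache, forward
        self._on_done = on_done          # asynchronous completion callback
        self._tasks = queue.Queue()      # queue layer: blocking task queue
        self._workers = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(workers)]
        for worker in self._workers:
            worker.start()

    def submit(self, event):
        """Asynchronous layer: bind the event to its handler, enqueue, return at once."""
        self._tasks.put((event, self._handler))

    def _work(self):
        """Synchronous layer: each pool thread blocks on the queue and runs handlers."""
        while True:
            task = self._tasks.get()
            if task is None:             # shutdown sentinel
                return
            event, handler = task
            result = handler(event)
            self._on_done(event, result) # signals that the event is fully handled

    def shutdown(self):
        """Let queued tasks finish, then stop every pool thread."""
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()
```

Because submit() only places a task on the queue, the reactor loop is never held up by parsing or file caching. The callback runs on a pool thread, so a real handler would need to synchronize any state it shares with the reactor.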

The classified log forwarding scheme based on the master/slave Reactor pattern applies different processing and forwarding according to the priority of each log message and offers both synchronous and asynchronous concurrent event handling, guaranteeing real-time forwarding of high-priority log data and high throughput for low-priority log data. In addition, because the scheme uses a two-layer Reactor model, the SubReactor layer is highly extensible: multiple SubReactors can be customized for different business requirements, and classification is not limited to log priority; it can also be by log type or by a user-defined routing scheme.

Further, the process of cleaning and filtering log data using the composite dual channel based on Flume NG is as follows:

The scheme is built on the open-source log system Flume NG. Within the log system layer, log data exists in the form of events; the event is the basic data structure that Flume NG transmits and processes internally.

The Flume data flow and event structure are shown in Figure 5. An event consists of two parts, an event header and an event body.

Data in the event header takes the form of key-value pairs. The number and content of the pairs are user-defined, and they generally record attributes of the log entry, such as the IP of the log source node, the event creation timestamp, the log source module name, and the log level.

The event body is a byte array that carries the content of the log entry. When the log source is a log file, the event body is a string containing a single line of text.

The event header provides structured log metadata, serves as the basis for data routing, and plays a vital role in later data processing and data cleaning.
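
The event structure described above can be modeled in a few lines of Python. This is a sketch for illustration, not Flume's own API (Flume itself is written in Java); the header key names source_ip, timestamp, module, and level, and the helper event_from_line, are hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event: string key-value headers plus a byte-array body."""
    body: bytes
    headers: dict = field(default_factory=dict)


def event_from_line(line, source_ip, module, level):
    """Wrap one line of a log file in an event with descriptive headers."""
    headers = {
        "source_ip": source_ip,                     # IP of the log source node
        "timestamp": str(int(time.time() * 1000)),  # creation time in milliseconds
        "module": module,                           # log source module name
        "level": level,                             # log level
    }
    return Event(body=line.rstrip("\n").encode("utf-8"), headers=headers)
```

For example, event_from_line("user=42 action=add_to_cart\n", "10.0.0.7", "cart", "INFO") yields an event whose body is the single line as bytes and whose headers can later drive routing without the body being parsed again.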

The Collector layer of a complete Flume log system can consist of one or more Agents. An Agent has three main components: Source, Channel, and Sink. The basic data-processing flow of a Flume Agent is shown in Figure 6.

A configurable real-time data cleaning platform is designed on top of the Flume log system. While exporting the full set of user behavior logs, it cleans data according to user-configured rules and persists the selected data for further analysis, as shown in Figure 7.

Data cleaning is a necessary step before big data analysis, and the quality of cleaning is closely tied to the effectiveness of the data analysis model. Models and algorithms for data cleaning are themselves an active research topic at home and abroad. On top of the distributed log system, this scheme adds a real-time data cleaning module that cleans real-time log data at the collection stage according to user-defined rules, so that the data samples provided by the log system better match the needs of data analysts. Data cleaning usually involves steps such as missing-value cleaning, format and content cleaning, logic-error cleaning, and removal of unneeded data. Here a log cleaning component, LogCleanInterceptor, is designed to work with Flume's multiplexing Channel selector (MultiplexingChannelSelector) to clean and route real-time data. The logic is shown in Figure 8.

1) The Source component reads data from the log source.

2) The Source delegates the write task to the Channel processor (ChannelProcessor).

3) The Channel processor passes each event to the log cleaning component (LogCleanInterceptor), which reads the event cleaning rules from the configuration file, processes the events that match a rule, extracts the relevant content from the event body, and adds or modifies event headers.

4) The Channel processor passes the cleaned events to the multiplexing Channel selector (MultiplexingChannelSelector), which reads the event type from the event header, obtains the routing information for the event from the user configuration, and returns the list of events with routing information attached to the Channel processor.

5) The Channel processor sends the events in batches, as transactions, to the specific Channels indicated by the routing information.
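
Steps 2) to 5) above can be sketched in Python as follows. This is not Flume's Java API and not the invention's LogCleanInterceptor; it is a simplified model under stated assumptions. The rule format (a regular expression paired with an event type), the header name type, the use of plain lists as channels, and the check-then-commit stand-in for a transaction are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    body: bytes
    headers: dict = field(default_factory=dict)


class LogCleanInterceptor:
    """Step 3: apply cleaning rules, extract body fields, add or change headers."""

    def __init__(self, rules):
        # rules: list of (regex, event_type); a hypothetical rule format
        self._rules = [(re.compile(pattern), event_type) for pattern, event_type in rules]

    def intercept(self, event):
        text = event.body.decode("utf-8", errors="replace")
        for pattern, event_type in self._rules:
            match = pattern.search(text)
            if match:                                   # first matching rule wins
                event.headers["type"] = event_type
                event.headers.update(
                    {k: v for k, v in match.groupdict().items() if v is not None})
                break
        return event


class MultiplexingSelector:
    """Step 4: map the value of one header to a list of channel names."""

    def __init__(self, header, mapping, default):
        self._header, self._mapping, self._default = header, mapping, default

    def channels_for(self, event):
        return self._mapping.get(event.headers.get(self._header), self._default)


class ChannelProcessor:
    """Steps 2 and 5: run the interceptor, route, then commit the batch to channels."""

    def __init__(self, interceptor, selector, channels):
        self._interceptor, self._selector = interceptor, selector
        self._channels = channels                       # name -> list standing in for a Channel

    def process_batch(self, events):
        staged = {}
        for event in events:
            event = self._interceptor.intercept(event)
            for name in self._selector.channels_for(event):
                staged.setdefault(name, []).append(event)
        unknown = set(staged) - set(self._channels)
        if unknown:                                     # fail before any channel is touched
            raise KeyError(f"unknown channels: {sorted(unknown)}")
        for name, batch in staged.items():              # commit the whole batch
            self._channels[name].extend(batch)
```

With a selector mapping such as {"order": ["full", "orders"]} and a default of ["full"], every event reaches the full channel while only events tagged as orders are also copied to the orders channel. This mirrors the goal of exporting the full behavior log while persisting selected data separately.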

Based on Flume, this scheme provides a highly available log cleaning and filtering design in which transactional properties guarantee reliability during log transmission. To keep data flowing efficiently through the whole system, and to resolve the mismatch between the upstream write rate and the read rate of downstream operations such as cleaning and persistence, the invention designs a composite dual channel, CDual-Channel, on top of the two native Channel types, providing ample buffer capacity while keeping Channel-layer throughput high. In addition, the log cleaning component (LogCleanInterceptor), working with the native multiplexing Channel selector, cleans the real-time log stream according to user-configured rules before the log data is persisted, so that the log data collected by the distributed log collection system of the whole e-commerce platform better matches the needs of data analysts and the time cost of later data analysis is reduced.

Compared with the prior art, the present invention has the following advantages and effects:

1. The invention collects log data with the non-blocking log collection scheme based on double buffering, providing low-latency, low-overhead collection and solving the problems of common log systems in which the collection layer is tightly coupled to the business system, maintenance is difficult, and log buffering adds to the load on the business system.

2. The invention uses the classified log forwarding scheme based on the master/slave Reactor pattern, providing high concurrency, high throughput, and highly reliable log data transmission, reducing the coupling between the business system and the log system, and solving the key problem of having the system's buffer layer support highly concurrent writes while keeping data forwarding highly reliable.

3. The invention cleans and filters log data using the composite dual channel based on Flume NG. While preserving high system throughput and ample buffer capacity, it resolves the mismatch between the upstream data write rate and the downstream data consumption rate and reduces the time cost of later data analysis.

Brief description of the drawings

Fig. 1 is the load-balancing node topology diagram of the distributed log collection method disclosed by the invention;

Fig. 2 is the execution timeline of the double-buffering log collection scheme of the distributed log collection method disclosed by the invention;

Fig. 3 is the two-layer structure diagram of the master/slave Reactor pattern in the log forwarding scheme of the distributed log collection method disclosed by the invention;

Fig. 4 is the basic logic diagram of the classified log forwarding scheme based on the master/slave Reactor pattern of the distributed log collection method disclosed by the invention;

Fig. 5 is the internal Flume data flow diagram of the distributed log collection method disclosed by the invention;

Fig. 6 is the basic data-processing flow chart of the Flume Agent of the distributed log collection method disclosed by the invention;

Fig. 7 is the structure diagram of cleaning and filtering log data using the Flume NG composite dual channel in the distributed log collection method disclosed by the invention;

Fig. 8 is the data cleaning logic diagram of the distributed log collection method disclosed by the invention;

Fig. 9 is the schematic diagram of the distributed log collection method disclosed by the invention;

Fig. 10 is the step diagram of the distributed log collection method disclosed by the invention;

Fig. 11 is the system framework diagram of the distributed log collection method disclosed by the invention as applied to an e-commerce platform;

Fig. 12 is the system flow chart of the distributed log collection method disclosed by the invention as applied to an e-commerce platform.

Detailed description of the embodiments

To make the purpose, technical scheme, and advantages of the embodiments of the invention clearer, the technical scheme in the embodiments is described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Embodiment one

This embodiment discloses a distributed log collection method. The step diagram is shown in Figure 10, from which it can be seen that the method comprises the following steps:

S1. The business system generates log data;

S2. Log data is collected using the double-buffering mechanism;

In this embodiment, collecting log data using the double-buffering mechanism specifically means: logging logic is removed from the critical path of the business processing flow, and log collection is carried out by a lightweight component on the service node, giving business applications non-blocking, low-latency behavior in the log collection scenario and giving the back-end log thread low overhead and high throughput.

S3. Concurrent service requests are distributed across the service nodes using LVS+Keepalived;

In this embodiment, distributing concurrent service requests across the service nodes using LVS+Keepalived is specifically:

S4. Log data is forwarded by class using the master/slave Reactor pattern;

In this embodiment, forwarding log data by class using the master/slave Reactor pattern is specifically:

An agent node is added between the ordinary business system cluster nodes and the log system cluster nodes, fully decoupling the business system from the log system. The business system only has to implement and maintain business logic and deliver quality service for the platform's core business. It simply submits log messages to the agent node, which is responsible for caching logs and for distributing the different types of logs.

The basic logic of the classified log forwarding scheme based on the master/slave Reactor pattern is shown in Figure 4. In this scheme the log forwarding mechanism of SubReactorA is essentially the native Reactor pattern; the difficulty lies in implementing concurrent asynchronous event handling in SubReactorB. The invention designs SubReactorB as a Reactor plus a thread pool, following the half-sync/half-async idea, so that SubReactorB presents concurrent asynchronous event handling to the outside. Internally, this logic is implemented in three layers: an asynchronous layer, a queue layer, and a synchronous layer.

S5. Log data is cleaned and filtered using the composite dual channel based on Flume NG;

In this embodiment, cleaning and filtering log data using the composite dual channel based on Flume NG is specifically:

S501. The Source component reads data from the log source.

S502. The Source delegates the write task to the Channel processor (ChannelProcessor).

S503. The Channel processor passes each event to the log cleaning component (LogCleanInterceptor), which reads the event cleaning rules from the configuration file, processes the events that match a rule, extracts the relevant content from the event body, and adds or modifies event headers.

S504. The Channel processor passes the cleaned events to the multiplexing Channel selector (MultiplexingChannelSelector), which reads the event type from the event header, obtains the routing information for the event from the user configuration, and returns the list of events with routing information attached to the Channel processor.

S505. The Channel processor sends the events in batches, as transactions, to the specific Channels indicated by the routing information.

S6. The log data is read out.

Embodiment two

This embodiment applies the distributed log collection method to a specific e-commerce system, that is, to the scenario of collecting logs from the distributed cluster nodes of an e-commerce platform. It provides a highly reliable, high-performance scheme in which service nodes carry a light load, the business system and the log system are decoupled, and the log system is easy to extend and maintain. Figure 10 is the flow chart of log collection by the method and shows the steps of the whole collection process; Figures 11 and 12 are, respectively, the system framework diagram and the system flow chart of the method applied to a specific e-commerce system. The whole implementation is described through the following steps:

S1. The e-commerce platform system generates log data;

S2. Log data is collected using the double-buffering mechanism;

S501. The Source component reads data from the log source;

S6. The e-commerce platform reads out the log data.

This completes the whole process of collecting log data from the e-commerce platform with the distributed log collection method of the invention.

In summary, this embodiment describes in full the process by which an e-commerce platform system collects log data, combining the platform's workflow with the execution flow of the distributed log collection method. The method collects log data with the non-blocking log collection scheme based on double buffering, providing low-latency, low-overhead collection. It uses the classified log forwarding scheme based on the master/slave Reactor pattern, providing high concurrency, high throughput, and highly reliable log data transmission and reducing the coupling between the business system and the log system. It cleans and filters log data using the composite dual channel based on Flume NG and, while preserving high system throughput and ample buffer capacity, resolves the mismatch between the upstream write rate and the downstream consumption rate found in traditional e-commerce log collection. It thereby solves the problems of tight coupling between the collection layer and the business system, difficult maintenance, and log buffering adding to the business system's load, as well as the key problem of having the buffer layer support highly concurrent writes while keeping forwarding highly reliable. Massive user data can thus be collected efficiently, raising the level of informatization and intelligence in the e-commerce industry, improving the service quality of e-commerce platform clusters, and offering a new growth point for combining e-commerce with big data.

The above embodiments are preferred embodiments of the invention, but the embodiments of the invention are not limited by them. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the invention shall be an equivalent replacement and falls within the scope of protection of the invention.

Claims

1. A distributed log collection method, characterized in that the method comprises the following steps:

a business system generates log data;

the log data is collected using a double-buffering mechanism;

the log data is classified and forwarded using the master/slave Reactor pattern;

the log data is read out.

2. The distributed log collection method according to claim 1, characterized in that collecting the log data using the double-buffering mechanism specifically comprises:

the logging logic is separated from the critical path of the business processing flow, and log collection is performed by a lightweight component on the service node, ensuring that in the log collection scenario the business application is non-blocking with low latency, and the back-end log thread has low overhead and high throughput.

3. The distributed log collection method according to claim 1, characterized in that collecting the log data using the double-buffering mechanism specifically comprises:

business logic is decoupled from the specific I/O logic using the double-buffering mechanism; the business thread is only responsible for generating log messages and is not concerned with when the write operation is actually performed; and this asynchronous approach ensures the real-time performance of the business processing flow.

4. The distributed log collection method according to claim 1, characterized in that, in collecting the log data using the double-buffering mechanism, multiple log messages are combined into one large buffer of data and sent to the back end in a single operation, avoiding frequent triggering of the back end.

5. The distributed log collection method according to claim 1, characterized in that distributing concurrent service requests appropriately to each service node using the LVS+Keepalived mode specifically comprises:

high-availability load balancing is implemented using the LVS+Keepalived mode; the load balancing service is provided with at least two servers, one being the Master node and the others being Slave nodes, to build an LVS virtual server cluster system, and a Keepalived component is deployed on each load balancing node; the Master node and the Slave nodes communicate by multicast using the Virtual Router Redundancy Protocol (VRRP); and through its core protocol VRRP, Keepalived implements fault isolation of load balancing nodes and automatic failover between nodes.

6. The distributed log collection method according to claim 1, characterized in that classifying and forwarding the log data using the master/slave Reactor pattern specifically comprises:

a proxy node is added between the ordinary business system cluster nodes and the log system cluster nodes, fully decoupling the business system from the log system; the business system focuses only on implementing and maintaining business logic and on providing high-quality service for the platform's core business; and the business system only needs to submit log messages to the proxy node, which is responsible for caching the logs and for distributing the different types of logs.

7. The distributed log collection method according to claim 1, characterized in that classifying and forwarding the log data using the master/slave Reactor pattern specifically comprises:

two layers of Reactors are designed, namely MainReactor and SubReactor, wherein

MainReactor is in the first layer and, as the main Reactor, is responsible for listening for new-connection-ready events from each node of the business cluster, and for assigning the network I/O operations of established connections to different SubReactors according to log priority;

SubReactor is in the second layer and comprises two slave Reactors, SubReactorA and SubReactorB, wherein SubReactorA is responsible for acquiring and forwarding high-priority logs, providing synchronous concurrent event processing to the front-end log sources and a stable log stream to the back-end log system, and SubReactorB is responsible for acquiring and forwarding low-priority logs, providing asynchronous concurrent event processing to the front-end log sources and a data source in the form of a file cache to the back-end log system.

8. The distributed log collection method according to claim 7, characterized in that SubReactorB takes the form of a Reactor plus a thread pool, is designed on the half-sync/half-async principle, and externally provides asynchronous concurrent event processing, wherein the internal logic of the asynchronous concurrent event processing of SubReactorB is implemented with a three-layer structure consisting of an asynchronous layer, a queue layer, and a synchronous layer.

9. The distributed log collection method according to claim 1, characterized in that cleaning and filtering the log data using the composite dual channel based on Flume NG specifically comprises the following steps:

a Source component reads data in from the log source;

the Source delegates the write task to the Channel processor;

the Channel processor passes the events to the log cleaning component, which reads the event cleaning rules from a configuration file, processes the events that match a rule, extracts the relevant content of the event body, and adds or modifies event headers;

the Channel processor passes the cleaned events to the multiplexing Channel selector, which obtains the event type information from the event headers, obtains the routing information of the events according to the user configuration, and returns the list of events, with routing information added, to the Channel processor;

the Channel processor is responsible for sending the events in batches, in the form of transactions, to the specific Channel according to the corresponding routing information.

10. The distributed log collection method according to claim 9, characterized in that cleaning and filtering the log data using the composite dual channel based on Flume NG specifically comprises:

based on the Flume NG composite dual-channel log system and data cleaning method, a highly available distributed log system is designed and configurable real-time data cleaning is achieved within the log system, which supports outputting the cleaned log data to different storage nodes according to customized data cleaning rules and routing rules.
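
For illustration only, and forming no part of the claims, the cleaning and routing flow recited in claims 9 and 10 can be sketched in Python as follows. This is not the Flume NG Java API. The rule table CLEANING_RULES, the routing table ROUTES, the channel names, and the event layout (a dictionary with "headers" and "body") are assumptions made for the sketch, as is the choice to drop events that match no cleaning rule.

```python
import re
from collections import defaultdict

# Hypothetical cleaning rules: (pattern matched against the body, headers to set).
CLEANING_RULES = [
    (re.compile(r"^ORDER\b"), {"type": "order"}),
    (re.compile(r"^CLICK\b"), {"type": "click"}),
]

# Hypothetical routing table: value of the "type" header -> channel name.
ROUTES = {"order": "channel_a", "click": "channel_b"}
DEFAULT_CHANNEL = "channel_b"


def clean(event):
    """Apply the first matching rule: trim the body and add or modify headers."""
    for pattern, headers in CLEANING_RULES:
        if pattern.search(event["body"]):
            return {"headers": {**event["headers"], **headers},
                    "body": event["body"].strip()}
    return None  # assumption of this sketch: unmatched events are filtered out


def select_channel(event):
    """Multiplexing selection: route on the event type stored in the headers."""
    return ROUTES.get(event["headers"].get("type"), DEFAULT_CHANNEL)


def process_batch(events, channels):
    """Clean and route a batch, then hand it to the channels in one step."""
    staged = defaultdict(list)
    for event in events:
        cleaned = clean(event)
        if cleaned is not None:
            staged[select_channel(cleaned)].append(cleaned)
    # Nothing is appended to any channel until the whole batch has been staged,
    # which loosely mirrors sending a batch "in the form of a transaction".
    for name, batch in staged.items():
        channels[name].extend(batch)
```

Keeping the rules and routes in plain tables mirrors the configurable cleaning and routing rules of claim 10: changing where a log type is stored means editing configuration data rather than processing code.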