CN107590182A - A distributed log collection method - Google Patents

A distributed log collection method

Info

Publication number
CN107590182A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710654304.3A
Other languages
Chinese (zh)
Other versions
CN107590182B (en)
Inventor
向友君
何家成
张勰
朱叶
吴宗泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Application filed by South China University of Technology SCUT
Priority to CN201710654304.3A
Publication of CN107590182A
Application granted
Publication of CN107590182B
Legal status: Active

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a distributed log collection method comprising: collecting log data using double buffering; classifying and forwarding the log data using a master/slave Reactor pattern; distributing concurrent service requests reasonably across the service nodes using the LVS+Keepalived mode; cleaning and filtering the log data using a Flume NG-based composite dual channel; and finally reading the log data. Compared with traditional log collection methods, this method proposes a non-blocking log collection scheme based on double buffering, a classified log forwarding scheme based on the master/slave Reactor pattern, and a log system and data cleaning platform based on a Flume NG composite dual channel. As a result, the system collects logs more efficiently, the collected and persisted log data better matches the needs of analysts, and the time cost of later data analysis is reduced.

Description

A distributed log collection method
Technical field
The present invention relates to the field of big data collection, and in particular to log data collection in the e-commerce industry; specifically, it relates to a distributed log collection method.
Background
In recent decades the domestic economy has flourished. As living standards have improved, people's purchasing power has grown and consumption channels have widened, online shopping in particular.
When people use an e-commerce platform, personalized behavior data such as browsing products, adding products to favorites, adding products to the shopping cart, paying for orders, and reviewing orders is stored on the platform's back-end services in the form of user behavior logs. To improve user experience and increase sales, major e-commerce platforms collect user data for data mining and analysis; precision marketing and personalized product recommendation have become hot research topics in recent years. The rapid development of artificial intelligence and deep learning has also opened new avenues for user behavior analysis and data mining. As the number of online shoppers, the frequency of purchases, and the number and variety of products keep growing, the volume of user behavior logs has surged; Taobao, for example, collects up to 50 TB of data per day. How to collect such massive user data efficiently is a key concern for big data analysis, and making efficient use of monitoring logs is an important way to improve the service quality of an e-commerce platform cluster.
Summary of the invention
The purpose of the present invention is to overcome the above shortcomings of the prior art regarding log data in the traditional e-commerce industry by providing a distributed log collection method.
The purpose of the present invention can be achieved by the following technical solution:
A distributed log collection method, comprising the following steps:
The business system generates log data;
Log data is collected using double buffering;
Concurrent service requests are distributed reasonably to the service nodes using the LVS+Keepalived mode;
Log data is forwarded using the master/slave Reactor pattern;
Log data is cleaned and filtered using a Flume NG-based composite dual channel;
The log data is read out.
Further, the detailed process of collecting log data using double buffering includes:
Double buffering is one pattern of cache design. A well-designed cache solves the problem of business threads blocking.
In the non-blocking log collection scheme based on double buffering, the producer of log messages is logically called the front end, and the log consumer that performs the actual I/O operations is called the back end. In the e-commerce log collection scenario, each business thread can be regarded as a front end, and the log thread belongs to the back end. The scheme prepares one buffer each for the front end and the back end of log collection, called buffer A and buffer B. At initialization, each buffer is allocated two buffer blocks, one primary and one standby; the front-end primary block receives the current log data while the standby block waits. The log data in buffer B is read and reported by the back-end thread. The scheme uses a condition variable as the synchronization mechanism between threads. The back-end log reporting thread, which waits on the condition variable, is woken in either of two cases: the front-end business thread has filled its buffer, or the wait has exceeded 5 seconds. The execution timeline of the scheme's main logic is shown in Fig. 2.
At initialization, four buffer blocks B1, B2, B3, and B4 are allocated, two each for the front end and the back end. The front-end business thread writes log data into buffer A; the back-end log thread reads the log data in buffer B and performs the actual I/O, writing the data to a specific file descriptor. When the buffer is empty, the back-end log thread waits on the condition variable.
As shown in Fig. 2, at second 3 the front-end primary block B1 is full, so the standby block B2 is brought into use and the full block B1 is submitted to the buffer A container. At second 4, B2 is also full; the condition on the condition variable holds, the back-end log thread is woken from the kernel, and the buffer blocks of buffer A and buffer B are swapped. Thus, just after second 4, the two full blocks B1 and B2 have been handed to buffer B, and the back-end log thread reports the log data they contain, while the two previously empty blocks B3 and B4 have been handed to buffer A for the business threads to use. Subsequent front-end log collection and back-end log reporting therefore run fully in parallel, and the front-end business thread never blocks on I/O. When reporting completes, the front end and the back end each hold two empty buffer blocks again, matching the initial state.
The front-end and back-end threads manage two or more buffer blocks through the buffer containers, which makes block scheduling more flexible and widens the range of scenarios the scheme suits. For example, when the front-end write demand is small and logs are collected slowly, a single pair of buffer blocks can be exchanged between buffers A and B. When business threads write a large volume of logs in a short time, or back-end reporting takes a long time, business threads can dynamically allocate additional buffer blocks, which the back-end thread eventually releases.
To guarantee that data in the buffer is output within a bounded time, the scheme additionally sets a time threshold as a wake-up condition for the back-end log thread, as described above. In other business scenarios, the threshold can be set according to actual business needs.
With double buffering, the scheme decouples business logic from the actual I/O logic. A business thread is only responsible for generating log messages and need not care when the write actually happens; this asynchronous approach keeps the business processing flow real-time. Whereas other common log collection systems send every message to the back end individually, the non-blocking double-buffering scheme combines multiple log messages into one large buffer and sends it to the back end in a single operation. This avoids waking the back-end log thread frequently, achieves a batching effect, and reduces system overhead. Because logging logic is removed from the critical path of business processing, the front end is non-blocking with low latency and low overhead, while the back end sustains sufficiently high log throughput while consuming few system resources.
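The double-buffering scheme described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `DoubleBufferLogger`, the `sink` callable, and counting block size in messages rather than bytes are all assumptions made for illustration. For brevity the back-end thread is woken whenever any block fills (the Fig. 2 timeline wakes it once both blocks are full), and on a timeout it also flushes the partially filled block.

```python
import threading


class DoubleBufferLogger:
    """Front-end threads append to buffer A; one back-end thread swaps the
    full blocks out under the lock and performs the slow I/O outside it."""

    def __init__(self, sink, block_size=4, flush_interval=5.0):
        self._sink = sink                      # hypothetical callable doing the real I/O
        self._block_size = block_size          # messages per block (assumed unit)
        self._flush_interval = flush_interval  # time threshold, 5 s in the text
        self._cond = threading.Condition()
        self._current = []                     # front-end primary block
        self._full_blocks = []                 # buffer A container of full blocks
        self._running = True
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, message):
        """Called by business threads; never performs I/O."""
        with self._cond:
            self._current.append(message)
            if len(self._current) >= self._block_size:
                self._full_blocks.append(self._current)
                self._current = []             # switch to a fresh (standby) block
                self._cond.notify()            # wake the back-end thread

    def _run(self):
        while True:
            with self._cond:
                if self._running and not self._full_blocks:
                    self._cond.wait(timeout=self._flush_interval)
                # Swap: take ownership of everything buffered so far.
                to_write = self._full_blocks
                self._full_blocks = []
                if self._current:
                    to_write.append(self._current)
                    self._current = []
                stop = not self._running
            for block in to_write:             # slow I/O happens outside the lock
                self._sink(block)
            if stop:
                return

    def close(self):
        """Flush whatever is left and stop the back-end thread."""
        with self._cond:
            self._running = False
            self._cond.notify()
        self._thread.join()
```

Because `log` only appends to a list while holding the lock, a business thread is delayed by a pointer swap at most, and each call to `sink` receives a whole block, which gives the batching effect the text describes.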
Further, the detailed process of distributing concurrent service requests reasonably to the service nodes using the LVS+Keepalived mode includes:
Highly available load balancing is implemented using LVS (Linux Virtual Server) + Keepalived, as shown in Fig. 1.
The load balancing service uses at least two servers, one as the Master node and the others as Slave nodes, to build an LVS virtual server cluster, and the Keepalived component is deployed on every load balancing node. The Master and Slave nodes communicate by multicast using the Virtual Router Redundancy Protocol (VRRP); through VRRP, its core protocol, Keepalived isolates failed load balancing nodes and fails over automatically between nodes. Through the load balancing service, the system distributes service requests reasonably to the service nodes of the agent layer, increasing the concurrent request capacity of the system as a whole.
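The failover and distribution behaviour described above can be modelled in a few lines of Python. This is a toy simulation, not Keepalived or LVS code: `BalancerNode`, `elect_master`, and `round_robin` are hypothetical names, the priority values are made up, and round-robin is assumed as the scheduling policy because the source does not name one.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class BalancerNode:
    name: str
    priority: int       # VRRP-style priority: the highest live node becomes master
    alive: bool = True


def elect_master(nodes):
    """Return the live load balancer with the highest priority, or None."""
    live = [n for n in nodes if n.alive]
    return max(live, key=lambda n: n.priority) if live else None


def round_robin(service_nodes):
    """Endless iterator that hands requests to service nodes in turn."""
    return cycle(service_nodes)


balancers = [BalancerNode("lb-master", 100), BalancerNode("lb-slave", 90)]
print(elect_master(balancers).name)   # the Master serves while it is healthy
balancers[0].alive = False            # Master stops sending advertisements
print(elect_master(balancers).name)   # the Slave takes over

agents = round_robin(["agent-1", "agent-2"])
print([next(agents) for _ in range(3)])
```

In a real deployment the election is driven by VRRP advertisements timing out rather than by a shared `alive` flag; the sketch only shows the resulting decision.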
Further, the detailed process of forwarding log data using the master/slave Reactor pattern is as follows:
Reactor and Proactor are both common design patterns for handling concurrent network I/O. Reactor supports monitoring multiple event sources on one port through I/O multiplexing, separating low-level network monitoring and event dispatch from the application's specific processing logic. The native Reactor pattern maintains high performance for concurrent events whose operations are cheap. Because the Reactor pattern is essentially synchronous I/O, the set of ready events is processed serially inside the system, so it is unsuited to time-consuming event operations: precisely because it is synchronous, a Reactor handling a time-consuming operation may block other work that could proceed in parallel. By contrast, the Proactor pattern, thanks to asynchronous I/O, does not wait for a specific asynchronous event to finish processing. It overcomes this shortcoming of the Reactor pattern, can handle multiple service requests simultaneously, and is a suitable choice for concurrent, time-consuming event handling.
Given the concurrent processing requirements of e-commerce platform logs of different priorities and the characteristics of existing I/O multiplexing models, the present invention designs a classified log forwarding scheme based on the master/slave Reactor pattern. In a specific application it is implemented as the server-side component of the data forwarding service, and provides both synchronous event handling and asynchronous event handling similar to the Proactor pattern.
The scheme uses two layers of Reactors, structured as shown in Fig. 3:
The MainReactor (main Reactor) sits in the first layer. It listens for ready events of new connections from the nodes of the business cluster and, according to log priority, dispatches the network I/O operations of established connections to different SubReactors.
The SubReactors sit in the second layer and comprise two slave Reactors: SubReactorA and SubReactorB. SubReactorA acquires and forwards high-priority logs; it offers concurrent synchronous event handling to the front-end log sources and a stable log stream to the back-end log system. SubReactorB acquires and forwards low-priority logs; it offers concurrent asynchronous event handling to the front-end log sources and a file-cache data source to the back-end log system. The SubReactor layer thus gives the scheme the ability to handle both real-time streaming data and time-consuming events.
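The two-layer dispatch can be sketched with Python's standard `selectors` module. This is an illustrative sketch under stated assumptions, not the patent's code: the `MainReactor` here does not accept real network connections (a `socket.socketpair()` stands in for an accepted connection), and the priority of a connection is simply passed in as a string, since the source does not say how priority is determined.

```python
import selectors
import socket


class SubReactor:
    """Second layer: owns a selector and the connections routed to it."""

    def __init__(self, name):
        self.name = name
        self.selector = selectors.DefaultSelector()
        self.received = []                     # payloads read so far

    def register(self, conn):
        conn.setblocking(False)
        self.selector.register(conn, selectors.EVENT_READ)

    def poll_once(self, timeout=1.0):
        """Wait for readable connections and read each one once."""
        for key, _events in self.selector.select(timeout):
            data = key.fileobj.recv(4096)
            if data:
                self.received.append(data)

    def close(self):
        self.selector.close()


class MainReactor:
    """First layer: hands each established connection to a SubReactor
    chosen by log priority (priority is an assumed per-connection label)."""

    def __init__(self, high, low):
        self.high, self.low = high, low

    def dispatch(self, conn, priority):
        target = self.high if priority == "high" else self.low
        target.register(conn)
        return target


sub_a, sub_b = SubReactor("SubReactorA"), SubReactor("SubReactorB")
main = MainReactor(high=sub_a, low=sub_b)
client_hi, server_hi = socket.socketpair()     # stand-ins for accepted connections
client_lo, server_lo = socket.socketpair()
main.dispatch(server_hi, "high")
main.dispatch(server_lo, "low")
client_hi.sendall(b"order paid")
client_lo.sendall(b"page view")
sub_a.poll_once()
sub_b.poll_once()
print(sub_a.received, sub_b.received)
for s in (client_hi, server_hi, client_lo, server_lo):
    s.close()
sub_a.close()
sub_b.close()
```

Giving each SubReactor its own selector is what lets the two priority classes be polled, and later scaled, independently.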
The basic logic of the classified log forwarding scheme based on the master/slave Reactor pattern is shown in Fig. 4:
In this scheme, SubReactorA's log forwarding mechanism largely follows the native Reactor pattern; the difficulty lies in implementing SubReactorB's concurrent asynchronous event handling. The invention designs SubReactorB as a Reactor plus a thread pool, following the half-sync/half-async idea, so that SubReactorB presents concurrent asynchronous event handling externally. Internally, this is implemented in three layers: an asynchronous layer, a queue layer, and a synchronous layer.
(1) The asynchronous layer is implemented by the reactor and an event demultiplexer. It handles low-level events, monitoring readable events produced by event sources in the business system that have established connections, binds each readable event to its event handler, and places it in the task queue.
(2) The queue layer is implemented by a blocking queue that serves as the task queue, holding a set of configured event-handling tasks awaiting processing.
(3) The synchronous layer is implemented by a thread pool. It handles log data parsing, log file caching, and the actual log forwarding logic.
SubReactorB hands the actual event-handling logic to the thread pool in the synchronous layer; all time-consuming operations are performed by pool threads, which call back SubReactorB asynchronously once an event has been handled. SubReactorB therefore behaves asynchronously from the outside. As in the Proactor pattern, receiving the asynchronous callback for an event means that the event has been fully handled. By applying the half-sync/half-async pattern as an application-level software design optimization, the scheme removes the dependence of asynchronous I/O on underlying operating system support.
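The three layers of SubReactorB can be sketched as follows. This is a minimal illustration of the half-sync/half-async idea, not the patented code: the method names `on_readable`, `_on_complete`, and `shutdown` are hypothetical, network reading is omitted (the raw bytes are passed in directly), and a handler failure is simply recorded as the result.

```python
import queue
import threading


class SubReactorB:
    """Asynchronous layer enqueues work; a blocking queue holds it; a pool
    of worker threads (synchronous layer) does the slow part and calls back."""

    def __init__(self, workers=2):
        self.tasks = queue.Queue()             # queue layer: blocking task queue
        self.completed = []                    # results delivered by callbacks
        self._lock = threading.Lock()
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(workers)
        ]
        for t in self._threads:
            t.start()

    def on_readable(self, raw, handler):
        """Asynchronous layer: bind the event to its handler and return at once."""
        self.tasks.put((handler, raw))

    def _worker(self):
        """Synchronous layer: parse, cache, forward (whatever handler does)."""
        while True:
            item = self.tasks.get()
            if item is None:                   # shutdown sentinel
                self.tasks.task_done()
                return
            handler, raw = item
            try:
                result = handler(raw)
            except Exception as exc:           # keep the worker alive on bad input
                result = exc
            self._on_complete(result)
            self.tasks.task_done()

    def _on_complete(self, result):
        """Completion callback: receiving it means the event is fully handled."""
        with self._lock:
            self.completed.append(result)

    def shutdown(self):
        self.tasks.join()                      # wait for all queued events
        for _ in self._threads:
            self.tasks.put(None)
        for t in self._threads:
            t.join()
```

A caller would invoke `on_readable(raw_bytes, handler)` from the reactor loop and continue polling immediately; the result appears in `completed` only after a pool thread has finished the time-consuming work.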
The classified log forwarding scheme based on the master/slave Reactor pattern provides different data processing and forwarding according to the priority of log messages. It offers both synchronous and asynchronous concurrent event handling externally, ensuring real-time forwarding of high-priority log data and high throughput for low-priority log data. In addition, because the scheme uses a two-layer Reactor model, the SubReactor layer of the data forwarding scheme is highly extensible: multiple SubReactors can be customized for different business needs, and the distinction need not be by log priority; it may instead be by log type or by a user-defined routing scheme.
Further, the detailed process of cleaning and filtering log data using the Flume NG-based composite dual channel is as follows:
The scheme is built on the open-source log system Flume NG. Within the log system layer, log data exists in the form of events; the event is the basic data structure for transmission and processing inside Flume NG.
The Flume data flow and event structure are shown in Fig. 5. An event consists of two parts: an event header and an event body.
Data in the event header is stored as key-value pairs. The number and content of the pairs can be defined by the user, and they are typically used to record attributes of the log entry, such as the IP of the log source node, the event creation timestamp, the name of the log source module, and the log level.
The event body is a byte array that carries the actual content of the log entry. When the log source is a log file, the event body is a string containing a single line of text.
The event header provides a structured log header and a basis for data routing, and plays a vital role in later data processing and data cleaning.
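As an illustration of the header/body structure described above, the following Python sketch models an event as string key-value headers plus a byte-array body. It is a stand-in for explanation only (Flume itself is a Java system); the helper `event_from_line` and the header keys `source_ip`, `timestamp`, `module`, and `level` are hypothetical names chosen to mirror the attributes listed in the text.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    body: bytes                                   # raw content of one log entry
    headers: dict = field(default_factory=dict)   # string keys mapped to string values


def event_from_line(line, source_ip, module, level):
    """Wrap one line of a log file as an event with descriptive headers."""
    return Event(
        body=line.encode("utf-8"),
        headers={
            "source_ip": source_ip,               # IP of the log source node
            "timestamp": str(int(time.time() * 1000)),
            "module": module,                     # log source module name
            "level": level,                       # log level
        },
    )


event = event_from_line("GET /cart 200", "10.0.0.7", "cart", "INFO")
print(event.headers["module"], event.body)
```

Keeping routing attributes in the headers means later stages can decide where an event goes without parsing the body again.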
The Collector layer of a complete Flume log system can consist of one or more Agents. An Agent has three main components: Source, Channel, and Sink. The basic data processing flow of a Flume Agent is shown in Fig. 6.
A configurable real-time data cleaning platform is designed on top of the Flume log system. While exporting the full set of user behavior logs, it cleans data according to user-configured rules and persists specific data for further analysis, as shown in Fig. 7.
Data cleaning is a necessary step before big data analysis, and the quality of cleaning is closely tied to the effectiveness of the data analysis model. Models and algorithms for data cleaning are also a current research hotspot at home and abroad. On top of the distributed log system, the scheme designs a real-time data cleaning module that cleans real-time log data during the collection phase according to user-defined rules, so that the data samples the log system provides better match the needs of data analysts. Data cleaning usually involves steps such as cleaning missing values, cleaning format and content, cleaning logic errors, and removing unneeded data. Here a log cleaning component, LogCleanInterceptor, is designed to work with Flume's multiplexing Channel selector (MultiplexingChannelSelector) to clean and route real-time data. The scheme logic is shown in Fig. 8.
1) The Source component reads data from the log source.
2) The Source delegates the write task to the Channel processor (ChannelProcessor).
3) The Channel processor passes the event to the log cleaning component (LogCleanInterceptor), which reads the event cleaning rules from the configuration file, processes events that match a rule, extracts the relevant content of the event body, and adds or modifies event headers.
4) The Channel processor passes the cleaned event to the multiplexing Channel selector (MultiplexingChannelSelector), which obtains the event type from the event header, looks up the event's routing information in the user configuration, and returns the list of events with routing information attached to the Channel processor.
5) The Channel processor sends the batch of events, as a transaction, to the specific Channels indicated by the routing information.
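Steps 1) to 5) above can be sketched as a small Python pipeline. This is a simulation for illustration, not Flume code and not the patent's `LogCleanInterceptor`: the classes below only borrow the component names, the regex rule format, the `type` header, and the channel names `full` and `cleaned` are assumptions, and the "transaction" is reduced to staging a batch before committing it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    body: bytes
    headers: dict = field(default_factory=dict)


class LogCleanInterceptor:
    """Step 3: apply the first matching rule and enrich the event headers."""

    def __init__(self, rules):
        # rules: list of (regex with named groups, headers to set on a match)
        self.rules = [(re.compile(pattern), extra) for pattern, extra in rules]

    def intercept(self, event):
        text = event.body.decode("utf-8", errors="replace")
        for pattern, extra in self.rules:
            match = pattern.search(text)
            if match:
                event.headers.update(extra)
                event.headers.update(
                    {k: v for k, v in match.groupdict().items() if v is not None}
                )
                break
        return event


class MultiplexingSelector:
    """Step 4: map the value of one header to a list of channel names."""

    def __init__(self, header, mapping, default):
        self.header, self.mapping, self.default = header, mapping, default

    def channels_for(self, event):
        return self.mapping.get(event.headers.get(self.header), self.default)


class ChannelProcessor:
    """Steps 2 and 5: clean, route, then commit the whole batch at once."""

    def __init__(self, interceptor, selector, channels):
        self.interceptor, self.selector, self.channels = interceptor, selector, channels

    def process_batch(self, events):
        staged = {name: [] for name in self.channels}
        for event in events:
            event = self.interceptor.intercept(event)
            for name in self.selector.channels_for(event):
                staged[name].append(event)
        for name, batch in staged.items():     # commit only after staging succeeded
            self.channels[name].extend(batch)


channels = {"full": [], "cleaned": []}
processor = ChannelProcessor(
    LogCleanInterceptor([(r"action=(?P<action>\w+)\s+user=(?P<user>\d+)", {"type": "behavior"})]),
    MultiplexingSelector("type", {"behavior": ["full", "cleaned"]}, default=["full"]),
    channels,
)
processor.process_batch([Event(b"action=pay user=42"), Event(b"heartbeat ok")])
print(len(channels["full"]), len(channels["cleaned"]))
```

Every event reaches the `full` channel, while only events that matched a cleaning rule also reach `cleaned`, which mirrors the goal of exporting the full log while persisting selected data separately.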
Based on Flume, the scheme designs a highly available log cleaning and filtering scheme whose transactional properties ensure reliability during log transmission. To keep data flowing efficiently through the whole system and to solve the problems caused by the mismatch between the upstream write rate and the read rate of downstream operations such as cleaning and persistent storage, the present invention designs a composite dual channel, CDual-Channel, on top of two native channels, providing ample buffering capacity while maintaining high throughput at the Channel layer. In addition, the log cleaning component (LogCleanInterceptor), working with the native multiplexing Channel selector, cleans the real-time log stream according to user-configured rules before log data is persisted, so that the log data collected by the distributed log collection system of the whole e-commerce platform better matches the needs of data analysts and the time cost of later data analysis is reduced.
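The source does not describe how the composite dual channel combines its two underlying channels, so the following Python sketch is only one plausible reading, stated as an assumption: a fast in-memory channel of bounded capacity backed by a larger overflow channel (standing in for a durable, file-backed one). The class name `CompositeDualChannel` and its `put`/`take` methods are hypothetical, and both queues are plain in-memory deques here.

```python
from collections import deque


class CompositeDualChannel:
    """Assumed design: serve from memory while it has room, spill to an
    overflow channel when the writer outpaces the reader, keep FIFO order."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = deque()      # fast path
        self.overflow = deque()    # stand-in for a durable file-backed channel

    def put(self, event):
        # Once anything has spilled, newer events must also spill,
        # otherwise they would overtake older events on the way out.
        if not self.overflow and len(self.memory) < self.memory_capacity:
            self.memory.append(event)
        else:
            self.overflow.append(event)

    def take(self):
        if self.memory:            # memory always holds the oldest events
            return self.memory.popleft()
        if self.overflow:
            return self.overflow.popleft()
        return None


channel = CompositeDualChannel(memory_capacity=2)
for i in range(5):
    channel.put(i)                 # a burst larger than the memory capacity
print([channel.take() for _ in range(5)])
```

The point of the sketch is the trade-off the text names: the memory path keeps throughput high in the normal case, and the overflow path absorbs bursts when downstream cleaning or storage reads more slowly than upstream writes.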
Compared with the prior art, the present invention has the following advantages and effects:
1. The present invention collects log data with the non-blocking log collection scheme based on double buffering, providing low-latency, low-overhead collection. It solves the problems of common log systems in which the collection layer is tightly coupled to the business system, maintenance is difficult, and log buffering adds load to the business system.
2. The present invention uses the classified log forwarding scheme based on the master/slave Reactor pattern to support high concurrency, high throughput, and highly reliable log transmission. It reduces coupling between the business system and the log system and solves the key problem of having the system's buffering layer support highly concurrent writes while forwarding data with high reliability.
3. The present invention cleans and filters log data with the Flume NG-based composite dual channel. While maintaining high throughput and ample buffering capacity, it solves the mismatch between the upstream write rate and the downstream consumption rate and reduces the time cost of later data analysis.
Brief description of the drawings
Fig. 1 is the load balancing node topology of the distributed log collection method disclosed by the present invention;
Fig. 2 is the logic execution timeline of the double-buffering log collection scheme of the distributed log collection method disclosed by the present invention;
Fig. 3 is the two-layer structure of the master/slave Reactor pattern in the log data forwarding scheme of the distributed log collection method disclosed by the present invention;
Fig. 4 is the basic logic diagram of the classified log forwarding scheme based on the master/slave Reactor pattern of the distributed log collection method disclosed by the present invention;
Fig. 5 is the internal Flume data flow diagram of the distributed log collection method disclosed by the present invention;
Fig. 6 is the basic data processing flowchart of a Flume Agent in the distributed log collection method disclosed by the present invention;
Fig. 7 is the structure diagram for cleaning and filtering log data using the Flume NG-based composite dual channel in the distributed log collection method disclosed by the present invention;
Fig. 8 is the data cleaning logic diagram of the distributed log collection method disclosed by the present invention;
Fig. 9 is a schematic diagram of the distributed log collection method disclosed by the present invention;
Fig. 10 is the process step diagram of the distributed log collection method disclosed by the present invention;
Fig. 11 is the system framework diagram of the distributed log collection method disclosed by the present invention as applied to an e-commerce platform;
Fig. 12 is the system flowchart of the distributed log collection method disclosed by the present invention as applied to an e-commerce platform.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1
This embodiment discloses a distributed log collection method whose process steps are shown in Fig. 10. As Fig. 10 shows, the method comprises the following steps:
S1. The business system generates log data;
S2. Log data is collected using double buffering;
In this embodiment, collecting log data using double buffering specifically means: logging logic is removed from the critical path of business processing, and log collection is performed on the service node by a lightweight component, ensuring that in the log collection scenario the business application is non-blocking with low latency and the back-end log thread has low overhead and high throughput.
In the non-blocking log collection scheme based on double buffering, the producer of log messages is logically called the front end, and the log consumer that performs the actual I/O operations is called the back end. In the e-commerce log collection scenario, each business thread can be regarded as a front end, and the log thread belongs to the back end. The scheme prepares one buffer each for the front end and the back end of log collection, called buffer A and buffer B. At initialization, each buffer is allocated two buffer blocks, one primary and one standby; the front-end primary block receives the current log data while the standby block waits. The log data in buffer B is read and reported by the back-end thread. The scheme uses a condition variable as the synchronization mechanism between threads. The back-end log reporting thread, which waits on the condition variable, is woken in either of two cases: the front-end business thread has filled its buffer, or the wait has exceeded 5 seconds. The execution timeline of the scheme's main logic is shown in Fig. 2.
With double buffering, the scheme decouples business logic from the actual I/O logic. A business thread is only responsible for generating log messages and need not care when the write actually happens; this asynchronous approach keeps the business processing flow real-time. Whereas other common log collection systems send every message to the back end individually, the non-blocking double-buffering scheme combines multiple log messages into one large buffer and sends it to the back end in a single operation. This avoids waking the back-end log thread frequently, achieves a batching effect, and reduces system overhead. Because logging logic is removed from the critical path of business processing, the front end is non-blocking with low latency and low overhead, while the back end sustains sufficiently high log throughput while consuming few system resources.
S3. Concurrent service requests are distributed reasonably to the service nodes using the LVS+Keepalived mode;
In this embodiment, distributing concurrent service requests reasonably to the service nodes using the LVS+Keepalived mode specifically means:
Highly available load balancing is implemented using LVS (Linux Virtual Server) + Keepalived, as shown in Fig. 1.
The load balancing service uses at least two servers, one as the Master node and the others as Slave nodes, to build an LVS virtual server cluster, and the Keepalived component is deployed on every load balancing node. The Master and Slave nodes communicate by multicast using the Virtual Router Redundancy Protocol (VRRP); through VRRP, its core protocol, Keepalived isolates failed load balancing nodes and fails over automatically between nodes. Through the load balancing service, the system distributes service requests reasonably to the service nodes of the agent layer, increasing the concurrent request capacity of the system as a whole.
S4, using based on master/slave Reactor patterns classification forwarding is carried out to daily record data;
In embodiment, described use is based on master/slave Reactor patterns and classification forwarding tool is carried out to daily record data Body is:
By increasing agent node among common operation system clustered node and log system clustered node, by business System is full decoupled with log system, and operation system need to only be absorbed in realization and maintenance service logic, is carried for platform core business For quality services.For log information, operation system need to only be submitted to agent node, be responsible for day by agent node The caching of will and the distribution processor of different type daily record.
The characteristics of concurrent processing demand and existing I/O for electric business platform different priorities daily record multiplex model, The present invention devises the classification daily record forwarding scheme based on master/slave Reactor patterns.In a particular application with data forwarding service Hold the form of component to realize, while synchronous event disposal ability and the energy similar to the processing of Proactor patterns asynchronous event are provided Power.
This programme designs two layers of Reactor, and structure is as shown in Figure 3:
MainReactor (main Reactor) is located at first layer, is responsible for monitoring the new connection from each node of business cluster just Thread event, and different SubReactor is distributed into the network I/O operation for having established connection according to daily record priority.
SubReactor is located at the second layer, including two from Reactor:SubReactorA and SubReactorB. SubReactorA is responsible for the acquisition forwarding work of high priority daily record, and synchronous event concurrent processing is provided to front end Log Source Ability, provides rear end log system stable log stream, and SubReactorB is responsible for the acquisition forwarding work of low priority daily record Make, the ability of asynchronous event concurrent processing is provided front end Log Source, the number of file cache form is provided rear end log system According to source.Herein by the design of SubReactor layers, ensure that scheme possesses real-time streaming data processing and height takes event handling Ability.
The basic logic of the classified log forwarding scheme based on the master/slave Reactor pattern is shown in Figure 4. In this scheme the log forwarding mechanism of SubReactorA is essentially the same as that of the native Reactor pattern; the difficulty lies in implementing concurrent asynchronous event handling in SubReactorB. The present invention designs SubReactorB as a Reactor plus a thread pool, following the half-sync/half-async idea, so that SubReactorB offers concurrent asynchronous event handling externally. Internally, this logic is implemented in three layers: an asynchronous layer, a queue layer, and a synchronous layer.
The classified log forwarding scheme based on the master/slave Reactor pattern provides different data processing and forwarding paths according to the priority of each log message and offers both synchronous and asynchronous concurrent event handling externally, which ensures real-time forwarding of high-priority log data and high throughput for low-priority log data. In addition, because the scheme uses a two-layer Reactor model, the SubReactor layer is highly extensible: multiple SubReactors can be customized for different business needs, and the distinction need not be limited to log priority but can also be made by log type or by a user-defined routing scheme.
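The three-layer structure named above (asynchronous layer, queue layer, synchronous layer) can be sketched as follows. This is a toy illustration of the half-sync/half-async idea using Python's standard `queue` and `threading` modules, not the patent's implementation; the class name, the `handler` callback, and the sentinel-based shutdown are assumptions made for the example.

```python
import queue
import threading


class HalfSyncHalfAsync:
    """Toy three-layer model: async submit -> queue -> synchronous workers."""

    _STOP = object()  # sentinel telling a worker to exit

    def __init__(self, handler, workers: int = 2) -> None:
        self._queue = queue.Queue()   # queue layer
        self._handler = handler       # blocking work performed per event
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, event) -> None:
        """Asynchronous layer: enqueue the event and return immediately."""
        self._queue.put(event)

    def _run(self) -> None:
        """Synchronous layer: each worker blocks on the queue, then handles."""
        while True:
            event = self._queue.get()
            if event is self._STOP:
                break
            self._handler(event)

    def close(self) -> None:
        """Send one sentinel per worker, then wait for all workers to exit."""
        for _ in self._threads:
            self._queue.put(self._STOP)
        for thread in self._threads:
            thread.join()
```

The caller of `submit` never waits for the handler, while each worker runs ordinary blocking code, which is the division of labor the half-sync/half-async design aims for.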
S5: clean and filter the log data using the composite dual channel based on Flume NG.
In this embodiment, cleaning and filtering the log data using the composite dual channel based on Flume NG specifically comprises:
S501: The Source component reads data from the log source.
S502: The Source delegates the write task to the Channel processor (ChannelProcessor).
S503: The Channel processor passes the events to the log cleaning component (LogCleanInterceptor). This component reads the event cleaning rules from a configuration file, processes the events that match a rule, extracts the relevant content from the event body, and adds or modifies event headers.
S504: The Channel processor passes the cleaned events to the multiplexing Channel selector (Multiplexing ChannelSelector). This component obtains the event type from the event header, obtains the routing information for the event from the user configuration, and returns the list of events with routing information attached to the Channel processor.
S505: The Channel processor sends the events in batches, as transactions, to the specific Channels given by the corresponding routing information.
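Steps S502 to S505 can be illustrated with a small Python model. It is a sketch under stated assumptions rather than Flume code: events are plain dictionaries with `headers` and `body`, cleaning rules are regular expressions with named groups, the header key `type` is chosen for the example, and the transaction is reduced to validating every route before any channel is modified.

```python
import re
from collections import defaultdict


def clean(event, rules):
    """S503: apply the first matching rule; copy extracted fields to headers."""
    for rule in rules:
        match = re.search(rule["pattern"], event["body"])
        if match:
            event["headers"].update(match.groupdict())
            event["headers"]["type"] = rule["type"]
            break
    return event


def select(event, mapping, default):
    """S504: look up the target channel names from the 'type' header."""
    return mapping.get(event["headers"].get("type"), default)


def process_batch(events, rules, mapping, default, channels):
    """S502-S505: clean, route, then commit each channel's batch together."""
    staged = defaultdict(list)
    for event in events:
        cleaned = clean(event, rules)
        for name in select(cleaned, mapping, default):
            staged[name].append(cleaned)
    # Validate first so that a bad route leaves every channel untouched.
    unknown = [name for name in staged if name not in channels]
    if unknown:
        raise KeyError(f"unknown channels: {unknown}")
    for name, batch in staged.items():
        channels[name].extend(batch)
```

For example, with the rule `{"pattern": r"ERROR code=(?P<code>\d+)", "type": "error"}` and the mapping `{"error": ["alerts"]}`, an event whose body contains `ERROR code=500` gains the headers `code` and `type` and is routed to the `alerts` channel, while unmatched events go to the default channels.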
This scheme designs a highly available log cleaning and filtering scheme based on Flume, whose transactional properties ensure reliability during log transmission. To keep data flowing efficiently through the whole system and to resolve the mismatch between the upstream write rate and the downstream read rate of operations such as cleaning and persistent storage, the present invention designs a composite dual channel, CDual-Channel, on top of the two native channels, so that the Channel layer keeps high throughput while offering ample buffer capacity. In addition, a log cleaning component (LogCleanInterceptor) is designed to work with the native multiplexing Channel selector, so that the real-time log stream is cleaned according to user-configured rules before the log data is persisted. The log data collected by the distributed log collection system of the whole e-commerce platform therefore better matches the needs of data analysts, which reduces the time cost of later data analysis.
S6: read the log data.
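The buffering idea behind a composite dual channel can be sketched in Python as follows. The passage above does not name the two native channels, so this sketch assumes a bounded in-memory queue backed by an unbounded overflow queue that stands in for a file-backed channel; the class `CompositeChannel` and its methods are hypothetical and are not the patent's CDual-Channel implementation.

```python
from collections import deque


class CompositeChannel:
    """Toy dual channel: bounded fast queue plus unbounded overflow queue."""

    def __init__(self, memory_capacity: int) -> None:
        self.memory_capacity = memory_capacity
        self.memory = deque()    # stands in for a memory channel
        self.overflow = deque()  # stands in for a file-backed channel

    def put(self, event) -> None:
        # Once anything has spilled, keep spilling so FIFO order is preserved.
        if self.overflow or len(self.memory) >= self.memory_capacity:
            self.overflow.append(event)
        else:
            self.memory.append(event)

    def take(self):
        """Return the oldest event, or None when both queues are empty."""
        if not self.memory and self.overflow:
            # Refill the fast queue from the overflow queue, oldest first.
            while self.overflow and len(self.memory) < self.memory_capacity:
                self.memory.append(self.overflow.popleft())
        return self.memory.popleft() if self.memory else None
```

A fast upstream writer is absorbed by the overflow queue instead of being blocked, while a slower downstream reader still receives events in arrival order.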
Embodiment two
This embodiment applies the distributed log collection method to a specific e-commerce system, that is, to the scenario of collecting logs from the distributed cluster nodes of an e-commerce platform. It provides a highly reliable, high-performance scheme in which the service nodes carry a light load, the business system is decoupled from the log system, and the log system is easy to extend and maintain. Figure 10 is a flow chart of log collection with the distributed log collection method and shows the steps of the whole log collection process. Figures 11 and 12 are, respectively, the system framework diagram and the system flow chart of the distributed log collection method proposed by the present invention as applied to a specific e-commerce system. The whole implementation is described concretely through the following steps:
S1: the e-commerce platform system generates log data.
S2: collect the log data using the double-buffering mechanism.
S3: distribute concurrent service requests appropriately to the service nodes using the LVS+Keepalived mode.
S4: classify and forward the log data based on the master/slave Reactor pattern.
S5: clean and filter the log data using the composite dual channel based on Flume NG.
In this embodiment, cleaning and filtering the log data using the composite dual channel based on Flume NG specifically comprises:
S501: The Source component reads data from the log source.
S502: The Source delegates the write task to the Channel processor (ChannelProcessor).
S503: The Channel processor passes the events to the log cleaning component (LogCleanInterceptor). This component reads the event cleaning rules from a configuration file, processes the events that match a rule, extracts the relevant content from the event body, and adds or modifies event headers.
S504: The Channel processor passes the cleaned events to the multiplexing Channel selector (Multiplexing ChannelSelector). This component obtains the event type from the event header, obtains the routing information for the event from the user configuration, and returns the list of events with routing information attached to the Channel processor.
S505: The Channel processor sends the events in batches, as transactions, to the specific Channels given by the corresponding routing information.
S6: the e-commerce platform reads the log data.
This completes the whole process of collecting log data from the e-commerce platform with the distributed log collection method of the present invention.
In summary, this embodiment fully describes how the e-commerce platform system collects log data by combining the platform's workflow with the execution flow of the distributed log collection method. The method collects log data with the non-blocking log collection scheme based on double buffering, which provides low-latency, low-overhead collection. It uses the classified log forwarding scheme based on the master/slave Reactor pattern, which supports high concurrency, high throughput, and highly reliable log transmission while reducing the coupling between the business system and the log system. It cleans and filters log data with the composite dual channel based on Flume NG, which, while preserving high system throughput and ample buffer capacity, resolves the mismatch between the upstream write rate and the downstream consumption rate found in traditional e-commerce log collection. The method therefore addresses not only the problems of common log systems, namely tight coupling between the collection layer and the business system, high maintenance difficulty, and log buffering that adds to the business system's load, but also the key problems of supporting highly concurrent writes at the buffer layer and keeping data forwarding highly reliable. As a result, massive volumes of user data can be collected efficiently, which raises the level of informatization and intelligence in the e-commerce industry, improves the service quality of e-commerce platform clusters, and offers a new point of development for combining e-commerce with big data.
The above embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.
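The double-buffering collection step summarized above (S2) can be illustrated with a minimal Python sketch. It assumes one back-end flusher thread, a `sink` callable that receives one list of messages per write, and a one-second flush timeout; these names and parameters are hypothetical and are not taken from the patent.

```python
import threading


class DoubleBufferLogger:
    """Toy double buffer: producers fill one list while the other is flushed."""

    def __init__(self, sink, batch_size: int = 4) -> None:
        self._sink = sink              # callable receiving a list of messages
        self._batch_size = batch_size
        self._front = []               # business threads append here
        self._back = []                # only the flusher thread drains this
        self._cond = threading.Condition()
        self._closed = False
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def log(self, message: str) -> None:
        """Business-thread path: append under the lock, never perform I/O."""
        with self._cond:
            self._front.append(message)
            if len(self._front) >= self._batch_size:
                self._cond.notify()

    def _flush_loop(self) -> None:
        while True:
            with self._cond:
                # Wake when a batch is full, on close, or after the timeout.
                self._cond.wait_for(
                    lambda: self._closed
                    or len(self._front) >= self._batch_size,
                    timeout=1.0,
                )
                # Swap the buffers under the lock; producers continue at once.
                self._front, self._back = self._back, self._front
                done = self._closed
            if self._back:
                self._sink(list(self._back))  # one combined write, lock released
                self._back.clear()
            if done:
                return

    def close(self) -> None:
        """Flush whatever remains and stop the flusher thread."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._thread.join()
```

Business threads hold the lock only long enough to append, and the back end receives several messages combined into one write, which corresponds to the non-blocking, batched behavior the method attributes to double buffering.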

Claims (10)

1. A distributed log collection method, characterized in that the method comprises the following steps:
The business system generates log data;
The log data is collected using a double-buffering mechanism;
Concurrent service requests are distributed appropriately to the service nodes using the LVS+Keepalived mode;
The log data is classified and forwarded based on the master/slave Reactor pattern;
The log data is cleaned and filtered using a composite dual channel based on Flume NG;
The log data is read.
2. The distributed log collection method according to claim 1, characterized in that collecting the log data using the double-buffering mechanism specifically comprises:
Logging logic is removed from the critical path of the business processing flow, and a lightweight component within the service node performs the log collection work, which ensures that, in the log collection scenario, the business application is non-blocking with low latency and the back-end log thread has low overhead and high throughput.
3. The distributed log collection method according to claim 1, characterized in that collecting the log data using the double-buffering mechanism specifically comprises:
Business logic is decoupled from the specific I/O logic by the double-buffering mechanism. A business thread is only responsible for generating log messages and is not concerned with when the actual write operation takes place; the asynchronous approach ensures that the business processing flow remains real-time.
4. The distributed log collection method according to claim 1, characterized in that, in collecting the log data using the double-buffering mechanism, multiple log messages are combined into one large block of buffered data and sent to the back end in a single operation, which avoids triggering the back end frequently.
5. The distributed log collection method according to claim 1, characterized in that distributing concurrent service requests appropriately to the service nodes using the LVS+Keepalived mode specifically comprises:
High-availability load balancing is implemented in the LVS+Keepalived mode. The load-balancing service is provided with at least two servers, one as the Master node and the others as Slave nodes, to build an LVS virtual server cluster system, and a Keepalived component is deployed on each load-balancing node. The Master node and the Slave nodes communicate by multicast using the Virtual Router Redundancy Protocol (VRRP); through its core protocol VRRP, Keepalived isolates failed load-balancing nodes and performs automatic failover between nodes.
6. The distributed log collection method according to claim 1, characterized in that classifying and forwarding the log data based on the master/slave Reactor pattern specifically comprises:
By adding agent nodes between the business-system cluster nodes and the log-system cluster nodes, the business system is fully decoupled from the log system. The business system focuses only on implementing and maintaining business logic and on providing quality service for the platform's core business. For log messages, the business system only needs to submit them to an agent node, which is responsible for caching the logs and for distributing the different log types.
7. The distributed log collection method according to claim 1, characterized in that classifying and forwarding the log data based on the master/slave Reactor pattern specifically comprises:
Two Reactor layers are designed, namely MainReactor and SubReactor, wherein
MainReactor forms the first layer and acts as the main Reactor. It listens for new-connection-ready events from the nodes of the business cluster and, according to log priority, assigns the network I/O operations of each established connection to a different SubReactor;
SubReactor forms the second layer and comprises two slave Reactors, SubReactorA and SubReactorB, wherein SubReactorA collects and forwards high-priority logs, offering concurrent synchronous event handling to the front-end log sources and a stable log stream to the back-end log system, and SubReactorB collects and forwards low-priority logs, offering concurrent asynchronous event handling to the front-end log sources and a data source in file-cache form to the back-end log system.
8. The distributed log collection method according to claim 7, characterized in that SubReactorB is designed as a Reactor plus a thread pool, following the half-sync/half-async idea, and offers concurrent asynchronous event handling externally, wherein the internal logic of SubReactorB's concurrent asynchronous event handling is implemented in three layers: an asynchronous layer, a queue layer, and a synchronous layer.
9. The distributed log collection method according to claim 1, characterized in that cleaning and filtering the log data using the composite dual channel based on Flume NG specifically comprises the following steps:
The Source component reads data from the log source;
The Source delegates the write task to the Channel processor;
The Channel processor passes the events to the log cleaning component, which reads the event cleaning rules from a configuration file, processes the events that match a rule, extracts the relevant content from the event body, and adds or modifies event headers;
The Channel processor passes the cleaned events to the multiplexing Channel selector, which obtains the event type from the event header, obtains the routing information for the event from the user configuration, and returns the list of events with routing information attached to the Channel processor;
The Channel processor sends the events in batches, as transactions, to the specific Channels given by the corresponding routing information.
10. The distributed log collection method according to claim 9, characterized in that cleaning and filtering the log data using the composite dual channel based on Flume NG specifically comprises:
The log system and data cleaning method based on the Flume NG composite dual channel provides a highly available distributed log system and achieves configurable real-time data cleaning within the log system; it supports exporting the cleaned log data to different storage nodes according to user-defined data cleaning rules and routing rules.
CN201710654304.3A 2017-08-03 2017-08-03 Distributed log collection method Active CN107590182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710654304.3A CN107590182B (en) 2017-08-03 2017-08-03 Distributed log collection method

Publications (2)

Publication Number Publication Date
CN107590182A true CN107590182A (en) 2018-01-16
CN107590182B CN107590182B (en) 2020-06-19

Family

ID=61042096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710654304.3A Active CN107590182B (en) 2017-08-03 2017-08-03 Distributed log collection method

Country Status (1)

Country Link
CN (1) CN107590182B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280015A (en) * 2018-02-07 2018-07-13 福建星瑞格软件有限公司 Cluster server daily record real-time processing method based on big data and computer equipment
CN108319543A (en) * 2018-01-24 2018-07-24 广州江南科友科技股份有限公司 A kind of asynchronous processing method and its medium, system of computer log data
CN109815221A (en) * 2018-12-20 2019-05-28 中科曙光南京研究院有限公司 A kind of quasi real time stream data cleaning method and cleaning system
CN110298001A (en) * 2019-05-30 2019-10-01 北京奇艺世纪科技有限公司 The acquisition methods and device and computer readable storage medium of daily record data packet
CN110377578A (en) * 2019-07-12 2019-10-25 苏州浪潮智能科技有限公司 A kind of data processing method and device based on improved Flume
CN110389933A (en) * 2019-07-01 2019-10-29 京信通信系统(中国)有限公司 Blog management method and device between a kind of process
CN110569112A (en) * 2019-09-12 2019-12-13 华云超融合科技有限公司 Log data writing method and object storage daemon device
CN110599243A (en) * 2019-09-03 2019-12-20 浩鲸云计算科技股份有限公司 Customer-oriented journey marketing method and system
CN111158876A (en) * 2019-12-26 2020-05-15 杭州安恒信息技术股份有限公司 Log processing method, device and equipment and computer readable storage medium
CN111290860A (en) * 2018-12-10 2020-06-16 中国移动通信集团四川有限公司 Data channel adjusting method, device, equipment and medium
CN112000583A (en) * 2020-09-17 2020-11-27 深圳市有方科技股份有限公司 Debugging information capturing method and device
WO2021036684A1 (en) * 2019-08-27 2021-03-04 深圳前海微众银行股份有限公司 Distributed data synchronization method, apparatus and device and readable storage medium
CN112882808A (en) * 2021-02-08 2021-06-01 上海弘积信息科技有限公司 Method for collecting and sending big data audit log of application delivery equipment
CN113032375A (en) * 2019-12-24 2021-06-25 广州如加网络科技有限公司 Data acquisition and aggregation method based on Flume
CN113111071A (en) * 2021-05-11 2021-07-13 星辰天合(北京)数据科技有限公司 Object processing method, device, nonvolatile storage medium and processor
CN114780348A (en) * 2022-04-28 2022-07-22 四川虹魔方网络科技有限公司 Method for asynchronously monitoring client operation log based on distributed deployment environment
CN115333800A (en) * 2022-07-27 2022-11-11 中国第一汽车股份有限公司 Vehicle-mounted vehicle-cloud integrated log collecting and analyzing method, vehicle and cloud server
CN117312101A (en) * 2023-11-28 2023-12-29 苏州元脑智能科技有限公司 Method and device for determining structure log, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3752314A (en) * 1971-08-27 1973-08-14 Rust Eng Co Flume water recycling apparatus
CN103309767A (en) * 2012-03-08 2013-09-18 阿里巴巴集团控股有限公司 Method and device for processing client log
CN104036025A (en) * 2014-06-27 2014-09-10 蓝盾信息安全技术有限公司 Distribution-base mass log collection system
CN106549809A (en) * 2016-11-24 2017-03-29 成都广达新网科技股份有限公司 One kind realizes network management system equipment state acquisition methods and device
CN106874160A (en) * 2017-01-23 2017-06-20 上海斐讯数据通信技术有限公司 Log server and its management method

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319543A (en) * 2018-01-24 2018-07-24 广州江南科友科技股份有限公司 A kind of asynchronous processing method and its medium, system of computer log data
CN108280015A (en) * 2018-02-07 2018-07-13 福建星瑞格软件有限公司 Cluster server daily record real-time processing method based on big data and computer equipment
CN111290860A (en) * 2018-12-10 2020-06-16 中国移动通信集团四川有限公司 Data channel adjusting method, device, equipment and medium
CN111290860B (en) * 2018-12-10 2023-08-15 中国移动通信集团四川有限公司 Data channel adjusting method, device, equipment and medium
CN109815221A (en) * 2018-12-20 2019-05-28 中科曙光南京研究院有限公司 A kind of quasi real time stream data cleaning method and cleaning system
CN110298001A (en) * 2019-05-30 2019-10-01 北京奇艺世纪科技有限公司 The acquisition methods and device and computer readable storage medium of daily record data packet
CN110298001B (en) * 2019-05-30 2021-11-09 北京奇艺世纪科技有限公司 Method and device for acquiring log data packet and computer readable storage medium
CN110389933A (en) * 2019-07-01 2019-10-29 京信通信系统(中国)有限公司 Blog management method and device between a kind of process
CN110389933B (en) * 2019-07-01 2022-04-22 京信网络系统股份有限公司 Inter-process log management method and device
CN110377578A (en) * 2019-07-12 2019-10-25 苏州浪潮智能科技有限公司 A kind of data processing method and device based on improved Flume
CN110377578B (en) * 2019-07-12 2022-06-07 苏州浪潮智能科技有限公司 Improved Flume-based data processing method and device
WO2021036684A1 (en) * 2019-08-27 2021-03-04 深圳前海微众银行股份有限公司 Distributed data synchronization method, apparatus and device and readable storage medium
CN110599243A (en) * 2019-09-03 2019-12-20 浩鲸云计算科技股份有限公司 Customer-oriented journey marketing method and system
CN110569112A (en) * 2019-09-12 2019-12-13 华云超融合科技有限公司 Log data writing method and object storage daemon device
CN110569112B (en) * 2019-09-12 2022-04-08 江苏安超云软件有限公司 Log data writing method and object storage daemon device
CN113032375A (en) * 2019-12-24 2021-06-25 广州如加网络科技有限公司 Data acquisition and aggregation method based on Flume
CN111158876A (en) * 2019-12-26 2020-05-15 杭州安恒信息技术股份有限公司 Log processing method, device and equipment and computer readable storage medium
CN112000583A (en) * 2020-09-17 2020-11-27 深圳市有方科技股份有限公司 Debugging information capturing method and device
CN112882808B (en) * 2021-02-08 2023-10-24 上海弘积信息科技有限公司 Method for collecting and transmitting big data audit log of application delivery equipment
CN112882808A (en) * 2021-02-08 2021-06-01 上海弘积信息科技有限公司 Method for collecting and sending big data audit log of application delivery equipment
CN113111071A (en) * 2021-05-11 2021-07-13 星辰天合(北京)数据科技有限公司 Object processing method, device, nonvolatile storage medium and processor
CN113111071B (en) * 2021-05-11 2024-05-07 北京星辰天合科技股份有限公司 Object processing method, device, nonvolatile storage medium and processor
CN114780348A (en) * 2022-04-28 2022-07-22 四川虹魔方网络科技有限公司 Method for asynchronously monitoring client operation log based on distributed deployment environment
CN114780348B (en) * 2022-04-28 2023-02-07 四川虹魔方网络科技有限公司 Method for asynchronously monitoring client operation log based on distributed deployment environment
CN115333800A (en) * 2022-07-27 2022-11-11 中国第一汽车股份有限公司 Vehicle-mounted vehicle-cloud integrated log collecting and analyzing method, vehicle and cloud server
CN117312101A (en) * 2023-11-28 2023-12-29 苏州元脑智能科技有限公司 Method and device for determining structure log, storage medium and electronic equipment
CN117312101B (en) * 2023-11-28 2024-02-27 苏州元脑智能科技有限公司 Method and device for determining structure log, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN107590182B (en) 2020-06-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant