CN107590182A - A kind of distributed information log collection method - Google Patents
- Publication number
- CN107590182A, CN201710654304.3A
- Authority
- CN
- China
- Prior art keywords
- daily record
- event
- log
- data
- record data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a distributed log collection method, comprising: collecting log data using a double-buffer mechanism; forwarding the log data by class using a master/slave Reactor pattern; distributing concurrent service requests sensibly across the service nodes using an LVS+Keepalived configuration; cleaning and filtering the log data using a compound dual channel based on Flume NG; and finally reading out the log data. Compared with traditional log collection methods, this method proposes a non-blocking log collection scheme based on double buffering, a classified log forwarding scheme based on the master/slave Reactor pattern, and a log system and data cleaning platform based on a Flume NG compound dual channel. As a result, the system collects logs more efficiently, the log data that is collected and persisted to storage better matches the needs of analysts, and the time cost of later data analysis is reduced.
Description
Technical field
The present invention relates to log data acquisition in the field of big data collection, in particular in the e-commerce industry, and specifically to a distributed log collection method.
Background
In recent decades the domestic economy has grown vigorously. As living standards have improved, people's spending power has gradually increased and consumption channels have gradually broadened, online shopping in particular.
When people use an e-commerce platform, they browse products, add products to favorites, add products to a shopping cart, pay for orders, and review orders. This personalized user behavior data is stored in the platform's service backend in the form of user behavior logs. To improve the user experience and increase sales, the major e-commerce platforms collect user data for data mining and analysis; precision marketing and personalized product recommendation have become hot research topics in recent years. The rapid development of artificial intelligence and deep learning has also opened new avenues for user behavior analysis and data mining. As the number of online shoppers, the frequency of online purchases, and the number and variety of products keep growing, the volume of user behavior logs has also increased sharply; for example, the volume of data Taobao collects every day is as high as 50 TB. How to collect such massive user data efficiently is a key concern for big data analysis, and making efficient use of monitoring logs is an important way to improve the service quality of an e-commerce platform cluster.
Summary of the invention
The purpose of the present invention is to overcome the above drawbacks of the prior art concerning log data in the traditional e-commerce industry by providing a distributed log collection method.
The purpose of the present invention can be achieved by the following technical solution:
A distributed log collection method, the method comprising the following steps:
The business system produces log data;
The log data is collected using a double-buffer mechanism;
Concurrent service requests are distributed sensibly across the service nodes using an LVS+Keepalived configuration;
The log data is forwarded using a master/slave Reactor pattern;
The log data is cleaned and filtered using a compound dual channel based on Flume NG;
The log data is read out.
Further, the detailed process of collecting the log data using the double-buffer mechanism is as follows:
Double buffering is one pattern of cache design. A well-designed cache solves the problem of business threads blocking.
In the non-blocking log collection scheme based on double buffering, the producers of log messages are logically called the frontend, and the log consumer that performs the actual I/O operations is called the backend. In the e-commerce platform log collection scenario, each business thread can be regarded as a frontend, and the log thread belongs to the backend. The scheme prepares one buffer each for the frontend and the backend of log collection, called buffer A and buffer B. At initialization, each buffer is allocated two buffer blocks, one primary and one standby; the frontend's primary buffer block receives the current log data while the standby block waits. The log data in buffer B is read and reported by the backend thread. The scheme uses a condition variable as the synchronization mechanism between threads. The backend log reporting thread, which waits on the condition variable, is woken in either of two situations: the frontend business threads fill the buffer, or the waiting time exceeds 5 seconds. The execution timeline of the scheme's main logic is shown in Fig. 2.
At initialization, four buffer blocks B1, B2, B3, and B4 are allocated, two each for the frontend and the backend. The frontend business threads are responsible for writing log data into buffer A; the backend log thread is responsible for reading the log data in buffer B and performing the actual I/O operations, writing the data to a specific file descriptor. When the buffer is empty, the backend log thread waits on the condition variable.
As shown in Fig. 2, at the 3rd second the frontend's primary buffer block B1 is full; the standby buffer block B2 is brought into use, and the full block B1 is submitted to the buffer A container. At the 4th second, buffer block B2 is also full, so the condition variable's trigger condition is satisfied and the backend log thread is woken from the kernel. The buffer blocks of buffer A and buffer B are then swapped. Thus, just after the 4th second, the two full buffer blocks B1 and B2 have been handed over to buffer B, and the backend log thread is responsible for reporting their log data, while the two originally empty buffer blocks B3 and B4 have been handed over to buffer A for use by the business threads. It can be seen that the frontend's log collection work and the backend's subsequent log reporting work run fully in parallel, and the frontend business threads never block on I/O operations. When the data has been reported, the frontend and the backend each hold two empty buffer blocks again, consistent with the initial state.
The frontend and backend threads manage two or more buffer blocks through the buffer containers, which makes buffer block scheduling more flexible and widens the range of scenarios the scheme suits. For example, when the frontend's log writing demand is small and logs are collected slowly, a single pair of buffer blocks can be exchanged between buffers A and B. When the business threads write a large volume of logs in a short time, or the backend's log reporting takes a long time, buffer blocks can be allocated dynamically by the business threads and finally released by the backend thread.
To guarantee that the data in the buffer is output within a bounded time, the scheme additionally sets a time threshold as a wake-up condition for the backend log thread, as described above. In other business scenarios, the size of this time threshold can be set according to actual business needs.
By using double buffering, the scheme decouples business logic from the actual I/O logic. A business thread only needs to generate log messages and need not care at what point in time the write actually happens; this asynchronous approach preserves the real-time behavior of the business processing flow. Whereas other common log collection systems send each message to the backend individually, the non-blocking log collection scheme based on double buffering combines multiple log messages into one large buffer and sends it to the backend in one go. This avoids waking the backend log thread frequently, achieves an effect similar to batch processing, and reduces overhead. By removing the logging logic from the critical path of the business processing flow, the frontend gains non-blocking, low-latency, low-overhead behavior, while the backend sustains a sufficiently large log throughput while using few system resources.
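The double-buffer scheme described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name DoubleBufferLogger, its parameters, and the sink callable are all hypothetical, Python lists stand in for fixed-size buffer blocks, and threading.Condition plays the role of the condition variable. Frontend threads only append under a short lock; the backend thread wakes when enough blocks are full or when the time threshold expires, takes the blocks over, and performs the I/O outside the lock.

```python
import threading


class DoubleBufferLogger:
    """Hypothetical sketch: frontend threads fill buffer A, a backend thread drains it."""

    def __init__(self, sink, block_size=4, blocks_per_flush=2, flush_interval=5.0):
        self._sink = sink                        # callable that receives one block (a list)
        self._block_size = block_size            # messages per buffer block (assumed unit)
        self._blocks_per_flush = blocks_per_flush
        self._flush_interval = flush_interval    # time threshold in seconds
        self._current = []                       # frontend block currently being filled
        self._full_blocks = []                   # buffer A container of filled blocks
        self._cond = threading.Condition()
        self._running = True
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, message):
        """Called by business threads; never performs I/O."""
        with self._cond:
            self._current.append(message)
            if len(self._current) >= self._block_size:
                self._full_blocks.append(self._current)
                self._current = []
                if len(self._full_blocks) >= self._blocks_per_flush:
                    self._cond.notify()          # buffer full: wake the backend thread

    def _run(self):
        while True:
            with self._cond:
                if self._running and len(self._full_blocks) < self._blocks_per_flush:
                    self._cond.wait(timeout=self._flush_interval)
                # Take over every filled block, plus any partly filled block,
                # so that no message waits longer than the time threshold.
                to_write = self._full_blocks
                if self._current:
                    to_write.append(self._current)
                self._full_blocks = []
                self._current = []
                stopping = not self._running
            for block in to_write:               # actual I/O happens outside the lock
                self._sink(block)
            if stopping:
                return

    def close(self):
        """Flush what is left and stop the backend thread."""
        with self._cond:
            self._running = False
            self._cond.notify()
        self._thread.join()
```

A business thread would call log() for each message and close() at shutdown; the sink could, for example, write each block to a file in a single call, which gives the batching effect described above.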
Further, the detailed process of distributing concurrent service requests sensibly across the service nodes using the LVS+Keepalived configuration is as follows:
High-availability load balancing is implemented with LVS (Linux Virtual Server) + Keepalived, as shown in Fig. 1.
The load balancing service is given at least two servers: one is the Master node and the others are Slave nodes. Together they form an LVS virtual server cluster, and the Keepalived component is deployed on every load balancing node. The Master and Slave nodes communicate by multicast over the Virtual Router Redundancy Protocol (VRRP); through VRRP, its core protocol, Keepalived isolates failed load balancing nodes and fails over automatically between nodes. Through the load balancing service, the system distributes service requests sensibly across the service nodes of the Agent layer, which raises the concurrent request capacity of the system as a whole.
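The failover and distribution behavior described above can be modeled in a few lines of Python. This is a simplified model under stated assumptions, not the VRRP protocol or Keepalived itself: the names BalancerNode, elect_master, and dispatch_round_robin are hypothetical, the election simply picks the live node with the highest priority, and round-robin is assumed as the scheduling policy because the text does not name one.

```python
from dataclasses import dataclass


@dataclass
class BalancerNode:
    """A load balancing node; priority and liveness are assumed attributes."""
    name: str
    priority: int
    alive: bool = True


def elect_master(nodes):
    """Return the live node with the highest priority, or None if all are down."""
    live = [node for node in nodes if node.alive]
    return max(live, key=lambda node: node.priority) if live else None


def dispatch_round_robin(requests, service_nodes):
    """Assign each request to a service node in rotation (assumed policy)."""
    return [(request, service_nodes[i % len(service_nodes)])
            for i, request in enumerate(requests)]
```

When the Master is marked as not alive, elect_master returns the Slave with the next highest priority, which mirrors the automatic switchover between nodes.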
Further, the detailed process of forwarding the log data using the master/slave Reactor pattern is as follows:
The Reactor pattern and the Proactor pattern are both common design patterns for handling concurrent network I/O. Reactor uses I/O multiplexing to monitor multiple event sources on one port, separating the concerns of low-level network monitoring, event dispatch, and the application's specific processing logic. The native Reactor pattern delivers high performance for concurrent events whose handling is cheap. However, because the Reactor pattern is essentially synchronous I/O, the set of ready events is processed serially inside the system, so it is not suitable for time-consuming event handling: precisely because it is synchronous, a Reactor handling a time-consuming operation may block the other work that should proceed in parallel. By comparison, the Proactor pattern, thanks to its asynchronous I/O, does not wait for the processing of a particular asynchronous event. It overcomes the above drawback of the Reactor pattern, can handle multiple service requests at the same time, and is a suitable choice for scenarios with concurrent, time-consuming event handling.
In view of the concurrent processing requirements of e-commerce platform logs of different priorities and the characteristics of existing I/O multiplexing models, the present invention designs a classified log forwarding scheme based on the master/slave Reactor pattern. In a concrete application it is implemented as a data forwarding server component that provides both synchronous event handling and an asynchronous event handling capability similar to that of the Proactor pattern.
The scheme is designed with two layers of Reactors, structured as shown in Fig. 3:
The MainReactor (main Reactor) sits in the first layer. It is responsible for monitoring new-connection-ready events from the nodes of the business cluster and, according to log priority, assigns the network I/O operations of established connections to different SubReactors.
The SubReactors sit in the second layer, which comprises two slave Reactors: SubReactorA and SubReactorB. SubReactorA is responsible for receiving and forwarding high-priority logs; it provides concurrent synchronous event handling to the frontend log sources and a stable log stream to the backend log system. SubReactorB is responsible for receiving and forwarding low-priority logs; it provides concurrent asynchronous event handling to the frontend log sources and a data source in the form of a file cache to the backend log system. The design of the SubReactor layer ensures that the scheme can both process streaming data in real time and handle time-consuming events.
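The two-layer dispatch can be sketched in Python as follows. This is an in-memory illustration only: the classes, the string priorities "high" and "low", and the use of plain strings as connections are assumptions, and no real sockets or I/O multiplexing are involved. It shows only how a main Reactor might hand an accepted connection to a SubReactor according to log priority.

```python
class SubReactor:
    """Second-layer Reactor that owns the connections assigned to it."""

    def __init__(self, name):
        self.name = name
        self.connections = []

    def register(self, connection):
        # A real SubReactor would register the socket with its own event loop.
        self.connections.append(connection)


class MainReactor:
    """First-layer Reactor: accepts connections and routes them by priority."""

    def __init__(self, routes, default):
        self._routes = routes      # maps a priority label to a SubReactor
        self._default = default    # SubReactor for priorities without a route

    def on_new_connection(self, connection, priority):
        target = self._routes.get(priority, self._default)
        target.register(connection)
        return target
```

Because the routing table is just a mapping, further SubReactors keyed by log type or by a user-defined rule could be added without changing the MainReactor.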
The basic logic of the classified log forwarding scheme based on the master/slave Reactor pattern is shown in Fig. 4:
In this scheme, SubReactorA's log forwarding mechanism is essentially the same as the native Reactor pattern; the difficulty lies in implementing SubReactorB's concurrent asynchronous event handling. The present invention designs SubReactorB as a Reactor plus a thread pool, following the half-sync/half-async idea, so that SubReactorB presents a concurrent asynchronous event handling capability to the outside. Internally, SubReactorB's concurrent asynchronous event handling is implemented with a three-layer structure: an asynchronous layer, a queue layer, and a synchronous layer.
(1) The asynchronous layer is implemented by a reactor and an event demultiplexer. It handles low-level events, monitoring the readable events produced by event sources in the business system that have established connections, binding each readable event to its event handler, and placing it in the task queue.
(2) The queue layer is implemented by a blocking queue. Acting as the task queue, it holds a set of fully configured, pending event handling tasks.
(3) The synchronous layer is implemented by a thread pool. It is responsible for parsing log data, caching log files, and the specific log forwarding logic.
SubReactorB hands the specific event handling logic to the thread pool in the synchronous layer, so all time-consuming operations are carried out by the pool's threads, which call back SubReactorB asynchronously when event handling completes. SubReactorB thus presents asynchronous behavior to the outside. As in the Proactor pattern, receiving the asynchronous callback for an event means that the event has been fully processed. In this way the half-sync/half-async idea, applied as a design optimization in application-layer software, replaces asynchronous I/O's dependence on low-level operating system support.
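The three-layer structure can be sketched with the Python standard library. This is a minimal sketch under stated assumptions rather than the patented component: the class layout, the handler and on_done callables, and the 4096-byte read size are hypothetical. A selectors-based loop stands in for the asynchronous layer, queue.Queue for the blocking queue, and a small set of worker threads for the thread pool.

```python
import queue
import selectors
import socket
import threading


class SubReactorB:
    """Hypothetical half-sync/half-async sketch: selector + queue + worker threads."""

    def __init__(self, handler, on_done, workers=2):
        self._selector = selectors.DefaultSelector()   # asynchronous layer
        self._tasks = queue.Queue()                    # queue layer
        self._handler = handler                        # time-consuming processing
        self._on_done = on_done                        # completion callback
        self._threads = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(workers)]      # synchronous layer
        for thread in self._threads:
            thread.start()

    def register(self, connection):
        connection.setblocking(False)
        self._selector.register(connection, selectors.EVENT_READ)

    def poll_once(self, timeout=1.0):
        """Turn ready readable events into queued tasks; return how many were queued."""
        queued = 0
        for key, _events in self._selector.select(timeout):
            data = key.fileobj.recv(4096)
            if data:
                self._tasks.put(data)
                queued += 1
            else:                                      # peer closed the connection
                self._selector.unregister(key.fileobj)
                key.fileobj.close()
        return queued

    def _work(self):
        while True:
            data = self._tasks.get()
            try:
                if data is None:                       # shutdown marker
                    return
                self._on_done(self._handler(data))     # callback after processing
            finally:
                self._tasks.task_done()

    def shutdown(self):
        self._tasks.join()                             # wait for pending tasks
        for _ in self._threads:
            self._tasks.put(None)
        for thread in self._threads:
            thread.join()
        self._selector.close()
```

The event loop in poll_once never runs the handler itself, so a slow handler delays only a worker thread, not the monitoring of other connections.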
The classified log forwarding scheme based on the master/slave Reactor pattern provides different data processing and forwarding for log messages of different priorities. It offers both synchronous and asynchronous concurrent event handling, ensuring real-time forwarding of high-priority log data and high throughput when forwarding low-priority log data. In addition, because the scheme uses a two-layer Reactor model, the data forwarding scheme is highly extensible at the SubReactor layer: multiple SubReactors can be customized for different business needs, and they need not be distinguished by log priority; they can also be distinguished by log type or by a user-defined routing scheme.
Further, the detailed process of cleaning and filtering the log data using the compound dual channel based on Flume NG is as follows:
The scheme is designed on top of the open-source log system Flume NG. At the log system layer, log data exists in the form of events; the event is the basic data structure that Flume NG transmits and processes internally.
The Flume data flow and event structure are shown in Fig. 5. An event consists of two parts, an event header and an event body.
Data in the event header takes the form of key-value pairs. The number of pairs and their contents can be defined by the user; they are generally used to record attributes of the log entry, such as the IP address of the log source node, the event creation timestamp, the name of the log source module, and the log level.
The event body is a byte array that records the actual content of the log. When the log source is a log file, the event body is a string containing a single line of text.
The event header provides a structured log header on which data routing can be based, and it plays a vital role in later data processing and data cleaning.
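The event structure described above, a header of key-value pairs plus a byte-array body, can be mirrored in a short Python sketch. Flume itself is a Java system; the Event class here is only an illustrative stand-in, and the header keys source_ip, timestamp, module, and level as well as the helper event_from_line are hypothetical names chosen to match the attributes listed in the text.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """Illustrative stand-in for an event: string headers plus a byte-array body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def event_from_line(line, source_ip, module, level):
    """Wrap one line of a log file in an event with descriptive headers."""
    headers = {
        "source_ip": source_ip,                       # IP of the log source node
        "timestamp": str(int(time.time() * 1000)),    # creation time in milliseconds
        "module": module,                             # log source module name
        "level": level,                               # log level
    }
    return Event(headers=headers, body=line.encode("utf-8"))
```

Keeping routing attributes in the header means that later stages can route an event without parsing its body.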
The Collector layer of a complete Flume log system can consist of one or more Agents. As described above, an Agent has three main components: Source, Channel, and Sink. The basic data processing flow of a Flume Agent is shown in Fig. 6.
A configurable real-time data cleaning platform is designed on top of the Flume log system. While outputting the full set of user behavior logs, it cleans data according to user-configured rules and persists specific data to storage for further analysis and processing, as shown in Fig. 7.
Data cleaning is a necessary step before big data analysis, and the quality of the cleaning is closely tied to the effectiveness of the data analysis model. Models and algorithms for data cleaning are also a current hot research topic at home and abroad. On top of the distributed log system, the scheme designs a real-time data cleaning module that cleans real-time log data during the collection phase according to user-defined rules, so that the data samples the log system provides better match the needs of data analysts. Data cleaning usually involves steps such as cleaning missing values, cleaning format and content, cleaning logic errors, and cleaning data that is not needed. Here a log cleaning component, LogCleanInterceptor, is designed to work with Flume's multiplexing Channel selector (MultiplexingChannelSelector) to clean and route data in real time. The logic of the scheme is shown in Fig. 8.
1) The Source component reads data from the log source.
2) The Source delegates the write task to the Channel processor (ChannelProcessor).
3) The Channel processor passes the event to the log cleaning component (LogCleanInterceptor). This component reads the event cleaning rules from the configuration file, processes the events that match a rule, extracts the relevant content of the event body, and adds to or modifies the event header.
4) The Channel processor passes the cleaned event to the multiplexing Channel selector (MultiplexingChannelSelector). This component obtains the event type from the event header, looks up the event's routing information according to the user configuration, and returns the list of events with routing information attached to the Channel processor.
5) The Channel processor sends the events in batches, as transactions, to the specific Channels according to their routing information.
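Steps 3) to 5) above can be sketched in Python as a cleaning function followed by a header-based router. This is an illustration under stated assumptions, not Flume's Java API and not the actual LogCleanInterceptor: events are plain dictionaries, the rule format (a regular expression plus a header key and value) is hypothetical, and transactions are omitted.

```python
import re


def clean_event(event, rules):
    """Clean one event; return it, or None if it should be dropped.

    rules is an assumed format: a list of (compiled_regex, header_key, header_value).
    """
    text = event["body"].decode("utf-8", errors="replace").strip()
    if not text:                       # missing-value cleaning: drop empty bodies
        return None
    for pattern, key, value in rules:
        if pattern.search(text):       # a matching rule adds or changes a header
            event["headers"][key] = value
    event["body"] = text.encode("utf-8")
    return event


def select_channels(event, header, mapping, default):
    """Pick channel names from the value of one header, as a multiplexing selector would."""
    return mapping.get(event["headers"].get(header), default)


def process_batch(events, rules, header, mapping, default):
    """Clean a batch and group the surviving events by destination channel."""
    routed = {}
    for event in events:
        cleaned = clean_event(event, rules)
        if cleaned is None:
            continue
        for channel in select_channels(cleaned, header, mapping, default):
            routed.setdefault(channel, []).append(cleaned)
    return routed
```

Mapping a type to several channels lets one event reach both a full-log channel and a channel for specific data, which matches the goal of outputting the full log while persisting selected data separately.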
Based on Flume, the scheme designs a highly available log cleaning and filtering solution in which transactional properties guarantee reliability during log transmission. To keep data flowing efficiently through the whole system, and to solve the problems caused by the mismatch between the upstream write rate and the rate at which downstream operations such as cleaning and persisting to storage read the data, the present invention designs a compound dual channel, CDual-Channel, on top of the two native channels; it provides ample buffer capacity while preserving high throughput at the Channel layer. In addition, the log cleaning component (LogCleanInterceptor), working with the native multiplexing Channel selector, cleans the real-time log stream according to user-configured rules before the log data is persisted to storage. As a result, the log data collected by the distributed log collection system of the whole e-commerce platform better matches the needs of data analysts, and the time cost of later data analysis is reduced.
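The text does not spell out how the two native channels are combined, so the following Python sketch shows only one possible reading, and every detail of it is an assumption: a bounded in-memory queue serves as the fast path, and a second queue, standing in for a larger file-backed channel, absorbs events when the upstream writes faster than the downstream reads. The class name CompoundDualChannel and its put and take methods are hypothetical.

```python
from collections import deque


class CompoundDualChannel:
    """Assumed design: a bounded fast queue with an overflow queue behind it."""

    def __init__(self, memory_capacity):
        self.memory = deque()          # fast path, limited capacity
        self.overflow = deque()        # stand-in for a file-backed channel
        self._capacity = memory_capacity

    def put(self, event):
        # Use memory only while nothing is waiting in overflow; this keeps
        # every event in memory older than every event in overflow.
        if not self.overflow and len(self.memory) < self._capacity:
            self.memory.append(event)
        else:
            self.overflow.append(event)

    def take(self):
        """Return the oldest event, or None when the channel is empty."""
        if self.memory:
            return self.memory.popleft()
        if self.overflow:
            return self.overflow.popleft()
        return None
```

Under this reading, a burst of writes spills into the overflow queue instead of blocking the producer, and events still come out in first-in, first-out order.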
Compared with the prior art, the present invention has the following advantages and effects:
1. The present invention collects log data with a non-blocking log collection scheme based on double buffering, which provides low-latency, low-overhead collection and solves the problems of common log systems, whose collection layer is tightly coupled to the business system, is hard to maintain, and adds to the business system's load by buffering log data.
2. The present invention uses a classified log forwarding scheme based on the master/slave Reactor pattern, which supports high concurrency, high throughput, and highly reliable log data transmission, reduces the coupling between the business system and the log system, and solves the key problem of having the system's buffer layer support highly concurrent writes while forwarding data with high reliability.
3. The present invention cleans and filters log data using a compound dual channel based on Flume NG. While guaranteeing high system throughput and ample buffer capacity, it solves the mismatch between the upstream write rate and the downstream consumption rate and reduces the time cost of later data analysis.
Brief description of the drawings
Fig. 1 is the load balancing node topology of the distributed log collection method disclosed by the invention;
Fig. 2 is the logic execution timeline of the double-buffer log collection scheme of the distributed log collection method disclosed by the invention;
Fig. 3 is the two-layer structure of the master/slave Reactor pattern in the log data forwarding scheme of the distributed log collection method disclosed by the invention;
Fig. 4 is the basic logic diagram of the classified log forwarding scheme based on the master/slave Reactor pattern of the distributed log collection method disclosed by the invention;
Fig. 5 is the Flume internal data flow diagram of the distributed log collection method disclosed by the invention;
Fig. 6 is the basic data processing flowchart of a Flume Agent in the distributed log collection method disclosed by the invention;
Fig. 7 is the structure diagram of cleaning and filtering log data using the Flume NG compound dual channel in the distributed log collection method disclosed by the invention;
Fig. 8 is the data cleaning logic diagram of the distributed log collection method disclosed by the invention;
Fig. 9 is a schematic diagram of the distributed log collection method disclosed by the invention;
Fig. 10 is the flowchart of the process steps of the distributed log collection method disclosed by the invention;
Fig. 11 is the system framework diagram of the distributed log collection method disclosed by the invention as applied to an e-commerce platform;
Fig. 12 is the system flowchart of the distributed log collection method disclosed by the invention as applied to an e-commerce platform.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative work fall within the scope of protection of the present invention.
Embodiment one
This embodiment discloses a distributed log collection method whose process steps are shown in Fig. 10. As Fig. 10 shows, the method comprises the following steps:
S1. The business system produces log data;
S2. The log data is collected using a double-buffer mechanism;
In a specific embodiment, collecting the log data using the double-buffer mechanism means the following: the logging logic is removed from the critical path of the business processing flow, and a lightweight component on the service node does the log collection work. In the log collection scenario this keeps the business application non-blocking and low-latency and gives the backend log thread low overhead and high throughput.
In the non-blocking log collection scheme based on double buffering, the producers of log messages are logically called the frontend, and the log consumer that performs the actual I/O operations is called the backend. In the e-commerce platform log collection scenario, each business thread can be regarded as a frontend, and the log thread belongs to the backend. The scheme prepares one buffer each for the frontend and the backend of log collection, called buffer A and buffer B. At initialization, each buffer is allocated two buffer blocks, one primary and one standby; the frontend's primary buffer block receives the current log data while the standby block waits. The log data in buffer B is read and reported by the backend thread. The scheme uses a condition variable as the synchronization mechanism between threads. The backend log reporting thread, which waits on the condition variable, is woken in either of two situations: the frontend business threads fill the buffer, or the waiting time exceeds 5 seconds. The execution timeline of the scheme's main logic is shown in Fig. 2.
By using double buffering, the scheme decouples business logic from the actual I/O logic. A business thread only needs to generate log messages and need not care at what point in time the write actually happens; this asynchronous approach preserves the real-time behavior of the business processing flow. Whereas other common log collection systems send each message to the backend individually, the non-blocking log collection scheme based on double buffering combines multiple log messages into one large buffer and sends it to the backend in one go. This avoids waking the backend log thread frequently, achieves an effect similar to batch processing, and reduces overhead. By removing the logging logic from the critical path of the business processing flow, the frontend gains non-blocking, low-latency, low-overhead behavior, while the backend sustains a sufficiently large log throughput while using few system resources.
S3. Concurrent service requests are distributed sensibly across the service nodes using an LVS+Keepalived configuration;
In a specific embodiment, distributing concurrent service requests sensibly across the service nodes using the LVS+Keepalived configuration means the following:
High-availability load balancing is implemented with LVS (Linux Virtual Server) + Keepalived, as shown in Fig. 1.
The load balancing service is given at least two servers: one is the Master node and the others are Slave nodes. Together they form an LVS virtual server cluster, and the Keepalived component is deployed on every load balancing node. The Master and Slave nodes communicate by multicast over the Virtual Router Redundancy Protocol (VRRP); through VRRP, its core protocol, Keepalived isolates failed load balancing nodes and fails over automatically between nodes. Through the load balancing service, the system distributes service requests sensibly across the service nodes of the Agent layer, which raises the concurrent request capacity of the system as a whole.
S4, using based on master/slave Reactor patterns classification forwarding is carried out to daily record data;
In embodiment, described use is based on master/slave Reactor patterns and classification forwarding tool is carried out to daily record data
Body is:
By increasing agent node among common operation system clustered node and log system clustered node, by business
System is full decoupled with log system, and operation system need to only be absorbed in realization and maintenance service logic, is carried for platform core business
For quality services.For log information, operation system need to only be submitted to agent node, be responsible for day by agent node
The caching of will and the distribution processor of different type daily record.
The characteristics of concurrent processing demand and existing I/O for electric business platform different priorities daily record multiplex model,
The present invention devises the classification daily record forwarding scheme based on master/slave Reactor patterns.In a particular application with data forwarding service
Hold the form of component to realize, while synchronous event disposal ability and the energy similar to the processing of Proactor patterns asynchronous event are provided
Power.
This programme designs two layers of Reactor, and structure is as shown in Figure 3:
MainReactor (main Reactor) is located at first layer, is responsible for monitoring the new connection from each node of business cluster just
Thread event, and different SubReactor is distributed into the network I/O operation for having established connection according to daily record priority.
SubReactor is located at the second layer, including two from Reactor:SubReactorA and SubReactorB.
SubReactorA is responsible for the acquisition forwarding work of high priority daily record, and synchronous event concurrent processing is provided to front end Log Source
Ability, provides rear end log system stable log stream, and SubReactorB is responsible for the acquisition forwarding work of low priority daily record
Make, the ability of asynchronous event concurrent processing is provided front end Log Source, the number of file cache form is provided rear end log system
According to source.Herein by the design of SubReactor layers, ensure that scheme possesses real-time streaming data processing and height takes event handling
Ability.
The basic logic of the classified log forwarding scheme based on the master/slave Reactor pattern is shown
in Figure 4. In this scheme the log forwarding mechanism of SubReactorA is essentially the same as the
native Reactor pattern; the difficulty lies in implementing concurrent asynchronous event handling in
SubReactorB. The invention designs SubReactorB as a Reactor plus thread pool, following the
half-sync/half-async idea, so that SubReactorB exposes concurrent asynchronous event handling to the
outside. Internally, this handling is implemented in three layers: an asynchronous layer, a queue layer
and a synchronous layer.
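A minimal sketch of the three-layer half-sync/half-async structure follows. It assumes a plain in-process queue as the queue layer and a fixed-size pool of worker threads as the synchronous layer; the class name `SubReactorBSketch` and the handler function are hypothetical, and network I/O is replaced by direct calls to `on_readable`.

```python
import queue
import threading


class SubReactorBSketch:
    """Half-sync/half-async: async layer enqueues, worker threads process synchronously."""

    def __init__(self, handler, workers=2):
        self._queue = queue.Queue()  # queue layer
        self._handler = handler
        self._threads = [
            threading.Thread(target=self._work, daemon=True) for _ in range(workers)
        ]
        for t in self._threads:
            t.start()

    def on_readable(self, payload):
        # Asynchronous layer: called from the event loop, it only enqueues and returns.
        self._queue.put(payload)

    def _work(self):
        # Synchronous layer: each worker may block on slow work such as file writes.
        while True:
            item = self._queue.get()
            try:
                if item is None:  # shutdown sentinel
                    return
                self._handler(item)
            finally:
                self._queue.task_done()

    def close(self):
        self._queue.join()  # wait until every queued payload has been handled
        for _ in self._threads:
            self._queue.put(None)
        for t in self._threads:
            t.join()


written = []
written_lock = threading.Lock()


def slow_write(payload):
    # Stand-in for a time-consuming operation such as appending to a file cache.
    with written_lock:
        written.append(payload)


reactor_b = SubReactorBSketch(slow_write, workers=2)
for i in range(5):
    reactor_b.on_readable(f"low-priority-{i}".encode())
reactor_b.close()
```

The event loop never waits on `slow_write`; it hands the payload to the queue and returns, which is the property the asynchronous layer is meant to provide.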
The classified log forwarding scheme based on the master/slave Reactor pattern applies different data
processing and forwarding strategies according to the priority of each log message. It offers
synchronous and asynchronous concurrent event handling at the same time, guaranteeing real-time
forwarding of high-priority log data and high throughput for low-priority log data. In addition,
because the scheme uses a two-layer Reactor model, its SubReactor layer is highly extensible: multiple
SubReactors can be customized for different business requirements, and the distinction is not limited
to log priority; log type or a user-defined routing scheme may also be used.
S5, clean and filter the log data using the Flume NG-based composite dual channel;
In this embodiment, cleaning and filtering the log data with the Flume NG-based composite dual channel
specifically comprises:
S501, the Source component reads in data from the log source.
S502, the Source delegates the write task to the Channel processor (ChannelProcessor).
S503, the Channel processor passes the event to the log cleaning component (LogCleanInterceptor). This
component reads the event cleaning rules from a configuration file, processes the events that match a
rule, extracts the relevant content from the event body, and adds or modifies event headers.
S504, the Channel processor passes the cleaned event to the multiplexing Channel selector
(Multiplexing ChannelSelector). This component obtains the event type from the event header, looks up
the routing information for the event according to the user configuration, and returns the list of
events, with routing information added, to the Channel processor.
S505, the Channel processor sends the events in batches, as transactions, to the specific Channels
indicated by the corresponding routing information.
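Steps S501 to S505 can be pictured with a small, self-contained simulation. This is not Flume's Java API: the class names ending in `Sketch`, the regular-expression rule format, the `type` header key and the channel names are all assumptions made for illustration. The patent does not say what happens to events that match no rule; here they are passed through unchanged and routed to a default channel.

```python
import re


class Event:
    def __init__(self, body, headers=None):
        self.body = body
        self.headers = dict(headers or {})


class LogCleanInterceptorSketch:
    """S503: match events against cleaning rules, trim the body, set a type header."""

    def __init__(self, rules):
        # rules: list of (regex with a named group "msg", type label); hypothetical format
        self.rules = [(re.compile(pattern), label) for pattern, label in rules]

    def intercept(self, event):
        for pattern, label in self.rules:
            match = pattern.search(event.body)
            if match:
                event.body = match.group("msg").strip()
                event.headers["type"] = label
                break
        return event


class MultiplexingSelectorSketch:
    """S504: map the type header to channel names according to configuration."""

    def __init__(self, mapping, default):
        self.mapping = mapping
        self.default = default

    def channels_for(self, event):
        return self.mapping.get(event.headers.get("type"), self.default)


class ChannelProcessorSketch:
    """S502/S505: run the interceptor and selector, then deliver per-channel batches."""

    def __init__(self, interceptor, selector, channels):
        self.interceptor = interceptor
        self.selector = selector
        self.channels = channels  # channel name -> list standing in for a Channel

    def process_batch(self, events):
        staged = {}
        for event in events:
            event = self.interceptor.intercept(event)
            for name in self.selector.channels_for(event):
                staged.setdefault(name, []).append(event)
        # Each channel receives its whole batch in one step, mimicking one transaction.
        for name, batch in staged.items():
            self.channels[name].extend(batch)


channels = {"error_ch": [], "access_ch": [], "default_ch": []}
processor = ChannelProcessorSketch(
    LogCleanInterceptorSketch([
        (r"ERROR\s+(?P<msg>.*)", "error"),
        (r"ACCESS\s+(?P<msg>.*)", "access"),
    ]),
    MultiplexingSelectorSketch(
        {"error": ["error_ch"], "access": ["access_ch"]}, ["default_ch"]
    ),
    channels,
)
processor.process_batch([
    Event("2017-08-03 ERROR payment timeout "),
    Event("2017-08-03 ACCESS GET /cart"),
    Event("noise"),
])
```

After the batch is processed, each list in `channels` holds only the cleaned events routed to it, which is the point at which a downstream sink would take over.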
Based on Flume, this scheme designs a highly available log cleaning and filtering scheme in which
transactional properties guarantee reliability during log transmission. To keep data flowing
efficiently through the whole system, and to resolve the mismatch between the upstream write rate and
the downstream read rate of operations such as cleaning and persistent storage, the invention builds a
composite dual channel, CDual-Channel, on top of the two native channel types, providing ample buffer
capacity while preserving high throughput at the Channel layer. In addition, a log cleaning component
(LogCleanInterceptor) is designed to work with the native multiplexing Channel selector, so that the
real-time log stream is cleaned according to user-configured rules before the log data is persisted.
As a result, the log data collected by the distributed log collection system of the e-commerce platform
better meets the needs of data analysts and reduces the time cost of later data analysis.
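One way to picture a composite channel that combines throughput with buffer capacity is a bounded in-memory queue that spills into a larger, slower store. The sketch below is an assumption about how such a channel could behave, not the patent's CDual-Channel implementation: the spill policy, the class name and the use of a second in-memory deque as a stand-in for a file-backed channel are all hypothetical.

```python
from collections import deque


class CompositeDualChannelSketch:
    """Bounded fast buffer plus an overflow buffer, drained in FIFO order."""

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = deque()    # fast path, limited capacity
        self.overflow = deque()  # stand-in for a disk-backed channel

    def put(self, event):
        # Once anything has spilled, keep spilling so that FIFO order is preserved.
        if self.overflow or len(self.memory) >= self.memory_capacity:
            self.overflow.append(event)
        else:
            self.memory.append(event)

    def take(self):
        # Older events always sit in the memory buffer, so drain it first.
        if self.memory:
            return self.memory.popleft()
        if self.overflow:
            return self.overflow.popleft()
        return None


channel = CompositeDualChannelSketch(memory_capacity=2)
for i in range(5):
    channel.put(i)  # a burst from upstream larger than the fast buffer
spilled = len(channel.overflow)
drained = [channel.take() for _ in range(5)]
```

A burst that exceeds the fast buffer is absorbed by the overflow store instead of blocking the upstream writer, while a slow downstream reader still sees events in arrival order.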
S6, read out the log data.
Embodiment two
This embodiment applies the distributed log collection method to a specific e-commerce system. For the
scenario of collecting logs from the distributed cluster nodes of an e-commerce platform, it provides a
highly reliable, high-performance scheme in which service nodes are lightly loaded, the business system
is decoupled from the log system, and the log system is easy to extend and maintain. Figure 10 is a
flow chart of log collection with the distributed log collection method and shows the steps of the
whole collection process; Figures 11 and 12 are, respectively, the system framework diagram and the
system flow chart of the method as applied to a specific e-commerce system. To describe the whole
implementation concretely, it is set out in the following steps:
S1, the e-commerce platform system generates log data;
S2, collect the log data using the double-buffering mechanism;
S3, distribute concurrent service requests appropriately across the service nodes using the
LVS+Keepalived mode;
S4, forward the log data by class using the master/slave Reactor pattern;
S5, clean and filter the log data using the Flume NG-based composite dual channel;
In this embodiment, cleaning and filtering the log data with the Flume NG-based composite dual channel
specifically comprises:
S501, the Source component reads in data from the log source.
S502, the Source delegates the write task to the Channel processor (ChannelProcessor).
S503, the Channel processor passes the event to the log cleaning component (LogCleanInterceptor). This
component reads the event cleaning rules from a configuration file, processes the events that match a
rule, extracts the relevant content from the event body, and adds or modifies event headers.
S504, the Channel processor passes the cleaned event to the multiplexing Channel selector
(Multiplexing ChannelSelector). This component obtains the event type from the event header, looks up
the routing information for the event according to the user configuration, and returns the list of
events, with routing information added, to the Channel processor.
S505, the Channel processor sends the events in batches, as transactions, to the specific Channels
indicated by the corresponding routing information.
S6, the e-commerce platform reads out the log data.
This completes the whole process of collecting log data from the e-commerce platform with the
distributed log collection method of the present invention.
In summary, this embodiment gives a full description of how an e-commerce platform system collects log
data, by combining the workflow of the e-commerce platform system with the execution flow of the
distributed log collection method. The method collects log data with a non-blocking log collection
scheme based on double buffering, giving low-latency, low-overhead collection. It forwards logs with a
classified log forwarding scheme based on the master/slave Reactor pattern, which supports high
concurrency, high throughput and highly reliable log transmission at the same time and reduces the
coupling between the business system and the log system. It cleans and filters log data with the
Flume NG-based composite dual channel, which, while guaranteeing high system throughput and ample
buffer capacity, resolves the mismatch between the upstream write rate and the downstream consumption
rate found in log collection for traditional e-commerce systems. The method therefore addresses the
high coupling between the collection layer and the business system in common log systems, the
resulting maintenance difficulty, and the extra load that log buffering places on the business system.
It also addresses the key problem of having the buffer layer support highly concurrent writes while
keeping data forwarding highly reliable. Massive volumes of user data can thus be collected
efficiently, raising the level of informatization and intelligence in the e-commerce industry,
improving the service quality of e-commerce platform clusters, and offering a new point of growth for
combining e-commerce with big data.
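The non-blocking, double-buffered collection mentioned in this summary can be sketched as follows. The sketch assumes two in-memory lists that are swapped under a short lock, with the slow write performed outside the lock by a single back-end thread; the class name, the `sink` callback and the newline-joined batch format are hypothetical choices for illustration.

```python
import threading


class DoubleBufferLogger:
    """Business threads append to the front buffer; the back end drains the back buffer."""

    def __init__(self, sink):
        self._front = []  # filled by business threads
        self._back = []   # drained by the back-end log thread
        self._lock = threading.Lock()
        self._sink = sink

    def log(self, message):
        # Business thread: a short critical section, no I/O on the business path.
        with self._lock:
            self._front.append(message)

    def flush(self):
        # Back-end thread (assumed to be the only caller): swap, then write outside the lock.
        with self._lock:
            self._front, self._back = self._back, self._front
        if self._back:
            self._sink("\n".join(self._back))  # many messages sent as one large buffer
            self._back.clear()


batches = []
logger = DoubleBufferLogger(batches.append)
logger.log("order created")
logger.log("payment accepted")
logger.flush()
logger.log("order shipped")
logger.flush()
logger.flush()  # nothing pending, so nothing is sent
```

Each call to `flush` forwards everything accumulated since the previous call as a single batch, so the back end is triggered once per batch rather than once per message.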
The above embodiments are preferred embodiments of the present invention, but the embodiments of the
invention are not limited by them. Any other change, modification, substitution, combination or
simplification made without departing from the spirit and principle of the invention shall be regarded
as an equivalent replacement and falls within the scope of protection of the invention.
Claims (10)
1. A distributed log collection method, characterized in that the method comprises the following steps:
a business system generates log data;
the log data is collected using a double-buffering mechanism;
concurrent service requests are distributed appropriately across the service nodes using the
LVS+Keepalived mode;
the log data is forwarded by class using the master/slave Reactor pattern;
the log data is cleaned and filtered using a Flume NG-based composite dual channel;
the log data is read out.
2. The distributed log collection method according to claim 1, characterized in that said collecting
the log data using a double-buffering mechanism is specifically:
log operation logic is stripped from the critical path of the business processing flow, and a
lightweight component on the service node completes the log collection work, ensuring non-blocking,
low-latency business applications in the log collection scenario and a low-overhead, high-throughput
back-end log thread.
3. The distributed log collection method according to claim 1, characterized in that said collecting
the log data using a double-buffering mechanism is specifically:
business logic is decoupled from the specific I/O logic by the double-buffering mechanism; a business
thread is only responsible for generating log messages and is not concerned with when the write is
actually performed, so that the asynchronous approach ensures the real-time behavior of the business
processing flow.
4. The distributed log collection method according to claim 1, characterized in that, in said
collecting the log data using a double-buffering mechanism, multiple log messages are combined into one
large buffer of data and sent to the back end in a single operation, avoiding frequent triggering of
the back end.
5. The distributed log collection method according to claim 1, characterized in that said distributing
concurrent service requests appropriately across the service nodes using the LVS+Keepalived mode is
specifically:
high-availability load balancing is implemented using the LVS+Keepalived mode; the load balancing
service is provided with at least two servers, one being the Master node and the others Slave nodes, to
build an LVS virtual server cluster system; a Keepalived component is deployed on each load balancing
node; the Master node and the Slave nodes communicate by multicast via the Virtual Router Redundancy
Protocol (VRRP); and Keepalived, through its core protocol VRRP, implements fault isolation of load
balancing nodes and automatic failover between nodes.
6. The distributed log collection method according to claim 1, characterized in that said forwarding
the log data by class using the master/slave Reactor pattern is specifically:
an agent node is added between the ordinary business system cluster nodes and the log system cluster
nodes, fully decoupling the business system from the log system; the business system focuses only on
implementing and maintaining business logic, providing high-quality service for the platform's core
business; as for log messages, the business system only needs to submit them to the agent node, which
is responsible for caching the logs and for dispatching the different types of logs.
7. The distributed log collection method according to claim 1, characterized in that said forwarding
the log data by class using the master/slave Reactor pattern is specifically:
two layers of Reactors are designed, namely MainReactor and SubReactor, wherein
MainReactor sits in the first layer as the main Reactor; it listens for new-connection-ready events
from each node of the business cluster and, according to log priority, assigns the network I/O
operations of each established connection to a different SubReactor;
SubReactor sits in the second layer and comprises two slave Reactors, SubReactorA and SubReactorB,
wherein SubReactorA collects and forwards high-priority logs, offering the front-end log sources
concurrent synchronous event handling and supplying the back-end log system with a stable log stream,
and SubReactorB collects and forwards low-priority logs, offering the front-end log sources concurrent
asynchronous event handling and supplying the back-end log system with a data source in the form of a
file cache.
8. The distributed log collection method according to claim 7, characterized in that said SubReactorB
is designed as a Reactor plus thread pool following the half-sync/half-async idea and exposes
concurrent asynchronous event handling to the outside, wherein the internal logic of SubReactorB's
concurrent asynchronous event handling is implemented in three layers: an asynchronous layer, a queue
layer and a synchronous layer.
9. The distributed log collection method according to claim 1, characterized in that said cleaning and
filtering the log data using a Flume NG-based composite dual channel specifically comprises the
following steps:
the Source component reads in data from the log source;
the Source delegates the write task to the Channel processor;
the Channel processor passes the event to the log cleaning component, which reads the event cleaning
rules from a configuration file, processes the events that match a rule, extracts the relevant content
from the event body, and adds or modifies event headers;
the Channel processor passes the cleaned event to the multiplexing Channel selector, which obtains the
event type from the event header, looks up the routing information for the event according to the user
configuration, and returns the list of events, with routing information added, to the Channel
processor;
the Channel processor sends the events in batches, as transactions, to the specific Channels indicated
by the corresponding routing information.
10. The distributed log collection method according to claim 9, characterized in that said cleaning
and filtering the log data using a Flume NG-based composite dual channel is specifically:
the Flume NG-based composite dual-channel log system and data cleaning method provide a highly
available distributed log system and achieve configurable real-time data cleaning within the log
system, supporting output of the cleaned log data to different storage nodes according to user-defined
data cleaning rules and routing rules.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710654304.3A CN107590182B (en) | 2017-08-03 | 2017-08-03 | Distributed log collection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710654304.3A CN107590182B (en) | 2017-08-03 | 2017-08-03 | Distributed log collection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107590182A true CN107590182A (en) | 2018-01-16 |
CN107590182B CN107590182B (en) | 2020-06-19 |
Family
ID=61042096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710654304.3A Active CN107590182B (en) | 2017-08-03 | 2017-08-03 | Distributed log collection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107590182B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN107590182B (en) | 2020-06-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||