[Detailed Description of the Embodiments]
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Fig. 1 is a flow diagram of the log data processing method provided by one embodiment of the present application. As shown in Fig. 1, the method includes:
101. A mapping node that executes the association preprocessing subtask of the current log data processing task in the log data processing system sends a query request to the target client agent apparatus corresponding to the mapping node, and receives the target log data required by the current log data processing task, which the target client agent apparatus returns in response to the query request. The target client agent apparatus is a client agent apparatus deployed on the log host that generates the target log data. The query request includes a log file identifier and a time period identifier, and the target log data is the log data in the log file identified by the log file identifier that was generated within the time period identified by the time period identifier.
102. If the mapping node has received the target log data returned by all target client agent apparatuses corresponding to the mapping node, the mapping node performs association preprocessing on all the received target log data and sends the association preprocessing result to the reduce node that executes the association processing subtask of the current log data processing task in the log data processing system.
103. If the reduce node has received the association preprocessing results sent by all mapping nodes that execute the association preprocessing subtask, the reduce node performs association processing on all the received association preprocessing results and outputs the association processing result.
The method provided in this embodiment may be executed by a log data processing system. A log data processing system is a system responsible for processing log data and may include mapping nodes (Mapper) and a reduce node (Reducer), which cooperate to complete a log data processing task. For example, the log data processing system may be, but is not limited to, an optimized and improved Spark system or an optimized and improved Hadoop system.
A log host is any device that generates log data, such as a computer or a production system server. In this embodiment, a log data processing task may consist of an association preprocessing subtask and an association processing subtask.
Specifically, during execution of the current log data processing task, the mapping node is mainly responsible for the association preprocessing subtask, which mainly includes collecting log data, parsing log data, performing association preprocessing on the log data, and sending the association preprocessing result to the reduce node. Association preprocessing here mainly refers to merging the log data to obtain the processing result required by the first-layer logical nodes in the reduce node. The reduce node is mainly responsible for the association processing subtask, which mainly includes receiving the association preprocessing results sent by the mapping nodes, performing association processing on them, and outputting the association processing result. Association processing inside the reduce node mainly refers to the processing performed by the multiple layers of logical processing nodes that the reduce node contains.
In the prior art, neither the post-computation stream represented by Hadoop nor the pre-computation stream represented by Storm can guarantee that all of the log data to be processed has arrived, which makes the processing result inaccurate. This problem may be called the "log data completeness problem". This embodiment addresses the log data completeness problem of the prior art in the following respects:
First point: the client agent apparatus guarantees completeness when log data is collected.
After log data is generated, it is stored in a log file on the log host. In general, log data is not retained permanently on the log host: log files are usually rolled daily and deleted after being kept for a fixed number of days. Under normal circumstances, however, this retention time is sufficient for scenarios such as monitoring computation, where real-time analysis is the main goal.
The log host itself could provide a log query service. However, the log host usually serves end users directly and cannot spend excessive performance on log analysis. To affect the performance of the log host as little as possible, this embodiment deploys a client agent apparatus on the log host. The client agent apparatus provides the log query function and, as far as possible, confines the resulting performance cost to the required network bandwidth, without incurring other performance costs. In other words, the client agent apparatus deployed on the log host in this embodiment can provide an accurate query function for the log data generated within a specified time period, which ensures that the target log data required by the current log data processing task is already complete when it enters the log data processing system from the log host.
To provide an accurate query function for the log data generated within a specified time period, the client agent apparatus needs to create, for the log data that the log host generates at different times, a location index indicating the position of that log data in the log file. The client agent apparatus can then use the location index to find the log data to be queried quickly, which reduces the performance cost to the log host.
Take the application scenario of real-time processing of periodic log data as an example. In this scenario, log data processing tasks are periodic, meaning that each run of the task processes the log data generated within a time period of the same length. For example, if each task cycle processes the log data generated within one minute in order to output the log analysis result for that cycle, the task needs to query the log data generated in a particular minute, such as the minute 2014-11-11 11:11. On this basis, this embodiment sets a unit period and stores and queries log data using the log data generated within one unit period as the minimum data unit. The unit period may be 1 second, 1 minute, 2 minutes, 3 minutes, 1 hour, and so on, depending on user requirements. Accordingly, the time period identified by the time period identifier in the query request includes at least one unit period; that is, a single query request from a mapping node can query the log data generated within at least one unit period.
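As a minimal sketch of the query request described above, the following Python fragment models a request carrying a log file identifier and a time period, and expands the period into unit periods. The class name `QueryRequest`, its field names, the half-open interval convention, and the one-minute default unit are assumptions made for illustration; the embodiment does not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical shape of a query request (field names are assumptions)."""
    log_file_id: str        # log file identifier
    period_start: datetime  # start of the queried time period (inclusive)
    period_end: datetime    # end of the queried time period (exclusive)


def unit_periods(request: QueryRequest,
                 unit: timedelta = timedelta(minutes=1)) -> List[datetime]:
    """Return the start time of every unit period covered by the request."""
    periods = []
    current = request.period_start
    while current < request.period_end:
        periods.append(current)
        current += unit
    return periods
```

A request for the single minute 2014-11-11 11:11 covers exactly one unit period, while a request spanning 11:11 to 11:13 covers two.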
Based on the above, the client agent apparatus should create a location index for the log data that the log host generates within each unit period.
It is worth noting that the location index may be expressed as the starting byte position of the log data in the log file together with the byte length of the log data, or alternatively as the starting byte position and the ending byte position of the log data in the log file.
Specifically, for each unit period, the client agent apparatus parses the tail data of the preceding unit period to determine the starting position of the log data generated within the unit period, and parses the tail data of the unit period itself to determine the length of the log data generated within the unit period. The starting position and the length together serve as the location index of the log data generated within the unit period. The tail data of a unit period is the log data generated as the unit period is about to end.
In short, the above implementation relies on a characteristic of log printing: entries are printed in increasing time order, so log data generated later never appears before log data generated earlier. At the moment when one unit period gives way to the next, the client agent apparatus reads and parses a small number of bytes from the end of the log data generated within the unit period to determine the end point of the log data generated within the current unit period and the start point of the log data generated within the next unit period.
For example, for unit period T and unit period T+1, if analysis of the end of the log data generated within unit period T determines that the byte following the end data belongs to the log data generated within unit period T+1, then all log data generated within unit period T ends at this position and the log data generated within unit period T+1 begins immediately after it. This end data is the alternation point between the two adjacent unit periods. After finding the alternation point, the client agent apparatus obtains the end position of the log data generated within unit period T (from which, together with the start position, the byte length of the log data can be calculated) and the start position of the log data generated within unit period T+1. Repeating this process yields the start position and byte length, that is, the location index, of the log data generated within each unit period.
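The following Python sketch illustrates what such a location index could look like. It is a simplification under stated assumptions: every log line is assumed to begin with a `YYYY-MM-DD HH:MM:SS` timestamp, the unit period is one minute, and the whole file is scanned line by line rather than parsing only a few tail bytes at each period boundary as the embodiment describes. The function names `build_location_index` and `read_period` are hypothetical.

```python
from typing import Dict, Tuple

# unit period ("YYYY-MM-DD HH:MM") -> (start byte position, byte length)
LocationIndex = Dict[str, Tuple[int, int]]


def build_location_index(log_bytes: bytes) -> LocationIndex:
    """Index log data by minute, relying on time-increasing log printing."""
    index: LocationIndex = {}
    offset = 0
    for line in log_bytes.splitlines(keepends=True):
        # Assumption: the first 16 bytes are "YYYY-MM-DD HH:MM".
        period = line[:16].decode("ascii")
        start, length = index.get(period, (offset, 0))
        index[period] = (start, length + len(line))
        offset += len(line)
    return index


def read_period(log_bytes: bytes, index: LocationIndex, period: str) -> bytes:
    """Return exactly the bytes generated within one unit period."""
    start, length = index[period]
    return log_bytes[start:start + length]
```

Because entries are printed in increasing time order, the data of one unit period forms a single contiguous byte range, which is why one (start, length) pair per period is sufficient.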
Based on the above, when there is a log data processing task, the mapping node responsible for executing the association preprocessing subtask of that task can obtain the log data the task requires by sending the client agent apparatus a query request that carries a log file identifier and a time period identifier. For ease of description, take the current log data processing task as an example: the client agent apparatus involved in the current log data processing task is called the target client agent apparatus, and the log data required by the current log data processing task is called the target log data. If the required log data comes from multiple log hosts, the log data from each log host may be called target log data. The target log data required by the current log data processing task is the log data in the log file identified by the log file identifier that was generated within the time period identified by the time period identifier.
After receiving the query request sent by the mapping node, the target client agent apparatus determines, from the log file identifier in the request, the log file that stores the target log data, then uses the time period identifier to find the log data generated within the identified time period, and returns the log data found to the mapping node as the target log data.
For example, based on the location index described above, when the mapping node queries the target client agent apparatus with the time period 14:02 as a parameter, the target client agent apparatus can use the location index of the log data to return exactly the log data generated in that period, no more and no less. This solves the completeness problem when the mapping node collects log data.
It is worth noting that, because the client agent apparatus can guarantee completeness when log data is collected, the mapping node may start collecting log data at any time without concern that badly chosen timing will leave the collected log data incomplete.
For example, again taking the application scenario of real-time processing of periodic log data, the mapping node could in fact start the task "collect the log data generated in the 14:02 period" at exactly 14:03:00. When the query request from the mapping node reaches the target client agent apparatus, the target client agent apparatus handles it according to the current state of location index construction: if the location index of the log data generated in the 14:02 period has already been produced, the corresponding log data is retrieved directly and returned; if it has not yet been produced, a query failure is returned and the collection task of the mapping node is treated as failed.
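The query handling just described can be sketched in Python as follows. The function name `handle_query`, the in-memory dictionaries standing in for log files and indexes, and the use of `None` to signal query failure are all assumptions for illustration; a real client agent apparatus would read from persistent storage.

```python
from typing import Dict, Optional, Tuple

# unit period -> (start byte position, byte length)
LocationIndex = Dict[str, Tuple[int, int]]


def handle_query(files: Dict[str, bytes],
                 indexes: Dict[str, LocationIndex],
                 log_file_id: str,
                 period: str) -> Optional[bytes]:
    """Return the log data of one period, or None if its index is not built yet."""
    index = indexes.get(log_file_id, {})
    if period not in index:
        # The location index for this period has not been produced: query fails.
        return None
    start, length = index[period]
    return files[log_file_id][start:start + length]
```

In this sketch a failed query (`None`) is kept distinct from a successful query that finds no data (`b""`), so that the mapping node can tell "not ready yet" apart from "nothing was logged".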
Second point: the processing logic of the mapping node guarantees data completeness in the log data preprocessing stage.
Cooperation between the mapping node and the target client agent apparatus ensures that the target log data from the target client agent apparatus is exactly the log data generated within the specified time period. However, a log data processing task may involve log data generated by at least two log hosts at the same time; that is, the mapping node may need to collect log data from at least two client agent apparatuses. The mapping node therefore needs to solve the completeness problem that arises when the target log data returned by at least two client agent apparatuses arrives.
In this embodiment, after sending the query request to the target client agent apparatus, the mapping node receives the target log data that the target client agent apparatus returns and then judges whether it has received the target log data returned by all target client agent apparatuses involved in the current log data processing task. Only when it has does the mapping node parse all the received target log data, perform association preprocessing on the parsed target log data, and send the association preprocessing result to the reduce node.
In an optional implementation, the log data processing system further includes a task management node responsible for task orchestration. The task management node drives task orchestration at scheduled times and sends the orchestrated tasks to the mapping nodes and the reduce node. Task orchestration here mainly concerns how many sets of target log data the log data processing task involves (corresponding to the number of target client agent apparatuses), how many association preprocessing subtasks it involves (corresponding to the number of mapping nodes), and the mapping between target log data and association preprocessing subtasks (that is, the correspondence between target client agent apparatuses and mapping nodes; put simply, how many and which target client agent apparatuses each mapping node is responsible for).
For a mapping node, the identification information of the target client agent apparatuses corresponding to the mapping node may be configured in advance. The identification information of a target client agent apparatus may be its IP address, name, or the like. One implementation for determining the correspondence between mapping nodes and target client agent apparatuses is as follows: the task management node determines the number of log hosts involved in the current log data processing task and the number of mapping nodes that execute the current log data processing task; based on these two numbers, it determines the target client agent apparatuses corresponding to each mapping node and provides each mapping node with the identification information of its corresponding target client agent apparatuses.
Assume that the current log data processing task involves N log hosts, and therefore N target client agent apparatuses, and that M mapping nodes in the log data processing system are responsible for the association preprocessing subtask. A preset allocation algorithm may be used to determine the target client agent apparatuses corresponding to each mapping node. The allocation algorithm may first sort the target client agent apparatuses, for example by converting the IP address of each target client agent apparatus into a long integer (long) value and sorting these values in ascending order, and sort the mapping nodes in the same way. It then applies a preset formula, for example m = n / (N / M), to obtain the target client agent apparatuses corresponding to each mapping node. In this formula, n is the sequence number of a target client agent apparatus and m is the sequence number of the mapping node to which that target client agent apparatus corresponds; the quotient is rounded down, as the following examples show. With 100 target client agent apparatuses and 20 mapping nodes, the 3rd target client agent apparatus corresponds to the 0th mapping node (3 / (100 / 20) = 0), the 6th target client agent apparatus corresponds to the 1st mapping node (6 / (100 / 20) = 1), and so on, which yields the target client agent apparatuses corresponding to each mapping node. It is worth noting that the formula may be adjusted slightly for cases such as N not being evenly divisible by M.
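The allocation formula can be sketched in Python as shown below. Sorting by the integer value of the IPv4 address follows the text; the handling of the non-divisible case (clamping the result to the last mapping node and using at least one agent per mapping node) is only one possible "slight adjustment" and is an assumption, as are the function names.

```python
import ipaddress
from typing import Dict, List


def mapper_index(n: int, total_agents: int, total_mappers: int) -> int:
    """Compute m = n / (N / M) with integer division.

    Assumed adjustment: clamp to the last mapping node when N is not
    evenly divisible by M, and avoid a zero divisor when M > N.
    """
    per_mapper = max(1, total_agents // total_mappers)
    return min(n // per_mapper, total_mappers - 1)


def ip_key(ip: str) -> int:
    """Convert an IPv4 address to an integer for ascending sort."""
    return int(ipaddress.IPv4Address(ip))


def assign_agents(agent_ips: List[str], mapper_ips: List[str]) -> Dict[str, List[str]]:
    """Map each mapping node to the target client agents it is responsible for."""
    agents = sorted(agent_ips, key=ip_key)
    mappers = sorted(mapper_ips, key=ip_key)
    assignment: Dict[str, List[str]] = {m: [] for m in mappers}
    for n, agent in enumerate(agents):
        m = mapper_index(n, len(agents), len(mappers))
        assignment[mappers[m]].append(agent)
    return assignment
```

With 100 agents and 20 mapping nodes, `mapper_index(3, 100, 20)` evaluates to 0 and `mapper_index(6, 100, 20)` evaluates to 1, matching the worked example in the text.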
Once the target client agent apparatuses corresponding to each mapping node have been determined in this way, each mapping node can be configured with the identification information of its corresponding target client agent apparatuses. On this basis, when the mapping node receives target log data, it records the identification information of the target client agent apparatus that returned the target log data and compares the recorded identification information with the preconfigured identification information of the client agent apparatuses. If the two are identical, the mapping node determines that it has received the target log data returned by all target client agent apparatuses corresponding to the mapping node.
It should be noted that, in the above implementation, the target client agent apparatus may send the target log data to the corresponding mapping node in a single transmission. Alternatively, it may transmit the target log data to the corresponding mapping node serially in multiple batches, for example using remote procedure call protocol (Remote Procedure Call Protocol, RPC) transmission. In practice, whether transmission is single or in multiple batches, the target client agent apparatus must ensure that transmission of the target log data is "task-driven", so that the target log data has a transactional property: either all of it is transmitted successfully or all of it fails. Accordingly, the mapping node can use the end identifier of each set of target log data to judge whether transmission of the target log data has finished, and records the identification information of the target client agent apparatus that transmitted the target log data only once it determines that transmission has finished. The end identifier of the target log data may be a timestamp carried by the target log data itself, or a specific identifier that the target client agent apparatus adds to the target log data.
It is worth noting that even if the target client agent apparatus obtains no target log data, so that the returned result is empty, it still needs to return a normal "empty output" to the mapping node so that the mapping node can record the identification information of that target client agent apparatus.
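The completeness check on the mapping node side can be sketched as follows. This is an illustration only: the class name `MapperCollector`, the `is_last` flag standing in for the end identifier, and the use of IP strings as identification information are assumptions, not part of the embodiment.

```python
from typing import Dict, Iterable, List, Set


class MapperCollector:
    """Tracks which target client agents have finished returning data."""

    def __init__(self, expected_agents: Iterable[str]) -> None:
        self.expected: Set[str] = set(expected_agents)  # preconfigured identifiers
        self.buffers: Dict[str, List[bytes]] = {}
        self.finished: Set[str] = set()                 # recorded identifiers

    def on_batch(self, agent_id: str, batch: bytes, is_last: bool) -> None:
        """Accept one batch; record the agent only when its end marker arrives."""
        self.buffers.setdefault(agent_id, []).append(batch)
        if is_last:
            self.finished.add(agent_id)

    def is_complete(self) -> bool:
        """True once the recorded identifiers match the preconfigured ones."""
        return self.finished == self.expected

    def all_data(self) -> Dict[str, bytes]:
        """Return the collected data; only valid after completeness is reached."""
        if not self.is_complete():
            raise RuntimeError("target log data has not fully arrived")
        return {a: b"".join(self.buffers[a]) for a in sorted(self.expected)}
```

An agent that has nothing to return still sends one empty batch with the end marker set, which is how the "empty output" case lets the collector reach completeness.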
Third point: the processing logic of the reduce node guarantees data completeness in the association processing stage.
The reduce node receives the preprocessing results sent by the mapping nodes. Because the current log data processing task may involve at least two mapping nodes, the reduce node judges whether it has received the preprocessing results sent by all mapping nodes involved in the current log data processing task. Only when it has does the reduce node perform association processing on all the received preprocessing results and generate the association processing result.
In an optional implementation, the identification information of the mapping nodes involved in the current log data processing task may be configured in advance for the reduce node. The identification information of a mapping node may be its IP address, name, or the like. On this basis, when the reduce node receives an association preprocessing result, it records the identification information of the mapping node that returned the result and compares the recorded identification information with the preconfigured identification information of the mapping nodes. If the two are identical, the reduce node determines that it has received the association preprocessing results returned by all mapping nodes that execute the current log data processing task.
In this embodiment, the reduce node may include multiple layers of logical processing nodes. The merged processing result of all first-layer logical processing nodes serves as the input to the second-layer logical processing nodes, the merged processing result of all second-layer logical processing nodes serves as the input to the third-layer logical processing nodes, and so on, until all logical processing nodes have finished processing and the final association processing result is output.
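The following Python sketch combines the two ideas above for the reduce (Reducer) node: processing is gated on results from all expected mapping nodes, and the layers then run one after another, each consuming the merged output of the previous layer. The class name `ReduceNode`, the modelling of each layer as a single function over a list, and the example layers are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Layer = Callable[[list], list]


class ReduceNode:
    """Runs layered association processing once all mapper results arrive."""

    def __init__(self, expected_mappers: Iterable[str], layers: List[Layer]) -> None:
        self.expected = set(expected_mappers)  # preconfigured mapper identifiers
        self.layers = layers
        self.received: Dict[str, list] = {}

    def on_result(self, mapper_id: str, preprocessed: list) -> Optional[list]:
        """Store one preprocessing result; return the final result when complete."""
        self.received[mapper_id] = preprocessed
        if set(self.received) != self.expected:
            return None  # not all association preprocessing results have arrived
        # Merge all results in a deterministic order, then process layer by layer.
        data = [item for mid in sorted(self.received) for item in self.received[mid]]
        for layer in self.layers:
            data = layer(data)
        return data
```

For instance, with a first layer that doubles each value and a second layer that sums them, results `[1, 2]` and `[3]` from two mapping nodes produce `[12]`, and nothing is produced until both results have been received.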
As can be seen from the above, in this embodiment the mapping node that executes the current log data processing task cooperates with the client agent apparatus deployed on the log host that generates the target log data required by the task. The log data generated within a specified time period can thus be obtained accurately, which allows the log data processing system to process log data promptly and achieves real-time processing of log data. In addition, by allowing the mapping node to obtain the log data generated within a specified time period, the client agent apparatus lays the foundation for ensuring that all the log data the task requires arrives at the log data processing system. Further, inside the log data processing system, the mapping node performs association preprocessing and sends the association preprocessing result to the reduce node only after judging that the log data from all of its corresponding client agent apparatuses has arrived, and the reduce node performs association processing on all the association preprocessing results and finally outputs the association processing result only after judging that the association preprocessing results from all mapping nodes executing the log data processing task have arrived. Because every step is executed only once all the data it requires has arrived, the reliability of the processing result is improved.
In an optional implementation, to further improve the real-time performance of log data processing, a simplified log data processing system structure is used. As shown in Fig. 2, the log data processing system has a two-layer structure: from the perspective of physical structure, it includes one layer of mapping nodes 21 (physical nodes) and one layer consisting of the reduce node 22 (a physical node). There is at least one mapping node 21 and exactly one reduce node 22.
The working principles of the mapping node 21 and the reduce node 22 are not repeated here. The log data processing system provided in this embodiment is similar to Spark, with the difference that the aggregate-by-key (groupByKey) process is placed inside the single reduce node 22, and the mapping node 21 can perform preprocessing for the reduce node 22. That is, after the mapping node 21 obtains and parses the target log data returned by each target client agent apparatus, it does not send the parsing result directly to the reduce node 22; instead, it first performs association preprocessing in memory and outputs the data of the first logical layer inside the reduce node. It should be noted that the association preprocessing result here is not necessarily complete data. For example, the association preprocessing result output by a mapping node 21 may be data generated by merging only the target log data of the first and second target client agent apparatuses, without the target log data returned by the third, fourth, and fifth target client agent apparatuses. The association preprocessing results of the mapping nodes 21 converge on the same reduce node 22, and the reduce node 22 feeds the association preprocessing results of all mapping nodes 21 into its internal tree-structured computation flow.
The reduce node 22 contains multiple layers of logical processing nodes, and its computation closely resembles breadth-first traversal of a tree (breadth-first traversal proceeds layer by layer, searching the next layer only after all nodes on the current layer have been searched).
Furthermore, because this embodiment turns distributed physical nodes into logical processing nodes in the memory of a single machine, it can further solve the "log data completeness" problem. The specific analysis is as follows:
If the breadth-first traversal computation of the tree is single-threaded and serial, the condition that triggers the computation flow to continue downward is simply the execution progress of that single thread, and the individual logical processing nodes need no independent ability to detect completeness in order to drive downstream computation. Even if the computation is made multithreaded, the states of all data subsets reside in local memory rather than being scattered across machines, so deriving the "data subset metadata" is also easy. In practice, multithreading is not needed at all: a server may have 4 CPUs and may simultaneously host reduce nodes serving 100 log hosts, so running each reduce node 22 on a single thread still makes full use of the CPUs, wastes no performance, and greatly simplifies the system design. This approach first preserves the capability and performance advantages of association computation, and second achieves "log data completeness" with little effort.
Further, considering that in practical applications the log host, the target client agent apparatus, the mapping node, or the reduce node may fail, this embodiment also provides a method for handling failures.
Failure of the log host or the target client agent apparatus: the target client agent apparatus is stateless, that is, it caches no state information in memory, and both the location index of the log data and the log data itself are stored persistently. Therefore, when the target client agent apparatus fails, it only needs to resume incremental index construction after restarting and continue to provide the log query service.
The mapping node may create a thread that executes the log data collection task and put the thread into a thread pool; when executed, the thread sends the query request to the target client agent apparatus and receives the target log data that the target client agent apparatus returns. On this basis, when the target client agent apparatus fails, for example when the mapping node does not receive the target log data from the target client agent apparatus within a specified time or receives a message from the target client agent apparatus indicating that obtaining the log data failed, the thread can be put back into the thread pool according to a preset delay parameter, until the thread receives the target log data returned by the target client agent apparatus. The delay parameter determines when the thread is put back into the thread pool and is in effect the interval since the thread was last executed. If the thread is put back too early, it may still fail to obtain the target log data when it is executed again; if it is put back too late, the timeliness of obtaining the target log data is reduced. To minimize the loss caused by failure of the association preprocessing subtask, the delay parameter may be made dynamically adjustable; under normal circumstances it may be set to 5 s.
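A rough Python sketch of the delayed retry is given below. It resubmits the collection task to a thread pool after each failure and waits for the delay parameter in between. The function name `collect_with_retry`, the convention that a failed query returns `None`, and the `max_attempts` bound are assumptions added for illustration; the embodiment itself describes retrying until the target log data is received.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def collect_with_retry(query: Callable[[], Optional[bytes]],
                       pool: ThreadPoolExecutor,
                       delay_s: float = 5.0,
                       max_attempts: int = 10) -> bytes:
    """Run the collection task in the pool, resubmitting it after delay_s on failure."""
    for _ in range(max_attempts):
        result = pool.submit(query).result()  # execute the task in the thread pool
        if result is not None:
            return result                     # target log data received
        time.sleep(delay_s)                   # wait before putting the task back
    raise TimeoutError("target log data not received after retries")
```

Choosing `delay_s` reflects the trade-off in the text: a short delay risks querying before the location index is ready, while a long delay reduces timeliness.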
Failure of the mapping node or the reduce node: the mapping node and the reduce node are stateful, that is, their memory stores state information involved in log data processing, and this state information is lost when they restart after a failure. Because the probability that the mapping node or the reduce node fails is low in the embodiments of the present application, redundant backup of the log data processing flow is unnecessary. This embodiment instead provides the following technical solution for handling failures, so as to minimize the cost of failure handling.
This embodiment adopts the idea of distributed task control: the log data processing system
provides a task management node, which is located on an independent device and is responsible for
fault handling for the mapping node and the conclusion node.
Specifically, when the current log data processing task is assigned to the mapping node, the
mapping node registers in advance, with the task management node, the association preprocessing
subtask and the metadata that the subtask involves. On this basis, the task management node can
determine from that metadata whether the mapping node has successfully executed the association
preprocessing subtask and, when it determines that the mapping node has not, send a first retry
instruction to the mapping node according to the metadata, instructing the mapping node to
re-execute the association preprocessing subtask. The metadata that the association preprocessing
subtask involves includes but is not limited to: information about the log hosts or client agent
apparatuses involved in the subtask (such as identification information, location information and
capability information), information about the mapping node that registered the subtask (such as
identification information, location information and capability information), and the execution
time and execution duration of the subtask.
For example, the mapping node can report the execution result to the task management node after
executing the association preprocessing subtask, so that the task management node knows whether
the mapping node executed the subtask successfully. In addition, a duration threshold can be set
on the task management node: if a mapping node has still not returned task execution result
information after the threshold has elapsed, the task management node considers that the mapping
node has failed to execute the association preprocessing subtask.
Specifically, when the current log data processing task is assigned to the conclusion node, the
conclusion node registers in advance, with the task management node, the association processing
subtask and the metadata that the subtask involves. On this basis, the task management node can
determine from that metadata whether the conclusion node has successfully executed the association
processing subtask and, when it determines that the conclusion node has not, send a second retry
instruction to the mapping node and to the conclusion node respectively, according to the
metadata, instructing the mapping node to re-execute the association preprocessing subtask and
the conclusion node to re-execute the association processing subtask. The metadata that the
association processing subtask involves includes but is not limited to: information about the
mapping nodes involved in the subtask (such as identification information, location information
and capability information), information about the conclusion node that registered the subtask
(such as identification information, location information and capability information), and the
execution time and execution duration of the subtask.
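The fan-out of the second retry instruction can be sketched as follows. The function name, the
metadata keys and the instruction format are all hypothetical; the sketch only shows that a failure
of the conclusion node leads to instructions for every mapping node named in the metadata as well
as for the conclusion node itself.

```python
def second_retry_instructions(metadata):
    """Build one retry instruction per node named in the subtask metadata.

    metadata is assumed to carry the identifiers of the mapping nodes that the
    association processing subtask involves and of the conclusion node that
    registered it.
    """
    instructions = [
        {"to": node_id, "rerun": "association_preprocessing"}
        for node_id in metadata["mapping_node_ids"]
    ]
    instructions.append(
        {"to": metadata["conclusion_node_id"], "rerun": "association_processing"}
    )
    return instructions
```

Re-running the mapping nodes as well is consistent with the nodes being stateful: a restarted
conclusion node no longer holds the preprocessing results it had already received.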
In conclusion method provided by the embodiments of the present application properly settles the complete degree problem of daily record data, one
Performance is caused to access effective guarantee, and also without therefore losing real-time or increasing additional cost.
It should be noted that, for simplicity of description, each of the foregoing method embodiments
is expressed as a series of combined actions, but those skilled in the art should understand that
the present application is not limited by the described order of actions, because according to
the present application some steps may be performed in another order or simultaneously. Those
skilled in the art should also understand that the embodiments described in the specification are
all preferred embodiments, and that the actions and modules involved are not necessarily required
by the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for a part that
is not described in detail in one embodiment, reference may be made to the related descriptions
of the other embodiments.
Fig. 3 is a schematic structural diagram of the log data processing system provided by an
embodiment of the present application. As shown in Fig. 3, the system includes a mapping node 31
and a conclusion node 32, and the conclusion node 32 is connected to the mapping node 31.
The mapping node 31 is configured to, when executing the association preprocessing subtask in the
log data processing task, send a query request to the target client agent apparatus corresponding
to the mapping node 31, receive the target log data that the target client agent apparatus
returns in response to the query request, and, upon receiving the target log data returned by all
target client agent apparatuses corresponding to the mapping node 31, perform association
preprocessing on all the received target log data and send the association preprocessing result
to the conclusion node 32.
Here, the target client agent apparatus is the client agent apparatus deployed on the log host
that generates the target log data required by the current log data processing task; the query
request includes a log file identifier and a time period identifier; and the target log data is
the log data generated, within the time period identified by the time period identifier, in the
log file identified by the log file identifier;
The conclusion node 32 is configured to, when executing the association processing subtask in the
current log data processing task, receive the association preprocessing results sent by the
mapping node 31 and, upon receiving the association preprocessing results sent by all mapping
nodes 31 that execute the association preprocessing subtask, perform association processing on
all the received association preprocessing results and output the association processing result.
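The query request described above, and the selection that a client agent apparatus performs in
response, can be sketched as follows. This is a hypothetical illustration: the names QueryRequest
and select_target_log_data are not from the specification, the time period identifier is assumed
to resolve to a half-open interval of timestamps in seconds, and log files are modelled as
in-memory lists of (timestamp, line) pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRequest:
    log_file_id: str   # log file identifier
    period_start: int  # start of the identified time period (inclusive)
    period_end: int    # end of the identified time period (exclusive)


def select_target_log_data(log_files, request):
    """Return the lines of the requested log file that fall inside the period.

    log_files maps a log file identifier to a list of (timestamp, line) pairs.
    """
    entries = log_files.get(request.log_file_id, [])
    return [
        line
        for timestamp, line in entries
        if request.period_start <= timestamp < request.period_end
    ]
```

With a half-open interval, consecutive time periods do not overlap, so a log line is returned for
exactly one period.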
In an optional embodiment, the mapping node 31 is further configured to:
record the identification information of each target client agent apparatus that returns target
log data, and compare the recorded identification information with the identification
information of the preconfigured client agent apparatuses;
if the recorded identification information of the target client agent apparatuses that returned
target log data is identical to the identification information of the preconfigured client agent
apparatuses, determine that the target log data returned by all target client agent apparatuses
corresponding to the mapping node 31 has been received.
Correspondingly, the conclusion node 32 is further configured to:
record the identification information of each mapping node 31 that sends an association
preprocessing result, and compare the recorded identification information with the
identification information of the preconfigured mapping nodes 31;
if the recorded identification information of the mapping nodes 31 that sent association
preprocessing results is identical to the identification information of the preconfigured mapping
nodes 31, determine that the association preprocessing results sent by all mapping nodes 31 that
execute the current log data processing task have been received.
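Both completeness checks above follow the same pattern, which can be sketched as a comparison
between the set of recorded sender identifiers and the set of preconfigured identifiers. The class
name ArrivalTracker and its methods are hypothetical; the same sketch would serve a mapping node
tracking client agent apparatuses or the conclusion node tracking mapping nodes.

```python
class ArrivalTracker:
    """Record which preconfigured senders have delivered their data."""

    def __init__(self, expected_ids):
        self.expected = frozenset(expected_ids)  # preconfigured identifiers
        self.received = {}                       # sender identifier -> payload

    def record(self, sender_id, payload):
        if sender_id not in self.expected:
            raise KeyError(f"unexpected sender: {sender_id}")
        # A repeated delivery from the same sender replaces the earlier one.
        self.received[sender_id] = payload

    def complete(self):
        # All data is present when the recorded identifiers match the
        # preconfigured identifiers exactly.
        return set(self.received) == self.expected
```

Only when complete() returns True would the node go on to its association preprocessing or
association processing step.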
In an optional embodiment, the mapping node 31 is specifically configured to: create a thread that
executes the log data acquisition task and put the thread into a thread pool, so that when the
thread is executed it sends the query request to the target client agent apparatus and receives
the target log data returned by the target client agent apparatus.
On this basis, the mapping node 31 is further configured to: when the target log data returned by
the target client agent apparatus is not received within a specified time, or a log data
acquisition failure message returned by the target client agent apparatus is received, put the
thread back into the thread pool according to a preset delay parameter, until the thread receives
the target log data returned by the target client agent apparatus.
In an optional embodiment, there is at least one mapping node 31 and one conclusion node 32; the
schematic structure of the log data processing system in this case is shown in Fig. 2.
In an optional embodiment, as shown in Fig. 3, the log data processing system further includes a
task management node 33.
The mapping node 31 is further configured to: register, with the task management node 33, the
association preprocessing subtask and the metadata that the association preprocessing subtask
involves;
The task management node 33 is configured to determine, according to the metadata that the
association preprocessing subtask involves, whether the mapping node 31 has successfully executed
the association preprocessing subtask and, when it determines that the mapping node 31 has not,
send a first retry instruction to the mapping node 31 according to that metadata, instructing the
mapping node 31 to re-execute the association preprocessing subtask.
Further, the conclusion node 32 is further configured to: register, with the task management
node 33, the association processing subtask and the metadata that the association processing
subtask involves;
The task management node 33 is further configured to: determine, according to the metadata that
the association processing subtask involves, whether the conclusion node 32 has successfully
executed the association processing subtask and, when it determines that the conclusion node 32
has not, send a second retry instruction to the mapping node 31 and to the conclusion node 32
respectively according to that metadata, instructing the mapping node 31 to re-execute the
association preprocessing subtask and the conclusion node 32 to re-execute the association
processing subtask.
Further, the task management node 33 may also be used for task orchestration and distribution.
Specifically, the task management node 33 may also be configured to:
determine the number of log hosts and the number of mapping nodes that execute the association
preprocessing subtask;
determine, according to the number of log hosts and the number of mapping nodes that execute the
association preprocessing subtask, the target client agent apparatuses corresponding to each
mapping node, and provide each mapping node with the identification information of its
corresponding target client agent apparatuses.
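One possible way to derive the per-node assignment from the two counts is sketched below. The
specification does not state an assignment rule, so the round-robin distribution used here is an
assumption, as are the function name assign_agents and the identifier formats.

```python
def assign_agents(agent_ids, mapping_node_ids):
    """Distribute client agent identifiers over mapping nodes in round-robin order.

    Returns a dict mapping each mapping node identifier to the list of agent
    identifiers it should query. Round-robin is an assumed policy.
    """
    if not mapping_node_ids:
        raise ValueError("at least one mapping node is required")
    assignment = {node_id: [] for node_id in mapping_node_ids}
    for index, agent_id in enumerate(agent_ids):
        node_id = mapping_node_ids[index % len(mapping_node_ids)]
        assignment[node_id].append(agent_id)
    return assignment
```

Each list in the result is what the task management node would hand to the corresponding mapping
node as its preconfigured set of target client agent apparatuses.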
In the log data processing system provided by this embodiment, the mapping node that executes the
current log data processing task cooperates with the client agent apparatus deployed on the log
host that generates the target log data required by the task, so that the log data generated in
a specified time period can be obtained accurately and the system can process log data promptly,
achieving real-time processing of log data. In addition, because the client agent apparatus lets
the mapping node obtain the log data generated in a specified time period, it lays the foundation
for the log data processing system to ensure that all the log data required by the log data
processing task has arrived. Further, inside the log data processing system, the mapping node
performs association preprocessing and sends the association preprocessing result to the
conclusion node only after determining that the log data from all client agent apparatuses
corresponding to the mapping node has arrived; the conclusion node performs association
processing on all association preprocessing results and outputs the final association processing
result only after determining that the association preprocessing results from all mapping nodes
executing the log data processing task have arrived. Because each processing step is guaranteed
to run only when all the data it requires is present, the reliability of the processing result
is improved.
Those skilled in the art will clearly understand that, for convenience and brevity of description,
the specific working processes of the systems, apparatuses and units described above may be found
in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the
disclosed systems, apparatuses and methods may be implemented in other ways. For example, the
apparatus embodiments described above are merely illustrative: the division into units is only a
division by logical function, and other divisions are possible in an actual implementation; for
instance, multiple units or components may be combined or integrated into another system, or
some features may be omitted or not executed. In addition, the mutual coupling, direct coupling
or communication connection shown or discussed may be an indirect coupling or communication
connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in
another form.
The units described as separate components may or may not be physically separate, and the
components shown as units may or may not be physical units; that is, they may be located in one
place or distributed over multiple network units. Some or all of the units may be selected
according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated
into one processing unit, each unit may exist physically on its own, or two or more units may be
integrated into one unit. The integrated unit may be implemented in the form of hardware, or in
the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a
computer-readable storage medium. The software functional unit is stored in a storage medium and
includes several instructions that cause a computer device (which may be a personal computer, a
server, a network device or the like) or a processor to execute some of the steps of the methods
described in the embodiments of the present application. The aforementioned storage medium
includes various media that can store program code, such as a USB flash drive, a removable hard
disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory,
RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the
technical solutions of the present application, not to limit them. Although the present
application has been described in detail with reference to the foregoing embodiments, those of
ordinary skill in the art should understand that the technical solutions described in the
foregoing embodiments may still be modified, or some of their technical features may be replaced
by equivalents, and that such modifications or replacements do not cause the essence of the
corresponding technical solutions to depart from the spirit and scope of the technical solutions
of the embodiments of the present application.