Embodiment
Graph operations
Graph operations are one of the foundations of the present invention. A graph operation is an operation performed on an execution graph that reflects the runtime behavior of a distributed system.
Fig. 1 shows an example of the execution graph defined in the present invention. In the execution graph, vertices represent events in the distributed system, and the edges connecting vertices represent relationships between events. The execution graph defined in the present invention captures two key concepts: causality and aggregation. Accordingly, the present invention proposes two graph operations on the execution graph: slicing and aggregation.
When diagnosing a distributed system, a user may want to: 1) follow causal relationships to find relevant information; and 2) summarize runtime information through suitable aggregation or hierarchical summarization, then drill down further to locate the fault.
In an execution graph, each runtime event of the target distributed system is represented as a vertex. In Fig. 1, events (vertices) are drawn as small rectangles. For example, the first thread 102, running on the first machine, contains event b 104 and event c 106. Event b 104 comes from a printf log, and event c 106 is defined by an asynchronous request. The second thread 110, running on the second machine, contains event a 112 and event d 114, where event d 114 is defined by an asynchronous request. The third thread 120, running on the third machine, contains event e 122, which consumes a request.
Runtime events in a distributed system are interrelated. In the execution graph, directed edges indicate these relationships, and different edge types define different kinds of relationships. As described above, events are represented by vertices in the execution graph. For example, a use edge connects a source event to a destination event and points from the source event to the destination event. The source event defines or transmits an object whose destination is the destination event, which consumes it. Examples of such objects are network messages and cross-thread requests. In Fig. 1, the directed arrows <c, e> 130 and <d, e> 132 are examples of use edges.
A sync edge represents synchronization between two events in two different threads. It ensures exclusive access to a shared object or ordered operation across threads.
A fall-through edge connects two consecutive events in the same thread and points from the earlier event to the later one. The edge <b, c> 140 in Fig. 1 is a fall-through edge.
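The following Python sketch represents the execution graph as a data structure, which makes the three edge types concrete. It assumes an in-memory edge list. The names `EdgeKind`, `Event`, and `ExecutionGraph` are hypothetical and are not part of the described system. The sample instance encodes only the Fig. 1 events and edges that the text names. Machine and thread names are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeKind(Enum):
    USE = "use"                    # source defines/sends an object the destination consumes
    SYNC = "sync"                  # synchronization between events in two threads
    FALL_THROUGH = "fall-through"  # earlier -> later event in the same thread


@dataclass(frozen=True)
class Event:
    name: str
    machine: str
    thread: str


@dataclass
class ExecutionGraph:
    vertices: dict = field(default_factory=dict)  # event name -> Event
    edges: list = field(default_factory=list)     # (src, dst, EdgeKind)

    def add_event(self, name, machine, thread):
        self.vertices[name] = Event(name, machine, thread)

    def add_edge(self, src, dst, kind):
        # Both endpoints must already be vertices of the graph.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError(f"unknown endpoint in edge {src} -> {dst}")
        self.edges.append((src, dst, kind))

    def out_edges(self, name, kind=None):
        """Edges leaving `name`, optionally restricted to one edge kind."""
        return [e for e in self.edges
                if e[0] == name and (kind is None or e[2] is kind)]


# Events and edges named in Fig. 1.
g = ExecutionGraph()
g.add_event("b", "machine1", "thread102")
g.add_event("c", "machine1", "thread102")
g.add_event("a", "machine2", "thread110")
g.add_event("d", "machine2", "thread110")
g.add_event("e", "machine3", "thread120")
g.add_edge("b", "c", EdgeKind.FALL_THROUGH)  # fall-through edge <b, c> 140
g.add_edge("c", "e", EdgeKind.USE)           # use edge <c, e> 130
g.add_edge("d", "e", EdgeKind.USE)           # use edge <d, e> 132
```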
Slicing
The slicing operation defined in the present invention does more than mark runtime information with special tags. Slicing works on the execution graph. First, a root vertex is selected, corresponding to a root event. The user selects the root event, which may be directly related to the symptom of a fault or may be a point of particular interest. Starting from the selected root event, slicing searches for the events that are causally dependent on it, or on which it depends. The search is directional, so slicing includes forward slicing (forward slice) and backward slicing (backward slice). Forward slicing finds all events that directly or indirectly depend on the root event. Backward slicing finds all events on which the root event directly or indirectly depends. Here, dependence means a causal relationship between events.
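Read as graph reachability, forward and backward slicing can be sketched as a breadth-first search that follows edges either forward or in reverse. This Python sketch treats every edge as a dependency, which is the naive approximation discussed next. The function name `slice_naive` is hypothetical. The fall-through edge <a, d> is an assumption, because the text does not label an edge between events a and d.

```python
from collections import deque


def slice_naive(edges, root, forward=True):
    """Vertices reachable from `root`, excluding root itself.

    forward=True follows edges src -> dst (events that depend on root);
    forward=False follows them dst -> src (events root depends on).
    Every edge is treated as a dependency: the naive approximation.
    """
    adj = {}
    for src, dst in edges:
        a, b = (src, dst) if forward else (dst, src)
        adj.setdefault(a, []).append(b)
    seen, queue = {root}, deque([root])
    while queue:
        for w in adj.get(queue.popleft(), []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen - {root}


# Edges of Fig. 1 plus an assumed fall-through edge <a, d> in thread 110.
edges = [("b", "c"), ("c", "e"), ("d", "e"), ("a", "d")]
print(sorted(slice_naive(edges, "a")))                 # ['d', 'e']
print(sorted(slice_naive(edges, "e", forward=False)))  # ['a', 'b', 'c', 'd']
```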
In practice, finding exact and complete causal dependencies is very costly and hard to achieve, so a reasonable approximation is sufficient. One idea is to treat all use edges and fall-through edges together as a reasonable approximation of causal dependencies. However, fall-through edges do not necessarily imply causal dependence. For example, in a typical message-processing system, a thread may continuously accept newly arrived messages and invoke the corresponding message handlers. Two consecutive handler invocations are connected by a fall-through edge, yet there is no meaningful causal dependence between the two handlers. In such cases, these fall-through edges greatly increase the amount of useless work and make causal slicing inefficient. The present invention therefore introduces the notion of causal scope, which is based on the use edge. For a given use edge, all events within its causal scope depend on the source event (source vertex) of that use edge. More precisely, the scope starts at the destination event (destination vertex) of the use edge. It includes all events associated with that destination event from that point onward, whether they are connected through use edges or fall-through edges. All of these events are regarded as lying within the causal scope and as causally dependent on the source event of the use edge. For example, in Fig. 1, the large rectangles 150, 152, and 154 represent examples of causal scopes in the present invention.
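The following Python sketch shows one way to compute a causal scope. It assumes that each event carries a hypothetical region label (`scope_of`) that marks the handler invocation it belongs to, similar to the large rectangles in Fig. 1. Starting from the destination event, the sketch follows fall-through edges only while that label stays the same. Events reached through further use edges are left to the slicing step. This boundary rule is an assumption for illustration, not the exact rule of the invention.

```python
def causal_scope(use_edge, fall_through, scope_of):
    """Return the events, in order, in the causal scope of `use_edge`.

    use_edge: (source, destination) pair.
    fall_through: (earlier, later) pairs; each event has at most one successor.
    scope_of: hypothetical region label for every fall-through endpoint,
    naming the handler invocation the event belongs to.
    """
    _source, dest = use_edge
    successor = dict(fall_through)
    scope = [dest]
    v = dest
    # Follow fall-through edges only while staying in the destination's region.
    while v in successor and scope_of[successor[v]] == scope_of[dest]:
        v = successor[v]
        scope.append(v)
    return scope


# A message loop: two handler invocations joined by a fall-through edge.
fall_through = [("h1_start", "h1_end"), ("h1_end", "h2_start"), ("h2_start", "h2_end")]
scope_of = {"h1_start": "h1", "h1_end": "h1", "h2_start": "h2", "h2_end": "h2"}
print(causal_scope(("sender1", "h1_start"), fall_through, scope_of))
# ['h1_start', 'h1_end']  (the unrelated second handler is excluded)
```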
In the present invention, both forward slicing and backward slicing are implemented on the basis of causal scopes. For example, the shaded portion of Fig. 1 represents forward slicing performed with event a 112 as the root event.
The interface prototype of the slicing operation is as follows:
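// Slices the execution graph starting from srcVertex (the root vertex);
// type selects forward or backward slicing. Returns the sliced subgraph.
Graph<TV,TE> Slicing(
this Vertex<TV,TE> srcVertex,
Slice.Type type);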
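The following Python function is a rough analogue of this prototype, written under stated assumptions. `SliceType` stands in for `Slice.Type`. A Python vertex in this sketch does not carry its graph, so the edge lists and the hypothetical `scope_of` region labels are passed explicitly. A fall-through edge counts as a dependency only inside one region, which applies the causal-scope idea. The result is a (vertices, edges) pair rather than a `Graph<TV,TE>`.

```python
from collections import deque
from enum import Enum


class SliceType(Enum):
    FORWARD = 1   # events that depend on the source vertex
    BACKWARD = 2  # events the source vertex depends on


def slicing(src_vertex, slice_type, use_edges, fall_through, scope_of):
    """Scope-aware slice; returns (vertex set, edge list) of the subgraph."""
    # Use edges always count; fall-through edges only inside one region.
    deps = list(use_edges) + [(s, d) for s, d in fall_through
                              if scope_of[s] == scope_of[d]]
    forward = slice_type is SliceType.FORWARD
    adj = {}
    for s, d in deps:
        a, b = (s, d) if forward else (d, s)
        adj.setdefault(a, []).append(b)
    vertices, kept = {src_vertex}, []
    queue = deque([src_vertex])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            kept.append((v, w) if forward else (w, v))  # keep original direction
            if w not in vertices:
                vertices.add(w)
                queue.append(w)
    return vertices, kept


use_edges = [("s1", "h1_start"), ("s2", "h2_start"), ("h1_end", "r1")]
fall_through = [("h1_start", "h1_end"), ("h1_end", "h2_start"), ("h2_start", "h2_end")]
scope_of = {"h1_start": "h1", "h1_end": "h1", "h2_start": "h2", "h2_end": "h2"}

fwd, fwd_edges = slicing("s1", SliceType.FORWARD, use_edges, fall_through, scope_of)
print(sorted(fwd))  # ['h1_end', 'h1_start', 'r1', 's1']
```

In this example, naive reachability from `s1` would also pull in `h2_start` and `h2_end` through the fall-through edge between the two handlers. The region check excludes them.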
Aggregation
Aggregation is an effective way to manage large amounts of information, especially when the aggregation is hierarchical. Distributed systems naturally have a hierarchical structure: a program usually consists of modules, a module consists of classes, and each class contains a set of functions. The execution of a distributed system is similarly hierarchical. It can be aggregated first at the thread level, then at the process level, and then at the machine level. A distributed system usually has multiple logical levels, which can be chosen according to the application. For example, system behavior can be analyzed at the RPC level, or, in a lower-level operating system, through the socket interface.
Each event is associated with an attribute that identifies the aggregation unit to which the event belongs. Aggregation units can be defined at multiple layers. One layer can be a static structure, such as module, class, and function. Another layer can be a runtime structure, such as machine, process, and thread. Using these attributes and aggregation units, aggregation can be performed. Aggregation effectively reduces the amount of data and information to be processed and improves the efficiency of diagnosing events in a distributed system.
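The multi-layer aggregation units can be pictured as two label hierarchies attached to each event: a runtime one and a static (code) one. The following Python sketch is illustrative only. The field names, the `label_at` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass

RUNTIME_LEVELS = ("machine", "process", "thread")     # runtime structure
STATIC_LEVELS = ("module", "class_name", "function")  # static (code) structure


@dataclass(frozen=True)
class Event:
    machine: str
    process: str
    thread: str
    module: str
    class_name: str
    function: str


def label_at(event, level):
    """Aggregation unit of `event` at `level`, as a path from the hierarchy root.

    Using the full path keeps units distinct when, e.g., two machines run
    processes with the same name.
    """
    for levels in (RUNTIME_LEVELS, STATIC_LEVELS):
        if level in levels:
            depth = levels.index(level) + 1
            return tuple(getattr(event, name) for name in levels[:depth])
    raise ValueError(f"unknown aggregation level: {level}")


ev = Event("Machine0", "Replication", "t3", "replica", "Replicator", "write")
print(label_at(ev, "process"))     # ('Machine0', 'Replication')
print(label_at(ev, "class_name"))  # ('replica', 'Replicator')
```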
The key idea of aggregation in the present invention is to aggregate the execution graph at a suitable level of the hierarchical system structure in order to summarize system behavior. Several events in the execution graph that share the same aggregation unit, or consecutive segments of such events, are summarized into a single event (vertex). That vertex belongs to a higher-level execution graph, also called the result graph of the current-level execution graph. The present invention automatically attaches code-location and runtime-location labels to all events, and these labels serve as the basis for aggregation. Aggregation is an open, customizable operation: users can define the aggregation parameters and the degree of aggregation themselves. The user can set the following as needed:
1) the conditions of aggregation, that is, which events (vertices) and edges are aggregated; and
2) the number of aggregation passes, that is, how many times aggregation is performed and therefore how far the graph is aggregated.
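The segment-merging step can be sketched in Python for a single thread. The sketch assumes that events arrive in execution order and that the user supplies the label function. `aggregate_segments` and the sample events are hypothetical. `itertools.groupby` groups only adjacent items, so two runs of the same unit separated by another unit remain separate aggregates. This preserves the order of behavior along the thread.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_segments(thread_events, label):
    """Merge maximal runs of consecutive events that share an aggregation unit.

    thread_events: one thread's events in execution order.
    label: function mapping an event to its aggregation unit.
    Returns [(unit, count), ...]; count is how many events the run contains.
    """
    return [(unit, sum(1 for _ in run))
            for unit, run in groupby(thread_events, key=label)]


# Hypothetical events of one thread: (function name, sequence number).
events = [("recv", 1), ("recv", 2), ("write", 3), ("write", 4), ("write", 5), ("recv", 6)]
print(aggregate_segments(events, itemgetter(0)))
# [('recv', 2), ('write', 3), ('recv', 1)]
```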
Fig. 2 shows an example of aggregation used in debugging replication. The horizontal axis of Fig. 2 represents time. The lower half of Fig. 2 shows aggregation at the process level. Each rectangle is an event aggregated at the process level. The name in the rectangle is the name of that aggregated event, and the number in the rectangle is the number of next-lower-level events it contains. For example, the network process Network 202 contains event Message::DoExecution 204, which contains 12 (thread-level) events. The replication process Replication 210 contains event ReplicateWrite 212, which contains 149 (thread-level) events, and event WriteRequestFailed 214, which contains 24 (thread-level) events. The I/O process I/O 220 contains event SerializedOIWrite 222, which contains 17 (thread-level) events.

The upper half of Fig. 2 shows events aggregated at the machine level. Again, the number in each rectangle is the number of next-lower-level events it contains. For example, machine Machine0 230 contains event Primary 232, which contains 440 (process-level) events. Machine Machine1 240 contains event Secondary1 242, which contains 144 (process-level) events. Machine Machine2 250 contains event Secondary2 252, which contains 202 (process-level) events. When a replication error occurs, aggregation shows that the error occurred on machine Machine2, corresponding to event Secondary2 252. Event Secondary2 252 can then be decomposed further to search for the source of the error at the process level.
The interface prototype of the aggregation operation is as follows:
The label function in this prototype attaches labels to events, either automatically or as the user requires. These labels serve as the basis for aggregation.
Realize the method for graphic operation
Based on the graph operations described above, the present invention proposes a method for implementing graph operations.
Fig. 3 shows a flowchart of the method for implementing graph operations according to an embodiment of the present invention. The method 300 includes:
S302. Define the execution graph of the distributed system. In the execution graph, vertices represent events in the distributed system, and the edges connecting vertices represent relationships between events. As introduced above, the edges in the execution graph include use edges and fall-through edges. A use edge points from a source vertex to a destination vertex. The source vertex defines or transmits an object whose destination is the destination vertex, which consumes it. A fall-through edge connects two consecutive events in the same thread and points from the earlier event to the later one.
S304. Slicing: starting from a selected vertex in the execution graph, find all vertices that depend on this vertex and all vertices on which this vertex depends. In one embodiment, slicing proceeds as described above. First, a vertex in the execution graph is selected as the root vertex, corresponding to the root event. Then the slicing direction is chosen: forward or backward. Forward slicing finds all vertices that depend on the root vertex. Backward slicing finds all vertices on which the root vertex depends. In one embodiment, slicing is performed according to causal scopes. As defined above, each use edge corresponds to a causal scope, and all vertices in that causal scope depend on the source vertex of the use edge.
S306. Aggregation: aggregate several mutually dependent vertices in the execution graph, together with the edges connecting them, into aggregates. This produces, from the execution graph, a result graph one level higher. Vertices in the result graph represent aggregates, and edges in the result graph represent relationships between aggregates. Aggregation can be repeated, and each pass produces a result graph at a higher level. As described in the aggregation section above, each aggregate carries a mark indicating how many lower-level vertices it contains.
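Repeated aggregation can be sketched by naming each vertex with its full runtime path and dropping one path component per pass. The same function then climbs from events to threads, processes, and machines. This Python sketch rests on three assumptions:
1) it uses this hypothetical naming scheme;
2) it merges every vertex that has the same parent unit, rather than only consecutive segments;
3) it records, for each aggregate, the count of lower-level vertices it contains.

```python
from collections import Counter


def parent(vertex):
    """Drop the last path component: event -> thread -> process -> machine."""
    return vertex[:-1]


def aggregate_once(vertices, edges):
    """One aggregation pass producing the next-higher result graph.

    Returns (vertices, edges, counts); counts[u] is the number of
    lower-level vertices merged into aggregate u.
    """
    counts = Counter(parent(v) for v in vertices)
    up_edges = {(parent(s), parent(d)) for s, d in edges if parent(s) != parent(d)}
    return set(counts), up_edges, dict(counts)


# Hypothetical events named by path (machine, process, thread, event id).
e1, e2 = ("M0", "Net", "t1", 1), ("M0", "Net", "t1", 2)
e3, e4 = ("M0", "Repl", "t2", 3), ("M1", "Repl", "t5", 4)
vertices, edges = {e1, e2, e3, e4}, {(e1, e2), (e2, e3), (e3, e4)}

result = {}
for level in ("thread", "process", "machine"):
    vertices, edges, counts = aggregate_once(vertices, edges)
    result[level] = counts
print(sorted(result["machine"].items()))  # [(('M0',), 2), (('M1',), 1)]
```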
Device for implementing graph operations
The present invention also proposes a device for implementing graph operations. It will be appreciated that this device is implemented in software. Fig. 4 shows a structural diagram of the device for implementing graph operations according to an embodiment of the present invention. The device implements the method shown in Fig. 3. The device 400 includes an execution graph definition device 402, a slicing device 404, and an aggregation device 406.
The execution graph definition device 402 defines the execution graph of the distributed system. In the execution graph, vertices represent events in the distributed system, and the edges connecting vertices represent relationships between events. As introduced above, the edges in the execution graph include use edges and fall-through edges. A use edge points from a source vertex to a destination vertex. The source vertex defines or transmits an object whose destination is the destination vertex, which consumes it. A fall-through edge connects two consecutive events in the same thread and points from the earlier event to the later one.
The slicing device 404 performs slicing. Starting from a selected vertex in the execution graph, it finds all vertices that depend on this vertex and all vertices on which this vertex depends. The slicing device 404 is connected to the execution graph definition device 402. In one embodiment, slicing proceeds as described above. First, a vertex in the execution graph is selected as the root vertex, corresponding to the root event. Then the slicing direction is chosen: forward or backward. Forward slicing finds all vertices that depend on the root vertex. Backward slicing finds all vertices on which the root vertex depends. In one embodiment, slicing is performed according to causal scopes. As defined above, each use edge corresponds to a causal scope, and all vertices in that causal scope depend on the source vertex of the use edge.
The aggregation device 406 is connected to the slicing device 404 and the execution graph definition device 402, and it performs aggregation. Based on the result of slicing, it aggregates several mutually dependent vertices in the execution graph, together with the edges connecting them, into aggregates. This produces, from the execution graph, a result graph one level higher. Vertices in the result graph represent aggregates, and edges in the result graph represent relationships between aggregates. Aggregation can be repeated, and each pass produces a result graph at a higher level. As described in the aggregation section above, each aggregate carries a mark indicating how many lower-level vertices it contains.
Diagnosis of distributed systems
The graph operations above, combined with database operations, make it possible to diagnose a distributed system. The database operations are mainly used for locating. For example, the database "Where" operation can locate the root event for slicing, or a fault point or point of interest.
Sample code for a distributed-system diagnostic method that integrates database operations and graph operations is as follows:
Here, the Where operation denotes the locating operation among the database operations, and the Slicing operation denotes the slicing graph operation. The Where operation determines the root vertex (root event) for the Slicing operation.
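The Where-then-Slicing pattern can be sketched in Python over an in-memory table of event attributes. `where`, `forward_slice`, and the attribute names are hypothetical stand-ins for the database and graph operations described here. They are not the actual interfaces of those operations.

```python
from collections import deque


def where(events, predicate):
    """Database-style selection: ids of events whose attributes match."""
    return [eid for eid, attrs in events.items() if predicate(attrs)]


def forward_slice(edges, root):
    """Root plus all events reachable from it along (src, dst) edges."""
    adj = {}
    for s, d in edges:
        adj.setdefault(s, []).append(d)
    seen, queue = {root}, deque([root])
    while queue:
        for w in adj.get(queue.popleft(), []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


events = {
    "r7": {"kind": "request", "id": 7},
    "h7": {"kind": "handler"},
    "w7": {"kind": "write"},
    "r8": {"kind": "request", "id": 8},
}
edges = [("r7", "h7"), ("h7", "w7"), ("r8", "w7")]
(root,) = where(events, lambda a: a.get("id") == 7)  # Where picks the root event
print(sorted(forward_slice(edges, root)))            # ['h7', 'r7', 'w7']
```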
The following code performs error analysis in a distributed system, integrating database operations, slicing, and aggregation:
The following code performs critical path analysis (CPA) in a distributed system, integrating database operations, slicing, and aggregation:
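The CPA code referred to here is not reproduced in this text. As a hedged illustration only, the following Python sketch shows the graph part of a critical path analysis. It finds the path through an acyclic execution graph whose summed event duration is largest. The per-event `durations` attribute and the function name are assumptions. The sketch omits the database and aggregation steps, and it requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def critical_path(durations, edges):
    """Longest path, by summed event duration, through an acyclic execution graph.

    durations: event -> duration (hypothetical per-event attribute).
    edges: (src, dst) dependency pairs between events in `durations`.
    Returns (path as a list of events, total duration).
    """
    preds = {v: set() for v in durations}
    for s, d in edges:
        preds[d].add(s)
    best, back = {}, {}
    # static_order() yields every event after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        p = max(preds[v], key=lambda u: best[u], default=None)
        best[v] = durations[v] + (best[p] if p is not None else 0)
        back[v] = p
    end = max(best, key=best.get)
    path = [end]
    while back[path[-1]] is not None:
        path.append(back[path[-1]])
    return path[::-1], best[end]


durations = {"a": 2, "b": 5, "c": 1, "d": 3}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(critical_path(durations, edges))  # (['a', 'b', 'd'], 10)
```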
Diagnostic method for a distributed system
The present invention also proposes a diagnostic method for a distributed system that integrates graph operations and database operations. As shown in Fig. 5, the method 500 includes:
S502. Define the execution graph of the distributed system. In the execution graph, vertices represent events in the distributed system, and the edges connecting vertices represent relationships between events. As introduced above, the edges in the execution graph include use edges and fall-through edges. A use edge points from a source vertex to a destination vertex. The source vertex defines or transmits an object whose destination is the destination vertex, which consumes it. A fall-through edge connects two consecutive events in the same thread and points from the earlier event to the later one.
S504. Slicing: first, use a database operation to locate an event in the distributed system, and use that event to determine a selected vertex in the execution graph. Starting from that vertex, find all vertices that depend on it and all vertices on which it depends. In one embodiment, slicing first uses a database operation, for example the Where operation, to select an event in the distributed system as the root event. The corresponding vertex in the execution graph is then taken as the root vertex. Next, the slicing direction is chosen: forward or backward. Forward slicing finds all vertices that depend on the root vertex. Backward slicing finds all vertices on which the root vertex depends. In one embodiment, slicing is performed according to causal scopes. As defined above, each use edge corresponds to a causal scope, and all vertices in that causal scope depend on the source vertex of the use edge.
S506. Aggregation: based on the result of slicing, aggregate several mutually dependent vertices in the execution graph, together with the edges connecting them, into aggregates. This produces, from the execution graph, a result graph one level higher. Vertices in the result graph represent aggregates, and edges in the result graph represent relationships between aggregates. Aggregation can be repeated, and each pass produces a result graph at a higher level. As described in the aggregation section above, each aggregate carries a mark indicating how many lower-level vertices it contains.
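Steps S504 and S506 can be chained in a short Python sketch. A Where-style predicate selects root events. A backward slice collects what each root depends on. Finally, a single aggregation pass counts the sliced events per unit. All names and attributes are hypothetical. Plain reachability stands in for the causal-scope slicing described above.

```python
from collections import Counter, deque


def backward_slice(edges, root):
    """Root plus all events it depends on, following edges in reverse."""
    preds = {}
    for s, d in edges:
        preds.setdefault(d, []).append(s)
    seen, queue = {root}, deque([root])
    while queue:
        for u in preds.get(queue.popleft(), []):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def diagnose(events, edges, is_root, unit):
    """Where -> backward slice (S504) -> one aggregation pass (S506), per root."""
    reports = {}
    for root in (eid for eid, attrs in events.items() if is_root(attrs)):
        sliced = backward_slice(edges, root)
        reports[root] = Counter(unit(events[v]) for v in sliced)
    return reports


events = {
    "w0": {"machine": "Machine0", "msg": "replicate"},
    "w1": {"machine": "Machine1", "msg": "ack"},
    "w2": {"machine": "Machine2", "msg": "write"},
    "f2": {"machine": "Machine2", "msg": "write failed"},
}
edges = [("w0", "w1"), ("w0", "w2"), ("w2", "f2")]
report = diagnose(events, edges,
                  is_root=lambda a: a["msg"] == "write failed",
                  unit=lambda a: a["machine"])
print(report["f2"])  # Counter({'Machine2': 2, 'Machine0': 1})
```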
Diagnostic device for a distributed system
The present invention also proposes a diagnostic device for a distributed system. It will be appreciated that this diagnostic device is implemented in software. Fig. 6 shows a structural diagram of the diagnostic device for a distributed system according to an embodiment of the present invention. The device implements the method shown in Fig. 5. The device 600 includes an execution graph definition device 602, an operation integration device 608, a slicing device 604, and an aggregation device 606.
The execution graph definition device 602 defines the execution graph of the distributed system. In the execution graph, vertices represent events in the distributed system, and the edges connecting vertices represent relationships between events. As introduced above, the edges in the execution graph include use edges and fall-through edges. A use edge points from a source vertex to a destination vertex. The source vertex defines or transmits an object whose destination is the destination vertex, which consumes it. A fall-through edge connects two consecutive events in the same thread and points from the earlier event to the later one.
The operation integration device 608 integrates graph operations and database operations.
The slicing device 604 performs slicing and is connected to the execution graph definition device 602 and the operation integration device 608. Slicing first uses a database operation to locate an event in the distributed system and uses that event to determine a selected vertex in the execution graph. Starting from that vertex, it finds all vertices that depend on it and all vertices on which it depends. In one embodiment, slicing first uses a database operation, for example the Where operation, to select an event in the distributed system as the root event. The corresponding vertex in the execution graph is then taken as the root vertex. Next, the slicing direction is chosen: forward or backward. Forward slicing finds all vertices that depend on the root vertex. Backward slicing finds all vertices on which the root vertex depends. In one embodiment, slicing is performed according to causal scopes. As defined above, each use edge corresponds to a causal scope, and all vertices in that causal scope depend on the source vertex of the use edge.
The aggregation device 606 performs aggregation and is connected to the execution graph definition device 602, the operation integration device 608, and the slicing device 604. Based on the result of slicing, the aggregation device 606 aggregates several mutually dependent vertices in the execution graph, together with the edges connecting them, into aggregates. This produces, from the execution graph, a result graph one level higher. Vertices in the result graph represent aggregates, and edges in the result graph represent relationships between aggregates. Aggregation can be repeated, and each pass produces a result graph at a higher level. As described in the aggregation section above, each aggregate carries a mark indicating how many lower-level vertices it contains.
The graph operations proposed by the present invention, and the distributed-system diagnostic technique built on them, can find causal relationships among runtime events in a distributed system. They can then slice and aggregate events on the basis of these relationships. This reduces the amount of information to be processed and makes the relationships between pieces of information clearer. The present invention can effectively improve both the accuracy and the efficiency of diagnosing distributed systems.
The above embodiments are provided to enable persons skilled in the art to make or use the present invention. Persons skilled in the art may make various modifications or changes to the above embodiments without departing from the inventive idea of the present invention. Therefore, the scope of protection of the present invention is not limited by the above embodiments and should be the broadest scope consistent with the inventive features recited in the claims.