CN101510213B

CN101510213B - Ontology-based pipelined matching method for large-scale publish/subscribe

Info

Publication number: CN101510213B
Application number: CN2009100971391A
Authority: CN
Inventors: 胡昔祥
Original assignee: Hangzhou Electronic Science and Technology University
Current assignee: Hangzhou Dianzi University; Hangzhou Electronic Science and Technology University
Priority date: 2009-03-23
Filing date: 2009-03-23
Publication date: 2010-07-21
Anticipated expiration: 2029-03-23
Also published as: CN101510213A

Abstract

The invention relates to an ontology-based pipelined matching method for large-scale publish/subscribe systems, aiming to solve the problem that existing matching methods fail to meet the performance requirements of large-scale publish/subscribe middleware systems. The method comprises the following steps: first, establishing an RDF event graph model and an RDF subscription graph pattern; taking each arc in the RDF event graph and the RDF subscription graph pattern as a basic semantic matching unit and building a subscription statement pattern index; and dividing the matching of these basic semantic units into six pipeline activities that together form a matching pipeline. The six activities are: reading in typed statements; type constraint matching; predicate constraint matching; node mapping; status checking; and outputting matching results. The method improves the matching efficiency of ontology-based large-scale publish/subscribe middleware systems, and its performance is not significantly affected by the number of subscriptions in the system. It also eliminates unnecessary redundant matching between different subscription graph patterns.

Description

Ontology-based pipelined matching method for large-scale publish/subscribe

Technical field

The invention belongs to the field of computer technology and relates to an ontology-based pipelined matching method for large-scale publish/subscribe systems. The method introduces ontology and parallel computing techniques into publish/subscribe middleware systems in order to improve the matching accuracy and time efficiency of large-scale publish/subscribe middleware.

Background technology

Publish/subscribe middleware is well suited to large-scale information dissemination across loosely coupled, distributed, heterogeneous platforms such as the Internet, mobile computing and grid computing, and is widely used. Traditional publish/subscribe middleware systems are topic-based, content-based or XML-based; most of them depend on specific event types and simple matching mechanisms such as keyword matching, predicate comparison of attribute values, and XPath tree pattern matching. An ontology-based publish/subscribe middleware system, by contrast, can combine the ontology models of events and subscriptions and provide semantic matching between them, which greatly improves matching accuracy and also lets users express their subscription interests more easily. In ontology-based publish/subscribe middleware, RDF is the basis for expressing semantics. To express the semantic information of events and subscriptions, an event is usually represented as an RDF graph, called an RDF event graph, and a user's subscription conditions are expressed as a graph pattern defined over RDF graphs, called an RDF subscription graph pattern. The matching method of an ontology-based publish/subscribe middleware system is therefore essentially an RDF graph pattern matching method. In large-scale publish/subscribe middleware in particular, there are large numbers of RDF subscription graph patterns with predicate constraints. Matching RDF subscription graph patterns efficiently and quickly has become the main challenge facing ontology-based large-scale publish/subscribe middleware.

Several RDF graph pattern matching methods already exist. For example, Wang Jinling et al. proposed a method based on an extended meta-statement array and matching state trees. This method maintains an independent matching state tree for each RDF subscription graph pattern; the matching process must repeatedly traverse all state nodes of each matching state tree and compute and generate new state nodes. The final state of each matching state tree determines which RDF subscription graph patterns have been matched successfully. The drawback of this method is that matching time grows sharply as the number of subscriptions in the system increases. In addition, Milenko et al. proposed a matching method based on a global RDF graph pattern, which merges all RDF subscription graph patterns in the system into a single global RDF subscription graph pattern. This method is suitable only when there are few subscriptions or when the number of variables in each subscription is limited: when each RDF subscription graph pattern contains many variables, the diversity of variable identifiers and constraints makes merging the patterns difficult and time-consuming. In general, existing matching methods fall far short of the performance requirements of large-scale publish/subscribe middleware. A matching method is therefore needed that is more efficient and faster, is not significantly affected by the number of subscriptions in the system, and is suited to large-scale publish/subscribe middleware.

Summary of the invention

The purpose of the present invention is to address the deficiencies of the prior art by providing an ontology-based pipelined matching method for large-scale publish/subscribe systems that is efficient and fast and is not significantly affected by the number of subscriptions in the system.

The method comprises the following steps:

Step (1) Establish the ontology model of events/subscriptions: using the Resource Description Framework (RDF) ontology description language, represent each event as an RDF event graph and each subscription as an RDF subscription graph pattern. In both, every node has a unique constant or variable identifier and a type identifier for the ontology concept class it belongs to, and every arc has a constant identifier denoting the property of the concept class it belongs to.

Step (2) Preprocess events/subscriptions: decompose the RDF event graph and the RDF subscription graph pattern, taking each arc in them as a basic semantic matching unit. Specifically:

1. The RDF event graph is decomposed into a set of typed statements. A typed statement is a five-tuple consisting of the constant identifiers of the two end nodes of the corresponding arc, the type identifiers of those two nodes, and the property identifier of the arc. The typed statements are cached in a buffer queue;

2. The RDF subscription graph pattern is decomposed into a set of subscription statement patterns. A subscription statement pattern is a six-tuple consisting of the variable identifiers of the two end nodes of the corresponding arc, the predicate constraint expression of the constrained variable, the type identifiers of the two nodes, and the property identifier of the arc. That is, a subscription statement pattern contains both the type constraint between the nodes and the predicate constraint expression of the constrained variable;

Step (3) Build the subscription statement pattern index: organize all subscription statement patterns into a three-layer index storage structure, in which the first layer indexes the arc identifier of the subscription statement patterns, the second layer indexes the pair of end-node types, and the third layer indexes the predicate constraint expressions;

Step (4) Build the matching pipeline: the matching of an RDF event graph against the RDF subscription graph patterns is decomposed into the following six pipeline activities. Each activity is handled by a dedicated thread, and the threads cooperate to form the matching pipeline. The activities are:

1. Read in typed statements: read a typed statement from the buffer queue and pass it to the next stage;

2. Type constraint matching: using the input typed statement, query the three-layer index storage structure of subscription statement patterns, filter out all subscription statement patterns whose type constraint matches, and pass them to the next stage;

3. Predicate constraint matching: substitute the node constant of the typed statement for the variable of the subscription statement pattern and evaluate the predicate constraint expression; select the subscription statement patterns whose predicate constraint expression evaluates to true and pass them to the next stage;

4. Node mapping: for each input subscription statement pattern, generate the mapping from its variable nodes to the constant nodes of the typed statement it matched. Specifically: if the start node of the subscription statement pattern is the main node of the RDF subscription graph pattern, the start node mapping and the end node mapping can be generated directly; otherwise the end node mapping record is generated only if the start node mapping record already exists. The result is passed to the next stage;

5. Status checking: record and maintain the node mapping status of each RDF subscription graph pattern. When a node mapping is shared by all subscription statement patterns associated with that node in the RDF subscription graph pattern, the node mapping is said to be saturated. The set of nodes in the RDF subscription graph pattern that have obtained a saturated mapping is recorded;

6. Output matching results: when every node in an RDF subscription graph pattern has a conflict-free saturated node mapping, the RDF subscription graph pattern is judged to be matched successfully, and successfully matched RDF subscription graph patterns are output incrementally.
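The thread-per-activity arrangement of Step (4) can be sketched as a chain of worker threads connected by queues. This is only an illustrative skeleton, not the patented implementation: the names `stage`, `run_pipeline` and `STOP`, the use of Python's `queue.Queue` as the hand-off between activities, and the toy stage functions are all assumptions made for the example.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel marking the end of the input stream


def stage(fn, inbox, outbox):
    """Start one pipeline thread: apply fn to each item and forward its outputs."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)      # propagate shutdown to the next activity
                return
            for out in fn(item):      # fn may emit zero, one or many results
                outbox.put(out)
    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(items, fns):
    """Connect one thread per activity with FIFO queues and collect the outputs."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [stage(fn, queues[i], queues[i + 1]) for i, fn in enumerate(fns)]
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Toy stand-ins for three of the six activities (read, filter, output).
toy_stages = [
    lambda x: [x],                          # "read in": pass the item on
    lambda x: [x] if x % 2 == 0 else [],    # "constraint matching": drop non-matches
    lambda x: [x * 10],                     # "output": transform the survivor
]
toy_results = run_pipeline(range(5), toy_stages)
```

Because each stage is a single thread reading from a FIFO queue, items leave the pipeline in the order they entered, and a stage that filters everything out simply emits nothing for that item.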

By using a matching pipeline, the method improves the matching efficiency of ontology-based large-scale publish/subscribe middleware systems; its performance is not significantly affected by the number of subscriptions in the system, and unnecessary redundant matching between different subscription graph patterns is eliminated. The method also supports incremental output of matching results. Overall, the method meets the semantic matching and performance requirements of large-scale publish/subscribe middleware systems.

Description of drawings

Fig. 1 is a schematic diagram of an RDF subscription graph pattern in one specific embodiment of the invention;

Fig. 2 is a schematic diagram of the three-layer index storage structure derived from Fig. 1;

Fig. 3 is a flowchart of the read-in-typed-statement activity in the matching pipeline;

Fig. 4 is a flowchart of the type constraint matching activity in the matching pipeline;

Fig. 5 is a flowchart of the predicate constraint matching activity in the matching pipeline;

Fig. 6 is a flowchart of the node mapping activity in the matching pipeline;

Fig. 7 is a flowchart of the status checking activity in the matching pipeline;

Fig. 8 is a flowchart of the output-matching-result activity in the matching pipeline.

Embodiment

An ontology-based pipelined matching method for large-scale publish/subscribe systems comprises the following steps:

Step (1) Establish the ontology model of events/subscriptions: using the Resource Description Framework (RDF) ontology description language, represent events as RDF event graphs and subscriptions as RDF subscription graph patterns. Specifically:

1. RDF event graph: the RDF language expresses facts as triples of the form (Subject, property, Object), and each triple is called an RDF statement. The subject (Subject) is a URI reference to the resource being described, the predicate (property) is a URI reference to one of its properties, and the object (Object) is the value of that property, which can be a URI reference or a literal. If subjects and objects are represented as nodes and predicates as directed arcs, one or more RDF statements can be expressed as a directed labeled graph, called an RDF graph. In this method every event is represented as an RDF graph in which each node has a unique constant identifier and a type identifier denoting the ontology concept class it belongs to; such a graph is called an RDF event graph.

2. RDF subscription graph pattern: built on the RDF event graph, it describes the constraints that each node must satisfy. The form of an RDF subscription graph pattern is shown in Fig. 1: each node has a unique variable identifier, a type identifier denoting the ontology concept class it belongs to, and, where the variable is constrained, a predicate constraint expression. Variable names are prefixed with *, and the type identifier and the variable identifier are separated by ":".
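A minimal sketch of the two graph models may help fix the terminology. The class names (`EventNode`, `PatternNode`, `Arc`), the example vocabulary (`Paper`, `Person`, `hasAuthor`) and the choice to write a pattern node label as type first, then variable (for example `Person:*a`) are assumptions for illustration; the source only states that the two parts are separated by ":".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class EventNode:
    ident: str   # unique constant identifier (URI reference or literal)
    cls: str     # type identifier of the ontology concept class


@dataclass(frozen=True)
class PatternNode:
    var: str     # unique variable identifier, prefixed with "*"
    cls: str     # type identifier of the ontology concept class
    theta: Optional[Callable[[str], bool]] = None  # predicate constraint, if any

    def label(self) -> str:
        # Assumed rendering order: type identifier, ":", variable identifier.
        return f"{self.cls}:{self.var}"


@dataclass(frozen=True)
class Arc:
    source: Union[EventNode, PatternNode]
    prop: str    # constant identifier of the property
    target: Union[EventNode, PatternNode]


# A one-arc RDF event graph: paper1 --hasAuthor--> alice
paper = EventNode("ex:paper1", "Paper")
alice = EventNode("ex:alice", "Person")
event_graph = [Arc(paper, "hasAuthor", alice)]

# A one-arc RDF subscription graph pattern with a constraint on *a
p = PatternNode("*p", "Paper")
a = PatternNode("*a", "Person", theta=lambda value: value == "ex:alice")
subscription = [Arc(p, "hasAuthor", a)]
```

The only structural difference between the two models is that event nodes carry constants while pattern nodes carry variables and optional constraints, which is what allows both to be decomposed arc by arc in Step (2).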

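Step (2) Preprocess events/subscriptions: decompose the RDF event graph and the RDF subscription graph pattern, taking each arc in them as a basic semantic matching unit. Specifically:

1. For the RDF event graph: starting from its main node, traverse every arc of the RDF event graph in breadth-first order and convert each arc together with its two end nodes into the corresponding basic semantic matching unit, namely the following typed statement: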

t(Subject, property, Object) ∧ ts(SubjectClass, property, ObjectClass)

Here Subject is the constant identifier of the arc's start node in the RDF event graph, Object is the constant identifier of the arc's end node, property is the constant identifier of the arc, SubjectClass is the type identifier of the ontology concept class to which the Subject node belongs, and ObjectClass is the type identifier of the ontology concept class to which the Object node belongs. t(Subject, property, Object) denotes a simple RDF statement; ts(SubjectClass, property, ObjectClass) denotes a type constraint, that is, the constraint relation between the ontology concept types of the two nodes;

The typed statements obtained by decomposing the RDF event graph are cached in a buffer queue; below, tsQueue denotes this typed statement buffer queue;

2. For the RDF subscription graph pattern: likewise starting from its main node, traverse every arc of the RDF subscription graph pattern in breadth-first order and convert each arc together with its two end nodes into the corresponding basic semantic matching unit, namely the following subscription statement pattern:

Subject', Object': t(Subject', property', Object') ∧ ts(SubjectClass', property', ObjectClass') ∧ θ(Object')

Here Subject' is the variable identifier of the arc's start node in the RDF subscription graph pattern, Object' is the variable identifier of the arc's end node, property' is the constant identifier of the arc, SubjectClass' is the type identifier of the ontology concept class to which the Subject' node belongs, and ObjectClass' is the type identifier of the ontology concept class to which the Object' node belongs. t(Subject', property', Object') denotes a simple RDF statement; ts(SubjectClass', property', ObjectClass') denotes a type constraint, that is, the constraint relation between the ontology concept types of the two nodes; θ(Object') is the predicate constraint expression of the node variable Object';
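The decomposition in Step (2) can be sketched as a breadth-first walk over the arcs that emits one tuple per arc. The adjacency-dictionary input format, the helper names (`bfs_arcs`, `decompose_event`, `decompose_subscription`) and the example data are assumptions for illustration; a variable without a constraint is given an always-true θ here, which is also an assumption.

```python
from collections import deque, namedtuple

TypedStatement = namedtuple(
    "TypedStatement", "subject prop object subject_class object_class")
SubscriptionPattern = namedtuple(
    "SubscriptionPattern",
    "subject_var prop object_var subject_class object_class theta")


def bfs_arcs(main_node, arcs):
    """Yield (source, prop, target) for every reachable arc in breadth-first order."""
    seen = {main_node}
    frontier = deque([main_node])
    while frontier:
        source = frontier.popleft()
        for prop, target in arcs.get(source, []):
            yield source, prop, target
            if target not in seen:
                seen.add(target)
                frontier.append(target)


def decompose_event(main_node, arcs, cls):
    """Return the buffer queue (tsQueue) of typed statements for an event graph."""
    return deque(TypedStatement(s, p, o, cls[s], cls[o])
                 for s, p, o in bfs_arcs(main_node, arcs))


def decompose_subscription(main_node, arcs, cls, theta):
    """Return the subscription statement patterns of a subscription graph pattern."""
    always = lambda value: True  # assumed default for unconstrained variables
    return [SubscriptionPattern(s, p, o, cls[s], cls[o], theta.get(o, always))
            for s, p, o in bfs_arcs(main_node, arcs)]


# Hypothetical event graph: paper1 --hasAuthor--> alice --worksAt--> hdu
ts_queue = decompose_event(
    "paper1",
    {"paper1": [("hasAuthor", "alice")], "alice": [("worksAt", "hdu")]},
    {"paper1": "Paper", "alice": "Person", "hdu": "University"},
)

# Hypothetical subscription: *p --hasAuthor--> *a --worksAt--> *u, with θ(*u)
patterns = decompose_subscription(
    "*p",
    {"*p": [("hasAuthor", "*a")], "*a": [("worksAt", "*u")]},
    {"*p": "Paper", "*a": "Person", "*u": "University"},
    {"*u": lambda value: value == "hdu"},
)
```

Emitting arcs in breadth-first order from the main node means that, for any arc whose start node is not the main node, an arc ending at that start node has already been emitted, which is the ordering the node mapping activity relies on.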

Step (3) Build the subscription statement pattern index: organize all subscription statement patterns into a three-layer index storage structure, denoted GM, as shown in Fig. 2. The first-layer index is a hash table keyed by the arc identifier of the subscription statement pattern; each hash entry points to a second-layer index. The second-layer index is also a hash table, keyed by the start node type and end node type of the subscription statement pattern; each hash entry points to a third-layer index. The third layer is a linked list that stores the predicate constraint expressions of the subscription statement patterns, together with the identifier of the RDF subscription graph pattern each belongs to.
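As a rough illustration of the structure GM, the sketch below nests two dictionaries (the two hash layers) over Python lists (standing in for the linked lists of the third layer). The `Pattern` tuple, the function names and the sample data are assumptions made for the example.

```python
from collections import defaultdict, namedtuple

Pattern = namedtuple(
    "Pattern", "graph_id prop subject_class object_class theta")


def build_index(patterns):
    """Layer 1: arc identifier -> layer 2: (start type, end type) -> layer 3: list."""
    gm = defaultdict(lambda: defaultdict(list))
    for ps in patterns:
        gm[ps.prop][(ps.subject_class, ps.object_class)].append(ps)
    return gm


def lookup(gm, prop, subject_class, object_class):
    """Return the patterns whose type constraint matches; never creates entries."""
    layer2 = gm.get(prop)
    if layer2 is None:
        return []
    return layer2.get((subject_class, object_class), [])


# Hypothetical subscriptions G1 and G2 sharing the same type constraint.
sample = [
    Pattern("G1", "worksAt", "Person", "University", lambda v: v == "hdu"),
    Pattern("G2", "worksAt", "Person", "University", lambda v: v == "mit"),
    Pattern("G1", "hasAuthor", "Paper", "Person", lambda v: True),
]
gm = build_index(sample)
hits = lookup(gm, "worksAt", "Person", "University")
```

Two hash probes narrow the candidates to the patterns that share one type constraint, regardless of how many subscriptions are registered; only that short list has its predicate constraints evaluated.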

Step (4) Build the matching pipeline: the matching of an RDF event graph against the RDF subscription graph patterns is decomposed into the following six pipeline activities. Each activity is handled by a dedicated thread, and the threads cooperate to form the matching pipeline. Specifically:

1. Read in typed statements: read a typed statement of the RDF event graph, denoted TS, from the buffer queue tsQueue and pass it to the next stage. The detailed processing flow is shown in Fig. 3;

2. Type constraint matching: for the input typed statement TS, use its arc identifier property and its pair of end-node type identifiers (SubjectClass, ObjectClass) as index keys to search the three-layer index storage structure GM, filter out the subscription statement patterns whose type constraint matches, each denoted PS, and pass them to the next stage. The detailed processing flow is shown in Fig. 4;

3. Predicate constraint matching: for the input subscription statement pattern PS and typed statement TS, substitute the node constant TS.Object of the typed statement for the corresponding variable PS.Object' of the subscription statement pattern and evaluate the predicate constraint expression PS.θ(TS.Object); select the subscription statement patterns whose predicate constraint evaluates to true and pass them to the next stage. The detailed processing flow is shown in Fig. 5;

4. Node mapping: for the input subscription statement pattern PS and typed statement TS, generate the mapping from the variable nodes of PS to the constant nodes of TS. Specifically: if the node PS.Subject' of the subscription statement pattern is the main node of the RDF subscription graph pattern, the node mappings (PS.Subject' ← TS.Subject) and (PS.Object' ← TS.Object) can be generated directly; otherwise the node mapping record (PS.Object' ← TS.Object) is generated only if the node mapping record (PS.Subject' ← TS.Subject) already exists. The result is passed to the next stage. The detailed processing flow is shown in Fig. 6;

5. Mapping status check: a multidimensional array maintains the node mapping status of the RDF subscription graph patterns. The array has the form VertexMap[PS.ID][PS.Subject'][TS.Subject], where the first subscript PS.ID is the number of the RDF subscription graph pattern to which the node belongs, the second subscript PS.Subject' is the variable node of the subscription statement pattern, and the third subscript TS.Subject is the constant node of the typed statement. The array entry VertexMap[PS.ID][PS.Subject'][TS.Subject] holds the set of subscription statement patterns that have matched successfully and share the node mapping (PS.Subject' ← TS.Subject);

When the number of matched subscription statement patterns sharing the node mapping (PS.Subject' ← TS.Subject) equals the number of arcs associated with the node PS.Subject' in the RDF subscription graph pattern, the node mapping is called a saturated mapping. The detailed processing flow is shown in Fig. 7.

6. Output matching results: to make it easy to judge the matching status of each RDF subscription graph pattern, the array MatchedVertex[PS.ID] records the set of nodes of each RDF subscription graph pattern that have obtained a saturated mapping. When every node in an RDF subscription graph pattern has a conflict-free saturated mapping, that is, when MatchedVertex[PS.ID] = Vertex[PS.ID], where Vertex[PS.ID] denotes the node set of the RDF subscription graph pattern, the RDF subscription graph pattern PS.ID is judged to be matched successfully, and the successfully matched RDF subscription graph patterns PS.ID are output incrementally. The detailed processing flow is shown in Fig. 8.
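The six activities can be traced end to end in a simplified, single-threaded sketch. It is not the patented implementation: the threads are collapsed into one loop, nested dictionaries stand in for the VertexMap and MatchedVertex arrays, the pattern identifier `pid` and the `roots` table are hypothetical, and the conflict check between saturated mappings of different nodes is omitted.

```python
from collections import defaultdict, namedtuple

TS = namedtuple("TS", "subject prop object subject_class object_class")
PS = namedtuple(
    "PS", "graph_id pid subject_var prop object_var subject_class object_class theta")


def match_event(statements, patterns, roots):
    """Return the ids of subscription graph patterns matched by one event graph."""
    gm = defaultdict(lambda: defaultdict(list))       # three-layer index GM
    degree = defaultdict(lambda: defaultdict(int))    # arcs associated with each node
    for ps in patterns:
        gm[ps.prop][(ps.subject_class, ps.object_class)].append(ps)
        degree[ps.graph_id][ps.subject_var] += 1
        degree[ps.graph_id][ps.object_var] += 1
    vertex_map = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    matched_vertex = defaultdict(set)
    reported = []
    for ts in statements:                                        # 1. read in
        key = (ts.subject_class, ts.object_class)
        for ps in gm.get(ts.prop, {}).get(key, []):              # 2. type constraint
            if not ps.theta(ts.object):                          # 3. predicate constraint
                continue
            vm = vertex_map[ps.graph_id]
            is_root = ps.subject_var == roots[ps.graph_id]
            if not is_root and ts.subject not in vm[ps.subject_var]:
                continue                                         # 4. start mapping missing
            for var, const in ((ps.subject_var, ts.subject),
                               (ps.object_var, ts.object)):
                vm[var][const].add(ps.pid)                       # 4. node mapping
                if len(vm[var][const]) == degree[ps.graph_id][var]:
                    matched_vertex[ps.graph_id].add(var)         # 5. saturated mapping
            all_nodes = set(degree[ps.graph_id])                 # Vertex[PS.ID]
            if matched_vertex[ps.graph_id] == all_nodes and ps.graph_id not in reported:
                reported.append(ps.graph_id)                     # 6. incremental output
    return reported


# Hypothetical event: paper1 --hasAuthor--> alice --worksAt--> hdu
event = [
    TS("paper1", "hasAuthor", "alice", "Paper", "Person"),
    TS("alice", "worksAt", "hdu", "Person", "University"),
]
anything = lambda value: True
subs = [
    PS("G1", 1, "*p", "hasAuthor", "*a", "Paper", "Person", anything),
    PS("G1", 2, "*a", "worksAt", "*u", "Person", "University", lambda v: v == "hdu"),
    PS("G2", 3, "*p", "hasAuthor", "*a", "Paper", "Person", anything),
    PS("G2", 4, "*a", "worksAt", "*u", "Person", "University", lambda v: v == "mit"),
]
matched = match_event(event, subs, roots={"G1": "*p", "G2": "*p"})
```

In this example the node *a of G1 becomes saturated only after the second typed statement arrives, because it has two associated arcs; G2 never saturates *a or *u because its predicate constraint on *u rejects "hdu".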

Claims

1. An ontology-based pipelined matching method for large-scale publish/subscribe systems, characterized in that the method comprises the following steps:

Step (1) Establish the ontology model of events/subscriptions: using the Resource Description Framework (RDF) ontology description language, represent each event as an RDF event graph and each subscription as an RDF subscription graph pattern; in the RDF event graph and the RDF subscription graph pattern, every node has a unique constant or variable identifier and a type identifier for the ontology concept class it belongs to, and every arc has a constant identifier denoting the property of the concept class it belongs to;

Step (3) Build the subscription statement pattern index: organize all subscription statement patterns into a three-layer index storage structure, wherein: the first-layer index is a hash table keyed by the arc identifier of the subscription statement pattern, and each hash entry points to a second-layer index; the second-layer index is a hash table keyed by the start node type and end node type of the subscription statement pattern, and each hash entry points to a third-layer index; the third layer is a linked list that stores the subscription statement patterns sharing the same type constraint, together with the number of the RDF subscription graph pattern to which each subscription statement pattern belongs;

Step (4) Build the matching pipeline: the matching of an RDF event graph against the RDF subscription graph patterns is decomposed into the following six pipeline activities; each activity is handled by a dedicated thread, and the threads cooperate to form the matching pipeline; the activities are:

2. Type constraint matching: for the input typed statement, use the arc identifier and the pair of end-node type identifiers of the typed statement as index keys to search the three-layer index storage structure, filter out the subscription statement patterns whose type constraint matches, and pass them to the next stage;

3. Predicate constraint matching: for the input subscription statement pattern and typed statement, substitute the node constant of the typed statement for the corresponding variable of the subscription statement pattern, evaluate the predicate constraint expression, select the subscription statement patterns whose predicate constraint evaluates to true, and pass them to the next stage;

4. Node mapping: for each input subscription statement pattern, generate the mapping from its variable nodes to the constant nodes of the typed statement it matched; specifically: if the start node of the subscription statement pattern is the main node of the RDF subscription graph pattern, the start node mapping and the end node mapping are generated directly; if the start node of the subscription statement pattern is not the main node of the RDF subscription graph pattern, the end node mapping record is generated only if the start node mapping record already exists; the result is passed to the next stage;

5. Status checking: a multidimensional array records and maintains the node mapping status of the RDF subscription graph patterns; when a node mapping is shared by all subscription statement patterns associated with that node in the RDF subscription graph pattern, the node mapping is saturated; the set of nodes in the RDF subscription graph pattern that have obtained a saturated mapping is recorded;

6. Output matching results: when every node in an RDF subscription graph pattern has a conflict-free saturated node mapping, the RDF subscription graph pattern is judged to be matched successfully, and successfully matched RDF subscription graph patterns are output incrementally.