CN102819569B

CN102819569B - Matching method for data in distributed interactive simulation system

Info

Publication number: CN102819569B
Application number: CN201210250067.1A
Authority: CN
Inventors: 王海波; 屈树谦; 吕品; 郑昌文
Original assignee: Institute of Software of CAS
Current assignee: Institute of Software of CAS
Priority date: 2012-07-18
Filing date: 2012-07-18
Publication date: 2015-01-07
Anticipated expiration: 2032-07-18
Also published as: CN102819569A

Abstract

The invention provides a data matching method for a distributed interactive simulation system. When a simulation node receives a newly published subscription condition, the condition is inserted into a multi-index structure according to its predicates and into a pattern relation set according to its pattern. When the node receives newly published simulation data, it looks up in the multi-index structure, for each attribute of the data, all predicates on that attribute that fail to match, and marks the subscription conditions containing those predicates as unmatched. It then finds in the pattern relation set all patterns that are identical to or covered by the pattern of the simulation data and traverses the subscription conditions associated with those patterns in turn, which yields all matching subscription conditions.

Description

A data matching method in a distributed interactive simulation system

Technical field

The invention belongs to the field of simulation data distribution technology and specifically relates to a method that efficiently finds all subscriptions matching given simulation data.

Background technology

Simulation data distribution is a key technology for building distributed interactive simulation systems. It is mainly responsible for communication between simulation nodes during a distributed simulation.

As simulations grow in scale and performance requirements rise, distributed interactive simulation systems face increasingly strict demands on scalability, fault tolerance, network load, and real-time behavior. Traditional communication models mostly use relatively simple data filtering strategies, which filter poorly and couple participants tightly. The publish/subscribe model instead uses interest-based data filtering, which effectively reduces network load and improves real-time behavior. It is loosely coupled and supports multipoint communication, and its participants are fully decoupled in space, time, and control flow. It has therefore been widely applied in distributed interactive simulation, for example in the High Level Architecture (HLA) and the Test and Training Enabling Architecture (TENA). In addition, the OMG DDS standard also uses a topic-based publish/subscribe communication model.

In the publish/subscribe communication model, a producer of information is called a publisher, a consumer of information is called a subscriber, and the information exchanged between publishers and subscribers is called an event. A publisher sends events to an event agent, and a subscriber sends subscription conditions to the event agent to indicate which events it is interested in. The event agent then ensures that each event is delivered promptly and reliably to all subscribers interested in it.

The simulation data matching problem in a distributed interactive simulation system is this: given simulation data published by a publisher, find all subscription conditions that match it as efficiently as possible. The design goals of a simulation data matching algorithm are mainly the time efficiency of matching, the space efficiency of matching, and the efficiency of subscription maintenance. A large-scale distributed interactive simulation system may contain thousands or even millions of subscriptions and events. The performance of the matching algorithm then becomes critical: it affects the real-time behavior of the system and can limit how far the whole system scales.

The traditional event matching algorithm traverses all subscriptions for each newly published event and tests them one by one. Existing MAP-based event matching algorithms are generally divided into two stages: a preprocessing stage and a matching stage. The preprocessing stage organizes and manages subscription conditions or predicates, and the matching stage matches events against the organized predicates or subscription condition sets. Depending on whether predicates and subscription conditions are organized and managed separately, current matching algorithms fall into two classes:

The first class does not separate the organization and management of predicates from that of subscription conditions, and uses a single data structure to store both. The data structures used by this class are mainly trees, graphs, and subscription covering relation sets. Representative tree-based algorithms are the search-tree algorithm of Gough et al. (K.J. Gough and G. Smith. Efficient recognition of events in distributed systems. Proceedings of ACSC-18, 1995.) and the parallel search tree algorithm of Aguilera et al. (M.K. Aguilera, R.E. Strom, D.C. Sturman, M. Astley, and T.D. Chandra. Matching events in a content-based subscription system. Eighteenth ACM Symposium on Principles of Distributed Computing (PODC '99), 1999.). Graph-based algorithms include the binary decision diagram (BDD) algorithm of Campailla et al. (Campailla A, Chaki S, Clarke E, et al. Efficient Filtering in Publish/Subscribe Systems Using Binary Decision Diagrams. The 23rd International Conference on Software Engineering, 2001.) and the BDD algorithm of Li G et al., improved by using subscription history and the logical relations between predicates (Li G, Huo S, Jacobsen H. A Unified Approach to Routing, Covering and Merging in Publish/Subscribe Systems Based on Modified Binary Decision Diagrams. The 25th International Conference on Distributed Computing Systems, 2005.6.). Among the algorithms based on subscription covering relation sets is the matching algorithm of Carzaniga, which organizes subscription conditions into a partially ordered set (POSET) (Carzaniga A, Rosenblum D S, Wolf A L. Design and Evaluation of a Wide-area Event Notification Service. ACM Transactions on Computer Systems, 2001, 19(3): 332-383.).

The second class separates the organization and management of predicates from that of subscription conditions, and stores them in different data structures. More of the current research falls into this class, and its representative is the predicate counting algorithm (YAN T W, GARC H. Index structures for selective dissemination of information under the Boolean model. ACM Trans Database System, 1994, 19(2): 332-334.). Building on the predicate counting algorithm, the Hanson algorithm organizes the most selective predicates into interval search trees grouped by attribute (HANSON E, CHAABOUNI M, KIM C. A predicate matching algorithm for database rule systems. International Conference of the ACM SIGMOD. 1990: 271-280.); Fabret et al. organize subscription conditions into clusters according to common predicates called access predicates (FABRET F, A JACOBSEN, LIBAT F. Filtering algorithms and implementation for very fast publish/subscribe systems. International Conference of the ACM SIGMOD. 2001.); Carzaniga et al. use an index structure to store and manage predicates (Carzaniga A, Wolf A L. Forwarding in a content based network. Proc of ACM SIGCOMM 2003. 2003: 163-174.); and Pan Yi et al. organize predicates with a structure called a "predicate relation tree" (Pan Yi, Zhang Kailong, Pan Jingui. Research on publish/subscribe mechanisms and algorithms based on predicate covering. Journal of Computer Research and Development, 2011, 48(5): 765-777.).

In addition, in recent years some researchers have accelerated event matching by exploiting hardware performance. Fabret proposes using hardware caching to accelerate event matching; Farroukh proposes exploiting multiprocessor machines to execute event matching in parallel in three ways (Amer Farroukh, Elias Ferzli, Naweed Tajuddin, and Hans-Arno Jacobsen. Parallel Event Processing for Content-Based Publish/Subscribe Systems. Proceeding of the Third ACM International Conference on Distributed Event-Based System, 2009.); K.H. Tsoi et al. perform the matching directly on a reconfigurable high-performance platform (K.H. Tsoi, I. Papagiannis, M. Migliavacca, W. Luk and P. Pietzuch. Accelerating publish/subscribe matching on reconfigurable supercomputing platform. Many-Core and Reconfigurable Supercomputing Conference, 2010.); and Alessandro Margara proposes a GPU-based event matching algorithm (Alessandro Margara, Gianpaolo Cugola. High Performance Content-Based Matching Using GPUs. Proceeding of the 5th ACM International Conference on Distributed Event-Based System, 2011: 183-194.).

The algorithms above are all exact matching algorithms; approximate matching algorithms have also been studied in recent years. Liu et al. developed A-ToPSS, a prototype system that supports approximate matching. It establishes event and subscription models that support fuzzy expressions, and uses fuzzy set theory and probability theory to handle approximate matching between subscriptions and events (Liu H, Jacobsen HA. A-ToPSS: A publish/subscribe system supporting imperfect information processing. Proc. of the 30th Int'l Conf. on Very Large Databases. 2004.).

In a distributed interactive simulation system, the data communicated between simulation nodes is mainly of two kinds: entity information data and entity interaction data. Data types are determined before the simulation starts, and new data types cannot be created dynamically during the simulation. Simulation nodes are also relatively stable during a simulation, and subscriptions are added and deleted infrequently. The generation of simulation data is somewhat bursty, however, which places high demands on the time efficiency of the matching algorithm. During a simulation, matching interaction data is fairly simple and is done by data type alone, whereas matching entity information data must consider both the data type and the attributes of the entity. The present invention concerns only the subscription and matching of entity information data.

The number of attributes of the entity information data exchanged in a distributed interactive simulation system varies widely, from tens to hundreds, so some existing algorithms are not well suited to it. Search-tree-based algorithms have high space complexity, and the system becomes overloaded when the number of attributes is large. Algorithms based on the parallel search tree perform poorly on inequality predicate tests. In BDD-based algorithms, constructing the BDD is complex. The Carzaniga algorithm matches inefficiently when the degree of covering between subscriptions is low. The predicate counting algorithm does not exploit the correlation between predicates and performs many redundant matching operations. Finally, simulation nodes are not assumed to run on high-performance computing platforms, so hardware-acceleration methods are not well suited either.

Therefore, building on an analysis of the algorithms above and on the characteristics of simulation data, organizing subscriptions and predicates efficiently with purpose-built data structures has great potential to improve the time and space efficiency of simulation data matching.

Summary of the invention

To address the time and space efficiency problems of simulation data matching, the present invention draws on the characteristics of simulation data and proposes a data matching method in a distributed interactive simulation system. The method takes into account the correlation between predicates and the covering relations between the patterns to which subscription conditions belong. While guaranteeing correct matching, its time and space costs for simulation data matching are clearly better than those of the classical predicate counting algorithm, which makes it practical to find, by matching, all subscriptions that match given simulation data.

The technical solution of the present invention is as follows:

A data matching method in a distributed interactive simulation system comprises the following steps:

(1) After a newly published subscription condition is received, it is first inserted into the subscription condition index set. Each predicate in the subscription condition is then traversed and inserted into the multi-index structure in turn. Finally, the pattern of the subscription condition is inserted into the pattern relation set.

(2) After newly published simulation data is received, the matched result set and the unmatched predicate set are initialized to empty, and every bit of the subscription condition bit vector is set to 0, i.e., all subscription conditions are assumed by default to match.

(3) For each attribute in the data, all predicates on that attribute that fail to match are looked up in the multi-index structure, and the bits of the bit vector corresponding to the subscription sets associated with those predicates are set to 1, i.e., unmatched.

(4) All patterns identical to or completely covered by the pattern of the simulation data are looked up in the pattern relation set, and the subscription conditions associated with those patterns are traversed in turn. If the corresponding bit is 0, the subscription condition matches; otherwise it does not. This yields all matching subscription conditions.

The above pattern-covering-based simulation data matching method has the following new subscription storage structure and matching idea:

A) Predicates are organized and stored in a multi-index structure, and a pattern relation set is proposed to store and manage subscription conditions;

B) Matching exploits the property that a mismatched predicate causes the subscription condition it belongs to not to match.

The subscription storage structure and matching idea above were designed by the inventors through painstaking research and are described in detail below.

First, the definitions used in the present invention are given:

Each piece of simulation data or subscription condition must belong to one specific entity information type, i.e., a topic. On this basis, simulation data is represented by multiple <attribute, type, value> triples, a predicate is an <attribute, type, operator, value> four-tuple, and a subscription condition is expressed as a conjunction of multiple predicates, i.e., predicate 1 ^ predicate 2 ^ ... In addition, a pattern is defined as a tuple <attribute 1, attribute 2, ...>, and each piece of simulation data or subscription condition belongs to exactly one pattern. If pattern A contains all the attributes in pattern B, then A is said to cover B.
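The definitions above can be sketched in Python as follows. This is an illustrative sketch only, not the implementation of the invention: the names `Predicate`, `pattern_of`, and `covers`, the use of a dict for simulation data, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Predicate:
    """An <attribute, type, operator, value> four-tuple."""
    attribute: str
    type: str
    operator: str
    value: Any


# Simulation data: attribute name -> (type, value), i.e. a set of triples.
SimulationData = Dict[str, Tuple[str, Any]]


def pattern_of(attributes: Iterable[str]) -> FrozenSet[str]:
    """A pattern is the collection of attribute names that occur."""
    return frozenset(attributes)


def covers(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Pattern A covers pattern B if A contains every attribute of B."""
    return b <= a


# A subscription condition is a conjunction (here: a tuple) of predicates.
subscription = (
    Predicate("type", "string", "=", "Flight"),
    Predicate("height", "integer", ">", 5000),
)
data: SimulationData = {
    "type": ("string", "Flight"),
    "height": ("integer", 8000),
    "velocity": ("integer", 400),
}
sub_pattern = pattern_of(p.attribute for p in subscription)
data_pattern = pattern_of(data)
```

Here `covers(data_pattern, sub_pattern)` holds because every attribute named by the subscription also occurs in the data, while the reverse does not hold.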

Attribute values have five types: enumeration, Boolean, integer, floating point, and string. The operators in predicates are "=" (equal), "≠" (not equal), ">" (greater than), ">=" (greater than or equal), "<" (less than), "<=" (less than or equal), "[" (prefix), "]" (suffix), and "∝" (substring). The operators "[", "]", and "∝" apply only to strings; the enumeration type supports only "=" and "≠"; and the Boolean type supports only "=".
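One way to picture the operator rules above is a lookup table from operator symbol to test function, plus a table of the operators permitted for each type. The sketch below is hypothetical: the type names, the ASCII spelling "!=" for the not-equal operator, and the reading of prefix, suffix, and substring as "the data value starts with, ends with, or contains the predicate value" are assumptions, since the text does not spell them out.

```python
import operator

# Operator symbol -> test, applied as test(data_value, predicate_value).
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,                   # ASCII stand-in for not-equal
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "[": lambda v, c: v.startswith(c),   # prefix (assumed reading)
    "]": lambda v, c: v.endswith(c),     # suffix (assumed reading)
    "∝": lambda v, c: c in v,            # substring (assumed reading)
}

COMPARISONS = {"=", "!=", ">", ">=", "<", "<="}

# Operators permitted for each attribute type, following the text.
ALLOWED = {
    "boolean": {"="},
    "enum": {"=", "!="},
    "integer": COMPARISONS,
    "float": COMPARISONS,
    "string": set(OPERATORS),
}


def evaluate(type_name, op, data_value, predicate_value):
    """Return True if the data value satisfies the predicate."""
    if op not in ALLOWED[type_name]:
        raise ValueError(f"operator {op!r} is not defined for {type_name!r}")
    return OPERATORS[op](data_value, predicate_value)
```

Keeping the permitted-operator table separate from the test functions lets an invalid predicate, such as ">" on a Boolean attribute, be rejected when it is evaluated rather than silently compared.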

Next, the key data structures used in the present invention are described; see Fig. 1:

The present invention uses a subscription condition index set, a subscription condition bit vector, a multi-index structure, and a pattern relation set, each of which is introduced below:

Subscription condition index set: this index set assigns each subscription condition a unique number, establishing a one-to-one correspondence between numbers and subscription conditions. The method uses this number to index quickly to the corresponding subscription condition.

Subscription condition bit vector: each bit of this bit vector stores the matching result of the subscription condition with the corresponding number. If the subscription condition fails to match the simulation data, the bit is set to 1; otherwise it is set to 0.

Multi-index structure: this structure stores all predicates for each entity type, i.e., each topic. It is organized by topic, attribute type, operator, and attribute name. The first-level index is the topic. The second-level index is the attribute type: Boolean, enumeration, integer, floating point, or string. The third-level index is the operator, one of the nine operators above. The fourth level is the attribute name. The last level stores the values of the predicates that share the same type, operator, and attribute. Values of the enumeration, integer, and floating-point types are stored in AVL trees; for strings, prefix operations use a prefix tree, suffix operations use a suffix tree, and the other operations likewise use AVL trees. The node for each value is also associated with the set of subscription conditions that contain this predicate. In addition, because the values at every index level come from a fixed range, each level can be stored as a hash table.
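The four index levels can be sketched with nested hash tables. The sketch below is a simplification under stated assumptions: it uses a plain dict at the last level where the text uses AVL, prefix, and suffix trees, and the function names and the topic name "Flight" are hypothetical.

```python
from collections import defaultdict


def make_index():
    """topic -> type -> operator -> attribute -> {value: subscription ids}."""
    return defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    )


def insert_predicate(index, topic, type_name, op, attribute, value, sub_id):
    """Store the predicate once and attach the subscription id to its value."""
    values = index[topic][type_name][op][attribute]
    values.setdefault(value, set()).add(sub_id)


index = make_index()
# Subscriptions 0 and 1 share the predicate <height, integer, >, 5000>.
insert_predicate(index, "Flight", "integer", ">", "height", 5000, 0)
insert_predicate(index, "Flight", "integer", ">", "height", 5000, 1)
insert_predicate(index, "Flight", "integer", ">", "velocity", 400, 0)

height_values = index["Flight"]["integer"][">"]["height"]
```

After these inserts `height_values` holds a single entry for 5000 whose set contains both subscription numbers, which illustrates how a predicate shared by several subscription conditions is stored only once.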

Pattern relation set: the pattern relation set stores all patterns associated with subscription conditions, and each pattern is associated with the set of subscription conditions that have this pattern. The set is organized according to the covering relations between patterns. Fig. 2 shows an example of a pattern relation set.

Because data types remain unchanged during a simulation, the patterns in the pattern relation set can be represented as bit vectors in which each bit corresponds to one attribute: a bit of "1" means that the corresponding attribute is present, and otherwise it is absent. The covering relation between patterns can therefore be decided with a bitwise OR. For pattern A and pattern B, if the result of "A or B" is A, then A covers B; if the result is B, then B covers A; otherwise there is no covering relation between A and B. For example, if pattern A is 11001 and pattern B is 10001, then A|B = 11001|10001 = 11001, which is A, so A covers B. If pattern A is 11001 and pattern B is 01011, then A|B = 11001|01011 = 11011, which is neither A nor B, so there is no covering relation between A and B.
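The bitwise-OR test can be written in a few lines of Python, using integers as bit vectors. The function name `cover_relation` and its string results are hypothetical; the two calls at the end reproduce the two examples given in the text.

```python
def cover_relation(a: int, b: int) -> str:
    """Classify two bit-vector patterns with a single bitwise OR."""
    union = a | b
    if a == b:
        return "identical"
    if union == a:
        return "A covers B"
    if union == b:
        return "B covers A"
    return "none"


first = cover_relation(0b11001, 0b10001)   # 11001 | 10001 = 11001 = A
second = cover_relation(0b11001, 0b01011)  # 11001 | 01011 = 11011
```

The first call reports that A covers B and the second reports no covering relation, in agreement with the worked examples.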

When matching with the pattern relation set, if a node's pattern is not identical to the simulation data pattern and has no covering relation with it, the child nodes of that node need not be traversed, which speeds up matching against the pattern set.
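The pruning rule can be sketched as a recursive traversal. The sketch rests on an assumption that the text does not confirm: that each child's pattern covers its parent's pattern, so that when a node is not covered by the data pattern none of its descendants can be either. The class `PatternNode`, the function `collect`, and the tree layout are hypothetical.

```python
class PatternNode:
    """A node of a (hypothetical) pattern relation tree."""

    def __init__(self, bits, sub_ids=()):
        self.bits = bits                # pattern as a bit vector
        self.sub_ids = list(sub_ids)    # subscriptions with this pattern
        self.children = []              # assumed: each child covers this node


def collect(node, data_bits, out):
    """Gather subscriptions whose pattern equals or is covered by data_bits."""
    if node.bits | data_bits != data_bits:
        return out                      # not covered: skip the whole subtree
    out.extend(node.sub_ids)
    for child in node.children:
        collect(child, data_bits, out)
    return out


root = PatternNode(0b00000)
a = PatternNode(0b10000, [1])
a.children.append(PatternNode(0b10001, [2]))
b = PatternNode(0b00110, [3])
b.children.append(PatternNode(0b00111, [4]))   # never visited for this data
root.children.extend([a, b])

candidates = collect(root, 0b11001, [])
```

For the data pattern 11001 the traversal returns subscriptions 1 and 2 and skips node `b` together with its child, because 00110 is not covered by 11001.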

Finally, the present invention adopts a new matching idea: it exploits the property that a mismatched predicate causes the subscription condition it belongs to not to match. By definition, a subscription condition is a conjunction of multiple predicates, so if one predicate in a subscription condition fails to match, the whole subscription fails to match.

When matching is initialized, the bits of all subscription conditions in the bit vector are set to 0, i.e., all are assumed by default to match. In the first stage of matching, all predicates on the attributes of the simulation data that fail to match are looked up, and the bits of the subscription conditions associated with those predicates are set to 1, i.e., unmatched. The remaining problem is that subscriptions whose attributes are unrelated to the attributes of the simulation data still show a successful result, so traversing the whole subscription set directly at this point would give wrong matching results. In the second stage of matching, the present invention therefore tests only the subscriptions whose pattern is identical to or covered by the simulation data pattern; among these, a subscription condition whose bit is 0 matches. This matching idea thus improves the time efficiency of matching while guaranteeing its correctness.
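The two-stage idea described above can be sketched end to end as follows. This is a simplified illustration rather than the method as claimed: it omits topics and attribute types, scans the predicates of each attribute linearly instead of searching AVL, prefix, or suffix trees, keeps patterns in a flat dict instead of a covering hierarchy, and supports only the six comparison operators. The class name `Matcher` and its methods are hypothetical.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}


class Matcher:
    def __init__(self):
        self.subs = []        # index set: number -> list of predicates
        self.by_attr = {}     # attribute -> {(op, value): subscription ids}
        self.by_pattern = {}  # frozenset of attributes -> subscription ids

    def subscribe(self, predicates):
        """predicates: list of (attribute, operator, value) triples."""
        sub_id = len(self.subs)
        self.subs.append(predicates)
        for attr, op, value in predicates:
            ids = self.by_attr.setdefault(attr, {}).setdefault((op, value), set())
            ids.add(sub_id)
        pattern = frozenset(attr for attr, _, _ in predicates)
        self.by_pattern.setdefault(pattern, set()).add(sub_id)
        return sub_id

    def match(self, data):
        """data: dict attribute -> value. Returns matching subscription ids."""
        failed = 0  # bit i set to 1 means subscription i does not match
        # Stage 1: mark subscriptions that own a failing predicate.
        for attr, value in data.items():
            for (op, const), ids in self.by_attr.get(attr, {}).items():
                if not OPS[op](value, const):
                    for i in ids:
                        failed |= 1 << i
        # Stage 2: only patterns identical to or covered by the data pattern.
        data_pattern = frozenset(data)
        result = []
        for pattern, ids in self.by_pattern.items():
            if pattern <= data_pattern:
                result.extend(i for i in ids if not (failed >> i) & 1)
        return sorted(result)
```

A subscription on an attribute that the data does not carry keeps its bit at 0 after stage 1, and it is the pattern test in stage 2 that keeps it out of the result.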

Advantages and positive effects of the present invention:

The performance advantages of the present invention are mainly as follows: (1) matching starts from the predicates that fail to match, and one failed predicate is enough to make a subscription condition fail, which eliminates unnecessary predicate evaluations when testing subscription conditions; (2) the multi-index structure speeds up searching for and locating predicates, and values are stored in structures such as AVL trees, prefix and suffix trees, or hash tables, which further speeds up predicate matching; (3) the pattern relation set eliminates the matching of irrelevant subscriptions, so that only subscription conditions whose pattern is identical to the simulation data pattern or has a covering relation with it are considered; (4) the multi-index structure manages predicates uniformly, so that a predicate occurring in multiple subscription conditions is stored only once, and a bit vector stores the matching results of subscription conditions, which further reduces memory consumption.

Brief description of the drawings

Fig. 1 shows the key data structures used in the present invention;

Fig. 2 is an example of the pattern relation set of the present invention, in which arrows represent covering relations;

Fig. 3 compares the time performance of the method of the present invention with that of the predicate counting algorithm as the number of subscription conditions varies, with the range of the number of simulation data attributes fixed;

Fig. 4 compares the space performance of the method of the present invention with that of the predicate counting algorithm as the number of subscription conditions varies, with the range of the number of simulation data attributes fixed;

Fig. 5 compares the time performance of the method of the present invention with that of the predicate counting algorithm as the range of the number of simulation data attributes varies, with the number of subscription conditions fixed.

Embodiment

Step 1. After a newly published subscription condition is received, it is first inserted into the subscription condition index set. Each predicate in the subscription condition is then traversed and inserted into the multi-index structure in turn. Finally, the pattern of the subscription condition is inserted into the pattern relation set.

Step 2. After newly published simulation data is received, the matched result set and the unmatched predicate set are initialized to empty, and every bit of the subscription condition bit vector is set to 0, i.e., all subscription conditions are assumed by default to match.

Step 3. For each attribute in the data, all predicates on that attribute that fail to match are looked up in the multi-index structure, and the bits of the bit vector corresponding to the subscription sets associated with those predicates are set to 1, i.e., unmatched.

Step 4. All patterns identical to or completely covered by the pattern of the simulation data are looked up in the pattern relation set, and the subscription conditions associated with those patterns are traversed in turn. If the corresponding bit is 0, the subscription condition matches; otherwise it does not. This yields all matching subscription conditions.

The data structures involved are designed as follows:

Subscription condition index set: this index set uses an index to store the one-to-one correspondence between numbers and subscription conditions.

Pattern relation set: the pattern relation set stores all patterns associated with subscription conditions, and each pattern is associated with the set of subscription conditions that have this pattern. The set is organized according to the covering relations between patterns.

The patterns in the pattern relation set can be represented as bit vectors in which each bit corresponds to one attribute: a bit of "1" means that the corresponding attribute is present, and otherwise it is absent. The covering relation between patterns can therefore be decided with a bitwise OR. For pattern A and pattern B, if the result of "A or B" is A, then A covers B; if the result is B, then B covers A; otherwise there is no covering relation between A and B.

The present invention is mainly used for data distribution in the field of distributed interactive simulation, and specifically relates to a method that efficiently finds all subscriptions matching given simulation data. An embodiment is described below by way of example.

Consider a distributed interactive simulation system that simulates an air-ground combat scenario, in which some simulation nodes simulate aircraft models and others simulate ground force models. Suppose a simulation node for a ground force model needs to subscribe to data meeting this condition: aircraft with a height greater than 5000 km and a speed greater than 400 km/h. It then publishes the subscription condition <type, string, =, Flight>^<height, integer, >, 5000>^<velocity, integer, >, 400>. Suppose a simulation node for an aircraft model publishes aircraft data with a height of 8000 km and a speed of 400 km/h; the published simulation data can then be represented as <type, string, Flight>^<height, integer, 8000>^<velocity, integer, 400>. After the simulation node publishes the simulation data to the agent node, the problem the agent node must solve is how to efficiently find all subscription conditions that match this aircraft data. For the aircraft data above, the subscription condition above clearly does not match, because a speed of 400 is not greater than 400. The agent node can solve this problem by calling the method of the present invention, whose concrete steps are as follows:
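The example above can be checked by evaluating each predicate of the subscription condition against the published aircraft data. This is a direct, predicate-by-predicate check written for illustration only; it does not use the index structures of the invention, and the variable names are hypothetical.

```python
import operator

OPS = {"=": operator.eq, ">": operator.gt}

# <type, string, =, Flight> ^ <height, integer, >, 5000> ^ <velocity, integer, >, 400>
subscription = [("type", "=", "Flight"), ("height", ">", 5000), ("velocity", ">", 400)]

# <type, string, Flight> ^ <height, integer, 8000> ^ <velocity, integer, 400>
data = {"type": "Flight", "height": 8000, "velocity": 400}

# Evaluate every predicate, then take the conjunction.
results = {attr: OPS[op](data[attr], value) for attr, op, value in subscription}
matched = all(results.values())
```

The `type` and `height` predicates hold, but the `velocity` predicate fails because 400 is not greater than 400, so `matched` is `False`.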

Step 1. After a newly published subscription condition is received, it is first inserted into the subscription condition index set. Each predicate in the subscription condition is then traversed and inserted into the multi-index structure in turn. Finally, the pattern of the subscription condition is inserted into the pattern relation set. For the subscription condition in the example above, <type, string, =, Flight>^<height, integer, >, 5000>^<velocity, integer, >, 400>, the predicates are <type, string, =, Flight>, <height, integer, >, 5000>, and <velocity, integer, >, 400>, and the pattern of the subscription condition is <type, height, velocity>.

The inventors designed experiments to compare the time and space performance of the pattern-covering-based simulation data matching method with that of the predicate counting algorithm. The experiments were run on a PC with an Intel(R) Core(TM) 2 Duo 2.93 GHz processor, 2 GB of memory, and the Windows XP SP2 operating system. Each data point in the results is the average of repeated measurements. Following the characteristics of simulation data, the system is assumed to contain 20 object types, i.e., topics, each with 10 to 100 attributes. The attribute names and the attribute count of each topic are fixed, and every attribute can be represented by an enumerated value. Among the attribute types, integer and floating point together account for 50%, enumeration and Boolean together account for 20%, and string accounts for 30%. Every attribute is equally likely to appear in a subscription condition, and each subscription condition contains 10 to 20 predicates. Values of each type are selected at random from a sample space of size 10000 for that type. In addition, the test data is chosen so that the matching success rate is 50% in every run.

Two groups of experiments were run with these parameters. Experiment one records how the time and space consumed by simulation data matching change as the number of subscription conditions increases, with the range of the number of attributes per simulation data item fixed. Experiment two records how the time consumed by simulation data matching changes as the range of the number of attributes per simulation data item widens, with the number of subscription conditions fixed.

In experiment one, each simulation data item contains 10 to 100 attributes, and the number of inserted subscription conditions rises from 1000 to 10000, sampled every 1000. The time and space consumed by matching 1000 simulation data items are recorded. The results are shown in Fig. 3 and Fig. 4.

In experiment two, the number of subscription conditions is set to 1000, and the range of the number of attributes per simulation data item widens gradually from [10, 20] to [10, 100], sampled at every increase of 10. The time consumed by matching 1000 simulation data items is recorded. The results are shown in Fig. 5.

The results of experiment one show that, because the present invention stores and manages predicates efficiently and organizes subscription conditions by pattern relations, its time and space efficiency is clearly higher than that of the predicate counting algorithm, particularly when the number of subscription conditions is large. With 10000 subscriptions in the system, the present invention has 83.4% lower time overhead and 33.92% lower space overhead than the predicate counting algorithm. The results of experiment two show that, as the range of the number of attributes per simulation data item widens, the matching time of the present invention grows fairly gently, whereas the time consumed by the predicate counting algorithm changes considerably and increases markedly.

In summary, the present invention is better suited to simulation data domains that contain multiple topics and in which the number of attributes per topic varies widely.

Claims

1. the data matching method in System for Distributed Interactive Simulation, is characterized in that comprising the following steps:

(1), after receiving the new subscription condition issued, first subscription condition is inserted subscription condition indexed set; Travel through each predicate in subscription condition afterwards, insert many index structures successively; Finally by the pattern intercalation model set of relations of the condition of subscription;

(2) after newly published simulation data is received, initializing the successfully matched result set and the unmatched predicate set to empty, and setting the subscription condition bit vector to 0, i.e., assuming by default that all subscription conditions match;

(3) for each attribute in the simulation data, searching the multi-index structure for all predicates related to that attribute that fail to match, and then setting to 1 all bits in the bit vector that correspond to the subscription sets associated with these predicates, i.e., marking them as unmatched;

(4) searching the pattern relationship set for all pattern sets that are identical to, or fully covered by, the pattern of the simulation data, and traversing in turn the subscription conditions associated with all of these patterns; if the corresponding bit is 0, the match succeeds, otherwise it fails; all matching subscription conditions are thereby obtained;

wherein the multi-index structure used in step (1) organizes and manages predicates by topic, attribute type, operator, and attribute name, specifically as follows:

the first-level index is the topic;

the second-level index is the attribute type, including Boolean, enumeration, integer, floating-point, and string types;

the third-level index is the operator, comprising nine operators: equal "=", not equal "≠", greater than ">", greater than or equal ">=", less than "<", less than or equal "<=", prefix "[", suffix "]", and substring "∝";

the fourth-level index is the attribute name; the last level stores the multiple values corresponding to predicates whose type, operator, and attribute are all identical, wherein values of the enumeration, integer, and floating-point types are stored in an AVL (Adelson-Velskii-Landis) tree; for strings, prefix operations are stored in a prefix tree and suffix operations in a suffix tree, while the other operations, including equal, greater than, greater than or equal, less than, less than or equal, and substring, are likewise stored in an AVL tree; the node corresponding to each value is also associated with the set of subscription conditions that contain that predicate; each level of the index is stored as a hash table;

the matching approach adopted exploits the property that a mismatched predicate causes the subscription condition to which it belongs to be unmatched, specifically:

(a) at the start of matching, setting the bits of all subscription conditions in the bit vector to 0, i.e., assuming by default that all of them match;

(b) in the first matching stage, searching for all unmatched predicates related to the attributes of the simulation data, and setting the bits of the subscription conditions associated with those predicates to 1, i.e., unmatched;

(c) in the second matching stage, testing the subscriptions that satisfy the following condition: the pattern of the subscription is identical to the simulation data pattern or is covered by the simulation data pattern; among these, the subscription conditions whose corresponding bit is 0 are matched successfully.
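The insertion and two-stage matching of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `Matcher`, its methods, and the example topic and attribute names are hypothetical. Each index level is a `dict` (a hash table), but the last level is a plain `dict` scanned linearly rather than an AVL, prefix, or suffix tree; patterns are stored as sets of attribute names rather than bit vectors; and "covered by the simulation data pattern" is assumed to mean that every attribute the subscription constrains is present in the data. The sketch also assumes that a predicate operand and the data value of the same attribute share one type, and it spells "not equal" as `!=`.

```python
import operator

# Each function returns True when the data value satisfies "value <op> operand".
OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "[": lambda value, operand: value.startswith(operand),  # prefix
    "]": lambda value, operand: value.endswith(operand),    # suffix
    "∝": lambda value, operand: operand in value,           # substring
}


class Matcher:
    def __init__(self):
        self.count = 0      # number of subscription conditions inserted
        self.index = {}     # topic -> type -> operator -> attribute -> operand -> ids
        self.patterns = {}  # (topic, frozenset of attribute names) -> ids

    def subscribe(self, topic, predicates):
        """Step (1): predicates is a list of (attribute, operator, operand)."""
        sid = self.count
        self.count += 1
        for attr, op, operand in predicates:
            level = self.index
            for key in (topic, type(operand).__name__, op, attr):
                level = level.setdefault(key, {})
            level.setdefault(operand, set()).add(sid)
        pattern = frozenset(attr for attr, _, _ in predicates)
        self.patterns.setdefault((topic, pattern), set()).add(sid)
        return sid

    def match(self, topic, data):
        # Step (2): bit 0 means "matched by default".
        bits = [0] * self.count
        # Step (3): flag every subscription that owns a violated predicate.
        for attr, value in data.items():
            by_op = self.index.get(topic, {}).get(type(value).__name__, {})
            for op, by_attr in by_op.items():
                for operand, sids in by_attr.get(attr, {}).items():
                    if not OPS[op](value, operand):
                        for sid in sids:
                            bits[sid] = 1
        # Step (4): keep subscriptions whose pattern is identical to or
        # contained in the data pattern and whose bit is still 0.
        present = set(data)
        matched = set()
        for (sub_topic, pattern), sids in self.patterns.items():
            if sub_topic == topic and pattern <= present:
                matched.update(sid for sid in sids if bits[sid] == 0)
        return sorted(matched)


m = Matcher()
fast = m.subscribe("tank", [("speed", ">", 10), ("name", "[", "T-")])
slow = m.subscribe("tank", [("speed", "<=", 10)])
fuel = m.subscribe("tank", [("fuel", ">", 5)])
print(m.match("tank", {"speed": 30, "name": "T-72"}))
```

In this example only the first subscription is returned: the second is flagged in the first stage because `30 <= 10` fails, and the third is skipped in the second stage because the data carries no `fuel` attribute.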

2. The data matching method as claimed in claim 1, characterized in that the pattern relationship set used in step (1) is organized according to the covering relationships between patterns, specifically as follows:

each pattern in the pattern relationship set is represented by a bit vector in which each bit corresponds to one attribute; if a bit is "1", the corresponding attribute is present, otherwise it is absent; the covering relationship between patterns is determined by bitwise operations: for pattern A and pattern B, if the result of "A and B" is A, then A covers B; if the result is B, then B covers A; otherwise there is no covering relationship between A and B.
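The bitwise covering test can be sketched in Python using integers as bit vectors. The attribute order in `ATTRS` and the function names `encode` and `relation` are assumptions made for illustration; the separate "identical" result is also an addition here, since the claim wording does not single out the case in which both patterns are equal.

```python
# Hypothetical attribute order: bit i of a pattern corresponds to ATTRS[i].
ATTRS = ["speed", "name", "fuel"]


def encode(names):
    """Build a pattern bit vector with a 1 for every attribute present."""
    bits = 0
    for name in names:
        bits |= 1 << ATTRS.index(name)
    return bits


def relation(a, b):
    """Classify patterns A and B with the bitwise AND test described in the claim."""
    both = a & b
    if a == b:
        return "identical"
    if both == a:
        return "A covers B"
    if both == b:
        return "B covers A"
    return "no covering relationship"


print(relation(encode(["speed"]), encode(["speed", "name"])))
print(relation(encode(["speed", "name"]), encode(["speed"])))
print(relation(encode(["speed"]), encode(["fuel"])))
```

A single AND and one comparison decide the relationship, so the cost of the test does not depend on how many attributes each pattern contains as long as the bit vector fits the integer representation in use.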