CN113722332B - Method and system for improving efficiency and robustness of matching algorithm based on data structure

Info

Publication number: CN113722332B
Application number: CN202111056560.5A
Authority: CN
Inventors: 钱诗友; 廖政宇; 曹健; 薛广涛
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-09-09
Filing date: 2021-09-09
Publication date: 2024-03-26
Anticipated expiration: 2041-09-09
Also published as: CN113722332A

Abstract

The invention provides a method and a system for improving the efficiency and robustness of matching algorithms based on a data structure. The method comprises indexing subscriptions with a preset data structure on which the matching algorithms operate. The preset data structure comprises a two-level index layer and a storage layer. The first-level index layer maps by attribute: predicates on the same attribute are mapped into the same attribute unit. The second-level index layer maps by interval predicate width: predicates are mapped into different width units according to their widths, so that interval predicates with the same width but different centers are mapped into the same width unit. The storage layer stores the subscriptions. The width units are divided uniformly.

Description

Method and system for improving efficiency and robustness of matching algorithm based on data structure

Technical Field

The invention relates to matching algorithms in an event distribution network, in particular to a method and a system for improving the efficiency and the robustness of a matching algorithm based on a data structure, and more particularly to a preset data structure that supports efficient and robust matching algorithms in a publish/subscribe system.

Background

The publish/subscribe system initially appears as a news subsystem. It achieves complete decoupling of both parties in terms of time, space and synchronization. Because of its attractive nature, publish/subscribe systems are widely deployed in many areas, such as system monitoring and management, real-time stock updates, online gaming, online advertising, and social media messaging. In particular, content-based publish/subscribe systems allow subscribers to express their interest in events using boolean expressions, enabling fine-grained selective information distribution.

The matching algorithm is a key module of a large-scale publish/subscribe system. To improve matching performance, researchers have proposed many matching algorithms based on different data structures. However, the performance of the matching algorithm is affected by various factors, so that the performance and the robustness of the existing matching algorithm in a dynamic environment are poor.

Patent document CN110427217B (application number 201910672885.2) discloses a lightweight parallel method and system for matching algorithms in content-based publish/subscribe systems. The index structure of a storage data structure is layered into a plurality of levels, each level corresponding to a set of storage units of the storage data structure; the levels are grouped, and each level group comprises a level together with its corresponding set of storage units. A matching thread is set for each level group, matching events are distributed independently to individual matching threads for processing, and the matching threads update an indicator concurrently, with the indicator synchronized on update. Matching performance is thereby improved, and the parallelism is adjusted dynamically according to the performance requirement to ensure fast and reliable distribution of events. The optimal parallelism is determined by an iterative optimization method, which improves the task allocation of threads at a low time overhead.

To address the poor performance and robustness of existing matching algorithms in dynamic environments, the present invention proposes a new data structure. The data structure supports several matching algorithms at the same time, so that matching can use the algorithm best suited to the current environment. This reduces the influence of the dynamic environment on matching performance and yields better matching performance and stability.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a method and a system for improving the efficiency and the robustness of a matching algorithm based on a data structure.

The method for improving the efficiency and the robustness of the matching algorithm based on the data structure provided by the invention comprises the following steps: indexing the subscriptions based on a matching algorithm by using a preset data structure;

the preset data structure comprises a two-level index layer and a storage layer;

the first-level index layer is based on mapping of attributes, and predicates with the same attribute are mapped into the same attribute unit;

the second-level index layer is based on mapping of interval predicate widths, and predicates are mapped into different width units according to their interval predicate widths, so that interval predicates with the same width but different centers can be mapped into the same width unit;

the storage layer is used for storing the subscriptions;

the width units are divided in a uniform manner.

Preferably, the storage layer uses B+ trees for storage; two B+ trees are provided for each width unit, corresponding respectively to the low values and the high values of the interval predicates, and the low value tree is provided with links to the high value tree; the B+ tree is self-balancing and keeps the inserted elements in order.

Preferably, in the matching process, two sets of markers and one set of recorders are also included; one group of markers uses a bit set to mark unmatched subscriptions, and the other group uses a counter to count the number of matched predicates in the subscriptions; the recorder is used for recording task partitions in hybrid matching.

Preferably, the matching algorithm comprises: forward matching AFM, reverse matching ABM, and hybrid matching AHM;

the forward matching AFM counts matched predicates, and all low value trees in the data structure are checked during matching;

the reverse matching ABM marks unmatched predicates;

the hybrid matching AHM combines the forward matching and reverse matching methods by performing task division on the width units: forward matching is used on the width units that index narrow-interval predicates, and reverse matching is used on the other width units.

Preferably, the forward matching employs:

step S1: incrementing by one the counter of the subscription corresponding to each predicate indexed in the [v', v] space; wherein v represents an event value and, in one width unit, the width range of the interval predicates indexed by the width unit is assumed to be [w, w'], so that v' = v - w;

step S2: for predicates indexed in the [v'', v'] space, checking the high value of each predicate on the high value tree through the pointers arranged on the B+ tree; when the high value of the predicate is greater than or equal to v, the predicate is matched and the counter of the subscription containing the predicate is incremented by one; wherein v'' = v - w';

step S3: after all low value tree operations in the data structure are completed, the counters are checked, and when the value of a counter is the same as the number of predicates of the corresponding subscription, the current subscription is matched.

Preferably, the reverse matching employs:

step S4: marking all predicates with low values larger than the event value v on the low value tree;

step S5: marking all predicates with high values smaller than the event value v on the high value tree;

step S6: the markers are checked, and the unmarked subscriptions are matched.

Preferably, the hybrid matching employs: performing task division on the width units, using forward matching on the width units that index narrow-interval predicates and reverse matching on the other width units; recording, through a recorder, the number of predicates in each subscription that are assigned to forward matching; after the forward matching and the reverse matching are completed, checking, for each subscription left unmarked by the reverse matching, whether the values of its counter and its recorder are equal, and if so, the current subscription is matched;

the narrow-interval predicate is a predicate with interval width smaller than a preset value;

the division point used by the hybrid matching to allocate the width units should be chosen so that, as far as possible, the forward matching and the reverse matching take the same matching time; assuming that the width at the division point is κ, the forward and reverse matching have similar matching times when κ satisfies the following equation;

where v represents the event value, I and J represent the unit costs of performing the marking and the counting, respectively; Γ(x) represents the probability that the low or high value of a predicate equals x; x represents a random variable.

Preferably, the entire search space is divided into a matching space, a non-matching space, a candidate space, and an empty space according to the information of each width unit; all predicates in the matching space satisfy the condition, and when the forward matching algorithm is used, the counters of the subscriptions corresponding to these predicates are incremented directly; no predicate in the non-matching space satisfies the condition, and when the reverse matching algorithm is used, the subscriptions corresponding to these predicates are marked directly; the empty space does not contain any predicates and need not be checked during matching, which reduces the traversal cost.

Preferably, string type matching and fuzzy matching are supported;

the character string type matching is realized by converting the character type into the form of interval predicates;

the fuzzy matching considers that all predicates in the candidate space meet the condition, and the high value of the predicates is not further checked on the high value tree;

in forward matching, the matching efficiency is further improved by omitting the checking of possible matching predicates; however, certain errors are brought to the matching result, and certain false positive subscriptions are contained in the matched subscriptions;

given the number ζ of width units per attribute, the predicate maximum error rate f on a single attribute is:

given the maximum error rate F allowed, the number of width cells divided on each attribute is calculated.

The system for improving the efficiency and the robustness of the matching algorithm based on the data structure provided by the invention comprises the following components: indexing the subscription based on a matching algorithm by using a preset data structure;

the storage layer is used for storing subscription;

the width units are divided in a uniform manner.

Compared with the prior art, the invention has the following beneficial effects:

the preset data structure provided by the invention has the main advantages that a plurality of matching algorithms can be simultaneously supported, and the efficient and stable matching performance can be realized in a dynamic environment. First, the data structures of existing matching algorithms can mostly only support one matching algorithm, which makes it difficult for their data structures to support other matching algorithms to further improve performance. Secondly, the invention can mix a plurality of matching algorithms, thereby solving the defect of single algorithm performance fluctuation under dynamic environment. This feature enables the present invention to maintain more efficient and stable matching performance in a dynamic environment, thereby enabling quality of service (QoS) of event distribution services to be guaranteed in some more diverse scenarios.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is an abstract view of a data structure of the present invention.

FIG. 2 is a schematic of an AFM algorithm of the present invention.

FIG. 3 is a schematic representation of the ABM algorithm of the present invention.

FIG. 4 is a schematic view of AFM algorithm optimization according to the present invention.

FIG. 5 is a schematic diagram of the ABM algorithm optimization of the present invention.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.

Example 1

The technical solution of the invention is as follows: event matching performance is crucial to the performance of a content-based publish/subscribe system, and the preset data structure and the three matching algorithms built on it can cope with matching requirements in a dynamic environment, thereby achieving more efficient and stable matching performance.

The methods adopted by existing matching algorithms can be divided into two types: forward matching and reverse matching. One of the parameters affecting the performance of a matching algorithm is the matching probability of the subscriptions: as the matching probability increases, the performance of forward matching algorithms decreases, while the performance of reverse matching algorithms increases. Therefore, the invention provides a preset data structure for indexing subscriptions that supports three methods: forward matching, reverse matching and hybrid matching. The hybrid method uses both the forward method and the reverse method during event matching and exploits the advantages of each, thereby improving the efficiency and the robustness of event matching.

To achieve efficient and stable matching performance, the invention first proposes a new index structure. The index structure adopts multi-level division, and interval predicates are mapped according to their widths, so that several matching algorithms are supported. By analyzing the subscriptions in each index unit, the invention further provides three efficient matching algorithms suited to different environmental requirements. Together, these ensure the required performance and stability in a dynamic environment.

the storage layer is used for storing subscription;

the width units are divided in a uniform manner.

Specifically, the storage layer adopts B+ trees for storage, two B+ trees are arranged for each width unit, the low value and the high value of the interval predicate are respectively corresponding to the two B+ trees, and links to the high value tree are arranged on the low value tree; the B+ tree can realize the self-balance of the tree and ensure the order of the inserted elements.

Specifically, in the matching process, the method also comprises two groups of markers and one group of recorders; one group of markers uses a bit set to mark unmatched subscriptions, and the other group uses a counter to count the number of matched predicates in the subscriptions; the recorder is used for recording task partitions in hybrid matching.

Specifically, the matching algorithm includes: forward matching AFM, reverse matching ABM, and hybrid matching AHM;

the reverse matching ABM is a mode of marking unmatched predicates;

Specifically, the forward matching employs:

Specifically, the reverse matching employs:

step S6: the markers are checked, and the unmarked subscriptions are matched.

Specifically, the hybrid matching employs: performing task division on the width units, using forward matching on the width units that index narrow-interval predicates and reverse matching on the other width units; recording, through a recorder, the number of predicates in each subscription that are assigned to forward matching; after the forward matching and the reverse matching are completed, checking, for each subscription left unmarked by the reverse matching, whether the values of its counter and its recorder are equal, and if so, the current subscription is matched;

Specifically, according to the information of each width unit, the whole search space is divided into a matching space, a non-matching space, a candidate space and an empty space; all predicates in the matching space satisfy the condition, and when the forward matching algorithm is used, the counters of the subscriptions corresponding to these predicates are incremented directly; no predicate in the non-matching space satisfies the condition, and when the reverse matching algorithm is used, the subscriptions corresponding to these predicates are marked directly; the empty space does not contain any predicates and need not be checked during matching, which reduces the traversal cost.

Specifically, supporting string type matching and fuzzy matching;

The system for improving the efficiency and the robustness of the matching algorithm based on the data structure can be realized through the step flow in the method for improving the efficiency and the robustness of the matching algorithm based on the data structure. Those skilled in the art can understand the method for improving the efficiency and the robustness of the matching algorithm based on the data structure as a preferred example of the system for improving the efficiency and the robustness of the matching algorithm based on the data structure.

Example 2

Example 2 is a preferred example of example 1

Existing matching algorithms can be broadly divided into two categories. One type is forward matching, which focuses on finding matching predicates to determine which subscriptions are matching. Such matching algorithms can be further divided into count-based matching and tree-structure filtering-based matching. Another type of matching algorithm is reverse matching. Their main idea is to indirectly determine matching subscriptions by determining which predicates are not matching. The data structures of these algorithms can only support a single matching method. For forward matching, the efficiency of the matching algorithm decreases as the number of matching predicates increases, and reverse matching is the opposite. Therefore, the dynamic environment cannot be gracefully adapted with a single matching method.

Matching algorithms can also be divided into exact matching and fuzzy matching. Unlike exact matching, fuzzy matching may report some non-matching subscriptions as matching, that is, it accepts false positives. In this way, matching performance is improved at the cost of tolerating a certain misjudgment rate. In addition, according to the event types they support, matching algorithms can be further divided into single-event-type and multi-event-type algorithms. Compared with single-event-type support, multi-event-type support provides richer subscription expressions and ensures event matching in a high-dimensional space.

To overcome the drawback that existing matching algorithms each adopt a single matching method, and to realize a publish/subscribe system with stronger generality, higher matching efficiency and more stable performance, the invention proposes a preset data structure. The overall framework of the data structure is shown in Fig. 1. The entire data structure is divided into a two-level index layer and a storage layer. The first-level index is attribute-based: predicates on the same attribute are mapped into the same attribute unit. The second-level index is a mapping based on interval predicate width: the width of each predicate is computed first, and the predicate is then mapped into a width unit according to that width. This mapping approach enables interval predicates of the same width but different centers to be mapped into the same width unit. The width units are divided in a uniform manner. For example, when the value range space [0,1] is divided into 5 width units, each width unit covers a width range of 0.2: the width range of the first width unit is [0,0.2], the width range of the second width unit is [0.2,0.4], and so on.
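The uniform width-unit mapping described above can be illustrated with a minimal Python sketch. It assumes attribute values normalised to [0, 1]. The function name `width_unit_index` and the choice to place a width lying exactly on a boundary into the upper unit are illustrative assumptions and are not taken from the patent.

```python
def width_unit_index(low: float, high: float, num_units: int) -> int:
    """Return the index of the width unit for the interval predicate [low, high].

    Assumption: values are normalised to [0, 1] and the width axis is split
    uniformly into `num_units` units, each covering a width range of 1/num_units.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    width = high - low
    # A width of exactly 1.0 would fall past the last unit, so clamp it.
    return min(int(width * num_units), num_units - 1)
```

With 5 width units, the predicates [0.1, 0.25] and [0.6, 0.75] have the same width but different centers, and both land in the first width unit (index 0).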

The storage layer stores the subscriptions and uses B+ trees for storage. Two B+ trees are provided for each width unit, one for the low values and one for the high values of the interval predicates, and links to the high value tree are arranged on the low value tree. The B+ tree is self-balancing and keeps the inserted elements in order. In addition, two sets of markers and one set of recorders are required in the matching process. One set of markers uses a bit set to mark non-matching subscriptions, and the other set uses counters to count the number of matching predicates in each subscription. The recorders record the task partition used in hybrid matching.

To insert a subscription, each of its predicates is mapped into the corresponding width unit according to its attribute and its width, and the low value and the high value of the predicate are then inserted into the two B+ trees of that width unit, respectively.
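The insertion process can be sketched as follows. This is a minimal illustration rather than the patented implementation: sorted Python lists maintained with `bisect` stand in for the B+ trees, the link from the low value tree to the high value tree is modelled by storing the high value next to the low value, and the names `WidthUnit`, `SubscriptionIndex`, `attributes` and `predicate_count` are hypothetical. Attribute values are assumed to be normalised to [0, 1].

```python
import bisect


class WidthUnit:
    """One width unit; two sorted lists stand in for the two B+ trees."""

    def __init__(self):
        # Entries are (low, high, sub_id). Carrying `high` plays the role of
        # the link from the low value tree to the high value tree.
        self.low_tree = []
        # Entries are (high, sub_id), kept sorted by high value.
        self.high_tree = []


class SubscriptionIndex:
    """Hypothetical two-level index: attribute -> width units -> trees."""

    def __init__(self, num_units):
        self.num_units = num_units
        self.attributes = {}       # first level: attribute -> list of WidthUnit
        self.predicate_count = {}  # sub_id -> number of predicates

    def insert(self, sub_id, predicates):
        """Insert a subscription given as {attribute: (low, high)}."""
        for attr, (low, high) in predicates.items():
            if attr not in self.attributes:
                self.attributes[attr] = [WidthUnit() for _ in range(self.num_units)]
            # Second level: choose the width unit from the predicate width.
            k = min(int((high - low) * self.num_units), self.num_units - 1)
            unit = self.attributes[attr][k]
            bisect.insort(unit.low_tree, (low, high, sub_id))
            bisect.insort(unit.high_tree, (high, sub_id))
        self.predicate_count[sub_id] = len(predicates)
```

A sorted list has linear-time insertion, so it is only a stand-in here; a real B+ tree would give logarithmic insertion while preserving the same ordered traversal.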

Three matching algorithms based on preset data structure

Based on a preset data structure, the invention provides three matching algorithms, namely forward matching (AFM), reverse matching (ABM) and hybrid matching (AHM). The three matching algorithms are based on the same data structure.

(1) Forward matching counts matched predicates. During matching, all low value trees in the data structure need to be checked. Fig. 2 shows the search space of forward matching on the low value tree. In one width unit, assume that the width range of the interval predicates indexed by the width unit is [w, w']. For an event value v, let v' = v - w and v'' = v - w'. The predicates whose low values lie in the [v', v] space are all matched, because a predicate with low value l in that space has a high value of at least l + w, which is at least v. The remaining possibly matching predicates are contained within the [v'', v'] space. Forward matching thus involves three steps. Step 1: the counter of the subscription corresponding to each predicate indexed in the [v', v] space is incremented by one. Step 2: for each predicate in the [v'', v'] space, the high value is checked through the pointer arranged on the B+ tree; if the high value is greater than or equal to v, the predicate is matched and the counter of the subscription containing it is also incremented. Step 3: after all low value tree operations in the data structure are completed, the counters are checked; if the value of a counter equals the number of predicates of the corresponding subscription, the subscription is matched.
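The three steps of forward matching can be sketched in Python as below. This is an illustrative sketch under stated assumptions: each low value tree is a sorted list of (low, high, sub_id) tuples standing in for a B+ tree with links, `units` maps an attribute to a list of (w, w_max, low_tree) triples where w_max plays the role of w', and the function and parameter names are hypothetical.

```python
import bisect
from collections import Counter


def forward_match(units, event, predicate_count):
    """Return the sorted ids of subscriptions matched by `event`.

    units: {attribute: [(w, w_max, low_tree), ...]}, low_tree sorted by low value.
    event: {attribute: value}.
    predicate_count: {sub_id: number of predicates in the subscription}.
    """
    counter = Counter()
    for attr, v in event.items():
        for w, w_max, low_tree in units.get(attr, []):
            v1 = v - w       # v'  in the text
            v2 = v - w_max   # v'' in the text
            lo = bisect.bisect_left(low_tree, (v2,))   # first low >= v''
            mid = bisect.bisect_left(low_tree, (v1,))  # first low >= v'
            hi = bisect.bisect_right(low_tree, (v, float("inf")))  # past last low <= v
            # Step 1: predicates with low in [v', v] are matched without a check.
            for _, _, sub_id in low_tree[mid:hi]:
                counter[sub_id] += 1
            # Step 2: predicates with low in [v'', v') need their high value checked.
            for _, high, sub_id in low_tree[lo:mid]:
                if high >= v:
                    counter[sub_id] += 1
    # Step 3: a subscription matches when all of its predicates were counted.
    return sorted(s for s, n in counter.items() if n == predicate_count[s])
```

The sketch compares floating-point values directly; an implementation working with values that are not exactly representable would need to treat the boundaries of the unchecked [v', v] space with care.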

(2) Reverse matching marks unmatched predicates. Fig. 3 shows the search space of reverse matching. Within a width unit, for an event value v, all predicates with low values greater than v or high values less than v are unmatched. Reverse matching can therefore be divided into three steps. First, on the low value tree, all predicates with low values greater than v are marked. Second, on the high value tree, all predicates with high values less than v are marked. Finally, the markers are checked, and the unmarked subscriptions are matched.
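A minimal Python sketch of the three reverse matching steps follows. Sorted lists stand in for the B+ trees, a `bytearray` stands in for the bit set, and subscription ids are assumed to be the integers 0 to num_subs - 1. The names are hypothetical, and marking every predicate on an attribute that the event does not carry is an assumption of this sketch that the text does not discuss.

```python
import bisect


def reverse_match(units, event, num_subs):
    """Return the ids of subscriptions left unmarked after reverse matching.

    units: {attribute: [(low_tree, high_tree), ...]} where low_tree holds
           sorted (low, sub_id) pairs and high_tree sorted (high, sub_id) pairs.
    event: {attribute: value}.
    """
    marked = bytearray(num_subs)  # stand-in for the bit set of markers
    for attr, trees in units.items():
        v = event.get(attr)
        for low_tree, high_tree in trees:
            if v is None:
                # Assumption: a predicate on a missing attribute is unmatched.
                for _, sub_id in low_tree:
                    marked[sub_id] = 1
                continue
            # Step 1: mark predicates whose low value is greater than v.
            first_gt = bisect.bisect_right(low_tree, (v, float("inf")))
            for _, sub_id in low_tree[first_gt:]:
                marked[sub_id] = 1
            # Step 2: mark predicates whose high value is less than v.
            first_ge = bisect.bisect_left(high_tree, (v,))
            for _, sub_id in high_tree[:first_ge]:
                marked[sub_id] = 1
    # Step 3: the unmarked subscriptions are the matches.
    return [sub_id for sub_id in range(num_subs) if not marked[sub_id]]
```

Unlike forward matching, no per-predicate check is needed: each tree is split at a single position and one side is marked wholesale, which is why the cost falls as more predicates match.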

(3) Hybrid matching combines the forward matching and reverse matching methods by dividing the width units between them. Forward matching is used for the width units that index narrow-interval predicates (predicates with interval widths less than a given threshold), and reverse matching is used for the other width units. A recorder records, for each subscription, the number of its predicates assigned to forward matching. After the forward matching and reverse matching parts are completed, for each subscription left unmarked by reverse matching, it is checked whether the values of its counter and its recorder are equal. If they are equal, the subscription is matched.
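The hybrid scheme can be sketched in Python as follows, under the same simplifications as before: sorted lists stand in for the B+ trees, subscription ids are the integers 0 to num_subs - 1, and all names (`build_recorder`, `hybrid_match`, `kappa`, `w_max`) are hypothetical. The sketch assigns a width unit to forward matching when its upper width bound does not exceed `kappa`; how the threshold itself is chosen is outside the scope of the example.

```python
import bisect


def build_recorder(units, num_subs, kappa):
    """Count, per subscription, the predicates assigned to forward matching."""
    recorder = [0] * num_subs
    for cells in units.values():
        for _, w_max, low_tree, _ in cells:
            if w_max <= kappa:
                for _, _, sub_id in low_tree:
                    recorder[sub_id] += 1
    return recorder


def hybrid_match(units, event, num_subs, kappa, recorder):
    """units: {attribute: [(w, w_max, low_tree, high_tree), ...]} with
    low_tree holding sorted (low, high, sub_id) and high_tree (high, sub_id)."""
    marked = bytearray(num_subs)   # markers used by the reverse part
    counter = [0] * num_subs       # counters used by the forward part
    for attr, cells in units.items():
        v = event.get(attr)
        for w, w_max, low_tree, high_tree in cells:
            if w_max <= kappa:     # narrow width unit: forward matching
                if v is None:
                    continue       # counter stays below recorder: no match
                lo = bisect.bisect_left(low_tree, (v - w_max,))
                mid = bisect.bisect_left(low_tree, (v - w,))
                hi = bisect.bisect_right(low_tree, (v, float("inf")))
                for _, _, sub_id in low_tree[mid:hi]:
                    counter[sub_id] += 1
                for _, high, sub_id in low_tree[lo:mid]:
                    if high >= v:
                        counter[sub_id] += 1
            else:                  # wide width unit: reverse matching
                if v is None:      # assumption: missing attribute never matches
                    for _, sub_id in high_tree:
                        marked[sub_id] = 1
                    continue
                first_gt = bisect.bisect_right(low_tree, (v, float("inf")))
                for _, _, sub_id in low_tree[first_gt:]:
                    marked[sub_id] = 1
                first_ge = bisect.bisect_left(high_tree, (v,))
                for _, sub_id in high_tree[:first_ge]:
                    marked[sub_id] = 1
    return [s for s in range(num_subs)
            if not marked[s] and counter[s] == recorder[s]]
```

In this sketch the recorder is computed once per threshold rather than per event; setting `kappa` to 0 or to 1 degenerates to pure reverse or pure forward matching, which gives a simple consistency check.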

The division points of the width unit task assignment by the hybrid matching need to be such that the forward matching and the reverse matching have the same matching time as much as possible. Assuming that the width of the division point is κ, the forward and reverse matches have similar matching times when κ satisfies the following equation.

Where v represents the value of the event, I and J represent the unit costs of performing the marking and the counting, respectively, and Γ(x) represents the probability that the low or high value of a predicate equals x. Solving the above equation, for example when predicates are uniformly distributed in the value range space, gives a value for κ. Taking the width unit containing κ as the boundary, the width units that index predicates with widths smaller than κ are assigned to forward matching and the remaining width units are assigned to reverse matching, which completes the task partitioning.

Reduced search space optimization for matching algorithms

As shown in Figs. 4 and 5, according to the information of each width unit, that is, the upper and lower limits of the interval predicate widths that can be mapped to the width unit, the entire search space can be divided into a matching space, a non-matching space, and an empty space. All predicates in the matching space satisfy the condition, and when the forward matching algorithm is used, the counters of the subscriptions corresponding to these predicates can be incremented directly. No predicate in the non-matching space satisfies the condition, and when the reverse matching algorithm is used, the subscriptions corresponding to these predicates can be marked directly. The empty space does not contain any predicates and need not be checked during matching, which reduces the traversal cost.

Both forward and reverse matching involve determining a target space. Forward matching needs to determine the space in which the matching predicates are located, while reverse matching needs to determine the space in which the non-matching predicates are located. The invention provides an optimization that reduces the search space in certain situations. Let the value range space of the attribute be [0,1]. First, for a width unit with a given predicate width range of [w, w'], no predicates are contained in the [1-w,1] space on the low value tree or in the [0,w] space on the high value tree.

For forward matching, given an event value v, when v lies within [1-w, w], all predicates on the low value tree are matched. Therefore, when v is in this interval, the whole matching predicate space can be determined with only two comparisons, which reduces the time spent searching the B+ tree.

For reverse matching, when v is greater than 1-w on a low value tree or less than w on a high value tree, there is no unmatched predicate on the corresponding low or high value tree. Therefore, when v satisfies the above condition, it can be determined that the low value tree or the high value tree does not contain a mismatch predicate by two comparisons.

By this method, the cost of locating the target space is reduced and the matching efficiency is further improved. The optimization is most effective when the predicate width exceeds half of the value range space, since the interval [1-w, w] is non-empty only when w is at least 0.5.
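The comparisons behind this optimization are small enough to write out directly. The following Python sketch assumes a value range of [0, 1] and a width unit whose minimum predicate width is w; the three function names are hypothetical.

```python
def forward_all_match(v: float, w: float) -> bool:
    """True when every predicate in the width unit matches the event value v.

    Low values cannot exceed 1 - w and high values cannot fall below w, so
    any v within [1 - w, w] lies inside every predicate of the unit.
    """
    return 1.0 - w <= v <= w


def reverse_skip_low_tree(v: float, w: float) -> bool:
    """True when no low value can be greater than v (nothing to mark)."""
    return v > 1.0 - w


def reverse_skip_high_tree(v: float, w: float) -> bool:
    """True when no high value can be less than v (nothing to mark)."""
    return v < w
```

For a width unit with w = 0.75 and an event value of 0.5, `forward_all_match` holds, so all predicates of the unit can be counted without searching the tree; for w = 0.25 the interval [0.75, 0.25] is empty and the shortcut never applies.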

String type matching and fuzzy matching support

Fuzzy matching means that the matching result is not exact and may contain false positives, i.e. a non-matching subscription is judged to be matching. Fuzzy matching algorithms generally improve matching performance by tolerating a certain rate of false positives.

The invention supports string type matching by converting string predicates into the form of interval predicates. The supported operators comprise <, ≤, =, >, ≥, and prefix matching with a trailing wildcard. The conversion is as follows:

Ai < “abcde” → Ai ∈ [“”, “abcde”)

Ai ≤ “abcde” → Ai ∈ [“”, “abcde”]

Ai = “abcde” → Ai ∈ [“abcde”, “abcde”]

Ai > “abcde” → Ai ∈ (“abcde”, “INF”]

Ai ≥ “abcde” → Ai ∈ [“abcde”, “INF”]

Ai = “abcd*” → Ai ∈ [“abcd”, “abce”)
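The conversions listed above can be expressed as a short Python sketch. The representation of an interval as a (low, low_closed, high, high_closed) tuple, the use of `None` as a sentinel for the “INF” upper bound, and the function names are assumptions made for illustration. The sketch also assumes that a trailing `*` on an equality predicate always denotes a prefix match and that the prefix is non-empty.

```python
INF = None  # hypothetical sentinel standing in for the "INF" upper bound


def to_interval(op, s):
    """Convert a string predicate into (low, low_closed, high, high_closed)."""
    if op == "<":
        return ("", True, s, False)
    if op == "<=":
        return ("", True, s, True)
    if op == "=":
        if s.endswith("*"):
            prefix = s[:-1]
            # Upper bound: the prefix with its last character advanced by one,
            # e.g. "abcd" -> "abce". Assumes that character is not the maximum
            # code point.
            upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
            return (prefix, True, upper, False)
        return (s, True, s, True)
    if op == ">":
        return (s, False, INF, True)
    if op == ">=":
        return (s, True, INF, True)
    raise ValueError(f"unsupported operator: {op}")


def contains(interval, value):
    """Check whether a string event value falls inside the interval."""
    low, low_closed, high, high_closed = interval
    above = value >= low if low_closed else value > low
    if high is INF:
        return above
    below = value <= high if high_closed else value < high
    return above and below
```

For example, `to_interval("=", "abcd*")` yields the half-open interval from “abcd” to “abce”, which contains “abcdxyz” but not “abce” or “abc”. Indexing such intervals by width would additionally require mapping strings to numeric values, which this sketch does not attempt.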

The present invention also supports fuzzy matching: all predicates in the candidate space shown in FIG. 2 are considered satisfied, and the high values of these predicates are not further checked on the high value tree. In forward matching, the matching efficiency is further improved by omitting the check of possibly matching predicates. However, this introduces some error into the matching result, i.e. the matched subscriptions contain some false positive subscriptions. Given the number ζ of width units per attribute, the predicate maximum error rate f on a single attribute is:

given the maximum error rate F allowed, equation (2) can be used to calculate the number of divided width cells on each attribute.

Those skilled in the art will appreciate that the systems, apparatus, and their respective modules provided herein may be implemented entirely by logic programming of method steps such that the systems, apparatus, and their respective modules are implemented as logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the systems, apparatus, and their respective modules being implemented as pure computer readable program code. Therefore, the system, the apparatus, and the respective modules thereof provided by the present invention may be regarded as one hardware component, and the modules included therein for implementing various programs may also be regarded as structures within the hardware component; modules for implementing various functions may also be regarded as being either software programs for implementing the methods or structures within hardware components.

The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the invention. The embodiments of the present application and features in the embodiments may be combined with each other arbitrarily without conflict.

Claims

1. A method for improving the efficiency and robustness of a matching algorithm based on a data structure, comprising: indexing the subscription based on a matching algorithm by using a preset data structure;

the two-stage index layer comprises a first-stage index layer and a second-stage index layer;

the storage layer is used for storing subscription;

the width units are divided in a uniform manner;

the matching algorithm comprises the following steps: forward matching AFM, reverse matching ABM, and hybrid matching AHM;

the reverse matching ABM is a mode of marking unmatched predicates;

the hybrid matching AHM combines the forward matching and reverse matching methods, performs task division on the width units, uses forward matching on the width units used for indexing narrow interval predicates, and uses reverse matching on the remaining width units;

the hybrid matching adopts: performing task division on the width units, using forward matching on the width units that index narrow interval predicates, and using reverse matching on the other width units; recording, through a recorder, the number of predicates in each subscription that are assigned to forward matching; after the forward matching and the reverse matching are completed, checking, for each subscription left unmarked by the reverse matching, whether the values of its counter and its recorder are equal, and if so, the current subscription is matched;

wherein v represents an event value, and I and J represent unit costs of performing marking and counting, respectively; Γ (x) represents the probability that the low or high value of the predicate equals x; x represents a random variable.
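The counter-versus-recorder check of hybrid matching can be illustrated with a minimal sketch. It is not the claimed implementation: a linear scan stands in for the two-level index, the recorder is filled during the scan rather than at insertion time, and the names `hybrid_match` and `narrow_limit` (the width threshold separating forward-matched from reverse-matched predicates) are hypothetical.

```python
def hybrid_match(subscriptions, event, narrow_limit):
    """Return the ids of subscriptions matched by the event.

    subscriptions: {sub_id: {attribute: (low, high)}}
    event:         {attribute: value}
    narrow_limit:  assumed width threshold; predicates at most this wide
                   are handled by forward matching, the rest by reverse matching.
    """
    counter = {s: 0 for s in subscriptions}     # matched narrow predicates
    recorder = {s: 0 for s in subscriptions}    # narrow predicates per subscription
    marked = {s: False for s in subscriptions}  # set when a wide predicate fails

    for sub_id, predicates in subscriptions.items():
        for attr, (low, high) in predicates.items():
            value = event.get(attr)
            satisfied = value is not None and low <= value <= high
            if high - low <= narrow_limit:
                # Forward matching: count satisfied narrow predicates.
                recorder[sub_id] += 1
                if satisfied:
                    counter[sub_id] += 1
            elif not satisfied:
                # Reverse matching: mark subscriptions with a failed wide predicate.
                marked[sub_id] = True

    # Final check: unmarked subscriptions whose counter equals their recorder.
    return sorted(s for s in subscriptions
                  if not marked[s] and counter[s] == recorder[s])
```

A subscription that has only wide predicates keeps both counter and recorder at zero, so it is matched whenever reverse matching leaves it unmarked.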

2. The method for improving the efficiency and the robustness of a matching algorithm based on a data structure according to claim 1, wherein the storage layer uses B+ trees for storage; two B+ trees are provided for each width unit, corresponding respectively to the low values and the high values of the interval predicates, and links to the high value tree are arranged on the low value tree; the B+ tree is self-balancing and keeps the inserted elements in order;

and mapping the corresponding attribute of each predicate and the width of each predicate into a corresponding width unit, and respectively inserting the low value and the high value of each predicate into two B+ trees to complete the subscription insertion process.
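The insertion process can be sketched as follows. This is an illustration under stated assumptions rather than the claimed structure: attribute values are assumed to be normalized to [0, 1], `NUM_CELLS` is an arbitrary number of width units, and sorted Python lists maintained with `bisect` stand in for the two B+ trees. All names are hypothetical.

```python
import bisect
from collections import defaultdict

NUM_CELLS = 4  # assumed number of uniformly divided width units per attribute


def width_cell(low, high, num_cells=NUM_CELLS):
    """Map a predicate width in [0, 1] to the index of its width unit."""
    width = high - low
    return min(int(width * num_cells), num_cells - 1)


class WidthUnit:
    """One width unit; two sorted lists stand in for the two B+ trees."""

    def __init__(self):
        self.lows = []   # (low, sub_id, high): the stored high acts as the link
        self.highs = []  # (high, sub_id)

    def insert(self, sub_id, low, high):
        bisect.insort(self.lows, (low, sub_id, high))
        bisect.insort(self.highs, (high, sub_id))


def make_index():
    """First level: attribute -> list of width units (second level)."""
    return defaultdict(lambda: [WidthUnit() for _ in range(NUM_CELLS)])


def insert_subscription(index, sub_id, predicates):
    """predicates: {attribute: (low, high)} with 0 <= low <= high <= 1."""
    for attr, (low, high) in predicates.items():
        index[attr][width_cell(low, high)].insert(sub_id, low, high)
```

Because the width unit depends only on `high - low`, two predicates with the same width but different centers land in the same unit.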

3. The method for improving the efficiency and robustness of a matching algorithm based on a data structure according to claim 1, further comprising two sets of markers and one set of recorders during the matching process; one group of markers uses a bit set to mark unmatched subscriptions, and the other group uses a counter to count the number of matched predicates in the subscriptions; the recorder is used for recording task partitions in hybrid matching.

4. The method for improving efficiency and robustness of a matching algorithm based on a data structure according to claim 1, wherein the forward matching employs:

step S2: for predicates indexed in the [v'', v'] space, checking the high value of the predicate on the high value tree through the pointers arranged on the B+ tree; when the high value of the predicate is greater than or equal to v, the predicate is matched, and the counter of the subscription containing the current predicate is incremented by one; wherein v'' = v - w';
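Step S2 can be sketched for a single width unit as follows. The sketch assumes the unit holds predicates whose widths lie in [w, w'] (here `w_min` and `w_max`), takes v'' = v - w' and v' = v - w, and uses a list sorted by low value in place of the low value tree. Treating low values in [v', v] as matched without a check follows from the width bound, but it is an assumption about the step that precedes S2, which this section does not show. Function and parameter names are hypothetical.

```python
import bisect
from collections import Counter


def forward_match_unit(entries, w_min, w_max, v, counters):
    """Count predicates of one width unit that contain the event value v.

    entries:  list of (low, high, sub_id) sorted by low; every width
              high - low is assumed to lie in [w_min, w_max].
    counters: Counter mapping sub_id -> number of matched predicates.
    """
    lows = [low for low, _, _ in entries]
    start = bisect.bisect_left(lows, v - w_max)  # first low >= v''
    split = bisect.bisect_left(lows, v - w_min)  # first low >= v'
    end = bisect.bisect_right(lows, v)           # one past the last low <= v

    # Candidate range: low in [v'', v'), so the high value must be checked.
    for _, high, sub_id in entries[start:split]:
        if high >= v:
            counters[sub_id] += 1

    # Low in [v', v]: width >= w_min already guarantees high >= v.
    for _, _, sub_id in entries[split:end]:
        counters[sub_id] += 1
    return counters
```

Predicates with a low value below v'' or above v are never visited, which is where the saving over a full scan comes from.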

5. The method for improving efficiency and robustness of a matching algorithm based on a data structure according to claim 1, wherein the reverse matching employs:

step S6: the markers are checked, and the unmarked subscriptions are matched.
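The marker check of step S6 can be sketched with a bit set held in a Python integer. Only S6 appears in this section, so the marking ranges below are inferred from the width bounds of a width unit (widths in [`w_min`, `w_max`]) and should be read as an assumption, not as the claimed steps. Subscription ids are assumed to be small non-negative integers, and all names are hypothetical.

```python
import bisect


def reverse_match_unit(entries, w_min, w_max, v, marked=0):
    """Mark subscriptions whose predicate in this width unit excludes v.

    entries: list of (low, high, sub_id) sorted by low; every width
             high - low is assumed to lie in [w_min, w_max].
    marked:  integer used as a bit set; bit i set means subscription i
             has at least one unmatched predicate.
    """
    lows = [low for low, _, _ in entries]
    start = bisect.bisect_left(lows, v - w_max)
    split = bisect.bisect_left(lows, v - w_min)
    end = bisect.bisect_right(lows, v)

    # low < v - w_max implies high < v; low > v excludes v as well.
    for _, _, sub_id in entries[:start] + entries[end:]:
        marked |= 1 << sub_id

    # Candidate range: mark only when the high value is below v.
    for _, high, sub_id in entries[start:split]:
        if high < v:
            marked |= 1 << sub_id
    return marked


def unmarked_subscriptions(marked, num_subs):
    """Step S6: subscriptions whose bit is still clear are matched."""
    return [s for s in range(num_subs) if not (marked >> s) & 1]
```

The returned bit set can be passed back in as `marked` for the next width unit or attribute, so marks accumulate across the whole index before the final check.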

6. The method for improving the efficiency and the robustness of a matching algorithm based on a data structure according to claim 1, wherein the whole search space is divided into a matching space, a non-matching space and an empty space according to the information of each width unit; all predicates in the matching space satisfy the condition, and when the forward matching algorithm is used, the counter of the subscription corresponding to each satisfying predicate is directly incremented by one; no predicate in the non-matching space satisfies the condition, and when the reverse matching algorithm is used, the subscriptions corresponding to the unsatisfied predicates are marked directly; the empty space does not contain any predicates and need not be checked during matching, which reduces the traversal cost.

7. The method for improving the efficiency and the robustness of a matching algorithm based on a data structure according to claim 1, wherein character string type matching and fuzzy matching are supported;

the fuzzy matching considers that all predicates in the candidate subspace meet the conditions, and the high value of the predicates is not further checked on the high value tree;

8. A system for improving the efficiency and robustness of a matching algorithm based on a data structure, comprising: indexing the subscription based on a matching algorithm by using a preset data structure;

the storage layer is used for storing subscriptions;

the width units are divided in a uniform manner;

the reverse matching ABM is a mode of marking unmatched predicates;