CN116089663A - Rule expression matching method and device and computer readable storage medium - Google Patents

Publication number
CN116089663A
Authority
CN
China
Prior art keywords
rule
matching
expression
rule expression
binary code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211515709.6A
Other languages
Chinese (zh)
Inventor
李�瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd filed Critical China Unionpay Co Ltd
Priority to CN202211515709.6A priority Critical patent/CN116089663A/en
Publication of CN116089663A publication Critical patent/CN116089663A/en
Priority to PCT/CN2023/134854 priority patent/WO2024114655A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3335Syntactic pre-processing, e.g. stopword elimination, stemming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a rule expression matching method, a device, and a computer-readable storage medium. The method comprises: receiving a rule text string, performing grammar checking on the rule text string, and outputting a rule expression; losslessly converting the rule expression into a simplest rule expression based on a simplification algorithm of the cyclic binary code; equivalently converting the simplest rule expression into a rule expression matching tree based on a predicate calculus algorithm; merging multiple rule expression matching trees into a merged matching network and identifying common rule segments; and performing feature matching on the data to be matched using the merged matching network and the common rule segments. The method can improve matching efficiency.

Description

Rule expression matching method and device and computer readable storage medium
Technical Field
The invention belongs to the field of feature matching, and particularly relates to a rule expression matching method, a device and a computer readable storage medium.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Some matching devices in the prior art use a row-and-column table, which cannot express more complex rule expressions and offers little extensibility or flexibility. Other devices use rule expressions, but to check whether a rule expression conforms to the prescribed grammar they typically rely on hard-coded text parsing or on regular-expression matching (regular-expression matching is based on a finite state machine and cannot handle an unbounded number of elements that need to be evaluated). These checking methods are suitable only for simple scenarios and cannot perform a complete check over all combinations of rule expressions. In addition, for rule expressions configured by the business side, there is no effective equivalent predicate calculus to simplify a user-configured rule expression into its simplest form, and no identification of the common segments of rule expressions, both of which reduce subsequent matching efficiency.
Therefore, how to improve the matching efficiency is a problem to be solved.
Disclosure of Invention
In order to solve the problems in the prior art, a rule expression matching method, a device, and a computer-readable storage medium are provided.
The present invention provides the following.
In a first aspect, a rule expression matching method is provided, including: receiving a rule text string, performing grammar checking on the rule text string, and outputting a rule expression; losslessly converting the rule expression into a simplest rule expression based on a simplification algorithm of the cyclic binary code; equivalently converting the simplest rule expression into a rule expression matching tree based on a predicate calculus algorithm; merging a plurality of rule expression matching trees into a merged matching network and identifying common rule segments; and performing feature matching on the data to be matched using the merged matching network and the common rule segments.
In one embodiment, losslessly converting the rule expression into a simplest rule expression based on the simplification algorithm of the cyclic binary code further includes: acquiring all key elements in the rule expression, and generating all combinations of the key elements based on the positive and negative values of each key element; acquiring, from all combinations, the combination value range that makes the rule expression true, and acquiring the binary code combination of that value range; performing parity-cyclic binary code merging on the binary codes in the binary code combination to obtain a simplified binary code combination; and converting each binary bit in the simplified binary code combination back to a key element, and outputting the simplest rule expression.
In one embodiment, performing parity-cyclic binary code merging on the plurality of binary codes includes: comparing the binary codes in the binary code combination in pairs and merging them to generate new binary codes; comparing the new binary codes in pairs with the original binary codes that were not merged, merging them to generate new binary codes, and removing duplicate binary codes; and repeating the merging steps until no new binary codes can be merged.
In one embodiment, the parity-cyclic binary code merging further includes: when two binary codes differ in only one binary bit, setting the differing bit to a set symbol and keeping the remaining identical bits unchanged to form a new binary code.
In one embodiment, converting each binary bit in the simplified binary code combination back to a key element includes: mapping each binary code in the simplified binary code combination to the corresponding key elements according to bit position; applying or not applying the NOT operation to each key element according to the value of its bit; and ignoring the corresponding key element if the value of its bit is the set symbol.
In one embodiment, performing grammar checking on the rule text string further includes: performing a complete grammar check on the rule text string using a context-free grammar and a recursive descent algorithm.
In one embodiment, performing grammar checking on the rule text string further includes: reading in the rule text string and splitting it at a preset separator to obtain a plurality of morphemes; ordering the morphemes according to their order in the rule text string to generate a lexical unit sequence; and traversing the lexical unit sequence and checking the grammar of the rule text string.
In one embodiment, the morphemes are divided into key element types and logical operation types.
In one embodiment, equivalently converting the simplest rule expression into a rule expression matching tree further includes: acquiring the rule tree corresponding to the simplest rule expression, and repeatedly executing one or more of the following predicate calculus operations until the tree is stable, thereby obtaining the rule expression matching tree: if a NOT operation in the rule tree has a plurality of child nodes, pushing the NOT operation down to the child nodes and interchanging the AND operator and the OR operator; if the current operator is the same as its parent node's operator, moving the child nodes of the current operator up and deleting the current operator; and sorting the leaf nodes of the same layer according to the unique attribute of each node.
In one embodiment, merging a plurality of the rule expression matching trees into a merged matching network and identifying common rule segments includes: selecting one rule expression matching tree, inverting it top to bottom, and marking the rule of the rule expression matching tree as the root node, as the initial state of the merged matching network; traversing the other rule expression matching trees one by one, inverting each top to bottom, and merging them into the merged matching network one by one; and, after the traversal, forming the complete merged matching network and extracting the common rule segments.
In one embodiment, merging into the merged matching network one by one further includes: for element nodes in a single rule expression matching tree, adding or reusing element nodes in the merged matching network; and/or, for logical symbol nodes in a single rule expression matching tree, adding or reusing logical symbol nodes in the merged matching network; and/or, for completely coincident logical symbol nodes, extracting the common rule segment and the rule expression matching trees to which it belongs through reverse search; and/or, for partially overlapping logical symbol nodes, splitting the logical symbol nodes in the merged matching network.
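The node-reuse idea in the merging step can be illustrated with a minimal Python sketch. It is not the patented procedure: it assumes trees are nested tuples of the form (operator, child, ...), with plain strings as element nodes, and it gives every subtree a canonical text signature so that identical subtrees from different rules land on the same network entry. The names `signature`, `merge_trees`, and `common_segments` are hypothetical, and the top-to-bottom inversion and the splitting of partially overlapping nodes are not modelled.

```python
def signature(node):
    """Canonical text form of a subtree; child order is ignored (assumed convention)."""
    if isinstance(node, str):
        return node
    op, *kids = node
    return op + "(" + ",".join(sorted(signature(k) for k in kids)) + ")"


def merge_trees(rules):
    """Merge {rule_id: tree} into a dict mapping node signature -> set of rule ids."""
    network = {}

    def visit(node, rule_id):
        # Reuse the entry if the same node was already added by another rule.
        network.setdefault(signature(node), set()).add(rule_id)
        if not isinstance(node, str):
            for kid in node[1:]:
                visit(kid, rule_id)

    for rule_id, tree in rules.items():
        visit(tree, rule_id)
    return network


def common_segments(network):
    """Logical nodes (signatures containing '(') that are shared by several rules."""
    return {sig: ids for sig, ids in network.items() if "(" in sig and len(ids) > 1}


rules = {
    "rule1": ("|", "c1", ("&", "c5", "c6")),
    "rule2": ("|", "c2", ("&", "c6", "c5")),
}
network = merge_trees(rules)
shared = common_segments(network)
```

In this sketch, looking up a shared signature returns the set of rules it belongs to, which plays the role of the reverse search mentioned above.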
In one embodiment, performing feature matching on data to be matched by using the combined matching network and the common rule segment includes: priority ordering is carried out on all rule identifiers in the merging matching network in advance; and sequentially matching the element set related to each rule identifier in the merging matching network with the data to be matched according to the priority until the matching is successful or the matching is finished.
In one embodiment, performing feature matching on the data to be matched using the merged matching network and the common rule segments includes one or more of the following operations: matching the data to be matched against the element node set of the rule expression matching tree associated with each rule identifier, entering from the entrance of the merged matching network; and, if an element node of the rule expression matching tree is matched, caching the element matching result.
In one embodiment, performing feature matching on the data to be matched using the merged matching network and the common rule segments further includes: if a logical node of the rule expression matching tree is matched, querying the cache for whether the parent element nodes of the logical node have been hit, wherein: if there is no cached result, an element node is taken from the element node set and matched against the data to be matched; if there is a cached result, the cached result is used directly for the logical operation; and if the logical node belongs to a common rule segment, the logical matching result is cached.
In one embodiment, the feature matching is performed on the data to be matched by using the combined matching network and the common rule segment, and the method further includes: if the rule identifier node of the rule expression matching tree is matched, returning the hit rule identifier.
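The priority-ordered matching with cached element results and cached common-segment results can be sketched as follows. This is a simplified illustration under several assumptions that are not in the original text: trees are nested tuples (operator, child, ...), an element "matches" when its name is present in a set of facts derived from the data, and the function name `match` and its parameters are hypothetical.

```python
def match(rules, data, common=frozenset()):
    """Return the first rule id (in priority order) whose tree evaluates to true.

    rules:  list of (rule_id, tree) pairs, already sorted by priority.
    data:   set of element names that hold for the data to be matched (assumed form).
    common: set of logical subtrees known to be common rule segments.
    """
    cache = {}

    def evaluate(node):
        if node in cache:
            return cache[node]            # reuse a cached element or segment result
        if isinstance(node, str):
            result = node in data         # element matching, cached for later rules
            cache[node] = result
            return result
        op, *kids = node
        if op == "!":
            result = not evaluate(kids[0])
        elif op == "&":
            result = all(evaluate(k) for k in kids)
        else:
            result = any(evaluate(k) for k in kids)
        if node in common:
            cache[node] = result          # only common segments are cached
        return result

    for rule_id, tree in rules:
        if evaluate(tree):
            return rule_id
    return None


shared = ("&", "c5", "c6")
rules = [("rule1", ("&", "c1", shared)), ("rule2", ("&", "c2", shared))]
hit = match(rules, {"c2", "c5", "c6"}, common={shared})
miss = match(rules, {"c5"}, common={shared})
```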
In a second aspect, there is provided a rule expression matching apparatus configured to perform the method of any of claims 1-13, the apparatus comprising: a grammar checker for receiving a rule text string, performing grammar checking on the rule text string, and outputting a rule expression; a feature converter for losslessly converting the rule expression into a simplest rule expression based on a simplification algorithm of the cyclic binary code; a predicate calculus device for equivalently converting the simplest rule expression into a rule expression matching tree based on a predicate calculus algorithm; a network merger for merging the plurality of rule expression matching trees into a merged matching network and identifying common rule segments; and a feature matcher for performing feature matching on the data to be matched using the merged matching network and the common rule segments.
In a third aspect, there is provided a rule expression matching apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform: the method of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium storing a program which, when executed by a multi-core processor, causes the multi-core processor to perform the method as in the first aspect.
One of the advantages of the above embodiment is that the matching efficiency can be significantly improved.
Other advantages of the present invention will be explained in more detail in connection with the following description and accompanying drawings.
It should be understood that the foregoing description is only an overview of the technical solutions of the present invention, so that the technical means of the present invention may be more clearly understood and implemented in accordance with the content of the specification. The following specific embodiments of the present invention are described in order to make the above and other objects, features and advantages of the present invention more comprehensible.
Drawings
The advantages and benefits described herein, as well as other advantages and benefits, will become apparent to those of ordinary skill in the art upon reading the following detailed description of the exemplary embodiments. The drawings are only for purposes of illustrating exemplary embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic structural view of a rule expression matching apparatus according to an embodiment of the present invention;
FIG. 2 is a flow chart of a rule expression matching method according to an embodiment of the invention;
FIG. 3 is a rule tree schematic of a rule expression according to an embodiment of the present invention;
FIG. 4 is a rule tree conversion schematic according to an embodiment of the present invention;
FIG. 5 is a rule tree conversion schematic according to an embodiment of the present invention;
FIG. 6 is an inverted schematic of a rule tree according to an embodiment of the present invention;
FIG. 7 is a rule tree merge diagram according to one embodiment of the invention;
FIG. 8 is a rule tree merge schematic according to another embodiment of the invention;
FIG. 9 is a schematic diagram of a rule tree according to an embodiment of the present invention;
FIG. 10 is a rule tree merge diagram according to an embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of embodiments of the present application, it should be understood that terms such as "comprises" or "comprising" are intended to indicate the presence of features, numbers, steps, acts, components, portions or combinations thereof disclosed in the present specification, and are not intended to exclude the possibility of the presence of one or more other features, numbers, steps, acts, components, portions or combinations thereof.
Unless otherwise indicated, "/" means or, e.g., A/B may represent A or B; "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone.
The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
FIG. 1 illustrates an exemplary rule expression matching apparatus comprising: a grammar checker 110 for receiving a rule text string, performing grammar checking on the rule text string, and outputting a rule expression, which ensures that the rule expressions to be matched are well-formed; a feature converter 120 for losslessly converting the rule expression into a simplest rule expression based on a simplification algorithm of the cyclic binary code; a predicate operator 130 for equivalently converting the simplest rule expression into a rule expression matching tree based on a predicate calculus algorithm, in preparation for the subsequent identification of common segments; a network merger 140 for merging the plurality of rule expression matching trees into a merged matching network and identifying common rule segments; and a feature matcher 150 for performing feature matching on the data to be matched using the merged matching network and the common rule segments. Thus, the matching efficiency can be significantly improved through this series of simplification steps.
Fig. 2 shows a flowchart for performing a rule expression matching method according to an embodiment of the present disclosure. It should be understood that method 200 may also include additional blocks not shown and/or that the blocks shown may be omitted, the scope of the disclosure being not limited in this respect.
Step 210, receiving a rule text string, performing grammar checking on the rule text string, and outputting a rule expression;
In one embodiment, a complete grammar check is performed on the rule text string using a context-free grammar and a recursive descent algorithm.
In one embodiment, to implement grammar checking of the rule text string, the following steps may be performed: reading in the rule text string and splitting it at a preset separator to obtain a plurality of morphemes, where the morphemes can be divided into key element types and logical operation types; ordering the morphemes according to their order in the rule text string to generate a lexical unit sequence; and traversing the lexical unit sequence and checking the grammar of the rule text string.
In particular, each applicable rule may be configured as a corresponding text string, i.e. a rule text string. By way of example, suppose there are currently two rule text strings:
Rule 1: ["external card internal use", "&", "Macau acceptance area", "&", "debit card", "&", "transaction amount exceeds 20,000 yuan", "&", "merchant type is lottery class"]
Rule 2: ["external card internal use", "&", "Macau acceptance area", "&", "credit card", "&", "transaction amount exceeds 10,000 yuan", "&", "merchant type is lottery class"]
The electronic device may receive rule text strings configured by developers or others. For each received rule text string, it may split the string at the agreed separator and remove redundant spaces and line breaks to obtain the individual morphemes. For example, splitting at commas yields each morpheme; the attribute table of lexical units is then queried to obtain the attribute information of each morpheme:
[Figure BDA0003970430060000061: table of morpheme attribute information]
The morphemes are divided into key element types and logical operation types. The logical operation types consist mainly of AND "&", OR "|", NOT "!", and brackets "()". The key element types depend on the meaning of the specific business scenario, such as external card internal use, non-financial institutions, and the like.
The morphemes can be ordered according to their order in the rule text string; for convenience of description, the ordered morphemes are called a lexical unit sequence. The model created based on the context-free grammar may be referred to as a generative model G = (N, Σ, P, S).
By way of example, the following shows a generative model provided by some embodiments. The rule text string may be parsed, and the corresponding rule tree generated, based on a generative model created in accordance with a context-free grammar in extended Backus-Naur form (EBNF).
[Figures BDA0003970430060000062 and BDA0003970430060000071: EBNF grammar of the generative model]
Illustratively, when parsing a rule text string based on the generative model created from the context-free grammar, each morpheme in the lexical unit sequence corresponding to the rule text string may be read in order using a top-down recursive descent algorithm: one symbol is looked ahead at a time, and the lookahead symbol guides the choice of grammar rule (analysis function), determining the analysis function applicable to each morpheme. For example, there are five analysis functions: the expression analysis function expr(), the OR analysis function or(), the AND-left analysis function, the AND-right analysis function, and the NOT analysis function notCond(). Each analysis function can be processed according to the conventional recursive descent algorithm; the parsed lexical unit sequence is then traversed once, the grammar of the rule expression is checked, and the rule tree corresponding to the rule text string is generated.
Specifically, the process of creating the rule tree corresponding to the rule text string may include:
Each key element contained in the rule text string is taken as one node of the rule tree corresponding to the rule text string, each logical operator in the rule text string is likewise taken as one node of the rule tree, and nodes that are associated in the rule text string are connected, thereby establishing the rule tree corresponding to the rule text string.
In one possible implementation, when the rule tree corresponding to the rule text string is established, if the analysis function applicable to a morpheme in the lexical unit sequence is the OR analysis function or(), the AND-left analysis function, the AND-right analysis function, or the NOT analysis function notCond(), a matching intermediate node (OR, AND, or NOT, respectively) is established in the rule tree at the same time; if a morpheme (element) in the lexical unit sequence is a terminal symbol (TOK_COND), a corresponding leaf node is added in the layer below the associated intermediate node. After all lexical units have been traversed, if the grammar is satisfied, the rule tree shown in FIG. 3 is complete.
In this embodiment, a complete grammar checking method suitable for the logical operations of rule expressions is created. Using the recursive descent algorithm, the grammar check of a rule expression is completed in a single pass, and arbitrarily many combinations of logical operations can be checked.
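A recursive descent check of this kind can be sketched in Python as follows. The grammar used here is an assumption, since the EBNF in the original is only available as a figure: OR binds loosest, then AND, then NOT, with brackets for grouping. The function names (`parse`, `or_expr`, `and_expr`, `not_cond`) are hypothetical and do not claim to reproduce the five analysis functions named above. The input is the already split lexical unit sequence, and the output is a rule tree built from nested tuples.

```python
OPERATORS = {"&", "|", "!", "(", ")"}


def parse(tokens):
    """Check the grammar of a token list and return its rule tree.

    Assumed grammar (not taken from the original figure):
        or_expr  := and_expr ('|' and_expr)*
        and_expr := not_cond ('&' not_cond)*
        not_cond := '!' not_cond | '(' or_expr ')' | ELEMENT
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def or_expr():
        node = and_expr()
        while peek() == "|":          # one-symbol lookahead selects the rule
            take()
            node = ("|", node, and_expr())
        return node

    def and_expr():
        node = not_cond()
        while peek() == "&":
            take()
            node = ("&", node, not_cond())
        return node

    def not_cond():
        tok = take()
        if tok == "!":
            return ("!", not_cond())
        if tok == "(":
            node = or_expr()
            if take() != ")":
                raise SyntaxError("missing ')'")
            return node
        if tok is None or tok in OPERATORS:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok                    # a key element becomes a leaf node

    tree = or_expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse(["A", "&", "(", "B", "|", "!", "C", ")"])
```

Because the tree is built during the same pass that checks the grammar, a malformed sequence such as ["A", "&"] raises `SyntaxError` before any tree is returned.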
Step 220, based on a simplification algorithm of the cyclic binary code, the rule expression is converted into a simplest rule expression in a lossless manner;
in one embodiment, the step 220 may further include:
step 221, obtaining all key elements in the rule expression, and generating all combinations of the plurality of key elements based on the positive and negative values of each key element;
for example, based on the rule expression output in the foregoing step 210:
F=(!A&C&D)|(!A&!C&D)|(A&!B&D)|(A&!B&!D)|(A&!B&C&D)
where the key elements are A, B, C, and D, each taking the value 0 or 1. Assuming that a key element itself represents 1 and its negation (!key element) represents 0, the value range of the key elements is 2^4 = 16 (2 to the power N, where N is the number of key elements), written as binary strings: 0000, 0001, 0010, 0011, 0100, ..., 1111. The correspondence is as follows: !A!B!C!D=0000, !A!B!CD=0001, !A!BC!D=0010, ..., ABCD=1111.
The binary character string is expressed by decimal system, namely:
!A!B!C!D=0000=0、!A!B!CD=0001=1、!A!BC!D=0010=2、...、ABCD=1111=15。
step 222, obtaining a combination value range for making the rule expression true from all combinations, and obtaining binary code combinations of the combination value range;
for example, the range of values that can make the original regular expression true is taken out of all combinations, namely:
f(A,B,C,D)=(!A&!B&C&D)|(!A&B&C&D)|(!A&!B&!C&D)|(!A&B&!C&D)|(A&!B&!C&!D)|(A&!B&!C&D)|(A&!B&C&D)|(A&!B&C&!D)
expressed in decimal terms as follows:
f(ABCD)=∑(1,3,5,7,8,9,10,11)
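Steps 221 and 222 amount to building a truth table. The following Python sketch enumerates all 2^N combinations and collects the decimal codes for which the expression is true; the function name `true_codes` is hypothetical, and the example expression F from step 221 is written as a Python lambda purely for illustration.

```python
from itertools import product


def true_codes(expr, names):
    """Return the decimal codes of all combinations that make expr true.

    The first name is the most significant bit, so with names "ABCD"
    the combination A=1, B=0, C=0, D=1 has the code 1001 = 9.
    """
    codes = []
    for bits in product([False, True], repeat=len(names)):
        if expr(**dict(zip(names, bits))):
            codes.append(int("".join("1" if b else "0" for b in bits), 2))
    return codes


def F(A, B, C, D):
    # F = (!A&C&D)|(!A&!C&D)|(A&!B&D)|(A&!B&!D)|(A&!B&C&D)
    return ((not A and C and D) or (not A and not C and D)
            or (A and not B and D) or (A and not B and not D)
            or (A and not B and C and D))


codes = true_codes(F, "ABCD")
binary = [format(c, "04b") for c in codes]
```

For this F the result is the list 1, 3, 5, 7, 8, 9, 10, 11, which agrees with the sum notation above.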
step 223, performing parity cyclic binary code combination on the plurality of binary codes in the binary code combination to obtain a simplified binary code combination;
in one embodiment, the step 223 may specifically include:
(1) Comparing binary codes in the binary code combination in pairs, and merging to generate a new binary code;
More specifically, when two binary codes differ in only one binary bit, the differing bit is set to a set symbol and the remaining identical bits are kept unchanged to form a new binary code. For example, "0000" and "0010" differ in one bit and can be merged into "00x0", where "x" is the set symbol.
(2) Comparing the new binary codes in pairs with the original binary codes that were not merged, merging them to generate new binary codes, and removing duplicate binary codes;
(3) Repeating merging steps (1) and (2) until no new binary codes can be merged.
Alternatively, other parity-cycle binary code combining methods may be used, which is not particularly limited in this application.
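The pairwise merging can be sketched in Python in the style of the Quine-McCluskey method, which is used here as a stand-in and is not claimed to be identical to the procedure in the original figure. Two codes are merged only when they differ in exactly one position and that position holds 0 in one code and 1 in the other; codes that can no longer be merged are kept. The names `merge_pair` and `merge_codes` are hypothetical.

```python
from itertools import combinations


def merge_pair(a, b):
    """Merge two codes that differ in exactly one 0/1 bit, else return None."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and "x" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "x" + a[i + 1:]   # "x" is the set symbol
    return None


def merge_codes(codes):
    """Repeat pairwise merging until no further merge is possible."""
    current, result = set(codes), set()
    while current:
        merged, used = set(), set()
        for a, b in combinations(sorted(current), 2):
            m = merge_pair(a, b)
            if m is not None:
                merged.add(m)            # the set removes duplicate codes
                used.update((a, b))
        result |= current - used         # keep codes that could not be merged
        current = merged
    return result


example = merge_codes(format(c, "04b") for c in [1, 3, 5, 7, 8, 9, 10, 11])
```

For the eight codes of the running example this sketch ends with three codes: 0xx1, 10xx, and x0x1. The first two already cover all eight combinations, so arriving at a two-term expression requires an additional pass that drops redundant codes, which is not sketched here.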
Step 224, converting each binary bit in the simplified binary code combination back to a key element, and outputting the simplest rule expression.
In one embodiment, the step 224 may specifically include one or more of the following operations:
(1) Converting each binary code in the simplified binary code combination into a corresponding key element according to the position of the binary bit;
(2) According to the value of each binary bit, applying or not applying the NOT operation to the key element; and
(3) If the binary code contains a bit whose value is the set symbol, ignoring the corresponding key element.
For example, "0000" is converted into "!A!B!C!D", "0011" into "!A!BCD", "11x1" into "ABD", and so on.
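The conversion back to key elements can be written as a short Python sketch. The function names `code_to_term` and `codes_to_expression` are hypothetical; the `sep` parameter is an added convenience so that the same function can produce either the compact form used in the examples above or a form with explicit "&" operators.

```python
def code_to_term(code, names, sep="&"):
    """Convert one binary code into a conjunction of key elements."""
    parts = []
    for bit, name in zip(code, names):
        if bit == "1":
            parts.append(name)           # bit 1: the key element itself
        elif bit == "0":
            parts.append("!" + name)     # bit 0: the negated key element
        # set symbol "x": the key element is ignored
    return sep.join(parts)


def codes_to_expression(codes, names):
    """Join the terms of several codes with OR."""
    return "|".join("(" + code_to_term(code, names) + ")" for code in codes)


compact = code_to_term("11x1", "ABCD", sep="")
expression = codes_to_expression(["0xx1", "10xx"], "ABCD")
```

Here `compact` is "ABD", and `expression` is "(!A&D)|(A&!B)".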
Illustratively, the merging process is as follows:
[Figure BDA0003970430060000091: merging process of the binary codes]
The original rule expression can thus be equivalently simplified:
The original rule expression: F=(!A&C&D)|(!A&!C&D)|(A&!B&D)|(A&!B&!D)|(A&!B&C&D)
The simplest rule expression: F=(!A&D)|(A&!B)
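That the simplification is lossless can be checked by brute force over all 16 combinations. The sketch below assumes the original expression is the one given in step 221 and that the simplest expression is (!A&D)|(A&!B); the function names are hypothetical.

```python
from itertools import product


def original(A, B, C, D):
    # F = (!A&C&D)|(!A&!C&D)|(A&!B&D)|(A&!B&!D)|(A&!B&C&D)
    return ((not A and C and D) or (not A and not C and D)
            or (A and not B and D) or (A and not B and not D)
            or (A and not B and C and D))


def simplest(A, B, C, D):
    # F = (!A&D)|(A&!B); C no longer appears at all
    return (not A and D) or (A and not B)


# Compare the two expressions on every combination of the four key elements.
equivalent = all(
    bool(original(*bits)) == bool(simplest(*bits))
    for bits in product([False, True], repeat=4)
)
```

Under these assumptions `equivalent` is true, so the two expressions accept exactly the same combinations.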
In this embodiment, the simplification algorithm based on the cyclic binary code simplifies the rule expression configured by the user: redundant rule expression segments are removed automatically, the expression is losslessly converted to the simplest rule expression, and the matching tree is generated from the simplified rule expression.
Because the text rule configured by the user is losslessly reduced to the simplest rule expression, the subsequent matching process is greatly simplified and matching efficiency is improved.
Step 230, equivalently converting the simplest rule expression into a rule expression matching tree based on a predicate calculation algorithm;
In one embodiment, step 230 may specifically include: first acquiring the rule tree corresponding to the simplest rule expression, and then repeatedly executing one or more of the following predicate calculus operations until the tree is stable, thereby obtaining the rule expression matching tree:
(1) If a NOT operation in the rule tree has a plurality of child nodes, pushing the NOT operation down to the child nodes and interchanging the AND operator and the OR operator;
For example, referring to FIG. 4, if a NOT operation has a plurality of child nodes, the NOT operation is pushed down into the child nodes, the AND operator "&" and the OR operator "|" are interchanged, and brackets are added after the push-down.
(2) If the current operator of the simplest rule expression is consistent with the parent node operator, the child node of the current operator is moved upwards, and the current operator is deleted;
for example, referring to FIG. 5, when the current operator is consistent with the parent node operator, the child node of the current operator moves up and deletes the current operator.
(3) And aiming at the leaf nodes of the same layer, ordering according to the unique attribute of the node.
For example, intra-node ordering: the leaf nodes of the same layer are ordered according to the unique attribute of the node, such as the ascending order or the descending order of the field IDs, and the non-leaf nodes of the same layer are arranged at the back. The unique attribute of the non-leaf nodes is composed of the unique attribute of the child nodes, so that the non-leaf nodes in the same layer are also ordered according to the unique attribute.
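The three rules above can be sketched as follows. This is a minimal illustration, assuming a hypothetical tuple representation of the rule tree: `("var", name)`, `("not", child)`, `("and", [children])` and `("or", [children])`. The function names and the sort key are also assumptions; here the element name stands in for the unique attribute such as a field ID. Applying the three passes once in this order already reaches a stable tree for this representation.

```python
def push_not(node, negate=False):
    """Rule (1): push NOT down to the leaves, swapping AND and OR on the way."""
    kind = node[0]
    if kind == "var":
        return ("not", node) if negate else node
    if kind == "not":
        return push_not(node[1], not negate)   # a double negation cancels
    op = kind
    if negate:
        op = "or" if kind == "and" else "and"  # De Morgan's laws
    return (op, [push_not(child, negate) for child in node[1]])


def flatten(node):
    """Rule (2): a child with the same operator as its parent is merged upward."""
    kind = node[0]
    if kind in ("var", "not"):
        return node
    children = []
    for child in map(flatten, node[1]):
        if child[0] == kind:
            children.extend(child[1])
        else:
            children.append(child)
    return (kind, children)


def sort_key(node):
    """Leaves sort by name; non-leaf nodes sort after leaves, by their children."""
    kind = node[0]
    if kind == "var":
        return (0, node[1], False)
    if kind == "not":
        return (0, node[1][1], True)
    return (1, kind, tuple(sort_key(child) for child in node[1]))


def sort_children(node):
    """Rule (3): order the nodes of each layer by their unique attribute."""
    if node[0] in ("var", "not"):
        return node
    children = [sort_children(child) for child in node[1]]
    return (node[0], sorted(children, key=sort_key))


def normalize(node):
    return sort_children(flatten(push_not(node)))


# Hypothetical input: !(C | !A) & (D & B)
tree = ("and", [
    ("not", ("or", [("var", "C"), ("not", ("var", "A"))])),
    ("and", [("var", "D"), ("var", "B")]),
])
normalized = normalize(tree)  # equivalent to A & B & !C & D
```

Two expressions that differ only in bracket placement or operand order normalize to the same tree, which is what makes the later identification of common segments possible.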
Step 240, merging a plurality of the rule expression matching trees into a merged matching network, and identifying common rule segments.
In one embodiment, step 240 may further include:
(1) Selecting one rule expression matching tree, inverting it top to bottom, and taking the rule identifier of that tree as the root node, to serve as the initial state of the merged matching network.
For example, referring to fig. 6, one rule network is selected and inverted top to bottom, so that the original leaf nodes become the entry points and the finally hit rule becomes the end point; this network is the initial state of the merged network. After the inversion, the element nodes are at the uppermost layer.
(2) Traversing the other rule expression matching trees one by one, inverting each top to bottom, and merging each into the merged matching network.
Further, merging into the merged matching network one by one includes: for element nodes in a single rule expression matching tree, adding new element nodes to the merged matching network or reusing existing ones; and/or, for logical symbol nodes in a single rule expression matching tree, adding new logical symbol nodes to the merged matching network or reusing existing ones; and/or, for completely coincident logical symbol nodes, extracting the common rule segment and the rule expression matching trees to which it belongs through reverse search (for example, in fig. 7, "condition 5 & condition 6" is a common segment shared by rule 1 and rule 2); and/or, for partially overlapping logical symbol nodes, splitting the logical symbol nodes in the merged matching network (see, for example, fig. 8).
(3) After the traversal, a complete merged matching network is formed; the common rule segments and the rules to which they belong are extracted, and the single element or derived element set of each rule is recorded.
For example, the merging of fig. 9 generates the merged matching network shown in fig. 10, wherein elements within the dashed box are common rule segments, and the element set related to each rule identifier (rule 1 and rule 2) is recorded.
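The node reuse and common-segment identification described for step 240 can be sketched as follows. This is a simplified illustration under stated assumptions, not the merged matching network of the embodiment: trees are hypothetical nested tuples such as `("and", (child, child))`, identical subtrees are treated as one shared node, and a logical node used by more than one rule is reported as a common segment. Inverting the trees and splitting partially overlapping logical nodes (fig. 8) are not covered. The class and method names are assumptions.

```python
from collections import defaultdict


class MatchNetwork:
    """Shares identical nodes across rules and records which rules use each node."""

    def __init__(self):
        self.rules_of = defaultdict(set)  # node -> identifiers of rules that use it
        self.root_of = {}                 # rule identifier -> root node of its tree

    def add_rule(self, rule_id, tree):
        self._add(tree, rule_id)
        self.root_of[rule_id] = tree

    def _add(self, node, rule_id):
        kind = node[0]
        if kind in ("and", "or"):
            for child in node[1]:
                self._add(child, rule_id)
        elif kind == "not":
            self._add(node[1], rule_id)
        # Equal tuples hash to the same key, so an existing node is reused
        # and a node seen for the first time is newly added.
        self.rules_of[node].add(rule_id)

    def common_segments(self):
        """Logical nodes that coincide completely in two or more rules."""
        return {node: sorted(rules)
                for node, rules in self.rules_of.items()
                if node[0] in ("and", "or") and len(rules) > 1}


# Hypothetical rules sharing the segment "c5 & c6"
segment = ("and", (("var", "c5"), ("var", "c6")))
net = MatchNetwork()
net.add_rule("rule1", ("or", (("var", "c1"), segment)))
net.add_rule("rule2", ("or", (("var", "c2"), segment)))
shared = net.common_segments()  # {segment: ["rule1", "rule2"]}
```

Because the trees were normalized beforehand, equal segments have equal representations, so a dictionary lookup is enough to detect complete coincidence in this sketch.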
Step 250, performing feature matching on the data to be matched by using the merged matching network and the common rule segments.
In one embodiment, the priority ranking may be performed in advance for each rule identifier in the merged matching network; and sequentially matching the element set related to each rule identifier in the merging matching network with the data to be matched according to the priority until the matching is successful or the matching is finished.
In one embodiment, performing feature matching on data to be matched by using the combined matching network and the common rule segment includes:
(1) Element matching: for each rule identifier, take the element node set of the related rule expression matching tree, enter from an entrance of the merged matching network, match the element set against the data to be matched, and cache the element matching results.
(2) Logic matching: when a logic node is reached, query the cache for whether its parent node has been hit. There are two cases: i) if there is no cached result, take the next element node from the element node set and match it against the data to be matched; ii) if there is a cached result, use it directly for the logic operation. If the logic node belongs to a common rule segment, cache the logic matching result; otherwise no caching is needed, which saves space.
(3) If the final node is matched, return the hit rule identifier; if the match fails partway, continue matching the next rule in priority order.
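The matching flow above can be sketched as follows. This is a minimal illustration under stated assumptions: rules are hypothetical nested tuples listed in priority order, `conditions` maps each element name to a predicate on the data, and `common` is the set of logical nodes already identified as common rule segments. Element results are always cached, while logic results are cached only for common segments. The function name, field names and sample data are all hypothetical.

```python
def match(rules, conditions, common, data):
    """Return the identifier of the first rule (in priority order) that hits."""
    cache = {}  # node -> result, valid for this one piece of data

    def evaluate(node):
        if node in cache:
            return cache[node]            # reuse a cached result directly
        kind = node[0]
        if kind == "var":
            result = bool(conditions[node[1]](data))
            cache[node] = result          # element results are always cached
            return result
        if kind == "not":
            return not evaluate(node[1])
        if kind == "and":
            result = all(evaluate(child) for child in node[1])
        else:
            result = any(evaluate(child) for child in node[1])
        if node in common:
            cache[node] = result          # logic results cached only when shared
        return result

    for rule_id, tree in rules:
        if evaluate(tree):
            return rule_id
    return None


# Hypothetical elements, rules and data
conditions = {
    "c1": lambda d: d["country"] == "US",
    "c2": lambda d: d["channel"] == "online",
    "c5": lambda d: d["amount"] > 100,
    "c6": lambda d: d["currency"] == "CNY",
}
segment = ("or", (("var", "c5"), ("var", "c6")))
rules = [
    ("rule1", ("and", (segment, ("var", "c1")))),
    ("rule2", ("and", (segment, ("var", "c2")))),
]
data = {"country": "CN", "channel": "online", "amount": 150, "currency": "CNY"}
hit = match(rules, conditions, {segment}, data)  # "rule2"
```

In this sample, rule1 fails on "c1", and rule2 then reuses the cached result of the shared segment instead of evaluating "c5" again.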
It should be noted that, for steps not described in detail in this embodiment, reference may be made to the descriptions of the related steps in the embodiment shown in fig. 1, which are not repeated here.
In the description of the present specification, reference to the terms "some possible embodiments," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiments or examples is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of those embodiments or examples, provided that they do not contradict one another.
With respect to the method flowcharts of the embodiments of the present application, certain operations are described as distinct steps performed in a certain order. Such a flowchart is illustrative and not limiting. Some steps described herein may be grouped together and performed in a single operation, may be partitioned into multiple sub-steps, and may be performed in an order different than that shown herein. The various steps illustrated in the flowcharts may be implemented in any manner by any circuit structure and/or tangible mechanism (e.g., by software running on a computer device, hardware (e.g., processor or chip implemented logic functions), etc., and/or any combination thereof).
It should be noted that, the apparatus in the embodiments of the present application may implement each process of the foregoing method embodiment and achieve the same effects and functions, which are not described herein again.
According to some embodiments of the present application, there is provided a non-transitory computer storage medium having stored thereon computer executable instructions configured to, when executed by a processor, perform: the method according to the above embodiment.
In this application, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the descriptions of the apparatus, device, and computer readable storage medium embodiments are simplified because they are substantially similar to the method embodiments; for relevant points, refer to the corresponding parts of the description of the method embodiments.
The apparatus, the device, and the computer readable storage medium provided in the embodiments of the present application are in one-to-one correspondence with the methods, and therefore, the apparatus, the device, and the computer readable storage medium also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the apparatus, the device, and the computer readable storage medium are not repeated herein.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus (device or system), or computer readable storage medium. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer-readable storage medium embodied in one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices or systems) and computer-readable storage media according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Furthermore, although the operations of the methods of the present invention are depicted in the drawings in a particular order, this neither requires nor implies that the operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects imply that features of those aspects cannot be combined to advantage; that division is merely for convenience of description. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (18)

1. A rule expression matching method, comprising:
receiving a rule text string, performing grammar verification on the rule text string, and outputting a rule expression;
losslessly converting the rule expression into a simplest rule expression based on a simplification algorithm of the cyclic binary code;
equivalently converting the simplest rule expression into a rule expression matching tree based on a predicate calculus algorithm;
merging a plurality of the rule expression matching trees into a merged matching network, and identifying common rule segments; and
performing feature matching on data to be matched by using the merged matching network and the common rule segments.
2. The method of claim 1, wherein losslessly converting the rule expression into the simplest rule expression based on the simplification algorithm of the cyclic binary code further comprises:
acquiring all key elements in the rule expression, and generating all combinations of the key elements based on the positive and negative values of each key element;
acquiring, from all the combinations, the combination value range that makes the rule expression true, and acquiring the binary code combination of the combination value range;
performing parity-cycling binary code combining on the plurality of binary codes in the binary code combination to obtain a simplified binary code combination; and
converting each binary bit in the simplified binary code combination back to the key element, and outputting the simplest rule expression.
3. The method of claim 2, wherein performing parity-cycling binary code combining on the plurality of binary codes comprises:
comparing the binary codes in the binary code combination in pairs, and merging them to generate new binary codes;
comparing the new binary codes in pairs with the original binary codes that have not been merged, merging them to generate new binary codes, and removing repeated binary codes; and
repeating the merging step until the new binary codes can no longer be merged.
4. The method of claim 2, wherein the parity-cycling binary code combining further comprises:
when two binary codes differ in only one binary bit, setting the differing binary bit to a set symbol and keeping the remaining identical binary bits unchanged, to form a new binary code.
5. The method according to claim 3, wherein converting each binary bit in the simplified binary code combination back to the key element comprises:
converting each binary code in the simplified binary code combination into the corresponding key elements according to the position of each binary bit;
according to the value of each binary bit, applying or not applying a negation operation to the key element; and
if the binary code includes a binary bit whose value is the set symbol, ignoring the corresponding key element.
6. The method of claim 1, wherein performing grammar verification on the rule text string further comprises: performing a complete grammar check on the rule text string by using a context-free grammar and a recursive descent algorithm.
7. The method of claim 1, wherein performing grammar verification on the rule text string further comprises:
reading in the rule text string, and dividing the rule text string according to preset separators to obtain a plurality of morphemes;
ordering the morphemes according to their order in the rule text string to generate a lexical unit sequence; and
traversing the lexical unit sequence and checking the grammar of the rule text string.
8. The method of claim 1, wherein the morphemes are divided into key element types and logical operation types.
9. The method of claim 1, wherein equivalently converting the simplest rule expression into the rule expression matching tree further comprises:
repeatedly executing one or more of the following predicate calculus rules until the result is stable, to obtain the rule expression matching tree:
acquiring a rule tree corresponding to the simplest rule expression;
if a NOT operation in the rule tree has a plurality of child nodes, pushing the NOT operation down to the child nodes and interchanging the AND and OR operators;
if the current operator of the simplest rule expression is the same as the operator of its parent node, moving the child nodes of the current operator up and deleting the current operator; and
sorting the leaf nodes of the same layer according to the unique attribute of each node.
10. The method of claim 1, wherein merging a plurality of the rule expression matching trees into a merged matching network and identifying common rule segments comprises:
selecting one rule expression matching tree, inverting it top to bottom, and taking the rule identifier of that tree as the root node, to serve as the initial state of the merged matching network;
traversing the other rule expression matching trees one by one, inverting each top to bottom, and merging each into the merged matching network; and
after the traversal, forming a complete merged matching network and extracting the common rule segments.
11. The method of claim 1, wherein merging into the merged matching network one by one further comprises:
for element nodes in a single rule expression matching tree, adding new element nodes to the merged matching network or reusing existing ones; and/or
for logical symbol nodes in a single rule expression matching tree, adding new logical symbol nodes to the merged matching network or reusing existing ones; and/or
for completely coincident logical symbol nodes, extracting, through reverse search, the common rule segment and the rule expression matching trees to which the common rule segment belongs; and/or
for partially overlapping logical symbol nodes, splitting the logical symbol nodes in the merged matching network.
12. The method of claim 1, wherein performing feature matching on the data to be matched by using the merged matching network and the common rule segments comprises:
ordering the rule identifiers in the merged matching network by priority in advance; and
matching the element set related to each rule identifier in the merged matching network against the data to be matched in priority order, until a match succeeds or all rules have been tried.
13. The method of claim 1, wherein performing feature matching on the data to be matched by using the merged matching network and the common rule segments comprises one or more of:
matching the element node set of the rule expression matching tree related to each rule identifier against the data to be matched, entering from an entrance of the merged matching network, and, if an element node of the rule expression matching tree is matched, caching the element matching result.
14. The method of claim 1, wherein performing feature matching on the data to be matched by using the merged matching network and the common rule segments further comprises:
if a logical node of the rule expression matching tree is matched, querying the cache for whether a parent element node of the logical node has been hit, wherein:
if there is no cached result, an element node is taken from the element node set and matched against the data to be matched;
if there is a cached result, the cached result is used directly for the logic operation; and if the logical node belongs to the common rule segment, the logic matching result is cached.
15. The method of claim 1, wherein performing feature matching on the data to be matched by using the merged matching network and the common rule segments further comprises:
if the rule identifier node of the rule expression matching tree is matched, returning the hit rule identifier.
16. A rule expression matching apparatus configured to perform the method of any of claims 1-15, the apparatus comprising:
a grammar checker, configured to receive a rule text string, perform grammar verification on the rule text string, and output a rule expression;
a feature converter, configured to losslessly convert the rule expression into a simplest rule expression based on a simplification algorithm of the cyclic binary code;
a predicate calculus device, configured to equivalently convert the simplest rule expression into a rule expression matching tree based on a predicate calculus algorithm;
a network combiner, configured to merge a plurality of the rule expression matching trees into a merged matching network and identify common rule segments; and
a feature matcher, configured to perform feature matching on data to be matched by using the merged matching network and the common rule segments.
17. A rule expression matching apparatus, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform: the method of any one of claims 1-15.
18. A computer readable storage medium storing a program which, when executed by a multi-core processor, causes the multi-core processor to perform the method of any of claims 1-15.
CN202211515709.6A 2022-11-29 2022-11-29 Rule expression matching method and device and computer readable storage medium Pending CN116089663A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211515709.6A CN116089663A (en) 2022-11-29 2022-11-29 Rule expression matching method and device and computer readable storage medium
PCT/CN2023/134854 WO2024114655A1 (en) 2022-11-29 2023-11-28 Rule expression matching method and apparatus, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211515709.6A CN116089663A (en) 2022-11-29 2022-11-29 Rule expression matching method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116089663A true CN116089663A (en) 2023-05-09

Family

ID=86198195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211515709.6A Pending CN116089663A (en) 2022-11-29 2022-11-29 Rule expression matching method and device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN116089663A (en)
WO (1) WO2024114655A1 (en)

Also Published As

Publication number Publication date
WO2024114655A1 (en) 2024-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40092185; country of ref document: HK)