CN110955683A - Regular expression-based data stream searching method, device and system - Google Patents


Info

Publication number
CN110955683A
Authority
CN
China
Prior art keywords
data stream
rule
matching
regular expression
data
Legal status
Granted
Application number
CN201911186851.9A
Other languages
Chinese (zh)
Other versions
CN110955683B (en)
Inventor
张兵
钟济
闫振林
熊冰
潘泽跃
Current Assignee
SUZHOU XEL TECHNOLOGY Inc
Original Assignee
SUZHOU XEL TECHNOLOGY Inc
Application filed by SUZHOU XEL TECHNOLOGY Inc
Priority to CN201911186851.9A
Publication of CN110955683A
Application granted
Publication of CN110955683B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a regular expression-based data stream searching method, device and system. The method comprises: marking a plurality of data streams obtained from a data stream generation system with a data stream ID and a rule ID; obtaining at least one rule corresponding to the rule ID from a rule cache subsystem; and matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information. Compared with the prior art, each data stream is tagged with a data stream ID and a rule ID after it is obtained. During searching, each data stream can therefore select at least one rule through its rule ID, and the data streams are searched simultaneously. The data streams can thus be searched in parallel without external intervention, more data streams and more search rules can be handled, and search efficiency is remarkably improved.

Description

Regular expression-based data stream searching method, device and system
Technical Field
The present disclosure relates to the field of data communication technologies, and in particular, to a method, an apparatus, and a system for searching for data streams based on regular expressions.
Background
Searching data streams by regular expression matching is currently widely used in industry. A regular expression is a logical formula for operating on character strings. Predefined special characters, and combinations of them, form a "regular character string", which expresses filtering logic for character strings. By matching the data stream to be searched against a rule expressed as a regular expression, the content of the data stream that matches the rule (the key information) can be extracted.
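As a minimal illustration of extracting key information with a single rule, the sketch below uses Python's standard `re` module. The rule pattern and the sample data stream are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical rule: a regular expression whose matches are the "key information".
rule = re.compile(r"user=(\w+)")

# Hypothetical data stream content (bytes already decoded to text for simplicity).
data_stream = "ts=1 user=alice action=login; ts=2 user=bob action=logout"

# With one capture group, findall returns the captured text of every match.
key_information = rule.findall(data_stream)
print(key_information)  # ['alice', 'bob']
```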
The efficiency of a regular expression search engine depends on its search mode. The common mode is serial search, in which the data stream is matched against all rules in turn. When a rule matches, the search exits. Otherwise, it continues with the next rule. A regular expression search engine typically supports dozens or even hundreds of search rules, so searching data streams serially over that many rules is inefficient.
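The serial search mode described above can be sketched as a loop that tries each rule in order and exits at the first match. The function name and the rule list are hypothetical. The sketch only illustrates why the cost grows with the number of rules: a stream that matches late, or never, is scanned once per rule.

```python
import re

def serial_search(data, rules):
    """Try each rule in order and stop at the first rule that matches.

    Returns (index of the matching rule, matched text), or (None, None)
    if no rule matches.
    """
    for index, pattern in enumerate(rules):
        match = re.search(pattern, data)
        if match:                      # a rule matched: exit the search
            return index, match.group(0)
        # otherwise continue with the next rule
    return None, None

# Hypothetical rule list; a real engine may hold dozens or hundreds of rules.
rules = [r"GET /\S+", r"POST /\S+", r"ERROR \d+"]
print(serial_search("ERROR 404 on host", rules))  # (2, 'ERROR 404')
```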
Disclosure of Invention
In the related art, a regular expression search engine may support dozens or even hundreds of search rules, and with that many rules, searching data streams serially is inefficient. To overcome this problem, a regular expression-based data stream searching method, device and system are provided that can improve data stream searching efficiency.
In a first aspect of the present application, a regular expression-based data stream searching method is provided. The method includes: marking a plurality of data streams acquired from a data stream generation system with a data stream ID and a rule ID, where the data stream ID indicates the number of the data stream and the rule ID indicates the number of at least one rule corresponding to the data stream ID; obtaining at least one rule corresponding to the rule ID from a rule cache subsystem; and matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information.
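The three steps of the first aspect can be sketched in a single process as follows. `TaggedStream`, `RULE_CACHE`, `mark` and `search` are hypothetical names, and the rule-cache contents are invented for the example. The rule-ID labels B0 and B1 follow the embodiment. This is an illustration under those assumptions, not the patented implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class TaggedStream:
    stream_id: int   # data stream ID: the number of the data stream
    rule_id: str     # rule ID: selects at least one rule in the rule cache
    data: str

# Hypothetical rule cache subsystem: rule ID -> regular expressions of its rules.
RULE_CACHE = {
    "B0": [r"\d{3}-\d{4}"],
    "B1": [r"[A-Z]{3}", r"id=\d+"],
}

def mark(streams, rule_ids):
    """Step 1: tag each acquired stream with a data stream ID and a rule ID."""
    return [TaggedStream(i, rule_ids[i], s) for i, s in enumerate(streams)]

def search(tagged):
    """Steps 2 and 3: fetch the rules for the rule ID, then match the stream."""
    key_info = []
    for pattern in RULE_CACHE[tagged.rule_id]:
        key_info.extend(re.findall(pattern, tagged.data))
    return key_info

tagged = mark(["call 555-1234", "ABC id=42"], ["B0", "B1"])
print({t.stream_id: search(t) for t in tagged})
# {0: ['555-1234'], 1: ['ABC', 'id=42']}
```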
With reference to the first aspect, in a first possible implementation manner, matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information includes: determining whether the rule needs to be held according to whether the current data stream meets a rule holding condition; if the current data stream meets the rule holding condition, holding the rule and matching the data stream corresponding to the data stream ID according to the regular expression defined in the held rule to obtain final key information; and if the current data stream does not meet the rule holding condition, updating the rule ID, obtaining at least one rule corresponding to the updated rule ID from the rule cache subsystem, and matching the data stream corresponding to the data stream ID according to the regular expression defined in the at least one rule corresponding to the updated rule ID to obtain final key information.
With reference to the first possible implementation manner, in a second possible implementation manner, the plurality of data streams acquired from the data stream generation system are further marked with holding state information, which indicates whether the rule is held. The rule holding condition includes: the holding state information indicates that the rule is held, and the rule ID of the current data stream is the same as the rule ID of the previous data stream.
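The rule holding condition is a conjunction of two checks, sketched below as a predicate. `StreamTag` and `rule_is_held` are hypothetical names. The boolean `hold` stands for "the holding state information indicates that the rule is held". Treating a stream with no previous stream as "not held" is an assumption, since the source does not cover that case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamTag:
    rule_id: str
    hold: bool   # holding state information: True means "keep the current rule"

def rule_is_held(current: StreamTag, previous: Optional[StreamTag]) -> bool:
    """Rule holding condition: the holding state says 'hold' AND the rule ID
    equals that of the previous data stream (assumed False with no previous)."""
    return (
        current.hold
        and previous is not None
        and current.rule_id == previous.rule_id
    )

print(rule_is_held(StreamTag("B1", True), StreamTag("B1", True)))   # True
print(rule_is_held(StreamTag("B2", True), StreamTag("B1", True)))   # False: rule ID changed
print(rule_is_held(StreamTag("B1", False), StreamTag("B1", True)))  # False: not held
```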
With reference to the first aspect, in a third possible implementation manner, the plurality of data streams acquired from the data stream generation system are further marked with occurrence information, which indicates the number of times the data stream has occurred. Matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information includes: dividing the data stream into N data stream slices, where the first to Nth data stream slices are each marked with occurrence information indicating the first to Nth occurrence, respectively; determining the matching mode of each data stream slice according to its occurrence information; if the data stream slice is the first occurrence, matching the data stream slice according to the regular expression defined in the rule to obtain first key information; and if the data stream slice is the second to Nth occurrence, matching the data stream slice according to the regular expression defined in the rule on the basis of the key information obtained from the previous data stream slice, until all N data stream slices have been matched, to obtain final key information.
In a second aspect of the present application, a regular expression-based data stream searching apparatus is provided. The apparatus includes: a marking unit configured to mark a plurality of data streams acquired from a data stream generation system with a data stream ID indicating the number of the data stream and a rule ID indicating the number of at least one rule corresponding to the data stream ID; a rule obtaining unit configured to obtain at least one rule corresponding to the rule ID from a rule cache subsystem; and a matching unit configured to match the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information.
With reference to the second aspect, in a fourth possible implementation manner, the matching unit includes: a first judgment module configured to determine whether the rule needs to be held according to whether the current data stream meets the rule holding condition; a first matching module configured to, if the current data stream meets the rule holding condition, hold the rule and match the data stream corresponding to the data stream ID according to the regular expression defined in the held rule to obtain final key information; and a second matching module configured to, if the current data stream does not meet the rule holding condition, update the rule ID, obtain at least one rule corresponding to the updated rule ID from the rule cache subsystem, and match the data stream corresponding to the data stream ID according to the regular expression defined in the at least one rule corresponding to the updated rule ID to obtain final key information.
With reference to the fourth possible implementation manner, in a fifth possible implementation manner, the marking unit is further configured to mark the plurality of data streams acquired from the data stream generation system with holding state information, which indicates whether the rule is held. The rule holding condition includes: the holding state information indicates that the rule is held, and the rule ID of the current data stream is the same as the rule ID of the previous data stream.
With reference to the second aspect, in a sixth possible implementation manner, the marking unit is further configured to mark the plurality of data streams acquired from the data stream generation system with occurrence information, which indicates the number of times the data stream has occurred. The matching unit includes: a dividing module configured to divide the data stream into N data stream slices, where the first to Nth data stream slices are each marked with occurrence information indicating the first to Nth occurrence, respectively; a second judgment module configured to determine the matching mode of each data stream slice according to its occurrence information; a third matching module configured to, if the data stream slice is the first occurrence, match the data stream slice according to the regular expression defined in the rule to obtain first key information; and a fourth matching module configured to, if the data stream slice is the second to Nth occurrence, match the data stream slice according to the regular expression defined in the rule on the basis of the key information obtained from the previous data stream slice, until all N data stream slices have been matched, to obtain final key information.
In a third aspect of the present application, a regular expression-based data stream searching system is provided, where the system includes a data stream generating system, a search engine control system, and a regular expression search engine subsystem; the search engine control system comprises a data caching subsystem and a Context Switch system, wherein the data caching subsystem is used for receiving a data stream sent by a data stream generation system and marking the data stream with a data stream ID and a rule ID, the data stream ID is used for indicating the number of the data stream, and the rule ID is used for indicating the number of at least one rule corresponding to the data stream; the Context Switch system comprises a rule cache subsystem, wherein the rule cache subsystem is used for storing rules; the regular expression search engine subsystem is used for acquiring the data stream marked with the data stream ID and the rule ID from the data cache subsystem, acquiring at least one rule corresponding to the rule ID from the rule cache subsystem, and matching the data stream corresponding to the data stream ID according to a regular expression defined in the rule to obtain final key information.
With reference to the third aspect, in a seventh possible implementation manner, the data stream is divided into N data stream slices in the data cache subsystem, and the Context Switch system further includes a Context cache subsystem, where the Context cache subsystem is configured to receive key information of the data stream slices sent by the regular expression search engine subsystem, and feed the key information back to the regular expression search engine subsystem.
The embodiments of the present application provide a regular expression-based data stream searching method, device and system. The method comprises: marking a plurality of data streams acquired from a data stream generation system with a data stream ID and a rule ID, where the data stream ID indicates the number of the data stream and the rule ID indicates the number of at least one rule corresponding to the data stream ID; obtaining at least one rule corresponding to the rule ID from a rule cache subsystem; and matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information. Compared with the prior art, each data stream is tagged with a data stream ID and a rule ID after it is obtained. During searching, each data stream can therefore select at least one rule through its rule ID, and the data streams are searched simultaneously. The data streams can thus be searched in parallel without external intervention, more data streams and more search rules can be handled, and search efficiency is remarkably improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic workflow diagram of a regular expression-based data stream searching method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a regular expression-based data stream searching system according to an embodiment of the present application;
FIG. 3 is a schematic workflow diagram of a regular expression-based data stream searching method according to an embodiment of the present application;
FIG. 4 is a schematic workflow diagram of a regular expression-based data stream searching method according to an embodiment of the present application;
FIG. 5 is a block diagram of a regular expression-based data stream searching apparatus according to an embodiment of the present application;
FIG. 6 is a block diagram of a regular expression-based data stream searching apparatus according to a preferred embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but it will be appreciated by those skilled in the art that the present application may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail as not to unnecessarily obscure aspects of the embodiments.
Referring to the workflow diagram of the regular expression-based data stream searching method shown in FIG. 1, the method includes the following steps:
Step 101, marking a plurality of data streams acquired from a data stream generation system with a data stream ID and a rule ID, where the data stream ID indicates the number of the data stream and the rule ID indicates the number of at least one rule corresponding to the data stream ID.
The regular expression-based data stream searching system shown in fig. 2 in the embodiment of the present application includes a data stream generating system 1, a search engine control system 2, and a regular expression search engine subsystem 3, where the data stream generating system 1 and the search engine control system 2 communicate with each other, and the data stream generating system 1 is configured to generate a data stream and send the data stream to the search engine control system 2.
The search engine control system 2 includes a data cache subsystem 21 and a Context Switch system 22, the data cache subsystem 21 being configured to receive a data stream transmitted from a data stream generation system, and mark the data stream with a data stream ID indicating a number of the data stream and a rule ID indicating a number of at least one rule corresponding to the data stream ID.
In the embodiment of the present application, multiple data streams can search multiple rules at the same time. As shown in FIG. 2, the data streams generated by the data stream generation system 1 are sent to multiple buffers of the data cache subsystem 21. That is, the bytes of each data stream are stored in the buffer selected by its data stream ID. Here n denotes the number of data streams that can be searched simultaneously, and each data stream is stored in its own data buffer. For example, the data stream with ID 0 is stored in buffer 0, the data stream with ID 1 in buffer 1, …, and the data stream with ID n in buffer n. Each data stream is also tagged with a rule ID when it is stored in the data cache buffer. For example, the data stream with ID 0 is tagged with rule ID B0, the data stream with ID 1 with rule ID B1, …, and the data stream with ID n with rule ID Bn.
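The buffer layout of the data cache subsystem 21 can be sketched as one byte buffer per data stream ID plus a rule-ID tag per stream. `DataCache` and `store` are hypothetical names. The choice of n = 2 and the payloads are invented for the example.

```python
from collections import defaultdict

class DataCache:
    """Sketch of a data cache subsystem: one buffer per data stream ID,
    each tagged with the rule ID assigned to that stream."""

    def __init__(self):
        self.buffers = defaultdict(bytearray)  # data stream ID -> buffered bytes
        self.rule_ids = {}                     # data stream ID -> rule ID tag

    def store(self, stream_id: int, chunk: bytes, rule_id: str) -> None:
        # The bytes of a stream go to the buffer selected by its data stream ID.
        self.buffers[stream_id].extend(chunk)
        self.rule_ids[stream_id] = rule_id

cache = DataCache()
n = 2  # hypothetical: streams 0..n
for stream_id in range(n + 1):
    cache.store(stream_id, f"payload-{stream_id}".encode(), f"B{stream_id}")

print(bytes(cache.buffers[1]), cache.rule_ids[1])  # b'payload-1' B1
```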
Step 102, at least one rule corresponding to the rule ID is obtained from a rule cache subsystem.
Step 103, matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information.
The Context Switch system 22 in turn comprises a rule cache subsystem 221, which stores the rules. The rules stored in the rule cache subsystem 221 are denoted rule0, rule1, …, rulen. They are selected through the rule IDs marked on the data streams, which in this embodiment are B0, B1, …, Bn. Each rule ID corresponds to at least one rule, so each data stream is matched against at least one rule. For example, the data stream with data stream ID 1 has rule ID B1, and B1 may correspond to a single rule such as rule0, or to other rules among rule1 … rulen in the rule cache subsystem 221.
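The one-to-many relationship between rule IDs and stored rules can be sketched with two lookup tables. `RULES`, `RULE_IDS` and `rules_for` are hypothetical names, and the patterns are invented. The sketch only shows that a rule ID such as B1 may select one rule or several rules from the rule cache.

```python
import re

# Hypothetical contents of the rule cache subsystem: rule name -> regular expression.
RULES = {
    "rule0": r"GET /\S*",
    "rule1": r"HTTP/1\.[01]",
    "rule2": r"Host: \S+",
}

# Hypothetical rule-ID table: each rule ID selects at least one rule.
RULE_IDS = {
    "B0": ["rule2"],            # one rule
    "B1": ["rule0", "rule1"],   # several rules
}

def rules_for(rule_id):
    """Return the compiled regular expressions selected by a rule ID."""
    names = RULE_IDS[rule_id]
    if not names:
        raise ValueError(f"rule ID {rule_id} must select at least one rule")
    return [re.compile(RULES[name]) for name in names]

request = "GET /index.html HTTP/1.1"
print([p.search(request).group(0) for p in rules_for("B1")])
# ['GET /index.html', 'HTTP/1.1']
```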
The specific search process of the data stream occurs in the regular expression search engine subsystem 3, and the regular expression search engine subsystem 3 is configured to obtain the data stream with the marked data stream ID and the rule ID from the data cache subsystem, obtain at least one rule corresponding to the rule ID from the rule cache subsystem, and match the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information.
Specifically, when the search process starts, the regular expression search engine subsystem 3 obtains at least one rule corresponding to the rule ID from the rule cache subsystem 221. For example, suppose an operation needs to search the data streams with data stream IDs 1, 2 and 3 at the same time. These data streams are read from buffer 1, buffer 2 and buffer 3, respectively, into the regular expression search engine subsystem 3. For the data stream with data stream ID 1, rule0 and rule1 are selected according to its rule ID B1 and downloaded into the regular expression search engine subsystem 3. The data stream is then matched against the regular expressions defined in rule0 and rule1, respectively, to obtain its final key information.
While the data stream with data stream ID 1 is being matched, the data streams with data stream IDs 2 and 3 are searched at the same time. In the same way, the data stream with data stream ID 2 is tagged with rule ID B2. Rule2 and rule3 are selected according to B2 and downloaded into the regular expression search engine subsystem 3, and the data stream is matched against the regular expressions defined in rule2 and rule3 to obtain its final key information. The data stream with data stream ID 3 is tagged with rule ID B3. Rule4 and rule5 are selected according to B3 and downloaded into the regular expression search engine subsystem 3, and the data stream is matched against the regular expressions defined in rule4 and rule5, respectively, to obtain its final key information. The rules selected by the rule IDs of these data streams may also overlap. For example, rule ID B1 of the data stream with ID 1 might select rule2 and rule3, while rule ID B2 of the data stream with ID 2 selects rule3 and rule4. Because the two searches run simultaneously, rule3 needs to be downloaded only once to serve both data streams, which saves a redundant download.
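The overlapping-rules example can be sketched in software. The distinct rules across all active streams are "downloaded" (compiled) once, and the streams are then searched concurrently. All names and patterns are hypothetical. The thread pool only stands in for the parallel hardware search engine. Python threads illustrate the scheduling, not a speed-up.

```python
import re
from concurrent.futures import ThreadPoolExecutor

RULES = {"rule2": r"foo\d", "rule3": r"bar", "rule4": r"baz+"}   # hypothetical
RULE_IDS = {"B1": ["rule2", "rule3"], "B2": ["rule3", "rule4"]}  # rule3 is shared

streams = {1: ("B1", "foo1 bar"), 2: ("B2", "bar bazz")}  # stream ID -> (rule ID, data)

# Download each distinct rule once, even if several streams select it.
needed = sorted({name for rule_id, _ in streams.values() for name in RULE_IDS[rule_id]})
downloaded = {name: re.compile(RULES[name]) for name in needed}

def search(stream_id):
    rule_id, data = streams[stream_id]
    hits = []
    for name in RULE_IDS[rule_id]:
        hits.extend(downloaded[name].findall(data))
    return stream_id, hits

# Search the streams concurrently rather than one after another.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(search, streams))

print(needed)   # ['rule2', 'rule3', 'rule4']
print(results)  # {1: ['foo1', 'bar'], 2: ['bar', 'bazz']}
```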
Matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain the final key information may include the following steps, as shown in FIG. 3:
step 3001, determining whether the rule needs to be held according to whether the current data stream meets the rule holding condition;
step 3002, if the current data stream meets the rule holding condition, holding the rule, and matching the data stream corresponding to the data stream ID according to the regular expression defined in the held rule to obtain the final key information;
step 3003, if the current data stream does not meet the rule holding condition, updating the rule ID, obtaining at least one rule corresponding to the updated rule ID from the rule cache subsystem, and matching the data stream corresponding to the data stream ID according to the regular expression defined in the at least one rule corresponding to the updated rule ID to obtain the final key information.
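Steps 3001 to 3003 can be sketched as a function that either reuses the held rules or fetches the rules for the updated rule ID before matching. The rule contents, the updated rule ID `B1_new`, and the function names are hypothetical. The rule sets mirror the embodiment's example, in which B1 selects rule0 and rule1 and the update selects rule2 and rule5.

```python
import re

# Hypothetical rule cache contents; "B1_new" stands for the updated rule ID.
RULES = {"rule0": r"AB", "rule1": r"CD", "rule2": r"EF", "rule5": r"GH"}
RULE_IDS = {"B1": ["rule0", "rule1"], "B1_new": ["rule2", "rule5"]}

def fetch_rules(rule_id):
    """Obtain the rules for a rule ID from the (hypothetical) rule cache."""
    return [re.compile(RULES[name]) for name in RULE_IDS[rule_id]]

def match_stream(data, rule_id, hold, prev_rule_id, held_rules):
    """Steps 3001-3003 for one data stream; returns (key information, rules used)."""
    # Step 3001: check the rule holding condition.
    if hold and rule_id == prev_rule_id:
        rules = held_rules              # step 3002: keep the held rules
    else:
        rules = fetch_rules(rule_id)    # step 3003: fetch rules for the updated rule ID
    key_info = [m for rule in rules for m in rule.findall(data)]
    return key_info, rules

held = fetch_rules("B1")
print(match_stream("xxABCDEFGH", "B1", True, "B1", held)[0])      # ['AB', 'CD']
print(match_stream("xxABCDEFGH", "B1_new", True, "B1", held)[0])  # ['EF', 'GH']
```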
The plurality of data streams acquired from the data stream generation system 1 are also marked with holding state information, which indicates whether the rule is held. The rule holding condition includes: the holding state information indicates that the rule is held, and the rule ID of the current data stream is the same as the rule ID of the previous data stream.
Specifically, following the above embodiment, suppose the data stream with data stream ID 1 is initially marked with rule ID B1, which selects rule0 and rule1. This data stream is also marked with holding state information, which indicates whether the data stream needs to update its rule (Rule Update). The holding state information of the data streams is denoted A0, A1, …, An. For example, if the data stream with data stream ID 1 does not need to use a new rule during searching, A1 is set to invalid (0), meaning that the holding state information indicates that the rule is held. The current rules rule0 and rule1 are then held, and the data stream with data stream ID 1 is matched according to the regular expressions defined in the held rule0 and rule1 to obtain the final key information.
In addition, the present application supports interleaved searching of different data streams. For the Rule Update information, whether the holding state information is set to 1 is generally decided only when the data stream first appears (its first slice), based on the rule information in a rule history record. For later slices, the holding state information stays 0. If none of the data streams being searched is appearing for the first time, the holding state information remains 0 throughout, so it cannot by itself indicate whether a rule needs to be updated. The Rule Update decision therefore also takes the rule ID into account. If the rule ID of the current data stream is the same as that of the previous data stream, the rule does not need to be updated. If the rule IDs differ, the rule needs to be updated.
If the holding state information A1 of the data stream with data stream ID 1 is set to valid (1), meaning that the rule is not held, or if the rule ID of the current data stream differs from the rule ID of the previous data stream, the rule ID of the data stream with data stream ID 1 needs to be updated. For example, suppose rule ID B1 selects rule0 and rule1 before the update and rule2 and rule5 after the update. The data stream with data stream ID 1 is then matched according to the regular expressions defined in the updated rule2 and rule5 to obtain the final key information.
The method supports interleaved searching of multiple data streams, and the following example shows why this support is needed. Suppose the search rule is "ABCD", and data stream 1 and data stream 2 need to be processed. Data stream 1 is processed at time 1 (the first 50%), data stream 2 at time 2, and data stream 1 again at time 3 (the remaining 50%). The last two characters of data stream 1 processed at time 1 are "AB", and the first two characters processed at time 3 are "CD". Taken together, times 1 and 3 of data stream 1 therefore hit the rule "ABCD". Without support for interleaved searching, the characters "AB" found at time 1 cannot be recorded. When the search at time 3 finishes, the result is reported as a miss, which does not reflect the actual data, so the search is wrong.
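The "ABCD" example can be reproduced in a few lines. Searching each half of data stream 1 in isolation misses the rule, while the whole stream contains it. The padding characters around "AB" and "CD" are hypothetical filler.

```python
import re

rule = re.compile("ABCD")

# Data stream 1 is processed in two halves, with data stream 2 scheduled in between.
stream1_time1 = "xxAB"   # hypothetical first 50% of data stream 1
stream1_time3 = "CDyy"   # hypothetical remaining 50% of data stream 1

# Without any saved context, each piece is searched in isolation: both miss.
naive_hit = bool(rule.search(stream1_time1)) or bool(rule.search(stream1_time3))

# The whole stream does contain the rule, so the naive result is wrong.
true_hit = bool(rule.search(stream1_time1 + stream1_time3))

print(naive_hit, true_hit)  # False True
```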
To prevent errors in interleaved searching, the plurality of data streams acquired from the data stream generation system can be further marked with occurrence information, which indicates the number of times the data stream has occurred. Matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain the final key information may then include the following steps, as shown in FIG. 4:
step 3011: dividing the data stream into N data stream slices, where the first to Nth data stream slices are marked with occurrence information indicating the first to Nth occurrences, respectively;
step 3012: determining the matching mode of each data stream slice according to its occurrence information;
step 3013: if the data stream slice is the first occurrence, matching it according to the regular expression defined in the rule to obtain the first-time key information;
step 3014: if the data stream slice is the second to Nth occurrence, matching it according to the regular expression defined in the rule on the basis of the key information obtained from the previous data stream slice, until all N data stream slices have been matched, to obtain the final key information.
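Steps 3011 to 3014 can be sketched as follows. This is a hedged illustration, not the patented engine, and it makes several assumptions:

- The search rule is a literal string rather than a general regular expression.
- A Knuth-Morris-Pratt automaton stands in for the engine's matching state.
- The key information is the pair (hit count, length of the partially matched prefix). The partial-match state is an addition made here, because a hit count alone could not record an 'AB' seen at the end of one slice.
- Occurrence indices are 0-based, so 0 marks the first occurrence.

The function names are hypothetical.

```python
def kmp_failure(pattern):
    """Failure table: length of the longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def search_slice(pattern, fail, data, key_info):
    """Scan one slice, resuming from key_info = (hit_count, matched_prefix_len)."""
    count, state = key_info
    for ch in data:
        while state and ch != pattern[state]:
            state = fail[state - 1]
        if ch == pattern[state]:
            state += 1
        if state == len(pattern):
            count += 1               # a hit, possibly spanning earlier slices
            state = fail[state - 1]  # keep overlapping hits possible
    return count, state


def search_stream(pattern, data, n):
    """Steps 3011-3014: split into n tagged slices and match them in order."""
    fail = kmp_failure(pattern)
    size = -(-len(data) // n)  # ceiling division
    # Step 3011: each slice carries its occurrence index (0 = first occurrence).
    slices = [(occ, data[occ * size:(occ + 1) * size]) for occ in range(n)]
    key_info = (0, 0)
    for occurrence, piece in slices:
        # Step 3012: choose the matching mode from the occurrence information.
        if occurrence == 0:
            # Step 3013: first occurrence, start from empty key information.
            key_info = search_slice(pattern, fail, piece, (0, 0))
        else:
            # Step 3014: continue from the previous slice's key information.
            key_info = search_slice(pattern, fail, piece, key_info)
    return key_info[0]


print(search_stream("ABCD", "xxABCDyy", 2))  # 1: the hit spans both slices
```

A production regex engine would carry its own automaton state (for example a DFA state) in place of the KMP prefix length.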
Specifically, the data stream is divided into N data stream slices in the data cache subsystem 221. The Context Switch system 22 further includes a Context cache subsystem 222. The Context cache subsystem 222 receives the key information of each data stream slice sent by the regular expression search engine subsystem 3 and feeds it back to the regular expression search engine subsystem 3 when the next slice is searched.
For example, in the above embodiment, the data streams with data stream IDs 0 and 1 are each divided into three slices. For data stream 0, the slices are Streaming0_0, Streaming0_1 and Streaming0_2, and each is marked with occurrence information C0:

- C0 = 0 for Streaming0_0 indicates the first slice, i.e., the first occurrence of the data stream.
- C0 = 1 for Streaming0_1 indicates the second slice, i.e., the second occurrence.
- C0 = 2 for Streaming0_2 indicates the third slice, i.e., the third occurrence.

The slices are searched in order from the first to the third.
The matching mode of each slice is determined by its occurrence information. The first slice is matched directly according to the regular expression defined in the rule, which yields the first-time key information key_info0_0. After the first slice has been searched, key_info0_0 is stored in buffer ctx0 of the Context cache. The second slice is searched on the basis of the first slice's key information. When the second slice arrives, key_info0_0 is taken out of ctx0 and the search continues from that point, producing key_info0_1. When the second slice's search finishes, key_info0_1 is stored in ctx0 for use by the third slice, whose search produces key_info0_2.
For example, let the key information be the number of hits of the search rule (a hit occurs when the searched content matches the rule content). Let the hit counts of Streaming0_0, Streaming0_1 and Streaming0_2 be Count0_0, Count0_1 and Count0_2, respectively. When the search of Streaming0_0 finishes, key_info0_0 = Count0_0 is stored in ctx0. When the search of Streaming0_1 starts, key_info0_0 is taken out and hits are counted on top of Count0_0. The key_info0_1 stored in ctx0 at the end of that search is therefore Count0_0 + Count0_1. Likewise, the key_info0_2 stored in ctx0 when the search of Streaming0_2 finishes is Count0_0 + Count0_1 + Count0_2.
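The accumulation in ctx0 is a running sum. In the notation below, key_info0_k is written $\mathrm{key\_info}_{0,k}$ and Count0_k is written $\mathrm{Count}_{0,k}$. This assumes each $\mathrm{Count}_{0,k}$ counts only the hits completed while slice $k$ is searched.

```latex
\begin{aligned}
\mathrm{key\_info}_{0,0} &= \mathrm{Count}_{0,0} \\
\mathrm{key\_info}_{0,k} &= \mathrm{key\_info}_{0,k-1} + \mathrm{Count}_{0,k}, \qquad k = 1, 2 \\
\Rightarrow\quad \mathrm{key\_info}_{0,2} &= \mathrm{Count}_{0,0} + \mathrm{Count}_{0,1} + \mathrm{Count}_{0,2}
\end{aligned}
```

The same recurrence applies to data stream 1, with ctx1 in place of ctx0.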
The search of the data stream with data stream ID 1 proceeds in the same way. This stream is divided into three slices, Streaming1_0, Streaming1_1 and Streaming1_2. The number of hits between the data stream and the rule content is again used as the key information, with hit counts Count1_0, Count1_1 and Count1_2 for the three slices. When the search of Streaming1_0 finishes, key_info1_0 = Count1_0 is stored in ctx1. When Streaming1_1 is searched, Count1_0 is taken out and hits are counted on top of it, so the key_info1_1 stored in ctx1 at the end of that search is Count1_0 + Count1_1. When the search of Streaming1_2 finishes, key_info1_2 = Count1_0 + Count1_1 + Count1_2 is stored in ctx1.
The search of the data stream with data stream ID 0 and the search of the data stream with data stream ID 1 can run concurrently, with their slices interleaved. Each stream's intermediate key information is saved in and restored from its own context buffer. Through this slice search process, the Context Switch system 22 supports interleaved searching of different data streams. This prevents errors during interleaved searching, which improves search efficiency while preserving search accuracy.
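The interleaved case can be sketched with one context buffer per data stream ID, standing in for ctx0 and ctx1. Several assumptions are not in the source:

- The rule is the fixed literal 'ABCD', which cannot overlap itself.
- The key information is (hit count, last len(rule) - 1 characters seen). A hit split across two slices is found when the saved tail is prepended to the next slice.
- The slice contents, the arrival order and the function names are made up for illustration.

```python
RULE = "ABCD"          # hypothetical literal rule; assumed not to overlap itself
KEEP = len(RULE) - 1   # characters a partial hit can occupy at a slice end


def search_slice(data, key_info):
    """Resume from key_info = (hit_count, tail of previous slices)."""
    count, tail = key_info
    window = tail + data
    # The tail is shorter than RULE, so every hit found here includes new data.
    count += window.count(RULE)
    return count, window[max(0, len(window) - KEEP):]


def run_interleaved(arrivals):
    """Process (stream_id, occurrence, data) slices in arrival order."""
    ctx = {}  # Context cache: stream ID -> key info (ctx0, ctx1, ...)
    for stream_id, occurrence, data in arrivals:
        # The first occurrence starts fresh; later ones restore the saved context.
        prev = (0, "") if occurrence == 0 else ctx[stream_id]
        ctx[stream_id] = search_slice(data, prev)  # save context for next slice
    return {sid: info[0] for sid, info in ctx.items()}


# Slices of data streams 0 and 1 arrive interleaved (made-up contents).
arrivals = [
    (0, 0, "xxAB"), (1, 0, "ABCD"),
    (0, 1, "CDAB"), (1, 1, "zzA"),
    (0, 2, "CDzz"), (1, 2, "BCD"),
]
print(run_interleaved(arrivals))  # {0: 2, 1: 2}
```

Both streams report 2 hits even though their slices arrive interleaved. Three of those four hits straddle a slice boundary, so discarding the saved context between slices would lose them.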
In an embodiment of the regular expression-based data stream searching system of the present application, the system may further include a key information processing system 4 configured to store the key information output by the regular expression search engine subsystem 3. Buffers result_buff0, result_buff1 … result_buffn store the key information of the individual data streams: result_buff0 corresponds to the data stream with data stream ID 0, result_buff1 to the data stream with data stream ID 1, … and result_buffn to the data stream with data stream ID n. Flags D0, D1 … Dn indicate whether the key information in the corresponding result_buff is valid. When the regular expression search engine stores key information in a result_buff, the corresponding flag Dn is set to valid (1). When the data in that result_buff is taken away, Dn is set to invalid (0).
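A minimal sketch of the result buffers and their valid flags follows. It assumes one buffer per data stream ID and treats reading a buffer as "taking away" its data. The class and method names are hypothetical. The source does not say what happens when new key information arrives while a flag is still valid; this sketch simply overwrites the buffer.

```python
class KeyInfoStore:
    """Hypothetical model of result_buff0..result_buffn and flags D0..Dn."""

    def __init__(self, num_streams):
        self.result_buff = [None] * num_streams  # one buffer per data stream ID
        self.valid = [0] * num_streams           # D flags: 1 = valid, 0 = invalid

    def store(self, stream_id, key_info):
        """Called by the search engine: write key info and set D to valid."""
        self.result_buff[stream_id] = key_info
        self.valid[stream_id] = 1

    def take(self, stream_id):
        """Called by the consumer: return key info once and set D to invalid."""
        if not self.valid[stream_id]:
            return None  # nothing new for this data stream
        self.valid[stream_id] = 0
        return self.result_buff[stream_id]


store = KeyInfoStore(num_streams=2)
store.store(1, {"hits": 2})
print(store.valid)    # [0, 1]
print(store.take(1))  # {'hits': 2}
print(store.valid)    # [0, 0]
```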
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application. Referring to the schematic structural diagram shown in fig. 5, an embodiment of the present application provides a regular expression-based data stream searching apparatus, where the apparatus includes:
a marking unit U1001 configured to mark a plurality of data streams acquired from a data stream generation system with a data stream ID indicating a number of the data stream and a rule ID indicating a number of at least one rule corresponding to the data stream ID;
a rule obtaining unit U1002, configured to obtain at least one rule corresponding to the rule ID from a rule cache subsystem;
and a matching unit U1003, configured to match the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information.
Further, the matching unit U1003 may include:
the first judgment module U3001 is configured to judge whether the rule needs to be maintained according to whether the current data stream meets a rule maintenance condition;
a first matching module U3002, configured to maintain the rule if the current data stream meets the rule maintenance condition, and match the data stream corresponding to the data stream ID according to a regular expression defined in the maintained rule to obtain final key information;
a second matching module U3003, configured to update the rule ID if the current data stream does not meet the rule holding condition, obtain at least one rule corresponding to the updated rule ID from a rule cache subsystem, and match the data stream corresponding to the data stream ID according to a regular expression defined in the at least one rule corresponding to the updated rule ID, to obtain final key information.
The marking unit U1001 may further mark the data streams acquired from the data stream generation system with holding state information indicating whether the rule is held. The rule holding condition includes: the holding state information indicates that the rule is held, and the rule ID of the current data stream is the same as the rule ID of the previous data stream.
In addition, as shown in fig. 6, the marking unit U1001 may further mark the data streams acquired from the data stream generation system with occurrence information indicating the number of occurrences of the data stream;
the matching unit U1003 may include:
a dividing module U3101, configured to divide the data stream into N data stream slices, where the first to Nth data stream slices are all marked with occurrence information indicating the first to Nth occurrences, respectively;
a second judging module U3102, configured to judge a matching manner of the data stream slices according to the occurrence information;
a third matching module U3103, configured to, if the data stream slice appears for the first time, match the data stream slice according to a regular expression defined in the rule to obtain first-time key information;
a fourth matching module U3104, configured to, if the data stream slice is the second to Nth occurrence, match the data stream slice according to a regular expression defined in the rule on the basis of the key information obtained from the previous data stream slice, until all N data stream slices have been matched, so as to obtain final key information.
In a specific implementation, the present application further provides a computer storage medium. The computer storage medium may store a program that, when executed, performs some or all of the steps in each embodiment of the regular expression-based data stream searching method provided in the present application. The storage medium may be a magnetic disk, an optical disk, a ROM (read-only memory), a RAM (random access memory), or the like.
Those skilled in the art will clearly understand that the techniques in the embodiments of the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present application may be embodied as a software product, either in their entirety or in the part that contributes to the prior art. The software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disk. It includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the method described in the embodiments of the present application, or in parts of those embodiments.
Identical and similar parts of the various embodiments in this specification may be cross-referenced. In particular, the apparatus embodiment is substantially similar to the method embodiment and is therefore described relatively briefly; for relevant points, refer to the description of the method embodiment.
The present application has been described in detail with reference to specific embodiments and illustrative examples, but the description is not intended to limit the application. Those skilled in the art will appreciate that various equivalent substitutions, modifications or improvements may be made to the presently disclosed embodiments and implementations thereof without departing from the spirit and scope of the present disclosure, and these fall within the scope of the present disclosure. The protection scope of this application is subject to the appended claims.

Claims (10)

1. A regular expression-based data stream searching method is characterized by comprising the following steps:
marking a plurality of data streams acquired from a data stream generation system with a data stream ID and a rule ID, wherein the data stream ID is used for indicating the number of the data streams, and the rule ID is used for indicating the number of at least one rule corresponding to the data stream ID;
obtaining at least one rule corresponding to the rule ID from a rule cache subsystem;
and matching the data stream corresponding to the data stream ID according to a regular expression defined in the rule to obtain final key information.
2. The method of claim 1,
wherein matching the data stream corresponding to the data stream ID according to a regular expression defined in the rule to obtain final key information comprises the following steps:
judging whether the rule needs to be maintained according to whether the current data stream meets the rule maintaining condition;
if the current data stream meets the rule keeping condition, keeping the rule, and matching the data stream corresponding to the data stream ID according to a regular expression defined in the kept rule to obtain final key information;
and if the current data stream does not meet the rule keeping condition, updating the rule ID, acquiring at least one rule corresponding to the updated rule ID from a rule cache subsystem, and matching the data stream corresponding to the data stream ID according to a regular expression defined in the at least one rule corresponding to the updated rule ID to obtain final key information.
3. The method of claim 2, wherein a plurality of data streams obtained from a data stream generation system are further tagged with hold state information indicating whether the rule is held;
the rule holding condition includes: the hold state information indicates that the rule is held, and the rule ID of the current data stream is the same as the rule ID of the previous data stream.
4. The method of claim 1, wherein a plurality of data streams obtained from a data stream generation system are further marked with occurrence information indicating the number of occurrences of the data streams;
wherein matching the data stream corresponding to the data stream ID according to a regular expression defined in the rule to obtain final key information comprises the following steps:
dividing the data stream into N data stream slices, wherein the first to Nth data stream slices are marked with occurrence information indicating the first to Nth occurrences, respectively;
judging the matching mode of the data stream slice according to the occurrence information;
if the data stream slice appears for the first time, matching the data stream slice according to a regular expression defined in the rule to obtain first-time key information;
and if the data stream slice appears from the second time to the Nth time, matching the data stream slice according to a regular expression defined in the rule on the basis of key information obtained by the previous data stream slice until the matching of the N data stream slices is completed, and obtaining final key information.
5. A regular expression-based data stream searching apparatus, the apparatus comprising:
a marking unit configured to mark a plurality of data streams acquired from a data stream generation system with a data stream ID indicating a number of the data stream and a rule ID indicating a number of at least one rule corresponding to the data stream ID;
a rule obtaining unit, for obtaining at least one rule corresponding to the rule ID from a rule cache subsystem;
and the matching unit is used for matching the data stream corresponding to the data stream ID according to the regular expression defined in the rule to obtain final key information.
6. The apparatus of claim 5, wherein the matching unit comprises:
the first judgment module is used for judging whether the rule needs to be maintained or not according to whether the current data stream meets the rule maintaining condition or not;
the first matching module is used for keeping the rule if the current data stream meets the rule keeping condition, and matching the data stream corresponding to the data stream ID according to a regular expression defined in the kept rule to obtain final key information;
and the second matching module is used for updating the rule ID if the current data stream does not accord with the rule keeping condition, acquiring at least one rule corresponding to the updated rule ID from a rule cache subsystem, and matching the data stream corresponding to the data stream ID according to a regular expression defined in the at least one rule corresponding to the updated rule ID to obtain final key information.
7. The apparatus according to claim 6, wherein the marking unit is further configured to mark a plurality of data streams acquired from the data stream generation system with holding state information indicating whether the rule is held;
the rule holding condition includes: the holding state information indicates that the rule is held, and the rule ID of the current data stream is the same as the rule ID of the previous data stream.
8. The apparatus according to claim 5, wherein the marking unit is further configured to mark a plurality of data streams acquired from a data stream generation system with occurrence information indicating the number of occurrences of the data streams;
the matching unit includes:
a dividing module, configured to divide the data stream into N data stream slices, where the first to Nth data stream slices are all marked with occurrence information indicating the first to Nth occurrences, respectively;
the second judging module is used for judging the matching mode of the data stream slice according to the occurrence information;
the third matching module is used for matching the data stream slices according to the regular expression defined in the rule to obtain first key information if the data stream slices appear for the first time;
and the fourth matching module is used for matching the data stream slices according to the regular expression defined in the rule on the basis of the key information obtained by the previous data stream slice if the data stream slices appear from the second time to the Nth time until the matching of the N data stream slices is completed, so as to obtain the final key information.
9. A regular expression-based data stream searching system is characterized by comprising a data stream generating system, a search engine control system and a regular expression search engine subsystem;
the search engine control system comprises a data caching subsystem and a Context Switch system, wherein the data caching subsystem is used for receiving a data stream sent by a data stream generation system and marking the data stream with a data stream ID and a rule ID, the data stream ID is used for indicating the number of the data stream, and the rule ID is used for indicating the number of at least one rule corresponding to the data stream;
the Context Switch system comprises a rule cache subsystem, wherein the rule cache subsystem is used for storing rules;
the regular expression search engine subsystem is used for acquiring the data stream marked with the data stream ID and the rule ID from the data cache subsystem, acquiring at least one rule corresponding to the rule ID from the rule cache subsystem, and matching the data stream corresponding to the data stream ID according to a regular expression defined in the rule to obtain final key information.
10. The system of claim 9, wherein the data stream is partitioned into N data stream slices in the data caching subsystem, and wherein the Context Switch system further comprises a Context caching subsystem, wherein the Context caching subsystem is configured to receive key information of the data stream slices sent by the regular expression search engine subsystem and feed the key information back to the regular expression search engine subsystem.
CN201911186851.9A 2019-11-28 2019-11-28 Regular expression-based data stream searching method, device and system Active CN110955683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911186851.9A CN110955683B (en) 2019-11-28 2019-11-28 Regular expression-based data stream searching method, device and system

Publications (2)

Publication Number Publication Date
CN110955683A true CN110955683A (en) 2020-04-03
CN110955683B CN110955683B (en) 2024-01-09

Family

ID=69978670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911186851.9A Active CN110955683B (en) 2019-11-28 2019-11-28 Regular expression-based data stream searching method, device and system

Country Status (1)

Country Link
CN (1) CN110955683B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040258043A1 (en) * 2003-05-28 2004-12-23 International Business Machines Corporation Packet classification
CN101442540A (en) * 2008-12-30 2009-05-27 北京畅讯信通科技有限公司 High speed mode matching algorithm based on field programmable gate array
CN106503214A (en) * 2016-11-03 2017-03-15 北京中安智达科技有限公司 A kind of complex rule matching process based on Redis memory databases

Also Published As

Publication number Publication date
CN110955683B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
US10949442B2 (en) Method and apparatus for accelerated format translation of data in a delimited data format
US7599932B2 (en) Data storage using identifiers
TWI419002B (en) Pattern-recognition processor with matching-data reporting module
US9460196B2 (en) Conditional string search
JP2010530591A5 (en)
US20090327673A1 (en) Estimator, table managing device, selecting device, table managing method, program for allowing computer to execute the table managing method, and recording medium where the program is recorded
CN107704604A (en) A kind of information persistence method, server and computer-readable recording medium
US5841960A (en) Method of and apparartus for automatically generating test program
CN115114599A (en) Method, device and equipment for processing database watermark and storage medium
EP0099404A1 (en) Text comparator.
CN110232071A (en) Search method, device and storage medium, the electronic device of drug data
CN110955683B (en) Regular expression-based data stream searching method, device and system
CN109739854A (en) A kind of date storage method and device
JP2009134609A (en) Variable length data storage device, variable length data storage method, variable length data reading method, and program for the same
CN112613176A (en) Slow SQL statement prediction method and system
CN115525235B (en) Data operation method and system based on storage structure
CN109857740B (en) Character string storage method, matching method, electronic device and readable storage medium
CN116662327A (en) Data fusion cleaning method for database
CN110795617A (en) Error correction method and related device for search terms
CN115470223A (en) Data lake data incremental consumption method based on two-layer time identification
CN113742208B (en) Software detection method, device, equipment and computer readable storage medium
CN107562701A (en) A kind of data analysis method and its system of steel trade industry stock resource
CN113961725A (en) Automatic label labeling method, system, equipment and storage medium
CN114579580A (en) Data storage method and data query method and device
CN111143582A (en) Multimedia resource recommendation method and device for updating associative words in real time through double indexes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant