CN115577067A - Message detection method, device, system, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115577067A
Authority
CN
China
Prior art keywords
matching
rule
data
prefix
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110685089.XA
Other languages
Chinese (zh)
Inventor
王勇
熊先奎
张启明
强鹏辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the application provides a message detection method, a message detection device, a message detection system, electronic equipment and a storage medium. The message detection method comprises the following steps: acquiring a message to be matched; prefix matching is carried out on the message to be matched by utilizing prefix matching rule data; and when the prefix matching is successful, carrying out regular expression matching on the message to be matched by utilizing the rule matching data of the finite automaton. The embodiment of the application triggers the subsequent finite automata matching by using the prefix matching result through a method of combining the prefix matching and the finite automata matching so as to reduce the bandwidth requirement of the finite automata matching algorithm and improve the processing efficiency of message detection, thereby being suitable for the network message processing requirement with higher efficiency.

Description

Message detection method, device, system, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of message processing, in particular to a message detection method, a message detection device, a message detection system, electronic equipment and a storage medium.
Background
Deep Packet Inspection (DPI) technology is widely applied in network firewalls, intrusion detection, content auditing, bandwidth management and similar fields, and its key enabling technology is pattern matching. In the early stage of the technology, exact string matching was widely used in deep packet inspection systems. However, as matching patterns have become more complex, regular expressions, with their flexible and efficient expressive power, have been applied more and more widely and have gradually become a core technology for deep packet inspection.
At present, the deep packet inspection method using regular expression finite automata cannot adapt to the requirement of high-efficiency network packet processing due to large storage overhead or low processing speed.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the application provides a message detection method, a message detection device, a message detection system, electronic equipment and a storage medium, and can realize more efficient network message processing.
In a first aspect, an embodiment of the present application provides a packet detection method, including:
acquiring a message to be matched;
prefix matching is carried out on the message to be matched by using prefix matching rule data;
and when the prefix matching is successful, carrying out regular expression matching on the message to be matched by utilizing the rule matching data of the finite automaton.
In a second aspect, an embodiment of the present application further provides a packet detection apparatus, including:
the message acquisition module is used for acquiring a message to be matched;
the prefix matching module is used for carrying out prefix matching on the message to be matched by utilizing prefix matching rule data;
and the finite automata matching module is used for carrying out regular expression matching on the message to be matched by utilizing the finite automata rule matching data when the prefix matching is successful.
In a third aspect, an embodiment of the present application further provides a packet detection system, including:
a message detection apparatus as claimed in the second aspect;
and the external storage module is in communication connection with the message detection device and is used for storing the rule matching data of the finite automaton.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the message detection method according to the first aspect when executing the computer program.
In a fifth aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are configured to, when executed by a processor, implement:
the message detection method according to the first aspect.
A first aspect of an embodiment of the present application provides a packet detection method, including: acquiring a message to be matched; prefix matching is carried out on the message to be matched by using prefix matching rule data; and when the prefix matching is successful, carrying out regular expression matching on the message to be matched by using the finite automaton rule matching data. The embodiment of the application triggers the subsequent finite automata matching by using the prefix matching result through a method of combining the prefix matching and the finite automata matching so as to reduce the bandwidth requirement of the finite automata matching algorithm and improve the processing efficiency of message detection, thereby being suitable for the network message processing requirement with higher efficiency.
It is to be understood that the advantageous effects of the second aspect to the fifth aspect compared to the related art are the same as the advantageous effects of the first aspect compared to the related art, and reference may be made to the related description of the first aspect, which is not repeated herein.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the embodiments or the related technical descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to the drawings without inventive labor.
Fig. 1 is a schematic diagram of a software system architecture for executing a message detection method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a message detection method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a message detection method according to another embodiment of the present application;
Fig. 4 is a schematic flowchart of a message detection method according to another embodiment of the present application;
Fig. 5 is a schematic flowchart of a message detection method according to another embodiment of the present application;
Fig. 6 is a schematic diagram of a software system architecture for executing a message detection method according to another embodiment of the present application;
Fig. 7 is a schematic flowchart of a message detection method according to another embodiment of the present application;
Fig. 8 is a schematic flowchart of a message detection method according to another embodiment of the present application;
Fig. 9 is a schematic flowchart of a message detection method according to another embodiment of the present application;
Fig. 10 is a schematic flowchart of a message detection method according to another embodiment of the present application;
Fig. 11 is a schematic structural diagram of a message detection apparatus according to an embodiment of the present application;
Fig. 12 is a schematic architecture diagram of a message detection system according to an embodiment of the present application;
Fig. 13 is a schematic architecture diagram of a message detection system according to another embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the embodiments of the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the embodiments of the present application with unnecessary detail.
It should be noted that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different from that in the flowcharts. The terms first, second and the like in the description and in the claims, as well as in the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
It should also be appreciated that reference throughout the specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.
Deep Packet Inspection (DPI) technology is widely applied in network firewalls, intrusion detection, content auditing, bandwidth management and similar fields, and its key enabling technology is pattern matching. In the early stage of the technology, exact string matching was widely used in deep packet inspection systems. However, as matching patterns have become more complex, regular expressions, with their flexible and efficient expressive power, have been applied more and more widely and have gradually become a core technology for deep packet inspection.
Finite automata (FA) are used to recognize the patterns described by regular expressions, and include deterministic finite automata (DFA) and non-deterministic finite automata (NFA). A regular expression can be converted into an NFA, and the NFA can in turn be converted into a DFA; the three are equivalent. The DFA differs from the NFA in that, for a given input and a given active state at a certain time, the DFA has exactly one active state at the next time, whereas the NFA may have one or more. That is, for a given state and input character, the NFA may have several successor states while the DFA has only one. The NFA-based matching algorithm has low space complexity and a small footprint, with storage growing linearly with the size of the regular expression rule set; however, its time complexity is high and its matching speed is low. The DFA-based matching algorithm offers high matching speed and stable matching performance, but requires far more storage than the NFA, growing exponentially in the worst case, so the DFA is prone to state explosion.
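The storage trade-off described above can be illustrated with a small sketch. It is not taken from the application: it uses the textbook pattern `(a|b)*a(a|b){n-1}` ("the n-th character from the end is a") and a plain subset construction, and the function names `build_nfa`, `subset_construction` and `state_counts` are hypothetical.

```python
from itertools import chain

def build_nfa(n):
    """NFA for (a|b)*a(a|b){n-1} with states 0..n; state n is accepting.

    State 0 loops on both characters and additionally moves to state 1 on 'a',
    which is the only non-deterministic choice in the automaton.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for s in range(1, n):
        delta[(s, "a")] = {s + 1}
        delta[(s, "b")] = {s + 1}
    return delta, 0, {n}

def subset_construction(delta, start, alphabet="ab"):
    """Return all reachable DFA states, each one a frozenset of NFA states."""
    start_set = frozenset({start})
    seen = {start_set}
    work = [start_set]
    while work:
        current = work.pop()
        for ch in alphabet:
            nxt = frozenset(
                chain.from_iterable(delta.get((s, ch), ()) for s in current)
            )
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def state_counts(n):
    """Return (number of NFA states, number of reachable DFA states)."""
    delta, start, _accepting = build_nfa(n)
    return n + 1, len(subset_construction(delta, start))

if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, state_counts(n))
```

For n = 8 the sketch reports 9 NFA states against 256 DFA states: the NFA grows linearly with the pattern, while the equivalent DFA doubles with every additional position.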
At present, deep packet inspection methods based on regular expression finite automata cannot meet the requirements of high-efficiency network packet processing, owing to large storage overhead or low processing speed. For example, such methods are mostly implemented in software running on a Central Processing Unit (CPU), and face two challenges. First, regular expression rule sets keep growing, reaching the scale of millions of rules, and their feature descriptions are increasingly complex; this leads to a huge storage overhead for the finite automata, and in many cases the finite automata cannot be realized at all. Second, network traffic is growing by about 50% annually; 100Gbps networks are now common at the network edge, and networks of over 400Gbps have been deployed in core backbone networks. Deep packet inspection based on software finite automata therefore struggles to keep up with ever faster network processing demands.
Based on this, the embodiment of the application provides a message detection method, device, system, electronic device and storage medium. The message detection method comprises the following steps: acquiring a message to be matched; prefix matching is carried out on the message to be matched by utilizing prefix matching rule data; and when the prefix matching is successful, carrying out regular expression matching on the message to be matched by utilizing the rule matching data of the finite automaton. The embodiment of the application triggers the subsequent finite automata matching by using the prefix matching result through a method of combining the prefix matching with the finite automata matching so as to reduce the bandwidth requirement of the finite automata matching algorithm and improve the processing efficiency of message detection, thereby being suitable for the network message processing requirement with higher efficiency.
It should be noted that the embodiment of the present application may be applied to various application scenarios requiring deep packet inspection, including but not limited to intrusion detection related to network security, such as virus attack protection; network monitoring related keyword retrieval and content filtering; application layer traffic identification and statistics related to mobile network charging, etc.
The message detection method provided in one embodiment of the present application may be executed in a processor, and the processor may include one or more processing units. The different processing units may be separate devices or may be integrated in one or more devices.
The embodiments of the present application will be further explained with reference to the drawings.
As shown in fig. 1, fig. 1 is a schematic diagram of a software system architecture for executing a message detection method according to an embodiment of the present application. In the example of fig. 1, the system architecture includes a message acquisition module 110, a message aggregation module 120, a prefix matching module 130, a finite automaton matching module 140, a scheduling module 150, a rule compiling module 310, a storage control module 160, a storage module 210, and a matching result aggregation module 170.
The message acquisition module 110, which may also be referred to as a Media Access Control (MAC) receiving module in some embodiments, may implement Ethernet interface functions such as 1G, 10G and 100G, so as to send and receive messages; a plurality of message acquisition modules 110 may be instantiated according to the actual application scenario.
The message aggregation module 120 is connected to the message acquisition module 110, and the message aggregation module 120 caches the message received by each message acquisition module 110, and then distributes the message to the prefix matching module 130.
The prefix matching module 130 is connected to the message aggregation module 120 and is configured to perform prefix matching. In some embodiments, the prefix matching module 130 needs to match some or all bytes of the payload of every received message, which requires high performance. Therefore, in some embodiments, the prefix matching module 130 employs an exact string matching algorithm implemented on the internal RAM (Random Access Memory) of an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), where the exact string matching algorithm may be the P-AC algorithm (Pipelined Aho-Corasick, a pipelined AC algorithm), the QSV (quick sampling verification) algorithm, or the like. The system implementation may instantiate one or more prefix matching modules 130 according to the required processing bandwidth.
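As an illustration of exact multi-string matching, the following is a minimal software sketch of the classic Aho-Corasick automaton. It is not the pipelined P-AC hardware design or the QSV algorithm named above, and the function names `build_automaton` and `search` are hypothetical.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure and output tables for a list of byte-string patterns."""
    goto = [{}]   # goto[state][byte] -> next state (trie edges)
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> indices of patterns that end at this state
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(idx)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states derive
    # their failure link from the failure link of their parent.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(automaton, data):
    """Scan data once; return (end_position, pattern_index) for every occurrence."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for idx in out[state]:
            hits.append((pos, idx))
    return hits

if __name__ == "__main__":
    automaton = build_automaton([b"he", b"she", b"his", b"hers"])
    print(search(automaton, b"ushers"))
```

Each payload byte is consumed once regardless of how many patterns are loaded, which is the property that makes this family of algorithms suitable as a first-stage filter.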
The finite automaton matching module 140 is connected to the prefix matching module 130. In the embodiment of the present application, the finite automata matching module 140 is placed at the stage following the prefix matching module 130, and only a message that hits a prefix rule in the prefix matching module 130 proceeds to the finite automata matching module 140 for complete matching; finite automata matching operations are thereby saved and the overall matching performance is improved. In some embodiments, the matching rules used by the finite automaton matching module 140 are dynamically read from the storage module 210 and loaded by the scheduling module 150 according to the matching result of each message in the prefix matching module 130, so as to implement the subsequent complete matching.
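A rough way to quantify this saving, under the simplifying assumption that the finite automata stage only has to process messages that hit at least one prefix rule, is shown below. The symbols are introduced here for illustration and do not appear in the application: B_in is the incoming message bandwidth, h is the fraction of that traffic (by bytes) belonging to messages with a prefix hit, and B_FA is the bandwidth the finite automata stage must sustain.

```latex
B_{\mathrm{FA}} = h \cdot B_{\mathrm{in}}, \qquad 0 \le h \le 1
```

With purely hypothetical values B_in = 100 Gbps and h = 0.05, the finite automata stage would need to sustain 5 Gbps rather than the full line rate. The actual hit rate depends on the rule set and the traffic and is not stated in the application.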
The scheduling module 150 is connected to the prefix matching module 130, the finite automata matching module 140 and the storage control module 160, respectively. In some embodiments, the scheduling module 150 has two main functions: first, loading the prefix matching rule data compiled by the rule compiling module 310 into the prefix matching module 130, and loading the finite automata rule matching data into the storage module 210; second, according to the matching result of the prefix matching module 130, reading the corresponding finite automata rule matching data from the storage module 210 and loading it into the finite automata matching module 140 for each message to be matched.
The rule compiling module 310 is in communication connection with the scheduling module 150 and is used for generating prefix matching rule data and/or finite automata rule matching data and sending them to the scheduling module 150.
The storage control module 160 is connected to the scheduling module 150, and is configured to manage, read, and write the storage module 210.
The storage module 210 is connected to the storage control module 160 and is used for storing the finite automata rule matching data. In some embodiments, the storage module 210 may be an external storage module, such as a DDR storage module.
The matching result aggregation module 170 is connected to the finite automata matching module 140 and the prefix matching module 130, and is configured to reorganize the information (including the original message and the matching result) output by the finite automata matching module 140 or the prefix matching module 130 into a format defined by a subsequent application and output it.
The system architecture and the application scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not form a limitation on the technical solution provided in the embodiment of the present application, and it is known by those skilled in the art that the technical solution provided in the embodiment of the present application is also applicable to similar technical problems with the evolution of the system architecture and the appearance of new application scenarios.
Those skilled in the art will appreciate that the system architecture shown in FIG. 1 is not intended to be limiting of embodiments of the present application and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
In the system architecture shown in fig. 1, each component may call its stored message detection program to execute the message detection method.
Based on the system architecture, embodiments of the message detection method according to the embodiments of the present application are provided.
As shown in fig. 2, an embodiment of the present application provides a packet detection method, including:
step S1100, obtaining a message to be matched;
step S1200, prefix matching is carried out on the message to be matched by using prefix matching rule data;
and step S1300, when the prefix matching is successful, carrying out regular expression matching on the message to be matched by utilizing the rule matching data of the finite automaton.
In some embodiments, an optimized message matching algorithm is realized by combining prefix matching with finite automata matching. The starting point is to filter messages using prefix matching, and then to perform complete matching, using the finite automata matching method, only on those messages that need further matching, thereby improving matching performance.
In some embodiments, in step S1100, a message acquisition module (e.g., a MAC receiving module) may be used to receive a network message as the message to be matched. In step S1200, a prefix matching module performs prefix matching on the message to be matched using prefix matching rule data, of which there are multiple pieces. If no prefix matching rule data is hit, the network message does not enter the finite automata matching module but is sent directly to the matching result aggregation module for processing. In step S1300, if the network message hits one or more pieces of prefix matching rule data in the prefix matching module, the network message is passed to the finite automata matching module for subsequent complete matching, and the original network message and the matching result are then passed to the matching result aggregation module for processing.
The embodiment of the application triggers the subsequent finite automata matching by using the prefix matching result through a method of combining the prefix matching with the finite automata matching so as to reduce the bandwidth requirement of the finite automata matching algorithm and improve the processing efficiency of message detection, thereby being suitable for the network message processing requirement with higher efficiency.
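The gating behaviour of steps S1100 to S1300 can be sketched in software as follows. This is only an illustration under stated assumptions: the rule set `RULES` and the function `detect` are hypothetical, a naive substring scan stands in for the P-AC or QSV prefix matcher, and Python's `re` module stands in for the finite automata matching module.

```python
import re

# Hypothetical rule set: literal prefix -> full regular expression.
RULES = {
    b"GET /admin": re.compile(rb"GET /admin/[a-z]+\.php"),
    b"cmd.exe": re.compile(rb"cmd\.exe\s+/c\s+\w+"),
}

def detect(message):
    """Return (matched_prefixes, fa_stage_used) for one message.

    S1200: look for any rule prefix in the message.
    S1300: only when a prefix hits, run the corresponding regular expressions.
    """
    hits = [prefix for prefix in RULES if prefix in message]
    if not hits:
        # No prefix hit: the message bypasses the finite automata stage.
        return [], False
    matched = [prefix for prefix in hits if RULES[prefix].search(message)]
    return matched, True

if __name__ == "__main__":
    for msg in (b"GET /index.html HTTP/1.1",
                b"GET /admin/ HTTP/1.1",
                b"GET /admin/login.php HTTP/1.1"):
        print(msg, detect(msg))
```

In this sketch the first message never reaches the regular expression stage, the second reaches it but fails the complete match, and the third is a full match; only the last two consume finite automata matching work.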
In some embodiments, the finite automaton rule matching data is NFA rule matching data or DFA rule matching data, and correspondingly, the regular expression matching is NFA matching or DFA matching.
In some embodiments, the finite automata matching module may be an NFA matching module or a DFA matching module; correspondingly, the finite automata rule matching data may be NFA rule matching data or DFA rule matching data, and the regular expression matching is NFA matching or DFA matching. The NFA matching module uses an NFA-based matching algorithm, which has low space complexity and a small footprint, with storage growing linearly with the size of the regular expression rule set, but has high time complexity and low matching speed. By exploiting the low storage footprint of the NFA matching algorithm, storage space is saved and system matching performance is improved. The DFA matching module uses a DFA-based matching algorithm, which offers high matching speed and stable matching performance.
In some embodiments, prefix matching is exact string matching.
In some embodiments, prefix matching uses the high processing performance of exact string matching to filter messages. Other matching methods may obviously also be used for prefix matching, which is not limited in this application.
Referring to fig. 3, in some embodiments, the prefix matching rule data is data compiled by using a P-AC or QSV algorithm, and correspondingly, performing prefix matching on the packet to be matched by using the prefix matching rule data includes:
and step S1210, performing prefix matching on the message to be matched by using the P-AC or QSV algorithm according to the prefix matching rule data.
In some embodiments, the prefix strings of the regular expressions may be compiled into prefix matching rule data using the P-AC method or the QSV method; prefix matching is then performed in step S1210, that is, prefix matching is performed on the message to be matched with the P-AC or QSV algorithm according to the prefix matching rule data. Because prefix matching is fast, the efficiency of system matching detection can be effectively improved.
Referring to fig. 4, in some embodiments, before performing regular expression matching on a packet to be matched by using finite automata rule matching data, the method further includes:
step S1310, according to the prefix matching rule data hit by prefix matching, obtaining corresponding finite automata rule matching data.
In some embodiments, if the message matches one or more pieces of prefix matching rule data in the prefix matching module (i.e., prefix matching is successful), the prefix matching result (the hit prefix matching rule data) is sent to the scheduling module, and the scheduling module, using the prefix matching result as an index, reads the corresponding finite automata rule matching data from the storage module and loads it into the finite automata matching module. The network message that hit the prefix matching rule in the prefix matching module is also passed to the finite automata matching module for subsequent matching.
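The index-based lookup described here can be sketched as follows. The sketch assumes, for simplicity, that each prefix rule id maps to exactly one regular expression; the class `Scheduler`, the function `fa_match` and the dictionary standing in for the storage module are all hypothetical, and Python's `re` stands in for the finite automata matching module.

```python
import re

class Scheduler:
    """Toy stand-in for the scheduling module."""

    def __init__(self, storage):
        self.storage = storage  # prefix rule id -> regular expression source
        self.reads = 0          # number of reads issued to the storage module

    def fetch(self, hit_ids):
        """Read only the rule data indexed by the prefix hits (step S1310)."""
        rules = []
        for rule_id in hit_ids:
            self.reads += 1
            rules.append(re.compile(self.storage[rule_id]))
        return rules

def fa_match(scheduler, message, hit_ids):
    """Run complete matching for the hit rules and return the ids that match."""
    rules = scheduler.fetch(hit_ids)
    return [rid for rid, rule in zip(hit_ids, rules) if rule.search(message)]

if __name__ == "__main__":
    sched = Scheduler({0: rb"abc\d+", 1: rb"xyz[a-f]{2}"})
    print(fa_match(sched, b"..abc123..", [0]), sched.reads)
```

Because rule data is fetched per hit rather than held resident for every rule, the storage read count tracks the number of prefix hits and not the size of the rule set.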
Referring to fig. 5, in some embodiments, the method further comprises:
step S1410, obtaining a regular expression;
step S1420, prefix character strings in the regular expression are extracted, and prefix matching rule data are generated;
and step S1430, generating the rule matching data of the finite automaton according to the regular expression.
In some embodiments, steps S1410 through S1430 may be performed by a rule compiling module, which generates the prefix matching rule data and the finite automata rule matching data. The rule compiling module can compile the prefix strings of the regular expressions into prefix matching rule data by using the P-AC method or the QSV method; the rule compiling module may compile the regular expression rule set into finite automata rule matching data, for example, into NFA rule matching data using an MX-NFA (Memory Based NFA, memory-based non-deterministic finite automaton) algorithm. In some embodiments, the rule compiling module may be implemented in software off-chip (external to the FPGA or ASIC), such as on an off-chip CPU. The length of the prefix string can be determined as required; for example, it can be set to 4 to 16 bytes.
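Step S1420 can be illustrated with a deliberately conservative sketch. The application does not specify how the prefix string is extracted, so the approach below is an assumption: it takes the leading run of literal characters, stops at the first metacharacter or quantified character, and gives up on any pattern containing alternation. The names `literal_prefix` and `prefix_rule` are hypothetical, and the 4 and 16 byte bounds follow the example range given above.

```python
META = set(".^$*+?{}[]\\|()")   # characters with special meaning in a regex
QUANTIFIERS = set("*+?{")       # a character followed by one of these may repeat or vanish

def literal_prefix(pattern, max_len=16):
    """Return the leading literal characters of a regex, capped at max_len."""
    if "|" in pattern:
        # Alternation may offer a branch that does not start with this prefix.
        return ""
    prefix = []
    for i, ch in enumerate(pattern):
        if ch in META:
            break
        if i + 1 < len(pattern) and pattern[i + 1] in QUANTIFIERS:
            # Conservative: drop a character whose repetition count is not fixed.
            break
        prefix.append(ch)
        if len(prefix) == max_len:
            break
    return "".join(prefix)

def prefix_rule(pattern, min_len=4, max_len=16):
    """Return a usable prefix string, or None when the literal run is too short."""
    prefix = literal_prefix(pattern, max_len)
    return prefix if len(prefix) >= min_len else None

if __name__ == "__main__":
    for rx in (r"GET /admin/[a-z]+\.php", r"cmd\.exe\s+/c", r"ab*c", r"abc|def"):
        print(repr(rx), "->", repr(prefix_rule(rx)))
```

A pattern for which `prefix_rule` returns `None` has no prefix long enough to serve as a filter in this sketch; how such rules are handled is not described in this section.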
Referring to fig. 6, in some embodiments, the method is implemented within an FPGA or ASIC or the like. It should be noted that the FPGA or the ASIC is only an example, and other high performance integrated circuits may be used instead, and the present application is not limited thereto.
In some embodiments, the high parallelism and high operating frequency of the FPGA or ASIC may be used to further improve the efficiency of message detection. In some embodiments, the message acquisition module, the message aggregation module, the prefix matching module, the finite automata matching module, the scheduling module, the storage control module, and the matching result aggregation module in the software system architecture for executing the message detection method may be implemented in on-chip hardware (inside the FPGA or ASIC), while the storage module and the rule compiling module are off-chip (outside the FPGA or ASIC), so that the hardware circuit is independent of the matching rule set. Placing the rule compiling module off-chip saves on-chip computing resources; storing the finite automata rule matching data in an external storage module saves on-chip storage resources. The external storage module can be a storage module with a large storage space, such as a DDR storage module, which facilitates storing a large-scale regular expression rule set. In some embodiments, the method proposed by the embodiments of the present application can support 100Gbps line-rate matching with a rule set on the scale of one million rules.
Referring to fig. 7, in some embodiments, before performing regular expression matching on a to-be-matched packet by using finite automata rule matching data, the method further includes:
step S1320, according to the prefix matching rule data hit by prefix matching, obtaining corresponding finite automata rule matching data from the external storage module.
In some embodiments, the finite automata rule matching data is stored in an external storage module, which saves on-chip storage resources. The external storage module can be a storage module with a large storage space, such as a DDR storage module, which facilitates storing a large-scale regular expression rule set. In step S1320, if the message matches one or more pieces of prefix matching rule data in the prefix matching module (i.e., prefix matching is successful), the prefix matching result (the hit prefix matching rule data) is sent to the scheduling module, and the scheduling module, using the prefix matching result as an index, reads the corresponding finite automata rule matching data from the external storage module and loads it into the finite automata matching module.
Referring to fig. 8, in some embodiments, the method further comprises:
step S1510, obtaining finite automata rule matching data from the rule compiling module;
step S1520, writing the finite automata rule matching data into the external storage module.
In some embodiments, the finite automata rule matching data is stored in an external storage module, which saves on-chip storage resources. The external storage module may be a storage module with a large storage space, such as a DDR storage module, which facilitates storing a large-scale regular expression rule set. By executing step S1510 and step S1520, the scheduling module loads the finite automata rule matching data compiled by the rule compiling module into the external storage module through the storage control module for subsequent retrieval.
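As a rough illustration of steps S1510 and S1520, the sketch below models a storage control module that writes compiled rule data into a flat external memory and remembers where each rule lives. The class `StorageController`, the sequential allocation scheme, and the sample data are all assumptions made for illustration; the application does not specify the memory layout.

```python
from typing import Dict, Tuple


class StorageController:
    """Toy model of a storage control module in front of external memory."""

    def __init__(self, capacity: int) -> None:
        self.memory = bytearray(capacity)
        self.index: Dict[int, Tuple[int, int]] = {}  # rule id -> (offset, length)
        self.next_free = 0

    def write(self, rule_id: int, data: bytes) -> int:
        """Append rule data at the next free offset and record its location."""
        if self.next_free + len(data) > len(self.memory):
            raise MemoryError("external storage full")
        offset = self.next_free
        self.memory[offset:offset + len(data)] = data
        self.index[rule_id] = (offset, len(data))
        self.next_free += len(data)
        return offset

    def read(self, rule_id: int) -> bytes:
        offset, length = self.index[rule_id]
        return bytes(self.memory[offset:offset + length])


def load_fa_rules(compiled: Dict[int, bytes], controller: StorageController) -> None:
    """S1510: take compiled data from the compiler; S1520: write it out."""
    for rule_id, data in compiled.items():
        controller.write(rule_id, data)


ctrl = StorageController(capacity=64)
load_fa_rules({1: b"nfa-A", 2: b"nfa-BB"}, ctrl)
```

Keeping an offset table next to the data is one simple way to let a later prefix hit be turned into a single read of the right region.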
Referring to fig. 9, in some embodiments, the method further comprises:
step S1610, obtaining prefix matching rule data from the rule compiling module;
step S1620, writing the prefix matching rule data into an internal storage module of the FPGA or the ASIC.
In some embodiments, the internal storage module of the FPGA or ASIC is an SRAM (Static Random-Access Memory). The high random lookup performance of SRAM further improves the efficiency of prefix matching. By executing step S1610 and step S1620, the scheduling module loads the prefix matching rule data compiled by the rule compiling module into the prefix matching module.
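A minimal sketch of steps S1610 and S1620 follows, under the assumption that the on-chip memory can be modelled as a fixed-depth table with constant-time lookup. The class `PrefixTable` and its depth limit are hypothetical; a real design would hold P-AC or QSV data structures rather than a Python dictionary.

```python
from typing import Dict, Optional


class PrefixTable:
    """Toy model of an on-chip, fixed-depth prefix rule table."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: Dict[bytes, int] = {}  # prefix string -> rule id

    def load(self, prefix_rules: Dict[bytes, int]) -> None:
        """S1620: write the compiled prefix rules, refusing sets that do not fit."""
        if len(prefix_rules) > self.depth:
            raise ValueError("prefix rule set exceeds on-chip table depth")
        self.entries = dict(prefix_rules)

    def lookup(self, candidate: bytes) -> Optional[int]:
        """Constant-time random lookup, the property attributed to SRAM."""
        return self.entries.get(candidate)


table = PrefixTable(depth=4)
table.load({b"abcdZT": 0})  # S1610: data as handed over by the rule compiler
```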
The embodiments of the present application are further explained below with reference to example one. Example one uses a matching algorithm that combines exact string matching with NFA matching: messages are first filtered using the high processing performance of exact string matching, and the messages that need further matching are then fully matched using the NFA method. This approach takes matching performance into account while also supporting a large-capacity rule set.
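To make the exact string matching stage concrete, here is a small software sketch of the textbook Aho-Corasick multi-pattern matcher. It is not the pipelined P-AC or the QSV algorithm named in the application, only an illustration of how several prefix strings can be searched in one pass over a message; the function names and sample patterns are assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a list of byte-string patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link per state
    out = [[]]    # indices of the patterns that end at each state
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(idx)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(data, automaton):
    """Return (end_position, pattern_index) for every pattern occurrence."""
    goto, fail, out = automaton
    state, hits = 0, []
    for pos, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for idx in out[state]:
            hits.append((pos, idx))
    return hits


automaton = build_automaton([b"abcdZT", b"cdZ"])
hits = search(b"xxabcdZTE", automaton)
```

Each input byte is consumed once regardless of how many patterns are loaded, which is why this family of algorithms suits a first-stage filter.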
Example 1
Referring to fig. 10, the message detection method of the first example mainly includes the following steps:
Step A1, compiling the rule set to generate rule set matching data implemented based on SRAM;
Step A1.1, the rule compiling module extracts the prefix character strings from the regular expression rules, where the length of each character string is 4 to 16 bytes, and compiles the character strings into prefix matching rule data (prefix rules for short), implemented on an FPGA or ASIC, using the P-AC method or the QSV algorithm;
Step A1.2, the rule compiling module compiles the rule set into NFA rule matching data (NFA rules for short) using the MX-NFA algorithm.
Step A2, the scheduling module loads the prefix matching rule data and the NFA rule matching data generated by the rule compiling module into the prefix matching module and the external storage module, respectively.
Step A3, matching the message;
Step A3.1, the MAC receiving module receives a message to be matched (hereinafter referred to as the message) from the network interface and passes it to the message aggregation module.
Step A3.2, all received messages enter the prefix matching module for prefix matching. If a message hits no rule in the prefix matching module, it does not enter the NFA matching module but is sent directly to the matching result aggregation module for processing. If the message matches one or more prefix rules in the prefix matching module, the prefix matching result is sent to the scheduling module; using the prefix matching result as an index, the scheduling module reads the corresponding NFA rule matching data from the storage module and loads it into the NFA matching module. A message that hits a rule in the prefix matching module is also passed to the NFA matching module for subsequent matching.
Step A3.3, a message that hits a prefix rule in the prefix matching module is matched against the complete regular expression rule in the NFA matching module. After matching is completed, the NFA matching module sends the matching result and the original message to the matching result aggregation module for processing.
Step A4, the matching result aggregation module reorganizes the original message and the matching result into the required format and provides it to the application that performs subsequent processing.
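Steps A1 to A4 can be sketched end to end in software. The sketch below is only an analogy: Python's `re` engine stands in for MX-NFA matching, a substring test stands in for P-AC or QSV prefix matching, and the two sample rules, the `MatchReport` format, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MatchReport:
    """A4: original message plus matching result in one record."""
    message: bytes
    matched_rules: List[int]


def compile_rules(rules: Dict[int, Tuple[bytes, bytes]]):
    """A1/A2: split each rule into prefix data and full-rule data."""
    prefix_table = {rid: prefix for rid, (prefix, _) in rules.items()}
    external_store = {rid: re.compile(regex) for rid, (_, regex) in rules.items()}
    return prefix_table, external_store


def detect(message: bytes, prefix_table, external_store) -> MatchReport:
    # A3.2: every message goes through prefix matching first.
    hits = [rid for rid, prefix in prefix_table.items() if prefix in message]
    if not hits:
        return MatchReport(message, [])  # bypasses the full-rule stage
    # A3.3: only the rules indexed by the prefix hits are fully matched.
    matched = [rid for rid in hits if external_store[rid].search(message)]
    return MatchReport(message, matched)


# Hypothetical rules: rule id -> (prefix string, complete regular expression).
RULES = {
    1: (b"GET /admin", rb"GET /admin/\w+\.php"),
    2: (b"passwd", rb"passwd=\d{4}"),
}
prefix_table, external_store = compile_rules(RULES)
report = detect(b"GET /admin/login.php HTTP/1.1", prefix_table, external_store)
```

A message such as `b"GET /admin/ HTTP/1.1"` hits the prefix of rule 1 but fails the complete rule, so it reaches the second stage and still ends up with an empty result.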
The implementation process of the embodiment of the present application is illustrated below, taking as an example a message detection apparatus implemented in an FPGA, an external DDR storage module as the storage module, and retrieval of the regular expression "abcdZTE*suffix" in the following measured text.
Measured text:
"In many distributed hard real-time and safety-critical application domains,abcdZTE001suffix such as automotive and industrial control applications".
First, the regular expression rule is extracted and divided into two parts: the first part is the pure character string "abcdZT", called the prefix character string; the second part is the complete rule "abcdZTE*suffix". The complete regular expression rule is compiled into lookup table data according to the MX-NFA algorithm and stored in the DDR.
The prefix character string is compiled into the prefix matching module according to the P-AC or QSV algorithm. The text to be detected first enters the prefix matching module for prefix string matching, and in this process "abcdZT" in the sample text is hit.
After "abcdZT" is hit by the prefix matching module, the FPGA fetches the complete rule "abcdZTE*suffix" from the DDR and performs MX-NFA matching on the remaining text.
In the MX-NFA matching module, "abcdZTE001suffix" is detected in the text as conforming to the rule "abcdZTE*suffix".
Matching continues to the last byte of the text, confirming that the text matches only the rule "abcdZTE*suffix".
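The walkthrough above can be reproduced in software as a rough check. This sketch assumes that the wildcard in the example rule means "any run of characters", written `.*` in Python's `re` syntax, which the application does not state explicitly; `re` again stands in for MX-NFA matching, and the sample text is typed with an ASCII comma.

```python
import re

text = (
    "In many distributed hard real-time and safety-critical application domains,"
    "abcdZTE001suffix such as automotive and industrial control applications"
)

PREFIX = "abcdZT"                           # prefix character string from the example
FULL_RULE = re.compile(r"abcdZTE.*suffix")  # assumed reading of the complete rule

# Stage 1: prefix string matching over the whole text.
prefix_pos = text.find(PREFIX)

# Stage 2: the complete rule is only evaluated after a prefix hit,
# starting from the position where the prefix was found.
full_match = FULL_RULE.search(text, prefix_pos) if prefix_pos != -1 else None
matched_text = full_match.group() if full_match else None
```

If the prefix were absent from the text, `prefix_pos` would be -1 and the complete rule would never be evaluated.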
By combining the high matching performance of exact string matching, the low storage occupation of NFA matching, the high random lookup performance of SRAM, and the large capacity of the DDR storage module, a deep packet inspection method oriented to 100 Gbps and supporting rule sets on the scale of one million rules is realized.
By combining prefix matching with finite automata matching, the embodiments of the present application use the prefix matching result to trigger the subsequent finite automata matching. This reduces the bandwidth required of the finite automata matching algorithm and improves the processing efficiency of message detection, making the method suitable for network message processing that demands higher efficiency.
In addition, referring to fig. 11, the present application further provides a message detection apparatus, including:
a message obtaining module 110, configured to obtain a message to be matched;
a prefix matching module 130, configured to perform prefix matching on the message to be matched by using prefix matching rule data;
and the finite automata matching module 140 is configured to, when prefix matching is successful, perform regular expression matching on the message to be matched by using the finite automata rule matching data.
It should be noted that the message detection apparatus in this embodiment may be applied as the message detection apparatus in the system architecture of the embodiment shown in fig. 1; in addition, the message detection apparatus in this embodiment may execute the message detection method of the embodiment shown in fig. 2. That is, the message detection apparatus in this embodiment, the message detection apparatus in the system architecture of the embodiment shown in fig. 1, and the message detection method of the embodiment shown in fig. 2 all belong to the same inventive concept, so these embodiments have the same implementation principle and technical effect, which are not described in detail here.
By combining prefix matching with finite automata matching, the embodiments of the present application use the prefix matching result to trigger the subsequent finite automata matching. This reduces the bandwidth required of the finite automata matching algorithm and improves the processing efficiency of message detection, making the method suitable for network message processing that demands higher efficiency.
In some embodiments, the message detection apparatus further comprises a scheduling module. The scheduling module has two main functions. First, it loads the prefix matching rule data compiled by the rule compiling module into the prefix matching module, and loads the complete rule set data into the DDR memory. Second, for each message to be matched, it reads the corresponding NFA rule matching data from the DDR memory according to the matching result of the prefix matching module and loads it into the NFA matching module.
In some embodiments, the scheduling module is configured to obtain corresponding finite automata rule matching data according to the prefix matching rule data hit by prefix matching. The scheduling module may be configured to execute step S1310 or step S1320 in the foregoing embodiment, obtaining the corresponding finite automata rule matching data according to the prefix matching rule data hit by prefix matching and loading it into the finite automata matching module. For details, refer to the description of step S1310 or step S1320 above, which is not repeated here.
In some embodiments, the scheduling module is further configured to obtain the finite automata rule matching data from the rule compiling module and write the finite automata rule matching data to the external storage module. The scheduling module may be configured to perform step S1510 and step S1520 in the above embodiments, loading the finite automata rule matching data compiled by the rule compiling module into the external storage module through the storage control module. For details, refer to the description of step S1510 and step S1520 above, which is not repeated here.
In some embodiments, the scheduling module is further configured to obtain prefix matching rule data from the rule compiling module and write the prefix matching rule data into an internal storage module of the FPGA or the ASIC. The scheduling module may be configured to perform step S1610 and step S1620 in the above embodiment, loading the prefix matching rule data compiled by the rule compiling module into the prefix matching module. For details, refer to the description of step S1610 and step S1620 above, which is not repeated here.
In some embodiments, the message detection apparatus further comprises:
a rule compiling module, configured to obtain a regular expression, extract the prefix character string in the regular expression to generate prefix matching rule data, and generate finite automata rule matching data according to the regular expression.
In some embodiments, steps S1410 to S1430 described above may be performed by a rule compiling module to generate prefix matching rule data and finite automata rule matching data. The rule compiling module can compile prefix character strings of the regular expressions into prefix matching rule data by using a P-AC method or a QSV method and the like; the rule compilation module may compile the regular expression rule set into finite automata rule matching data, e.g., the rule compilation module may compile the rule set into NFA rule matching data using the MX-NFA algorithm. The length of the prefix string can be determined as required, and for example, the length of the prefix string can be set to 4 to 16 bytes.
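One way to picture the prefix extraction performed by the rule compiling module is the sketch below. It is a deliberately simplified heuristic, not the application's compiler: it takes the leading run of literal characters, drops a character that is followed by a quantifier, gives up on rules containing alternation, and applies the 4 to 16 byte length bounds mentioned above. The function name and the metacharacter handling are assumptions.

```python
from typing import Optional

META = set(".^$*+?{}[]()|\\")   # characters treated as regex syntax
QUANTIFIERS = set("*+?{")       # a literal followed by one of these is not fixed


def extract_prefix(regex: str, min_len: int = 4, max_len: int = 16) -> Optional[str]:
    """Return the fixed literal prefix of a regex, or None if it is unusable."""
    if "|" in regex:
        return None  # simplification: no common prefix analysis for alternation
    literal = []
    for i, ch in enumerate(regex):
        if ch in META:
            break
        nxt = regex[i + 1] if i + 1 < len(regex) else ""
        if nxt in QUANTIFIERS:
            break  # this character is quantified, so it may be absent or repeated
        literal.append(ch)
    prefix = "".join(literal)[:max_len]
    return prefix if len(prefix) >= min_len else None


p1 = extract_prefix("abcdZTE.*suffix")
p2 = extract_prefix("abcdZTE*suffix")
p3 = extract_prefix("ab.*")
```

Under this heuristic `p1` is "abcdZTE", `p2` is "abcdZT" because the trailing "E" is quantified, and `p3` is `None` because the literal run is shorter than 4 bytes.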
By combining prefix matching with finite automata matching, the embodiments of the present application use the prefix matching result to trigger the subsequent finite automata matching. This reduces the bandwidth required of the finite automata matching algorithm and improves the processing efficiency of message detection, making the method suitable for network message processing that demands higher efficiency.
In addition, referring to fig. 12, the present application further provides a message detection system, including:
the message detection apparatus 100 as described above;
and an external storage module 200, communicatively connected to the message detection apparatus and configured to store the finite automata rule matching data.
In some embodiments, the message detection system includes the message detection apparatus of the embodiment shown in fig. 11. The message detection apparatus may be implemented in hardware on-chip (inside an FPGA or ASIC), while the storage module and the rule compiling module are off-chip (outside the FPGA or ASIC), so that the hardware circuit is independent of the matching rule set. Storing the finite automata rule matching data in the external storage module saves on-chip storage resources. The external storage module may be a storage module with a large storage space, such as a DDR storage module, which facilitates storing a large-scale regular expression rule set. In some embodiments, the method proposed by the embodiments of the present application can support 100 Gbps line-rate matching with a rule set on the scale of one million rules.
By combining prefix matching with finite automata matching, the embodiments of the present application use the prefix matching result to trigger the subsequent finite automata matching. This reduces the bandwidth required of the finite automata matching algorithm and improves the processing efficiency of message detection, making the method suitable for network message processing that demands higher efficiency.
In some embodiments, a message detection system further comprises:
and the rule compiling module is in communication connection with the message detection device and is used for generating prefix matching rule data and/or finite automata rule matching data and sending the prefix matching rule data and/or the finite automata rule matching data to the message detection device.
In some embodiments, the steps S1410 to S1430 may be performed by a rule compiling module to generate prefix matching rule data and finite automata rule matching data. The rule compiling module can compile prefix character strings of the regular expressions into prefix matching rule data by using a P-AC method or a QSV method and the like; the rule compilation module may compile the regular expression rule set into finite automata rule match data, e.g., the rule compilation module may compile the rule set into NFA rule match data using the MX-NFA algorithm. In some embodiments, the rule compilation module may be implemented in software off-chip (external to an FPGA or ASIC), such as in an off-chip CPU. The length of the prefix string can be determined as required, and for example, the length of the prefix string can be set to 4 to 16 bytes.
The following further explains the embodiment of the message detection system according to the present application with example two.
Example two
Considering both performance and storage requirements, this example combines the high matching performance of exact string matching with the low storage occupation of NFA matching, and the high random lookup performance of SRAM with the large capacity of DDR storage, to implement a deep packet inspection system oriented to 100 Gbps that supports rule sets on the scale of one million rules. To process regular expression matching, this example rewrites the regular expression rules in the rule set and separates out the prefix (a pure string) of each rule. The prefix part is compiled using a high-performance string matching method (such as the pipeline-based P-AC algorithm or the QSV algorithm), and high-performance prefix matching is implemented on SRAM hardware inside an FPGA or ASIC. Taking advantage of the low storage occupation of NFA matching, the rule compiling module compiles all rules in the rule set into NFA state transition data and stores it in DDR memory external to the FPGA or ASIC. All messages to be matched first undergo preliminary matching in the prefix matching module, which determines the one or more rules of the rule set that must be used for subsequent matching, thereby saving NFA matching bandwidth.
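The bandwidth saving can be illustrated with a back-of-the-envelope model. It assumes that the NFA stage only has to process the share of traffic whose messages hit at least one prefix rule; the 25% hit fraction below is a made-up figure for illustration and does not come from the application.

```python
def nfa_bandwidth_gbps(line_rate_gbps: float, prefix_hit_fraction: float) -> float:
    """Traffic that still reaches the NFA stage after prefix filtering."""
    if not 0.0 <= prefix_hit_fraction <= 1.0:
        raise ValueError("prefix_hit_fraction must be between 0 and 1")
    return line_rate_gbps * prefix_hit_fraction


# Hypothetical scenario: 100 Gbps line rate, 25% of traffic hits a prefix rule.
remaining = nfa_bandwidth_gbps(100.0, 0.25)
```

In this simplified model the NFA stage would need to sustain 25 Gbps rather than the full line rate; the real saving depends on the traffic and on how selective the prefix strings are.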
Referring to fig. 13, in this example the rule compiling module 310 is implemented in software off-chip, the storage module is the external DDR storage module 210, and the message detection apparatus 100 is implemented in an FPGA or ASIC chip. The functions of the respective modules are as follows.
The MAC receiving module 111: implements the functions of 1G, 10G, and 100G Ethernet interfaces and handles message reception and transmission; multiple MAC receiving modules 111 may be instantiated according to the actual application scenario. The MAC receiving module 111 is inside the message detection apparatus 100.
The message aggregation module 120: the message aggregation module 120 buffers the messages received by each MAC receiving module 111 and then distributes them to the prefix matching module 130. The message aggregation module 120 is inside the message detection apparatus 100.
The prefix matching module 130: the prefix matching module 130 must match every byte of the payload of every received message, and therefore requires high performance. Thus, the prefix matching module 130 employs an exact string matching algorithm (e.g., the P-AC algorithm or the QSV algorithm) implemented on RAM inside the FPGA or ASIC. A system implementation may instantiate one or more prefix matching modules 130 according to its processing bandwidth needs. The prefix matching module 130 is inside the message detection apparatus 100.
The NFA matching module 140: the NFA matching method has low storage occupation but relatively poor matching performance. Therefore, in this method the NFA matching module 140 is placed at the stage after the prefix matching module 130, and only a message that hits a prefix rule in the prefix matching module 130 enters the NFA matching module 140 for complete matching, which conserves NFA matching bandwidth. The matching rules in the NFA matching module 140 are dynamically read from the DDR storage module 211 by the scheduling module 150 according to the matching result of each message in the prefix matching module 130 and loaded into the module to perform the subsequent complete matching. The NFA matching module 140 is inside the message detection apparatus 100.
The scheduling module 150: the scheduling module 150 has two main functions. First, it loads the prefix matching rule data compiled by the rule compiling module into the prefix matching module 130, and loads the complete rule set data into the DDR memory. Second, for each message to be matched, it reads the corresponding NFA rule matching data from the DDR memory according to the matching result of the prefix matching module 130 and loads it into the NFA matching module 140. The scheduling module 150 is inside the message detection apparatus 100.
The rule compiling module 310: communicatively connected to the scheduling module 150 and configured to generate prefix matching rule data and/or finite automata rule matching data and send them to the scheduling module 150. The rule compiling module 310 is implemented outside the message detection apparatus 100.
The DDR controller 161: the DDR controller 161 manages, reads, and writes the external DDR storage module 211. The DDR controller 161 is inside the message detection apparatus 100.
The DDR storage module 211: connected to the DDR controller 161 and configured to store the finite automata rule matching data. The DDR storage module 211 is a storage module external to the message detection apparatus 100.
The matching result aggregation module 170: the matching result aggregation module 170 reorganizes the original message and the matching result into a format defined by the subsequent application. The matching result aggregation module 170 is inside the message detection apparatus 100.
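The buffering and distribution role of the message aggregation module in the list above can be sketched as follows. The round-robin service order is an assumption chosen for illustration; the application only says that messages from each MAC receiving module are buffered and then distributed to the prefix matching module. The function name `aggregate` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterator, List


def aggregate(mac_queues: List[Deque[bytes]]) -> Iterator[bytes]:
    """Drain the per-MAC buffers in round-robin order into one stream."""
    while any(mac_queues):          # an empty deque is falsy
        for queue in mac_queues:
            if queue:
                yield queue.popleft()


# Two hypothetical MAC receive buffers feeding one prefix matching stage.
mac_a = deque([b"a1", b"a2"])
mac_b = deque([b"b1"])
stream = list(aggregate([mac_a, mac_b]))
```

With these inputs the merged stream interleaves the two ports as a1, b1, a2, so no single port can starve the others.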
By combining prefix matching with finite automata matching, the embodiments of the present application use the prefix matching result to trigger the subsequent finite automata matching. This reduces the bandwidth required of the finite automata matching algorithm and improves the processing efficiency of message detection, making the method suitable for network message processing that demands higher efficiency.
In addition, the present application also provides an electronic device including: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the message detection method as described above when executing the computer program.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that the electronic device in this embodiment may be applied to an electronic device in a system architecture of the embodiment shown in fig. 1, fig. 6, fig. 11, fig. 12, or fig. 13; in addition, the electronic device in this embodiment may execute the packet detection method in the embodiments shown in fig. 2, fig. 3, fig. 4, fig. 5, fig. 7, fig. 8, fig. 9, or fig. 10. That is, the electronic device in this embodiment, the electronic device in the system architecture of the embodiment shown in fig. 1, fig. 6, fig. 11, fig. 12, or fig. 13, and the message detection method in the embodiment shown in fig. 2, fig. 3, fig. 4, fig. 5, fig. 7, fig. 8, fig. 9, or fig. 10 all belong to the same inventive concept, so that these embodiments have the same implementation principle and technical effect, and are not described in detail here.
The non-transitory software programs and instructions required to implement the message detection method of the above-mentioned embodiment are stored in the memory, and when executed by the processor, the message detection method of the above-mentioned embodiment is executed, for example, the method steps S1100 to S1300 in fig. 2, the method steps S1100 to S1300 in fig. 3, the method steps S1100 to S1300 in fig. 4, the method steps S1410 to S1430 in fig. 5, the method steps S1100 to S1300 in fig. 7, the method steps S1510 to S1520 in fig. 8, the method steps S1610 to S1620 in fig. 9, and the method steps A1 to A4 in fig. 10 are executed.
Additionally, the present application also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement:
the message detection method described above.
In some embodiments, the computer-readable storage medium stores computer-executable instructions which, when executed by a processor or controller, for example by a processor in the electronic device embodiment described above, cause the processor to execute the message detection method in the above embodiments, for example, the method steps S1100 to S1300 in fig. 2, the method steps S1100 to S1300 in fig. 3, the method steps S1100 to S1300 in fig. 4, the method steps S1410 to S1430 in fig. 5, the method steps S1100 to S1300 in fig. 7, the method steps S1510 to S1520 in fig. 8, the method steps S1610 to S1620 in fig. 9, and the method steps A1 to A4 in fig. 10.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media, as is known to those skilled in the art.
While the preferred embodiments of the present invention have been described in detail, it will be understood that the invention is not limited to those precise embodiments, and that various other modifications and substitutions may be effected therein by one skilled in the art without departing from the scope of the invention.

Claims (19)

1. A message detection method comprises the following steps:
acquiring a message to be matched;
prefix matching is carried out on the message to be matched by utilizing prefix matching rule data;
and when the prefix matching is successful, carrying out regular expression matching on the message to be matched by utilizing the rule matching data of the finite automaton.
2. The method of claim 1, wherein the finite automata rule matching data is non-deterministic finite automata rule matching data or deterministic finite automata rule matching data, and correspondingly, the regular expression matching is non-deterministic finite automata matching or deterministic finite automata matching.
3. The method of claim 1, wherein the prefix match is a string exact match.
4. The method according to claim 3, wherein the prefix matching rule data is compiled using a P-AC or QSV algorithm, and correspondingly, performing prefix matching on the message to be matched by using the prefix matching rule data includes:
performing prefix matching on the message to be matched with the prefix matching rule data, using the P-AC or QSV algorithm.
5. The method according to claim 1, wherein before performing regular expression matching on the packet to be matched by using finite automata rule matching data, the method further comprises:
and acquiring corresponding finite automata rule matching data according to the prefix matching rule data hit by prefix matching.
6. The method of any of claims 1 to 5, further comprising:
obtaining a regular expression;
extracting prefix character strings in the regular expression to generate prefix matching rule data;
and generating the rule matching data of the finite automaton according to the regular expression.
7. The method according to any of claims 1 to 5, wherein the method is implemented inside an FPGA or ASIC.
8. The method according to claim 7, wherein before performing regular expression matching on the packet to be matched by using finite automata rule matching data, the method further comprises:
and acquiring corresponding finite automata rule matching data from an external storage module according to the prefix matching rule data hit by prefix matching.
9. The method of claim 8, further comprising:
acquiring the rule matching data of the finite automaton from the rule compiling module;
and writing the finite automata rule matching data into the external storage module.
10. The method of claim 7, further comprising:
acquiring prefix matching rule data from a rule compiling module;
and writing the prefix matching rule data into an internal storage module of the FPGA or the ASIC.
11. A packet inspection device, comprising:
the message acquisition module is used for acquiring a message to be matched;
the prefix matching module is used for carrying out prefix matching on the message to be matched by utilizing prefix matching rule data;
and the finite automata matching module is used for carrying out regular expression matching on the message to be matched by utilizing the finite automata rule matching data when the prefix matching is successful.
12. The apparatus of claim 11, further comprising:
and the scheduling module is used for acquiring corresponding finite automata rule matching data according to the prefix matching rule data hit by prefix matching.
13. The apparatus of claim 12,
the scheduling module is also used for acquiring the finite automata rule matching data from the rule compiling module and writing the finite automata rule matching data into an external storage module.
14. The apparatus of claim 12,
the scheduling module is further configured to obtain prefix matching rule data from a rule compiling module, and write the prefix matching rule data into an internal storage module of the FPGA or the ASIC.
15. The apparatus of claim 11, further comprising:
the rule compiling module is used for acquiring a regular expression; extracting prefix character strings in the regular expression to generate prefix matching rule data; and generating the rule matching data of the finite automaton according to the regular expression.
16. A message detection system, comprising:
a message detection apparatus according to any of claims 11 to 14;
and the external storage module is in communication connection with the message detection device and is used for storing the rule matching data of the finite automata.
17. The system of claim 16, further comprising:
and the rule compiling module is in communication connection with the message detection device and is used for generating the prefix matching rule data and/or the finite automata rule matching data and sending the prefix matching rule data and/or the finite automata rule matching data to the message detection device.
18. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the message detection method according to any of claims 1 to 10 when executing the computer program.
19. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, perform the message detection method according to any one of claims 1 to 10.
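As an illustration of the two-stage flow recited in claims 12 to 15 (prefix matching first, then fetching the finite automata rule matching data only for rules whose prefix is hit), the following Python sketch may help. It is not the claimed implementation: the names extract_prefix, compile_rules, detect and CompiledRules are hypothetical, Python's re module stands in for the finite automaton, and plain dictionaries stand in for the internal and external storage modules.

```python
import re
from dataclasses import dataclass

# Characters that end a literal prefix in this simplified sketch.
_META = set(".^$*+?{}[]()|\\")


def extract_prefix(pattern: str) -> str:
    """Return leading literal characters that every match must contain.

    Conservative by design: when unsure, return a shorter (or empty) prefix.
    """
    if "|" in pattern:
        # Alternation may make the leading literal optional; give up.
        return ""
    prefix = []
    for ch in pattern:
        if ch in _META:
            # A quantifier that may match zero times makes the
            # preceding literal optional, so drop it.
            if ch in "*?{" and prefix:
                prefix.pop()
            break
        prefix.append(ch)
    return "".join(prefix)


@dataclass
class CompiledRules:
    # prefix -> rule ids (assumption: models the internal storage module)
    prefix_table: dict
    # rule id -> compiled matcher (assumption: models the external storage module)
    automata: dict


def compile_rules(patterns):
    """Rule compiling: build prefix matching data and per-rule matchers."""
    prefix_table, automata = {}, {}
    for rule_id, pattern in enumerate(patterns):
        prefix_table.setdefault(extract_prefix(pattern), []).append(rule_id)
        automata[rule_id] = re.compile(pattern)
    return CompiledRules(prefix_table, automata)


def detect(payload: str, rules: CompiledRules):
    """Return sorted ids of rules matching the payload."""
    hits = []
    for prefix, rule_ids in rules.prefix_table.items():
        # Stage 1: cheap prefix check (an empty prefix always passes).
        if prefix in payload:
            # Stage 2: fetch the full matcher only for hit prefixes.
            for rule_id in rule_ids:
                if rules.automata[rule_id].search(payload):
                    hits.append(rule_id)
    return sorted(hits)


rules = compile_rules([r"GET /admin\d+", r"cmd\.exe", r"ab*c"])
```

In this sketch a payload that contains none of the stored prefixes never touches the per-rule matchers, which mirrors the motivation for keeping the small prefix data close to the matching logic and the larger automaton data in external storage.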
CN202110685089.XA 2021-06-21 2021-06-21 Message detection method, device, system, electronic equipment and storage medium Pending CN115577067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110685089.XA CN115577067A (en) 2021-06-21 2021-06-21 Message detection method, device, system, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115577067A true CN115577067A (en) 2023-01-06

Family

ID=84579269


Country Status (1)

Country Link
CN (1) CN115577067A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination