CN114036353A - Data packet matching method and device, electronic equipment and storage medium - Google Patents

Data packet matching method and device, electronic equipment and storage medium

Info

Publication number
CN114036353A
Authority
CN
China
Prior art keywords
data
matched
matching
data packet
protocol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111388835.5A
Other languages
Chinese (zh)
Inventor
陈旭
时幸伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202111388835.5A
Publication of CN114036353A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a data packet matching method and device, an electronic device, and a storage medium, and relates to the technical field of the Internet. The method comprises: compiling all regular expression rules used for data packet matching into a rule base, and generating an automaton corresponding to the rule base; parsing a data packet to be matched, obtaining the data of the data packet from the data link layer to the application layer, and generating a serialized feature string based on that data; and matching the serialized feature string against the automaton to obtain a matching result. Data matching efficiency can thereby be improved.

Description

Data packet matching method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data packet matching method and apparatus, an electronic device, and a storage medium.
Background
When matching rules against network data packets, a conventional network packet rule matching engine splits and classifies both the data packets and the rules, matches packets of each type against rules of the corresponding type over multiple passes, and finally integrates all the matching results to determine whether a given rule is hit. The data structures and algorithms used in this process are complex, and the amount of code is large. In addition, the matching granularity is coarse, because the engine searches the content of the whole data packet or of the application layer data as a whole. As a result, data matching efficiency is low.
Disclosure of Invention
Based on this, an object of the embodiments of the present application is to provide a method and an apparatus for matching a data packet, an electronic device, and a storage medium, so as to improve data matching efficiency.
In a first aspect, an embodiment of the present application provides a data packet matching method, including:
compiling all regular expression rules used for data packet matching into a rule base, and generating an automaton corresponding to the rule base;
parsing a data packet to be matched, obtaining the data of the data packet to be matched from the data link layer to the application layer, and generating a serialized feature string based on the data;
and matching the serialized feature string against the automaton to obtain a matching result.
In the above implementation, the data packet to be matched is parsed, and a serialized feature string representing the protocol variable values of the data packet is extracted from it. The data packet is then matched using the automaton compiled from the regular expressions. The data packet does not need to be split and classified, and its matching result can be obtained in a single matching pass, so data matching efficiency can be improved.
Optionally, the generating an automaton corresponding to the rule base includes:
determining the number of rule bases to compile based on the time taken to compile the regular expression rules and on memory usage, and generating an automaton corresponding to each rule base;
the matching the automaton and the serialized feature string comprises:
when a plurality of automata are generated, matching the serialized feature string against each automaton in sequence.
In the above implementation, the number of compiled rule bases can be adjusted according to the matching precision requirement. When the order of magnitude of the rules is small, a single automaton is generated and the matching result is obtained in one matching pass; when it is large, a plurality of automata are generated. Matching precision and matching speed can thus both be taken into account, and matching flexibility is improved.
Optionally, the obtaining data of the to-be-matched data packet from the data link layer to the application layer includes:
sequentially performing data link layer parsing, network layer parsing, transport layer parsing, and application layer parsing on the data packet to be matched, to obtain the protocol header data and protocol payload data of each layer of the data packet to be matched.
Further, the generating a serialized feature string based on the data comprises:
determining, for each layer, the protocol variables and the values corresponding to the protocol variables based on the protocol header data and the protocol payload data;
and splicing the values of the protocol variables of each layer to obtain the serialized feature string of the data packet to be matched.
In the above implementation, the data packet is parsed and matched according to the TCP/IP four-layer protocol, and the parsing results are spliced into a serialized feature string that represents the protocol header data and protocol payload data of the network data packet. The serialized feature string is matched against the compiled rule base and the matching result is output, so that the transmission protocol of the data packet can be determined. Multiple parts of the data packet can be matched at the same time; the protocol header data and protocol payload data do not need to be matched separately and then combined, so matching efficiency can be improved. In addition, the matching granularity is finer, since specific data fields within each protocol can be matched, so matching precision can also be improved.
Optionally, each regular expression rule includes at least one protocol variable, and before compiling all the regular expression rules matching the data packet into a rule base, the method further includes:
preprocessing the data packet to be matched to obtain a preprocessed serialized feature string;
and determining, based on the preprocessed serialized feature string, the order in which the protocol variables are written and the number of protocol variables in the regular expression rules.
Optionally, the method may further include:
setting a preset writing order of the protocol variables and a preset number of protocol variables;
compiling the regular expression rules into a rule base based on the preset writing order and the preset number of protocol variables, and parsing the data packet to be matched to generate the serialized feature string.
In the above implementation, the format of the regular expression rules is determined either by preprocessing the data packet to be matched to obtain the order and number of the protocol variables in the serialized feature string, or by predefining the writing order and number of the protocol variables. Because the regular expression rules are written in the same format as the serialized feature string, the matching result can be obtained in a single regular expression match, without matching each piece of protocol header data and protocol payload data separately and then combining the results. Matching efficiency can therefore be improved.
Optionally, the matching the automaton and the serialized feature string comprises:
reading the serialized feature string character by character, in order, using the automaton, and, when reading is complete, outputting either an acceptance signal indicating that the automaton accepts the serialized feature string or a rejection signal indicating that the automaton rejects it, so as to determine whether the serialized feature string matches the rule base.
In the above implementation, the deterministic finite automaton compiled from the regular expressions reads the serialized feature string and determines whether the string is accepted. The matching result is determined from the output of the automaton, and the data packets do not need to be classified, so matching efficiency can be improved.
In a second aspect, an embodiment of the present application provides a packet matching apparatus, including:
a compiling module, configured to compile all regular expression rules used for data packet matching into a rule base and to generate an automaton corresponding to the rule base;
a parsing module, configured to parse a data packet to be matched, obtain the data of the data packet to be matched from the data link layer to the application layer, and generate a serialized feature string based on the data;
and a matching module, configured to match the serialized feature string against the automaton to obtain a matching result.
In the above implementation, the data packet to be matched is parsed, and a serialized feature string representing the protocol variable values of the data packet is extracted from it. The data packet is then matched using the automaton compiled from the regular expressions. The data packet does not need to be split and classified, and its matching result can be obtained in a single matching pass, so data matching efficiency can be improved.
Optionally, the compiling module may be specifically configured to: determine the number of rule bases to compile based on the time taken to compile the regular expression rules and on memory usage, and generate an automaton corresponding to each rule base.
The matching module may be further configured to: when a plurality of automata are generated, match the serialized feature string against each automaton in sequence.
In the above implementation, the number of compiled rule bases can be adjusted according to the matching precision requirement. When the order of magnitude of the rules is small, a single automaton is generated and the matching result is obtained in one matching pass; when it is large, a plurality of automata are generated. Matching precision and matching speed can thus both be taken into account, and matching flexibility is improved.
Optionally, the parsing module may be specifically configured to: sequentially perform data link layer parsing, network layer parsing, transport layer parsing, and application layer parsing on the data packet to be matched, to obtain the protocol header data and protocol payload data of each layer of the data packet to be matched.
Further, the parsing module may be specifically configured to: determine, for each layer, the protocol variables and the values corresponding to the protocol variables based on the protocol header data and the protocol payload data; and splice the values of the protocol variables of each layer to obtain the serialized feature string of the data packet to be matched.
In the above implementation, the data packet is parsed and matched according to the TCP/IP four-layer protocol, and the parsing results are spliced into a serialized feature string that represents the protocol header data and protocol payload data of the network data packet. The serialized feature string is matched against the compiled rule base and the matching result is output, so that the transmission protocol of the data packet can be determined. Multiple parts of the data packet can be matched at the same time; the protocol header data and protocol payload data do not need to be matched separately and then combined, so matching efficiency can be improved. In addition, the matching granularity is finer, since specific data fields within each protocol can be matched, so matching precision can also be improved.
Optionally, the data packet matching apparatus may further include a preprocessing module, configured to:
preprocess the data packet to be matched to obtain a preprocessed serialized feature string; and determine, based on the preprocessed serialized feature string, the order in which the protocol variables are written and the number of protocol variables in the regular expression rules.
Optionally, the preprocessing module may be further configured to:
set a preset writing order of the protocol variables and a preset number of protocol variables;
the compiling module may be specifically configured to compile the regular expression rules into a rule base based on the preset writing order and the preset number of protocol variables;
the parsing module may be specifically configured to parse the data packet to be matched based on the preset writing order and the preset number of protocol variables to generate the serialized feature string.
In the above implementation, the format of the regular expression rules is determined either by preprocessing the data packet to be matched to obtain the order and number of the protocol variables in the serialized feature string, or by predefining the writing order and number of the protocol variables. Because the regular expression rules are written in the same format as the serialized feature string, the matching result can be obtained in a single regular expression match, without matching each piece of protocol header data and protocol payload data separately and then combining the results. Matching efficiency can therefore be improved.
Optionally, the matching module may be specifically configured to:
read the serialized feature string character by character, in order, using the automaton, and, when reading is complete, output either an acceptance signal indicating that the automaton accepts the serialized feature string or a rejection signal indicating that the automaton rejects it, so as to determine whether the serialized feature string matches the rule base.
In the above implementation, the deterministic finite automaton compiled from the regular expressions reads the serialized feature string and determines whether the string is accepted. The matching result is determined from the output of the automaton, and the data packets do not need to be classified, so matching efficiency can be improved.
In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes steps in any one of the foregoing implementation manners when reading and executing the program instructions.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where computer program instructions are stored in the computer-readable storage medium, and when the computer program instructions are read and executed by a processor, the steps in any of the foregoing implementation manners are performed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic diagram of the steps of a data packet matching method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of parsing a data packet according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the steps of determining the regular expression format according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a data packet matching apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. For example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In the course of research, the applicant found that when a conventional network packet rule matching engine performs rule matching on network data packets, it splits and classifies the data packets and the rules, matches packets of each type against rules of the corresponding type over multiple passes, and finally integrates all matching results to determine whether a rule is hit, which results in low matching efficiency.
Based on this, an embodiment of the present application provides a data packet matching method that matches a data packet by extracting a serialized feature string from the data packet to be matched, so as to improve data packet matching efficiency. Referring to Fig. 1, which is a schematic diagram of the steps of the data packet matching method provided in the embodiment of the present application, the method may include the following steps:
In step S11, all regular expression rules used for data packet matching are compiled into a rule base, and an automaton corresponding to the rule base is generated.
A regular expression is a formula that describes a pattern to be matched against character strings. It can be used to check whether a string contains a certain substring, to replace the matched substring, or to extract from a string a substring that meets a certain condition.
The regular expression rules to be compiled may be determined according to the values of specific parts of the data packets to be matched. A rule may be expressed, for example, as mac.src==[^ ]+mac.dst==01:02:03:04:05:06ipv4.src==1\.1\.1\.1ipv4.dst==2\.2\.2\.… This regular expression rule matches a data packet whose source MAC address is any MAC address, whose destination MAC address is 01:02:03:04:05:06, whose source IP is 1.1.1.1, whose destination IP is in the IP segment 2.2.2.0/24, whose TCP source port is any port, whose TCP destination port is 80, and whose HTTP domain name contains baidu.com or qq.com. The value of a protocol variable can be a regular expression or a constant character string. The regular expression syntax follows the PCRE (Perl Compatible Regular Expressions) syntax specification. Protocol variables may be added or removed according to matching requirements.
The automaton may be a deterministic finite automaton (DFA). After all regular expression rules are compiled into a rule base, a deterministic finite automaton is output.
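As a rough illustration of step S11, the following Python sketch compiles a small set of rules into a rule base and matches a feature string against it. It is only a stand-in: Python's `re` module is a backtracking engine and does not produce a DFA, and the rule names and patterns are hypothetical, not taken from the application.

```python
import re

# Hypothetical rules (not from the application): rule id -> regular
# expression written against the serialized feature string.
RULES = {
    "http_baidu_or_qq": r"tcp\.dport==80http\.host=.*(?:baidu\.com|qq\.com)",
    "udp_port_1234": r"udp\.dport==1234",
}


def compile_rule_base(rules):
    """Compile every rule once, up front, so matching does no compilation."""
    return {rule_id: re.compile(pattern) for rule_id, pattern in rules.items()}


def match_rule_base(rule_base, feature_string):
    """Return the ids of all rules whose pattern occurs in the feature string."""
    return [rid for rid, rx in rule_base.items() if rx.search(feature_string)]


rule_base = compile_rule_base(RULES)
feature = (
    "mac.src==00:02:04:05:06mac.dst==01:02:03:04:05:06"
    "ipv4.src==1.1.1.1ipv4.dst==2.2.2.22"
    "tcp.sport==1234tcp.dport==80http.host=www.baidu.com"
)
hits = match_rule_base(rule_base, feature)
```

Here each rule is tried separately; an engine that compiles all rules into a single automaton, as the text describes, would evaluate them in one pass over the string.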
In step S12, the data packet to be matched is parsed, the data of the data packet to be matched from the data link layer to the application layer is obtained, and a serialized feature string is generated based on the data.
In a packet-switched network, a single message is divided into a plurality of data blocks, each of which is a data packet. Data packets may travel along different paths through the network and are reassembled at the destination.
For example, the data link layer, network layer, transport layer, and application layer of the data packet may be parsed in sequence to obtain the values of the protocol variables of each layer, and the values of all protocol variables are then spliced to obtain the serialized feature string.
In step S13, the serialized feature string is matched against the automaton to obtain a matching result.
Therefore, the data packet to be matched is parsed, and a serialized feature string representing the protocol variable values of the data packet is extracted from it. The data packet is then matched using the automaton compiled from the regular expressions. The data packet does not need to be split and classified, and its matching result can be obtained in a single matching pass, so data matching efficiency can be improved.
In an optional embodiment, when the order of magnitude of the regular expression rules is large, the number of rule bases to compile may be determined based on the time taken to compile the regular expression rules and on memory usage, and an automaton corresponding to each rule base is generated. In this case, step S13 may specifically be: when a plurality of automata are generated, matching the serialized feature string against each automaton in sequence.
For example, when the matching precision requirement is high, the number of regular expression rules can reach the order of one hundred thousand, and so many rules can lead to long compile times or high memory usage. All the regular expression rules can therefore be divided into a plurality of rule bases, and a corresponding automaton is generated after each rule base is compiled. When matching a data packet, the serialized feature string of the data packet is obtained through step S12 and is then matched against the plurality of automata in sequence to obtain the matching result.
Therefore, the number of compiled rule bases can be adjusted according to the matching precision requirement. When the order of magnitude of the rules is small, a single automaton is generated and the matching result is obtained in one matching pass; when it is large, a plurality of automata are generated. Matching precision and matching speed can thus both be taken into account, and matching flexibility is improved.
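The splitting described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the chunk size is a fixed parameter (`max_rules_per_base`, a hypothetical name), whereas the application derives the number of rule bases from compile time and memory usage, and `re` alternations stand in for the compiled automata.

```python
import re


def split_rules(rules, max_rules_per_base):
    """Partition (rule_id, pattern) pairs into chunks of bounded size."""
    return [
        rules[i:i + max_rules_per_base]
        for i in range(0, len(rules), max_rules_per_base)
    ]


def compile_rule_bases(rules, max_rules_per_base):
    """Compile each chunk into one alternation, one matcher per rule base."""
    bases = []
    for chunk in split_rules(rules, max_rules_per_base):
        combined = "|".join(f"(?:{pattern})" for _, pattern in chunk)
        bases.append(re.compile(combined))
    return bases


def match_in_sequence(bases, feature_string):
    """Match the feature string against each rule base in turn.

    Returns the indexes of the rule bases that produced a hit.
    """
    return [i for i, rx in enumerate(bases) if rx.search(feature_string)]


# Hypothetical rules, for illustration only.
rules = [
    ("r1", r"udp\.dport==53"),
    ("r2", r"tcp\.dport==22"),
    ("r3", r"tcp\.dport==80"),
    ("r4", r"http\.host=.*qq\.com"),
    ("r5", r"ipv4\.src==1\.1\.1\.1"),
]
bases = compile_rule_bases(rules, max_rules_per_base=2)
feature = (
    "ipv4.src==1.1.1.1ipv4.dst==2.2.2.2"
    "tcp.sport==1234tcp.dport==80http.host=www.baidu.com"
)
matched = match_in_sequence(bases, feature)
```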
Optionally, in step S12, the data packet may be parsed layer by layer according to its hierarchy. Specifically, step S12 may be: sequentially performing data link layer parsing, network layer parsing, transport layer parsing, and application layer parsing on the data packet to be matched, to obtain the protocol header data and protocol payload data of each layer of the data packet to be matched. For step S12, an embodiment of the present application provides an implementation for parsing a data packet. Referring to Fig. 2, which is a schematic flowchart of parsing a data packet according to an embodiment of the present application, the parsing flow may include the following steps:
In step S121, the protocol variables of each layer and the values corresponding to the protocol variables are determined based on the protocol header data and the protocol payload data.
For example, the packet parsing flow provided in the present application may be applied to a network data packet processing device, which may be any network-enabled electronic device, such as a configurator of an engineering device, a mobile phone, a tablet computer, or a personal digital assistant. The device receives a network data packet and sequentially performs data link layer parsing, network layer parsing, transport layer parsing, and application layer parsing on the data packet to be matched, to obtain the protocol header data and protocol payload data of each layer. During a communication, the sending device adds control information to the data payload layer by layer, from top to bottom, according to the protocols: as the data passes through each layer, a corresponding header, namely the protocol header data, is encapsulated onto the data according to the protocol of that layer.
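The layer-by-layer parsing of step S121 might look like the following Python sketch. It handles only an Ethernet II frame carrying IPv4, TCP, and an HTTP `Host` header, performs no bounds checking, and uses variable names modeled on the examples in this description; the function name `parse_packet` is hypothetical.

```python
import socket
import struct


def format_mac(raw):
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_packet(frame):
    """Parse Ethernet II / IPv4 / TCP / HTTP and return protocol variables
    in layer order (dicts preserve insertion order)."""
    fields = {}
    # Data link layer: destination MAC, source MAC, EtherType (14 bytes).
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    fields["mac.src"] = format_mac(src_mac)
    fields["mac.dst"] = format_mac(dst_mac)
    if ethertype != 0x0800:  # only IPv4 is handled in this sketch
        return fields
    # Network layer: the low nibble of byte 0 is the header length in words.
    ip = frame[14:]
    ip_header_len = (ip[0] & 0x0F) * 4
    fields["ipv4.src"] = socket.inet_ntoa(ip[12:16])
    fields["ipv4.dst"] = socket.inet_ntoa(ip[16:20])
    if ip[9] != 6:  # only TCP is handled in this sketch
        return fields
    # Transport layer: ports, then the data offset gives the header length.
    tcp = ip[ip_header_len:]
    sport, dport = struct.unpack("!HH", tcp[:4])
    fields["tcp.sport"] = str(sport)
    fields["tcp.dport"] = str(dport)
    payload = tcp[(tcp[12] >> 4) * 4:]
    # Application layer: pick out the HTTP Host header, if any.
    for line in payload.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            fields["http.host"] = line[5:].strip().decode("ascii", "replace")
            break
    return fields
```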
In step S122, the values of the protocol variables of each data layer are spliced to obtain the serialized feature string of the to-be-matched data packet.
For example, the serialized feature string obtained after splicing may take the form
mac.src==00:02:04:05:06mac.dst==01:02:03:04:05:06ipv6.src==ff06::c3ipv6.dst==cc28::12udp.sport==1234udp.dport==1234udp.payload==\xa1\xb2\x3f\x44
or
mac.src==00:02:04:05:06mac.dst==01:02:03:04:05:06ipv4.src==1.1.1.1ipv4.dst==2.2.2.22tcp.sport==1234tcp.dport==80http.host=www.baidu.com
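The splicing of step S122 can be sketched in Python as below. The sketch assumes that every variable is written as `name==value` with no delimiter between pairs and that raw payload bytes are rendered as `\xNN` escapes; both conventions are inferred from the example strings, and the field values are copied from the UDP example as printed there. The function names are hypothetical.

```python
def render_value(value):
    """Render a protocol variable value as text; raw bytes become \\xNN escapes."""
    if isinstance(value, (bytes, bytearray)):
        return "".join(f"\\x{b:02x}" for b in value)
    return str(value)


def serialize(fields):
    """Splice name==value pairs in the given order, with no delimiter."""
    return "".join(f"{name}=={render_value(value)}" for name, value in fields)


# (name, value) pairs in layer order: data link, network, transport.
fields = [
    ("mac.src", "00:02:04:05:06"),
    ("mac.dst", "01:02:03:04:05:06"),
    ("ipv6.src", "ff06::c3"),
    ("ipv6.dst", "cc28::12"),
    ("udp.sport", 1234),
    ("udp.dport", 1234),
    ("udp.payload", b"\xa1\xb2\x3f\x44"),
]
feature_string = serialize(fields)
```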
Therefore, the data packet is parsed and matched according to the TCP/IP four-layer protocol, and the parsing results are spliced into a serialized feature string that represents the protocol header data and protocol payload data of the network data packet. The serialized feature string is matched against the compiled rule base and the matching result is output, so that the transmission protocol of the data packet can be determined. Multiple parts of the data packet can be matched at the same time; the protocol header data and protocol payload data do not need to be matched separately and then combined, so matching efficiency can be improved. In addition, the matching granularity is finer, since specific data fields within each protocol can be matched, so matching precision can also be improved.
In an optional embodiment, each regular expression rule includes at least one protocol variable, and the embodiment of the present application further provides a process, performed before step S11, for determining the regular expression format. Referring to Fig. 3, which is a schematic diagram of the steps of determining the regular expression format provided in the embodiment of the present application, the process may include the following steps:
In step S31, the data packet to be matched is preprocessed to obtain a preprocessed serialized feature string.
In step S32, the order in which the protocol variables are written and the number of protocol variables in the regular expression rules are determined based on the preprocessed serialized feature string.
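One way to read step S32 is sketched below in Python: the variable names are extracted from a preprocessed feature string in the order they appear, which yields both the writing order and the count. The set of protocol prefixes and the `<protocol>.<field>` naming pattern are assumptions based on the examples in this description, and `variable_layout` is a hypothetical name.

```python
import re

# Assumed protocol prefixes; a variable name is taken to look like
# "<protocol>.<field>" and to be immediately followed by "=".
VARIABLE_RE = re.compile(r"(?:mac|ipv4|ipv6|tcp|udp|http)\.[a-z_]+(?==)")


def variable_layout(preprocessed_feature_string):
    """Return (variable names in order of appearance, number of variables)."""
    names = VARIABLE_RE.findall(preprocessed_feature_string)
    return names, len(names)


sample = (
    "mac.src==00:02:04:05:06mac.dst==01:02:03:04:05:06"
    "ipv4.src==1.1.1.1ipv4.dst==2.2.2.22"
    "tcp.sport==1234tcp.dport==80http.host=www.baidu.com"
)
order, count = variable_layout(sample)
```

Rules can then be written with the same variables in the same order, so that a rule and a feature string line up field by field.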
In another embodiment, a preset writing order of the protocol variables and a preset number of protocol variables can be set. The regular expression rules are compiled into a rule base based on the preset writing order and the preset number of protocol variables, and the data packet to be matched is parsed to generate the serialized feature string.
Therefore, the format of the regular expression rules is determined either by preprocessing the data packet to be matched to obtain the order and number of the protocol variables in the serialized feature string, or by predefining the writing order and number of the protocol variables. Because the regular expression rules are written in the same format as the serialized feature string, the matching result can be obtained in a single regular expression match, without matching each piece of protocol header data and protocol payload data separately and then combining the results. Matching efficiency can therefore be improved.
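The idea of writing rules and feature strings in one shared format can be sketched in Python as follows. The preset order `VARIABLE_ORDER`, the uniform `==` separator, the `.+?` wildcard for unconstrained variables, and the helper names are all assumptions for illustration; the constraint values follow the rule described for step S11.

```python
import re

# Assumed preset writing order, shared by rule authoring and packet parsing.
VARIABLE_ORDER = [
    "mac.src", "mac.dst", "ipv4.src", "ipv4.dst",
    "tcp.sport", "tcp.dport", "http.host",
]


def build_rule(constraints, order=VARIABLE_ORDER):
    """Write one rule in the preset order; a variable without a constraint
    accepts any non-empty value."""
    parts = []
    for name in order:
        value_pattern = constraints.get(name, ".+?")
        parts.append(f"{re.escape(name)}=={value_pattern}")
    return re.compile("".join(parts))


def build_feature_string(values, order=VARIABLE_ORDER):
    """Splice a packet's variable values in the same preset order."""
    return "".join(f"{name}=={values[name]}" for name in order)


rule = build_rule({
    "mac.dst": "01:02:03:04:05:06",
    "ipv4.src": r"1\.1\.1\.1",
    "ipv4.dst": r"2\.2\.2\.\d{1,3}",          # any host in 2.2.2.0/24
    "tcp.dport": "80",
    "http.host": r".*(?:baidu\.com|qq\.com)",
})
packet = {
    "mac.src": "00:02:04:05:06", "mac.dst": "01:02:03:04:05:06",
    "ipv4.src": "1.1.1.1", "ipv4.dst": "2.2.2.22",
    "tcp.sport": "1234", "tcp.dport": "80", "http.host": "www.baidu.com",
}
is_hit = rule.fullmatch(build_feature_string(packet)) is not None
```

Because both sides use the same order, one `fullmatch` call checks every field of the packet at once.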
Optionally, for step S13, the serialized feature strings are sequentially read in character order based on the automaton, and when the reading is completed, an acceptance signal indicating that the automaton accepts the serialized feature strings or a rejection signal indicating that the automaton rejects the serialized feature strings is output to determine whether the serialized feature strings match the rule base.
The automaton can be represented as a directed graph in which each node corresponds to one state. By convention, a state in which the automaton accepts the character string is drawn as a double circle, and a state in which it rejects the character string is drawn as a single circle.
Therefore, in the embodiment of the present application, the deterministic finite automaton (DFA) obtained by compiling the regular expressions reads the serialized feature string, and the matching result is determined by whether the automaton accepts the string. The data packets do not need to be classified beforehand, so matching efficiency can be improved.
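The character-by-character reading and the accept/reject decision can be illustrated with a small hand-built DFA. This is a minimal sketch under stated assumptions: the `DFA` class, the `ANY` wildcard marker, and the example transition table (which accepts strings beginning with `6|80|`) are hypothetical and are not taken from the application. The early return on a dead state is a shortcut; the result is the same as reading to the end.

```python
ANY = object()  # hypothetical wildcard key: "any other character"
DEAD = -1       # hypothetical dead state: no transition exists


class DFA:
    def __init__(self, transitions, start, accepting):
        self.transitions = transitions  # state -> {character: next state}
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, text: str) -> bool:
        """Read the string in character order and report accept or reject."""
        state = self.start
        for ch in text:
            row = self.transitions.get(state, {})
            state = row.get(ch, row.get(ANY, DEAD))
            if state == DEAD:
                return False  # rejected: no way back to an accepting state
        return state in self.accepting


# States 0..5; state 5 is the only accepting ("double circle") state.
tcp_port_80 = DFA(
    transitions={
        0: {"6": 1},
        1: {"|": 2},
        2: {"8": 3},
        3: {"0": 4},
        4: {"|": 5},
        5: {ANY: 5},  # the payload may contain any characters
    },
    start=0,
    accepting={5},
)
```

Each input character causes exactly one table lookup, which is why the matching cost grows with the length of the feature string rather than with the number of rules compiled into the automaton.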
Further, the embodiment of the present application can be applied to a device with a network protocol identification or network security attack detection function. In a network data packet processing device, the technical solution of the present application serves as one of the functional modules (hereinafter referred to as the network protocol identification module), which receives network data packets and parses and matches them across the five layers of the TCP/IP protocol model. The specific matching process is as follows. The system starts, loads the network card driver, and performs the related hardware and software initialization. The network protocol identification module initializes, loads and compiles the regular expression rules, and registers its message processing function. The device receives a network data packet and dispatches it to the processing function of the network protocol identification module for identification. The network protocol identification module parses the protocol variables of the data packet and splices the parsing results into a serialized feature string. The network protocol identification module then matches the serialized feature string against the compiled rule base through a DFA and outputs the matching result, for example which protocol the data packet belongs to, or whether the data constitutes a network security attack.
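The flow described above (initialize, compile rules once, register a processing function, dispatch each packet, parse, match) might be sketched as below. All names here (`Device`, `ProtocolIdentificationModule`, the rule labels, and the dictionary-shaped packets) are hypothetical, packets are assumed to be already decoded into fields, and Python's `re` module stands in for the DFA.

```python
import re
from typing import Callable, Dict, List, Optional


class ProtocolIdentificationModule:
    def __init__(self, rules: Dict[str, str]):
        # Rules are loaded and compiled once, at initialization time.
        self._compiled = [(label, re.compile(p, re.DOTALL)) for label, p in rules.items()]

    def parse(self, packet: dict) -> str:
        """Splice the protocol variables into a serialized feature string."""
        return "|".join(str(packet[k]) for k in ("ip_proto", "dst_port", "payload"))

    def handle(self, packet: dict) -> Optional[str]:
        """Processing function: parse, match, and return the first matching label."""
        feature = self.parse(packet)
        for label, pattern in self._compiled:
            if pattern.fullmatch(feature):
                return label
        return None


class Device:
    def __init__(self) -> None:
        self._handlers: List[Callable[[dict], Optional[str]]] = []

    def register(self, handler: Callable[[dict], Optional[str]]) -> None:
        self._handlers.append(handler)

    def receive(self, packet: dict) -> List[Optional[str]]:
        # Dispatch the packet to every registered processing function.
        return [handler(packet) for handler in self._handlers]


module = ProtocolIdentificationModule({
    "http": r"6\|80\|(GET|POST) .*",
    "dns": r"17\|53\|.*",
})
device = Device()
device.register(module.handle)
```

A call such as `device.receive({"ip_proto": 6, "dst_port": 80, "payload": "GET / HTTP/1.1"})` then yields the label produced by the identification module.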
Based on the same inventive concept, an embodiment of the present application further provides a packet matching apparatus 40, please refer to fig. 4, where fig. 4 is a schematic diagram of the packet matching apparatus provided in the present application, and the packet matching apparatus 40 may include:
a compiling module 41, configured to compile all regular expression rules matching the data packet into a rule base, and generate an automaton corresponding to the rule base;
the parsing module 42 is configured to parse a data packet to be matched to obtain data of the data packet to be matched from a data link layer to an application layer, and generate a serialized feature string based on the data, where the serialized feature string represents protocol header data and protocol payload data of the data packet;
and the matching module 43 is configured to match the automaton with the serialized feature string to obtain a matching result.
Optionally, the compiling module 41 may be specifically configured to: determine the number of rule bases to compile based on the time consumed in compiling the regular expression rules and the memory usage, and generate the automaton corresponding to each rule base.
The matching module 43 may also be configured to: when a plurality of automata are generated, match each automaton against the serialized feature string in sequence.
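Splitting the rules into several rule bases and matching against each resulting automaton in turn could look like the sketch below. The application does not state how compile time and memory usage are measured, so `estimate_cost` (pattern length as a proxy) and the `budget` parameter are purely illustrative assumptions, as are all function names. Each rule base is compiled here as one combined Python regular expression.

```python
import re
from typing import List, Optional, Pattern


def estimate_cost(rule: str) -> int:
    # Hypothetical proxy for compile time and memory use: the pattern length.
    return len(rule)


def split_into_rule_bases(rules: List[str], budget: int) -> List[List[str]]:
    """Greedily pack rules into rule bases that each stay within the budget."""
    bases: List[List[str]] = []
    current: List[str] = []
    used = 0
    for rule in rules:
        cost = estimate_cost(rule)
        if current and used + cost > budget:
            bases.append(current)
            current, used = [], 0
        current.append(rule)
        used += cost
    if current:
        bases.append(current)
    return bases


def compile_rule_base(rules: List[str]) -> Pattern[str]:
    """Compile one rule base into a single matcher (alternation of its rules)."""
    return re.compile("|".join(f"(?:{r})" for r in rules), re.DOTALL)


def match_in_sequence(automata: List[Pattern[str]], feature: str) -> Optional[int]:
    """Try each compiled rule base in order; return the index of the first hit."""
    for index, automaton in enumerate(automata):
        if automaton.fullmatch(feature):
            return index
    return None


rules = [r"6\|80\|GET .*", r"6\|443\|.*", r"17\|53\|.*"]
bases = split_into_rule_bases(rules, budget=25)
automata = [compile_rule_base(base) for base in bases]
```

A smaller budget produces more, smaller rule bases, trading cheaper compilation for more sequential match attempts per packet.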
Optionally, the parsing module 42 may be specifically configured to: sequentially perform data link layer parsing, network layer parsing, transport layer parsing and application layer parsing on the data packet to be matched to obtain the protocol header data and protocol payload data of each data layer of the data packet to be matched.
Further, the parsing module 42 may be further specifically configured to: determine the protocol variables of each data layer and the value corresponding to each protocol variable based on the protocol header data and the protocol payload data; and splice the values of the protocol variables of each data layer to obtain the serialized feature string of the data packet to be matched.
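Layer-by-layer parsing followed by splicing can be sketched with the standard `struct` module. The sketch assumes an Ethernet II frame carrying IPv4 and TCP with no validation, checksums, or handling of Ethernet padding; the chosen variables, the `|` separator, and the helper names (`build_demo_frame`, `parse_packet`, `serialize_layers`) are hypothetical.

```python
import struct


def build_demo_frame(payload: bytes) -> bytes:
    """Build a minimal Ethernet II / IPv4 / TCP frame for demonstration only."""
    eth = b"\x00" * 6 + b"\x11" * 6 + struct.pack("!H", 0x0800)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40 + len(payload), 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    tcp = struct.pack("!HHIIBBHHH", 51512, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    return eth + ip + tcp + payload


def parse_packet(frame: bytes) -> dict:
    # Data link layer: the EtherType sits in bytes 12-13 of the Ethernet II header.
    eth_type = struct.unpack("!H", frame[12:14])[0]
    ip = frame[14:]
    # Network layer: IHL (low nibble of byte 0) is the header length in 32-bit words.
    ip_header_len = (ip[0] & 0x0F) * 4
    ip_proto = ip[9]
    tcp = ip[ip_header_len:]
    # Transport layer: ports first, data offset in the high nibble of byte 12.
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    tcp_header_len = (tcp[12] >> 4) * 4
    # Application layer: whatever follows the TCP header is treated as payload.
    payload = tcp[tcp_header_len:]
    return {"eth_type": eth_type, "ip_proto": ip_proto, "src_port": src_port,
            "dst_port": dst_port, "payload": payload}


def serialize_layers(layers: dict) -> str:
    """Splice the per-layer variable values into one serialized feature string."""
    return "|".join([
        format(layers["eth_type"], "#06x"),
        str(layers["ip_proto"]),
        str(layers["src_port"]),
        str(layers["dst_port"]),
        layers["payload"].decode("latin-1"),  # latin-1 maps every byte to one character
    ])


frame = build_demo_frame(b"GET / HTTP/1.1\r\n")
layers = parse_packet(frame)
feature = serialize_layers(layers)
```

Decoding the payload as latin-1 keeps the mapping between bytes and characters one-to-one, so a rule written against the feature string can still refer to arbitrary byte values.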
Optionally, the packet matching device 40 may further include a preprocessing module, configured to:
preprocessing the data packet to be matched to obtain a preprocessed serialized feature string; and determining the writing order of the protocol variables and the number of protocol variables of the regular expression rule based on the preprocessed serialized feature string.
Optionally, the preprocessing module is further operable to:
setting a preset protocol variable writing sequence and the number of preset protocol variables;
the compiling module 41 may be specifically configured to compile the regular expression rule into a rule base based on the writing order of the preset protocol variables and the number of the preset protocol variables;
the parsing module 42 may be specifically configured to parse the data packet to be matched to generate the serialized feature string based on the preset protocol variable writing sequence and the number of preset protocol variables.
Optionally, the matching module 43 may be specifically configured to:
and sequentially reading the serialized feature strings according to the character sequence based on the automaton, and outputting an acceptance signal representing that the automaton accepts the serialized feature strings or a rejection signal representing that the automaton rejects the serialized feature strings when the reading is finished so as to determine whether the serialized feature strings are matched with the rule base.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes the steps in any one of the above implementation manners when reading and executing the program instructions.
Based on the same inventive concept, embodiments of the present application further provide a computer-readable storage medium, where computer program instructions are stored, and when the computer program instructions are read and executed by a processor, the computer program instructions perform steps in any of the above-mentioned implementation manners.
The computer-readable storage medium may be a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or any other medium capable of storing program code. The storage medium is used for storing a program, and the processor executes the program after receiving an execution instruction.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
Alternatively, all or part of the implementation may be in software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention occur in whole or in part.
The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired connection (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless connection (e.g., infrared, radio, microwave).
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for packet matching, comprising:
compiling all regular expression rules matched with the data packet into a rule base, and generating an automaton corresponding to the rule base;
analyzing a data packet to be matched, acquiring data of the data packet to be matched from a data link layer to an application layer, and generating a serialized feature string based on the data;
and matching the automaton and the serialized feature strings to obtain a matching result.
2. The method of claim 1, wherein the generating an automaton corresponding to the rule base comprises:
determining the number of the rule bases to be compiled based on the time consumption for compiling the regular expression rules and the memory occupation condition, and generating the automaton corresponding to each rule base;
the matching the automaton and the serialized feature string comprises:
and when a plurality of automata are generated, matching each automaton with the serialized feature string in sequence.
3. The method according to claim 1, wherein the obtaining data of the data packet to be matched from a data link layer to an application layer comprises:
and sequentially carrying out data link layer parsing, network layer parsing, transport layer parsing and application layer parsing on the data packet to be matched to obtain protocol header data and protocol payload data of each data layer of the data packet to be matched.
4. The method of claim 3, wherein the generating a serialized feature string based on the data comprises:
determining the protocol variable and a value corresponding to the protocol variable for each data layer based on the header data and the protocol payload data;
and splicing the values of the protocol variables of each data layer to obtain the serialized characteristic strings of the data packets to be matched.
5. The method of claim 1, wherein each regular expression rule includes at least one protocol variable, and wherein before compiling all regular expression rules that match a packet into a rule base, the method further comprises:
preprocessing the data packet to be matched to obtain a preprocessed serialized feature string;
and determining the writing order of the protocol variables and the number of protocol variables of the regular expression rule based on the preprocessed serialized feature string.
6. The method of claim 1, further comprising:
setting a preset protocol variable writing sequence and the number of preset protocol variables;
compiling the regular expression rule into a rule base based on the writing sequence of the preset protocol variables and the number of the preset protocol variables, and analyzing the data packet to be matched to generate the serialized feature string.
7. The method of claim 1, wherein the matching the automaton and the serialized feature string comprises:
and sequentially reading the serialized feature strings according to the character sequence based on the automaton, and outputting an acceptance signal representing that the automaton accepts the serialized feature strings or a rejection signal representing that the automaton rejects the serialized feature strings when the reading is finished so as to determine whether the serialized feature strings are matched with the rule base.
8. A packet matching apparatus, comprising:
the compiling module is used for compiling all regular expression rules matched with the data packet into a rule base and generating an automaton corresponding to the rule base;
the analysis module is used for analyzing the data packet to be matched so as to obtain the data of the data packet to be matched from a data link layer to an application layer, and generating a serialized characteristic string based on the data;
and the matching module is used for matching the automaton with the serialized characteristic string to obtain a matching result.
9. An electronic device comprising a memory and a processor, the memory having stored therein program instructions which, when executed by the processor, perform the steps of the method of any one of claims 1-7.
10. A computer-readable storage medium having computer program instructions stored thereon for execution by a processor to perform the steps of the method of any one of claims 1-7.
CN202111388835.5A 2021-11-22 2021-11-22 Data packet matching method and device, electronic equipment and storage medium Pending CN114036353A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111388835.5A CN114036353A (en) 2021-11-22 2021-11-22 Data packet matching method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111388835.5A CN114036353A (en) 2021-11-22 2021-11-22 Data packet matching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114036353A true CN114036353A (en) 2022-02-11

Family

ID=80145116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111388835.5A Pending CN114036353A (en) 2021-11-22 2021-11-22 Data packet matching method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114036353A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117938987A (en) * 2024-03-25 2024-04-26 天津布尔科技有限公司 Protocol gateway applied to Internet of vehicles and control method thereof
CN117938987B (en) * 2024-03-25 2024-06-18 天津布尔科技有限公司 Protocol gateway applied to Internet of vehicles and control method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination