CN117978706A - Traffic protocol identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117978706A
Authority
CN
China
Prior art keywords
feature
rule
node
protocol
state machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410372192.2A
Other languages
Chinese (zh)
Inventor
李琳
周睿康
蔡一鸣
朱峰
赵梓桐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electronics Standardization Institute
Original Assignee
China Electronics Standardization Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electronics Standardization Institute
Priority to CN202410372192.2A
Publication of CN117978706A
Legal status: Pending


Abstract

The invention provides a traffic protocol identification method and device, an electronic device, and a storage medium, and relates to the field of computer technology. The method comprises: performing non-SP feature matching on the payload data in the traffic to be identified; acquiring a first feature ID of each matched non-SP feature; when a first feature ID exists in the rule state machine, determining it as a second feature ID corresponding to a child node of the rule state machine; performing node matching based on the second feature ID and the parent-child associations recorded in the rule state machine to determine whether a target parent node exists; when a target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, performing SP feature matching on the payload data based on the feature information of the target SP feature to obtain an SP feature matching result; and determining the target protocol of the traffic to be identified based on the SP feature matching result. The invention improves the efficiency of protocol identification.

Description

Traffic protocol identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular to a traffic protocol identification method and apparatus, an electronic device, and a storage medium.
Background
Traffic refers to the data flows, data packets, and similar units transmitted in a computer network. Protocol identification of traffic is the basis for upper-layer traffic analysis.
One existing traffic protocol identification method identifies the target protocol and its sub-protocols by means of protocol plug-ins: multiple protocol parsing plug-ins are deployed, and they parse the actual content of each data packet. The parsing plug-ins can be extended dynamically and deployed according to actual needs, so that the content to be parsed can be customized for different scenarios.
Another existing traffic protocol identification method generates a feature library file from a custom protocol and uses that file to identify protocols: a protocol feature configuration library file is generated from the custom protocol, the file is parsed, data packets are filtered against it, and the protocol matching results are counted. The feature items and feature keywords in the library can be extended according to service requirements, which gives the method a degree of extensibility and usability.
Neither of these methods takes identification efficiency into account. If protocol identification is slow, the timeliness of upper-layer service analysis suffers and the requirements of the application scenario cannot be met.
Disclosure of Invention
The invention provides a traffic protocol identification method and device, an electronic device, and a storage medium, which address the low efficiency of traffic protocol identification in the prior art and thereby improve the efficiency of protocol identification.
The invention provides a traffic protocol identification method, which comprises the following steps:
performing non-SP feature matching on the payload data in the traffic to be identified;
acquiring, when at least one non-SP feature is matched, a first feature ID of each matched non-SP feature;
for each first feature ID, when the first feature ID exists in a rule state machine, determining it as a second feature ID corresponding to a child node of the rule state machine;
performing node matching based on the second feature ID and the parent-child associations recorded in the rule state machine, and determining whether there is a target parent node associated with the child node corresponding to the second feature ID;
when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, performing SP feature matching on the payload data based on the feature information of the target SP feature to obtain an SP feature matching result; and
determining the target protocol of the traffic to be identified based on the SP feature matching result.
According to the traffic protocol identification method provided by the invention, performing node matching based on the second feature ID and the parent-child associations recorded in the rule state machine, and determining whether there is a target parent node associated with the child node corresponding to the second feature ID, comprises:
splicing the second feature ID with the feature ID of each parent node recorded in the rule state machine to obtain at least one spliced candidate feature ID; and
for each spliced candidate feature ID, matching it against the parent-child associations recorded in the rule state machine and, if the match succeeds, determining the successfully matched parent node as the target parent node.
According to the traffic protocol identification method provided by the invention, matching the spliced candidate feature ID against the parent-child associations recorded in the rule state machine comprises:
determining a first hash key corresponding to the spliced candidate feature ID; and
matching the first hash key against at least one second hash key recorded in the rule state machine, where each second hash key is obtained by splicing the feature ID of a parent node in the rule state machine with the feature ID of its associated child node.
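As a minimal sketch of the spliced hash keys in the two preceding claims, the Python below builds one key per parent-child association from the parent's feature ID followed by the child's, and looks up the candidates for a second feature ID. The function names are hypothetical. One deviation is assumed deliberately: the description splices IDs as digit strings (e.g. "01", "14"), which is unambiguous only while every ID has a single digit, so the sketch uses a tuple as the key.

```python
def splice_key(parent_id: int, child_id: int) -> tuple[int, int]:
    """Hash key for one parent -> child association.

    A tuple is used instead of a spliced digit string (an assumption),
    so multi-digit feature IDs cannot collide.
    """
    return (parent_id, child_id)


def find_target_parents(second_id: int,
                        parent_ids: list[int],
                        second_keys: set[tuple[int, int]]) -> list[int]:
    """Splice second_id with each parent ID (the first hash keys) and keep
    the parents whose key is among the recorded second hash keys."""
    return [p for p in parent_ids if splice_key(p, second_id) in second_keys]


# Second hash keys recorded when protocol A (root 0 -> 1 -> 2 -> 3) is mounted.
keys_a = {splice_key(0, 1), splice_key(1, 2), splice_key(2, 3)}
print(find_target_parents(2, [0, 1, 2], keys_a))  # [1]

# Plain digit concatenation would make (1, 23) and (12, 3) collide:
print("1" + "23" == "12" + "3")                # True
print(splice_key(1, 23) == splice_key(12, 3))  # False
```

Because lookup in a hash set is constant time on average, checking a handful of spliced candidates is cheap compared with evaluating SP features.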
According to the traffic protocol identification method provided by the invention, the method further comprises the following steps:
acquiring at least one protocol rule;
traversing each protocol rule to obtain all rule features in the protocol rule currently being traversed;
setting a root node in an initial state machine and, following the order of the rule features in the current protocol rule, mounting the first rule feature under the root node to obtain a child node of the root node; and
taking the child node of the root node as the parent node of the second rule feature in the current protocol rule and mounting the second rule feature under it to obtain a child node of that child node, and so on, until all rule features of the protocol rule are mounted level by level under their corresponding nodes, thereby obtaining the rule state machine.
According to the traffic protocol identification method provided by the invention, the method further comprises the following steps:
for any two protocol rules, if the protocol rule mounted earlier has already mounted a given rule feature at a node of a certain level, the protocol rule mounted later is prohibited from mounting the same rule feature again at a peer node of that level.
According to the traffic protocol identification method provided by the invention, the method further comprises the following steps:
when the type of a rule feature is SP, marking the parent node of that rule feature and recording the feature information of the rule feature in that parent node.
According to the traffic protocol identification method provided by the invention, determining the target protocol of the traffic to be identified based on the SP feature matching result comprises:
when the SP feature matching result indicates that the payload data contains the target SP feature, determining the protocol corresponding to the target SP feature as the target protocol of the traffic to be identified.
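The last two steps (SP feature matching restricted to the SP features recorded on the child node, then protocol determination) can be sketched in Python as below. This is an illustration under assumptions, not the patented implementation: the `MountedSP` record, its fields, and the `identify` function are hypothetical names, and the SP check used in the example is feature 5 from the embodiment (2 bytes at offset 5, value between 16 and 20), with big-endian byte order assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MountedSP:
    """SP feature information recorded on its parent node (hypothetical layout)."""
    feature_id: int
    protocol: str                    # protocol associated with this SP feature
    check: Callable[[bytes], bool]   # SP matcher applied to the payload


def identify(payload: bytes,
             has_target_parent: bool,
             mounted: list[MountedSP]) -> Optional[str]:
    """Run only the SP features mounted under the matched child node."""
    if not has_target_parent:
        return None                  # no parent association: skip SP matching
    for sp in mounted:
        if sp.check(payload):        # SP feature matching result is positive
            return sp.protocol       # its protocol is the target protocol
    return None


def feature5(payload: bytes) -> bool:
    # Feature 5: 2-byte field at offset 5 must lie in 16..20 (big-endian assumed).
    field = payload[5:7]
    return len(field) == 2 and 16 <= int.from_bytes(field, "big") <= 20


mounted_under_4 = [MountedSP(5, "protocol B", feature5)]
print(identify(bytes([0, 0, 0, 0, 0, 0, 18]), True, mounted_under_4))  # protocol B
```

Every other SP feature in the library is never evaluated for this payload, which is where the claimed efficiency gain comes from.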
The invention also provides a traffic protocol identification device, comprising:
a first matching module, configured to perform non-SP feature matching on the payload data in the traffic to be identified;
an acquisition module, configured to acquire, when at least one non-SP feature is matched, a first feature ID of each matched non-SP feature;
a first determining module, configured to determine, for each first feature ID that exists in the rule state machine, the first feature ID as a second feature ID corresponding to a child node of the rule state machine;
the first determining module being further configured to perform node matching based on the second feature ID and the parent-child associations recorded in the rule state machine, and to determine whether there is a target parent node associated with the child node corresponding to the second feature ID;
a second matching module, configured to perform, when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, SP feature matching on the payload data based on the feature information of the target SP feature, to obtain an SP feature matching result; and
a second determining module, configured to determine the target protocol of the traffic to be identified based on the SP feature matching result.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements any of the traffic protocol identification methods described above.
The invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the traffic protocol identification methods described above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the traffic protocol identification methods described above.
In the traffic protocol identification method and device, electronic device, and storage medium provided by the invention, the method performs non-SP feature matching on the payload data in the traffic to be identified; when at least one non-SP feature is matched, acquires a first feature ID of each matched non-SP feature; for each first feature ID that exists in the rule state machine, determines it as a second feature ID corresponding to a child node of the rule state machine; performs node matching based on the second feature ID and the parent-child associations recorded in the rule state machine to determine whether there is a target parent node associated with the child node corresponding to the second feature ID; when a target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, performs SP feature matching on the payload data based on the feature information of the target SP feature to obtain an SP feature matching result; and determines the target protocol of the traffic to be identified based on that result. Because SP feature matching on the payload data is performed only when a target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, the matching is targeted at specific SP features in the rule state machine instead of traversing every SP feature in the feature library. This reduces the time spent on SP feature matching and improves the efficiency of traffic protocol identification.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a traffic protocol identification method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a rule state machine according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of protocol rule mounting according to an embodiment of the present invention;
Fig. 4 is a schematic block diagram of rule state machine generation according to an embodiment of the present invention;
Fig. 5 is a block diagram of the identification flow of a protocol identification module according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a traffic protocol identification device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, in the present invention, ordinal terms such as "first" and "second" are used only to distinguish the described objects and do not imply any order or technical meaning.
Traffic is usually generated according to its corresponding protocol rule, and traffic protocol identification means analyzing the traffic to determine the protocol rule to which it belongs. The protocol rule may be a general protocol rule or a user-defined protocol rule.
In general, a protocol rule includes one or more features, and different protocol rules order and combine their features differently. Features may be, for example, a port number, a traffic direction, a specific character-string feature, a specific hexadecimal feature, a message length, or a synthesis pattern (SP) feature. An SP feature may specify a particular algorithm, particular values, particular ranges, and the like. For example, to match an SP feature, the data field at the position given by the offset that the SP feature specifies is extracted from the traffic, the algorithm specified by the SP feature is applied to that field to obtain a computed value, and the computed value is compared with the range specified by the SP feature; if the comparison succeeds, the traffic is determined to contain the SP feature.
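The SP matching procedure just described (extract a field at an offset, apply the specified algorithm, range-check the result) can be sketched in Python as follows. The `SPFeature` class and its field names are hypothetical, and the "algorithm" used here, reading the field as a big-endian unsigned integer, is an assumption chosen for illustration; the example instance mirrors feature 5 described later in this embodiment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SPFeature:
    """Synthesis pattern (SP) feature: read a field, compute, range-check.

    Field names and the byte order are illustrative assumptions.
    """
    feature_id: int
    offset: int                         # byte offset from start of payload
    length: int                         # field length in bytes
    algorithm: Callable[[bytes], int]   # algorithm specified by the SP feature
    low: int                            # inclusive lower bound of the range
    high: int                           # inclusive upper bound of the range

    def matches(self, payload: bytes) -> bool:
        field = payload[self.offset:self.offset + self.length]
        if len(field) < self.length:    # payload too short: no match
            return False
        value = self.algorithm(field)   # computed value
        return self.low <= value <= self.high


def big_endian_int(field: bytes) -> int:
    return int.from_bytes(field, "big")


# Feature 5: 2 bytes at offset 5, value must lie in 16..20.
feature_5 = SPFeature(5, offset=5, length=2, algorithm=big_endian_int, low=16, high=20)
print(feature_5.matches(b"\x00" * 5 + b"\x00\x12"))  # True  (0x0012 == 18)
print(feature_5.matches(b"\x00" * 5 + b"\x00\x30"))  # False (0x0030 == 48)
```

Unlike a fixed string, such a feature cannot be folded into a single multi-pattern automaton, which is why evaluating every SP feature for every packet is costly.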
Determining SP features differs from determining non-SP features such as port numbers, traffic directions, specific character-string features, and specific hexadecimal features, and takes more computation time. In practice, protocol identification matches network traffic against a feature library containing a large number of features. Besides the built-in feature library, users are also offered an open way to define feature libraries: they can define their own features and corresponding protocol rules using the supported feature configuration format. Because a large proportion of protocol rules contain SP features, the invention takes improving the identification efficiency of SP features as its starting point.
In existing traffic protocol identification methods, the traffic is generally checked for SP features by an identification engine containing all SP features, which determines whether the traffic contains SP features and which specific SP features were identified. At the same time, the traffic is checked for non-SP features by an identification engine containing all non-SP features, which determines whether the traffic contains non-SP features and which specific ones were identified. The identified SP and non-SP features are then matched against each protocol rule, and when all identified SP and non-SP features are included in a protocol rule, that rule's protocol is taken as the protocol of the traffic. In this process, SP features are identified by traversing all SP features, which takes a long time and consumes considerable resources on the identification device, so existing protocol identification is inefficient and cannot meet practical requirements. It follows that protocol identification can be made more efficient by avoiding, as far as possible, traversing all SP features in the identification engine during SP feature identification.
To address these problems, an embodiment of the invention provides a traffic protocol identification method that performs non-SP feature matching on the payload data in the traffic to be identified; when at least one non-SP feature is matched, acquires the first feature ID of each matched non-SP feature; determines each first feature ID that exists in the rule state machine as a second feature ID corresponding to a child node of the rule state machine; performs node matching based on the second feature ID and the parent-child associations recorded in the rule state machine to determine whether a target parent node exists; when a target parent node exists and a target SP feature is mounted under the child node corresponding to the second feature ID, performs SP feature matching on the payload data based on the feature information of the target SP feature to obtain an SP feature matching result; and determines the target protocol of the traffic from that result. Because SP feature matching is triggered only under these conditions, it is targeted at specific SP features in the rule state machine rather than traversing every SP feature in the feature library, which reduces the time spent on SP feature matching and improves the efficiency of traffic protocol identification.
The traffic protocol identification method provided by an embodiment of the present invention is described below with reference to Figs. 1 to 5. Fig. 1 is a flowchart of the traffic protocol identification method provided by an embodiment of the present invention. The method is applicable to protocol identification of various types of traffic, for example traffic that follows a general protocol rule or traffic that follows a user-configured protocol rule. The method may be executed by a computer, a server cluster, a dedicated traffic protocol identification device, or other electronic equipment, or by a traffic protocol identification apparatus installed in such equipment, which may be implemented in software, hardware, or a combination of both. For example, the executing electronic device may be an industrial control firewall or an audit product; the method is described below using an industrial control firewall as the example. As shown in Fig. 1, the traffic protocol identification method includes steps 110 to 160.
Step 110, performing non-SP feature matching on the payload data in the traffic to be identified.
In this step, the traffic to be identified may be any traffic transmitted in the network, for example network data packets input into an industrial control firewall or an audit product. The payload data of the traffic to be identified is the data field used for protocol identification. A non-SP feature is any feature other than an SP feature, for example a port number, a traffic direction, a specific character-string feature, a specific hexadecimal feature, or a message length.
Specifically, the non-SP feature matching may be performed, for example, by inputting the traffic to be identified into a hexadecimal/character-string engine, or into an engine based on the Aho-Corasick (AC) multi-pattern matching algorithm, to obtain all non-SP features contained in the traffic. The AC multi-pattern matching engine is also referred to below as the AC automaton or AC engine.
For example, a feature library of non-SP features is loaded into the AC engine. The library contains many non-SP features, each with a corresponding feature ID (identity), i.e., an identification number. The traffic to be identified is input into the AC engine for feature matching, each non-SP feature contained in the traffic is identified, and information such as the order of the identified non-SP features and their feature IDs is output.
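To illustrate how an AC engine can report the feature IDs of matched non-SP features (steps 110 and 120), here is a minimal Aho-Corasick automaton over byte patterns in Python. It is a sketch, not the engine used by the invention: it covers only string and hexadecimal features (port numbers, traffic direction, and message length would be checked separately), and the patterns and feature IDs in the usage example are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton mapping byte patterns to feature IDs."""

    def __init__(self, patterns: dict[bytes, int]) -> None:
        self.goto: list[dict[int, int]] = [{}]  # state -> {byte: next state}
        self.fail: list[int] = [0]              # failure link per state
        self.out: list[list[int]] = [[]]        # feature IDs ending at state
        for pattern, feature_id in patterns.items():
            state = 0
            for b in pattern:                   # build the trie
                if b not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][b] = len(self.goto) - 1
                state = self.goto[state][b]
            self.out[state].append(feature_id)
        # Breadth-first pass computes failure links and merges outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for b, nxt in self.goto[state].items():
                f = self.fail[state]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(b, 0)
                self.out[nxt] += self.out[self.fail[nxt]]
                queue.append(nxt)

    def search(self, data: bytes) -> list[int]:
        """Return matched feature IDs in the order the matches end."""
        state, hits = 0, []
        for b in data:
            while state and b not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(b, 0)
            hits.extend(self.out[state])
        return hits


ac = AhoCorasick({b"he": 101, b"she": 102, b"his": 103, b"hers": 104})
hits = ac.search(b"ushers")
print(hits)                           # [102, 101, 104]
first_ids = list(dict.fromkeys(hits))  # step 120: first feature IDs, deduplicated
```

The automaton scans the payload once regardless of how many non-SP patterns are loaded, which is why it suits the first, cheap filtering stage.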
Step 120, when at least one non-SP feature is matched, acquiring a first feature ID of each matched non-SP feature.
In this step, a non-SP feature recognition engine performs non-SP feature recognition on the traffic to be identified and, when at least one non-SP feature is matched, outputs the feature ID of each matched non-SP feature; these feature IDs are the first feature IDs.
Step 130, for each first feature ID, if the first feature ID exists in the rule state machine, determining it as a second feature ID corresponding to a child node of the rule state machine.
In this step, the rule state machine may be initialized in the industrial control firewall before traffic protocol identification starts; it may also be referred to as a protocol identification state machine, a rule tree, or simply a state machine. The rule state machine is used to evaluate the features identified in the traffic to be identified and to determine whether they are associated with one another. If at least two features are included in the same protocol rule, an association exists between them. For example, if protocol A includes features 1, 2 and 3, and the features identified from some traffic are features 1, 2 and 3, then those features are associated, and the traffic is more likely to belong to protocol A.
For example, when the rule state machine is initialized, a rule tree is generated from a protocol rule file containing multiple protocol rules and a feature library file containing multiple features, yielding a tree-structured state machine, namely the rule state machine.
For example, assume the protocol rule file contains the protocol rules of four protocols: protocol A, protocol B, protocol C, and protocol D. The rule of protocol A includes features 1, 2 and 3; the rule of protocol B includes features 1, 4 and 5; the rule of protocol C includes features 2, 4 and 6; and the rule of protocol D includes features 2, 5 and 7. The numbers 1, 2, 3, etc. are the feature IDs of the features, which may be assigned by an auto-increment rule. The feature library file contains all features and their feature IDs, for example 100 features numbered feature 1 to feature 100, some of which are SP features and the rest non-SP features. When the rule state machine is initialized, the specific content and feature ID of each feature can be read from the feature library file. For example, feature 1 is a non-SP feature whose content is the port number xx1; feature 2 is another non-SP feature whose content is the string xx2; feature 5 is an SP feature whose content is: offset 5 bytes from the start of the payload data, read a 2-byte field, and require its value to lie in the range 16 to 20. Features 3, 6 and 7 may also be SP features; their specific contents are not described here.
Fig. 2 is a schematic diagram of a rule state machine according to an embodiment of the present invention. As shown in Fig. 2, when the rule state machine is initialized, a root node is created first, and its ID may be set to 0. The protocol rules of protocols A, B, C and D are then read from the protocol rule file in turn, and the features of each rule are mounted on nodes of the corresponding levels. For example, the rule of protocol A is read first: feature 1 is mounted on a first-level node and its feature ID is recorded in the state table of the rule state machine, so that feature ID "1" is stored in the rule state machine; the second feature of protocol A, feature 2, is mounted on a second-level node and feature ID "2" is recorded in the same way; and the third feature, feature 3, is mounted on a third-level node and feature ID "3" is recorded. At this point all features of protocol A are mounted, and the protocol name can be mounted after the last level, i.e., after the third-level node. After protocol A has been read and mounted, the features of protocol B are read and mounted in the same way, followed by those of protocol C and protocol D. Once the features of all protocols in the protocol rule file have been read and mounted, initialization of the rule state machine is complete.
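The initialization just described, together with the mounting rules stated in the summary (a later rule does not mount a feature again where an earlier rule already mounted it, and SP feature information is recorded on its parent node), can be sketched in Python as below. Several points are assumptions: the class and field names are hypothetical; associations are stored as (parent ID, child ID) tuples rather than spliced strings; "peer node" is interpreted as a sibling under the same parent; and treating features 3, 5, 6 and 7 as SP features is assumed (only feature 5 is stated to be SP).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    feature_id: int
    children: dict[int, "Node"] = field(default_factory=dict)
    sp_children: list[int] = field(default_factory=list)  # SP info recorded on parent
    protocol: Optional[str] = None  # protocol name mounted after the last level


@dataclass
class RuleStateMachine:
    root: Node = field(default_factory=lambda: Node(0))
    state_table: set[int] = field(default_factory=set)            # stored feature IDs
    edge_keys: set[tuple[int, int]] = field(default_factory=set)  # (parent, child)

    def mount(self, protocol: str, features: list[int], sp_ids: set[int]) -> None:
        parent = self.root
        for fid in features:                 # mount level by level, in rule order
            child = parent.children.get(fid)
            if child is None:                # not yet mounted at this position
                child = Node(fid)
                parent.children[fid] = child
            if fid in sp_ids and fid not in parent.sp_children:
                parent.sp_children.append(fid)  # mark parent, record SP feature
            self.state_table.add(fid)
            self.edge_keys.add((parent.feature_id, fid))
            parent = child
        parent.protocol = protocol           # protocol name after the last level


RULES = {
    "protocol A": [1, 2, 3],
    "protocol B": [1, 4, 5],
    "protocol C": [2, 4, 6],
    "protocol D": [2, 5, 7],
}
SP_IDS = {3, 5, 6, 7}  # assumption: only feature 5 is confirmed to be SP

sm = RuleStateMachine()
for name, feats in RULES.items():
    sm.mount(name, feats, SP_IDS)

print(sorted(sm.root.children))                     # [1, 2]: B and D reuse nodes
print(sm.root.children[1].children[4].sp_children)  # [5]
```

Reusing an existing sibling node keeps rules that share a prefix (A and B, C and D) on a common path, so the tree grows only where rules diverge.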
It should be understood that "parent node" and "child node" are relative terms describing the order of two nodes on adjacent levels: the parent node is the one closer to the root node, and the child node is the one farther from it. In Fig. 2, for example, a second-level node is a child node of a first-level node, and that first-level node is the parent node of the second-level node. Nodes on the same level have no parent-child relationship with one another.
Illustratively, after step 120 the first feature IDs of the matched non-SP features are obtained, for example two first feature IDs, "1" and "4". For each first feature ID, it is determined whether it exists in the rule state machine, and if so, it is determined as a second feature ID corresponding to a child node of the rule state machine. Because feature IDs "1" and "4" were stored when protocol B was read and mounted, both exist in the rule state machine, and each is determined as a second feature ID corresponding to a child node of the rule state machine.
Step 140: perform node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and determine whether there is a target parent node associated with the child node corresponding to the second feature ID.
Illustratively, the rule state machine records the association relationships between parent nodes and child nodes. An association relationship can be understood as the sequential relationship between adjacent features belonging to the same protocol. For example, when the feature IDs of the features of protocol A are recorded in the state table corresponding to the rule state machine, the parent-child association relationships can be recorded according to the order of the feature IDs and the levels of the nodes on which the features are mounted. Each parent-child pair can be represented by splicing the parent's feature ID with the child's feature ID, giving 01, 12 and 23 for protocol A; each of these strings represents one parent-child association relationship. Node matching can then be performed based on the second feature ID and these recorded association relationships to determine the target parent node associated with the child node corresponding to the second feature ID.
For example, the second feature IDs "1" and "4" are determined in step 130. When protocol B was mounted during initialization of the rule state machine, the parent-child association relationships 01, 14 and 45 were stored. In relationship 45, "4" is the feature ID of the parent node and "5" is that of the child node; since the second feature ID "4" does not occupy the child position there, relationship 45 cannot be used to determine a target parent node. Relationships 01 and 14 can be used: in 01, "1" is the child node and its target parent node is "0"; in 14, "4" is the child node and its target parent node is "1".
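The reasoning above (relationship 45 is skipped for ID "4" because "4" sits in the parent position) can be checked with a few lines of Python. This sketch assumes single-character feature IDs, as in Fig. 2, so that the last character of a spliced relationship is the child ID and the rest is the parent ID; `parents_of` is a hypothetical helper.

```python
from __future__ import annotations

# Association relationships stored when protocol B is mounted (parent ID + child ID).
RELATIONS_B = {"01", "14", "45"}


def parents_of(second_id: str, relations: set[str]) -> list[str]:
    """Return the parent IDs of the relationships in which `second_id` is the child.

    Assumes single-character feature IDs: the last character of a spliced
    relationship is the child ID and the preceding part is the parent ID.
    """
    return sorted(rel[:-1] for rel in relations if rel[-1] == second_id)
```

Here `parents_of("1", RELATIONS_B)` gives `["0"]` and `parents_of("4", RELATIONS_B)` gives `["1"]`, as in the worked example. Longer IDs need an unambiguous encoding, which the splicing and hash-key embodiments address.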
In this step, the existence of a target parent node indicates that the second feature ID has an associated node at the level above it. The second feature ID may therefore be a feature ID under some protocol in the rule state machine, and subsequent judgment can continue from it to determine the corresponding protocol. If no target parent node exists, the second feature ID is an isolated node with no associated upper-level node in the rule state machine, so it does not belong to any protocol in the rule state machine.
Step 150: when a target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, perform SP feature matching on the load data based on the feature information of the target SP feature to obtain an SP feature matching result.
Specifically, when the rule state machine is obtained through initialization, the feature information of each SP feature, including its feature ID and specific content, may be mounted in the rule state machine. For example, the feature information of an SP feature may be mounted in the parent node that has an association relationship with the SP feature, so that SP feature matching can be carried out in a targeted and fast manner.
For example, since feature 5 is an SP feature, its feature information, such as its feature ID and specific content, may be mounted in the parent nodes associated with it when the rule state machine is initialized. As shown in Fig. 2, the feature information of feature 5 may be mounted in feature 2 at a first-level node and in feature 4 at a second-level node. Both second feature IDs "1" and "4" have target parent nodes. For feature 1, whose target parent node is root node 0, no SP feature information is mounted, so SP feature matching is not triggered. Feature 4, however, carries the feature information of SP feature 5, so feature 5 is the target SP feature: based on the specific content of feature 5 mounted in feature 4, SP feature matching on the load data is triggered and a matching result for the target SP feature is obtained. During matching, operations such as extracting the value of a target field, computing on it, and comparing it with a specific value or range may be performed on the load data according to the specific content of the target SP feature, yielding a conclusion as to whether the traffic to be identified contains the target SP feature, i.e. the matching result.
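The sketch below shows why SP matching stays targeted: the SP feature information lives on its parent nodes, so a lookup keyed by the second feature ID either finds nothing (feature 1) or yields exactly the SP features to check (feature 4). The SP content format (a single byte at a fixed offset compared against an inclusive range), the concrete offset and bounds, and all names are hypothetical stand-ins for "the value of a target field compared with a specific value or range".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpFeature:
    """Hypothetical SP feature: the byte at `offset` must lie in [low, high]."""

    feature_id: str
    offset: int
    low: int
    high: int

    def matches(self, payload: bytes) -> bool:
        # Take the target field value from the load data and compare it with the range.
        if self.offset >= len(payload):
            return False
        return self.low <= payload[self.offset] <= self.high


# SP feature 5 is mounted in its parent nodes, features 2 and 4 (Fig. 2).
SP_FEATURE_5 = SpFeature("5", offset=2, low=0x10, high=0x1F)
SP_MOUNTS: dict[str, list[SpFeature]] = {"2": [SP_FEATURE_5], "4": [SP_FEATURE_5]}


def match_target_sp(second_id: str, has_target_parent: bool, payload: bytes) -> dict[str, bool]:
    """Step 150 sketch: run SP matching only for SP features mounted under this node."""
    if not has_target_parent:
        return {}
    return {sp.feature_id: sp.matches(payload) for sp in SP_MOUNTS.get(second_id, [])}
```

With the payload `b"\x00\x00\x15"`, `match_target_sp("1", True, ...)` returns `{}` (nothing to check), while `match_target_sp("4", True, ...)` returns `{"5": True}` because 0x15 lies in the assumed range.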
Step 160: determine the target protocol of the traffic to be identified based on the SP feature matching result.
In this step, the target protocol is the protocol finally determined for the traffic to be identified. The SP feature matching result shows whether the traffic contains the target SP feature; combining it with the matching results of the other target SP features and non-SP features then determines the target protocol of the traffic.
For example, suppose the SP feature matching result obtained in step 150 indicates that the traffic to be identified contains the target SP feature, the traffic is also determined to contain other target SP features and/or non-SP features, and all successfully matched features belong to the same protocol rule. The target protocol of the traffic can then be determined to be the protocol of that protocol rule.
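One way to read this final check is: all successfully matched features, SP and non-SP alike, must belong to a single protocol rule. The sketch encodes that reading and returns no protocol when the matches are ambiguous or span several rules. The rule table reuses the feature sequences inferred from Fig. 2 and, like the function name, is an assumption.

```python
from __future__ import annotations

# Assumed feature sequences of the Fig. 2 protocol rules (C and D inferred).
RULES = {
    "A": ("1", "2", "3"),
    "B": ("1", "4", "5"),
    "C": ("2", "4", "6"),
    "D": ("2", "5", "7"),
}


def decide_protocol(matched_ids: set[str], rules: dict[str, tuple[str, ...]] = RULES) -> str | None:
    """Return the protocol whose rule contains every matched feature, if exactly one does."""
    candidates = [name for name, ids in rules.items() if matched_ids and matched_ids <= set(ids)]
    return candidates[0] if len(candidates) == 1 else None
```

For instance, matching features 1, 4 and SP feature 5 yields protocol B, whereas feature 1 alone is shared by rules A and B and yields no decision.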
In the traffic protocol identification method provided by this embodiment of the invention, non-SP feature matching is performed on the load data in the traffic to be identified; when at least one non-SP feature is matched, the first feature ID of each matched non-SP feature is obtained; each first feature ID that exists in the rule state machine is determined as a second feature ID corresponding to a child node of the rule state machine; node matching is performed based on the second feature ID and the parent-child association relationships recorded in the rule state machine to determine whether a target parent node associated with the child node corresponding to the second feature ID exists; when a target parent node exists and a target SP feature is mounted under the child node corresponding to the second feature ID, SP feature matching is performed on the load data based on the feature information of the target SP feature to obtain an SP feature matching result; and the target protocol of the traffic is determined based on that result. Because SP feature matching is performed only when both conditions hold, it is targeted at the target SP features in the rule state machine instead of traversing all SP features in the feature library. This reduces the time spent on SP feature matching and improves the efficiency of traffic protocol identification.
In practical applications, both industrial control firewalls and audit products implement auditing and protection by matching user-configured policies on top of protocol identification. Protocol identification supports deep message analysis and thus enables precise protection; it is the cornerstone of industrial auditing and protection, on which subsequent audit and protection processing relies. Existing protocol identification methods are based on full traversal: defining and querying SP feature rules is relatively complex, and every time the protocol of new traffic is identified, all policies are traversed for SP features. This makes protocol identification inefficient and degrades device performance. Moreover, traversing all SP features requires installing every SP feature into the policy array of the identification engine, which consumes a large amount of memory.
The method of the invention establishes upper-lower association relationships between the nodes at every level of the rule state machine and embeds each target SP feature directly in the upper-level node associated with it. This guarantees correct state jumps while allowing fast jumps; the algorithm is computationally simple and has a low collision rate, which shortens search time, improves protocol identification efficiency and improves device performance. In addition, the method does not need to install all SP features separately into the policy array of the identification engine, which saves memory. The method is also applicable to the installation and querying of protocol rules in most network products, so it is broadly applicable, adaptable and extensible.
In practical applications, the existence of a target parent node for a second feature ID indicates that the ID may belong to some protocol in the rule state machine. Determining more accurately whether a target parent node associated with the child node corresponding to the second feature ID exists therefore improves the accuracy of protocol identification.
In an embodiment, performing node matching based on the second feature ID and the parent-child association relationships recorded in the rule state machine, and determining whether a target parent node associated with the child node corresponding to the second feature ID exists, may be specifically implemented as follows:
The second feature ID is spliced with the feature ID of each parent node recorded in the rule state machine to obtain at least one splice candidate feature ID. For each splice candidate feature ID, node matching is performed between it and the parent-child association relationships recorded in the rule state machine, and if the matching succeeds, the successfully matched parent node is determined as a target parent node.
Specifically, the feature IDs of the parent nodes recorded in the rule state machine may each be used as the feature ID of a candidate parent node, and each second feature ID determined for the traffic to be identified is spliced with every candidate parent node's feature ID. In the splice, the candidate parent node's feature ID comes first and the second feature ID follows it, yielding at least one splice candidate feature ID.
By way of example, as shown in Fig. 2, the feature IDs of the parent nodes recorded in the rule state machine are "0", "1", "2", "4" and "5". The third-level nodes in Fig. 2 are leaf nodes of the rule tree, i.e. the nodes of the last feature of a protocol rule; no next-level feature is mounted after them, so leaf nodes cannot serve as parent nodes. Taking the feature IDs of all these parent nodes as the candidate parent node feature IDs, the splice candidate feature IDs for second feature ID "1" are "01", "11", "21", "41" and "51", and those for second feature ID "4" are "04", "14", "24", "44" and "54".
As shown in Fig. 2, the parent-child association relationships recorded in the rule state machine can be expressed, by splicing each parent node's feature ID with the feature ID of its associated child node, as "01", "12", "23", "14", "45", "02", "24", "46", "25" and "57". Each splice candidate feature ID is matched against these recorded association relationships; a match succeeds when the splice candidate feature ID is identical to a recorded association relationship, and the parent node in a successful match is determined as a target parent node. Among the splice candidate feature IDs, "01", "14" and "24" match successfully. Hence parent node "0" is the target parent node of second feature ID "1", and parent nodes "1" and "2" are both target parent nodes of second feature ID "4".
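The splice-and-match procedure can be sketched directly from this example. The relationship set and candidate parent IDs are taken from the Fig. 2 discussion; the function name and the use of a Python set for the recorded relationships are assumptions of this sketch.

```python
from __future__ import annotations

# Spliced parent+child relationships recorded for Fig. 2, and the candidate parent IDs.
RELATIONS = {"01", "12", "23", "14", "45", "02", "24", "46", "25", "57"}
PARENT_IDS = ["0", "1", "2", "4", "5"]


def find_target_parents(
    second_id: str,
    parent_ids: list[str] = PARENT_IDS,
    relations: set[str] = RELATIONS,
) -> list[str]:
    """Splice each candidate parent ID in front of `second_id` and keep exact matches."""
    targets = []
    for pid in parent_ids:
        candidate = pid + second_id  # splice candidate feature ID, e.g. "0" + "1" -> "01"
        if candidate in relations:   # identical to a recorded relationship -> success
            targets.append(pid)
    return targets
```

`find_target_parents("1")` yields `["0"]` and `find_target_parents("4")` yields `["1", "2"]`, matching the worked example.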
In this embodiment, splicing the second feature ID with the feature ID of every parent node recorded in the rule state machine treats all parent nodes as candidates, which avoids an incomplete set of target parent nodes caused by missing a parent node. Matching each splice candidate feature ID against the recorded parent-child association relationships then determines every target parent node comprehensively and accurately, which in turn improves the accuracy of protocol identification.
In practical applications, a feature ID in the feature library may be a long string, for example 16 or 32 characters, or a more complex unique identifier generated by encoding rules. In such cases, given the diversity and complexity of features, hash key matching may be used to make node matching between the splice candidate feature IDs and the association relationships more efficient.
In an embodiment, node matching between the splice candidate feature ID and the parent-child association relationships recorded in the rule state machine may be specifically implemented as follows:
A first hash key corresponding to the splice candidate feature ID is determined, and the first hash key is matched against at least one second hash key recorded in the rule state machine, where a second hash key is the hash key of the string obtained by splicing the feature ID of a parent node in the rule state machine with the feature ID of its associated child node.
Specifically, when the rule state machine is obtained through initialization, the feature ID of each recorded parent node may be spliced with the feature ID of its associated child node, and the hash key of each spliced string is computed. These hash keys are the second hash keys, and each of them is recorded in the rule state machine.
For example, as shown in Fig. 2, the spliced strings of parent node feature IDs and associated child node feature IDs recorded in the rule state machine are "01", "12", "23", "14", "45", "02", "24", "46", "25" and "57". A hash key is computed for each of these strings to obtain the corresponding second hash keys. Any hash key calculation method may be used, and the invention is not limited in this respect.
For example, after the splice candidate feature IDs are obtained, the hash key of each splice candidate feature ID can be computed in the same way as the second hash keys, giving its first hash key. Matching each first hash key against the second hash keys recorded in the rule state machine allows fast matching while preserving matching accuracy, which improves matching efficiency.
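A sketch of hash-key matching, assuming Python's `hashlib.blake2b` with an 8-byte digest as the (unspecified) hash function. One deviation is labeled as an assumption: a separator is inserted between the two IDs before hashing, because plain concatenation of variable-length IDs is ambiguous ("1" + "23" and "12" + "3" both give "123"). `relation_key` and `RelationIndex` are hypothetical names.

```python
from __future__ import annotations

import hashlib

SEP = "\x1f"  # assumed separator between parent and child IDs (not in the source)


def relation_key(parent_id: str, child_id: str) -> bytes:
    """Hash key of a spliced parent/child feature ID pair."""
    spliced = (parent_id + SEP + child_id).encode("utf-8")
    return hashlib.blake2b(spliced, digest_size=8).digest()


class RelationIndex:
    """Stores the second hash keys computed when the rule state machine is initialized."""

    def __init__(self, relations: list[tuple[str, str]]) -> None:
        self._second_keys = {relation_key(p, c) for p, c in relations}

    def is_target_parent(self, parent_id: str, second_id: str) -> bool:
        # First hash key of the splice candidate, looked up among the second hash keys.
        return relation_key(parent_id, second_id) in self._second_keys
```

The fixed-size digests keep lookups cheap even for 16- or 32-character IDs. Because any hash can in principle collide, a stricter variant could store the spliced IDs alongside the keys and compare them on a hit.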
In this embodiment, because hash keys are highly unique and collisions between different hash keys are rare, hash keys can serve as unique identifiers. Performing node matching on the first and second hash keys therefore improves node matching efficiency without sacrificing accuracy, and thus speeds up the determination of the target parent node.
Fig. 3 is a schematic block diagram of protocol rule installation according to an embodiment of the present invention, and Fig. 4 is a schematic block diagram of rule state machine generation according to an embodiment of the present invention. The process of obtaining the rule state machine is further described below with reference to Figs. 3 and 4.
In one embodiment, the method further comprises: acquiring at least one protocol rule; traversing each protocol rule to obtain all rule features in the currently traversed protocol rule; setting a root node in an initial state machine and, following the order of the rule features in the currently traversed protocol rule, mounting the first rule feature under the root node to obtain a child node of the root node; and taking that child node as the parent node of the second rule feature in the currently traversed protocol rule and mounting the second rule feature under it to obtain a child node of the root node's child node, and so on, until the rule features of all protocol rules have been mounted level by level under the corresponding nodes, yielding the rule state machine.
Specifically, generating the rule state machine can be understood as loading all protocol rules used for protocol identification into an initial state machine to form a tree-structured rule tree. The traffic to be identified can then be searched against this rule state machine to determine whether all features matched in the traffic can legally jump from one to the next, that is, whether the features have association relationships. If the rule state machine confirms that every matched feature can legally jump, the identified features all belong to the same protocol rule, and the target protocol of the traffic can be determined.
The industrial control firewall may include a feature library installation module, a protocol identification state machine generation module and a protocol identification module. The feature library installation module installs the protocol rules. Given the diversity and complexity of features, each feature may be identified by a combination of globally unique IDs. Features of the same protocol rule are looked up by hash key to decide whether a new globally unique ID must be generated or an existing one returned, which lays the foundation for the subsequent installation. By constructing association relationships between upper- and lower-level features and storing the SP features in those relationships, SP feature traversal performance is optimized. A corresponding search engine is built for the non-SP features.
In one implementation, the method further comprises: for any two protocol rules, if the protocol rule mounted earlier has already mounted a rule feature at a peer (same-level) node, the protocol rule mounted later is prohibited from mounting the same rule feature again at that peer node.
Exemplarily, as shown in Fig. 3, when the feature library is installed, the features in a protocol rule are acquired in sequence and the hash key of each feature is computed to determine whether the current feature already exists at the peer node. If it does, the feature is not installed again at that level, and the feature ID of the existing feature is obtained. If it does not, a globally unique ID can be generated for the feature, for example by splicing the feature ID of the feature's parent node with the feature ID of the feature. The association relationship between the upper and lower nodes is then established; it can be represented, for example, by the globally unique ID generated by splicing the parent node's feature ID with the feature's feature ID.
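The Fig. 3 installation flow can be sketched as follows. It is a minimal illustration under assumptions: each feature is given as a (feature ID, content) pair, SHA-256 of the content stands in for the unspecified hash key, "peer node" is read as "same level under the same parent", and the "-" joining the parent ID and the feature ID into a globally unique ID is a hypothetical format. `FeatureInstaller` and its methods are hypothetical names.

```python
from __future__ import annotations

import hashlib


class FeatureInstaller:
    """Hypothetical sketch of the Fig. 3 installation flow with peer-level de-duplication."""

    def __init__(self) -> None:
        # parent global ID -> {hash key of feature content -> child global ID}
        self.children: dict[str, dict[str, str]] = {"0": {}}
        # upper/lower association relationships as (parent global ID, child global ID)
        self.relations: set[tuple[str, str]] = set()

    @staticmethod
    def feature_key(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def install_rule(self, features: list[tuple[str, bytes]]) -> list[str]:
        """Install one rule given as (feature ID, content) pairs; return the global IDs."""
        parent = "0"
        path = []
        for feature_id, content in features:
            peers = self.children.setdefault(parent, {})
            key = self.feature_key(content)
            gid = peers.get(key)
            if gid is None:
                # New at this level: splice the parent ID with the feature ID.
                gid = parent + "-" + feature_id
                peers[key] = gid
                self.relations.add((parent, gid))
            # Otherwise the feature already exists at this level and its ID is reused.
            path.append(gid)
            parent = gid
        return path
```

Installing a second rule that starts with the same first feature reuses the existing first-level node, so only the new lower-level feature adds an association relationship.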
In this implementation, for any two protocol rules, if the earlier mounted rule has already mounted a rule feature at a peer node, the later mounted rule is prohibited from mounting the same feature there. This prevents the same feature from being mounted repeatedly at nodes of the same level, keeps the complexity of the rule state machine under control, and avoids the loss of operating performance and matching efficiency that a redundant rule state machine would cause.
Compared with the prior art, one inventive concept of the invention is that the SP features of each protocol rule are mounted in the rule state machine, which avoids the prior-art traversal of all SP features in the feature library and thereby improves protocol identification efficiency.
To realize this concept, in one implementation, the method further includes: when the type of a rule feature is SP, marking the parent node corresponding to the rule feature and recording the feature information of the rule feature in that parent node.
For example, as shown in Fig. 3, after the association relationship between the upper and lower nodes is established, it is further determined whether the current feature is an SP feature. If so, the feature information of the SP feature is stored in the association relationship: for example, the feature information is read from the feature library and mounted in the parent node associated with the current node. In addition, the associated parent node marks the corresponding lower node as an SP feature, for example by setting a flag bit in the parent node. If not, a hexadecimal/character string engine is built for the non-SP feature, which can be understood as handing it to an identification engine capable of recognizing non-SP features. The feature library installation module thereby completes the installation of the protocol rules.
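A sketch of the SP branch of Fig. 3 under stated assumptions: nodes are kept in a dictionary keyed by feature ID, the "hexadecimal/character string engine" is stood in for by a plain dictionary of patterns, and the SP feature information is an opaque value. The `Node` class, its `sp_flag` and `sp_info` fields, and `install_feature` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    feature_id: str
    sp_flag: bool = False                                  # flag bit: an SP feature hangs below
    sp_info: dict[str, Any] = field(default_factory=dict)  # SP child feature ID -> feature info


def install_feature(nodes: dict[str, Node], string_engine: dict[str, Any],
                    parent_id: str, feature_id: str, is_sp: bool, info: Any) -> None:
    """Mount one rule feature below `parent_id` (hypothetical helper)."""
    nodes.setdefault(parent_id, Node(parent_id))
    nodes.setdefault(feature_id, Node(feature_id))
    if is_sp:
        parent = nodes[parent_id]
        parent.sp_flag = True              # mark the parent node
        parent.sp_info[feature_id] = info  # record the SP feature information in it
    else:
        string_engine[feature_id] = info   # hand the pattern to the hex/string engine
```

Installing protocol B's chain 1, 4, 5 with feature 5 as the only SP feature leaves node 4 flagged and carrying feature 5's information, while features 1 and 4 go to the string engine.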
In this implementation, when the rule state machine is generated, each current feature is checked to determine whether it is an SP feature, and the feature information of each SP feature is mounted in the corresponding node. This yields a rule state machine that can match target SP features in a targeted and fast manner. Meanwhile, marking the parent node corresponding to the rule feature allows the target SP feature to be located quickly during protocol identification, improving the matching efficiency of target SP features.
Exemplarily, the rule state machine may be generated by the protocol identification state machine generation module. As shown in Fig. 4, the module generates the rule state machine as follows: start generating the rule state machine; set the first state of the rule state machine to 0, i.e. set up the root node as the starting state; acquire the features of each protocol rule in sequence, beginning with the feature ID of the first feature of the protocol rule; associate the feature ID of the current-level node with the feature ID of the upper-level node; insert the feature ID of the current-level node into the state machine; and determine whether the current feature is the last feature of the protocol rule. If not, traverse the feature ID of the next feature in the protocol rule; if so, mount the protocol information of the corresponding protocol rule, which may include the protocol name, the protocol rule content and the like, under the last feature. Once the features of all protocol rules have been processed by this flow, generation of the rule state machine is complete.
Exemplarily, protocol identification may be implemented by the protocol identification module. Fig. 5 is a block diagram of the identification flow of the protocol identification module according to an embodiment of the present invention. As shown in Fig. 5, after protocol identification starts, the initial identification state is set to 0, i.e. the root state. The application-layer data, i.e. the load data, of the traffic to be identified is acquired and fed into the protocol identification module, which searches it for non-SP features such as hexadecimal feature strings and character feature strings. Each hit feature ID is acquired, recorded and treated as the feature ID of a child node. The feature IDs of all parent nodes in the rule state machine are then traversed to judge whether a parent node can jump to the child node, i.e. whether an existing (upper-level) feature ID has an association relationship with the feature ID just hit. If not, the flow returns to traversing the parent node feature IDs; if so, the feature ID of the child node is recorded in the state table and the module queries whether an SP feature is mounted under this node. If no SP feature is mounted, it judges whether this feature is the last feature of the protocol rule. If an SP feature is mounted, the detailed SP feature information recorded under the feature ID is queried to judge whether the SP feature hits, i.e. whether the target SP feature matches in the traffic to be identified. If the SP feature does not hit, the flow returns to traversing the parent node feature IDs; if it hits, the module judges whether the feature is the last feature of the protocol rule, and if it is not, the flow returns to traversing the parent node feature IDs. The remaining hit feature IDs are processed in the same way until all of them have been handled, regardless of whether each succeeds.
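The Fig. 5 flow can be sketched end to end as below. This is an illustration under assumptions, not the module's actual code: the non-SP patterns, the SP check for feature 5 (last byte equals 0x55), and the Fig. 2 feature sequences are hypothetical; "existing feature IDs" are read as the states already recorded in the state table; hits are processed in payload order; and reaching a rule's last feature is assumed to return the protocol mounted under it, as described for Fig. 4.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical non-SP patterns: feature ID -> byte string searched in the load data.
NON_SP_PATTERNS = {"1": b"\xaa\x01", "2": b"HDR", "4": b"\x04\x04"}
# Fig. 2 parent -> child association relationships.
RELATIONS = {("0", "1"), ("1", "2"), ("2", "3"), ("1", "4"), ("4", "5"),
             ("0", "2"), ("2", "4"), ("4", "6"), ("2", "5"), ("5", "7")}


def _sp5_check(payload: bytes) -> bool:
    # Hypothetical SP content for feature 5: the last byte equals 0x55.
    return len(payload) > 0 and payload[-1] == 0x55


# SP feature information mounted on parent nodes: parent ID -> {SP feature ID: check}.
SP_ON_PARENT: dict[str, dict[str, Callable[[bytes], bool]]] = {
    "2": {"5": _sp5_check},
    "4": {"5": _sp5_check},
}
# Protocol name mounted after the last feature, keyed by (parent ID, last feature ID).
LAST_FEATURE = {("2", "3"): "A", ("4", "5"): "B", ("4", "6"): "C", ("5", "7"): "D"}


def identify(payload: bytes) -> str | None:
    state_table = ["0"]  # initial identification state: the root
    # Non-SP search; hits are processed in the order they occur in the payload.
    hits = sorted((payload.find(p), fid) for fid, p in NON_SP_PATTERNS.items() if p in payload)
    for _, child in hits:
        # Traverse the reached parent IDs: can any of them jump to the hit feature?
        parent = next((p for p in state_table if (p, child) in RELATIONS), None)
        if parent is None:
            continue
        state_table.append(child)  # record the child feature ID in the state table
        sp_checks = SP_ON_PARENT.get(child, {})
        if not sp_checks:
            if (parent, child) in LAST_FEATURE:  # last feature of a rule: protocol found
                return LAST_FEATURE[(parent, child)]
            continue
        for sp_id, check in sp_checks.items():
            if check(payload):  # the target SP feature hits
                state_table.append(sp_id)
                if (child, sp_id) in LAST_FEATURE:
                    return LAST_FEATURE[(child, sp_id)]
    return None
```

With the payload `b"\xaa\x01\x00\x04\x04\x55"`, features 1 and 4 hit, the jumps 0 to 1 and 1 to 4 are legal, SP feature 5 mounted on node 4 hits, and 5 is the last feature of protocol B, so `identify` returns `"B"`. Changing the last byte makes the SP check fail, and the function returns `None`.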
In an embodiment, determining the target protocol of the traffic to be identified based on the SP feature matching result may specifically be: when the SP feature matching result indicates that the load data contains the target SP feature, determining the protocol corresponding to the target SP feature as the target protocol of the traffic.
Specifically, when the SP feature matching result indicates that the load data contains the target SP feature, the protocol of the traffic to be identified is very likely the protocol corresponding to the target SP feature. A comprehensive determination can further be made from the other features hit in the traffic by checking whether they have association relationships; if they do, the protocol corresponding to the target SP feature is determined as the target protocol of the traffic.
In this embodiment, when the SP feature matching result indicates that the load data contains the target SP feature, the protocol corresponding to the target SP feature can be determined as the target protocol of the traffic, enabling fast and efficient traffic protocol identification.
The traffic protocol identification device provided by an embodiment of the present invention is described below; the device described below and the method described above correspond to each other and may be referred to in conjunction.
Fig. 6 is a schematic structural diagram of a flow protocol identification device according to an embodiment of the present invention, and referring to fig. 6, a flow protocol identification device 600 includes:
a first matching module 610, configured to perform non-SP feature matching on payload data in the traffic to be identified;
an obtaining module 620, configured to obtain, when at least one non-SP feature is matched, a first feature ID of each matched non-SP feature;
a first determining module 630, configured to determine, for each first feature ID that exists in the rule state machine, the first feature ID as a second feature ID corresponding to a child node of the rule state machine;
the first determining module 630 being further configured to perform node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and to determine whether a target parent node associated with the child node corresponding to the second feature ID exists;
a second matching module 640, configured to perform, when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, SP feature matching on the payload data based on feature information of the target SP feature to obtain an SP feature matching result; and
a second determining module 650, configured to determine the target protocol of the traffic to be identified based on the SP feature matching result.
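The following Python sketch gives a rough illustration of how modules 610 to 650 fit together. It runs the whole identification flow on one payload. It is a minimal sketch that relies on assumptions the document does not state:

- non-SP features are modelled as literal byte strings, and SP features as regular expressions;
- feature ID 0 is reserved for the root node;
- the rule state machine is reduced to a set of (parent ID, child ID) associations plus a table of SP features mounted under child nodes.

All names and feature values are hypothetical. The parent lookup follows the text literally: it checks every parent node recorded in the state machine, not only parents whose own features were matched.

```python
import re
from typing import Optional

ROOT_ID = 0  # assumed feature ID reserved for the root node

# Hypothetical non-SP features: feature ID -> literal byte string.
NON_SP_FEATURES = {1: b"GET ", 2: b" HTTP/1.1\r\n", 3: b"SSH-2.0-"}

# Hypothetical rule state machine, reduced to what the lookup needs.
# EDGES records parent-child associations as (parent_id, child_id) pairs.
EDGES = {(ROOT_ID, 1), (1, 2), (ROOT_ID, 3)}
# SP_MOUNTS: child node ID -> (SP feature as a regex, protocol it identifies).
SP_MOUNTS = {
    2: (re.compile(rb"\r\nHost: [^\r\n]+"), "HTTP"),
    3: (re.compile(rb"^SSH-2\.0-\S+"), "SSH"),
}
NODE_IDS = {fid for edge in EDGES for fid in edge}
PARENT_IDS = {parent for parent, _ in EDGES}


def identify(payload: bytes) -> Optional[str]:
    # Non-SP matching: first feature IDs of all literal patterns found.
    first_ids = [fid for fid, pat in NON_SP_FEATURES.items() if pat in payload]
    for first_id in first_ids:
        # Only IDs present in the state machine become second feature IDs.
        if first_id not in NODE_IDS:
            continue
        second_id = first_id
        # Node matching: does any recorded parent have this child?
        has_parent = any((p, second_id) in EDGES for p in PARENT_IDS)
        # SP matching runs only if a parent exists and an SP feature is mounted.
        if has_parent and second_id in SP_MOUNTS:
            regex, protocol = SP_MOUNTS[second_id]
            if regex.search(payload):
                # SP hit: that SP feature's protocol is the target protocol.
                return protocol
    return None  # no protocol identified
```

Usage examples:

- `identify(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")` returns `"HTTP"`.
- The same request without a `Host` line returns `None`.

The cheap literal matching acts as a gate, so the regex (SP) matching only runs for nodes that the state machine has already confirmed. This is presumably where the claimed efficiency gain comes from.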
In an example embodiment, the first determining module 630 is specifically configured to:
concatenate the second feature ID with the feature ID of each parent node recorded in the rule state machine to obtain at least one concatenated candidate feature ID; and
for each concatenated candidate feature ID, perform node matching between that candidate and the association relationships between parent nodes and child nodes recorded in the rule state machine, and, when the matching succeeds, determine the successfully matched parent node as the target parent node.
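The concatenation ("splicing") step can be sketched as below. This is an illustrative sketch only, and several details are assumptions rather than taken from the source:

- the `<parent>:<child>` string format;
- the parent-first ordering, chosen so that each candidate lines up with how associations are recorded (parent ID followed by child ID);
- the names `concat_key` and `find_target_parents`.

```python
from typing import Iterable, List, Set


def concat_key(parent_id: int, child_id: int) -> str:
    # Hypothetical concatenation format "<parent>:<child>"; the separator keeps,
    # e.g., (1, 23) and (12, 3) from collapsing into the same string "123".
    return f"{parent_id}:{child_id}"


def find_target_parents(second_id: int, parent_ids: Iterable[int],
                        recorded: Set[str]) -> List[int]:
    """Return the parents whose concatenated candidate matches a recorded association."""
    targets = []
    for parent_id in parent_ids:
        candidate = concat_key(parent_id, second_id)  # one candidate per parent
        if candidate in recorded:                     # node matching succeeded
            targets.append(parent_id)
    return targets
```

For example, with recorded associations for (0, 1), (1, 2) and (0, 3), `find_target_parents(2, [0, 1, 3], recorded)` yields `[1]`. An empty list means no target parent node exists.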
In an example embodiment, the first determining module 630 is specifically configured to:
determine a first hash key corresponding to the concatenated candidate feature ID; and
match the first hash key against at least one second hash key recorded in the rule state machine, where each second hash key is obtained by concatenating the feature ID of a parent node in the rule state machine with the feature ID of its associated child node.
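A hashed variant of the same lookup might look like the sketch below. The document does not name a hash function. BLAKE2b truncated to 64 bits via Python's `hashlib` is an assumed choice, as are the key format and the function names.

Distinct concatenated IDs can in principle hash to the same key. The sketch therefore keeps the original (parent, child) pair next to each second hash key and confirms it on a hit. The document does not say whether its method does this.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def hash_key(parent_id: int, child_id: int) -> int:
    # Concatenate the IDs (assumed "<parent>:<child>" format) and reduce the
    # result to a 64-bit integer with BLAKE2b (an assumed hash choice).
    concatenated = f"{parent_id}:{child_id}".encode("ascii")
    digest = hashlib.blake2b(concatenated, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def build_second_keys(edges: Iterable[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    # Second hash keys are computed once, when the rule state machine is built;
    # the (parent, child) pair is kept so that a hit can be confirmed exactly.
    return {hash_key(p, c): (p, c) for p, c in edges}


def match_candidate(parent_id: int, second_id: int,
                    second_keys: Dict[int, Tuple[int, int]]) -> bool:
    first_key = hash_key(parent_id, second_id)  # first hash key of the candidate
    # Compare the stored pair to rule out a (rare) 64-bit hash collision.
    return second_keys.get(first_key) == (parent_id, second_id)
```

Precomputing the second hash keys turns each node match into a single dictionary lookup, regardless of how many associations the state machine records.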
In an example embodiment, the traffic protocol identification device 600 further includes a generating module, which is specifically configured to:
acquire at least one protocol rule;
traverse each protocol rule to obtain all rule features in the protocol rule currently being traversed;
set a root node in an initial state machine and, following the order of the rule features in the currently traversed protocol rule, mount the first rule feature under the root node to obtain a child node of the root node; and
take the child node of the root node as the parent node of the second rule feature in the currently traversed protocol rule and mount the second rule feature under it to obtain a child node of that child node, and so on, until the rule features of all protocol rules have been mounted level by level under their corresponding nodes, to obtain the rule state machine.
In an example embodiment, the generating module is further configured to:
for any two protocol rules, if the protocol rule mounted earlier has already mounted a given rule feature at a peer node, prohibit the protocol rule mounted later from mounting the same rule feature again at that peer node.
In an example embodiment, the traffic protocol identification device 600 further includes a marking module, which is specifically configured to:
when the type of a rule feature is the SP feature, mark the parent node corresponding to the rule feature and record the feature information of the rule feature in that parent node.
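The generating and marking modules together can be sketched as a trie-like builder. This is a minimal sketch under the following assumptions:

- each protocol rule is an ordered list of hypothetical `RuleFeature` entries;
- the root node carries the reserved feature ID 0;
- an SP feature marks the node it would otherwise be mounted under, without adding a level of its own.

Keying each node's children by feature ID is one way to satisfy the peer-node constraint above. A later rule that reaches an already mounted feature under the same parent simply reuses that node.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ROOT_ID = 0  # assumed feature ID reserved for the root node


@dataclass
class RuleFeature:
    feature_id: int
    is_sp: bool = False
    info: Optional[str] = None  # hypothetical SP feature information, e.g. a regex


@dataclass
class Node:
    feature_id: int
    # Children keyed by feature ID: a feature can appear only once among peers.
    children: Dict[int, "Node"] = field(default_factory=dict)
    sp_mark: bool = False  # set when an SP feature is mounted under this node
    sp_info: List[Tuple[Optional[str], str]] = field(default_factory=list)


def build_state_machine(rules: Dict[str, List[RuleFeature]]) -> Node:
    root = Node(ROOT_ID)
    for protocol, features in rules.items():  # traverse each protocol rule
        parent = root                          # the first feature goes under the root
        for feat in features:                  # keep the rule's feature order
            if feat.is_sp:
                # SP feature: mark the parent node and record the info there
                # (with the rule's protocol) instead of mounting a child node.
                parent.sp_mark = True
                parent.sp_info.append((feat.info, protocol))
                continue
            child = parent.children.get(feat.feature_id)
            if child is None:
                child = Node(feat.feature_id)
                parent.children[feat.feature_id] = child
            # Otherwise an earlier rule already mounted this feature among
            # these peers, so the later rule reuses it instead of remounting.
            parent = child                     # the next feature mounts one level down
    return root


def edges(node: Node) -> List[Tuple[int, int]]:
    """Collect (parent_id, child_id) associations, e.g. for building hash keys."""
    out = []
    for child in node.children.values():
        out.append((node.feature_id, child.feature_id))
        out.extend(edges(child))
    return out
```

For example, two rules that both start with feature 1 share a single level-1 node. `edges(root)` yields the (parent ID, child ID) pairs from which second hash keys could be computed.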
In an example embodiment, the second determining module 650 is specifically configured to:
when the SP feature matching result indicates that the payload data contains the target SP feature, determine the protocol corresponding to the target SP feature as the target protocol of the traffic to be identified.
The device of this embodiment may be used to execute any embodiment of the traffic protocol identification method described above. Its implementation process and technical effects are similar to those of the method embodiments; for details, refer to the description of the method embodiments, which is not repeated here.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 7, the electronic device may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740; the processor 710, the communications interface 720, and the memory 730 communicate with one another via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform a traffic protocol identification method comprising: performing non-SP feature matching on payload data in the traffic to be identified; when at least one non-SP feature is matched, acquiring a first feature ID of each matched non-SP feature; for each first feature ID, when the first feature ID exists in a rule state machine, determining the first feature ID as a second feature ID corresponding to a child node of the rule state machine; performing node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and determining whether a target parent node associated with the child node corresponding to the second feature ID exists; when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, performing SP feature matching on the payload data based on feature information of the target SP feature to obtain an SP feature matching result; and determining a target protocol of the traffic to be identified based on the SP feature matching result.
Further, the logic instructions in the memory 730 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the traffic protocol identification method provided by the methods above, the method comprising: performing non-SP feature matching on payload data in the traffic to be identified; when at least one non-SP feature is matched, acquiring a first feature ID of each matched non-SP feature; for each first feature ID, when the first feature ID exists in a rule state machine, determining the first feature ID as a second feature ID corresponding to a child node of the rule state machine; performing node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and determining whether a target parent node associated with the child node corresponding to the second feature ID exists; when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, performing SP feature matching on the payload data based on feature information of the target SP feature to obtain an SP feature matching result; and determining a target protocol of the traffic to be identified based on the SP feature matching result.
In another aspect, an embodiment of the present invention further provides a computer program product comprising a computer program, which may be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can perform the traffic protocol identification method provided by the methods above, the method comprising: performing non-SP feature matching on payload data in the traffic to be identified; when at least one non-SP feature is matched, acquiring a first feature ID of each matched non-SP feature; for each first feature ID, when the first feature ID exists in a rule state machine, determining the first feature ID as a second feature ID corresponding to a child node of the rule state machine; performing node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and determining whether a target parent node associated with the child node corresponding to the second feature ID exists; when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, performing SP feature matching on the payload data based on feature information of the target SP feature to obtain an SP feature matching result; and determining a target protocol of the traffic to be identified based on the SP feature matching result.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A traffic protocol identification method, comprising:
performing non-SP feature matching on payload data in traffic to be identified;
when at least one non-SP feature is matched, acquiring a first feature ID of each matched non-SP feature;
for each first feature ID, when the first feature ID exists in a rule state machine, determining the first feature ID as a second feature ID corresponding to a child node of the rule state machine;
performing node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and determining whether a target parent node associated with the child node corresponding to the second feature ID exists;
when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, performing SP feature matching on the payload data based on feature information of the target SP feature to obtain an SP feature matching result; and
determining a target protocol of the traffic to be identified based on the SP feature matching result.
2. The traffic protocol identification method according to claim 1, wherein the performing node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and determining whether a target parent node associated with the child node corresponding to the second feature ID exists, comprises:
concatenating the second feature ID with the feature ID of each parent node recorded in the rule state machine to obtain at least one concatenated candidate feature ID; and
for each concatenated candidate feature ID, performing node matching between the concatenated candidate feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and, when the matching succeeds, determining the successfully matched parent node as the target parent node.
3. The traffic protocol identification method according to claim 2, wherein the performing node matching between the concatenated candidate feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine comprises:
determining a first hash key corresponding to the concatenated candidate feature ID; and
matching the first hash key against at least one second hash key recorded in the rule state machine, wherein the second hash key is obtained by concatenating the feature ID of a parent node in the rule state machine with the feature ID of its associated child node.
4. The traffic protocol identification method according to any one of claims 1 to 3, further comprising:
acquiring at least one protocol rule;
traversing each protocol rule to obtain all rule features in the currently traversed protocol rule;
setting a root node in an initial state machine and, following the order of the rule features in the currently traversed protocol rule, mounting the first rule feature under the root node to obtain a child node of the root node; and
taking the child node of the root node as the parent node of the second rule feature in the currently traversed protocol rule and mounting the second rule feature under it to obtain a child node of that child node, and so on, until all rule features of the protocol rules have been mounted level by level under their corresponding nodes, to obtain the rule state machine.
5. The traffic protocol identification method according to claim 4, further comprising:
for any two protocol rules, if the protocol rule mounted earlier has already mounted a given rule feature at a peer node, prohibiting the protocol rule mounted later from mounting the same rule feature at that peer node.
6. The traffic protocol identification method according to claim 5, further comprising:
when the type of a rule feature is the SP feature, marking the parent node corresponding to the rule feature and recording feature information of the rule feature in the parent node.
7. The traffic protocol identification method according to any one of claims 1 to 3, wherein the determining a target protocol of the traffic to be identified based on the SP feature matching result comprises:
when the SP feature matching result indicates that the payload data contains the target SP feature, determining the protocol corresponding to the target SP feature as the target protocol of the traffic to be identified.
8. A traffic protocol identification device, comprising:
a first matching module, configured to perform non-SP feature matching on payload data in traffic to be identified;
an acquisition module, configured to acquire, when at least one non-SP feature is matched, a first feature ID of each matched non-SP feature;
a first determining module, configured to determine, for each first feature ID that exists in a rule state machine, the first feature ID as a second feature ID corresponding to a child node of the rule state machine;
the first determining module being further configured to perform node matching based on the second feature ID and the association relationships between parent nodes and child nodes recorded in the rule state machine, and to determine whether a target parent node associated with the child node corresponding to the second feature ID exists;
a second matching module, configured to perform, when the target parent node exists in the rule state machine and a target SP feature is mounted under the child node corresponding to the second feature ID, SP feature matching on the payload data based on feature information of the target SP feature to obtain an SP feature matching result; and
a second determining module, configured to determine a target protocol of the traffic to be identified based on the SP feature matching result.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the traffic protocol identification method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the traffic protocol identification method according to any one of claims 1 to 7.
CN202410372192.2A 2024-03-29 2024-03-29 Traffic protocol identification method and device, electronic equipment and storage medium Pending CN117978706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410372192.2A CN117978706A (en) 2024-03-29 2024-03-29 Traffic protocol identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410372192.2A CN117978706A (en) 2024-03-29 2024-03-29 Traffic protocol identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117978706A true CN117978706A (en) 2024-05-03

Family

ID=90858123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410372192.2A Pending CN117978706A (en) 2024-03-29 2024-03-29 Traffic protocol identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117978706A (en)

Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: Li Lin

Inventor after: Zhou Ruikang

Inventor after: Cai Yiming

Inventor after: Zhu Feng

Inventor after: Zhao Zitong

Inventor after: Huang Jingjing

Inventor after: Xia Ji

Inventor before: Li Lin

Inventor before: Zhou Ruikang

Inventor before: Cai Yiming

Inventor before: Zhu Feng

Inventor before: Zhao Zitong