CN111726264B - Network protocol variation detection method, device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN111726264B
Authority
CN
China
Prior art keywords
protocol
network protocol
data stream
known network
fuzzy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010560524.1A
Other languages
Chinese (zh)
Other versions
CN111726264A (en)
Inventor
许小丰
戴佳浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 36 Research Institute
Original Assignee
CETC 36 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 36 Research Institute filed Critical CETC 36 Research Institute
Priority to CN202010560524.1A priority Critical patent/CN111726264B/en
Publication of CN111726264A publication Critical patent/CN111726264A/en
Application granted granted Critical
Publication of CN111726264B publication Critical patent/CN111726264B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/18: Protocol analysers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • G06N5/048: Fuzzy inferencing
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection

Abstract

The application discloses a network protocol variation detection method, a device, electronic equipment, and a storage medium. The method comprises the following steps: extracting feature vectors of known network protocols and constructing a feature database; acquiring a target feature vector of a data stream to be detected, matching the target feature vector against the feature vectors of the known network protocols in the feature database, and determining a candidate network protocol set; and determining, from the candidate network protocol set and based on a fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected. According to the embodiments of the application, the accuracy of unknown protocol identification can be improved according to the actual condition of the data stream, missed matches are effectively prevented, and the overall computational load is small, so the method suits application scenarios with demanding real-time processing requirements.

Description

Network protocol variation detection method, device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to a network protocol variation detection method, a network protocol variation detection device, electronic equipment and a storage medium.
Background
An anonymous communication network (hereinafter referred to as an anonymous network) is a network system composed of hardware and software components capable of providing anonymous services to network communication users. Secret access and communication are generally achieved by establishing a secure tunnel between the visitor and the server using cryptographic techniques. These anonymous network systems provide encryption protection for network traffic and hide the visitor's original IP (Internet Protocol) address, which presents a significant challenge to network security administration.
Because anonymous networks are convenient and anonymous, they have become the first choice of cybercriminals, who use anonymous channels to commit cybercrimes such as website attacks, spreading computer viruses online, online smuggling, illegal online transactions, and slander and defamation. Criminals can use anonymous networks to hide their real location and information and evade government supervision.
How to analyze and detect protocol variations quickly, accurately, and intelligently, so as to protect network security, is attracting the attention of more and more researchers.
Disclosure of Invention
In view of the above problems, the present application provides a network protocol variation detection method, apparatus, electronic device, and storage medium that overcome, or at least partially solve, those problems.
According to one aspect of the present application, there is provided a network protocol variation detection method, including:
extracting feature vectors of known network protocols and constructing a feature database;
acquiring a target feature vector of a data stream to be detected, matching the target feature vector against the feature vectors of the known network protocols in the feature database, and determining a candidate network protocol set;
and determining, from the candidate network protocol set and based on a fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected.
Optionally, matching the target feature vector against the feature vectors of the known network protocols in the feature database and determining the candidate network protocol set includes:
calculating the Euclidean distance between the target feature vector and the feature vector of each known network protocol;
and comparing each calculated Euclidean distance with a distance threshold and, when the Euclidean distance is smaller than the distance threshold, putting the corresponding known network protocol into the candidate network protocol set.
Optionally, determining, from the candidate network protocol set and based on the fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected includes:
fuzzifying the target feature vector of the data stream to be detected to obtain a fuzzy set for each element of the target feature vector;
performing inference and composition according to the fuzzy sets and the fuzzy implication relations in a pre-established fuzzy rule base to obtain a similarity fuzzy subset between the data stream to be detected and each known network protocol in the candidate network protocol set;
and defuzzifying the similarity fuzzy subset to determine the known network protocol corresponding to the protocol variation used by the data stream to be detected.
Optionally, defuzzifying the similarity fuzzy subset and determining the known network protocol corresponding to the protocol variation used by the data stream to be detected includes:
calculating the center of gravity (centroid) of the area enclosed by the membership function curve of the similarity fuzzy subset, and taking the value corresponding to the center of gravity as identifying the known network protocol corresponding to the protocol variation.
Optionally, acquiring the target feature vector of the data stream to be detected includes: scanning the first 16 bytes of the intercepted data stream to be detected to obtain its target feature vector. The target feature vector comprises one or more of the following elements: data stream survival time, data stream mapped port, data stream fixed bytes, data frame/datagram arrival interval, signature algorithm, secure transport protocol, certificate duration, data frame/datagram length, and protocol version number.
Optionally, extracting the feature vectors of known network protocols and constructing the feature database includes:
extracting the feature vector of each known network protocol in the TCP/IP protocol suite to construct the feature database.
According to another aspect of the present application, there is provided a network protocol variation detection apparatus including:
a database construction module, configured to extract feature vectors of known network protocols and construct a feature database;
a matching module, configured to acquire a target feature vector of a data stream to be detected, match the target feature vector against the feature vectors of the known network protocols in the feature database, and determine a candidate network protocol set;
and a determining module, configured to determine, from the candidate network protocol set and based on a fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected.
Optionally, the matching module is specifically configured to calculate the Euclidean distance between the target feature vector and the feature vector of each known network protocol, compare each calculated Euclidean distance with a distance threshold, and, when the Euclidean distance is smaller than the distance threshold, put the corresponding known network protocol into the candidate network protocol set.
According to yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform any of the methods above.
According to a further aspect of the application, there is provided a computer-readable storage medium storing one or more programs which, when executed by a processor, implement any of the methods above.
According to this technical scheme, a database of known network protocols is established, the intercepted data stream is analyzed to obtain a target feature vector, and the target feature vector is matched against the feature vectors of the known network protocols in the database to preliminarily determine a candidate network protocol set. The number of feature vector elements of the data stream can be expanded dynamically, so the accuracy of unknown protocol identification is improved according to the actual condition of the data stream. In addition, determining a candidate network protocol set enlarges the selection space of corresponding known network protocols while improving matching efficiency, which effectively prevents missed matches. Finally, on the basis of the candidate network protocol set, a fuzzy inference algorithm yields the known network protocol type corresponding to the variant protocol. The scheme is easy to implement and its overall computational load is small, so it suits application scenarios with demanding real-time processing requirements.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic flow chart of a network protocol variation detection method according to an embodiment of the present application;
FIG. 2 is a flow diagram of fuzzy inference according to an embodiment of the present application;
FIG. 3 is a diagram of fuzzy classification membership functions according to an embodiment of the present application;
FIG. 4 is a block diagram of a network protocol variation detection apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
A protocol variation (or variant protocol) is a proprietary protocol type for which the bit stream obtained from the network is related to a known protocol but has different characteristics. Users of anonymous networks often communicate over variant protocols for illegal purposes such as evading government regulation.
The technical idea of the application is as follows. For anonymous variant network protocols and their traffic data, intelligent protocol analysis is used to analyze and inspect normal network packets, variant packets, and bit streams in detail. By examining a large amount of data with a fuzzy test method, possible illegal behavior carried over private protocols can be found. On the basis of protocol identification, related access can be blocked, information content audited, and information detected and recovered, which improves the controllability and robustness of network information and command systems and provides trustworthy evidence for communication information security.
Specifically, the embodiments of the present application provide a scheme that uses fuzzy inference to analyze network protocols in wired or wireless networks. First, with the network data stream captured by open-source software, an abstract feature vector is obtained by scanning the packet header or frame header. Next, the Euclidean distance is calculated between the feature vector of the data stream and the feature vectors of the known network protocols in the database (general network protocols, existing versions of anonymous networks, and so on). Finally, fuzzy mathematics is used to find the known protocol type closest to the data stream to be detected, which identifies the protocol from which the non-public protocol was derived. This supports the coordinated development of software systems in China's network security field and contributes to its long-term network security planning.
FIG. 1 is a schematic flow chart of a network protocol variation detection method according to an embodiment of the present application. Referring to FIG. 1, the method includes the following steps:
Step S110: extract the feature vectors of known network protocols and construct a feature database.
Known network protocols are publicly available protocols, such as those in the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol suite. Correspondingly, extracting the feature vectors of known network protocols and constructing the feature database includes: extracting the feature vector of each known network protocol in the TCP/IP protocol suite and constructing the feature database from these feature vectors.
A protocol is an agreement governing the network communication process. It is established by a computer organization and specifies many details, and both communicating parties must follow it in order to send and receive data normally. The embodiments of the present application involve many known network protocols, for example TCP (Transmission Control Protocol), UDP (User Datagram Protocol), IP (Internet Protocol), ICMP (Internet Control Message Protocol), ARP (Address Resolution Protocol), and RARP (Reverse Address Resolution Protocol); the two communicating parties must use the same protocol. Internet communication requires corresponding network protocols. TCP/IP is a protocol suite developed for the Internet and consists of a number of network transmission protocols. The network model in actual use today is the TCP/IP model, which simplifies the Open System Interconnection (OSI) model to four layers, from top to bottom: the application layer, the transport layer, the network layer, and the link layer (network interface layer). Each layer contains a number of protocols.
In the present application, a feature database of known protocols is established according to the TCP/IP protocol suite. It covers the various protocols that a bit stream in network communication passes through from the physical layer to the application layer according to the TCP/IP model, so that variant protocols can conveniently be analyzed against it.
Step S120: acquire a target feature vector of the data stream to be detected, match the target feature vector against the feature vectors of all known network protocols in the feature database, and determine a candidate network protocol set.
The candidate network protocol set includes those known network protocols whose feature vectors match the target feature vector.
Step S130: determine, from the candidate network protocol set and based on a fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected. In this step, the fuzzy inference algorithm makes the final determination of the known network protocol corresponding to the variant protocol.
As shown in FIG. 1, the method builds on the analysis of known protocols: the captured bit stream is analyzed to obtain the target feature vector, which is matched against the feature vectors of the known network protocols in the feature database to preliminarily determine the candidate network protocol set, so the accuracy of unknown protocol identification can be improved according to the actual condition of the network data stream. In addition, introducing the candidate protocol set improves string-matching efficiency and enlarges the selection space of corresponding known protocols, which effectively prevents missed matches. Finally, on the basis of the candidate protocol set, a fuzzy inference algorithm yields the corresponding known protocol type. The algorithm is easy to implement and its overall computational load is small, so it suits application scenarios with demanding real-time processing requirements.
The network protocol variation detection method of the embodiments of the application can be divided into three stages. The first stage establishes a database of known protocol features and stores in it the various protocols that a bit stream in network communication passes through from the physical layer to the application layer. The second stage scans the first 16 bytes of the intercepted data stream to obtain an abstract target feature vector, then scans the known protocol feature database to compare the target feature vector with the feature vectors of the existing protocols, calculates the Euclidean distances between the feature vectors, and preliminarily determines the candidate protocol set corresponding to the variant protocol, which prevents missed matches. The third stage, once the second stage is complete, uses a fuzzy inference algorithm to make the final determination of the known network protocol type corresponding to the variant protocol in the captured data stream.
That is to say, acquiring the target feature vector of the data stream to be detected in step S120 includes scanning the first 16 bytes of the intercepted data stream to be detected to obtain its target feature vector. The target feature vector obtained in the embodiments of the present application includes one or more of the following elements: data stream survival time, data stream mapped port, data stream fixed bytes, data frame/datagram arrival interval, signature algorithm, secure transport protocol, certificate duration, data frame/datagram length, and protocol version number.
The attributes are defined as follows.
Attribute 1: data stream survival time X1. Because data streams take different transmission paths through the network, this attribute is defined as the time, per unit time, for which packets sent by the same source node and intercepted at the same location have been in transit in the network.
Attribute 2: data stream mapped port X2. Different network applications correspond to different ports; well-known applications follow convention and use specific well-known ports rather than the ports used by non-public proprietary protocols.
Attribute 3: data stream fixed bytes X3. A binary data string at a specific position in the network payload; after a hash transformation it can also serve as a digital signature of the data message's characteristics.
Attribute 4: data frame/datagram arrival interval X4. Within unit time T, the average interval between arrivals of the data frames or datagrams captured at a test point in the network.
Note that a bit stream (data stream) is the raw data. The raw data is split into packets at the transport layer, and frames are what the data link layer transmits: when a packet reaches the data link layer, that layer's protocol header and trailer are added to form a data frame.
Attribute 5: signature algorithm X5. The signature algorithm used in the data stream's cipher suite to ensure data integrity, for example SHAWithRSA.
Attribute 6: secure transport protocol X6. A protocol that provides confidentiality and data integrity for the network data stream between two communicating applications, for example TLS (Transport Layer Security) or SSL (Secure Sockets Layer).
Attribute 7: certificate duration X7. The interval between the local time at which the authentication certificate is generated and the expiration time of the remote server certificate, for example 2 hours or 2 weeks.
Attribute 8: data frame/datagram length X8. Within unit time T, the average length of the data frames or datagrams captured at a test point in the network.
Attribute 9: protocol version number X9. The protocol version feature extracted from the data stream acquired by the packet capture program, for example IPv4, IPv6, or Tor2.0.
It can be understood that, because packet header or frame header data at specific positions does not change, the embodiment scans the first 16 bytes of the intercepted data stream to obtain the target feature vector. The target feature vector of the current data stream to be detected contains 9 elements, namely attributes 1 to 9 above.
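The nine attributes can be held in a simple container. The following is a minimal sketch, not the application's implementation: the class name `FlowFeatures`, the field names, the helper functions, and the idea of storing categorical attributes (X5, X6, X9) as numeric codes are all assumptions made for illustration.

```python
from dataclasses import dataclass, astuple
from statistics import mean


@dataclass
class FlowFeatures:
    """Nine-element target feature vector (attributes X1..X9).

    Categorical attributes are stored as numeric codes; the coding
    scheme is an assumption of this sketch.
    """
    survival_time: float         # X1
    mapped_port: float           # X2
    fixed_bytes: float           # X3
    arrival_interval: float      # X4
    signature_algorithm: float   # X5 (numeric code)
    secure_protocol: float       # X6 (numeric code)
    certificate_duration: float  # X7
    frame_length: float          # X8
    protocol_version: float      # X9 (numeric code)

    def as_vector(self):
        """Return the attributes as a plain list [x1, ..., x9]."""
        return list(astuple(self))


def mean_arrival_interval(timestamps):
    """X4: average gap between consecutive capture timestamps."""
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return mean(gaps) if gaps else 0.0


def mean_length(lengths):
    """X8: average length of the captured frames or datagrams."""
    return mean(lengths) if lengths else 0.0
```

Averaging over the capture window follows the "within unit time T" wording of attributes 4 and 8; how the window is chosen is left open here.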
After the target feature vector of the data stream is obtained, matching it against the feature vectors of the known network protocols in the feature database and determining the candidate network protocol set in step S120 includes: calculating the Euclidean distance between the target feature vector and the feature vector of each known network protocol; and comparing each calculated Euclidean distance with a distance threshold and, when the Euclidean distance is smaller than the distance threshold, putting the corresponding known network protocol into the candidate network protocol set.
In the embodiments of the application, the candidate protocol set corresponding to the variant protocol is preliminarily determined from the Euclidean distances between feature vectors. For example, once the data stream features have been extracted, the database is scanned and the target feature vector is compared with the feature vectors of the existing protocols in it. Each known network protocol whose feature vector lies within a Euclidean distance smaller than the specified threshold ε of the intercepted data stream's feature vector is selected and put into the candidate network protocol set.
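The threshold selection described above can be sketched as follows. This is an illustrative sketch only: the function names, the in-memory dictionary standing in for the feature database, and the sample vectors (shortened to three components for readability) are assumptions, not values from the application.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def candidate_set(target, database, epsilon):
    """Return the names of known protocols whose distance to the
    target feature vector is strictly smaller than epsilon."""
    return [
        name
        for name, vector in database.items()
        if euclidean(target, vector) < epsilon
    ]


# Hypothetical database: protocol name -> feature vector.
known = {
    "proto_a": [0.0, 0.0, 0.0],
    "proto_b": [3.0, 4.0, 0.0],
    "proto_c": [10.0, 10.0, 10.0],
}
candidates = candidate_set([0.0, 0.0, 0.0], known, epsilon=6.0)
```

Because the attributes have very different scales (ports, durations, codes), a real implementation would likely normalize each component before computing distances; the description does not say whether it does.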
Specifically, step S120, which preliminarily determines the candidate set corresponding to the variant protocol, includes two sub-steps: feature vector extraction and Euclidean distance calculation, as follows.
Step (1), feature vector extraction: the extraction process is similar to the string matching performed on known protocols when the protocol feature database is constructed. Because the packet header or frame header data at specific positions does not change, the first 16 bytes of the intercepted data stream are scanned to obtain the abstract feature vector.
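A minimal sketch of the header scan is given below. The description does not specify how the 16 bytes map to attributes, so everything here is an assumption for illustration: the function name `scan_header`, the premise that the captured bytes start at an IP header (where the high nibble of the first byte is the IP version), and the choice of bytes 8 to 11 as the "fixed bytes".

```python
import hashlib

HEADER_LEN = 16  # the description scans the first 16 bytes


def scan_header(stream: bytes):
    """Extract two illustrative header features from the first 16 bytes."""
    head = stream[:HEADER_LEN]
    if len(head) < HEADER_LEN:
        raise ValueError("need at least 16 bytes of captured data")

    # In IPv4 and IPv6 headers the high nibble of byte 0 is the version.
    version = head[0] >> 4

    # Hypothetical offset: hash bytes 8..11 and keep 32 bits as a
    # numeric stand-in for the "fixed bytes" attribute (X3).
    digest = hashlib.sha256(head[8:12]).digest()
    fixed = int.from_bytes(digest[:4], "big")

    return {"protocol_version": version, "fixed_bytes": fixed}
```

Hashing the selected bytes mirrors the remark under attribute 3 that a hash transformation of the fixed bytes can act as a signature of the message.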
The following describes the calculation process by taking a data flow in a certain message format as an example:
The feature vector extracted from a given data stream is expressed as $X = [x_1 \; x_2 \; \dots \; x_9]$, and the feature vectors Y of the known network protocols in the feature database are expressed as:
$$Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,9} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,9} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,9} \end{bmatrix}$$
where $y_{i,j}$ denotes the j-th attribute of the i-th protocol feature vector in the database, and n denotes the number of protocol feature vectors in the database.
Step (2), Euclidean distance calculation:
The Euclidean distance between the feature vectors X and Y is calculated as:
$$d_{x,y} = \sqrt{(M_x - M_y)(M_x - M_y)^{T}}$$
where $M_x$ is the vector representation of the feature attributes of x, and $(M_x - M_y)$ denotes vector subtraction.
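As a worked illustration of the distance calculation, the following uses hypothetical three-component vectors (the description uses nine components) and assumed threshold values; none of the numbers come from the application.

```latex
X = [\,1 \;\; 2 \;\; 2\,], \qquad Y_i = [\,1 \;\; 5 \;\; 6\,]

d_{x,y} = \sqrt{(1-1)^2 + (2-5)^2 + (2-6)^2}
        = \sqrt{0 + 9 + 16}
        = 5
```

With an assumed threshold ε = 6, the distance 5 is smaller than ε and protocol i would enter the candidate set; with ε = 4 it would be excluded.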
It should be noted that, when extracting the feature vectors of known network protocols and constructing the feature database, the embodiments use existing packet capture software (e.g., Wireshark) and open-source intrusion detection systems (e.g., Snort, OSSIM) to decompose the features of public protocols. The nine elements corresponding to attributes 1 to 9 above are extracted as the feature vector of each known network protocol, and each known network protocol is stored together with its feature vector in the constructed feature database. Note: OSSIM stands for Open Source Security Information Management.
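Storing each protocol with its feature vector could look like the sketch below. The application does not name a storage engine or schema; the use of an in-memory SQLite database, the table and column names, the JSON encoding of the vector, and the sample values are all assumptions for illustration.

```python
import json
import sqlite3


def build_feature_db(protocol_vectors):
    """Create an in-memory feature database with one row per protocol."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features ("
        "protocol TEXT PRIMARY KEY, vector TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO features (protocol, vector) VALUES (?, ?)",
        [(name, json.dumps(vec)) for name, vec in protocol_vectors.items()],
    )
    conn.commit()
    return conn


def load_vectors(conn):
    """Read every stored protocol back as {name: feature vector}."""
    rows = conn.execute(
        "SELECT protocol, vector FROM features ORDER BY protocol"
    )
    return {name: json.loads(vec) for name, vec in rows}


# Hypothetical nine-element vectors for two known protocols.
db = build_feature_db({"TCP": [1.0] * 9, "UDP": [2.0] * 9})
stored = load_vectors(db)
```

Keeping the whole vector in one column makes it easy to add further attributes later, which fits the statement that the number of feature vector elements can be expanded dynamically.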
Once the candidate protocol set has been determined and the target feature vector of the data stream obtained, the input fuzzification interface converts the crisp (determined) input values of the feature vector attributes into fuzzy sets. The fuzzy inference engine then combines the input fuzzy sets with the fuzzy rule base to calculate the output fuzzy set of the similarity. Finally, the output defuzzification interface converts the fuzzy similarity output into a crisp value. Through this third stage, the known protocol corresponding to the data bit stream can be uniquely determined. That is to say, determining, from the candidate network protocol set and based on the fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected includes: fuzzifying the target feature vector of the data stream to be detected to obtain a fuzzy set for each element of the target feature vector; performing inference and composition according to the fuzzy sets and the fuzzy implication relations in a pre-established fuzzy rule base to obtain the similarity between the data stream to be detected and each known network protocol in the candidate network protocol set; and defuzzifying the similarity to determine the known network protocol corresponding to the protocol variation used by the data stream to be detected.
FIG. 2 is a flow diagram of fuzzy inference according to an embodiment of the present application. Referring to FIG. 2, the embodiment uses Mamdani fuzzy inference to analyze which known protocol a protocol variation corresponds to. The Mamdani inference process shown in FIG. 2 consists of the same three parts: the input fuzzification interface converts the crisp input values of the feature vector of the data stream to be detected into fuzzy sets; the fuzzy inference engine combines the input fuzzy sets with the fuzzy rule base to calculate the output fuzzy subset of the similarity; and the output defuzzification interface converts the fuzzy similarity output into a crisp value.
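The three parts of the flow can be sketched end to end as below. This is a hedged illustration of generic Mamdani inference (min for AND, max for OR, clipping for implication, max for aggregation, centroid for defuzzification), not the application's rule base: the two inputs normalized to [0, 1], the triangle breakpoints, the three output terms, and the three rules are all assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def three_terms(names):
    """Linear partition of [0, 1] into three overlapping triangles."""
    peaks = (0.0, 0.5, 1.0)
    return {n: (p - 0.5, p, p + 0.5) for n, p in zip(names, peaks)}


FIXED_BYTES = three_terms(("few", "adequate", "rich"))
PROTOCOL = three_terms(("far", "medium", "close"))
SIMILARITY = three_terms(("low", "medium", "high"))


def fuzzify(x, terms):
    """Input fuzzification: crisp value -> membership per term."""
    return {name: tri(x, *abc) for name, abc in terms.items()}


def infer_similarity(n, d, steps=200):
    """Return a crisp similarity in [0, 1] for normalized inputs n, d."""
    fn = fuzzify(n, FIXED_BYTES)
    fd = fuzzify(d, PROTOCOL)

    # Hypothetical rules; AND is min and OR is max.
    strength = {
        "high": min(fn["rich"], fd["close"]),
        "medium": min(fn["adequate"], fd["medium"]),
        "low": max(fn["few"], fd["far"]),
    }

    # Clip each output term by its rule strength, aggregate with max,
    # then take the centroid of the aggregated curve on a sampled grid.
    num = den = 0.0
    for k in range(steps + 1):
        z = k / steps
        mu = max(
            min(s, tri(z, *SIMILARITY[term]))
            for term, s in strength.items()
        )
        num += z * mu
        den += mu
    return num / den if den else 0.0
```

Under these assumptions one would presumably evaluate the similarity once per candidate protocol and keep the candidate with the highest score; the description does not spell out that final selection step.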
Referring to FIG. 2, the input fuzzification interface converts the crisp input values of the feature vector of the data stream to be detected into fuzzy sets.
The following example uses 3 of the 9 attributes of the data stream feature vector: the data stream survival time t, the data stream fixed bytes n, and the secure transport protocol d. The corresponding fuzzy sets are the data stream survival time fuzzy set $T^{*}(t)$, the data stream fixed bytes fuzzy set $N^{*}(n)$, and the secure transport protocol fuzzy set $D^{*}(d)$. The specific fuzzification is as follows:
[Formula images BDA0002545924810000111, BDA0002545924810000112, and BDA0002545924810000113: definitions of the fuzzy sets; not reproduced in this text]
where $d_{x,y}$ is the Euclidean distance from step (2) above.
Referring to FIG. 2, the fuzzy rule base is built by first establishing the sets of input and output linguistic variables, that is, by fuzzily classifying the feature vectors of the known network protocols in the candidate protocol set and the final similarity to the known protocol, and establishing the corresponding membership functions. Fuzzy classification of the feature vectors improves the efficiency with which they are used. To simplify calculation, the embodiment of the application classifies by linear division and establishes the membership functions of the fuzzy subclasses using triangular and trapezoidal functions.
The rule base of the embodiment of the application includes the following linguistic variables: the survival time of the data stream (Time of data-traffic) is {low, medium, high}, the fixed bytes of the data stream (Fix-byte) is {few, adequate, rich}, the secure transport protocol (Safe-protocol) is {far, medium, close}, and the similarity to the known protocol is {very low, low, medium, high, very high}.
The specific membership functions are shown in FIG. 3. In graphs (a) to (d) of FIG. 3, the abscissa represents the determined value and the ordinate represents the fuzzy classification value. Graph (a) shows the membership functions of the data stream survival time: low indicates that the data stream survival time is short, med (medium) indicates that it is medium, and high indicates that it is long.
Graph (b) of FIG. 3 shows the membership functions of the fixed bytes of the data stream: few indicates that the number of fixed bytes is small, ade (adequate) indicates that it is at the average level, and rich indicates that it is large.
Graph (c) of FIG. 3 shows the membership functions of the secure transport protocol: far indicates that the secure transport protocol is far, med (medium) indicates that it is at an intermediate level, and close indicates that it is close.
In fig. 3, (d) shows a membership function of similarity to a known protocol, where vlow (very low) shows that the similarity between the data stream to be detected and the current known network protocol is very low, low shows that the similarity between the data stream to be detected and the current known network protocol is low, med (medium) shows that the similarity between the data stream to be detected and the current known network protocol is at an intermediate level, high shows that the similarity between the data stream to be detected and the current known network protocol is high, and vhigh (very high) shows that the similarity between the data stream to be detected and the current known network protocol is very high.
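The membership functions of FIG. 3 are not reproduced in this text, so the following is only a minimal sketch of how triangle-shaped and trapezoid-shaped membership functions could fuzzify one attribute. It assumes the attribute has been normalized to [0, 1]; the function names and every breakpoint are illustrative assumptions, not values taken from the figure.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify(x):
    """Map a crisp value in [0, 1] to membership degrees for low / med / high.

    The breakpoints below are hypothetical; they are not read from FIG. 3.
    """
    return {
        "low": trapezoidal(x, 0.0, 0.0, 0.2, 0.5),   # left shoulder
        "med": triangular(x, 0.2, 0.5, 0.8),
        "high": trapezoidal(x, 0.5, 0.8, 1.0, 1.0),  # right shoulder
    }


if __name__ == "__main__":
    print(fuzzify(0.35))
```

With these assumed breakpoints, a value of 0.35 belongs partly to "low" and partly to "med", which is the overlap that lets several rules fire at once during inference.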
With continued reference to FIG. 2, the fuzzy inference engine calculates the fuzzy subset of the similarity according to the fuzzy implication relation supplied by the fuzzy rule base and the input fuzzy sets.
The embodiments of the present application establish fuzzy inference rules, expressed in the form "if … then …", based on previously known conditions (the existing network protocol types). Here, the similarity (or superiority) is denoted by AD, and its fuzzy subset is denoted by AD(y).
In the above example, 27 rules are established in the fuzzy rule base (one for each combination of the three levels of the three input variables), some of which are shown below:
If the Time of data-traffic is low, Fix-byte is few and Safe-protocol is far, then AD is very low;
……
If the Time of data-traffic is medium, Fix-byte is adequate and Safe-protocol is medium, then AD is medium;
……
If the Time of data-traffic is high, Fix-byte is rich and Safe-protocol is close, then AD is very high;
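Only three of the 27 rules are listed, so the full rule table cannot be recovered from the text. As a rough sketch of how such a rule base could be represented, the code below enumerates every combination of the three input levels and assigns a consequent with a hypothetical scoring scheme: each level is ranked 0 to 2 and the rank sum is mapped to one of the five similarity labels. The scheme is an assumption chosen only because it agrees with the three listed rules, reading the last one as high / rich / close.

```python
from itertools import product

# Linguistic levels of the three example input variables, ordered from
# "least similar" to "most similar" (the ordering is an assumption).
TIME = ["low", "medium", "high"]
FIX_BYTE = ["few", "adequate", "rich"]
SAFE_PROTOCOL = ["far", "medium", "close"]


def consequent(rank_sum):
    """Hypothetical mapping from the summed level ranks (0..6) to an AD label."""
    if rank_sum == 0:
        return "very low"
    if rank_sum <= 2:
        return "low"
    if rank_sum == 3:
        return "medium"
    if rank_sum <= 5:
        return "high"
    return "very high"


# One rule per combination of levels: 3 * 3 * 3 = 27 rules.
RULES = {
    (t, n, d): consequent(i + j + k)
    for (i, t), (j, n), (k, d) in product(
        enumerate(TIME), enumerate(FIX_BYTE), enumerate(SAFE_PROTOCOL)
    )
}

if __name__ == "__main__":
    for antecedent, label in sorted(RULES.items()):
        print(antecedent, "->", label)
```

A dictionary keyed by the antecedent tuple keeps rule lookup constant-time; in practice the consequents would come from expert knowledge of the existing protocol types rather than from a formula.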
After obtaining the fuzzy inputs of the candidate feature attributes, the fuzzy inference engine performs inference synthesis according to the fuzzy implication relation R, that is, it calculates the fuzzy output AD(y) of the similarity to each known network protocol in the candidate protocol set:
A*(t,n,d)=T*(t)∧N*(n)∧D*(d);
Figure BDA0002545924810000131
Let q(A_k, A*) be the closeness between A_k(t, n, d) and A*(t, n, d).
Figure BDA0002545924810000132
Thus, it is possible to obtain:
Figure BDA0002545924810000133
Thereby, the similarity fuzzy output AD(y) is obtained:
Figure BDA0002545924810000134
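The closeness and implication formulas above survive only as figure references, so they cannot be reproduced here. The sketch below instead shows the conventional Mamdani min-max step, which is an assumption about how the inference could be carried out: the conjunction A*(t, n, d) = T*(t) ∧ N*(n) ∧ D*(d) is read as a minimum, and rules that share a consequent are combined with a maximum. The function name and the sample membership degrees are hypothetical.

```python
def fire_rules(mu_t, mu_n, mu_d, rules):
    """Return the firing strength of each output label.

    mu_t, mu_n, mu_d: dicts mapping a linguistic level to its membership degree.
    rules: dict mapping (time_level, fix_byte_level, safe_protocol_level)
           to an output label for the similarity AD.
    AND is taken as min; rules with the same consequent are combined by max.
    """
    strength = {}
    for (t, n, d), label in rules.items():
        w = min(mu_t.get(t, 0.0), mu_n.get(n, 0.0), mu_d.get(d, 0.0))
        strength[label] = max(strength.get(label, 0.0), w)
    return strength


if __name__ == "__main__":
    # The three rules quoted in the text; a complete rule base would hold 27.
    rules = {
        ("low", "few", "far"): "very low",
        ("medium", "adequate", "medium"): "medium",
        ("high", "rich", "close"): "very high",
    }
    # Hypothetical fuzzified inputs for one data stream.
    mu_t = {"low": 0.2, "medium": 0.8, "high": 0.0}
    mu_n = {"few": 0.1, "adequate": 0.9, "rich": 0.0}
    mu_d = {"far": 0.3, "medium": 0.7, "close": 0.0}
    print(fire_rules(mu_t, mu_n, mu_d, rules))
```

Each firing strength then clips the membership function of its output label, and the clipped functions together form the fuzzy output AD(y) that is passed to defuzzification.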
Referring to FIG. 2, the output defuzzification interface defuzzifies the fuzzy subset AD(y) of the similarity to obtain its determined value AD.
In the embodiment of the present application, defuzzifying the similarity fuzzy subset to determine the known network protocol corresponding to the protocol variation used by the data stream to be detected specifically includes: calculating the center of gravity of the area enclosed by the membership function curve of the similarity fuzzy subset, and determining the value corresponding to the center of gravity as the known network protocol corresponding to the protocol variation. That is, the output defuzzification interface in the embodiment of the present application uses the center-of-gravity (COG) method.
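As an illustration of the center-of-gravity step, the following sketch approximates COG = ∫ y·AD(y) dy / ∫ AD(y) dy by sampling the aggregated output membership function on a uniform grid. The five output sets are modeled as evenly spaced triangles on [0, 1]; these shapes, the grid size and all names are assumptions, since graph (d) of FIG. 3 is not available in this text.

```python
def tri(y, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if y <= a or y >= c:
        return 0.0
    return (y - a) / (b - a) if y <= b else (c - y) / (c - b)


# Hypothetical output sets for the similarity AD, evenly spaced on [0, 1].
OUTPUT_SETS = {
    "very low": (-0.25, 0.0, 0.25),
    "low": (0.0, 0.25, 0.5),
    "medium": (0.25, 0.5, 0.75),
    "high": (0.5, 0.75, 1.0),
    "very high": (0.75, 1.0, 1.25),
}


def aggregated(strength):
    """Build AD(y): clip each output set at its firing strength, then take the max."""
    def mu(y):
        return max(
            (min(w, tri(y, *OUTPUT_SETS[label])) for label, w in strength.items()),
            default=0.0,
        )
    return mu


def centroid(mu, lo=0.0, hi=1.0, steps=1000):
    """Discrete center of gravity of mu over [lo, hi]."""
    num = den = 0.0
    for i in range(steps + 1):
        y = lo + (hi - lo) * i / steps
        m = mu(y)
        num += y * m
        den += m
    if den == 0.0:
        raise ValueError("empty fuzzy set: no rule fired")
    return num / den


if __name__ == "__main__":
    print(centroid(aggregated({"very low": 0.1, "medium": 0.7})))
```

One plausible use, not stated in the source, is to compute this value once per candidate protocol and report the candidate with the largest AD as the known protocol corresponding to the variation.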
Therefore, the network protocol variation detection method provided by the embodiment of the application can improve the accuracy of unknown protocol identification according to the actual condition of the network data stream. The final unique known network protocol is determined by adopting a fuzzy inference algorithm, the method is simple and easy to implement, the overall calculated amount is small, the method is particularly suitable for application scenes with high real-time processing requirements, and the application requirements in the fields of protocol analysis, information acquisition, information monitoring and the like in a wired network are met.
Based on the same technical concept as the foregoing network protocol variation detection method, an embodiment of the present application further provides a network protocol variation detection apparatus. FIG. 4 shows a block diagram of the network protocol variation detection apparatus according to the embodiment of the present application. Referring to FIG. 4, the network protocol variation detection apparatus 400 according to the embodiment of the present application includes:
a database construction module 410, configured to extract feature vectors of known network protocols and construct a feature database;
a matching module 420, configured to obtain a target feature vector of the data stream to be detected, match the target feature vector with the feature vectors of the known network protocols in the feature database, and determine a candidate network protocol set; and
a determining module 430, configured to determine, based on a fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected from the candidate network protocol set.
In this embodiment of the present application, the matching module 420 is specifically configured to calculate euclidean distances between the target feature vector and the feature vectors of the known network protocols, respectively; and comparing the calculated Euclidean distance with a distance threshold, and when the Euclidean distance is smaller than the distance threshold, putting the corresponding known network protocol into a candidate network protocol set.
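The distance-threshold matching described above can be sketched as follows. This is a minimal illustration assuming that feature vectors are already numeric lists of equal length; the protocol names, vectors and threshold in the example are made up for demonstration and do not come from the source.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def candidate_set(target, feature_db, threshold):
    """Return {protocol name: distance} for every known protocol whose
    feature vector lies strictly closer to the target than the threshold."""
    candidates = {}
    for name, vector in feature_db.items():
        dist = euclidean(target, vector)
        if dist < threshold:
            candidates[name] = dist
    return candidates


if __name__ == "__main__":
    # Hypothetical feature database with three-element vectors.
    feature_db = {
        "proto_a": [0.0, 0.0, 0.0],
        "proto_b": [3.0, 4.0, 0.0],
        "proto_c": [1.0, 0.0, 0.0],
    }
    print(candidate_set([0.0, 0.0, 0.0], feature_db, threshold=2.0))
```

Keeping the distance alongside each candidate is a design choice made here so that the later fuzzy inference stage can reuse it; the source only requires that the protocol be placed in the candidate set.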
In the embodiment of the present application, the determining module 430 is specifically configured to fuzzify a target feature vector of a data stream to be detected, so as to obtain a fuzzy set corresponding to each element in the target feature vector; performing reasoning synthesis according to the fuzzy set and a fuzzy implication relation in a pre-established fuzzy rule base to obtain a similarity fuzzy subset between the data stream to be detected and each known network protocol in the candidate network protocol set; and defuzzifying the similarity fuzzy subset to determine a known network protocol corresponding to the protocol variation used by the data stream to be detected.
In this embodiment of the application, the determining module 430 is specifically configured to calculate a gravity center of an area surrounded by membership function curves of the similarity fuzzy subset, and determine a value corresponding to the gravity center as a known network protocol corresponding to the protocol variation.
In this embodiment of the present application, the matching module 420 is specifically configured to scan the first 16 bytes of the intercepted data stream to be detected, so as to obtain a target feature vector of the data stream to be detected; the target feature vector comprises one or more of the following elements: data flow survival time, data flow mapping port, data flow fixed byte, data frame/datagram arrival interval, signature algorithm, secure transmission protocol, certificate duration, data frame/datagram length, protocol version number.
In this embodiment of the present application, the database construction module 410 is specifically configured to extract the feature vectors of the known network protocols in the TCP/IP protocol suite and construct the feature database.
It should be noted that, for the specific implementation of the above device embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.
In summary, according to the technical scheme for detecting the network protocol variation, the captured data stream is analyzed to obtain the target feature vector, the target feature vector is matched with the feature vector of the known network protocol in the database, the candidate network protocol set is preliminarily determined, and the accuracy of unknown protocol identification is improved. In addition, by preliminarily determining the candidate network protocol set, the matching efficiency is improved, the selection space corresponding to the known network protocol is expanded, and the phenomenon of 'missing matching' is effectively prevented. And finally, on the basis of the candidate network protocol set, the known network protocol type corresponding to the protocol variation used by the data stream is obtained by using a fuzzy inference algorithm, so that the overall calculation amount is small, and the method is suitable for application scenes with high requirements on real-time processing.
It should be noted that:
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein, and the structure required to construct such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages above are provided to disclose the best mode of the present application. In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description. Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination. The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the network protocol variation detection apparatus according to embodiments of the present application. 
The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form. For example, fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 500 comprises a processor 510 and a memory 520 arranged to store computer executable instructions (computer readable program code). The memory 520 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 520 has a storage space 530 storing computer readable program code 531 for performing any of the method steps in the above described method. For example, the storage space 530 for storing the computer readable program code may include respective computer readable program codes 531 for respectively implementing various steps in the above method. The computer readable program code 531 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 6. Fig. 6 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. 
The computer readable storage medium 600 stores computer readable program code 531 for performing the steps of the method according to the application, which can be read by the processor 510 of the electronic device 500. When executed by the electronic device 500, the computer readable program code 531 causes the electronic device 500 to perform the steps of the method described above; in particular, the computer readable program code 531 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 531 may be compressed in a suitable form. It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

Claims (8)

1. A method for detecting network protocol variations, comprising:
extracting a feature vector of a known network protocol, and constructing a feature database;
acquiring a target characteristic vector of a data stream to be detected, matching the target characteristic vector with characteristic vectors of known network protocols in a characteristic database, and determining a candidate network protocol set;
determining a known network protocol corresponding to a protocol variation used by the data stream to be detected from the candidate network protocol set based on a fuzzy inference algorithm;
wherein matching the target feature vector with the feature vectors of the known network protocols in the feature database and determining the candidate network protocol set comprises:
respectively calculating Euclidean distances between the target characteristic vector and the characteristic vectors of the known network protocols;
and comparing the calculated Euclidean distance with a distance threshold, and when the Euclidean distance is smaller than the distance threshold, putting the corresponding known network protocol into a candidate network protocol set.
2. The method of claim 1, wherein determining, based on the fuzzy inference algorithm, the known network protocol corresponding to the protocol variation used by the data stream to be detected from the candidate network protocol set comprises:
fuzzifying a target feature vector of a data stream to be detected to obtain a fuzzy set corresponding to each element in the target feature vector;
performing reasoning and synthesis according to the fuzzy set and a fuzzy implication relation in a pre-established fuzzy rule base to obtain a similarity fuzzy subset between the data stream to be detected and each known network protocol in the candidate network protocol set;
and defuzzifying the similarity fuzzy subset to determine a known network protocol corresponding to the protocol variation used by the data stream to be detected.
3. The method of claim 2, wherein defuzzifying the similarity fuzzy subset to determine the known network protocol corresponding to the protocol variation used by the data stream to be detected comprises:
calculating the center of gravity of the area enclosed by the membership function curve of the similarity fuzzy subset, and determining the value corresponding to the center of gravity as the known network protocol corresponding to the protocol variation.
4. The method of claim 1, wherein acquiring the target feature vector of the data stream to be detected comprises:
scanning the first 16 bytes of the intercepted data stream to be detected to obtain a target characteristic vector of the data stream to be detected;
the target feature vector comprises one or more of the following elements: data flow survival time, data flow mapping port, data flow fixed byte, data frame/datagram arrival interval, signature algorithm, secure transmission protocol, certificate duration, data frame/datagram length, protocol version number.
5. The method of any one of claims 1-4, wherein extracting feature vectors for known network protocols and constructing a feature database comprises:
extracting the feature vector of each known network protocol in the TCP/IP protocol suite to construct the feature database.
6. A network protocol variation detection apparatus, comprising:
the database construction module is used for extracting the characteristic vector of the known network protocol and constructing a characteristic database;
the matching module is used for acquiring a target characteristic vector of the data stream to be detected, matching the target characteristic vector with characteristic vectors of known network protocols in a characteristic database and determining a candidate network protocol set;
the determining module is used for determining a known network protocol corresponding to a protocol variation used by the data stream to be detected from the candidate network protocol set based on a fuzzy inference algorithm;
the matching module is specifically used for respectively calculating Euclidean distances between the target characteristic vectors and characteristic vectors of known network protocols; and comparing the calculated Euclidean distance with a distance threshold, and when the Euclidean distance is smaller than the distance threshold, putting the corresponding known network protocol into a candidate network protocol set.
7. An electronic device, comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010560524.1A CN111726264B (en) 2020-06-18 2020-06-18 Network protocol variation detection method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111726264A CN111726264A (en) 2020-09-29
CN111726264B true CN111726264B (en) 2021-11-19

Family

ID=72567408

Country Status (1)

Country Link
CN (1) CN111726264B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant