CN116886637A - Single-feature encryption stream detection method and system based on graph integration - Google Patents
- Publication number
- CN116886637A (application number CN202311133687.1A)
- Authority
- CN
- China
- Prior art keywords
- graph
- data packet
- flow
- feature
- uplink
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The application provides a single-feature encrypted stream detection method and system based on graph integration. The method comprises: acquiring a plurality of data packets in flow information, and dividing the data packets into a plurality of data streams based on quintuple information of the data packets; acquiring a first characteristic value of each data packet in a data stream, and assigning a first mark or a second mark to the first characteristic value according to whether the data packet is an uplink data packet or a downlink data packet, to obtain a second characteristic value; arranging the second characteristic values in the data stream into a feature sequence, and constructing an uplink flow interaction graph and a downlink flow interaction graph based on the feature sequence; performing graph integration on the uplink flow interaction graph and on the downlink flow interaction graph based on the node attributes of their respective nodes, to obtain an uplink flow integral graph and a downlink flow integral graph; and splicing the uplink flow integral graph and the downlink flow integral graph to obtain a joint integral graph, and inputting the joint integral graph into a preset neural network classification model to obtain a detection result.
Description
Technical Field
The application relates to the technical field of flow detection, in particular to a single-feature encryption flow detection method and system based on graph integration.
Background
In the prior art, encrypting data during Internet of Things (IoT) transmission effectively reduces the visibility of IoT devices to malicious stream attacks. Extracting basic attributes from the headers of encrypted data provides a further benefit, potentially enabling real-time classification.
However, classification and detection methods based on machine learning in existing research generally require the fusion of multidimensional features and often use only the raw data itself for detection, which results in low detection accuracy.
Disclosure of Invention
In view of this, embodiments of the present application provide a single feature encryption stream detection method based on graph integration to obviate or ameliorate one or more of the disadvantages of the prior art.
One aspect of the present application provides a method for detecting a single-feature encrypted stream based on graph integration, the method comprising the steps of:
acquiring flow information, wherein the flow information comprises a plurality of data packets, and dividing the data packets into a plurality of data streams based on quintuple information of the data packets;
acquiring a first characteristic value of a data packet in a data stream, and giving a first mark or a second mark to the first characteristic value based on whether the data packet is an uplink data packet or a downlink data packet to obtain a second characteristic value;
arranging the second characteristic values in the data stream into a feature sequence, and constructing an uplink flow interaction graph and a downlink flow interaction graph based on the feature sequence, wherein the uplink flow interaction graph and the downlink flow interaction graph each comprise a plurality of nodes;
performing graph integration on the uplink flow interaction graph and on the downlink flow interaction graph based on the node attributes of their respective nodes, to obtain an uplink flow integral graph and a downlink flow integral graph;
and splicing the uplink flow integral graph and the downlink flow integral graph to obtain a joint integral graph, and inputting the joint integral graph into a preset neural network classification model to obtain a detection result for each data stream.
With this scheme, the whole detection process can be completed using only the first characteristic value of the data packets; no fusion of multidimensional features is needed, which reduces the computational difficulty. The scheme classifies data streams by constructing the joint integral graph: on the one hand, applying graph integration to data stream detection improves detection accuracy; on the other hand, integrating over the graph of original node attributes takes full account of the ability of nodes to pass information to one another, which further improves detection accuracy.
In some embodiments of the present application, in the step of dividing the plurality of data packets into the plurality of data streams based on the five-tuple information of the data packets, the data packets are grouped based on the source IP address and the destination IP address in the five-tuple information. Data packets that satisfy either of the following conditions are combined into one data stream: their source IP addresses are the same and their destination IP addresses are the same; or the destination IP address of one data packet is the same as the source IP address of the other, and the source IP address of the one is the same as the destination IP address of the other.
In some embodiments of the present application, in the step of obtaining the first characteristic value of the data packet in the data stream, the length of each data packet is obtained by parsing, and the length of the data packet is taken as the first characteristic value.
In some embodiments of the present application, in the step of assigning a first flag or a second flag to the first feature value based on the data packet being an uplink data packet or a downlink data packet, and obtaining the second feature value, the data packet is classified into an uplink data packet and a downlink data packet based on a source IP address and a destination IP address of the data packet in one data stream, and the first flag is assigned to the uplink data packet, and the second flag is assigned to the downlink data packet.
In some embodiments of the present application, in the step of assigning a first flag or a second flag to the first feature value based on whether the data packet is an uplink data packet or a downlink data packet, the first flag is a positive sign flag and the second flag is a negative sign flag.
In some embodiments of the present application, the step of constructing an uplink flow interaction graph and a downlink flow interaction graph based on the feature sequence includes:
constructing an uplink feature sequence and a downlink feature sequence based on the feature sequence, wherein the uplink feature sequence and the downlink feature sequence each comprise the same number of third characteristic values as there are second characteristic values in the feature sequence;
and constructing the uplink flow interaction graph by taking the third characteristic values of the uplink feature sequence as the node attributes of the nodes in the uplink flow interaction graph, and constructing the downlink flow interaction graph by taking the third characteristic values of the downlink feature sequence as the node attributes of the nodes in the downlink flow interaction graph.
In some embodiments of the present application, in the step of performing graph integration on the uplink flow interaction graph and the downlink flow interaction graph, the node attributes of the nodes are updated based on the order of the nodes in the graph.
In some embodiments of the present application, in the step of splicing the uplink flow integral graph and the downlink flow integral graph to obtain a joint integral graph, the nodes of the uplink flow integral graph and the downlink flow integral graph are combined in sequence to obtain the joint integral graph.
In some embodiments of the present application, in the step of inputting the joint integral graph into a preset neural network classification model to obtain a detection result for each data stream, the preset neural network classification model adopts a GCN classification model, a GIN classification model, or a GAT classification model.
The second aspect of the present application also provides a single-feature encrypted stream detection system based on graph integration, the system comprising a computer device that comprises a processor and a memory, the memory having computer instructions stored therein and the processor being configured to execute the computer instructions stored in the memory, wherein the system implements the steps of the method described above when the computer instructions are executed by the processor.
The third aspect of the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps performed by the aforementioned graph integration based single feature encrypted stream detection method.
Additional advantages, objects, and features of the application will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present application are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present application will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate and together with the description serve to explain the application.
FIG. 1 is a schematic diagram of an embodiment of a single feature encrypted stream detection method based on graph integration according to the present application;
FIG. 2 is a schematic diagram of another embodiment of a single-feature encrypted stream detection method based on graph integration according to the present application;
FIG. 3 is a schematic diagram of integrating the uplink flow interaction graph to obtain the uplink flow integral graph;
FIG. 4 is a schematic diagram of integrating the downlink flow interaction graph to obtain the downlink flow integral graph;
FIG. 5 is a schematic diagram of splicing the uplink flow integral graph and the downlink flow integral graph to obtain the joint integral graph.
Detailed Description
The present application will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present application more apparent. The exemplary embodiments of the present application and the descriptions thereof are used herein to explain the present application, but are not intended to limit the application.
It should be noted here that, in order to avoid obscuring the present application due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present application are shown in the drawings, while other details not greatly related to the present application are omitted.
In order to solve the above problems, as shown in FIG. 1, the present application provides a single-feature encrypted stream detection method based on graph integration, where the steps of the method include:
Step S100, obtaining flow information, wherein the flow information comprises a plurality of data packets, and dividing the data packets into a plurality of data streams based on quintuple information of the data packets;
In a specific implementation process, pyshark (a Python packet parsing tool based on Wireshark) is used to parse the pcap (packet capture) files and extract the data packets; specifically, flow information can be collected whenever a user and a client exchange information.
In a specific implementation process, the five-tuple information of the data packet includes a source IP address, a source port, a destination IP address, a destination port and a transport layer protocol.
With this scheme, because the flow information generally contains multiple types of traffic, all data packets in the flow information are first split into independent data streams; after extraction is completed, the data of each data packet is written to a csv file, one row per data packet.
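The row-per-packet output described above can be sketched as follows. This is a minimal illustration only: the `PacketRecord` type, its field names and the sample addresses are hypothetical, and the pcap parsing itself is omitted, so the packets are assumed to have already been reduced to their five-tuple and length.

```python
import csv
import io
from typing import NamedTuple


class PacketRecord(NamedTuple):
    """Hypothetical per-packet record: the five-tuple plus the packet length."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    length: int


def write_packets_csv(packets, handle):
    """Write a header row followed by one CSV row per packet."""
    writer = csv.writer(handle)
    writer.writerow(PacketRecord._fields)
    writer.writerows(packets)


# Illustrative packets only; the values are not taken from the application.
packets = [
    PacketRecord("10.0.0.2", 50000, "10.0.0.9", 1883, "TCP", 66),
    PacketRecord("10.0.0.9", 1883, "10.0.0.2", 50000, "TCP", 60),
]
buffer = io.StringIO()  # stands in for a file opened with newline=""
write_packets_csv(packets, buffer)
csv_text = buffer.getvalue()
```

An in-memory buffer is used here so the sketch has no side effects; a real file handle could be passed instead.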
Step S200, a first characteristic value of a data packet in a data stream is obtained, and a first mark or a second mark is given to the first characteristic value based on whether the data packet is an uplink data packet or a downlink data packet, so as to obtain a second characteristic value;
in a specific implementation process, the first characteristic value may be a data length in a data packet or a data packet size.
In the implementation process, the first mark or the second mark can be a positive mark or a negative mark, and if the first mark is a positive mark, the second mark is a negative mark; if the first mark is a negative sign mark, the second mark is a positive sign mark.
In a specific implementation process, in the step of assigning the first flag or the second flag to the first feature value based on whether the data packet is an uplink data packet or a downlink data packet, the obtained csv file is read in the form of a dataframe.
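The marking in step S200 can be sketched as below, assuming the first characteristic value is the packet length, the first mark is a positive sign and the second mark is a negative sign. Which endpoint counts as the uplink sender is an assumption of this sketch (the hypothetical `client_ip` parameter), as are the sample addresses and lengths.

```python
def signed_lengths(packets, client_ip):
    """Return the packet lengths of one data stream, signed by direction.

    packets: iterable of (src_ip, dst_ip, length) tuples in arrival order.
    client_ip: assumed uplink sender; packets it sends are marked positive,
    all other packets in the stream are marked negative.
    """
    sequence = []
    for src_ip, _dst_ip, length in packets:
        if src_ip == client_ip:
            sequence.append(+length)   # uplink: first (positive) mark
        else:
            sequence.append(-length)   # downlink: second (negative) mark
    return sequence


# Illustrative stream between two hypothetical hosts.
stream = [
    ("10.0.0.9", "10.0.0.2", 66),
    ("10.0.0.9", "10.0.0.2", 66),
    ("10.0.0.9", "10.0.0.2", 60),
    ("10.0.0.2", "10.0.0.9", 60),
    ("10.0.0.2", "10.0.0.9", 1514),
    ("10.0.0.9", "10.0.0.2", 183),
]
feature_sequence = signed_lengths(stream, client_ip="10.0.0.2")
```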
Step S300, arranging the second characteristic values in the data stream into a feature sequence, and constructing an uplink flow interaction graph and a downlink flow interaction graph based on the feature sequence, wherein the uplink flow interaction graph and the downlink flow interaction graph each comprise a plurality of nodes;
In some embodiments of the present application, in the step of constructing the uplink flow interaction graph and the downlink flow interaction graph, an uplink feature sequence and a downlink feature sequence are constructed based on the feature sequence. In constructing the uplink feature sequence, the second characteristic values of the downlink data packets in the feature sequence are replaced with 0; in constructing the downlink feature sequence, the second characteristic values of the uplink data packets in the feature sequence are replaced with 0. In both cases the original order of the second characteristic values in the feature sequence is retained.
In the implementation process, suppose the first mark is a positive sign and the second mark is a negative sign, so that the first mark is assigned to the first characteristic value of each uplink data packet and the second mark to the first characteristic value of each downlink data packet. If the feature sequence is -66, -66, -60, +60, +1514, -183, then the uplink feature sequence is 0, 0, 0, +60, +1514, 0 and the downlink feature sequence is -66, -66, -60, 0, 0, -183. The uplink flow interaction graph is constructed by taking the third characteristic values of the uplink feature sequence as the node attributes of the nodes in the uplink flow interaction graph, and the downlink flow interaction graph is constructed by taking the third characteristic values of the downlink feature sequence as the node attributes of the nodes in the downlink flow interaction graph.
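The zero-replacement rule for the uplink and downlink feature sequences can be sketched as follows, assuming positive values mark uplink data packets and negative values mark downlink data packets. The function name and the sample sequence are illustrative.

```python
def split_by_direction(feature_sequence):
    """Split a signed feature sequence into uplink and downlink sequences.

    Values belonging to the other direction are replaced with 0, so both
    outputs keep the length and ordering of the input sequence.
    """
    uplink = [value if value > 0 else 0 for value in feature_sequence]
    downlink = [value if value < 0 else 0 for value in feature_sequence]
    return uplink, downlink


# Illustrative signed packet-length sequence.
uplink_sequence, downlink_sequence = split_by_direction(
    [-66, -66, -60, +60, +1514, -183]
)
```

Both outputs have as many entries as the input, which is what lets the two graphs share the same node ordering.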
Step S400, performing graph integration on the uplink flow interaction graph and on the downlink flow interaction graph based on the node attributes of their respective nodes, to obtain an uplink flow integral graph and a downlink flow integral graph;
Step S500, splicing the uplink flow integral graph and the downlink flow integral graph to obtain a joint integral graph, and inputting the joint integral graph into a preset neural network classification model to obtain a detection result for each data stream.
In the implementation process, each joint integral graph corresponds to one data stream, the preset neural network classification model is a graph neural network model, and a detection result is output through a classifier of the graph neural network model, wherein the detection result can be a malicious stream or a non-malicious stream, a malicious stream type and the like.
In a specific implementation process, the neural network classification model may be a trained GCN classification model, GIN classification model or GAT classification model.
In the specific implementation process, the graph structure of the scheme is represented as a tree structure and is stored in a cross-linked list data structure, which reduces storage pressure. The cross-linked list is also a structure in which graphs are stored at the low level of a computer, so the structural information of the graph can be read easily and the integration operations on the nodes can be carried out through this structure.
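One common form of cross-linked list (also called an orthogonal list) is sketched below as an illustration; the application does not give its exact layout, so the class and field names here are assumptions. Each vertex heads two linked lists, one of its outgoing arcs and one of its incoming arcs, and each arc is stored once and threaded onto both lists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arc:
    tail: int                          # index of the node the arc leaves
    head: int                          # index of the node the arc enters
    next_out: Optional["Arc"] = None   # next arc leaving the same tail
    next_in: Optional["Arc"] = None    # next arc entering the same head


@dataclass
class Vertex:
    attribute: int                     # node attribute (signed packet length)
    first_out: Optional[Arc] = None    # head of the outgoing-arc list
    first_in: Optional[Arc] = None     # head of the incoming-arc list


class CrossLinkedGraph:
    def __init__(self, attributes):
        self.vertices = [Vertex(value) for value in attributes]

    def add_arc(self, tail, head):
        """Store one arc and thread it onto both vertex lists."""
        arc = Arc(tail, head,
                  next_out=self.vertices[tail].first_out,
                  next_in=self.vertices[head].first_in)
        self.vertices[tail].first_out = arc
        self.vertices[head].first_in = arc

    def successors(self, index):
        result, arc = [], self.vertices[index].first_out
        while arc is not None:
            result.append(arc.head)
            arc = arc.next_out
        return result

    def predecessors(self, index):
        result, arc = [], self.vertices[index].first_in
        while arc is not None:
            result.append(arc.tail)
            arc = arc.next_in
        return result


# Illustrative chain of six nodes in packet order.
graph = CrossLinkedGraph([0, 0, 0, 60, 1514, 0])
for i in range(5):
    graph.add_arc(i, i + 1)
```

Because each arc is stored exactly once, both the nodes that follow a given node and the nodes that precede it can be read without duplicating edge records.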
With this scheme, the whole detection process can be completed using only the first characteristic value of the data packets; no fusion of multidimensional features is needed, which reduces the computational difficulty. The scheme classifies data streams by constructing the joint integral graph: on the one hand, applying graph integration to data stream detection improves detection accuracy; on the other hand, integrating over the graph of original node attributes takes full account of the ability of nodes to pass information to one another, which further improves detection accuracy.
In some embodiments of the present application, in the step of dividing the plurality of data packets into the plurality of data streams based on the five-tuple information of the data packets, the data packets are grouped based on the source IP address and the destination IP address in the five-tuple information. Data packets that satisfy either of the following conditions are combined into one data stream: their source IP addresses are the same and their destination IP addresses are the same; or the destination IP address of one data packet is the same as the source IP address of the other, and the source IP address of the one is the same as the destination IP address of the other.
In a specific implementation process, one data stream includes uplink data packets and downlink data packets: if a data packet whose source IP address is A and whose destination IP address is B is an uplink data packet, then a data packet whose source IP address is B and whose destination IP address is A is a downlink data packet.
In the implementation process, suppose there are four data packets: data packet x1, whose source IP address is A and whose destination IP address is B; data packet x2, whose source IP address is A and whose destination IP address is B; data packet x3, whose source IP address is B and whose destination IP address is A; and data packet x4, whose source IP address is B and whose destination IP address is A. Any two of these four data packets satisfy one of the two conditions, namely that their source IP addresses and destination IP addresses are respectively the same, or that the source IP address of one is the destination IP address of the other and vice versa, so the four data packets are combined into one data stream.
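The grouping of x1 to x4 can be sketched as below. The sketch keys each data stream on the unordered pair of IP addresses, which treats A to B and B to A as the same stream; the function name and the extra packet x5 are illustrative additions, and ports and protocol are left out because this embodiment groups on the IP addresses only.

```python
from collections import defaultdict


def group_into_streams(packets):
    """Group packets into bidirectional data streams.

    packets: iterable of (name, src_ip, dst_ip) tuples.
    The key is the unordered address pair, so a packet and its reverse
    direction fall into the same stream.
    """
    streams = defaultdict(list)
    for name, src_ip, dst_ip in packets:
        streams[frozenset((src_ip, dst_ip))].append(name)
    return dict(streams)


packets = [
    ("x1", "A", "B"),
    ("x2", "A", "B"),
    ("x3", "B", "A"),
    ("x4", "B", "A"),
    ("x5", "A", "C"),  # hypothetical packet belonging to a different stream
]
streams = group_into_streams(packets)
```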
In some embodiments of the present application, in the step of obtaining the first characteristic value of the data packet in the data stream, the length of each data packet is obtained by parsing, and the length of the data packet is taken as the first characteristic value.
By adopting the scheme, the whole detection process can be completed only through one feature of the data packet, fusion of multidimensional features is not needed, and the calculation difficulty is reduced.
In some embodiments of the present application, in the step of assigning a first flag or a second flag to the first feature value based on the data packet being an uplink data packet or a downlink data packet, and obtaining the second feature value, the data packet is classified into an uplink data packet and a downlink data packet based on a source IP address and a destination IP address of the data packet in one data stream, and the first flag is assigned to the uplink data packet, and the second flag is assigned to the downlink data packet.
In some embodiments of the present application, in the step of assigning a first flag or a second flag to the first feature value based on whether the data packet is an uplink data packet or a downlink data packet, the first flag is a positive sign flag and the second flag is a negative sign flag.
In a specific implementation process, the first characteristic value may be a data length in a data packet or a data packet size.
As shown in FIG. 2, in some embodiments of the present application, step S300 includes step S310 of arranging the second characteristic values in the data stream into a feature sequence;
In the implementation process, the step of constructing the uplink flow interaction graph and the downlink flow interaction graph based on the feature sequence comprises the following steps:
Step S320, constructing an uplink feature sequence and a downlink feature sequence based on the feature sequence, wherein the uplink feature sequence and the downlink feature sequence each comprise the same number of third characteristic values as there are second characteristic values in the feature sequence;
In the specific implementation process, in constructing the uplink feature sequence, the second characteristic values of the downlink data packets in the feature sequence are replaced with 0; in constructing the downlink feature sequence, the second characteristic values of the uplink data packets in the feature sequence are replaced with 0. In both cases the original order of the second characteristic values in the feature sequence is retained.
In the implementation process, suppose the first mark is a positive sign and the second mark is a negative sign, so that the first mark is assigned to the first characteristic value of each uplink data packet and the second mark to the first characteristic value of each downlink data packet. If the feature sequence is -66, -66, -60, +60, +1514, -183, then the uplink feature sequence is 0, 0, 0, +60, +1514, 0 and the downlink feature sequence is -66, -66, -60, 0, 0, -183.
Step S330, constructing the uplink flow interaction graph by taking the third characteristic values of the uplink feature sequence as the node attributes of the nodes in the uplink flow interaction graph, and constructing the downlink flow interaction graph by taking the third characteristic values of the downlink feature sequence as the node attributes of the nodes in the downlink flow interaction graph.
In the specific implementation process, the third characteristic values of the uplink feature sequence are taken as the node attributes of the nodes in the uplink flow interaction graph, and an edge is constructed between the nodes corresponding to adjacent third characteristic values, giving the uplink flow interaction graph; likewise, the third characteristic values of the downlink feature sequence are taken as the node attributes of the nodes in the downlink flow interaction graph, and an edge is constructed between the nodes corresponding to adjacent third characteristic values, giving the downlink flow interaction graph.
With this scheme, the uplink flow interaction graph and the downlink flow interaction graph are each constructed from the values of a single feature of the data packets, on the basis of the feature sequence. The data are thus represented separately in the uplink direction and the downlink direction, which improves the detection accuracy for both directions in the final detection.
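The graph construction in step S330 can be sketched as follows: one node per third characteristic value, with an edge between the nodes of adjacent values. The plain list-of-pairs edge representation and the function name are choices made for this sketch, not something the application prescribes.

```python
def build_interaction_graph(sequence):
    """Build a chain-shaped interaction graph from a feature sequence.

    Returns (node_attributes, edges): node i carries sequence[i], and an
    edge (i, i + 1) links each pair of adjacent values.
    """
    node_attributes = list(sequence)
    edges = [(i, i + 1) for i in range(len(node_attributes) - 1)]
    return node_attributes, edges


# Illustrative uplink and downlink feature sequences.
uplink_nodes, uplink_edges = build_interaction_graph([0, 0, 0, 60, 1514, 0])
downlink_nodes, downlink_edges = build_interaction_graph([-66, -66, -60, 0, 0, -183])
```

A chain over n nodes has n - 1 edges, the minimum for a connected graph, which is consistent with the lightweight tree representation the application describes.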
In some embodiments of the present application, in the step of performing graph integration on the uplink flow interaction graph and the downlink flow interaction graph, the node attributes of the nodes are updated based on the order of the nodes in the graph.
As shown in FIG. 3 and FIG. 4, in the implementation process, suppose the node attributes of the nodes in the uplink flow interaction graph are 0, 0, 0, +60, +1514 and 0, and graph integration is performed on the uplink flow interaction graph. In the first integration, the original node attribute of the first node is added to the node attribute of every node after the first node, and the node attributes become 0, 0, 0, +60, +1514, 0. In the second integration, the original node attribute of the second node is added to the node attribute of every node after the second node, and the node attributes remain 0, 0, 0, +60, +1514, 0; the third integration likewise leaves them at 0, 0, 0, +60, +1514, 0. The fourth integration updates the node attributes to 0, 0, 0, +60, +1574, +60, and the fifth integration updates them to 0, 0, 0, +60, +1574, +1574, which are the node attributes of the nodes in the uplink flow integral graph;
If the node attributes of the nodes in the downlink flow interaction graph are -66, -66, -60, 0, 0 and -183, then in the first integration of the downlink flow interaction graph the original node attribute of the first node is added to the node attribute of every node after the first node, and the node attributes become -66, -132, -126, -66, -66, -249. In the second integration, the original node attribute of the second node is added to the node attribute of every node after the second node, and the node attributes become -66, -132, -192, -132, -132, -315. In the same way, the third integration updates the node attributes to -66, -132, -192, -192, -192, -375, and the fourth and fifth integrations leave them unchanged at -66, -132, -192, -192, -192, -375, which are the node attributes of the nodes in the downlink flow integral graph.
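The integration passes described above can be sketched as follows, assuming a chain-shaped graph whose nodes are in packet order. Pass k adds the original attribute of node k to every later node; for a chain this ends in the prefix (cumulative) sum of the original attributes, which the sketch checks with `itertools.accumulate`. Function names and the sample sequences are illustrative.

```python
from itertools import accumulate


def integrate_step(current, original, k):
    """Add the original attribute of node k to every node after node k."""
    return current[:k + 1] + [value + original[k] for value in current[k + 1:]]


def integrate_graph(attributes):
    """Run all integration passes; return the final attributes and each pass."""
    original = list(attributes)
    current = list(attributes)
    history = []
    for k in range(len(original) - 1):   # the last node has no successors
        current = integrate_step(current, original, k)
        history.append(current)
    return current, history


# Illustrative downlink and uplink node attributes.
downlink_integral, downlink_steps = integrate_graph([-66, -66, -60, 0, 0, -183])
uplink_integral, uplink_steps = integrate_graph([0, 0, 0, 60, 1514, 0])

# For a chain, the step-by-step result equals the running sum.
running_sum = list(accumulate([-66, -66, -60, 0, 0, -183]))
```

The pass-by-pass form mirrors the description, while the running sum gives the same result in a single linear pass.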
With this scheme, the uplink flow interaction graph and the downlink flow interaction graph are further integrated, which enriches the data in the two graphs. The scheme combines integration with the topology of the graph structure and thereby reflects the ability of nodes to pass information to one another: the principle of path integration is applied to the graph structure, and the information passing between nodes is used to improve the accuracy of single-feature detection.
In some embodiments of the present application, in the step of splicing the uplink flow integral graph and the downlink flow integral graph to obtain a joint integral graph, the nodes of the uplink flow integral graph and the downlink flow integral graph are combined in sequence to obtain the joint integral graph.
In some embodiments of the present application, as shown in FIG. 5, if the node attributes of the nodes in the uplink flow integral graph are 0, 0, 0, +60, +1574, +1574 and the node attributes of the nodes in the downlink flow integral graph are -66, -132, -192, -192, -192, -375, the nodes of the uplink flow integral graph and the downlink flow integral graph are combined in sequence to obtain the joint integral graph, in which the node attributes of the nodes are 0, 0, 0, +60, +1574, +1574, -66, -132, -192, -192, -192, -375.
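The splicing of node attributes can be sketched as a plain concatenation, uplink nodes first. How edges are carried over into the joint integral graph is not spelled out in this passage, so the sketch handles node attributes only; the function name and sample values are illustrative.

```python
def splice_integral_graphs(uplink_attributes, downlink_attributes):
    """Concatenate node attributes: uplink nodes first, then downlink nodes."""
    return list(uplink_attributes) + list(downlink_attributes)


# Illustrative integral-graph node attributes.
joint_attributes = splice_integral_graphs(
    [0, 0, 0, 60, 1574, 1574],
    [-66, -132, -192, -192, -192, -375],
)
```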
In a specific implementation process, flow detection is performed by converting the original flow interaction information into graphs and integrating the graphs, that is, by performing discrete integration over the nodes of the original flow interaction graphs to construct the uplink flow integral graph and the downlink flow integral graph. Graph integration improves detection capability over the original graphs, so the scheme achieves better detection results with only a small number of features.
The scheme combines neighborhood node features with the interactions between clients and servers that use the MQTT and CoAP protocols, converting the network flow classification problem into a graph classification problem so that detection capability is enhanced using limited features.
In some embodiments of the present application, in the step of inputting the joint integral map to a predetermined neural network classification model to obtain a detection result for each data stream, the predetermined neural network classification model adopts a GCN classification model, a GIN classification model, or a GAT classification model.
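To make the classification step more concrete, the following sketch shows a single GCN-style propagation step followed by a mean readout, written in plain Python. It is illustrative only and is not the model used in the scheme: the chain-shaped edge structure, the 1/1000 feature scaling, the weight values, and all function names are assumptions, and a practical model would use a trained GCN, GIN, or GAT from a graph learning library.

```python
import math


def chain_adjacency(n):
    """Hypothetical edge structure: node i is connected to node i + 1."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    return adj


def gcn_layer(adj, features, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own attribute.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(len(features[0]))] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][h] for f in range(len(weight))))
             for h in range(len(weight[0]))] for i in range(n)]


def mean_readout(node_embeddings):
    """Average node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


# Joint integral graph attributes from the Fig. 5 example, scaled for readability.
attrs = [0, 60, 1574, -66, -132, -192, -375]
features = [[a / 1000.0] for a in attrs]
weight = [[0.5, -0.5]]  # hypothetical 1 -> 2 weights, untrained

graph_vector = mean_readout(gcn_layer(chain_adjacency(len(attrs)), features, weight))
```

The resulting `graph_vector` would then be passed to a classifier head; training and the choice among GCN, GIN, and GAT are outside the scope of this sketch.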
In a specific implementation, the scheme detects the original traffic using a single feature, such as packet length, whereas previous deep-learning detection methods usually need a few selected features to classify malicious flows. In this scheme, the graph is constructed from the packet length sequence alone, which enhances the detection capability of single-feature methods (for example, methods based on the packet length sequence).
In a specific implementation, existing traffic detection mainly has the following problems:
1. Existing studies using machine learning typically require the fusion of multidimensional features and computationally intensive feature processing (e.g., dynamic time warping of packet time series), while using only a few low-level features makes accurate detection challenging.
2. Existing graph structures have large storage redundancy and add considerable computational cost.
3. Existing path integration methods operate only at the level of the original data and cannot be combined with topological structures such as graphs. The ability to pass information between nodes is therefore ignored, which adversely affects the analysis of, and decisions on, data with limited features.
The application proposes a theory of graph integration and applies it to network traffic detection for the Internet of Things and the Internet, thereby addressing the low accuracy of network traffic classification when only a small number of features is used. In terms of graph storage, the structure used is also the structure in which the graph is stored at the underlying level of the computer, so the scheme can easily read the structural information of the graph and perform the integration operations on the nodes through it. In addition to the ordinary traffic interaction graph, the scheme provides a lightweight tree-structured representation of the traffic interaction graph. For the same set of nodes, the tree structure has fewer edges, which reduces storage and computation cost.
The beneficial effects of this scheme include:
1. The scheme proposes a theory of graph integration for the first time and applies it to network traffic detection for the Internet of Things and the Internet, addressing the low accuracy of network traffic classification when only a small number of features is used. In terms of graph storage, the structure takes the form of a cross-linked list, which is also the structure in which the graph is stored at the underlying level of the computer, so the structural information of the graph can be read easily and the integration operations can be performed on the nodes through it. In addition to the ordinary traffic interaction graph, a lightweight tree structure is provided for representing the traffic interaction graph. For the same set of nodes, the tree structure has fewer edges, which reduces storage and computation cost.
2. Existing path integration methods operate only at the level of the original data and cannot be combined with topological structures such as graphs. The ability to pass information between nodes is therefore ignored, which adversely affects the analysis of, and decisions on, data with limited features. The application applies the path integration principle to the graph structure and improves the accuracy of single-feature detection by exploiting the ability to pass information between nodes.
Experimental example:
In this experimental example, as shown in Table 1 below, GCN indicates that the graph neural network used is a graph convolutional network (Graph Convolutional Network, GCN), GIN indicates a graph isomorphism network (Graph Isomorphism Network, GIN), and GAT indicates a graph attention network (Graph Attention Network, GAT). Gu denotes the uplink traffic interaction graph, Gd the downlink traffic interaction graph, Guf the uplink traffic integral graph, and Gdf the downlink traffic integral graph. Gu+Gd means that the spliced uplink and downlink traffic interaction graphs are input to the graph neural network; Guf+Gdf means that the spliced uplink and downlink traffic integral graphs of this scheme are input to the graph neural network.
In a specific implementation, to further investigate the usefulness of GIT in multi-class classification, we used the UNSW-NB15 dataset. To realistically reproduce the ratio of malicious to benign flows, we extracted 2000 packets for each attack type from the dataset and mixed them with 20000 packets of benign flows. We selected only the packet length sequence as the feature for analysis. We obtained classification results from the packet length sequences with LSTM and RNN, and from the length sequences with GCN, GIN, and GAT.
Table 1
The original sequence used in this experiment is constructed from the data lengths of the data packets. In this experimental example, GCN, GIN, GAT, and LSTM all use a batch_size of 128 with the number of hidden layers set to 64, so their configurations are closely comparable.
As shown in Table 1, feeding the original sequence into the LSTM neural network yields the lowest classification accuracy. Furthermore, for each graph neural network, using the spliced uplink and downlink traffic integral graphs as input yields higher classification accuracy than using the spliced uplink and downlink traffic interaction graphs as input.
From the experimental results, we can conclude that the scheme is effective for multi-class classification. In addition, the GIN classifier achieves an accuracy of 0.8557 in ten-class classification, significantly higher than the highest accuracy of 0.6172 achieved when only the packet length feature is used, and higher than the highest accuracy of 0.6634 achieved when the spliced uplink and downlink traffic interaction graphs are input to the graph neural network. Furthermore, the classification accuracy achieved with GIT in graph-based classification is superior to that achieved without GIT.
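The size of these gaps follows directly from the three accuracy figures quoted above; the symbols below are introduced here only for the arithmetic.

```latex
\begin{aligned}
\Delta_{\text{seq}}   &= 0.8557 - 0.6172 = 0.2385 \\
\Delta_{\text{graph}} &= 0.8557 - 0.6634 = 0.1923
\end{aligned}
```

That is, the reported GIN result with integral graphs is about 23.9 percentage points above the best packet-length-only result and about 19.2 percentage points above the best result with spliced interaction graphs.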
The embodiment of the application also provides a single-feature encrypted stream detection system based on graph integration. The system comprises a computer device that includes a processor and a memory; the memory stores computer instructions, the processor is configured to execute the computer instructions stored in the memory, and the system implements the steps of the above method when the computer instructions are executed by the processor.
The embodiment of the application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the single-feature encrypted stream detection method based on graph integration. The computer-readable storage medium may be a tangible storage medium such as random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a floppy disk, a hard disk, a removable storage disk, a CD-ROM, or any other form of storage medium known in the art.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. Whether a particular implementation is hardware or software depends on the specific application of the solution and the design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. When implemented in hardware, it may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present application.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, and various modifications and variations can be made to the embodiments of the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (10)
1. A single-feature encrypted stream detection method based on graph integration, characterized in that the method comprises the steps of:
acquiring flow information, wherein the flow information comprises a plurality of data packets, and dividing the data packets into a plurality of data streams based on quintuple information of the data packets;
acquiring a first characteristic value of a data packet in a data stream, and giving a first mark or a second mark to the first characteristic value based on whether the data packet is an uplink data packet or a downlink data packet to obtain a second characteristic value;
constructing a second characteristic value in the data stream as a characteristic sequence, and constructing an uplink flow interactive graph and a downlink flow interactive graph based on the characteristic sequence, wherein the uplink flow interactive graph and the downlink flow interactive graph comprise a plurality of nodes;
respectively performing graph integration on the uplink flow interactive graph and the downlink flow interactive graph based on node attributes of the nodes of the uplink flow interactive graph and the downlink flow interactive graph to obtain an uplink flow integral graph and a downlink flow integral graph;
and splicing the uplink flow integral graph and the downlink flow integral graph to obtain a combined integral graph, and inputting the combined integral graph into a preset neural network classification model to obtain a detection result for each data stream.
2. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein in the step of dividing the plurality of data packets into a plurality of data streams based on the five-tuple information of the data packets, the division is based on the source IP address and the destination IP address in the five-tuple information, and data packets satisfying either of the following conditions are combined into one data stream: the source IP addresses are the same and the destination IP addresses are the same; or the destination IP address of one data packet is the same as the source IP address of another data packet and its source IP address is the same as the destination IP address of that other data packet.
3. The method for detecting a single-feature encrypted stream based on graph integration according to claim 1, wherein in the step of obtaining a first feature value of a packet in a data stream, a length of each packet is obtained by parsing, and the length of the packet is used as the first feature value.
4. The method according to claim 1, wherein in the step of assigning a first flag or a second flag to the first feature value based on the data packet being an upstream data packet or a downstream data packet to obtain a second feature value, the data packet is classified into an upstream data packet and a downstream data packet based on a source IP address and a destination IP address of the data packet in one data stream, the first flag is assigned to the upstream data packet, and the second flag is assigned to the downstream data packet.
5. The method according to claim 1, wherein in the step of assigning a first flag or a second flag to the first feature value based on the data packet being an upstream data packet or a downstream data packet, the first flag is a positive sign flag and the second flag is a negative sign flag.
6. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein the step of constructing an upstream traffic interaction graph and a downstream traffic interaction graph based on the feature sequence comprises:
constructing an uplink feature sequence and a downlink feature sequence based on the feature sequence, wherein the uplink feature sequence and the downlink feature sequence both comprise third feature values with the same number as the second feature values of the feature sequence;
and constructing an uplink flow interactive graph by taking the third characteristic value of the uplink characteristic sequence as the node attribute of the node in the uplink flow interactive graph, and constructing a downlink flow interactive graph by taking the third characteristic value of the downlink characteristic sequence as the node attribute of the node in the downlink flow interactive graph.
7. The graph integration-based single-feature encrypted stream detection method according to any one of claims 1 to 6, wherein in the step of performing graph integration on the upstream traffic interaction graph and the downstream traffic interaction graph, node attributes of nodes are updated based on the order of the nodes in the graph.
8. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein in the step of splicing the upstream traffic integral graph and the downstream traffic integral graph to obtain a joint integral graph, nodes in the upstream traffic integral graph and the downstream traffic integral graph are sequentially combined to obtain the joint integral graph.
9. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein in the step of inputting the joint integration graph to a predetermined neural network classification model to obtain a detection result for each data stream, the predetermined neural network classification model adopts a GCN classification model, a GIN classification model, or a GAT classification model.
10. A graph integration-based single-feature encrypted stream detection system, comprising a computer device, the computer device including a processor and a memory, the memory having computer instructions stored therein, the processor being configured to execute the computer instructions stored in the memory, wherein the system implements the steps of the method according to any one of claims 1-9 when the computer instructions are executed by the processor.
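The preprocessing recited in claims 1 to 5 (grouping packets into bidirectional streams by IP address pair, taking packet length as the first feature value, and attaching a positive or negative sign by direction) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names and sample packets are hypothetical, and treating the sender of the first packet in a stream as the uplink side is an assumption, since the claims do not state how the uplink direction is identified.

```python
from collections import defaultdict


def flow_key(src_ip, dst_ip):
    """Direction-independent key, so A->B and B->A fall into the same stream."""
    return tuple(sorted((src_ip, dst_ip)))


def build_signed_sequences(packets):
    """packets: iterable of (src_ip, dst_ip, length) in capture order.

    Returns {flow_key: [signed packet lengths]}.
    """
    flows = defaultdict(list)
    uplink_source = {}
    for src, dst, length in packets:
        key = flow_key(src, dst)
        # Assumption: the sender of the first packet seen is the uplink side.
        uplink_source.setdefault(key, src)
        sign = 1 if src == uplink_source[key] else -1
        flows[key].append(sign * length)
    return dict(flows)


# Hypothetical packets: (source IP, destination IP, length in bytes).
sample_packets = [
    ("10.0.0.2", "10.0.0.9", 60),
    ("10.0.0.9", "10.0.0.2", 66),
    ("10.0.0.2", "10.0.0.9", 1514),
    ("10.0.0.7", "10.0.0.9", 100),
]
sequences = build_signed_sequences(sample_packets)
print(sequences[("10.0.0.2", "10.0.0.9")])  # [60, -66, 1514]
```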
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311133687.1A CN116886637B (en) | 2023-09-05 | 2023-09-05 | Single-feature encryption stream detection method and system based on graph integration |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116886637A (en) | 2023-10-13 |
CN116886637B (en) | 2023-12-19 |
Family
ID=88262445
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311133687.1A Active CN116886637B (en) | 2023-09-05 | 2023-09-05 | Single-feature encryption stream detection method and system based on graph integration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116886637B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113795773A (en) * | 2019-03-08 | 2021-12-14 | 欧司朗股份有限公司 | Component for a LIDAR sensor system, LIDAR sensor device, method for a LIDAR sensor system and method for a LIDAR sensor device |
US20220066456A1 (en) * | 2016-02-29 | 2022-03-03 | AI Incorporated | Obstacle recognition method for autonomous robots |
CN114615093A (en) * | 2022-05-11 | 2022-06-10 | 南京信息工程大学 | Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning |
CN115242496A (en) * | 2022-07-20 | 2022-10-25 | 安徽工业大学 | Tor encrypted traffic application behavior classification method and device based on residual error network |
CN115303901A (en) * | 2022-08-05 | 2022-11-08 | 北京航空航天大学 | Elevator traffic flow identification method based on computer vision |
Non-Patent Citations (2)
Title |
---|
XIAOJUAN WANG et al.: "Evolutionary Algorithm-Based and Network Architecture Search-Enabled Multiobjective Traffic Classification", SPECIAL SECTION ON INTELLIGENT BIG DATA ANALYTICS FOR INTERNET OF THINGS, SERVICES AND PEOPLE * |
Wen Kun; Yang Jiahai; Cheng Fengjuan; Yin Hui; Wang Jianfeng: "Monitoring, locating and identification of RoQ attacks in backbone networks", Journal of Computer Research and Development, no. 04 * |
Also Published As
Publication number | Publication date |
---|---|
CN116886637B (en) | 2023-12-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111191767B (en) | Vectorization-based malicious traffic attack type judging method | |
CN107967311B (en) | Method and device for classifying network data streams | |
CN113328985B (en) | Passive Internet of things equipment identification method, system, medium and equipment | |
US20170316082A1 (en) | Classifying social media users | |
CN108173704A (en) | A kind of method and device of the net flow assorted based on representative learning | |
CN106254321A (en) | A kind of whole network abnormal data stream sorting technique | |
CN107483451B (en) | Method and system for processing network security data based on serial-parallel structure and social network | |
CN112468324B (en) | Graph convolution neural network-based encrypted traffic classification method and device | |
CN111526099A (en) | Internet of things application flow detection method based on deep learning | |
CN111355671B (en) | Network traffic classification method, medium and terminal equipment based on self-attention mechanism | |
CN103973589A (en) | Network traffic classification method and device | |
Zhang et al. | Density approach: a new model for BigData analysis and visualization | |
CN108199878B (en) | Personal identification information identification system and method in high-performance IP network | |
CN116886637B (en) | Single-feature encryption stream detection method and system based on graph integration | |
Dener et al. | RFSE-GRU: Data balanced classification model for mobile encrypted traffic in big data environment | |
CN116170237B (en) | Intrusion detection method fusing GNN and ACGAN | |
CN109327404B (en) | P2P prediction method and system based on naive Bayes classification algorithm, server and medium | |
CN104753934A (en) | Method for separating known protocol multi-communication-parties data stream into point-to-point data stream | |
Li et al. | Interaction matters: Encrypted traffic classification via status-based interactive behavior graph | |
Filasiak et al. | On the testing of network cyber threat detection methods on spam example | |
Pai et al. | An Interpretable Generalization Mechanism for Accurately Detecting Anomaly and Identifying Networking Intrusion Techniques | |
CN110689074A (en) | Feature selection method based on fuzzy set feature entropy value calculation | |
CN117424764B (en) | System resource access request information processing method and device, electronic equipment and medium | |
Keshapagu et al. | Analysis of datasets for network traffic classification | |
CN114048829B (en) | Network flow channelization time sequence screening method and device based on template construction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||