CN116886637B - Single-feature encryption stream detection method and system based on graph integration - Google Patents

Single-feature encryption stream detection method and system based on graph integration

Info

Publication number
CN116886637B
Authority
CN
China
Prior art keywords
graph
data packet
flow
feature
uplink
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311133687.1A
Other languages
Chinese (zh)
Other versions
CN116886637A (en
Inventor
何明枢
韩影
王小娟
阳柳
郭世泽
俞赛赛
路子逵
王欣蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202311133687.1A
Publication of CN116886637A
Application granted
Publication of CN116886637B


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a single-feature encrypted stream detection method and system based on graph integration. The method comprises the following steps: acquiring a plurality of data packets from traffic information, and dividing the data packets into a plurality of data streams based on the five-tuple information of the data packets; acquiring a first feature value of each data packet in a data stream, and assigning a first mark or a second mark to the first feature value, depending on whether the data packet is an uplink data packet or a downlink data packet, to obtain a second feature value; arranging the second feature values of the data stream into a feature sequence, and constructing an uplink traffic interaction graph and a downlink traffic interaction graph; performing graph integration on the uplink traffic interaction graph and the downlink traffic interaction graph, based on the node attributes of their nodes, to obtain an uplink traffic integral graph and a downlink traffic integral graph; and splicing the uplink traffic integral graph and the downlink traffic integral graph to obtain a joint integral graph, and inputting the joint integral graph into a preset neural network classification model to obtain a detection result.

Description

Single-feature encryption stream detection method and system based on graph integration
Technical Field
The invention relates to the technical field of traffic detection, and in particular to a single-feature encrypted stream detection method and system based on graph integration.
Background
In the prior art, encrypting the data transmitted over the Internet of Things (IoT) effectively reduces the exposure of IoT devices to malicious stream attacks. Extracting basic attributes from the headers of encrypted data provides an additional benefit, potentially enabling real-time classification.
However, classification and detection based on machine learning in existing research generally requires the fusion of multidimensional features and often operates only on the raw data itself, which results in low detection accuracy.
Disclosure of Invention
In view of this, embodiments of the present invention provide a single-feature encrypted stream detection method based on graph integration to obviate or ameliorate one or more disadvantages of the prior art.
One aspect of the present invention provides a single-feature encrypted stream detection method based on graph integration, the method comprising the following steps:
acquiring traffic information, wherein the traffic information comprises a plurality of data packets, and dividing the data packets into a plurality of data streams based on the five-tuple information of the data packets;
acquiring a first feature value of each data packet in a data stream, and assigning a first mark or a second mark to the first feature value, depending on whether the data packet is an uplink data packet or a downlink data packet, to obtain a second feature value;
arranging the second feature values of the data stream into a feature sequence, and constructing an uplink traffic interaction graph and a downlink traffic interaction graph based on the feature sequence, wherein the uplink traffic interaction graph and the downlink traffic interaction graph each comprise a plurality of nodes;
performing graph integration on the uplink traffic interaction graph and the downlink traffic interaction graph respectively, based on the node attributes of their nodes, to obtain an uplink traffic integral graph and a downlink traffic integral graph;
and splicing the uplink traffic integral graph and the downlink traffic integral graph to obtain a joint integral graph, and inputting the joint integral graph into a preset neural network classification model to obtain a detection result for each data stream.
With this scheme, the whole detection process can be completed using only the first feature value of the data packets, without fusing multidimensional features, which reduces the computational difficulty. The scheme classifies data streams by constructing a joint integral graph: on the one hand, applying graph integration to data stream detection improves detection accuracy; on the other hand, integrating over the graph of original node attributes takes full account of the capability of passing information between nodes, which further improves detection accuracy.
In some embodiments of the present invention, in the step of dividing the plurality of data packets into the plurality of data streams based on the five-tuple information of the data packets, the source IP address and the destination IP address in the five-tuple information are used: data packets are grouped into one data stream if, relative to one another, they satisfy either of two conditions, namely that they have the same source IP address and the same destination IP address, or that the source IP address of one equals the destination IP address of the other and vice versa.
In some embodiments of the present invention, in the step of acquiring the first feature value of a data packet in the data stream, the length of each data packet is obtained by parsing and is taken as the first feature value.
In some embodiments of the present invention, in the step of assigning a first mark or a second mark to the first feature value to obtain the second feature value, the data packets in one data stream are classified into uplink data packets and downlink data packets based on their source IP addresses and destination IP addresses; the first mark is assigned to uplink data packets and the second mark is assigned to downlink data packets.
In some embodiments of the present invention, in the step of assigning a first mark or a second mark to the first feature value, the first mark is a positive sign and the second mark is a negative sign.
In some embodiments of the present invention, the step of constructing an uplink traffic interaction graph and a downlink traffic interaction graph based on the feature sequence includes:
constructing an uplink feature sequence and a downlink feature sequence based on the feature sequence, wherein the uplink feature sequence and the downlink feature sequence each comprise as many third feature values as the feature sequence has second feature values;
and constructing the uplink traffic interaction graph by taking the third feature values of the uplink feature sequence as the node attributes of its nodes, and constructing the downlink traffic interaction graph by taking the third feature values of the downlink feature sequence as the node attributes of its nodes.
In some embodiments of the present invention, in the step of performing graph integration on the uplink traffic interaction graph and the downlink traffic interaction graph, the node attributes of the nodes are updated according to the order of the nodes in the graph.
In some embodiments of the present invention, in the step of splicing the uplink traffic integral graph and the downlink traffic integral graph to obtain a joint integral graph, the nodes of the uplink traffic integral graph and the downlink traffic integral graph are combined in sequence to obtain the joint integral graph.
In some embodiments of the present invention, in the step of inputting the joint integral graph into a preset neural network classification model to obtain a detection result for each data stream, the preset neural network classification model is a GCN classification model, a GIN classification model, or a GAT classification model.
A second aspect of the present invention provides a single-feature encrypted stream detection system based on graph integration. The system comprises a computer device comprising a processor and a memory; the memory stores computer instructions, the processor is configured to execute the computer instructions stored in the memory, and the system implements the steps of the method described above when the computer instructions are executed by the processor.
A third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the aforementioned single-feature encrypted stream detection method based on graph integration.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the above-described specific ones, and that the above and other objects that can be achieved with the present invention will be more clearly understood from the following detailed description.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate and together with the description serve to explain the invention.
FIG. 1 is a schematic diagram of an embodiment of a single feature encrypted stream detection method based on graph integration according to the present invention;
FIG. 2 is a schematic diagram of another embodiment of a single-feature encrypted stream detection method based on graph integration according to the present invention;
FIG. 3 is a schematic diagram of integrating the uplink traffic interaction graph to obtain the uplink traffic integral graph;
FIG. 4 is a schematic diagram of integrating the downlink traffic interaction graph to obtain the downlink traffic integral graph;
FIG. 5 is a schematic diagram of splicing the uplink traffic integral graph and the downlink traffic integral graph to obtain the joint integral graph.
Detailed Description
The present invention will be described in further detail with reference to the following embodiments and the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. The exemplary embodiments of the present invention and the descriptions thereof are used herein to explain the present invention, but are not intended to limit the invention.
It should be noted here that, in order to avoid obscuring the present invention due to unnecessary details, only structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details not greatly related to the present invention are omitted.
In order to solve the above problems, as shown in FIG. 1, the present invention provides a single-feature encrypted stream detection method based on graph integration, the steps of which include:
Step S100, acquiring traffic information, wherein the traffic information comprises a plurality of data packets, and dividing the data packets into a plurality of data streams based on the five-tuple information of the data packets;
In a specific implementation, pyshark (a Python packet-parsing tool built on Wireshark) is used to parse the pcap (packet capture) file and extract the data packets; specifically, traffic information can be collected whenever a user and a client exchange information.
In a specific implementation process, the five-tuple information of the data packet includes a source IP address, a source port, a destination IP address, a destination port and a transport layer protocol.
With this scheme, because the traffic information generally covers many types of traffic, all data packets in the traffic information are first split into independent streams; after extraction is complete, the data of each data packet is written to a CSV file, one row per packet.
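The row-per-packet CSV output described above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the patent's implementation: the `PacketRecord` class, its field names, and the sample addresses and ports are hypothetical, and the parsing step that would fill these records from a pcap file is left out.

```python
import csv
import io
from dataclasses import dataclass, astuple, fields


@dataclass(frozen=True)
class PacketRecord:
    """One parsed packet: its five-tuple plus the single feature (length)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    length: int


def write_packets_csv(packets, out):
    """Write a header row, then one CSV row per packet."""
    writer = csv.writer(out)
    writer.writerow([f.name for f in fields(PacketRecord)])
    for pkt in packets:
        writer.writerow(astuple(pkt))


# Hypothetical sample packets; a real run would take them from the parser.
packets = [
    PacketRecord("10.0.0.2", 50000, "10.0.0.9", 1883, "TCP", 66),
    PacketRecord("10.0.0.9", 1883, "10.0.0.2", 50000, "TCP", 60),
]
buffer = io.StringIO()  # stands in for an open CSV file
write_packets_csv(packets, buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; passing a file opened with `newline=""` would produce the CSV file on disk instead.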
Step S200, acquiring a first feature value of each data packet in a data stream, and assigning a first mark or a second mark to the first feature value, depending on whether the data packet is an uplink data packet or a downlink data packet, to obtain a second feature value;
In a specific implementation, the first feature value may be the data length in a data packet or the data packet size.
In a specific implementation, the first mark and the second mark may be a positive sign and a negative sign: if the first mark is the positive sign, the second mark is the negative sign; if the first mark is the negative sign, the second mark is the positive sign.
In a specific implementation, in the step of assigning the first mark or the second mark to the first feature value, the CSV file obtained earlier is read as a DataFrame.
Step S300, arranging the second feature values of the data stream into a feature sequence, and constructing an uplink traffic interaction graph and a downlink traffic interaction graph based on the feature sequence, wherein the uplink traffic interaction graph and the downlink traffic interaction graph each comprise a plurality of nodes;
In some embodiments of the present invention, an uplink feature sequence and a downlink feature sequence are constructed based on the feature sequence. To construct the uplink feature sequence, the second feature values of the downlink data packets in the feature sequence are replaced with 0; to construct the downlink feature sequence, the second feature values of the uplink data packets in the feature sequence are replaced with 0. In both cases the original order of the second feature values in the feature sequence is preserved.
In a specific implementation, suppose the first mark is the positive sign and the second mark is the negative sign, so that the first mark is assigned to the first feature values of uplink data packets and the second mark to those of downlink data packets. If the feature sequence is -66, -66, -60, +60, +1514, -183, then the uplink feature sequence is 0, 0, 0, +60, +1514, 0 and the downlink feature sequence is -66, -66, -60, 0, 0, -183. The uplink traffic interaction graph is constructed by taking the third feature values of the uplink feature sequence as the node attributes of its nodes, and the downlink traffic interaction graph is constructed by taking the third feature values of the downlink feature sequence as the node attributes of its nodes.
Step S400, performing graph integration on the uplink traffic interaction graph and the downlink traffic interaction graph respectively, based on the node attributes of their nodes, to obtain an uplink traffic integral graph and a downlink traffic integral graph;
Step S500, splicing the uplink traffic integral graph and the downlink traffic integral graph to obtain a joint integral graph, and inputting the joint integral graph into a preset neural network classification model to obtain a detection result for each data stream.
In a specific implementation, each joint integral graph corresponds to one data stream. The preset neural network classification model is a graph neural network model, and the detection result is output by the classifier of that model; the detection result may indicate whether the stream is malicious or non-malicious, the type of malicious stream, and so on.
In a specific implementation, the neural network classification model may be a trained GCN classification model, GIN classification model, or GAT classification model.
In a specific implementation, the graph of this scheme is represented as a tree structure and is stored in a cross linked list (orthogonal list) data structure, which reduces storage pressure. The cross linked list is a storage structure for graphs at the low level of the computer; through it, the structural information of the graph can be read easily and the related integration operations can be performed on the nodes.
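A cross linked list (orthogonal list) keeps, for every vertex, one linked list of outgoing arcs and one of incoming arcs, with each arc record shared by both lists. The sketch below is a generic textbook version of that structure, not the patent's storage layout; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arc:
    tail: int                           # index of the vertex the arc leaves
    head: int                           # index of the vertex the arc enters
    next_out: Optional["Arc"] = None    # next arc with the same tail
    next_in: Optional["Arc"] = None     # next arc with the same head


@dataclass
class Vertex:
    attr: int                           # node attribute (e.g. a signed length)
    first_out: Optional[Arc] = None     # head of the outgoing-arc list
    first_in: Optional[Arc] = None      # head of the incoming-arc list


class OrthogonalList:
    def __init__(self, attrs):
        self.vertices = [Vertex(a) for a in attrs]

    def add_arc(self, tail, head):
        """Insert one arc record at the front of both lists it belongs to."""
        arc = Arc(tail, head,
                  next_out=self.vertices[tail].first_out,
                  next_in=self.vertices[head].first_in)
        self.vertices[tail].first_out = arc
        self.vertices[head].first_in = arc

    def successors(self, v):
        arc, result = self.vertices[v].first_out, []
        while arc is not None:
            result.append(arc.head)
            arc = arc.next_out
        return result

    def predecessors(self, v):
        arc, result = self.vertices[v].first_in, []
        while arc is not None:
            result.append(arc.tail)
            arc = arc.next_in
        return result


# A three-node chain with illustrative attributes.
graph = OrthogonalList([0, 60, 1514])
graph.add_arc(0, 1)
graph.add_arc(1, 2)
```

Because each arc is stored once and reachable from both endpoints, predecessors and successors of a node can be read without scanning the whole edge set, which is what an integration pass over the nodes needs.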
With this scheme, the whole detection process can be completed using only the first feature value of the data packets, without fusing multidimensional features, which reduces the computational difficulty. The scheme classifies data streams by constructing a joint integral graph: on the one hand, applying graph integration to data stream detection improves detection accuracy; on the other hand, integrating over the graph of original node attributes takes full account of the capability of passing information between nodes, which further improves detection accuracy.
In some embodiments of the present invention, in the step of dividing the plurality of data packets into the plurality of data streams based on the five-tuple information of the data packets, the source IP address and the destination IP address in the five-tuple information are used: data packets are grouped into one data stream if, relative to one another, they satisfy either of two conditions, namely that they have the same source IP address and the same destination IP address, or that the source IP address of one equals the destination IP address of the other and vice versa.
In a specific implementation, one data stream includes uplink data packets and downlink data packets: if a data packet whose source IP address is A and whose destination IP address is B is an uplink data packet, then a data packet whose source IP address is B and whose destination IP address is A is a downlink data packet.
For example, suppose data packet x1 has source IP address A and destination IP address B; data packet x2 has source IP address A and destination IP address B; data packet x3 has source IP address B and destination IP address A; and data packet x4 has source IP address B and destination IP address A. Any two of these four data packets satisfy one of the two conditions (identical source and destination IP addresses, or source and destination IP addresses swapped), so the four data packets together are treated as one data stream.
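The grouping rule in this example amounts to using the unordered pair of IP addresses as the stream key. The following is a minimal sketch of that idea; the function names and the extra packet `y1` are hypothetical, and a fuller key could also include the ports and the transport protocol from the five-tuple.

```python
from collections import defaultdict


def flow_key(src_ip, dst_ip):
    """Direction-independent key: A->B and B->A map to the same stream."""
    return tuple(sorted((src_ip, dst_ip)))


def group_into_streams(packets):
    """Group (name, src_ip, dst_ip) tuples into streams, keeping arrival order."""
    streams = defaultdict(list)
    for name, src_ip, dst_ip in packets:
        streams[flow_key(src_ip, dst_ip)].append(name)
    return dict(streams)


packets = [
    ("x1", "A", "B"),
    ("x2", "A", "B"),
    ("x3", "B", "A"),
    ("x4", "B", "A"),
    ("y1", "A", "C"),  # hypothetical extra packet that belongs to another stream
]
streams = group_into_streams(packets)
```

Sorting the two addresses is one simple way to make the key symmetric; any canonical ordering would serve the same purpose.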
In some embodiments of the present invention, in the step of acquiring the first feature value of a data packet in the data stream, the length of each data packet is obtained by parsing and is taken as the first feature value.
With this scheme, the whole detection process can be completed using a single feature of the data packets, without fusing multidimensional features, which reduces the computational difficulty.
In some embodiments of the present invention, in the step of assigning a first mark or a second mark to the first feature value to obtain the second feature value, the data packets in one data stream are classified into uplink data packets and downlink data packets based on their source IP addresses and destination IP addresses; the first mark is assigned to uplink data packets and the second mark is assigned to downlink data packets.
In some embodiments of the present invention, in the step of assigning a first mark or a second mark to the first feature value, the first mark is a positive sign and the second mark is a negative sign.
In a specific implementation, the first feature value may be the data length in a data packet or the data packet size.
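Assigning the sign marks can be sketched as multiplying each packet length by +1 or -1 according to its direction. In this sketch the `uplink_src` parameter (the address whose outgoing packets count as uplink) and the sample packets are assumptions made for illustration; the text does not fix which endpoint defines the uplink direction.

```python
def mark_lengths(packets, uplink_src):
    """Return signed lengths: +length for uplink packets, -length for downlink."""
    signed = []
    for src_ip, length in packets:
        sign = 1 if src_ip == uplink_src else -1
        signed.append(sign * length)
    return signed


# (source IP, packet length) in arrival order; the values are illustrative.
packets = [("B", 66), ("B", 66), ("B", 60), ("A", 60), ("A", 1514), ("B", 183)]
feature_sequence = mark_lengths(packets, uplink_src="A")
```

The result is the feature sequence of second feature values, with the arrival order of the packets preserved.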
As shown in FIG. 2, in some embodiments of the present invention, step S300 includes step S310, arranging the second feature values of the data stream into a feature sequence;
In a specific implementation, the step of constructing the uplink traffic interaction graph and the downlink traffic interaction graph based on the feature sequence includes:
Step S320, constructing an uplink feature sequence and a downlink feature sequence based on the feature sequence, wherein the uplink feature sequence and the downlink feature sequence each comprise as many third feature values as the feature sequence has second feature values;
In a specific implementation, to construct the uplink feature sequence, the second feature values of the downlink data packets in the feature sequence are replaced with 0; to construct the downlink feature sequence, the second feature values of the uplink data packets in the feature sequence are replaced with 0. In both cases the original order of the second feature values in the feature sequence is preserved.
In a specific implementation, suppose the first mark is the positive sign and the second mark is the negative sign, so that the first mark is assigned to the first feature values of uplink data packets and the second mark to those of downlink data packets. If the feature sequence is -66, -66, -60, +60, +1514, -183, then the uplink feature sequence is 0, 0, 0, +60, +1514, 0 and the downlink feature sequence is -66, -66, -60, 0, 0, -183.
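The zero-replacement rule can be sketched in a few lines, assuming (as in the example) that positive values mark uplink packets and negative values mark downlink packets. The function name and the input values are illustrative.

```python
def split_directions(feature_sequence):
    """Zero out the opposite direction while keeping length and order."""
    uplink = [v if v > 0 else 0 for v in feature_sequence]
    downlink = [v if v < 0 else 0 for v in feature_sequence]
    return uplink, downlink


feature_sequence = [-66, -66, -60, 60, 1514, -183]  # illustrative signed lengths
uplink_sequence, downlink_sequence = split_directions(feature_sequence)
```

Both outputs have exactly as many entries as the input, so every packet keeps its position in both sequences.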
Step S330, constructing the uplink traffic interaction graph by taking the third feature values of the uplink feature sequence as the node attributes of its nodes, and constructing the downlink traffic interaction graph by taking the third feature values of the downlink feature sequence as the node attributes of its nodes.
In a specific implementation, the third feature values of the uplink feature sequence are taken as the node attributes of the nodes in the uplink traffic interaction graph, and an edge is constructed between the nodes corresponding to adjacent third feature values, which yields the uplink traffic interaction graph; likewise, the third feature values of the downlink feature sequence are taken as the node attributes of the nodes in the downlink traffic interaction graph, and an edge is constructed between the nodes corresponding to adjacent third feature values, which yields the downlink traffic interaction graph.
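Linking the nodes of adjacent feature values produces a chain: node i is connected to node i + 1. A minimal sketch follows; storing each edge as an ordered pair that follows packet order is an assumption, since the text does not say whether the edges are directed.

```python
def build_interaction_graph(sequence):
    """Return (node_attrs, edges) for a chain in which node i links to node i + 1."""
    node_attrs = list(sequence)
    edges = [(i, i + 1) for i in range(len(node_attrs) - 1)]
    return node_attrs, edges


uplink_sequence = [0, 0, 0, 60, 1514, 0]  # illustrative uplink feature sequence
up_attrs, up_edges = build_interaction_graph(uplink_sequence)
```

A chain over n nodes has n - 1 edges, which is the minimum for a connected graph and matches the tree-structured representation mentioned elsewhere in the description.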
With this scheme, the uplink traffic interaction graph and the downlink traffic interaction graph are each constructed from the values of a single feature of the data packets, on the basis of the feature sequence, so that the data is represented separately in the uplink and downlink directions; this improves the accuracy with which both directions are examined in the final detection.
In some embodiments of the present invention, in the step of performing graph integration on the uplink traffic interaction graph and the downlink traffic interaction graph, the node attributes of the nodes are updated according to the order of the nodes in the graph.
As shown in FIG. 3 and FIG. 4, in a specific implementation, if the node attributes of the nodes in the uplink traffic interaction graph are 0, 0, 0, +60, +1514, 0, graph integration is performed on the uplink traffic interaction graph as follows. In the first integration pass, the original node attribute of the first node is added to the node attribute of every node after the first node, and the node attributes become 0, 0, 0, +60, +1514, 0. In the second pass, the original node attribute of the second node is added to the node attribute of every node after the second node, and the node attributes remain 0, 0, 0, +60, +1514, 0. The third pass likewise leaves the node attributes at 0, 0, 0, +60, +1514, 0. In the fourth pass the node attributes become 0, 0, 0, +60, +1574, +60. In the fifth pass the node attributes become 0, 0, 0, +60, +1574, +1574, which are the node attributes of the nodes in the uplink traffic integral graph;
If the node attributes of the nodes in the downlink traffic interaction graph are -66, -66, -60, 0, 0, -183, then in the first integration pass on the downlink traffic interaction graph the original node attribute of the first node is added to the node attribute of every node after the first node, and the node attributes become -66, -132, -126, -66, -66, -249. In the second pass, the original node attribute of the second node is added to the node attribute of every node after the second node, and the node attributes become -66, -132, -192, -132, -132, -315. Proceeding in the same way, the third pass gives -66, -132, -192, -192, -192, -375, and the fourth and fifth passes leave these values unchanged, so the node attributes of the nodes in the downlink traffic integral graph are -66, -132, -192, -192, -192, -375.
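The pass-by-pass update can be sketched as follows, under one reading of the worked example: in pass i, the original attribute of node i is added to every node after it. On a chain this ends in the running (prefix) sum of the original attributes, which the sketch checks with `itertools.accumulate`. The function name and the input values are illustrative.

```python
from itertools import accumulate


def integrate_stepwise(node_attrs):
    """Return the node attributes after each integration pass."""
    original = list(node_attrs)
    current = list(node_attrs)
    history = []
    # The last node has no successors, so n - 1 passes are enough.
    for i in range(len(original) - 1):
        # Add the ORIGINAL attribute of node i to every node that follows it.
        for j in range(i + 1, len(current)):
            current[j] += original[i]
        history.append(list(current))
    return history


downlink = [-66, -66, -60, 0, 0, -183]  # illustrative downlink node attributes
passes = integrate_stepwise(downlink)
downlink_integral = passes[-1]
prefix_sums = list(accumulate(downlink))
```

The stepwise form costs O(n²) additions while the prefix sum costs O(n); for a chain they give the same node attributes, so the cheaper form may be preferable when the intermediate passes are not needed.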
With this scheme, integrating the uplink traffic interaction graph and the downlink traffic interaction graph enriches the data in the two graphs. The scheme is combined with the topology of the graph structure, which further reflects the capability of passing information between nodes: the principle of path integration is applied to the graph structure, and the information passing between nodes is used to improve the accuracy of single-feature detection.
In some embodiments of the present invention, in the step of splicing the uplink traffic integral graph and the downlink traffic integral graph to obtain a joint integral graph, the nodes of the uplink traffic integral graph and the downlink traffic integral graph are combined in sequence to obtain the joint integral graph.
In some embodiments of the present invention, as shown in FIG. 5, if the node attributes of the nodes in the uplink traffic integral graph are 0, 0, 0, +60, +1574, +1574 and the node attributes of the nodes in the downlink traffic integral graph are -66, -132, -192, -192, -192, -375, the nodes of the uplink traffic integral graph and the downlink traffic integral graph are combined in sequence to obtain the joint integral graph, whose node attributes are 0, 0, 0, +60, +1574, +1574, -66, -132, -192, -192, -192, -375.
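Splicing can be sketched as concatenating the two node lists and shifting the indices of the downlink nodes. The text specifies only that the nodes are combined in sequence; keeping the chain edges inside each half and adding no edge between the halves is an assumption of this sketch, as are the function name and the input values.

```python
def splice(up_attrs, down_attrs):
    """Concatenate node attributes; downlink node indices shift by len(up_attrs)."""
    joint_attrs = list(up_attrs) + list(down_attrs)
    offset = len(up_attrs)
    up_edges = [(i, i + 1) for i in range(len(up_attrs) - 1)]
    down_edges = [(offset + i, offset + i + 1) for i in range(len(down_attrs) - 1)]
    # Assumption: no edge is added between the uplink half and the downlink half.
    return joint_attrs, up_edges + down_edges


up_integral = [0, 0, 0, 60, 1574, 1574]             # illustrative values
down_integral = [-66, -132, -192, -192, -192, -375]  # illustrative values
joint_attrs, joint_edges = splice(up_integral, down_integral)
```

The joint graph then carries one node per packet per direction, which is the input a graph classifier would receive for this data stream.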
In a specific implementation, traffic detection is performed by converting the original traffic interaction information into a graph and applying graph integration, that is, by performing discrete integration over the nodes of the original traffic interaction graphs to construct the uplink traffic integral graph and the downlink traffic integral graph. Graph integration improves detection capability over the original graph, so the scheme achieves better detection results with only a small number of features.
The scheme combines neighborhood node features with the client-server interactions of the MQTT and CoAP protocols to convert the network traffic classification problem into a graph classification problem, so that detection capability is enhanced even when only limited features are used.
In some embodiments of the present invention, in the step of inputting the joint integral graph into a preset neural network classification model to obtain a detection result for each data stream, the preset neural network classification model is a GCN classification model, a GIN classification model, or a GAT classification model.
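To give a sense of what a GCN-style classifier does with such a graph, the sketch below implements one standard GCN propagation step, relu(D^-1/2 (A + I) D^-1/2 X W), followed by a mean readout over the nodes. It is an illustration only: the weights are made up rather than trained, the three-node graph is a toy, and a practical system would presumably use a trained model from a graph learning library rather than this hand-written layer.

```python
import math


def gcn_layer(adj, features, weight):
    """One GCN propagation step: relu(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops, then compute the degree of every node.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features: norm @ features.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n))
            for f in range(len(features[0]))] for i in range(n)]
    # Linear transform followed by ReLU: relu(agg @ weight).
    return [[max(0.0, sum(agg[i][f] * weight[f][h] for f in range(len(weight))))
             for h in range(len(weight[0]))] for i in range(n)]


def mean_readout(node_embeddings):
    """Average node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    width = len(node_embeddings[0])
    return [sum(row[h] for row in node_embeddings) / n for h in range(width)]


# Toy chain of three nodes with one feature each; weights are made up.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [2.0], [3.0]]
weight = [[1.0, -1.0]]
hidden = gcn_layer(adj, features, weight)
graph_vector = mean_readout(hidden)
```

In a full classifier the graph-level vector would be passed through a trained output layer to produce the per-stream label; GIN and GAT differ in how the neighbour aggregation step is computed.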
In a specific implementation process, the scheme detects the original traffic through a single feature such as a data packet length, and in a previous deep learning detection method, only a few selected features are usually needed to classify malicious streams. In the scheme, the method is applied to the construction of the graph based on the data packet length sequence only, so that the detection capability of a single-feature-based method (such as the data packet length sequence) is enhanced.
In a specific implementation, existing flow detection mainly has the following problems:
1. Existing studies using machine learning typically require fusing multidimensional features and computationally intensive feature processing (e.g., dynamic time warping of packet time series), while using only a few low-level features makes accurate detection challenging.
2. Existing graph structures have large storage redundancy and add considerable computational cost.
3. Existing path integration methods operate only at the level of the original data and cannot be combined with topological structures such as graphs. The ability to pass information between nodes is therefore ignored, which adversely affects analysis and decision making for data with limited features.
The invention provides a theory of graph integration and applies it to network flow detection in the Internet of Things and the Internet, thereby solving the problem of low accuracy when network flows are classified using a small number of features. For graph storage, the structure used is the same one in which graphs are stored at the low level of the computer, so the scheme can easily read the structural information of the graph and perform the related integration operations on the nodes. In addition to the common flow interaction graph, the scheme also provides a lightweight tree-structured representation of the flow interaction graph; for the same nodes, the tree structure has fewer edges, which reduces storage and computation costs.
The beneficial effects of this scheme include:
1. The scheme proposes a theory of graph integration for the first time and applies it to network flow detection in the Internet of Things and the Internet, solving the problem of low accuracy when network flows are classified using a small number of features. For graph storage, the structure takes the form of a cross linked list, which is also how graphs are stored at the low level of the computer, so the structural information of the graph can be read easily and the related integration operations can be performed on the nodes. In addition to the common flow interaction graph, a lightweight tree structure is provided for representing the flow interaction graph; for the same nodes, the tree structure has fewer edges, which reduces storage and computation costs.
2. Existing path integration methods operate only at the level of the original data and cannot be combined with topological structures such as graphs. The ability to pass information between nodes is therefore ignored, which adversely affects analysis and decision making for data with limited features. The invention applies the path integral principle to the graph structure and improves the accuracy of single-feature detection by exploiting information passing between nodes.
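The cross linked list storage mentioned in item 1 can be sketched as follows. This is a generic textbook-style cross linked (orthogonal) list for a directed graph, not the patent's actual data layout; all class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Arc:
    tail: int                          # index of the node the arc leaves
    head: int                          # index of the node the arc enters
    next_out: Optional["Arc"] = None   # next arc leaving the same tail
    next_in: Optional["Arc"] = None    # next arc entering the same head


@dataclass
class Vertex:
    attr: int                          # node attribute, e.g. a signed length
    first_out: Optional[Arc] = None
    first_in: Optional[Arc] = None


class CrossLinkedGraph:
    def __init__(self, attrs):
        self.vertices = [Vertex(a) for a in attrs]

    def add_arc(self, tail, head):
        # Insert the new arc at the front of both chains it belongs to.
        arc = Arc(tail, head,
                  next_out=self.vertices[tail].first_out,
                  next_in=self.vertices[head].first_in)
        self.vertices[tail].first_out = arc
        self.vertices[head].first_in = arc

    def successors(self, v):
        result, arc = [], self.vertices[v].first_out
        while arc is not None:
            result.append(arc.head)
            arc = arc.next_out
        return result

    def predecessors(self, v):
        result, arc = [], self.vertices[v].first_in
        while arc is not None:
            result.append(arc.tail)
            arc = arc.next_in
        return result


g = CrossLinkedGraph([0, 60, 1574])
g.add_arc(0, 1)
g.add_arc(1, 2)
print(g.successors(0), g.predecessors(2))  # [1] [1]
```

Because every arc sits on both an outgoing and an incoming chain, the predecessors and successors of a node can each be read in one pass, which is the kind of structural access an integration over nodes would need.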
Experimental example:
In the experimental example, as shown in Table one below, GCN indicates that the graph neural network used is a graph convolutional network (Graph Convolutional Network, GCN), GIN a graph isomorphism network (Graph Isomorphism Network, GIN), and GAT a graph attention network (Graph Attention Network, GAT). Gu denotes the uplink flow interaction graph, Gd the downlink flow interaction graph, Guf the uplink flow integral graph, and Gdf the downlink flow integral graph. Gu+Gd means that the spliced uplink and downlink flow interaction graphs are used as input to the graph neural network; Guf+Gdf means that the spliced uplink and downlink flow integral graphs of this scheme are used as input to the graph neural network.
In a specific implementation, to further investigate the usefulness of GIT in multi-class classification, we used the UNSW-NB15 dataset. To realistically reproduce the ratio of malicious to benign streams, we extracted 2000 packets for each attack type from the dataset and mixed them with 20000 packets of benign streams. We chose only the packet length sequence as the feature for analysis. We obtained classification results from the packet length sequences with LSTM and RNN, and from graphs built on the length sequences with GCN, GIN and GAT.
Table one
The original sequence used in this experiment is the sequence built from the lengths of the data packets. In this experimental example, GCN, GIN, GAT and LSTM all used a batch_size of 128 and 64 as the number of hidden layers, so their configurations were closely matched.
As shown in Table one, feeding the original sequence to the LSTM neural network gives the lowest classification accuracy. Furthermore, for each graph neural network, using the spliced uplink and downlink flow integral graphs as input gives higher classification accuracy than using the spliced uplink and downlink flow interaction graphs as input.
From the experimental results, we can conclude that the present scheme is effective for multi-class classification. In addition, the GIN classifier achieves an accuracy of 0.8557 in ten-class classification, significantly higher than the highest accuracy of 0.6172 achieved when only packet length features are used, and higher than the highest accuracy of 0.6634 achieved with the spliced uplink and downlink flow interaction graphs as input to the graph neural network. Furthermore, the classification accuracy achieved using GIT in graph-based classification is superior to that achieved without GIT.
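The size of the improvement follows directly from the three accuracies quoted above. A short Python check, using only those figures (the variable names are illustrative):

```python
acc_gin_scheme = 0.8557      # GIN classifier under the present scheme
acc_length_only = 0.6172     # best accuracy with only packet length features
acc_interaction = 0.6634     # best accuracy with spliced interaction graphs

# Absolute accuracy gains, rounded to four decimal places.
gain_over_length_only = round(acc_gin_scheme - acc_length_only, 4)
gain_over_interaction = round(acc_gin_scheme - acc_interaction, 4)

print(gain_over_length_only, gain_over_interaction)  # 0.2385 0.1923
```

That is roughly 24 percentage points over the packet-length-only baseline and roughly 19 percentage points over the interaction graphs without integration.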
An embodiment of the invention also provides a single-feature encryption stream detection system based on graph integration. The system comprises a computer device that includes a processor and a memory; the memory stores computer instructions, the processor executes the computer instructions stored in the memory, and when the computer instructions are executed by the processor the system implements the steps of the method described above.
An embodiment of the invention also provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the single-feature encryption stream detection method based on graph integration. The computer readable storage medium may be a tangible storage medium such as random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a floppy disk, a hard disk, a removable memory disk, a CD-ROM, or any other form of storage medium known in the art.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein can be implemented as hardware, software, or a combination of both. Whether a particular implementation uses hardware or software depends on the specific application of the solution and its design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave.
It should be understood that the invention is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and shown, and those skilled in the art can make various changes, modifications and additions, or change the order between steps, after appreciating the spirit of the present invention.
In this disclosure, features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations can be made to the embodiments of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A single-feature encrypted stream detection method based on graph integration, characterized in that the method comprises the steps of:
acquiring flow information, wherein the flow information comprises a plurality of data packets, and dividing the data packets into a plurality of data streams based on quintuple information of the data packets;
acquiring a first characteristic value of a data packet in a data stream, and giving a first mark or a second mark to the first characteristic value based on whether the data packet is an uplink data packet or a downlink data packet to obtain a second characteristic value;
constructing a second characteristic value in the data stream as a characteristic sequence, and constructing an uplink flow interactive graph and a downlink flow interactive graph based on the characteristic sequence, wherein the uplink flow interactive graph and the downlink flow interactive graph comprise a plurality of nodes;
respectively performing graph integration on the uplink flow interactive graph and the downlink flow interactive graph based on node attributes of the nodes in the uplink flow interactive graph and the downlink flow interactive graph to obtain an uplink flow integral graph and a downlink flow integral graph;
and splicing the uplink flow integral graph and the downlink flow integral graph to obtain a combined integral graph, and inputting the combined integral graph into a preset neural network classification model to obtain a detection result for each data stream.
2. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein in the step of dividing the plurality of data packets into a plurality of data streams based on five-tuple information of the data packets, the division is based on the source IP address and the destination IP address in the five-tuple information, and data packets satisfying either of the following conditions are grouped into one data stream: the source IP addresses are the same and the destination IP addresses are the same; or the destination IP address of one data packet is the same as the source IP address of another data packet and its source IP address is the same as the destination IP address of that other data packet.
3. The method for detecting a single-feature encrypted stream based on graph integration according to claim 1, wherein in the step of obtaining a first feature value of a packet in a data stream, a length of each packet is obtained by parsing, and the length of the packet is used as the first feature value.
4. The method according to claim 1, wherein in the step of assigning a first flag or a second flag to the first feature value based on the data packet being an upstream data packet or a downstream data packet to obtain a second feature value, the data packet is classified into an upstream data packet and a downstream data packet based on a source IP address and a destination IP address of the data packet in one data stream, the first flag is assigned to the upstream data packet, and the second flag is assigned to the downstream data packet.
5. The method according to claim 1, wherein in the step of assigning a first flag or a second flag to the first feature value based on the data packet being an upstream data packet or a downstream data packet, the first flag is a positive sign flag and the second flag is a negative sign flag.
6. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein the step of constructing an upstream traffic interaction graph and a downstream traffic interaction graph based on the feature sequence comprises:
constructing an uplink feature sequence and a downlink feature sequence based on the feature sequence, wherein the uplink feature sequence and the downlink feature sequence both comprise third feature values with the same number as the second feature values of the feature sequence;
and constructing an uplink flow interactive graph by taking the third characteristic value of the uplink characteristic sequence as the node attribute of the node in the uplink flow interactive graph, and constructing a downlink flow interactive graph by taking the third characteristic value of the downlink characteristic sequence as the node attribute of the node in the downlink flow interactive graph.
7. The graph integration-based single-feature encrypted stream detection method according to any one of claims 1 to 6, wherein in the step of performing graph integration on the upstream traffic interaction graph and the downstream traffic interaction graph, node attributes of nodes are updated based on the order of the nodes in the graph.
8. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein in the step of splicing the upstream traffic integral graph and the downstream traffic integral graph to obtain a joint integral graph, nodes in the upstream traffic integral graph and the downstream traffic integral graph are sequentially combined to obtain the joint integral graph.
9. The graph integration-based single-feature encrypted stream detection method according to claim 1, wherein in the step of inputting the joint integration graph to a predetermined neural network classification model to obtain a detection result for each data stream, the predetermined neural network classification model adopts a GCN classification model, a GIN classification model, or a GAT classification model.
10. A graph integration-based single-feature encrypted stream detection system, comprising a computer device including a processor and a memory, the memory having computer instructions stored therein and the processor being configured to execute the computer instructions stored in the memory, wherein the system implements the steps of the method according to any one of claims 1-9 when the computer instructions are executed by the processor.
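The preprocessing steps recited in claims 1 to 5 (grouping packets into bidirectional streams by IP pair, then marking packet lengths with a positive or negative sign by direction) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the packet dictionaries, their field names, and the choice of the first packet's source as the uplink side are all hypothetical.

```python
from collections import defaultdict


def split_into_flows(packets):
    """Group packets so that both directions between two IPs form one stream."""
    flows = defaultdict(list)
    for pkt in packets:
        # A frozenset ignores direction: (A, B) and (B, A) share one key.
        key = frozenset((pkt["src"], pkt["dst"]))
        flows[key].append(pkt)
    return flows


def signed_lengths(flow_packets, uplink_src):
    """Positive sign for uplink packet lengths, negative sign for downlink."""
    return [pkt["length"] if pkt["src"] == uplink_src else -pkt["length"]
            for pkt in flow_packets]


# Hypothetical packets; addresses and lengths are made up for illustration.
packets = [
    {"src": "10.0.0.2", "dst": "10.0.0.9", "length": 60},
    {"src": "10.0.0.9", "dst": "10.0.0.2", "length": 66},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "length": 1514},
    {"src": "10.0.0.3", "dst": "10.0.0.9", "length": 80},
]

flows = split_into_flows(packets)
first_flow = flows[frozenset(("10.0.0.2", "10.0.0.9"))]
# Assumption: the sender of the first packet is treated as the uplink side.
sequence = signed_lengths(first_flow, uplink_src=first_flow[0]["src"])
print(len(flows), sequence)  # 2 [60, -66, 1514]
```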
CN202311133687.1A 2023-09-05 2023-09-05 Single-feature encryption stream detection method and system based on graph integration Active CN116886637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311133687.1A CN116886637B (en) 2023-09-05 2023-09-05 Single-feature encryption stream detection method and system based on graph integration

Publications (2)

Publication Number Publication Date
CN116886637A CN116886637A (en) 2023-10-13
CN116886637B true CN116886637B (en) 2023-12-19

Family

ID=88262445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311133687.1A Active CN116886637B (en) 2023-09-05 2023-09-05 Single-feature encryption stream detection method and system based on graph integration

Country Status (1)

Country Link
CN (1) CN116886637B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant