CN116132095A - Hidden malicious traffic detection method integrating statistical features and graph structural features - Google Patents
- Publication number
- CN116132095A (application CN202211477370.5A)
- Authority
- CN
- China
- Prior art keywords
- graph
- flow
- data
- stream
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a hidden malicious traffic detection method integrating statistical features and graph structural features, comprising the following steps: monitoring gateway traffic, aggregating data packets with the same source and destination addresses into data flows, and constructing a traffic interaction graph in which each node represents a host and each edge represents a data flow; extracting packet-by-packet features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of differing lengths into flow feature vectors of equal length; converting the traffic interaction graph into a new graph structure, the flow association graph, according to the relationships among the data flows at each node, wherein this graph takes flows as nodes and flow feature vectors as node attributes; and training a graph convolutional neural network on the flow association graph to identify hidden malicious traffic. The method detects hidden malicious traffic efficiently and accurately, and consumes less time and memory while providing a high level of security.
Description
Technical Field
The invention relates to the technical field of network security, in particular to a hidden malicious traffic detection method integrating statistical features and graph structural features.
Background
With the continuous development of network communication technology, infrastructure of all kinds has been informatized and digitized, and now communicates and is controlled through network traffic. New network attacks emerge continually, threaten the security of military systems, industrial systems, and infrastructure, and are gradually growing in scale.
Novel network attack traffic generally has the following characteristics:
concealment: novel attack traffic is generally encrypted and disguised as normal traffic, so the packet content cannot be obtained directly, which makes detection by techniques that operate on packet content, such as rule matching and natural language processing, much more difficult;
slowness: an attacker often attacks slowly so that the attack traffic is submerged in normal traffic, making it difficult for a defender to distinguish hidden malicious traffic from the large volume of normal traffic.
How to detect hidden malicious traffic has become a popular research direction in the last two years. Because current attack traffic is usually encrypted, it is usually detected either by methods based on statistical features or by methods based on traffic interaction graphs. Methods based on statistical features consider only the intrinsic features of the traffic; because attack traffic is slow, normal and malicious data flows are difficult to tell apart from statistical features alone. Classification that combines many statistical features with machine learning also requires considerable time and memory and loses some packet-level feature information, so it cannot meet the needs of network environments with high bandwidth and high security requirements. Methods based on traffic interaction graphs consider the structural features of the traffic but ignore the features of the flows themselves, and therefore cannot detect hidden malicious traffic accurately. Neither existing scheme is therefore satisfactory for detecting novel hidden traffic.
In view of these shortcomings, the prior art needs to be improved, and a hidden malicious traffic detection method integrating statistical features and graph structural features is proposed.
Disclosure of Invention
The main technical problem solved by the invention is to provide a hidden malicious traffic detection method integrating statistical features and graph structural features, which detects hidden malicious traffic efficiently and accurately and consumes less time and memory while providing a high level of security.
To solve this technical problem, the invention adopts the following technical scheme: a hidden malicious traffic detection method integrating statistical features and graph structural features, comprising the following steps:
step S1: monitoring gateway traffic, aggregating data packets into data flows, and constructing a traffic interaction graph, wherein each node in the graph represents a host and each edge represents a data flow;
step S2: extracting packet-level features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of differing lengths into flow feature vectors of equal length;
step S3: converting the traffic interaction graph into a flow association graph according to the relationships among the data flows at each node, wherein the flow association graph takes flows as nodes and flow feature vectors as node attributes;
step S4: training a graph convolutional neural network on the data flow features and the graph structural features, and finally identifying hidden malicious traffic.
Preferably, in step S1, a traffic monitoring device is deployed at the gateway and monitors the traffic inside the local area network.
Preferably, in step S1, the data packets monitored within a time period are aggregated into flows according to source IP and destination IP.
Preferably, in step S2, packet-granularity features are extracted from each data flow, the packet-granularity feature vectors are converted into a flow feature vector of fixed length, and the flow feature vectors are input into the graph convolutional neural network for learning and training to obtain the final classification result.
Preferably, in step S3, the edges of the traffic interaction graph are converted into nodes of the flow association graph; two data flow nodes are connected if the two data flows communicate with the same host; the flow association graph is input into the graph convolutional neural network for training, the hidden layers aggregate the features of neighboring nodes, and a final fully connected softmax layer completes the classification of the nodes.
Preferably, the data packets are parsed and four types of features are extracted from each packet: its length, its protocol, the time interval since the previous packet, and its port number; the features of every packet in the data flow are recorded, and the start timestamp and duration of the connection are saved.
Preferably, the packet-granularity features are processed by first counting the feature values in the data flow, with each feature counted separately, so that four histograms are generated for each data flow; because the number of distinct values of each feature cannot be predicted, the length of the histograms cannot be predicted either, and the histogram lengths therefore need to be unified.
A locality-sensitive hash function is used to convert the variable-length feature histograms into fixed-length feature vectors: a locality-sensitive hash value is computed for each feature histogram, and the Jaccard similarity between the hash values of two histograms approximates the Jaccard similarity between the histograms themselves.
Preferably, in step S4, the graph convolutional neural network learns from a feature matrix and a graph structure at the same time, but the feature matrix it trains on holds node features only, whereas the flow feature vectors are edge attributes of the traffic interaction graph; to turn the edge features into node features, the traffic interaction graph is converted into the flow association graph, and the feature vectors and the flow association graph are then input together into the graph convolutional neural network for training.
Compared with the prior art, the invention has the beneficial effects that:
traffic statistical features and structural features are fused: the statistical features describe the data flow between two hosts, while the structural features describe how flows interact within the local area network; the statistical features focus on data packets and the interaction features focus on the global structure; normal traffic and malicious traffic differ in at least one of these two kinds of features, and by transforming the traffic interaction graph the graph convolutional neural network can fuse both, so that the differences between hidden malicious traffic and normal traffic are learned more comprehensively;
the packet features are counted and recorded packet by packet, and through the conversion into histograms and then flow feature vectors, the packet-level features of a flow are recorded in a fixed-length flow feature vector, which reduces the information loss of manually constructed statistical features, while the fixed length of the vector bounds the time and memory consumed by detection.
Drawings
FIG. 1 is a system architecture diagram of a hidden malicious traffic detection method that incorporates statistical features and graph structural features.
FIG. 2 is a flow chart of a hidden malicious traffic detection method that incorporates statistical features and graph structural features.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that the advantages and features of the present invention can be more easily understood by those skilled in the art, thereby making clear and defining the scope of the present invention.
Referring to fig. 1 and 2, an embodiment of the present invention includes:
the method for detecting the hidden malicious traffic integrating the statistical features and the graph structural features comprises the following steps of:
step S1: monitoring gateway traffic, aggregating data packets into data flows, and constructing a traffic interaction graph, wherein each node in the graph represents a host and each edge represents a data flow;
step S2: extracting packet-level features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of differing lengths into flow feature vectors of equal length;
step S3: converting the traffic interaction graph into a flow association graph according to the relationships among the data flows at each node, wherein the flow association graph takes flows as nodes and flow feature vectors as node attributes;
step S4: training a graph convolutional neural network on the data flow features and the graph structural features, and finally identifying hidden malicious traffic.
In step S1, a flow monitoring device is arranged at a gateway to monitor the flow in the local area network.
In step S1, the data packets monitored within a time period are aggregated into flows according to source IP and destination IP. Each data flow serves as an edge of the traffic interaction graph and each host serves as a node, so the traffic interaction graph contains the structural information of the data flows. An attack generally follows an attack chain: the attacker moves laterally within the local area network in an abnormal way in order to infect other hosts, so the structure of part of the attack traffic differs from that of normal traffic, and malicious traffic can be distinguished by learning the traffic structure. The invention distinguishes nodes by IP address only, because if nodes were distinguished by both port and IP address, the random port allocation adopted by some programs would affect detection.
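The aggregation in step S1 can be sketched as follows. This is a minimal illustration and not the patented implementation: the packet representation (a dict with `src`, `dst`, `sport`, and `length` keys), the function name `build_interaction_graph`, and the choice to treat a flow as a directed (source, destination) pair are all assumptions made for the example.

```python
from collections import defaultdict


def build_interaction_graph(packets):
    """Group packets into flows keyed by (source IP, destination IP).

    Returns (hosts, flows): the node set of the traffic interaction graph
    and a mapping from each edge (src, dst) to the packets of that flow.
    """
    flows = defaultdict(list)
    hosts = set()
    for pkt in packets:
        # Ports are deliberately left out of the key, so a program that
        # picks random source ports still produces a single flow.
        key = (pkt["src"], pkt["dst"])
        flows[key].append(pkt)
        hosts.update(key)
    return hosts, dict(flows)


# Hypothetical packets captured at the gateway during one time window.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 50001, "length": 60},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 50002, "length": 1500},
    {"src": "10.0.0.2", "dst": "10.0.0.3", "sport": 443, "length": 80},
]
hosts, flows = build_interaction_graph(packets)
```

In this example the two packets from 10.0.0.1 to 10.0.0.2 fall into the same flow even though their source ports differ, which is the effect of identifying nodes by IP address only.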
In step S2, packet-granularity features are extracted from each data flow, and the packet-granularity feature vectors are converted into a flow feature vector of fixed length so that the graph convolutional neural network can learn the features of the data flow. Flows with similar packet-granularity features yield similar flow feature vectors. The flow feature vectors are input into the graph convolutional neural network for learning and training to obtain the final classification result.
In step S3, the edges of the traffic interaction graph are converted into nodes of the flow association graph. Two data flow nodes are connected if the two data flows communicate with the same host: if two data flows are attached to the same host in the traffic interaction graph, they are connected in the flow association graph. This transformation turns the flow feature vectors, which are edge attributes in the traffic interaction graph, into node attributes, so that they can take part in the training of the graph convolutional neural network. The flow association graph is input into the graph convolutional neural network for training; from the input layer through the hidden layers the network learns the flow feature vectors and repeatedly aggregates the information of neighboring nodes, and the last layer predicts the result with a softmax classifier.
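The graph transformation in step S3 corresponds to what graph theory calls a line graph. The sketch below illustrates it under stated assumptions: flows are given as a list of (source, destination) pairs, the function name `build_association_graph` is hypothetical, and the result is an undirected adjacency mapping over flow indices.

```python
from itertools import combinations


def build_association_graph(flow_keys):
    """Turn edges of the traffic interaction graph into nodes.

    flow_keys is a list of (src, dst) pairs. Two flows become neighbors
    when they share at least one host. Returns {flow index: set of indices}.
    """
    flows_at_host = {}
    for idx, (src, dst) in enumerate(flow_keys):
        for host in {src, dst}:
            flows_at_host.setdefault(host, []).append(idx)

    adjacency = {idx: set() for idx in range(len(flow_keys))}
    for members in flows_at_host.values():
        # Every pair of flows touching the same host is connected.
        for a, b in combinations(members, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical flows among four hosts A, B, C, D.
flow_keys = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
adj = build_association_graph(flow_keys)
```

Here flow 0 (A to B) is linked to flow 1 through host B and to flow 3 through host A, but not to flow 2, with which it shares no host.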
The data packets are parsed and four types of features are extracted from each packet: its length, its protocol, the time interval since the previous packet, and its port number. The features of every packet in the data flow are recorded, and the start timestamp and duration of the connection are saved.
The packet-granularity features are processed by first counting the feature values in the data flow, with each feature counted separately, so that four histograms are generated for each data flow. Because the number of distinct values of each feature cannot be predicted, the histograms differ in length and their lengths need to be unified.
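The per-flow histograms can be sketched as below. The packet fields (`ts`, `length`, `protocol`, `port`) and the function name `flow_histograms` are hypothetical, and rounding the inter-packet interval to whole milliseconds is an assumption added so that nearly equal gaps share a bin; the source does not specify a binning rule.

```python
from collections import Counter

FEATURES = ("length", "protocol", "interval", "port")


def flow_histograms(flow_packets):
    """Return one histogram (value -> count) for each of the four features."""
    ordered = sorted(flow_packets, key=lambda p: p["ts"])
    hists = {name: Counter() for name in FEATURES}
    prev_ts = None
    for pkt in ordered:
        hists["length"][pkt["length"]] += 1
        hists["protocol"][pkt["protocol"]] += 1
        hists["port"][pkt["port"]] += 1
        if prev_ts is not None:
            # Interval since the previous packet, bucketed to milliseconds.
            hists["interval"][round((pkt["ts"] - prev_ts) * 1000)] += 1
        prev_ts = pkt["ts"]
    return hists


# Hypothetical flow of three packets sent half a second apart.
flow = [
    {"ts": 0.0, "length": 60, "protocol": "TCP", "port": 443},
    {"ts": 0.5, "length": 60, "protocol": "TCP", "port": 443},
    {"ts": 1.0, "length": 1500, "protocol": "TCP", "port": 443},
]
hists = flow_histograms(flow)
```

The number of bins in each histogram depends on how many distinct values the flow happens to contain, which is why a further step is needed to bring all flows to a common vector length.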
A locality-sensitive hash function is used to convert the variable-length feature histograms into fixed-length feature vectors. A locality-sensitive hash value is computed for each feature histogram, and the Jaccard similarity between the hash values of two histograms approximates the Jaccard similarity between the histograms themselves, so the invention uses the hash values as an approximate replacement for the histograms during training.
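The source does not name the hash family. MinHash is one well-known locality-sensitive scheme whose signatures preserve Jaccard similarity, so the sketch below uses it as a stand-in, which is an assumption. Treating a histogram as a multiset of (value, occurrence index) items, the signature length of 64, and the use of BLAKE2b as the underlying hash are likewise illustrative choices.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(histogram, num_hashes=64):
    """Map a histogram of any size to a signature of num_hashes integers."""
    # Expand counts into distinct items so that bin heights influence
    # the similarity, not just which bins are present.
    items = [(value, k) for value, count in histogram.items() for k in range(count)]
    if not items:
        return [0] * num_hashes
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


sig_a = minhash_signature({60: 2, 1500: 1})
sig_b = minhash_signature({60: 2, 1500: 1})
sig_c = minhash_signature({60: 1, 1500: 1, 40: 4})
```

Every signature has the same length regardless of how many bins the histogram has, and identical histograms always produce identical signatures. Concatenating the four per-feature signatures would give one fixed-length flow feature vector; that concatenation is again an assumption rather than something the source states.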
In step S4, the graph convolutional neural network learns from a feature matrix and a graph structure at the same time, but the feature matrix it trains on holds node features only, whereas the flow feature vectors are edge attributes of the traffic interaction graph. To turn the edge features into node features, the traffic interaction graph is converted into the flow association graph, and the feature vectors and the flow association graph are then input together into the graph convolutional neural network for training.
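How a graph convolutional network combines a node feature matrix with a graph structure can be illustrated with a forward pass. The sketch assumes the commonly used two-layer formulation with a symmetrically normalized adjacency matrix with self-loops, softmax(Â · ReLU(Â X W0) · W1); the source does not specify the layer design, and the weights below are fixed placeholder values, not trained parameters.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    n = len(adj)
    a_hat = [[1.0 if i == j or j in adj[i] else 0.0 for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(adj, x, w0, w1):
    """Two graph-convolution layers followed by a per-node softmax."""
    a = normalize_adjacency(adj)
    hidden = relu(matmul(matmul(a, x), w0))  # aggregate neighbor features
    return softmax_rows(matmul(matmul(a, hidden), w1))


# Three flow nodes in a chain, each with a 2-dimensional feature vector.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w0 = [[0.5, -0.2], [0.1, 0.3]]   # placeholder weights
w1 = [[0.4, -0.4], [-0.3, 0.6]]  # placeholder weights
probs = gcn_forward(adj, x, w0, w1)
```

Each row of `probs` is a probability distribution over two classes (for example normal and malicious) for one flow node. Training, which would adjust `w0` and `w1` from labeled flows, is omitted here.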
In the hidden malicious traffic detection method integrating statistical features and graph structural features, the flow feature vector is formed by processing packet-granularity features, so the information of the data flow is retained more comprehensively; the fixed-length feature vector is generated by computing the locality-sensitive hash function, which reduces memory and time consumption, so this feature extraction method lowers the detection cost without losing information. Through the transformation from the traffic interaction graph to the flow association graph, the graph convolutional neural network can fuse the two dimensions of flow features and structural features, which increases detection accuracy.
The method fuses the features of the data flows with those of the traffic graph, combining the features of the flows themselves with the global structural features, and thereby improves detection accuracy. It also improves on the manual construction of traffic statistical features: using the flow feature histogram as an intermediate, the packet-by-packet features of a flow are converted into a fixed-length flow feature vector, which reduces time and memory consumption while preserving the feature information more completely.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present invention.
Claims (9)
1. A hidden malicious traffic detection method integrating statistical features and graph structural features, characterized in that the method comprises the following steps:
step S1: monitoring gateway traffic, aggregating data packets into data flows, and constructing a traffic interaction graph, wherein each node in the graph represents a host and each edge represents a data flow;
step S2: extracting packet-level features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of differing lengths into flow feature vectors of equal length;
step S3: converting the traffic interaction graph into a flow association graph according to the relationships among the data flows at each node, wherein the flow association graph takes flows as nodes and flow feature vectors as node attributes;
step S4: training a graph convolutional neural network on the data flow features and the graph structural features, and finally identifying hidden malicious traffic.
2. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 1, wherein the method comprises the following steps: in step S1, a flow monitoring device is arranged at a gateway to monitor the flow in the local area network.
3. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 1, wherein: in step S1, the data packets monitored within a time period are aggregated into flows according to source IP and destination IP.
4. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 1, wherein: in step S2, packet-granularity features are extracted from each data flow, the packet-granularity feature vectors are converted into a flow feature vector of fixed length, and the flow feature vectors are input into the graph convolutional neural network for learning and training to obtain the final classification result.
5. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 1, wherein: in step S3, the edges of the traffic interaction graph are converted into nodes of the flow association graph, two data flow nodes are connected if the two data flows communicate with the same host, the flow association graph is input into the graph convolutional neural network for training, the hidden layers aggregate the features of neighboring nodes, and a final fully connected softmax layer completes the classification of the nodes.
6. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 4, wherein: the data packets are parsed, the length, protocol, time interval, and port number of each data packet are extracted, the features of every packet in the data flow are recorded, and the start timestamp and duration of the connection are saved.
7. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 4, wherein: the packet-granularity features are processed by first counting the feature values in the data flow, with each feature counted separately, so that four histograms are generated for each data flow; because the number of distinct values of each feature cannot be predicted, the lengths of the histograms need to be unified.
8. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 7, wherein: a locality-sensitive hash function is used to convert the variable-length feature histograms into fixed-length feature vectors, a locality-sensitive hash value is computed for each feature histogram, and the Jaccard similarity between the hash values of two histograms approximates the Jaccard similarity between the histograms themselves.
9. The method for detecting hidden malicious traffic by combining statistical features and graph structural features according to claim 1, wherein: in step S4, the graph convolutional neural network learns from a feature matrix and a graph structure at the same time, but the feature matrix it trains on holds node features only, whereas the flow feature vectors are edge attributes of the traffic interaction graph; to turn the edge features into node features, the traffic interaction graph is converted into the flow association graph, and the feature vectors and the flow association graph are then input together into the graph convolutional neural network for training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211477370.5A CN116132095A (en) | 2022-11-23 | 2022-11-23 | Hidden malicious traffic detection method integrating statistical features and graph structural features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211477370.5A CN116132095A (en) | 2022-11-23 | 2022-11-23 | Hidden malicious traffic detection method integrating statistical features and graph structural features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116132095A (en) | 2023-05-16 |
Family
ID=86299882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211477370.5A Pending CN116132095A (en) | 2022-11-23 | 2022-11-23 | Hidden malicious traffic detection method integrating statistical features and graph structural features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116132095A (en) |
-
2022
- 2022-11-23 CN CN202211477370.5A patent/CN116132095A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116743508A (en) * | 2023-08-15 | 2023-09-12 | 四川新立高科科技有限公司 | Method, device, equipment and medium for detecting network attack chain of power system |
CN116743508B (en) * | 2023-08-15 | 2023-11-14 | 四川新立高科科技有限公司 | Method, device, equipment and medium for detecting network attack chain of power system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110011999B (en) | IPv6 network DDoS attack detection system and method based on deep learning | |
CN109960729B (en) | Method and system for detecting HTTP malicious traffic | |
CN109450842B (en) | Network malicious behavior recognition method based on neural network | |
CN111935170B (en) | Network abnormal flow detection method, device and equipment | |
CN106713371B (en) | Fast Flux botnet detection method based on DNS abnormal mining | |
CN109067586B (en) | DDoS attack detection method and device | |
CN107040517B (en) | Cognitive intrusion detection method oriented to cloud computing environment | |
CN113079143A (en) | Flow data-based anomaly detection method and system | |
CN109450721B (en) | Network abnormal behavior identification method based on deep neural network | |
CN110611640A (en) | DNS protocol hidden channel detection method based on random forest | |
CN102420723A (en) | Anomaly detection method for various kinds of intrusion | |
CN109218321A (en) | A kind of network inbreak detection method and system | |
KR20210115991A (en) | Method and apparatus for detecting network anomaly using analyzing time-series data | |
CN110768946A (en) | Industrial control network intrusion detection system and method based on bloom filter | |
CN111245784A (en) | Method for multi-dimensional detection of malicious domain name | |
CN113821793A (en) | Multi-stage attack scene construction method and system based on graph convolution neural network | |
Sheikh et al. | Procedures, criteria, and machine learning techniques for network traffic classification: a survey | |
Patcha et al. | Network anomaly detection with incomplete audit data | |
Feng et al. | Towards learning-based, content-agnostic detection of social bot traffic | |
CN114650229B (en) | Network encryption traffic classification method and system based on three-layer model SFTF-L | |
Feng et al. | BotFlowMon: Learning-based, content-agnostic identification of social bot traffic flows | |
CN116132095A (en) | Hidden malicious traffic detection method integrating statistical features and graph structural features | |
Chiu et al. | Semi-supervised learning for false alarm reduction | |
Hsupeng et al. | Explainable malware detection using predefined network flow | |
Al-Fawa'reh et al. | Detecting stealth-based attacks in large campus networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||