CN116132095A - Hidden malicious traffic detection method integrating statistical features and graph structural features - Google Patents

Info

Publication number
CN116132095A
Authority
CN
China
Prior art keywords
graph
flow
data
stream
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211477370.5A
Other languages
Chinese (zh)
Inventor
卢功利
孙辉
周国栋
赵益平
郑康锋
武斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunshan jiuhua electronic equipment factory
Beijing University of Posts and Telecommunications
Original Assignee
Kunshan jiuhua electronic equipment factory
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunshan jiuhua electronic equipment factory and Beijing University of Posts and Telecommunications
Priority to CN202211477370.5A
Publication of CN116132095A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a hidden malicious traffic detection method integrating statistical features and graph structural features, comprising the following steps: monitoring gateway traffic, aggregating data packets with the same source and destination addresses into data flows, and constructing a traffic interaction graph in which each node represents a host and each edge represents a data flow; extracting packet-by-packet features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of different lengths into flow feature vectors of equal length; converting the traffic interaction graph into a new graph structure, the traffic association graph, according to the relations among the data flows at each node, wherein this graph takes flows as nodes and flow feature vectors as node attributes; and training a graph convolutional neural network on the traffic association graph to finally identify hidden malicious traffic. The method detects hidden malicious traffic efficiently and accurately, providing high security while consuming less time and space.

Description

Hidden malicious traffic detection method integrating statistical features and graph structural features
Technical Field
The invention relates to the technical field of network security, in particular to a hidden malicious traffic detection method integrating statistical features and graph structural features.
Background
With the continuous development of network communication technology, infrastructure of all kinds is being informatized and digitized, and is operated and controlled through network traffic. New network attacks continue to emerge; they threaten the security of the military, industry, and infrastructure, and their scale is steadily growing.
The novel network attack traffic generally has the following characteristics:
concealment: the novel attack traffic is generally encrypted and disguised as normal traffic, so the packet content cannot be obtained directly, which makes detection by content-based measures such as rule matching and natural language processing much more difficult;
slowness: an attacker often attacks slowly so that the attack traffic is submerged in normal traffic, making it difficult for a defender to pick out the hidden malicious traffic from the large volume of normal traffic.
How to detect hidden malicious traffic has become a popular research direction in the last two years. Because current attack traffic is usually encrypted, it is usually detected by methods based either on statistical features or on traffic interaction graphs. Methods based on statistical features attend only to the intrinsic features of the traffic; because attack traffic is slow, normal and malicious data flows are difficult to distinguish by statistical features alone. Classification that combines many statistical features with machine learning also requires considerable time and memory and loses some packet-level feature information, so it cannot meet the needs of network environments with high bandwidth and high security requirements. Detection methods based on the traffic interaction graph attend to the structural features of the traffic but ignore the features of the flows themselves, and so cannot accurately detect hidden malicious traffic. The existing detection schemes are therefore unsatisfactory for detecting novel hidden traffic.
In view of these shortcomings, the prior art needs to be improved, and a hidden malicious traffic detection method integrating statistical features and graph structural features is designed.
Disclosure of Invention
The main technical problem solved by the invention is to provide a hidden malicious traffic detection method integrating statistical features and graph structural features, which can detect hidden malicious traffic efficiently and accurately, providing high security while consuming less time and space.
To solve the above technical problem, the invention adopts the following technical scheme: a hidden malicious traffic detection method integrating statistical features and graph structural features, comprising the following steps:
step S1: monitoring gateway traffic, aggregating data packets into data flows, and constructing a traffic interaction graph, wherein each node in the graph represents a host and each edge represents a data flow;
step S2: extracting packet-level features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of different lengths into flow feature vectors of equal length;
step S3: converting the traffic interaction graph into a traffic association graph according to the relations among the data flows at each node, wherein the traffic association graph takes flows as nodes and flow feature vectors as node attributes;
step S4: training a graph convolutional neural network on the data flow features and graph structural features, and finally identifying hidden malicious traffic.
Preferably, in step S1, a traffic monitoring device is deployed at the gateway to monitor traffic inside the local area network.
Preferably, in step S1, the data packets monitored within a time period are aggregated into flows according to source IP and destination IP.
Preferably, in step S2, packet-granularity features are extracted from each data flow, the packet-granularity feature vectors are converted into a fixed-length flow feature vector, and the flow feature vector is input into the graph convolutional neural network for learning and training to finally obtain a classification result.
Preferably, in step S3, the edges of the traffic interaction graph are converted into nodes of the traffic association graph, and two data flow nodes are connected if the two data flows communicate with the same host; the traffic association graph is input into the graph convolutional neural network for training, the hidden layers aggregate the features of neighboring nodes, and a final softmax fully connected layer completes the classification of the nodes.
Preferably, the data packets are parsed and four types of features are extracted from each packet: its length, its protocol, the time interval since the previous packet, and its port number; the features of every packet in the data flow are recorded, and the start timestamp and duration of the connection are saved.
Preferably, the packet-granularity features are processed as follows: the feature values in each data flow are counted separately for each feature, generating four histograms per data flow; because the number of distinct values of each feature cannot be predicted, the length of the histograms is unpredictable and therefore needs to be unified.
A sensitive hash function is used to convert the variable-length feature histograms into fixed-length feature vectors: the sensitive hash value of each feature histogram is computed separately, and the Jaccard similarity between the sensitive hash values of two histograms approximates the Jaccard similarity between the histograms themselves.
Preferably, in step S4, the graph convolutional neural network can learn from the feature matrix and the graph structure at the same time, but it learns only from a feature matrix of nodes, whereas the flow feature vectors at this point are edge attributes; therefore, to turn the edge features into node features, the traffic interaction graph is converted into the traffic association graph, and the feature vectors and the traffic association graph are then input together into the graph convolutional neural network for training.
Compared with the prior art, the invention has the beneficial effects that:
traffic statistical features and structural features are fused: the statistical features describe the data flow between two hosts, while the structural features describe traffic interaction within the local area network; the statistical features focus on the data packets and the interaction features on the global structure. Normal and malicious traffic differ in at least one of these features, and by transforming the traffic interaction graph the graph convolutional neural network can fuse the two, thereby learning the differences between hidden malicious traffic and normal traffic more comprehensively;
packet features are counted and recorded packet by packet, and through the conversion to histograms and then to flow feature vectors, the packet-level features of a flow are recorded in a fixed-length flow feature vector, which reduces the information loss of manually constructed statistical features; the fixed-length feature vector also bounds the time and space consumed by detection.
Drawings
FIG. 1 is a system architecture diagram of the hidden malicious traffic detection method integrating statistical features and graph structural features.
FIG. 2 is a flow chart of the hidden malicious traffic detection method integrating statistical features and graph structural features.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that the advantages and features of the present invention can be more easily understood by those skilled in the art, thereby making clear and defining the scope of the present invention.
Referring to fig. 1 and 2, an embodiment of the present invention includes:
a hidden malicious traffic detection method integrating statistical features and graph structural features, comprising the following steps:
step S1: monitoring gateway traffic, aggregating data packets into data flows, and constructing a traffic interaction graph, wherein each node in the graph represents a host and each edge represents a data flow;
step S2: extracting packet-level features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of different lengths into flow feature vectors of equal length;
step S3: converting the traffic interaction graph into a traffic association graph according to the relations among the data flows at each node, wherein the traffic association graph takes flows as nodes and flow feature vectors as node attributes;
step S4: training a graph convolutional neural network on the data flow features and graph structural features, and finally identifying hidden malicious traffic.
In step S1, a traffic monitoring device is deployed at the gateway to monitor the traffic within the local area network.
In step S1, the data packets monitored within a time period are aggregated into flows according to source IP and destination IP. Each data flow is one edge of the traffic interaction graph, and each host is one node. The traffic interaction graph contains the structural information of the data flows. An attack generally follows an attack chain, and the attacker produces abnormal lateral movement within the local area network in order to infect other hosts, so the structure of part of the attack traffic differs from that of normal traffic, and malicious traffic can be distinguished by learning the structure of the traffic. The invention distinguishes nodes by IP only, because if nodes were distinguished by both port and IP, the random port rules adopted by some programs would affect detection.
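The aggregation in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the packet tuple layout, the function names `aggregate_flows` and `build_interaction_graph`, and the choice to key flows on the ordered (source, destination) pair are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical packet records: (timestamp, src_ip, dst_ip, length).
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 60),
    (0.05, "10.0.0.1", "10.0.0.2", 1500),
    (0.10, "10.0.0.2", "10.0.0.3", 80),
    (0.20, "10.0.0.1", "10.0.0.3", 52),
]

def aggregate_flows(packets):
    """Group packets by (source IP, destination IP); ports are ignored on purpose."""
    flows = defaultdict(list)
    for ts, src, dst, length in packets:
        flows[(src, dst)].append((ts, length))
    return dict(flows)

def build_interaction_graph(flows):
    """Hosts become nodes; each flow becomes one edge between its two hosts."""
    nodes = sorted({ip for key in flows for ip in key})
    edges = sorted(flows)
    return nodes, edges

flows = aggregate_flows(packets)
nodes, edges = build_interaction_graph(flows)
```

Keying on IP alone means that two connections between the same hosts on different ports fall into the same flow, which matches the text's reason for ignoring ports.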
In step S2, packet-granularity features are extracted from each data flow, and the packet-granularity feature vectors are converted into a fixed-length flow feature vector so that the graph convolutional neural network can learn the features of the data flow. Similar data flows yield similar flow feature vectors. The vectors are input into the graph convolutional neural network for learning and training to finally obtain a classification result.
In step S3, the edges of the traffic interaction graph are converted into nodes of the traffic association graph. Whether two data flow nodes are connected is decided by whether the two data flows communicate with the same host: if two data flows are attached to the same host in the traffic interaction graph, they are connected in the traffic association graph. This transformation turns the flow feature vectors, which are edge attributes in the traffic interaction graph, into node attributes, so that they can take part in training the graph convolutional neural network. The traffic association graph is input into the graph convolutional neural network for training; from the input layer through the hidden layers the flow feature vectors are learned and the information of neighboring nodes is progressively aggregated, and the last layer predicts the result with a softmax classifier.
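The edge-to-node transformation of step S3 resembles what graph theory calls a line graph. A minimal sketch under that reading is shown below; the flow list and the function name `build_association_graph` are hypothetical, and flows are identified only by their host pair.

```python
from itertools import combinations

# Hypothetical flows, each identified by its (source IP, destination IP) pair.
flows = [
    ("10.0.0.1", "10.0.0.2"),
    ("10.0.0.2", "10.0.0.3"),
    ("10.0.0.1", "10.0.0.3"),
    ("10.0.0.4", "10.0.0.5"),
]

def build_association_graph(flows):
    """Connect two flow nodes (by index) when their flows share at least one host."""
    assoc_edges = []
    for i, j in combinations(range(len(flows)), 2):
        if set(flows[i]) & set(flows[j]):
            assoc_edges.append((i, j))
    return assoc_edges

assoc_edges = build_association_graph(flows)
```

In this toy input the first three flows form a triangle because each pair shares a host, while the fourth flow stays isolated. The pairwise comparison is quadratic in the number of flows; a per-host index would avoid that on larger captures.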
The data packets are parsed, and the length, protocol, time interval, and port number of each packet are extracted; the features of every packet in the data flow are recorded, and the start timestamp and duration of the connection are saved.
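A minimal sketch of this per-packet extraction follows. The `Packet` record and `packet_features` function are hypothetical names; the text does not say which port is recorded or how the first packet's interval is defined, so the example assumes a single port field and an interval of 0 for the first packet.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    timestamp: float  # capture time in seconds
    length: int       # packet length in bytes
    protocol: str     # e.g. "TCP" or "UDP"
    port: int         # assumed: one port number per packet

def packet_features(flow):
    """Return per-packet (length, protocol, interval, port) tuples,
    plus the flow's start timestamp and duration."""
    if not flow:
        return [], None, 0.0
    ordered = sorted(flow, key=lambda p: p.timestamp)
    feats = []
    prev_ts = ordered[0].timestamp
    for p in ordered:
        # Interval since the previous packet; 0.0 for the first packet (assumption).
        feats.append((p.length, p.protocol, p.timestamp - prev_ts, p.port))
        prev_ts = p.timestamp
    start = ordered[0].timestamp
    return feats, start, ordered[-1].timestamp - start

flow = [
    Packet(1.0, 60, "TCP", 443),
    Packet(1.5, 1500, "TCP", 443),
    Packet(1.75, 80, "TCP", 443),
]
feats, start, duration = packet_features(flow)
```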
The packet-granularity features are processed as follows: the feature values in each data flow are counted separately for each feature, generating four histograms per data flow; because the number of distinct values of each feature cannot be predicted, the histograms vary in length, and their lengths must be unified.
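The four per-flow histograms can be sketched with `collections.Counter`. The feature rows and the name `flow_histograms` are hypothetical, and the example assumes time intervals have already been rounded to whole milliseconds, since the text does not describe how continuous values are binned.

```python
from collections import Counter

# Hypothetical per-packet features for one flow:
# (length in bytes, protocol, interval in ms, port).
feats = [
    (60, "TCP", 0, 443),
    (1500, "TCP", 50, 443),
    (60, "TCP", 50, 443),
    (1500, "UDP", 200, 53),
]

FEATURE_NAMES = ("length", "protocol", "interval", "port")

def flow_histograms(feats):
    """Count each feature's values separately: one histogram per feature."""
    return {
        name: Counter(row[i] for row in feats)
        for i, name in enumerate(FEATURE_NAMES)
    }

hists = flow_histograms(feats)
```

Here the interval histogram has three bins while the others have two, which illustrates why the histogram lengths cannot be fixed in advance.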
A sensitive hash function is used to convert the variable-length feature histograms into fixed-length feature vectors, and the sensitive hash value of each feature histogram is computed separately. Because the Jaccard similarity between the sensitive hash values of two histograms approximates the Jaccard similarity between the histograms themselves, the invention uses the sensitive hash values as approximate substitutes for the histograms in training.
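The text does not name the hash function. One well-known scheme with the stated property (signature agreement approximates Jaccard similarity) is MinHash, a locality-sensitive hash; the sketch below assumes that reading. Treating each (value, count) bin as a set element, the signature length of 64, and the use of BLAKE2b as the base hash are all choices made for illustration.

```python
import hashlib

def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash(histogram, num_hashes=64):
    """Map a variable-length histogram to a fixed-length MinHash signature."""
    tokens = [f"{value}={count}" for value, count in histogram.items()]
    if not tokens:
        return [0] * num_hashes
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]

def jaccard(a, b):
    """Exact Jaccard similarity between two histograms, bin by bin."""
    sa, sb = set(a.items()), set(b.items())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions; estimates the exact Jaccard."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = {60: 2, 1500: 3}
b = {60: 2, 80: 1}
sig_a, sig_b = minhash(a), minhash(b)
estimate = estimated_jaccard(sig_a, sig_b)
```

Both signatures have 64 entries regardless of how many bins the histograms have, and `estimate` approaches `jaccard(a, b)` as `num_hashes` grows.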
In step S4, the graph convolutional neural network can learn from the feature matrix and the graph structure at the same time, but it learns only from a feature matrix of nodes, whereas the flow feature vectors at this point are edge attributes; therefore, to turn the edge features into node features, the traffic interaction graph is converted into the traffic association graph, and the feature vectors and the traffic association graph are then input together into the graph convolutional neural network for training.
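To make the role of the node feature matrix concrete, here is a forward pass of a two-layer graph convolutional network in plain Python. The patent does not specify the layer rule; this sketch assumes the commonly used symmetric normalization with self-loops. The toy graph, feature values, and weight matrices are untrained placeholders, and no training loop is shown.

```python
import math

def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def gcn_forward(a_hat, x, w1, w2):
    """Hidden layer aggregates neighbor features; output layer gives class probabilities."""
    h = relu(matmul(matmul(a_hat, x), w1))
    return softmax_rows(matmul(matmul(a_hat, h), w2))

# Toy traffic association graph: 3 flow nodes with 2-dimensional feature vectors.
edges = [(0, 1), (1, 2)]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]    # placeholder weights, not trained
w2 = [[0.4, -0.4], [0.2, 0.6], [-0.3, 0.1]]  # 2 output classes: normal, malicious
probs = gcn_forward(normalized_adjacency(3, edges), x, w1, w2)
```

Each row of `probs` is a probability distribution over the two classes for one flow node. The rows of `x` are node features, which is why the flow feature vectors must sit on nodes rather than edges.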
In the hidden malicious traffic detection method integrating statistical features and graph structural features, the flow feature vector is formed by processing packet-granularity features, so the information of the data flow is retained more comprehensively; a fixed-length feature vector is generated with the sensitive hash function, which reduces space and time consumption, so this feature extraction method can lower detection cost on the premise that information is not lost. Through the transformation from the traffic interaction graph to the traffic association graph, the graph convolutional neural network can fuse the two dimensions of flow features and structural features, improving detection accuracy.
The method fuses the features of the data flows and of the traffic graph, combining the features of the flows themselves with global structural features to improve detection accuracy. It also improves on the manual construction of traffic statistical features: using the flow feature histogram as an intermediary, the packet-by-packet features of a flow are converted into a fixed-length flow feature vector, which reduces time and space consumption while preserving the feature information more comprehensively.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims (9)

1. A hidden malicious traffic detection method integrating statistical features and graph structural features, characterized in that the method comprises the following steps:
step S1: monitoring gateway traffic, aggregating data packets into data flows, and constructing a traffic interaction graph, wherein each node in the graph represents a host and each edge represents a data flow;
step S2: extracting packet-level features from the data packets in each data flow, generating flow feature histograms from the set of packet-by-packet features in the flow, and converting the flow feature histograms of different lengths into flow feature vectors of equal length;
step S3: converting the traffic interaction graph into a traffic association graph according to the relations among the data flows at each node, wherein the traffic association graph takes flows as nodes and flow feature vectors as node attributes;
step S4: training a graph convolutional neural network on the data flow features and graph structural features, and finally identifying hidden malicious traffic.
2. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 1, characterized in that: in step S1, a traffic monitoring device is deployed at the gateway to monitor the traffic within the local area network.
3. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 1, characterized in that: in step S1, the data packets monitored within a time period are aggregated into flows according to source IP and destination IP.
4. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 1, characterized in that: in step S2, packet-granularity features are extracted from each data flow, the packet-granularity feature vectors are converted into a fixed-length flow feature vector, and the flow feature vector is input into the graph convolutional neural network for learning and training to finally obtain a classification result.
5. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 1, characterized in that: in step S3, the edges of the traffic interaction graph are converted into nodes of the traffic association graph, and two data flow nodes are connected if the two data flows communicate with the same host; the traffic association graph is input into the graph convolutional neural network for training, the hidden layers aggregate the features of neighboring nodes, and a final softmax fully connected layer completes the classification of the nodes.
6. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 4, characterized in that: the data packets are parsed, and the length, protocol, time interval, and port number of each packet are extracted; the features of every packet in the data flow are recorded, and the start timestamp and duration of the connection are saved.
7. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 4, characterized in that: the packet-granularity features are processed as follows: the feature values in each data flow are counted separately for each feature, generating four histograms per data flow; because the number of distinct values of each feature cannot be predicted, the histograms vary in length, and their lengths must be unified.
8. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 7, characterized in that: a sensitive hash function is used to convert the variable-length feature histograms into fixed-length feature vectors, the sensitive hash value of each feature histogram is computed separately, and the Jaccard similarity between the sensitive hash values of two histograms approximates the Jaccard similarity between the histograms themselves.
9. The hidden malicious traffic detection method integrating statistical features and graph structural features according to claim 1, characterized in that: in step S4, the graph convolutional neural network can learn from the feature matrix and the graph structure at the same time, but it learns only from a feature matrix of nodes, whereas the flow feature vectors at this point are edge attributes; therefore, to turn the edge features into node features, the traffic interaction graph is converted into the traffic association graph, and the feature vectors and the traffic association graph are then input together into the graph convolutional neural network for training.
CN202211477370.5A 2022-11-23 2022-11-23 Hidden malicious traffic detection method integrating statistical features and graph structural features Pending CN116132095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211477370.5A CN116132095A (en) 2022-11-23 2022-11-23 Hidden malicious traffic detection method integrating statistical features and graph structural features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211477370.5A CN116132095A (en) 2022-11-23 2022-11-23 Hidden malicious traffic detection method integrating statistical features and graph structural features

Publications (1)

Publication Number Publication Date
CN116132095A (en) 2023-05-16

Family

ID=86299882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211477370.5A Pending CN116132095A (en) 2022-11-23 2022-11-23 Hidden malicious traffic detection method integrating statistical features and graph structural features

Country Status (1)

Country Link
CN (1) CN116132095A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743508A (en) * 2023-08-15 2023-09-12 四川新立高科科技有限公司 Method, device, equipment and medium for detecting network attack chain of power system
CN116743508B (en) * 2023-08-15 2023-11-14 四川新立高科科技有限公司 Method, device, equipment and medium for detecting network attack chain of power system

Similar Documents

Publication Publication Date Title
CN110011999B (en) IPv6 network DDoS attack detection system and method based on deep learning
CN109960729B (en) Method and system for detecting HTTP malicious traffic
CN109450842B (en) Network malicious behavior recognition method based on neural network
CN111935170B (en) Network abnormal flow detection method, device and equipment
CN106713371B (en) Fast Flux botnet detection method based on DNS abnormal mining
CN109067586B (en) DDoS attack detection method and device
CN107040517B (en) Cognitive intrusion detection method oriented to cloud computing environment
CN113079143A (en) Flow data-based anomaly detection method and system
CN109450721B (en) Network abnormal behavior identification method based on deep neural network
CN110611640A (en) DNS protocol hidden channel detection method based on random forest
CN102420723A (en) Anomaly detection method for various kinds of intrusion
CN109218321A (en) A kind of network inbreak detection method and system
KR20210115991A (en) Method and apparatus for detecting network anomaly using analyzing time-series data
CN110768946A (en) Industrial control network intrusion detection system and method based on bloom filter
CN111245784A (en) Method for multi-dimensional detection of malicious domain name
CN113821793A (en) Multi-stage attack scene construction method and system based on graph convolution neural network
Sheikh et al. Procedures, criteria, and machine learning techniques for network traffic classification: a survey
Patcha et al. Network anomaly detection with incomplete audit data
Feng et al. Towards learning-based, content-agnostic detection of social bot traffic
CN114650229B (en) Network encryption traffic classification method and system based on three-layer model SFTF-L
Feng et al. BotFlowMon: Learning-based, content-agnostic identification of social bot traffic flows
CN116132095A (en) Hidden malicious traffic detection method integrating statistical features and graph structural features
Chiu et al. Semi-supervised learning for false alarm reduction
Hsupeng et al. Explainable malware detection using predefined network flow
Al-Fawa'reh et al. Detecting stealth-based attacks in large campus networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination