CN112055007A - Software and hardware combined threat situation perception method based on programmable nodes - Google Patents

Publication number
CN112055007A
Authority
CN
China
Prior art keywords
flow
information
classifier
threat
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010889682.1A
Other languages
Chinese (zh)
Other versions
CN112055007B (en)
Inventor
程光
赵玉宇
吴桦
袁帅
张慰慈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010889682.1A
Publication of CN112055007A
Application granted
Publication of CN112055007B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1458 Denial of Service
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/164 Adaptation or special uses of UDP protocol
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5018 Thread allocation

Abstract

The invention provides a software and hardware combined threat situation perception method based on programmable nodes, comprising the following steps: summarizing the flow information, extracting summary information from the message stream and transmitting it to a database; the database calculates the entropy of each kind of summary information stored by the processor and reports the results to the decision server; the decision server trains a machine learning classifier model on a training set to obtain a classifier capable of identifying the entropy values of threat traffic, the training set being constructed by mixing generated abnormal traffic with normal traffic; the decision server receives the message-summary entropy results transmitted from the database, classifies them with the trained classifier, identifies whether the traffic is threat traffic, and displays the detailed threat information through a dynamic interface; and the classifier is updated according to elapsed time and the messages received. The method can accurately and effectively identify threat traffic in the network and improve network security.

Description

Software and hardware combined threat situation perception method based on programmable nodes
Technical Field
The invention belongs to the technical field of network space security, relates to a situation awareness technology for perceiving network environment threats, and particularly relates to a software and hardware combined threat situation awareness method based on programmable nodes.
Background
With the rapid development of computer technology and the steady improvement of hardware manufacturing, networks have become an important foundation and driving force of the information age. Networks are growing in scale, their topologies are becoming more complicated, and the types of data they carry are multiplying, all of which challenge network security technology. To safeguard the cyberspace environment, network security situation awareness has become a research hotspot in the network security field.
Network security situation awareness models studied abroad mainly include the JDL model, the Endsley model and the Tim Bass model. Models studied in China mainly include a NetFlow-based network security situation awareness model, an information-fusion-based network security situation assessment model and a security situation awareness model for large-scale networks. Each of these models has shortcomings, and none performs ideally:
(1) NetFlow-based network security situation awareness model
The NetFlow-based network security situation awareness model consists of flow data acquisition, event response, situation display and other parts. Because the system processes massive amounts of data and concentrates on visualizing the security situation, its performance optimization requires further research.
(2) Network security situation assessment model based on information fusion
By introducing an improved D-S evidence theory, this assessment model fuses vulnerability information and service information from multiple data sources, judges the security situation of the network environment by situation-element fusion and node-situation fusion, and predicts the trend of the network security situation by time-series analysis. However, the model assumes that the log information of every node is accurate and error-free, and it cannot perceive the threat situation in the network.
(3) Security situation awareness model for large-scale networks
Because the original network security situation awareness system NSSAS (Network Security Situation Awareness System) has limited processing capacity, the network security situation awareness model YHSSAS was proposed, consisting of four parts: data integration, association analysis, system indexes and event prediction. Its shortcomings are nevertheless evident: the model targets threat situation analysis of large-scale networks and cannot effectively distinguish information such as threat traffic and threat types within the network.
Disclosure of Invention
Aimed at network edge devices, the invention provides a network threat situation perception method that uses machine learning and a multi-core CPU, with database scheduling optimized through programmable hardware, to identify threat information in the traffic reported by the device. The method can effectively perceive the threat situation in the network and identify the attack types present in the traffic.
To achieve this purpose, the invention provides the following technical solution:
A software and hardware combined threat situation perception method based on programmable nodes comprises the following steps:
(1) the programmable device processes traffic with four CPUs: CPU 0 receives and forwards the passing traffic, while the other three CPUs summarize the flow information, extracting from the message stream the source address, destination address, source port, destination port, protocol type, message length and the TCP control flags URG, ACK, PSH, RST, SYN and FIN, and transmit the summary information to the database;
(2) the database calculates the entropy of each kind of summary information stored by the processor and reports the results to the decision server;
(3) the decision server trains a machine learning classifier model on a training set to obtain a classifier capable of identifying the entropy values of threat traffic; the training set is constructed by mixing generated abnormal traffic with normal traffic; the abnormal traffic includes host-scan traffic, port-scan traffic, SYN flood traffic, ACK flood traffic, UDP flood traffic and HTTP flood traffic;
(4) the decision server receives the message-summary entropy results transmitted from the database, classifies them with the classifier trained in step (3), identifies whether the traffic is threat traffic, and displays the detailed threat information through a dynamic interface; the classifier is updated according to elapsed time and the messages received, to guard against concept drift in the traffic; abnormal traffic is then cleaned based on IP address; the identified threat traffic types include port scanning, host scanning, TCP_ack, SYN flooding, UDP flooding and HTTP flooding.
Further, step (1) specifically includes the following sub-steps:
(1.1) CPU 0 receives and forwards the data messages;
(1.2) processor task scheduling is optimized by dividing task time into one-second intervals, with every three seconds forming one processor task period; in each task period, CPU 1 receives messages in the first second; in the second second, CPU 1 summarizes the messages it received, extracting the source address, destination address, source port, destination port, protocol type, message length and the TCP control flags URG, ACK, PSH, RST, SYN and FIN, while CPU 2 receives messages; in the third second, CPU 1 uploads the summary information to the database, CPU 2 summarizes its messages, and CPU 3 starts to receive messages;
(1.3) the programmable node creates a database thread, connects to the database, empties the selected table, reads the calculated entropy information from the database, stores it in the table and displays it there.
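The staggered three-second pipeline of sub-step (1.2) can be sketched as a small lookup. This is only an illustration: the function name `cpu_task`, the 0-indexed seconds and the assumption that each worker CPU keeps cycling through receive, summarize and upload after its first period are not stated in the source.

```python
# Hypothetical sketch of the staggered schedule in sub-step (1.2).
# Assumption: after start-up, each worker CPU repeats the cycle
# receive -> digest -> upload, offset by one second from its neighbour.
PHASES = ("receive", "digest", "upload")

def cpu_task(cpu, second):
    """Return the task of `cpu` (0..3) during the 0-indexed `second`."""
    if cpu == 0:
        return "forward"          # CPU 0 only receives and forwards traffic
    offset = cpu - 1              # CPU 1 starts at second 0, CPU 2 at 1, CPU 3 at 2
    if second < offset:
        return "idle"             # this worker has not started yet
    return PHASES[(second - offset) % len(PHASES)]

def schedule(seconds):
    """Task table: one dict {cpu: task} per second."""
    return [{cpu: cpu_task(cpu, s) for cpu in range(4)} for s in range(seconds)]

if __name__ == "__main__":
    for s, row in enumerate(schedule(6)):
        print(s, row)
```

Under this assumption, from the third second onward exactly one worker CPU is receiving in any given second, so no second of traffic goes unsummarized.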
Further, when summarizing messages in step (1.2), each processor first determines whether the IP address of the received message is IPv4 or IPv6, and then extracts the message summary according to that address type.
Further, in the step (2), the entropy calculation is performed by using the following formula:
[Formula image BDA0002656542950000031, not reproduced in this text]
where T is the length of the set over which the entropy is calculated, n is the number of distinct elements in the set, and the distinct elements {a_1, a_2, …, a_n} of the set occur {d_1, d_2, …, d_n} times respectively.
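The formula itself survives only as an image reference. Assuming it is the standard Shannon entropy, which is consistent with the definitions of T, n and d_i given here (the base-2 logarithm is also an assumption), it would read:

```latex
H = -\sum_{i=1}^{n} \frac{d_i}{T}\,\log_2 \frac{d_i}{T},
\qquad T = \sum_{i=1}^{n} d_i
```

As a worked example under that assumption, a one-second window with destination ports {80, 80, 80, 443} has T = 4, n = 2 and counts (3, 1), giving H = -(0.75 log2 0.75 + 0.25 log2 0.25) ≈ 0.811 bits. A window in which every port is identical gives H = 0.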
Further, step (3) specifically includes the following sub-steps:
(3.1) normal traffic and abnormal traffic of different types are first obtained, the abnormal traffic being generated with an open-source tool: SYN flood traffic, ACK flood traffic, host-scan traffic, UDP flood traffic, HTTP flood traffic and port-scan traffic;
(3.2) the normal traffic and abnormal traffic are mixed proportionally, spliced together second by second, to construct the training set;
(3.3) the training set is fed into a machine learning classifier for training, yielding a classifier capable of identifying the entropy values of threat traffic.
Further, in step (3.3), an AdaBoost ensemble method based on Gini decision trees is adopted: each sample in the training data is given a weight, all weights being equal initially; after the first weak learner has been trained on the samples, its error rate is computed and the weight of that learner is calculated from the error rate; after each round, the sample weights are readjusted so that the samples misclassified in the previous round receive more attention in the next round; after multiple rounds of learning the final AdaBoost classifier is obtained.
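The reweighting described above can be illustrated with one boosting round. The sketch below uses the classic two-class AdaBoost update with labels +1 and -1, which is a simplification: the source classifies several threat types and does not give its exact update rule, and the function name `adaboost_round` is hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One round of two-class AdaBoost (labels +1 / -1).

    Returns (alpha, new_weights): alpha is the weight of the weak learner
    derived from its weighted error rate, and new_weights are the
    renormalised sample weights for the next round.
    """
    # Weighted error rate of the weak learner on the current sample weights.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    raw = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

if __name__ == "__main__":
    w0 = [0.25, 0.25, 0.25, 0.25]            # equal initial weights
    alpha, w1 = adaboost_round(w0, [1, 1, -1, -1], [1, 1, -1, 1])
    print(alpha, w1)
```

In the usage example one of four equally weighted samples is misclassified, so its weight rises to one half of the total while the three correctly classified samples share the rest.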
Further, step (4) includes the following sub-steps:
(4.1) the decision server starts a blocking UDP server thread, listens on the port to which the database sends data, and stores the entropy results, time information and source address sent by the database;
(4.2) a machine learning classifier thread is initialized, which checks whether a classifier is currently running; if one is, the stored entropy information is classified with it; if none is running, the training set on the host is read and a classifier is trained from it;
(4.3) when the classifier detects threat traffic, the thread displays the threat traffic information through a dynamic interface;
(4.4) when the classifier is judged to meet the update condition, a new classifier is trained on a sample data set newly constructed from the latest traffic;
(4.5) on the server side, instructions can be entered through a control window to query thread information and control the running of threads;
(4.6) the server issues an instruction to clean abnormal traffic, directing the programmable node to forward only normal traffic to the protected node.
Further, in step (4.3), the threat traffic information includes threat type, source address, destination address and reporting time.
Further, the update condition in step (4.4) is that the running time of the classifier exceeds a threshold, or that the volume of data classified by the classifier reaches a set threshold.
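The update condition amounts to a two-part test. A minimal sketch follows; the parameter names and the idea of passing the thresholds explicitly are assumptions, since the source gives no concrete threshold values.

```python
import time

def needs_update(started_at, classified_count, max_age_s, max_count, now=None):
    """Return True when the classifier should be retrained.

    started_at       -- time (seconds since epoch) the classifier began running
    classified_count -- number of entropy records it has classified so far
    max_age_s        -- hypothetical running-time threshold in seconds
    max_count        -- hypothetical threshold on the classified data volume
    """
    now = time.time() if now is None else now
    too_old = (now - started_at) > max_age_s      # running time exceeds threshold
    too_much = classified_count >= max_count      # data volume reaches threshold
    return too_old or too_much
```

Either condition alone triggers retraining, which matches the "or" in the text.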
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention provides a network threat perception method that combines rapid threat discovery based on traffic entropy features with accurate identification by ensemble learning. It can accurately and effectively identify threat traffic in the network, so that a network administrator can discover threat traffic and its details promptly, efficiently and clearly at a specified time granularity, thereby improving network security.
(2) Using a combined software and hardware approach, the invention provides, for the hardware part, a multiprocessor task scheduling scheme for computing traffic summaries and traffic entropy values, which greatly improves the utilization of processor resources.
(3) The threat traffic perception method reduces the dimensionality of the traffic twice: the first reduction compresses the traffic into summary information of the data messages, and the second compresses the summary information into entropy values.
(4) The blocking UDP server effectively avoids the extra load that polling would place on the server when receiving data; updating the classifier guards against concept drift in the traffic.
Drawings
FIG. 1 is a structural framework of a threat situation awareness system;
FIG. 2 is a flow entropy based threat awareness implementation framework;
FIG. 3 is a schematic flow diagram of an AdaBoost classifier;
FIG. 4 shows sample entropy values calculated for host-scan traffic mixed with normal traffic, with label2 indicating host-scan traffic;
FIG. 5 is a determination of machine learning hyper-parameters based on accuracy;
FIG. 6 is a machine learning dataset construction method;
FIG. 7 is an experimental topology build-up graph;
FIG. 8 is a schematic diagram of the operation of processors 1-3 in the programmable device;
FIG. 9 is the final threat information presentation interface.
Detailed Description
The technical solutions provided by the present invention will be described in detail below with reference to specific examples, and it should be understood that the following specific embodiments are only illustrative of the present invention and are not intended to limit the scope of the present invention.
Research has shown that network traffic exhibits self-similarity, long-range dependence and heavy-tailed distributions, and that entropy values can characterize these properties. The invention therefore determines whether network traffic contains threat information by calculating the entropy of various feature attributes of the traffic, including message features such as source address, destination address, source port and destination port.
The principle of detecting abnormal traffic from traffic entropy values is as follows:
Compared with normal traffic, the entropy of abnormal traffic shows a pronounced rise or fall. Network behaviors such as DDoS attacks, port scanning and host scanning change the entropy of the feature attributes of the network flow, and different behaviors correspond to different patterns of change. Comparing the entropy changes of the overall flow, source and destination IP, source and destination port, protocol type, message length and flags bits gives the entropy characteristics of common typical abnormal traffic shown in Table 1:
TABLE 1 Entropy characteristics of abnormal traffic
Anomaly       | H(flow)  | H(srcIP) | H(srcPort) | H(dstIP) | H(dstPort) | H(proto) | H(len)              | H(flags)
Port scanning | Increase | Decrease | Decrease   | Decrease | Increase   | Decrease | Increase            | Basically unchanged
Host scanning | Increase | Decrease | Decrease   | Increase | Decrease   | Decrease | Basically unchanged | Decrease
TCP DDoS      | Decrease | Decrease | Decrease   | Decrease | Decrease   | Decrease | (see text)          | (see text)
UDP flood     | Decrease | Decrease | Increase   | Decrease | Decrease   | Decrease | Decrease            | Decrease
HTTP flood    | Decrease | Decrease | Decrease   | Decrease | Increase   | Decrease | Decrease            | Basically unchanged
In Table 1, H(flow) is the global flow entropy, H(srcIP) is the source IP entropy, H(srcPort) is the source port entropy, H(dstIP) is the destination IP entropy, H(dstPort) is the destination port entropy, H(proto) is the protocol type entropy, H(len) is the message length entropy, and H(flags) is the flags entropy. For TCP DDoS, the message length entropy of a SYN flood increases relative to normal traffic while its flags entropy H(flags) is basically unchanged; for an ACK flood, the message length entropy and the flags entropy both decrease.
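Table 1 can also be read as a set of trend signatures. The sketch below encodes each row as a string of "+" (increase), "-" (decrease), "0" (basically unchanged) and "?" (not fixed by the table) and returns the rows consistent with an observed trend. It only restates the table for illustration; the method itself classifies entropy values with a trained classifier, not with this lookup, and the names `TABLE_1` and `match_anomalies` are hypothetical.

```python
# Column order follows Table 1.
FIELDS = ("flow", "srcIP", "srcPort", "dstIP", "dstPort", "proto", "len", "flags")

# One signature character per column: + increase, - decrease,
# 0 basically unchanged, ? depends on the attack sub-type (SYN vs. ACK flood).
TABLE_1 = {
    "port scanning": "+---+-+0",
    "host scanning": "+--+--0-",
    "TCP DDoS":      "------??",
    "UDP flood":     "--+-----",
    "HTTP flood":    "----+--0",
}

def match_anomalies(observed):
    """Return the Table 1 rows whose signature agrees with `observed`.

    `observed` is an 8-character string over '+', '-', '0' in FIELDS order.
    """
    if len(observed) != len(FIELDS):
        raise ValueError("expected one trend symbol per field")
    return [name for name, sig in TABLE_1.items()
            if all(s == "?" or s == o for s, o in zip(sig, observed))]

if __name__ == "__main__":
    print(match_anomalies("+---+-+0"))   # trend of a port scan
```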
Based on these entropy characteristics, the invention provides a network threat perception method in which rapid threat discovery cooperates with accurate threat identification by machine learning. Its implementation framework, shown in FIG. 1, comprises a programmable device, a server and protected network nodes. The main flow is shown in FIG. 2: the programmable node first extracts network traffic features; traffic entropy values are then calculated; the traffic information, reduced in dimension to entropy values, is uploaded to the decision server; and the decision server identifies threats using a machine learning method trained on a training set.
The hardware used by the invention is a network processor carrying a programmable FPGA. By exploiting the programmability of the device to change how the underlying hardware processes data packets, the processor extracts the five-tuple and other information from traffic passing through it and stores this information in a database. The database calculates the entropy of each feature attribute of the traffic and reports the entropy values to the server. The decision server classifies the received entropy information with the trained machine learning classifier, identifies threat information in the traffic and displays the threat details visually.
The machine learning classifier used in the invention is an AdaBoostClassifier. As shown in FIG. 5, entropy classification accuracy was compared across decision trees split on Gini impurity and on entropy, combined respectively with a bagging decision forest, a boarding decision forest, a random forest ensemble, AdaBoost boosting and GBRT gradient boosting; the AdaBoost classification algorithm based on Gini decision trees, with an accuracy of about 0.99962, was finally selected.
Specifically, as shown in FIG. 2, the method of the invention comprises the following steps:
(1) The programmable device processes traffic with four CPUs: CPU 0 receives and forwards the passing traffic, while the other three CPUs summarize the flow information, extracting from the message stream the source address, destination address, source port, destination port, protocol type, message length and the TCP control flags URG, ACK, PSH, RST, SYN and FIN, and pass the summary information to the database.
The specific process of this step is as follows:
(1.1) CPU 0 receives and forwards the data messages.
(1.2) Processor task scheduling is optimized by dividing task time into one-second intervals, with every three seconds forming one processor task period. In the first second, CPU 1 receives messages; in the second second, CPU 1 summarizes the message information, extracting the five-tuple and other information, while CPU 2 receives messages; in the third second, CPU 1 uploads the summary information to the database, CPU 2 summarizes its messages, and CPU 3 starts to receive messages. The specific workflow is shown in FIG. 8. Each processor first determines whether the IP address of the received message is IPv4 or IPv6, then extracts the message summary according to that address type (source address, destination address, source port, destination port, protocol type, message length and the TCP control flags URG, ACK, PSH, RST, SYN and FIN) and uploads the summary information to the database.
(1.3) The programmable node creates a database thread, connects to the database, empties the selected table, reads the calculated entropy information from the database, stores it in the table and displays it there.
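The summary extraction of sub-step (1.2) can be sketched in software as a parser that branches on the IP version. This is an illustration only: the source performs the extraction on the programmable device, the function name `digest` and the dictionary layout are hypothetical, and the sketch assumes a raw IP packet without link-layer header and ignores IPv6 extension headers.

```python
import ipaddress
import struct

TCP, UDP = 6, 17
# Bit masks of the six TCP control flags in byte 13 of the TCP header.
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08,
             "RST": 0x04, "SYN": 0x02, "FIN": 0x01}

def digest(packet):
    """Extract the per-message summary from a raw IPv4 or IPv6 packet."""
    version = packet[0] >> 4
    if version == 4:
        header_len = (packet[0] & 0x0F) * 4             # IHL in 32-bit words
        length = struct.unpack("!H", packet[2:4])[0]    # total length
        proto = packet[9]
        src, dst = packet[12:16], packet[16:20]
    elif version == 6:
        header_len = 40                                 # fixed IPv6 header
        length = header_len + struct.unpack("!H", packet[4:6])[0]
        proto = packet[6]                               # next header
        src, dst = packet[8:24], packet[24:40]
    else:
        raise ValueError("not an IPv4 or IPv6 packet")
    out = {"src": str(ipaddress.ip_address(src)),
           "dst": str(ipaddress.ip_address(dst)),
           "proto": proto, "length": length,
           "sport": None, "dport": None, "flags": []}
    l4 = packet[header_len:]
    if proto in (TCP, UDP) and len(l4) >= 4:
        out["sport"], out["dport"] = struct.unpack("!HH", l4[:4])
    if proto == TCP and len(l4) >= 14:
        out["flags"] = [name for name, bit in FLAG_BITS.items() if l4[13] & bit]
    return out
```

Each returned dictionary holds exactly the attributes over which entropy is later calculated: addresses, ports, protocol type, message length and TCP flags.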
(2) The database calculates the entropy of each item of summary information stored by the processor and reports the results to the decision server, as shown in FIG. 4.
In this step, the database calculates the entropy of the summary information as follows:
The database calculates an entropy value for each kind of traffic summary information stored by the processor, using the following formula:
[Formula image BDA0002656542950000061, not reproduced in this text]
where T is the length of the set over which the entropy is calculated, n is the number of distinct elements in the set, and the distinct elements {a_1, a_2, …, a_n} of the set occur {d_1, d_2, …, d_n} times respectively.
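A minimal sketch of this per-attribute calculation follows. It assumes the missing formula is the Shannon entropy with a base-2 logarithm over the occurrence counts d_i, which fits the definitions of T and n but is not confirmed by the text; the function names and the dictionary form of the summaries are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a collection of attribute values.

    T is the number of values, and each distinct value a_i occurring
    d_i times contributes -(d_i / T) * log2(d_i / T).
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log2(d / total) for d in counts.values())

def summary_entropies(summaries, fields):
    """Entropy of each listed field over one window of message summaries."""
    return {f: entropy(s[f] for s in summaries) for f in fields}

if __name__ == "__main__":
    window = [{"dport": 80}, {"dport": 80}, {"dport": 80}, {"dport": 443}]
    print(summary_entropies(window, ["dport"]))
```

A window dominated by one value yields an entropy near zero, while evenly spread values yield log2(n); this is what lets a scan or flood show up as a rise or fall in the per-attribute entropies.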
(3) The decision server trains a machine learning classifier model on a training set. Normal traffic and abnormal traffic of different types are prepared with existing tools; the abnormal traffic generated with the tools includes host-scan traffic, port-scan traffic, SYN flood traffic, ACK flood traffic, UDP flood traffic and HTTP flood traffic. The generated abnormal traffic is mixed with the normal traffic to form the training set, on which a classifier capable of identifying the entropy values of threat traffic is trained.
This step specifically comprises the following sub-steps:
(3.1) Normal traffic and abnormal traffic of different types are first obtained. Data captured by the MAWI Working Group in Japan from 2:00 p.m. to 2:15 p.m. on May 13, 2020 is used as the normal traffic, and SYN flood, ACK flood, host-scan, UDP flood, HTTP flood and port-scan traffic is generated with an open-source tool.
(3.2) The normal traffic and abnormal traffic are mixed proportionally and spliced together second by second, as shown in FIG. 6, where s1 is the generated pure abnormal traffic data set. To better match the conditions of real network attacks, the method divides the s1 data set into per-second slices s1, s2, …, sn and inserts them into the per-second slices t1, t2, …, tn of the normal traffic, where n is the duration of the abnormal traffic data set in seconds. This completes the construction of the training set for the machine learning method.
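The per-second splicing of sub-step (3.2) can be sketched as below. The representation is an assumption: each trace is taken to be a list of per-second message lists, label 0 is used for seconds containing only normal traffic, and the mixing proportion mentioned in the text is not modeled because the source does not state it.

```python
def mix_per_second(normal, abnormal, label):
    """Splice per-second abnormal slices s_i into normal slices t_i.

    normal   -- list of per-second message lists t_1..t_m (normal traffic)
    abnormal -- list of per-second message lists s_1..s_n (one attack type)
    label    -- class label for seconds that contain abnormal traffic
    Returns one (messages, label) pair per second of normal traffic.
    """
    mixed = []
    for sec, normal_slice in enumerate(normal):
        if sec < len(abnormal) and abnormal[sec]:
            # Second i of the attack is merged into second i of normal traffic.
            mixed.append((normal_slice + abnormal[sec], label))
        else:
            mixed.append((list(normal_slice), 0))   # pure normal traffic
    return mixed

if __name__ == "__main__":
    t = [["n1"], ["n2"], ["n3"]]
    s = [["a1"], ["a2"]]
    print(mix_per_second(t, s, label=2))
```

Each labelled second can then be reduced to a vector of entropy values, giving one training sample per second.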
(3.3) The training set is fed into a machine learning classifier for training, yielding a classifier capable of identifying the entropy values of threat traffic. As shown in FIG. 3, the AdaBoost ensemble method based on Gini decision trees first gives each sample in the training data a weight, all weights being equal initially. After the first weak learner has been trained, its error rate is computed and the weight of that learner is calculated from the error rate. After each round, the sample weights are readjusted so that the samples misclassified in the previous round receive more attention in the next round. After multiple rounds of learning the final AdaBoostClassifier is obtained.
(4) The decision server receives the message-summary entropy results transmitted from the database, classifies them with the trained classifier, and identifies whether the traffic is threat traffic. Threat details are displayed through a dynamic interface built with an open-source tool. The threat types the invention can identify include port scanning, host scanning, TCP_ack, SYN flooding, UDP flooding and HTTP flooding. The classifier is updated according to elapsed time and the messages received, to guard against concept drift in the traffic. Abnormal traffic is then cleaned based on IP address.
This step specifically comprises the following sub-steps:
(4.1) The decision server starts a blocking UDP server thread, listens on the port to which the database sends data, and stores the entropy results, time information and source address sent by the database. The blocking UDP server effectively avoids the extra load that polling would place on the server when receiving data.
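A blocking receive loop of the kind described in (4.1) might look like the following sketch. The JSON payload, the function names and the `max_reports` parameter are assumptions made for illustration; the source does not specify the wire format of the entropy reports.

```python
import json
import socket

def open_server(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def serve_entropy_reports(sock, handle, max_reports=None):
    """Block on recvfrom and pass each report to `handle(src_ip, report)`.

    The thread sleeps inside recvfrom until a datagram arrives, so no
    CPU time is spent polling for data.
    """
    received = 0
    while max_reports is None or received < max_reports:
        payload, (src_ip, _src_port) = sock.recvfrom(65535)   # blocks here
        handle(src_ip, json.loads(payload.decode("utf-8")))
        received += 1

if __name__ == "__main__":
    server = open_server()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.sendto(json.dumps({"time": 1, "H_srcIP": 0.5}).encode("utf-8"),
                  server.getsockname())
    serve_entropy_reports(server, lambda ip, rep: print(ip, rep), max_reports=1)
    client.close()
    server.close()
```

In a deployment the loop would run in its own thread with `max_reports=None`, and `handle` would store the entropy values, time information and source address for the classifier thread.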
And (4.2) initializing a machine learning classifier thread, checking whether a currently running classifier thread exists or not, and classifying the stored entropy value information by using the classifier when the currently running classifier thread is detected. And if the thread does not have a classifier which is running, reading a training set in the csv format in the host, and training by using the training set to obtain the AdaBoostClassifier classifier.
(4.3) when the classifier detects the threat flow, the thread displays the threat flow information through a dynamic interface built by using an open source tool, wherein the dynamic interface comprises threat detailed information such as threat types, source addresses, destination addresses and reporting time, and a specific result display interface is shown in fig. 9.
And (4.4) judging the updating condition of the classifier, and training a new classifier by using a sample data set newly constructed based on the latest flow to prevent the flow from generating concept drift when the operation time of the classifier exceeds a threshold value or the data magnitude of the classifier reaches a set threshold value.
And (4.5) the server end can input an instruction through the control window to inquire the thread information and control the thread to run.
And (4.6) the server issues an instruction for cleaning the abnormal flow to instruct the programmable node to only forward the normal flow to enter the protection node.
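Steps (4.1) and (4.4) can be sketched together as follows. This is only an illustration under stated assumptions: the JSON report layout (`entropy`, `time`, `src` keys), the `UpdatePolicy` class, and the `serve` function are hypothetical names that do not come from the source, and retraining is represented by a comment rather than implemented. The point of the sketch is that `recvfrom()` on a blocking socket sleeps until a datagram arrives, so no polling loop is needed, and that the update check mirrors the two thresholds in step (4.4).

```python
import json
import socket
import time

def parse_report(datagram: bytes) -> dict:
    """Decode one report from the database; the JSON layout is an assumption."""
    msg = json.loads(datagram.decode("utf-8"))
    return {"entropy": msg["entropy"], "time": msg["time"], "src": msg["src"]}

class UpdatePolicy:
    """Step (4.4): retrain when running time or classified volume crosses a threshold."""

    def __init__(self, max_age_s: float, max_samples: int, now: float):
        self.max_age_s = max_age_s
        self.max_samples = max_samples
        self.started = now
        self.classified = 0

    def record(self, count: int = 1) -> None:
        self.classified += count

    def needs_update(self, now: float) -> bool:
        # "exceeds" a time threshold, or "reaches" a data-volume threshold
        return (now - self.started > self.max_age_s
                or self.classified >= self.max_samples)

    def reset(self, now: float) -> None:
        self.started = now
        self.classified = 0

def serve(host: str, port: int, policy: UpdatePolicy, store: list) -> None:
    """Step (4.1): block in recvfrom() instead of polling for new data."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            datagram, _addr = sock.recvfrom(65535)  # sleeps until a datagram arrives
            store.append(parse_report(datagram))
            policy.record()
            if policy.needs_update(time.time()):
                # Retraining on a freshly built sample set would be triggered here.
                policy.reset(time.time())
```

In use, `serve` would run in its own thread, for example `threading.Thread(target=serve, args=("0.0.0.0", 9999, policy, reports), daemon=True)`, where the port number is an arbitrary placeholder.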
To verify the effectiveness of the threat situation awareness method, an experimental topology combining software and hardware was built.
Fig. 7 shows an experimental topology framework capable of implementing the invention, in which a host running the machine learning flow entropy classification algorithm serves as the server, and a next-generation network processor prototype system equipped with a programmable FPGA and a multi-core CPU serves as the network traffic monitoring device.
The software on the server consists of a machine learning training module and a classifier module. The training module trains a classifier using known threat samples, and the classifier module receives flow entropy information and visualizes the classification results. In the programmable device, CPU 0 receives and forwards data messages, CPUs 1 to 3 summarize data messages and transmit the summaries, and the flow entropy calculation module computes flow entropy from the message summary information and uploads the results to the server. The server is directly connected to the programmable device, which is placed in the network topology to be monitored and performs port forwarding.
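The division of labor among CPUs 1 to 3 follows the staggered three-second task cycle recited in claim 2, which can be sketched as below. Two things are assumptions here: the claim only spells out the first three seconds, so the steady-state repetition (each CPU returns to receiving after it uploads) is inferred, and the phase names and function names are hypothetical labels.

```python
PHASES = ("receive", "summarize", "upload")

def phase_of(cpu: int, second: int):
    """Phase of worker CPU `cpu` (1..3) during 0-based `second`, or None while idle."""
    if cpu not in (1, 2, 3):
        raise ValueError("worker CPUs are numbered 1 to 3; CPU 0 only forwards")
    offset = cpu - 1  # CPU k starts its first cycle k-1 seconds after CPU 1
    if second < offset:
        return None
    return PHASES[(second - offset) % 3]

def schedule(seconds: int):
    """Per-second mapping of worker CPU number to its phase."""
    return [{cpu: phase_of(cpu, s) for cpu in (1, 2, 3)} for s in range(seconds)]

timeline = schedule(4)
```

Under this reading, once the pipeline has filled, exactly one worker CPU is receiving in any given second, so message capture is continuous while summarizing and uploading overlap with it.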
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiments, and also include technical schemes formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to be within the scope of the present invention.

Claims (9)

1. A software and hardware combined threat situation perception method based on programmable nodes is characterized by comprising the following steps:
(1) the programmable device uses four CPUs to process traffic: CPU 0 receives and forwards the passing traffic, and the other three CPUs summarize the traffic information, extracting from the message stream the source address, destination address, source port, destination port, protocol type, message length, and TCP control flags URG, ACK, PSH, RST, SYN and FIN, and transmit the summary information to the database;
(2) the database calculates the entropy of each kind of summary information stored in the processor and reports the calculation results to the decision server;
(3) the decision server trains a machine learning classifier model with a training set to obtain a classifier capable of identifying threat traffic from entropy values; the training set is constructed by mixing generated abnormal traffic with normal traffic; the abnormal traffic includes: host scanning traffic, port scanning traffic, SYN Flood traffic, ACK Flood traffic, UDP Flood traffic and HTTP Flood traffic;
(4) the decision server receives the message summary entropy results transmitted from the database, classifies them with the classifier trained in step (3), identifies whether the traffic is threat traffic, and displays detailed information about the threat through a dynamic interface; the classifier is updated according to the elapsed time and the messages received, to guard against concept drift in the traffic; abnormal traffic is then cleaned based on IP address; the identified threat traffic types include: port scanning, host scanning, TCP_ack, SYN flooding, UDP flooding, and HTTP flooding.
2. The programmable node-based software and hardware combined threat situation awareness method according to claim 1, wherein the step (1) specifically comprises the following sub-steps:
(1.1) CPU 0 receives and forwards data messages;
(1.2) processor task scheduling is optimized: task time is divided into one-second intervals, and every three seconds form one processor task cycle; in each task cycle, in the first second CPU 1 receives messages; in the second second, CPU 1 summarizes the messages, extracting the source address, destination address, source port, destination port, protocol type, message length, and TCP control flags URG, ACK, PSH, RST, SYN and FIN, while CPU 2 receives messages; in the third second, CPU 1 uploads the summary information to the database, CPU 2 summarizes its messages, and CPU 3 starts receiving messages;
(1.3) the programmable node creates a database thread, connects to the database, empties the selected table, reads the calculated entropy information from the database, stores the information in the table, and displays it in the table.
3. The programmable node-based software and hardware combined threat situation awareness method according to claim 2, wherein, when the messages are summarized in step (1.2), each processor first determines the IP address type of the received message, IPv4 or IPv6, and extracts the message summary according to that address type.
4. The programmable node-based software and hardware combined threat situation awareness method according to claim 1, wherein in the step (2), the entropy calculation is performed by using the following formula:
Figure FDA0002656542940000021
wherein T is the length of the set on which the entropy is calculated, n is the number of distinct elements in the set, and the distinct elements {a_1, a_2, ..., a_n} of the set occur {d_1, d_2, ..., d_n} times respectively.
5. The programmable node-based software and hardware combined threat situation awareness method according to claim 1, wherein the step (3) specifically comprises the following sub-steps:
(3.1) first, normal traffic and abnormal traffic of different types are obtained, the abnormal traffic being generated with open-source tools: SYN Flood traffic, ACK Flood traffic, host scanning traffic, UDP Flood traffic, HTTP Flood traffic, and port scanning traffic;
(3.2) normal traffic and abnormal traffic are mixed proportionally in units of seconds, the two kinds of traffic being spliced second by second, to construct the training set;
(3.3) the training set is fed into a machine learning classifier for training, yielding a classifier capable of identifying threat traffic from entropy values.
6. The programmable node-based software and hardware combined threat situation awareness method according to claim 5, wherein in step (3.3) an AdaBoost ensemble method based on Gini decision trees is adopted: a weight is first assigned to every sample in the training data, with all weights initially equal; after the first weak learner has been trained, its error rate is measured and the weight of that learner is calculated from the error rate; after each round of learning, the sample weights are readjusted so that samples misclassified in the previous round receive more emphasis in the next round; and the final AdaBoostClassifier classifier is obtained after multiple rounds of learning.
7. The programmable node-based hardware and software combined threat situation awareness method according to claim 1, wherein the step (4) comprises the sub-steps of:
(4.1) the decision server starts a blocking UDP server thread, listens on the port to which the database sends data, and stores the entropy calculation results, time information and source address sent by the database;
(4.2) a machine learning classifier thread is initialized, and the server checks whether a classifier thread is currently running; if one is running, the stored entropy values are classified with that classifier; if no classifier is running, a training set is read from the host and used to train the classifier;
(4.3) when the classifier detects threat traffic, the thread displays the threat traffic information through a dynamic interface;
(4.4) when the classifier is judged to meet the update condition, a new classifier is trained on a sample data set newly constructed from the latest traffic;
(4.5) at the server, instructions can be entered through a control window to query thread information and control thread execution;
(4.6) the server issues an instruction to clean abnormal traffic, instructing the programmable node to forward only normal traffic to the protected node.
8. The programmable node-based hardware and software combined threat situation awareness method according to claim 7, wherein in the step (4.3), the threat traffic information comprises: threat type, source address, destination address, reporting time.
9. The programmable node-based software and hardware combined threat situation awareness method according to claim 7, wherein the update condition in step (4.4) is: the running time of the classifier exceeds a threshold, or the amount of data classified by the classifier reaches a set threshold.
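The entropy calculation recited in claim 4 can be sketched as follows. The formula itself appears only as a figure reference in this text, so the sketch assumes the standard Shannon form H = -Σ (d_i / T) · log2(d_i / T), summed over the n distinct elements, with T and d_i as defined in the claim. The base-2 logarithm and the absence of normalization are assumptions, and the function name is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values) -> float:
    """Entropy of one summary field (for example, all source ports seen in a window)."""
    values = list(values)
    T = len(values)            # length of the set
    if T == 0:
        return 0.0
    counts = Counter(values)   # d_i for each distinct element a_i
    # log2(T / d) equals -log2(d / T), so every term is non-negative.
    return sum((d / T) * math.log2(T / d) for d in counts.values())

# One entropy value per summary field gives one feature vector per time window.
window = {
    "dst_port": [80, 443, 80, 443],
    "protocol": ["TCP", "TCP", "TCP", "TCP"],
}
features = {field: shannon_entropy(vals) for field, vals in window.items()}
```

A field whose values are all identical yields an entropy of 0, and a field whose values are spread evenly over n distinct elements yields log2(n), which is why a shift in these values can separate scanning or flooding traffic from normal traffic.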
CN202010889682.1A 2020-08-28 2020-08-28 Programmable node-based software and hardware combined threat situation awareness method Active CN112055007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010889682.1A CN112055007B (en) 2020-08-28 2020-08-28 Programmable node-based software and hardware combined threat situation awareness method

Publications (2)

Publication Number Publication Date
CN112055007A true CN112055007A (en) 2020-12-08
CN112055007B CN112055007B (en) 2022-11-15

Family

ID=73607001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010889682.1A Active CN112055007B (en) 2020-08-28 2020-08-28 Programmable node-based software and hardware combined threat situation awareness method

Country Status (1)

Country Link
CN (1) CN112055007B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112788066A (en) * 2021-02-26 2021-05-11 Central South University Abnormal flow detection method and system for Internet of things equipment and storage medium
CN113452675A (en) * 2021-05-21 2021-09-28 Jinan Inspur Data Technology Co., Ltd. Network access control method, device, equipment and storage medium in cloud platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102098180A (en) * 2011-02-17 2011-06-15 North China Electric Power University Network security situational awareness method
CN102324007A (en) * 2011-09-22 2012-01-18 Chongqing University Method for detecting abnormality based on data mining
US20150341376A1 (en) * 2014-05-26 2015-11-26 Solana Networks Inc. Detection of anomaly in network flow data
CN109450860A (en) * 2018-10-16 2019-03-08 Nanjing University of Aeronautics and Astronautics A kind of detection method threatened based on entropy and the advanced duration of support vector machines
US10581887B1 (en) * 2017-05-31 2020-03-03 Ca, Inc. Employing a relatively simple machine learning classifier to explain evidence that led to a security action decision by a relatively complex machine learning classifier

Also Published As

Publication number Publication date
CN112055007B (en) 2022-11-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant