CN114785623A - Network intrusion detection method and device based on discretization characteristic energy system - Google Patents

Network intrusion detection method and device based on discretization characteristic energy system

Info

Publication number
CN114785623A
Authority
CN
China
Prior art keywords
flow
network
feature
energy
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210703944.XA
Other languages
Chinese (zh)
Inventor
许成程
翟江涛
刘光杰
戴跃伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202210703944.XA
Publication of CN114785623A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of network traffic identification, and in particular to a network intrusion detection method and device based on a discretization characteristic energy system. The method comprises the following steps: collecting normal network flow data and dividing it into network flows according to five-tuple information; preprocessing the network flow features; discretizing the features with a feature discretization module; constructing a flow classifier based on the discretization characteristic energy system; and inputting the flow to be detected into the flow classifier and determining the nature of the network flow according to a threshold. The method and device can effectively classify the data under test as normal flow or malicious flow while being trained on normal network flow only.

Description

Network intrusion detection method and device based on discretization characteristic energy system
Technical Field
The invention relates to the technical field of network traffic identification, and in particular to a network intrusion detection method and device based on a discretization characteristic energy system.
Background
Traffic classification associates network traffic with specific categories according to requirements, and has become a crucial component of cyberspace security management. In network management, for example, traffic may be classified by priority to guarantee the quality of service of the network. In cyberspace security, traffic is generally divided into normal traffic and malicious traffic in order to detect network anomalies. In recent years, with the wide application of encryption technology in network applications, traffic encryption has become the mainstream. In particular, much malware encrypts its traffic with techniques such as TLS to avoid detection by firewalls and network intrusion detection systems. These practices pose new challenges to traditional traffic classification approaches.
Depending on the network layer at which it operates, traffic encryption can be divided into application layer encryption, presentation layer encryption, and network layer encryption. In application layer encryption, also known as conventional encryption, the application implements its own secure data transmission protocol at the application layer. In presentation layer and network layer encryption, the entire data packet from the upper layer is encrypted; typical technologies are tunneling technologies such as TLS and IPsec, on which VPNs, for example, are based. This type of encryption is also known as protocol encapsulation. In some cases, traffic that has already been conventionally encrypted is encrypted again through protocol encapsulation.
In recent years, a succession of classifiers based on conventional machine learning and deep learning have been proposed. These flow-based classifiers can achieve very high accuracy, but they must be trained on labeled malicious traffic samples, and labeling real traffic is difficult, especially for malicious traffic. In addition, a machine learning classifier trained on one data distribution often performs poorly on data with a slightly different distribution; that is, its domain adaptation capability is low.
Disclosure of Invention
The present invention aims to provide a network intrusion detection method and device based on a discretization characteristic energy system, so as to solve the problems proposed in the background art.
The technical scheme of the invention is as follows: the network intrusion detection method based on the discretization characteristic energy system comprises the following steps:
step 1, collecting normal network flow data, and dividing network flow according to quintuple information;
step 2, preprocessing network flow characteristics;
step 3, discretizing the features by using a feature discretization module;
step 4, constructing a flow classifier based on a discretization characteristic energy system;
step 5, inputting the flow to be detected into the flow classifier based on the discretization characteristic energy system, and determining the nature of the network flow according to a threshold value.
Preferably, in step 1, the normal network traffic is captured by the flow collector using the Wireshark tool, exists in PCAP form, and is stored after being divided into flows according to the five-tuple SrcIP, SrcPort, DstIP, DstPort, and Protocol.
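The five-tuple division described in step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Packet` record and the `split_into_flows` helper are hypothetical names, the packets are assumed to have already been parsed out of the PCAP file, and packets are grouped per direction (the text does not say whether the two directions of a connection are merged).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical parsed-packet record; the fields mirror the five-tuple."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int  # packet length in bytes


FlowKey = Tuple[str, int, str, int, str]


def split_into_flows(packets: Iterable[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets that share the same five-tuple into one flow."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
        flows[key].append(pkt)
    return dict(flows)


# Illustrative input: two packets of one TCP flow and one UDP packet.
packets = [
    Packet("10.0.0.1", 5000, "10.0.0.2", 443, "TCP", 60),
    Packet("10.0.0.1", 5000, "10.0.0.2", 443, "TCP", 1500),
    Packet("10.0.0.3", 5001, "10.0.0.2", 53, "UDP", 80),
]
flows = split_into_flows(packets)
```

Each resulting list keeps the packets in capture order, which is what a packet size sequence would later be read from.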
Preferably, in step 2, the network flow characteristic preprocessing includes the following steps:
step 2.1, inputting the network flow data into a feature extraction tool to obtain the flow statistical feature vector;
step 2.2, inputting the packet size sequence into a multilayer perceptron network to extract the packet sequence features, and performing local feature amplification to obtain the packet sequence feature vector;
step 2.3, inputting the preprocessed raw bytes of the network flow into a convolutional neural network to extract the raw byte features, and performing local feature amplification to obtain the raw byte feature vector;
step 2.4, combining the flow statistical features, the packet sequence features, and the raw byte features into a mixed feature tuple.
Preferably, in step 2.1, flow statistical feature extraction uses the feature extraction tool CICFlowMeter to extract features from the divided network flows, and uses the XGBoost method for feature dimension reduction: the values of each feature are traversed in turn and evaluated with an objective function composed of a loss function and a regularization penalty term, and the feature points that minimize the objective function are selected, yielding the flow statistical feature tuple. The objective function is given by formula (1). The quantities that appear in it are: the loss function, which describes the difference between the true value and the predicted value; the penalty function; the tree model fitted to each sample in the current round; the first and second derivatives of the loss function; the number of leaves of the tree model; the learning rate; the decision tree's prediction for the input sample; the constant parameters that control the size of the penalty term; and the predicted value of each leaf node of the decision tree.
[Formula (1) and its two accompanying expressions appear as images in the original publication.]
in step 2.2, packet sequence feature extraction uses a multilayer perceptron network to extract the size sequence features of the network flow's data packets and output a feature tuple; a linear mapping with local feature amplification, shown in formula (2), is applied between the fully connected layers to obtain an amplified feature tuple, which is added to the packet sequence feature tuple as an augmentation matrix; the multilayer perceptron network consists of three fully connected layers.
[Formula (2) appears as an image in the original publication.]
preferably, in step 2.3, raw byte feature extraction inputs the raw bytes of the network flow into a convolutional neural network to extract a feature tuple; a linear mapping with local feature amplification between the fully connected layers yields an amplified feature tuple, which is added to the raw byte feature tuple as an augmentation matrix; the convolutional neural network comprises two convolutional layers, two pooling layers, and two fully connected layers; in step 2.4, feature fusion combines the dimension-reduced flow statistical features, the packet sequence features extracted by the multilayer perceptron, and the raw byte features extracted by the convolutional neural network into a mixed feature tuple.
Preferably, in step 3, feature discretization sorts the values of each feature into an ordered array that represents the feature's global distribution; the range between the minimum value and the maximum value of each feature array is divided into equal parts, and two additional intervals are added, forming H valid intervals for the feature.
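The discretization in step 3 can be illustrated with a short sketch. The bounds of the two added intervals are images in the original, so the code assumes they are the open ranges below the training minimum and above the training maximum; the function names `fit_bins` and `discretize` and the parameter `n_equal` are hypothetical.

```python
from typing import List, Sequence


def fit_bins(values: Sequence[float], n_equal: int) -> List[float]:
    """Return the n_equal + 1 edges that split [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_equal
    # Use hi itself as the last edge so rounding cannot push it off the maximum.
    return [lo + k * width for k in range(n_equal)] + [hi]


def discretize(x: float, edges: Sequence[float]) -> int:
    """Map x to an interval index in 0 .. H-1, where H = n_equal + 2.

    Index 0 is the assumed underflow interval (x < min) and index H-1 the
    assumed overflow interval (x > max); 1 .. n_equal are the equal-width bins.
    """
    n_equal = len(edges) - 1
    if x < edges[0]:
        return 0
    if x > edges[-1]:
        return n_equal + 1
    for k in range(1, n_equal):
        if x <= edges[k]:
            return k
    return n_equal


# Illustrative values only: five equal bins over [0, 10], so H = 7.
edges = fit_bins([0.0, 2.0, 10.0], n_equal=5)
bins = [discretize(x, edges) for x in (-1.0, 0.0, 3.0, 10.0, 11.0)]
```

The two outer intervals let a flow seen at detection time fall outside the range observed in normal training traffic and still receive a valid interval value.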
Preferably, in step 4, the constructing of the flow classifier based on the discretized characteristic energy system includes the following steps:
step 4.1, establishing a network flow-energy field system: using only the reconstructed features of normal traffic and their value tuples, each feature is mapped to a particle in an energy field, so that all the features and their values together form the energy field;
step 4.2, using a feature probability statistics module to calculate the frequency of each feature value among all feature values and the frequency with which combinations of values of multiple features occur;
step 4.3, using a Hamiltonian energy calculation module to calculate the energy of each whole network flow, thereby obtaining the energy characteristics of the normal samples and finally determining a threshold value.
Preferably, in step 4.1, building the network flow-energy field system means instantiating each normal network flow by its feature tuple: if (A1, …, AN) is the feature N-tuple, flow k is instantiated as (ak1, …, akN), where each value aki is drawn from the set of possible values of the i-th feature, and the number of possible values of each feature equals the number of intervals; a system of feature nodes is established in which the nodes are associated with one another, each node has a local energy, and each pair of nodes has a coupling energy; in step 4.2, the network flow feature probability statistics module is given by formulas (3) and (4), which define the probability that a feature takes a given value and the joint probability that a pair of features takes a given pair of values.
[Formulas (3) and (4) appear as images in the original publication.]
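The statistics gathered in step 4.2 can be sketched as plain relative frequencies over the discretized normal flows. This is an assumption: formulas (3) and (4) are not reproduced in this text, so any smoothing or weighting they may contain is not modeled, and `estimate_probabilities` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple

SingleKey = Tuple[int, int]          # (feature index, interval value)
PairKey = Tuple[int, int, int, int]  # (feature i, feature j, value of i, value of j)


def estimate_probabilities(
    flows: Sequence[Sequence[int]],
) -> Tuple[Dict[SingleKey, float], Dict[PairKey, float]]:
    """Relative frequency of each feature value and of each pair of values."""
    n = len(flows)
    single: Counter = Counter()
    pair: Counter = Counter()
    for flow in flows:
        indexed = list(enumerate(flow))
        for i, a in indexed:
            single[(i, a)] += 1
        # Every unordered pair of features (i < j) within the same flow.
        for (i, a), (j, b) in combinations(indexed, 2):
            pair[(i, j, a, b)] += 1
    p_single = {key: count / n for key, count in single.items()}
    p_pair = {key: count / n for key, count in pair.items()}
    return p_single, p_pair


# Illustrative data: four normal flows, each with two discretized features.
normal_flows = [(1, 2), (1, 3), (2, 2), (1, 2)]
p_single, p_pair = estimate_probabilities(normal_flows)
```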
in step 4.3, the Hamiltonian energy calculation module first calculates the coupling energy of each pair of feature values and then the local energy of each feature value in the network flow, where the feature values are the interval values obtained after feature discretization; the covariance matrix obtained from the single-feature probabilities and the joint probabilities is given by formula (5), the coupling energy by formula (6), and the local energy by formula (7);
[Formulas (5), (6), and (7) appear as images in the original publication.]
finally, the Hamiltonian of each network flow is calculated from the local energies and coupling energies and represents the total energy of the flow: as shown in formula (8), the total energy of each flow is the negative of the sum of the local energies of all feature nodes and the coupling energies between them; the energy value at the 95th percentile of the energy distribution of the normal flow samples is taken as the preset threshold value.
[Formula (8) appears as an image in the original publication.]
preferably, in step 5, detection proceeds as follows: the flow classifier monitors the network and waits for network flows to be captured; when a network flow is captured, its Hamiltonian energy is calculated according to equation (7) and compared with the preset threshold; if the energy exceeds the threshold, the flow is classified as malicious and intercepted; otherwise it is classified as normal and allowed to pass, and the flow classifier waits for the next flow.
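The decision rule of step 5 reduces to a single comparison per flow. The sketch below assumes the flow energies have already been computed; `classify_flows` and the numbers used are illustrative only.

```python
from typing import List, Sequence


def classify_flows(energies: Sequence[float], threshold: float) -> List[str]:
    """Label a flow malicious when its energy exceeds the preset threshold."""
    return ["malicious" if e > threshold else "normal" for e in energies]


# Hypothetical energies for three captured flows and a hypothetical threshold.
labels = classify_flows([-12.5, -3.0, -7.9], threshold=-5.0)
```

A flow whose energy equals the threshold is treated as normal here, because the text only says that flows exceeding the threshold are malicious.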
The device used with the network intrusion detection method based on the discretization characteristic energy system comprises a flow classifier and a flow collector.
Compared with the prior art, the network intrusion detection method and device based on a discretization characteristic energy system provided by the invention have the following improvements and advantages:
the network intrusion detection method and device based on the discretization characteristic energy system can effectively classify the data to be tested into normal flow or malicious flow on the premise of only using normal network flow.
Drawings
The invention is further explained below with reference to the figures and examples:
FIG. 1 is a flow chart of a network intrusion detection method based on a discretized characteristic energy system according to the present invention;
FIG. 2 is a schematic diagram of feature extraction and feature fusion according to the present invention;
FIG. 3 is a schematic diagram of the amplification of fuzzy local features according to the present invention;
FIG. 4 is a network flow representation of a mapping to an energy field structure according to the present invention;
FIG. 5 is a schematic diagram of a discretized feature energy architecture framework of the present invention;
FIG. 6 is a diagram illustrating the distribution of energy in normal and malicious network flows according to the present invention.
Detailed Description
The present invention is described in detail below, and technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides an improved network intrusion detection method and device based on a discretization characteristic energy system; the technical scheme of the invention is as follows:
as shown in fig. 1 to fig. 6, the network intrusion detection method based on the discretization characteristic energy system includes the following steps:
step 1, collecting normal network flow data, and dividing network flows according to five-tuple information; the normal network traffic is captured by the flow collector using the Wireshark tool, exists in PCAP form, and is stored after being divided into flows according to the five-tuple SrcIP, SrcPort, DstIP, DstPort, and Protocol;
step 2, preprocessing network flow characteristics; the feature extraction and fusion is to extract and fuse the features of different dimensions of each network flow, and comprises the stages of flow statistical feature extraction and dimension reduction, data packet sequence feature extraction, convolutional neural network original byte feature extraction and local feature amplification after linear mapping;
step 3, discretizing the features by using a feature discretization module;
step 4, constructing a flow classifier based on a discretization characteristic energy system;
step 5, inputting the flow to be detected into the flow classifier based on the discretization characteristic energy system, and determining the nature of the network flow according to a threshold value.
In step 2, the network flow characteristic preprocessing includes the following steps:
step 2.1, inputting the network flow data into a feature extraction tool to obtain the flow statistical feature vector;
step 2.2, inputting the packet size sequence into a multilayer perceptron network to extract the packet sequence features, and performing local feature amplification to obtain the packet sequence feature vector;
step 2.3, inputting the preprocessed raw bytes of the network flow into a convolutional neural network to extract the raw byte features, and performing local feature amplification to obtain the raw byte feature vector;
step 2.4, combining the flow statistical features, the packet sequence features, and the raw byte features into a mixed feature tuple.
In step 2.1, flow statistical feature extraction uses the feature extraction tool CICFlowMeter to extract features from the divided network flows, and uses the XGBoost method for feature dimension reduction: the values of each feature are traversed in turn and evaluated with an objective function composed of a loss function and a regularization penalty term, and the feature points that minimize the objective function are selected, yielding the flow statistical feature tuple, which includes the port number, protocol type, flow byte rate, flow duration, forward and backward inter-arrival times, and maximum, minimum, mean, and variance values, among others. The objective function is given by formula (1). The quantities that appear in it are: the loss function, which describes the difference between the true value and the predicted value; the penalty function; the tree model fitted to each sample in the current round; the first and second derivatives of the loss function; the number of leaves of the tree model; the learning rate; the decision tree's prediction for the input sample; the constant parameters that control the size of the penalty term; and the predicted value of each leaf node of the decision tree.
[Formula (1) and its two accompanying expressions appear as images in the original publication.]
in step 2.2, the hidden layers of the multilayer perceptron network extract the size sequence features of the network flow's data packets, and the first fully connected layer outputs a feature tuple of dimension 1 × 16; a linear mapping with local feature amplification, shown in formula (2), is applied between the two fully connected layers; this mapping amplifies fuzzy local features and thus acts as an attention mechanism; the feature tuple obtained after the fully connected layers is added to the packet sequence feature tuple as an augmentation matrix to form the final packet sequence feature tuple; the multilayer perceptron network contains two hidden layers and two fully connected layers, and each hidden layer has 214 neurons.
[Formula (2) and the remaining tuple dimensions appear as images in the original publication.]
in step 2.3, the raw bytes of the network flow are input into the convolutional neural network, and after the convolutional layers the first fully connected layer outputs the raw byte feature tuple; a linear mapping with local feature amplification between the two fully connected layers yields an amplified feature tuple at the second fully connected layer, which is added to the raw byte feature tuple as an augmentation matrix to form the final raw byte feature tuple (the tuple dimensions appear as images in the original publication); the convolutional neural network comprises two convolutional layers, two pooling layers, and two fully connected layers, uses ReLU as the activation function, and has a Dropout layer between the pooling layers to prevent overfitting.
In step 3, feature discretization sorts the values of each feature into an ordered array that represents the feature's global distribution; the range between the minimum value and the maximum value of each feature array is divided into equal parts, and two additional intervals are added, forming H valid intervals for the feature.
In step 4, the flow classifier based on the discretization characteristic energy system is constructed by the following steps:
step 4.1, establishing the relation between the network flow and the energy field; the energy system describes network flows using the concept of an energy field from quantum mechanics: a single flow is represented by the feature tuple discretized in step 3; each feature has a set of all possible values and a local energy, and there is a set of all possible feature pairs, so that different network flows can be represented by different combinations of features. A graph of feature nodes is created in which the nodes are related to one another and each pair of nodes has a coupling energy determined by a coupling function;
step 4.2, feature probability and covariance matrix; the probability that a feature takes a given value and the joint probability that a pair of features takes a given pair of values are calculated, and a covariance matrix is formed from them; to eliminate the influence of indirect correlations in the data, the inverse of the covariance matrix is used in the subsequent calculation;
step 4.3, calculating coupling energy; the coupling energy is calculated from the feature probabilities and the covariance matrix of step 4.2;
step 4.4, calculating local energy; the local energy is calculated from the coupling energy of step 4.3 and the feature probabilities and covariance matrix of step 4.2;
step 4.5, calculating network flow energy; the energy of a single flow is the negative of the sum of the local energies of step 4.4 and the coupling energies between its features of step 4.3. FIG. 4 shows how a network flow is mapped to an energy field structure: ai denotes a feature and its value mapped into the energy field; the particles (features and their values) in the energy field interact with one another to produce coupling energies e(ai, aj) and local energies h(ai), and different interaction energies arise according to the distance relations between particles, so that the network flow is represented as a whole; for each pair of features there is a set of all possible coupling energies between them, and for each feature a set of all possible local energies associated with it;
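The flow energy of step 4.5 can be sketched as below, using the h(ai) and e(ai, aj) notation from the description of FIG. 4. The energy tables are assumed to have been computed beforehand (the formulas that produce them are not reproduced in this text), the values shown are invented for illustration, and treating an unseen value or pair as contributing zero energy is an assumption of this sketch.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def flow_energy(
    flow: Sequence[int],
    local_energy: Dict[Tuple[int, int], float],
    coupling_energy: Dict[Tuple[int, int, int, int], float],
) -> float:
    """Return -(sum of local energies h(ai) + sum of coupling energies e(ai, aj))."""
    indexed = list(enumerate(flow))
    local = sum(local_energy.get((i, a), 0.0) for i, a in indexed)
    coupling = sum(
        coupling_energy.get((i, j, a, b), 0.0)
        for (i, a), (j, b) in combinations(indexed, 2)
    )
    return -(local + coupling)


# Hypothetical energy tables for a two-feature flow.
h = {(0, 1): 0.5, (1, 2): 0.25}
e = {(0, 1, 1, 2): 1.0}
energy = flow_energy((1, 2), h, e)
```

Because the sum is negated, feature values and pairs that are common in normal traffic and carry large positive energies pull the flow's total energy down, which is consistent with flagging flows whose energy exceeds the threshold.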
step 4.6, determining a threshold value; the threshold is determined from the network flow energies calculated in step 4.5: only the normal flow samples in the data set are used for training, the distribution of their energy values is calculated, and the energy value at the 95th percentile of that distribution is taken as the preset threshold.
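The threshold selection of step 4.6 can be sketched with a nearest-rank percentile. The choice of the nearest-rank definition is an assumption, since the text does not say how the 95th percentile position is computed, and `percentile_threshold` is a hypothetical helper name.

```python
import math
from typing import Sequence


def percentile_threshold(energies: Sequence[float], percent: int = 95) -> float:
    """Return the energy at the given percentile using the nearest-rank rule."""
    ordered = sorted(energies)
    # Rank of the sample at or above `percent` percent of the distribution.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative energies of 20 normal flows: 1.0, 2.0, ..., 20.0.
normal_energies = [float(i) for i in range(1, 21)]
threshold = percentile_threshold(normal_energies)
```

With this rule roughly 5% of the normal training flows lie above the threshold, so the percentile directly sets the false-alarm rate expected on traffic that resembles the training data.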
Step 5, inputting the flow to be detected into a flow classifier based on a discretization characteristic energy system, and determining the network flow property according to a threshold value;
Step 5.1, the flow classifier monitors the network and waits for a network flow to be captured;
Step 5.2, the feature extraction and fusion module obtains the dimension-reduced flow statistical features, the packet sequence features extracted by the multilayer perceptron, and the raw byte features extracted by the convolutional neural network, the latter two having the same dimension, and combines them into a mixed feature tuple;
Step 5.3, replacing the value of each feature in the mixed feature tuple with the interval value after feature discretization;
Step 5.4, for the captured network flow, calculating the probability that its i-th feature takes the value ai and the probability that its j-th feature takes the value aj, and calculating the joint probability that the i-th and j-th features take these values simultaneously;
Step 5.5, calculating the coupling energy e(ai, aj) from the single-feature probabilities and joint feature probabilities of step 4.2;
Step 5.6, calculating the local field value h(ai) of each feature from the coupling energy of step 4.3;
Step 5.7, calculating the energy of the captured network flow from the quantities obtained in the preceding steps: the energy is initialized to 0 and then accumulated so that its final value is the negative of the sum of the local energies and coupling energies, as in step 4.5;
Step 5.8, comparing the preset threshold c with the energy of the network flow under test: if the energy exceeds c, the flow is classified as malicious and captured; if not, the flow classifier releases the flow and waits for another flow.
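Steps 5.7 and 5.8 can be sketched as follows. This is only an illustration under stated assumptions: the local energies and coupling energies are taken as precomputed lookup tables (the formulas that produce them are not reproduced here), value pairs missing from the coupling table are assumed to contribute zero, and the names `flow_energy`, `classify`, `local` and `coupling` are hypothetical.

```python
from itertools import combinations


def flow_energy(flow, local, coupling):
    """Energy of one discretized flow: negative sum of local and coupling energies.

    flow     : list of interval indices, one per feature
    local    : local[i][a] is the local energy of feature i at interval a
    coupling : coupling[(i, j)][(a, b)] is the coupling energy for i < j
    """
    total = sum(local[i][a] for i, a in enumerate(flow))
    for i, j in combinations(range(len(flow)), 2):
        # Assumption: a value pair absent from the table adds no coupling energy.
        total += coupling.get((i, j), {}).get((flow[i], flow[j]), 0.0)
    return -total


def classify(flow, local, coupling, threshold):
    """Label a flow malicious when its energy exceeds the preset threshold."""
    energy = flow_energy(flow, local, coupling)
    label = "malicious" if energy > threshold else "normal"
    return label, energy


# Toy tables for two features with two intervals each.
local = [{0: 1.0, 1: -2.0}, {0: 0.5, 1: -1.0}]
coupling = {(0, 1): {(0, 0): 0.5, (1, 1): -1.5}}

print(classify([0, 0], local, coupling, threshold=0.0))
print(classify([1, 1], local, coupling, threshold=0.0))
```

Flows whose feature values were frequent and strongly co-occurring among normal traffic accumulate large positive local and coupling terms and therefore a low energy, which is why only flows above the threshold are flagged.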
The network intrusion detection method and device based on the discretization characteristic energy system can classify the data under test as normal flow or malicious flow while being trained only on normal network flow.
The previous description is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The network intrusion detection method based on the discretization characteristic energy system is characterized by comprising the following steps:
step 1, collecting normal network flow data, and dividing network flow according to quintuple information;
step 2, preprocessing network flow characteristics;
step 3, discretizing the features by using a feature discretization module;
step 4, constructing a flow classifier based on a discretization characteristic energy system;
step 5, inputting the flow to be detected into the flow classifier based on the discretization characteristic energy system, and determining the network flow property according to a threshold value.
2. The network intrusion detection method based on the discretized characteristic energy system according to claim 1, wherein: in step 1, the normal network flow is captured by the flow collector using the Wireshark tool, is stored in PCAP form, and is divided according to the five-tuple information SrcIP, SrcPort, DstIP, DstPort and Protocol before being saved.
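The five-tuple division described in claim 2 can be sketched as follows. This is a minimal illustration, assuming packets have already been parsed from the PCAP file into dictionaries; the field names and the function `split_by_five_tuple` are hypothetical, and the key is treated as unidirectional, which the original does not specify.

```python
from collections import defaultdict


def split_by_five_tuple(packets):
    """Group parsed packets into flows keyed by their five-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["src_port"],
               pkt["dst_ip"], pkt["dst_port"], pkt["protocol"])
        flows[key].append(pkt)
    return dict(flows)


# Toy packets: two belong to the same flow, one to a different flow.
packets = [
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.2",
     "dst_port": 80, "protocol": "TCP", "size": 60},
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.2",
     "dst_port": 80, "protocol": "TCP", "size": 1500},
    {"src_ip": "10.0.0.3", "src_port": 5353, "dst_ip": "10.0.0.2",
     "dst_port": 53, "protocol": "UDP", "size": 90},
]
for key, pkts in split_by_five_tuple(packets).items():
    print(key, [p["size"] for p in pkts])
```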
3. The network intrusion detection method based on the discretized characteristic energy system according to claim 1, wherein: in step 2, the network flow characteristic preprocessing includes the following steps:
step 2.1, inputting the network flow data into a feature extraction tool to obtain the flow statistical feature vector;
step 2.2, inputting the packet size sequence into a multilayer perceptron network to extract packet sequence features, and applying local feature amplification to obtain a feature vector;
step 2.3, inputting the preprocessed raw bytes of the network flow into a convolutional neural network to extract raw byte features, and applying local feature amplification to obtain a feature vector;
step 2.4, combining the flow statistical features, the packet sequence features and the raw byte features into a mixed feature tuple.
4. The network intrusion detection method based on the discretized characteristic energy system according to claim 3, wherein: in step 2.1, flow statistical feature extraction uses the feature extraction tool CICFlowMeter to extract features from the divided network flows and uses the XGBoost method for feature dimension reduction: the value of each feature is traversed and evaluated in turn through an objective function composed of a loss function and a regularization penalty term, and the feature points that minimize the objective function are found, so as to obtain the flow statistical feature tuple; the objective function is shown in formula (1), in which the loss function describes the difference between the true value and the predicted value, and the remaining symbols denote, respectively, the tree model fitted for a sample in a given round, the first-derivative and second-derivative terms, the number of leaves of the tree model, the learning rate, the prediction of the decision tree for the input sample, a constant parameter controlling the size of the penalty term, and the predicted value of a leaf node of the decision tree:
[formula (1): image in the original publication]
in step 2.2, packet sequence feature extraction uses a multilayer perceptron network to extract features from the packet size sequence of the network flow and outputs a feature tuple; a linear mapping with local feature amplification between the fully connected layers, shown in formula (2), yields an amplified feature tuple, which is appended to the feature tuple as an augmentation matrix; the multilayer perceptron network consists of three fully connected layers:
[formula (2): image in the original publication]
5. The network intrusion detection method based on the discretized characteristic energy system according to claim 3, wherein: in step 2.3, raw byte feature extraction inputs the raw bytes of the network flow into a convolutional neural network, which extracts a feature tuple from them; a linear mapping with local feature amplification between the fully connected layers yields an amplified feature tuple, which is appended to the feature tuple as an augmentation matrix; the convolutional neural network consists of two convolutional layers, two pooling layers and two fully connected layers; in step 2.4, feature fusion combines the dimension-reduced flow statistical features, the packet sequence features extracted by the multilayer perceptron, and the raw byte features extracted by the convolutional neural network into a mixed feature tuple.
6. The network intrusion detection method based on the discretized characteristic energy system according to claim 1, wherein: in step 3, feature discretization arranges the values of each feature into an ordered array that represents the global distribution of the feature, with a minimum value and a maximum value; the value range of each feature array, from the minimum to the maximum, is divided equally, and two additional intervals are added, forming H valid feature intervals.
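The discretization of claim 6 can be sketched as follows. This is a minimal illustration under assumptions that the original leaves to formula images: the range between the minimum and maximum is split into equal-width bins, and the two added intervals are taken to cover values below the minimum and above the maximum, so that the total number of intervals H equals the number of equal bins plus two. The names `make_discretizer` and `n_equal` are hypothetical.

```python
def make_discretizer(values, n_equal):
    """Build a function mapping a raw feature value to an interval index.

    Index 0 is the assumed interval below the minimum, indices 1..n_equal
    are the equal-width bins, and index n_equal + 1 is the assumed
    interval above the maximum.
    """
    lo, hi = min(values), max(values)
    # Guard against a constant feature, where the range has zero width.
    width = (hi - lo) / n_equal if hi > lo else 1.0

    def to_interval(x):
        if x < lo:
            return 0
        if x > hi:
            return n_equal + 1
        k = int((x - lo) / width)
        # The maximum itself falls into the last equal-width bin.
        return 1 + min(k, n_equal - 1)

    return to_interval


# Toy feature observed between 0.0 and 10.0 in normal traffic.
to_interval = make_discretizer([0.0, 2.5, 7.0, 10.0], n_equal=5)
print([to_interval(x) for x in (-1.0, 0.0, 3.0, 10.0, 11.0)])
```

The two outer intervals let the classifier assign a valid index to values never seen in normal traffic, which is useful at detection time.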
7. The network intrusion detection method based on the discretized characteristic energy system according to claim 1, wherein: in step 4, constructing the flow classifier based on the discretization characteristic energy system comprises the following steps:
step 4.1, establishing a network flow-energy field system: using only the reconstructed features of normal flows and their value tuples, each feature is mapped to a particle in an energy field, and all features and their values together form the energy field;
step 4.2, a feature probability statistics module calculates the frequency of each feature value among all feature values and the frequency with which combinations of values of multiple features occur;
step 4.3, a Hamiltonian energy calculation module calculates the energy of the whole network flow, so as to obtain the energy characteristics of normal samples and finally determine the threshold value.
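The frequency counting of step 4.2 can be sketched as follows. This is a minimal illustration using plain relative frequencies over discretized normal flows; any smoothing or weighting that the formulas of the original may apply is not reproduced, and the name `feature_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def feature_frequencies(flows):
    """Relative frequencies of single feature values and of value pairs.

    flows : list of discretized flows, each a list of interval indices
    Returns (single, joint) where single[(i, a)] is the fraction of flows
    whose feature i equals a, and joint[(i, j, a, b)] is the fraction of
    flows whose features i < j equal a and b simultaneously.
    """
    n = len(flows)
    single, joint = Counter(), Counter()
    for flow in flows:
        for i, a in enumerate(flow):
            single[(i, a)] += 1
        for i, j in combinations(range(len(flow)), 2):
            joint[(i, j, flow[i], flow[j])] += 1
    return ({k: c / n for k, c in single.items()},
            {k: c / n for k, c in joint.items()})


# Four toy normal flows with two discretized features each.
single, joint = feature_frequencies([[0, 1], [0, 0], [1, 1], [0, 1]])
print(single[(0, 0)], joint[(0, 1, 0, 1)])
```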
8. The network intrusion detection method based on the discretized characteristic energy system according to claim 7, wherein: in step 4.1, constructing the network flow energy field system instantiates the normal network flows and their feature tuples: taking (A1 … AN) as the feature N-tuple, flow k is instantiated as (ak1 … akN), where each aki belongs to the set of interval values; the value of every feature comes from this set, and the value range of every feature is the number of intervals; a system consisting of a plurality of feature nodes is established, in which the nodes are mutually associated, each node has local energy, and the nodes have interacting coupling energy; in step 4.2, the network flow feature probability statistics module is given by formulas (3) and (4), which define the probability that feature i takes the value ai and the joint probability that features i and j take the value pair (ai, aj):
[formulas (3) and (4): images in the original publication]
in step 4.3, the Hamiltonian energy calculation module first calculates the coupling energy of each combined value of a feature pair and then calculates the local energy of each feature value in the network flow; the formulas refer to the first interval value after feature discretization; the covariance matrix obtained from the single-feature probabilities and joint probabilities is shown in formula (5), the coupling energy in formula (6), and the local energy in formula (7):
[formulas (5), (6) and (7): images in the original publication]
finally, the Hamiltonian of each network flow is calculated from the local energy and the coupling energy and represents the total energy of the flow: as shown in formula (8), the negative of the sum of the local energies of all feature nodes and the coupling energies is the total energy of each flow, and the energy value of the sample at the 95% position of the energy distribution of the normal flow samples is taken as the preset threshold:
[formula (8): image in the original publication]
9. The network intrusion detection method based on the discretized characteristic energy system according to claim 1, wherein: in step 5, the flow classifier monitors the network and waits for a network flow to be captured; when a network flow is captured, its Hamiltonian energy is calculated according to formula (7) and compared with the preset threshold; if the energy exceeds the preset threshold, the network flow is classified as a malicious flow and is captured; otherwise the flow is classified as a normal flow and allowed to pass, and the flow classifier waits for the next flow.
10. The apparatus for applying the network intrusion detection method based on the discretized characteristic energy system of claim 2, wherein: comprises a flow classifier and a flow collector.
CN202210703944.XA 2022-06-21 2022-06-21 Network intrusion detection method and device based on discretization characteristic energy system Pending CN114785623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210703944.XA CN114785623A (en) 2022-06-21 2022-06-21 Network intrusion detection method and device based on discretization characteristic energy system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210703944.XA CN114785623A (en) 2022-06-21 2022-06-21 Network intrusion detection method and device based on discretization characteristic energy system

Publications (1)

Publication Number Publication Date
CN114785623A true CN114785623A (en) 2022-07-22

Family

ID=82421634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210703944.XA Pending CN114785623A (en) 2022-06-21 2022-06-21 Network intrusion detection method and device based on discretization characteristic energy system

Country Status (1)

Country Link
CN (1) CN114785623A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190164418A1 (en) * 2017-11-30 2019-05-30 Volkswagen Ag System and method for predicting and maximizing traffic flow
CN112398779A (en) * 2019-08-12 2021-02-23 中国科学院国家空间科学中心 Network traffic data analysis method and system
CN112910853A (en) * 2021-01-18 2021-06-04 南京信息工程大学 Encryption flow classification method based on mixed characteristics
CN113395276A (en) * 2021-06-10 2021-09-14 广东为辰信息科技有限公司 Network intrusion detection method based on self-encoder energy detection
CN114615093A (en) * 2022-05-11 2022-06-10 南京信息工程大学 Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAMILA F. T. PONTES等: "《A New Method for Flow-Based Network Intrusion Detection Using the Inverse Potts Model》", 《IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220722)