CN112019497A - Word embedding-based multi-stage network attack detection method - Google Patents

Word embedding-based multi-stage network attack detection method

Info

Publication number
CN112019497A
Authority
CN
China
Prior art keywords
attack
stage
data
vector
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010660792.0A
Other languages
Chinese (zh)
Other versions
CN112019497B (en)
Inventor
周鹏
周公延
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202010660792.0A
Publication of CN112019497A
Application granted
Publication of CN112019497B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a word embedding-based multi-stage network attack detection method comprising the following steps: 1) performing feature selection on a data set of network traffic features collected after attacks have occurred; 2) vectorizing the network traffic data using a word embedding method; 3) constructing a current vector and a historical vector, and constructing training samples by negative sampling; 4) establishing a word embedding-based multi-stage attack detection model, computing an association vector, computing an association probability with a supervised learning classification algorithm, and judging how likely the current data is to belong to a multi-stage attack. The method lets an intrusion detection system associate attack stages automatically at the packet level without defined association rules, and it avoids the problem of alert-level multi-stage attack detection, in which some attack stages generate no alert.

Description

Word embedding-based multi-stage network attack detection method
Technical Field
The invention relates to a word embedding-based multi-stage network attack detection method. It is suited to intrusion detection in industrial internet boundary protection scenarios in which an attacker deliberately carries out a multi-stage network attack.
Background
Industrial internet boundary protection generally comprises five aspects: identification, protection, detection, response and recovery. Intrusion detection is an important link in industrial internet boundary protection: by continuously monitoring industrial internet traffic and analyzing the network traffic features that follow an attack, it locates the attack, identifies the occurrence of a security event, and provides information to the security response and security recovery mechanisms.
As industrial internet boundary protection technology continues to develop, it is becoming increasingly difficult for an attacker to penetrate a network by exploiting an isolated vulnerability or security flaw (for example, an SQL injection attack or a denial-of-service attack). To intrude successfully, an attacker therefore often has to combine a series of attack techniques, such as network reconnaissance, vulnerability discovery and flaw exploitation, and penetrate step by step, so that a single intrusion consists of several stages and forms a multi-stage attack. Moreover, to conceal the attack, the attacker often disguises some attack stages as normal network behavior, although these disguised stages remain associated with the attacker's other behaviors.
Traditional machine learning-based intrusion detection generally models network traffic, either identifying attacks from the network features of known attacks or detecting them through anomalies in network packets. It largely ignores the sequential correlation of network data and therefore cannot detect multi-stage attacks, so multi-stage attack detection faces new challenges. Existing multi-stage attack detection methods, for their part, fall mainly into rule-based methods and methods based on statistical learning algorithms. Rule-based methods require rules to be written by hand and are generally used to extract multi-stage attacks from data that has already been attacked and to perform association analysis. Methods based on statistical learning algorithms mainly use hidden Markov models, whose parameters are learned from a large number of attack samples by statistical analysis. However, a hidden Markov model relies on an independence (Markov) assumption, namely that the current state depends only on the previous state, and so cannot learn deeper multi-stage attack characteristics.
Disclosure of Invention
The object of the invention is to provide a multi-stage network attack detection method based on word embedding, starting from the observation that the network packets belonging to different stages of a multi-stage attack are latently correlated. Unlike existing methods, the invention targets both the sequential nature of network traffic data and the planned, multi-stage behavior of an attacker.
To achieve this object, the invention adopts the following technical solution:
A word embedding-based multi-stage network attack detection method comprises the following steps:
1) carrying out feature selection on a data set formed by network traffic features after attack occurs;
2) vectorizing network traffic data using a word embedding method;
3) respectively constructing a current vector and a historical vector, and constructing a training sample by using a negative sampling method;
4) establishing a multi-stage attack detection model based on word embedding, calculating an association vector, calculating association probability by using a supervised learning classification algorithm, and judging the possibility that the current data belongs to multi-stage attack.
The feature selection in step 1) comprises the following steps:
Step 1.1, randomly dividing a data set consisting of a large number of attack samples into a training set, a validation set and a test set. Let $X = (x_1, x_2, x_3, \ldots, x_i)$ denote a data sequence and $x_i = (x_i^{(1)}, x_i^{(2)}, x_i^{(3)}, \ldots, x_i^{(j)})$ denote a single data packet, where $x_i^{(j)}$ is the $j$-th feature of packet $x_i$;
Step 1.2, analyzing the attack data according to the structure of network packets and the network transmission protocols, performing preliminary feature selection and feature construction, and selecting $n$ features;
The vectorization of the network traffic data in step 2) comprises the following steps:
Step 2.1, deleting from the original data set all network traffic features other than the $n$ selected ones, where network traffic refers to the amount of information passing through a network device or transmission medium per unit time;
Step 2.2, dividing the data set into per-feature sequences, i.e. $S = (s^{(1)}, s^{(2)}, s^{(3)}, \ldots, s^{(j)})$, where a single feature sequence is denoted $s^{(j)} = (x_1^{(j)}, x_2^{(j)}, x_3^{(j)}, \ldots, x_i^{(j)})$, so that the number of sequences obtained does not exceed the number of features;
Step 2.3, applying the skip-gram word embedding method of word2vec to the feature sequences: a single feature sequence is used as the corpus, the values inside a window are taken as one sample each time, the center word of the window is the input and the remaining words are the output, a neural network is built to predict the remaining words from the center word, and the hidden-layer weights of the network are used as the vector of the input word. The other feature sequences are embedded in the same way, giving a word embedding vector for each feature, denoted $v_i^{(j)} \in \mathbb{R}^k$;
Step 2.4, concatenating the vectors of the individual features to obtain the vector representation of the network traffic data, i.e. $v_i = (v_i^{(1)}, v_i^{(2)}, v_i^{(3)}, \ldots, v_i^{(j)})$;
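Steps 2.2 and 2.3 can be illustrated with a minimal Python sketch. It only shows how the packet table is split into per-feature sequences and how skip-gram (center word, context word) training pairs are drawn from one sequence; training the neural network itself is omitted. The sample packets, the function names `feature_sequences` and `skipgram_pairs`, and the window size of 1 are assumptions made for illustration and do not come from the patent.

```python
from typing import List, Sequence, Tuple

# Hypothetical packets: each row is (protocol, source port, destination port).
packets = [
    ("TCP", "80", "51000"),
    ("TCP", "51000", "80"),
    ("UDP", "53", "40000"),
    ("TCP", "22", "51001"),
]


def feature_sequences(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Step 2.2: split the packet table into one sequence per feature column."""
    return [list(column) for column in zip(*rows)]


def skipgram_pairs(sequence: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """Step 2.3: emit (center, context) pairs for every position in a sequence."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is the input, its neighbours the output
                pairs.append((center, sequence[j]))
    return pairs


sequences = feature_sequences(packets)
protocol_pairs = skipgram_pairs(sequences[0], window=1)
print(protocol_pairs)
```

Each feature sequence would be fed to its own skip-gram model, so that every feature value ends up with one embedding vector of dimension k.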
The construction of the current vector, the historical vector and the training samples in step 3) comprises the following steps:
Step 3.1, creating a vector $H \in \mathbb{R}^{m \times k}$ of length $m$ to store the history information of the multi-stage attack, representing the data at any time $t$ as $D \in \mathbb{R}^{k}$, which carries the information of the current data, and letting $A_i$ denote the corresponding attack stage;
Step 3.2, initializing $H$ with data of the first attack stage, and taking data $D^{+} \in A_2$ from the second attack stage as a positive example, so that $[H, D^{+}]$ is a positive sample with label 1; data from other times, denoted $D^{-}$ (defined in equation image BDA0002578445520000031, which is not reproduced in this text), is taken as a negative example, so that $[H, D^{-}]$ is a negative sample with label 0. Positive and negative samples are constructed in the ratio $1:g$, where $g > 1$ is an algorithm parameter set manually for sampling and training;
Step 3.3, updating $H$ according to the attack stage so that the new vector contains the information of the attack stages that have appeared so far, and repeating the sample construction of step 3.2 until $M$ samples are constructed;
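The sample construction of step 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_samples`, the toy vectors, and the choice to draw negatives with replacement are all hypothetical. Each positive example is paired with g negatives drawn from data outside the next attack stage, giving the 1:g ratio.

```python
import random
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[List[Vector], Vector, int]  # (history H, current data D, label)


def build_samples(
    history: List[Vector],
    positives: List[Vector],
    others: List[Vector],
    g: int,
    rng: random.Random,
) -> List[Sample]:
    """Pair the history with each positive example and g sampled negatives."""
    samples: List[Sample] = []
    for d_pos in positives:
        samples.append((history, d_pos, 1))      # [H, D+] with label 1
        for d_neg in rng.choices(others, k=g):   # assumed: sampling with replacement
            samples.append((history, d_neg, 0))  # [H, D-] with label 0
    return samples


# Toy data: m = 2 history vectors of dimension k = 2.
H = [[0.1, 0.2], [0.3, 0.4]]
stage2 = [[0.5, 0.5], [0.6, 0.4]]                    # data from the second attack stage
elsewhere = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.9]]     # data from other times
samples = build_samples(H, stage2, elsewhere, g=3, rng=random.Random(0))
print(len(samples), sum(label for _, _, label in samples))
```

After each attack stage, the history would be updated as in step 3.3 and the same function called again with the data of the following stage as positives.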
The establishment of the word embedding-based multi-stage attack detection model in step 4) comprises the following steps:
Step 4.1, based on the training samples constructed in step 3), converting the sequence modeling problem into a classification problem, and computing an association vector from the current data vector $D$ and the historical vector $H$, denoted $R(H, D)$, as the input of a classifier;
The association vector $R(H, D)$ is computed as
$$R(H, D) = D \odot [h_1, h_2, \ldots, h_m]$$
where $h_m$ denotes the $m$-th vector in $H$; that is, the Hadamard product of $D$ is taken with each vector $h_m$ in $H$;
The optimization objective is
[Equation image not reproduced: Figure BDA0002578445520000032]
where $S$ is the size of the training set, $H$ and $D$ are the inputs of the model, $y$ is the label, i.e. the true output of the model, and $p(D, H)$ is the association probability between the current data $D$ and the historical data $H$ of the attack stages that have already occurred;
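The association vector R(H, D) of step 4.1 can be sketched in a few lines of Python. The function name `association_vector` and the toy values are assumptions for illustration; flattening the result into a single classifier input is likewise an assumption, since the text only states that R(H, D) is the classifier input.

```python
from typing import List

Vector = List[float]


def association_vector(history: List[Vector], current: Vector) -> List[Vector]:
    """Take the Hadamard (element-wise) product of D with every h in H."""
    for h in history:
        if len(h) != len(current):
            raise ValueError("every history vector must have the dimension of D")
    return [[d * h_t for d, h_t in zip(current, h)] for h in history]


H = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]  # m = 2 history vectors
D = [2.0, 1.0, 4.0]                      # current data vector
R = association_vector(H, D)
print(R)  # [[2.0, 2.0, 12.0], [1.0, 0.0, -4.0]]

# Assumed: the m rows are flattened into one feature vector for the classifier.
classifier_input = [value for row in R for value in row]
```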
Step 4.2, initializing the history vector $H$;
Step 4.3, reading real-time network traffic data, selecting from it the network traffic features chosen in step 1.2, and vectorizing them according to step 2) to obtain the real-time data vector $D$;
Step 4.4, using the model established in step 4.1 to judge the relation between $D$ and the attack stages that have already occurred, and outputting the association probability $P_a = p(D \mid H)$;
Step 4.5, defining a threshold: if $P_a$ is greater than the given threshold, $D$ is added to a cache, and $H$ is updated when the cache reaches a specified size;
Step 4.6, repeating steps 4.3 to 4.5.
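The online loop of steps 4.3 to 4.6 can be sketched as below. Two parts are assumptions and are labeled as such in the code: `toy_predict` is a hypothetical cosine-similarity stand-in for the trained classifier, and the rule for updating H (keep the m most recent vectors) is one possible choice, since the text does not specify how H is rebuilt from the cache.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def toy_predict(history: List[Vector], d: Vector) -> float:
    """Hypothetical stand-in for the trained model: best cosine similarity to H."""
    def cosine(a: Vector, b: Vector) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0
    return max(0.0, max(cosine(h, d) for h in history))


def detect_stream(
    stream: List[Vector],
    history: List[Vector],
    predict: Callable[[List[Vector], Vector], float],
    threshold: float,
    cache_size: int,
) -> Tuple[List[float], List[Vector]]:
    """Run steps 4.3 to 4.5 over already vectorized packets."""
    cache: List[Vector] = []
    probabilities: List[float] = []
    for d in stream:
        p_a = predict(history, d)          # step 4.4: association probability
        probabilities.append(p_a)
        if p_a > threshold:                # step 4.5: cache associated data
            cache.append(d)
            if len(cache) >= cache_size:
                # Assumed update rule: keep the m most recent vectors.
                history = (history + cache)[-len(history):]
                cache = []
    return probabilities, history


H0 = [[1.0, 0.0], [1.0, 0.0]]
stream = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [0.0, -1.0]]
probs, H1 = detect_stream(stream, H0, toy_predict, threshold=0.8, cache_size=2)
print(probs, H1)
```

In this toy run the first and third vectors exceed the threshold, the cache fills, and the history is replaced by those two vectors.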
Compared with the prior art, the invention has the following obvious and prominent substantive characteristics and remarkable advantages:
The intrusion detection system can automatically associate attack stages at the packet level without defined association rules, which also avoids the problem, in alert-level multi-stage attack detection, that some attack stages generate no alert.
Drawings
FIG. 1 is a general flow chart of the process of the present invention.
FIG. 2(a) is a diagram of a multi-stage attack detection model in the training phase of the present invention.
FIG. 2(b) is a diagram of a multi-stage attack detection model at the testing stage of the present invention.
Fig. 3(a) is a diagram of a network traffic data vectorization model according to the present invention.
Fig. 3(b) is an exemplary diagram of vectorization of network traffic data according to the present invention.
FIG. 4 shows the receiver operating characteristic (ROC) curves of the multi-stage attack detection model.
FIG. 5 is a confusion matrix corresponding to the multi-stage attack detection results in the test set.
FIG. 6(a) is a diagram showing the real-time detection result of multi-stage attack in the test set according to the present invention.
FIG. 6(b) is a diagram of the original attack stage distribution of network traffic in the test set according to the present invention.
Fig. 6(c) is a diagram of the multi-stage attack real-time detection result in other network scenarios according to the present invention.
Fig. 6(d) is the original attack stage distribution diagram under other network scenarios.
Detailed Description
The invention is described in detail below with reference to the drawings and preferred embodiments.
Example one:
As shown in Fig. 1, a word embedding-based multi-stage network attack detection method comprises the following steps:
1) carrying out feature selection on a data set formed by network traffic features after attack occurs;
2) vectorizing network traffic data using a word embedding method;
3) respectively constructing a current vector and a historical vector, and constructing a training sample by using a negative sampling method;
4) establishing a multi-stage attack detection model based on word embedding, calculating an association vector, calculating association probability by using a supervised learning classification algorithm, and judging the possibility that the current data belongs to multi-stage attack.
Example two:
This embodiment is substantially the same as the first embodiment, with the following particular features:
In this embodiment of the word embedding-based multi-stage network attack detection method, the feature selection in step 1) comprises the following steps:
Step 1.1, randomly dividing a data set consisting of a large number of attack samples into a training set, a validation set and a test set, in the ratio 80% : 10% : 10% in this embodiment. Let $X = (x_1, x_2, x_3, \ldots, x_i)$ denote a data sequence and $x_i = (x_i^{(1)}, x_i^{(2)}, x_i^{(3)}, \ldots, x_i^{(j)})$ denote a single data packet, where $x_i^{(j)}$ is the $j$-th feature of packet $x_i$;
Step 1.2, analyzing the attack data according to the structure of network packets and the network transmission protocols, performing preliminary feature selection and feature construction, and selecting $n$ features. In this embodiment, the IP address, network protocol, port number, data length and time difference are selected as the original features, and port numbers are mapped to commonly exploited ports;
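The 80% : 10% : 10% split and the port mapping of this embodiment can be sketched as follows. The set `COMMON_PORTS`, the bucket name "other", and the functions `map_port` and `split_dataset` are hypothetical; in particular, collapsing every port outside the list into one bucket is only one reading of "mapped to commonly exploited ports". Samples are treated as opaque items here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical list of commonly exploited ports; the patent does not enumerate them.
COMMON_PORTS = {21, 22, 23, 25, 53, 80, 443, 445, 3389}


def map_port(port: int) -> str:
    """Keep commonly exploited ports as tokens; collapse the rest into one bucket."""
    return str(port) if port in COMMON_PORTS else "other"


def split_dataset(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Randomly split samples into training, validation and test sets (80/10/10)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
print(map_port(80), map_port(51000))
```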
The vectorization of the network traffic data in step 2) comprises the following steps:
Step 2.1, deleting from the original data set all network traffic features other than the $n$ selected ones, where network traffic refers to the amount of information passing through a network device or transmission medium per unit time;
Step 2.2, dividing the data set into per-feature sequences, i.e. $S = (s^{(1)}, s^{(2)}, s^{(3)}, \ldots, s^{(j)})$, where a single feature sequence is denoted $s^{(j)} = (x_1^{(j)}, x_2^{(j)}, x_3^{(j)}, \ldots, x_i^{(j)})$, so that the number of sequences obtained does not exceed the number of features. In this embodiment, only the network protocol, the port number and the IP address are finally selected for vectorization. The source port and the destination port are of the same kind, so a single sequence is constructed to learn the port embedding vectors; the same applies to the IP addresses;
Step 2.3, applying the skip-gram word embedding method of word2vec to the feature sequences: a single feature sequence is used as the corpus, the values inside a window are taken as one sample each time, the center word of the window is the input and the remaining words are the output, a neural network is built to predict the remaining words from the center word, and the hidden-layer weights of the network are used as the vector of the input word. The other feature sequences are embedded in the same way, giving a word embedding vector for each feature, denoted $v_i^{(j)} \in \mathbb{R}^k$; in this embodiment the vector dimension $k$ is set to 8;
Step 2.4, concatenating the vectors of the individual features to obtain the vector representation of the network traffic data, i.e. $v_i = (v_i^{(1)}, v_i^{(2)}, v_i^{(3)}, \ldots, v_i^{(j)})$;
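The concatenation of step 2.4 with the dimension of 8 used in this embodiment can be sketched as below. The lookup tables hold constant placeholder vectors rather than trained skip-gram embeddings, and all names and values are hypothetical. The sketch assumes that each packet contributes five slots (protocol, source port, destination port, source IP, destination IP) and that source and destination share one table per feature kind, following the statement that a single sequence is learned for ports and for IP addresses.

```python
from typing import Dict, List

K = 8  # embedding dimension of this embodiment

Vector = List[float]


def constant_vector(value: float) -> Vector:
    """Placeholder for a trained embedding of dimension K."""
    return [value] * K


# Hypothetical embedding tables; source and destination share one table per kind.
protocol_vectors: Dict[str, Vector] = {"TCP": constant_vector(0.1), "UDP": constant_vector(0.2)}
port_vectors: Dict[str, Vector] = {"80": constant_vector(0.3), "other": constant_vector(0.4)}
ip_vectors: Dict[str, Vector] = {"10.0.0.5": constant_vector(0.5), "10.0.0.9": constant_vector(0.6)}


def vectorize(protocol: str, src_port: str, dst_port: str, src_ip: str, dst_ip: str) -> Vector:
    """Concatenate the per-feature embeddings into one packet vector."""
    parts = [
        protocol_vectors[protocol],
        port_vectors[src_port],
        port_vectors[dst_port],
        ip_vectors[src_ip],
        ip_vectors[dst_ip],
    ]
    return [value for part in parts for value in part]


v = vectorize("TCP", "other", "80", "10.0.0.5", "10.0.0.9")
print(len(v))  # 5 slots of dimension 8
```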
The training sample construction in step 3) comprises the following steps:
Step 3.1, maintaining a vector $H \in \mathbb{R}^{m \times j \times k}$ of length $m$ to store the history information of the multi-stage attack, representing the data at any time $t$ as $D \in \mathbb{R}^{k}$, which carries the information of the current data, and letting $A_i$ denote the corresponding attack stage; in this embodiment the length of $H$ is 50;
Step 3.2, initializing $H$ with data of the first attack stage, and taking data $D^{+} \in A_2$ from the second attack stage as a positive example, so that $[H, D^{+}]$ is a positive sample with label 1; data from any other time, denoted $D^{-}$ (defined in equation image BDA0002578445520000051, which is not reproduced in this text), is taken as a negative example, so that $[H, D^{-}]$ is a negative sample with label 0. Positive and negative samples are constructed in the ratio $1:g$; to improve the generalization ability of the model, the ratio is set to 1:50 in this embodiment;
Step 3.3, updating $H$ according to the attack stage so that the new vector contains the information of the attack stages that have appeared so far, and repeating the sample construction of step 3.2 until $M$ samples are constructed; in this embodiment $M = 20400$;
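As a quick consistency check on these parameters: if M counts positive and negative samples together and the 1:50 ratio holds exactly (both are assumptions, since the text does not say how M is counted), the sample counts work out as follows, with n_+ and n_- introduced here only as names for the number of positive and negative samples.

```latex
M = n_{+}\,(1 + g)
\;\Longrightarrow\;
n_{+} = \frac{20400}{1 + 50} = 400,
\qquad
n_{-} = g \, n_{+} = 50 \times 400 = 20000 .
```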
The establishment of the word embedding-based multi-stage attack detection model in step 4) comprises the following steps:
Step 4.1, based on the training samples constructed in step 3), converting the sequence modeling problem into a classification problem, computing an association vector from the current data vector $D$ and the historical vector $H$, denoted $R(H, D)$, and training the word embedding-based multi-stage attack detection model;
The association vector is computed as
$$R(H, D) = D \odot [h_1, h_2, \ldots, h_m]$$
where $h_m$ denotes the $m$-th vector in $H$; that is, the Hadamard product of $D$ is taken with each vector $h_m$ in $H$.
The optimization objective is
[Equation image not reproduced: Figure BDA0002578445520000061]
where $S$ is the size of the training set, $H$ and $D$ are the inputs of the model, $y$ is the label, i.e. the true output of the model, and $p(D, H)$ is the association probability between the current data $D$ and the historical data $H$ of the attack stages that have already occurred.
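The equation image for the objective is not reproduced in this text. Given the symbols it is described with (training-set size S, label y taking the values 0 and 1, association probability p(D, H)), one form that would be consistent with a binary classifier is the average cross-entropy below. This is an assumption offered for illustration, not the patent's confirmed formula, and the sample index s is introduced here.

```latex
\min \; -\frac{1}{S} \sum_{s=1}^{S}
\Big[\, y_s \log p(D_s, H_s) + (1 - y_s) \log\big(1 - p(D_s, H_s)\big) \Big]
```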
Step 4.2, initializing the history vector $H$ with the first stage of the multi-stage attack, and reading real-time network traffic data;
Step 4.3, selecting from the real-time network traffic data the network traffic features chosen in step 1.2, and vectorizing them according to step 2) to obtain the real-time data vector $D$;
Step 4.4, using the model established in step 4.1 to judge the relation between $D$ and the attack stages that have already occurred, and outputting the association probability $P_a = p(D \mid H)$;
Step 4.5, defining a threshold: if $P_a$ is greater than the given threshold, $D$ is added to a cache, and $H$ is updated when the cache reaches a specified size. In this embodiment the threshold is selected according to the recall and accuracy of the model, and the cache size is set to 50;
Step 4.6, repeating steps 4.3 to 4.5.
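Step 4.5 states only that the threshold is chosen according to the recall and accuracy of the model. One possible selection rule, sketched here as an assumption, is to keep the candidate threshold with the highest accuracy among those whose recall stays above a floor. The function names, the candidate list, the floor `min_recall` and the toy validation data are all hypothetical.

```python
from typing import List, Optional, Tuple


def accuracy_and_recall(probs: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Score the rule 'associated if probability > threshold' against the labels."""
    tp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p <= threshold and y == 1)
    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall


def choose_threshold(
    probs: List[float], labels: List[int], candidates: List[float], min_recall: float
) -> Optional[Tuple[float, float, float]]:
    """Assumed rule: highest accuracy among thresholds with recall >= min_recall."""
    best: Optional[Tuple[float, float, float]] = None
    for t in candidates:
        acc, rec = accuracy_and_recall(probs, labels, t)
        if rec >= min_recall and (best is None or acc > best[1]):
            best = (t, acc, rec)
    return best


# Toy validation-set probabilities and labels.
val_probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
val_labels = [1, 1, 0, 1, 0, 0]
best = choose_threshold(val_probs, val_labels, candidates=[0.2, 0.5, 0.7], min_recall=0.6)
print(best)
```

Other criteria, such as maximizing F1 on the validation set, would fit the same description equally well.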
Example three:
This embodiment is substantially the same as the second embodiment, with the following particular features:
In this embodiment, when the word embedding-based multi-stage attack detection model is established, several classification algorithms are used to output the association probability. They are compared in terms of accuracy, recall, F1, AUC and inference time, with the results shown in the following table:
Figure BDA0002578445520000071
The network traffic features selected in this embodiment are as follows:
(1) protocol: the network transport protocol type, such as TCP, UDP or ICMP;
(2) data length: the length of a single data packet;
(3) delta time: the time difference between the current packet and the previous packet;
(4) source port: the port number of the source host;
(5) destination port: the port number of the destination host;
(6) source IP: the IP address of the source host;
(7) destination IP: the IP address of the destination host.
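The seven features above can be held in a small record type, with delta time derived from packet timestamps. This is a minimal sketch: the class name `Packet`, the input tuple layout, the sample values and the choice of 0.0 as the delta time of the first packet are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed raw layout: (timestamp, protocol, length, src port, dst port, src IP, dst IP).
RawPacket = Tuple[float, str, int, int, int, str, str]


@dataclass
class Packet:
    protocol: str
    data_length: int
    delta_time: float
    source_port: int
    destination_port: int
    source_ip: str
    destination_ip: str


def with_delta_time(raw: List[RawPacket]) -> List[Packet]:
    """Build feature records from time-ordered packets, deriving delta time."""
    packets: List[Packet] = []
    previous = None
    for ts, proto, length, sport, dport, sip, dip in raw:
        delta = 0.0 if previous is None else ts - previous  # assumed: first packet gets 0.0
        packets.append(Packet(proto, length, delta, sport, dport, sip, dip))
        previous = ts
    return packets


raw = [
    (10.0, "TCP", 60, 51000, 80, "10.0.0.5", "10.0.0.9"),
    (10.5, "TCP", 1500, 80, 51000, "10.0.0.9", "10.0.0.5"),
    (12.0, "UDP", 120, 40000, 53, "10.0.0.5", "10.0.0.1"),
]
records = with_delta_time(raw)
print([r.delta_time for r in records])  # [0.0, 0.5, 1.5]
```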
In this embodiment, Fig. 2(a) shows the word embedding-based multi-stage attack detection model in the training stage. The inputs of the model are the historical attack-stage data and the current data. After word embedding vectorization these give the historical vector and the current vector, the association vector is computed by the method of step 4.1, and a classifier finally determines whether the current data is associated with the historical attack stages. Fig. 2(b) shows the model in the test stage: the model acquires network traffic in real time, initializes the history vector using an intrusion detection system, and outputs the association probability between the current data and the history vector. When the probability is greater than a given threshold, the current data is added to a cache, and when the cache reaches a specified size the history vector is updated.
In this embodiment, the multivariate word embedding model used to vectorize network data in step 2) is shown in Fig. 3(a), and Fig. 3(b) gives an example of multivariate word embedding vectorization. Note that the example in Fig. 3(b) contains three features but only two sequences after serialization: the port numbers of the source host and the target host are of the same kind, so they are treated as a single sequence, and the same port number has the same word vector on the source host and on the target host.
In this embodiment, Fig. 4 shows the receiver operating characteristic (ROC) curves of the proposed multi-stage attack detection model with different classifiers. The ensemble learning classifier achieves the best result. Vectors obtained by word embedding are generally high-dimensional data, but in this embodiment the word embedding dimension is not high, so a conventional machine learning classification algorithm already gives good results.
Fig. 5 shows the confusion matrix of the multi-stage attack detection results on the test set; every attack stage is detected correctly with an accuracy above 90%.
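How a per-stage detection rate is read off a confusion matrix can be sketched as follows. The stage labels below are made-up toy values, not the results of Fig. 5, and the function names are hypothetical. Rows index the true attack stage, columns the predicted stage, and the per-stage rate is the diagonal entry divided by its row sum.

```python
from typing import List


def confusion_matrix(true_stages: List[int], predicted_stages: List[int], n_stages: int) -> List[List[int]]:
    """Count how often each true stage (row) is predicted as each stage (column)."""
    matrix = [[0] * n_stages for _ in range(n_stages)]
    for t, p in zip(true_stages, predicted_stages):
        matrix[t][p] += 1
    return matrix


def per_stage_rate(matrix: List[List[int]]) -> List[float]:
    """Fraction of each true stage that was detected correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy labels for three stages (0, 1, 2).
true_stages = [0, 0, 0, 1, 1, 2, 2, 2, 2]
predicted_stages = [0, 0, 1, 1, 1, 2, 2, 2, 0]
matrix = confusion_matrix(true_stages, predicted_stages, n_stages=3)
print(matrix)                 # [[2, 1, 0], [0, 2, 0], [1, 0, 3]]
print(per_stage_rate(matrix))
```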
Fig. 6(a) shows, for the network traffic of the test set, the association probability that the model outputs between the data at the current time and the previous attack stages, and Fig. 6(b) shows where the different attack stages occur in the original test set data. Comparing Fig. 6(a) with Fig. 6(b) shows that the model correctly detects the data associated with previous attack stages. To test the generalization performance of the model, the multi-stage attack detection model of the invention is further evaluated in a different network scenario; the corresponding detection results and the original attack stage distribution are shown in Fig. 6(c) and Fig. 6(d), from which it can be seen that attack stages 2 to 4 are correctly associated in that network scenario.
The present invention is not limited to the above embodiments. Those skilled in the art can implement it in various other embodiments according to its disclosure, and any change or modification that follows the design and concept of the invention falls within its scope of protection.

Claims (4)

1. A multi-stage network attack detection method based on word embedding is characterized by comprising the following steps:
1) carrying out feature selection on a data set formed by network traffic features after attack occurs;
2) vectorizing network traffic data using a word embedding method;
3) respectively constructing a current vector and a historical vector, and constructing a training sample by using a negative sampling method;
4) establishing a multi-stage attack detection model based on word embedding, calculating an association vector, calculating association probability by using a supervised learning classification algorithm, and judging the possibility that the current data belongs to multi-stage attack.
2. The word embedding-based multi-stage network attack detection method according to claim 1, wherein the feature selection in step 1) comprises the following steps:
Step 1.1, randomly dividing a data set consisting of a large number of attack samples into a training set, a validation set and a test set, letting $X = (x_1, x_2, x_3, \ldots, x_i)$ denote a data sequence and $x_i = (x_i^{(1)}, x_i^{(2)}, x_i^{(3)}, \ldots, x_i^{(j)})$ denote a single data packet, where $x_i^{(j)}$ is the $j$-th feature of packet $x_i$;
Step 1.2, analyzing the attack data according to the structure of network packets and the network transmission protocols, performing preliminary feature selection and feature construction, and selecting $n$ features;
the vectorization of the network traffic data in step 2) comprises the following steps:
Step 2.1, deleting from the original data set all network traffic features other than the $n$ selected ones, where network traffic refers to the amount of information passing through a network device or transmission medium per unit time;
Step 2.2, dividing the data set into per-feature sequences, i.e. $S = (s^{(1)}, s^{(2)}, s^{(3)}, \ldots, s^{(j)})$, where a single feature sequence is denoted $s^{(j)} = (x_1^{(j)}, x_2^{(j)}, x_3^{(j)}, \ldots, x_i^{(j)})$, so that the number of sequences obtained does not exceed the number of features;
Step 2.3, applying the skip-gram word embedding method of word2vec to the feature sequences: a single feature sequence is used as the corpus, the values inside a window are taken as one sample each time, the center word of the window is the input and the remaining words are the output, a neural network is built to predict the remaining words from the center word, and the hidden-layer weights of the network are used as the vector of the input word; the other feature sequences are embedded in the same way, giving a word embedding vector for each feature, denoted $v_i^{(j)} \in \mathbb{R}^k$;
Step 2.4, concatenating the vectors of the individual features to obtain the vector representation of the network traffic data, i.e. $v_i = (v_i^{(1)}, v_i^{(2)}, v_i^{(3)}, \ldots, v_i^{(j)})$.
3. The word embedding-based multi-stage network attack detection method according to claim 1, wherein the construction of the training samples in step 3) comprises the following steps:
Step 3.1, creating a vector $H \in \mathbb{R}^{m \times k}$ of length $m$ to store the history information of the multi-stage attack, representing the data at any time $t$ as $D \in \mathbb{R}^{k}$, which carries the information of the current data, and letting $A_i$ denote the corresponding attack stage;
Step 3.2, initializing $H$ with data of the first attack stage, and taking data $D^{+} \in A_2$ from the second attack stage as a positive example, so that $[H, D^{+}]$ is a positive sample with label 1; data from any other time, denoted $D^{-}$ (defined in equation image FDA0002578445510000021, which is not reproduced in this text), is taken as a negative example, so that $[H, D^{-}]$ is a negative sample with label 0; positive and negative samples are constructed in the ratio $1:g$, where $g > 1$ is an algorithm parameter set manually for sampling and training;
Step 3.3, updating $H$ according to the attack stage so that the new vector contains the information of the attack stages that have appeared so far, and repeating the sample construction of step 3.2 until $M$ samples are constructed.
4. The word-embedding-based multi-stage network attack detection method according to claim 1, wherein the establishing of the word-embedding-based multi-stage attack detection model in the step 4) comprises the following steps:
step 4.1, converting the sequence modeling problem into a classification problem based on the training samples constructed in step 3, calculating an association vector, denoted R(H, D), from the current data vector D and the history vector H, and training the word-embedding-based multi-stage attack detection model;
the association vector R(H, D) is calculated as
$R(H, D) = D \odot [h_1, h_2, \ldots, h_m]$
where $h_m$ denotes the m-th vector in H; the formula takes the Hadamard product of D with each vector $h_m$ in H;
the optimization objective is
[formula image FDA0002578445510000022]
wherein S represents the size of the training set, H and D are the inputs of the model, y is the label representing the true output of the model, and p(D, H) represents the association probability between the current data D and the historical data H of the attack that has already occurred;
step 4.2, initializing the history vector H as the data of the first stage of the multi-stage attack, and reading real-time network traffic data;
step 4.3, selecting from the read-in real-time network traffic data the network flow features chosen in step 1.2, and vectorizing them according to step 2 to obtain the real-time data vector D;
step 4.4, using the model established in step 4.1 to judge the relation between D and the attack stages that have occurred, and outputting the association probability $P_a = p(D \mid H)$;
step 4.5, defining a threshold: if $P_a$ is larger than the given threshold, D is added to a cache, and H is updated when the cache reaches the specified size, wherein the threshold is selected according to the recall rate and the accuracy rate of the model, and the cache size is set to 50;
step 4.6, repeating steps 4.3 to 4.5.
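The association vector of step 4.1 and the online loop of steps 4.2 to 4.6 can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_probability` is a hypothetical stand-in for the trained model, not the patented classifier, and the rule used to refresh H from the cache (keep the most recent cached vectors) is an assumption, since the claims do not spell it out.

```python
import math
from typing import Callable, Iterable, List, Tuple

Vector = List[float]


def association(history: List[Vector], d: Vector) -> List[Vector]:
    """R(H, D): Hadamard product of D with every vector h_i in H."""
    return [[x * y for x, y in zip(h, d)] for h in history]


def toy_probability(history: List[Vector], d: Vector) -> float:
    """Hypothetical model: sigmoid of the mean entry of R(H, D)."""
    r = association(history, d)
    mean = sum(sum(row) for row in r) / (len(r) * len(d))
    return 1.0 / (1.0 + math.exp(-mean))


def detect(stream: Iterable[Vector], history: List[Vector],
           model: Callable[[List[Vector], Vector], float],
           threshold: float, cache_size: int = 50) -> Tuple[List[int], List[Vector]]:
    """Flag stream positions whose association probability exceeds the threshold."""
    cache: List[Vector] = []
    flagged: List[int] = []
    for t, d in enumerate(stream):
        p_a = model(history, d)                  # step 4.4: P_a = p(D | H)
        if p_a > threshold:                      # step 4.5: compare with threshold
            flagged.append(t)
            cache.append(d)
            if len(cache) >= cache_size:
                # Assumed update rule: H becomes the most recent cached vectors.
                history = cache[-len(history):]
                cache = []
    return flagged, history


# Hypothetical data: H with m = 2 vectors of dimension k = 2, and three flows.
h0 = [[1.0, 1.0], [1.0, 1.0]]
stream = [[2.0, 2.0], [-2.0, -2.0], [3.0, 3.0]]
flagged, h_new = detect(stream, h0, toy_probability, threshold=0.5, cache_size=2)
```

The cache size defaults to 50 as in step 4.5; the example uses 2 only so that an update of H is triggered on a three-element stream.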
CN202010660792.0A 2020-07-10 2020-07-10 Word embedding-based multi-stage network attack detection method Active CN112019497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010660792.0A CN112019497B (en) 2020-07-10 2020-07-10 Word embedding-based multi-stage network attack detection method

Publications (2)

Publication Number Publication Date
CN112019497A true CN112019497A (en) 2020-12-01
CN112019497B CN112019497B (en) 2021-12-03

Family

ID=73498505

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010660792.0A Active CN112019497B (en) 2020-07-10 2020-07-10 Word embedding-based multi-stage network attack detection method

Country Status (1)

Country Link
CN (1) CN112019497B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107807987A (en) * 2017-10-31 2018-03-16 广东工业大学 A kind of string sort method, system and a kind of string sort equipment
CN109117482A (en) * 2018-09-17 2019-01-01 武汉大学 A kind of confrontation sample generating method towards the detection of Chinese text emotion tendency
CN109190372A (en) * 2018-07-09 2019-01-11 四川大学 A kind of JavaScript Malicious Code Detection model based on bytecode
CN109670307A (en) * 2018-12-04 2019-04-23 成都知道创宇信息技术有限公司 A kind of SQL injection recognition methods based on CNN and massive logs
CN109766693A (en) * 2018-12-11 2019-05-17 四川大学 A kind of cross-site scripting attack detection method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
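Chen Yi et al.: "HTTP slow DoS attack detection method based on one-dimensional convolutional neural network", Journal of Computer Applications (《计算机应用》) *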

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221100A (en) * 2021-02-09 2021-08-06 上海大学 Countermeasure intrusion detection method for industrial internet boundary protection
CN113098735A (en) * 2021-03-31 2021-07-09 上海天旦网络科技发展有限公司 Inference-oriented application flow and index vectorization method and system
CN113179256A (en) * 2021-04-12 2021-07-27 中国电子科技集团公司第三十研究所 Time information safety fusion method and system for time synchronization system
CN113179256B (en) * 2021-04-12 2022-02-08 中国电子科技集团公司第三十研究所 Time information safety fusion method and system for time synchronization system
CN112995209A (en) * 2021-04-20 2021-06-18 北京智源人工智能研究院 Flow monitoring method, device, equipment and medium
CN112995209B (en) * 2021-04-20 2021-08-17 北京智源人工智能研究院 Flow monitoring method, device, equipment and medium
CN113591971A (en) * 2021-07-28 2021-11-02 上海数鸣人工智能科技有限公司 User individual behavior prediction method based on DPI time series word embedded vector
CN113591971B (en) * 2021-07-28 2024-05-07 上海数鸣人工智能科技有限公司 User individual behavior prediction method based on DPI time sequence word embedded vector
CN117118687A (en) * 2023-08-10 2023-11-24 国网冀北电力有限公司张家口供电公司 Multi-stage attack dynamic detection system based on unsupervised learning

Also Published As

Publication number Publication date
CN112019497B (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN112019497B (en) Word embedding-based multi-stage network attack detection method
Yuan et al. DeepDefense: identifying DDoS attack via deep learning
CN112953924B (en) Network abnormal flow detection method, system, storage medium, terminal and application
CN106911669B (en) DDOS detection method based on deep learning
JP6835703B2 (en) Cyber attack detection system, feature selection system, cyber attack detection method, and program
Anil et al. A hybrid method based on genetic algorithm, self-organised feature map, and support vector machine for better network anomaly detection
CN110768946A (en) Industrial control network intrusion detection system and method based on bloom filter
CN112333195A (en) APT attack scene reduction detection method and system based on multi-source log correlation analysis
CN111367908A (en) Incremental intrusion detection method and system based on security assessment mechanism
GB2583892A (en) Adaptive computer security
Saurabh et al. Nfdlm: A lightweight network flow based deep learning model for ddos attack detection in iot domains
CN113821793A (en) Multi-stage attack scene construction method and system based on graph convolution neural network
Al-Shabi Design of a network intrusion detection system using complex deep neuronal networks
Brandao et al. Log Files Analysis for Network Intrusion Detection
CN117061254B (en) Abnormal flow detection method, device and computer equipment
Alyasiri et al. Grammatical evolution for detecting cyberattacks in Internet of Things environments
Sun et al. Deep learning-based anomaly detection in LAN from raw network traffic measurement
CN114444075B (en) Method for generating evasion flow data
Alrawashdeh et al. Optimizing Deep Learning Based Intrusion Detection Systems Defense Against White-Box and Backdoor Adversarial Attacks Through a Genetic Algorithm
Leevy et al. IoT attack prediction using big Bot-IoT data
Dharaneish et al. Comparative Analysis of Deep Learning and Machine Learning models for Network Intrusion Detection
Avram et al. Tiny network intrusion detection system with high performance
CN117574135B (en) Power grid attack event detection method, device, equipment and storage medium
CN116170237B (en) Intrusion detection method fusing GNN and ACGAN
CN114726599B (en) Artificial intelligence algorithm-based intrusion detection method and device in software defined network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant