CN116527357A - Web attack detection method based on gated Transformer
- Publication number: CN116527357A (application CN202310460958.8A)
- Authority: CN (China)
- Prior art keywords: word, embedding, module, information, vector
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2431 (Pattern recognition; classification techniques for multiple classes)
- G06F18/213 (Feature extraction, e.g. by transforming the feature space)
- G06F18/253 (Fusion techniques of extracted features)
- G06N3/0442 (Recurrent networks characterised by memory or gating, e.g. LSTM or GRU)
- G06N3/045 (Combinations of networks)
- G06N3/0455 (Auto-encoder networks; encoder-decoder networks)
- G06N3/0464 (Convolutional networks [CNN, ConvNet])
- G06N3/048 (Activation functions)
- G06N3/09 (Supervised learning)
- H04L41/16 (Network management using machine learning or artificial intelligence)
- H04L63/1408 (Detecting malicious traffic by monitoring network traffic)
- H04L67/02 (Protocols based on web technology, e.g. HTTP)
- Y02D30/50 (Reducing energy consumption in wire-line communication networks)
Abstract
The invention provides a Web attack detection method based on a gated Transformer, and relates to the technical field of network maintenance. The method provides a network model based on a gated Transformer that combines a Transformer with a gated convolution module: the Transformer extracts global semantic information across different spatial dimensions through a multi-head self-attention mechanism, while the gated convolution extracts local spatial information through one-dimensional convolution kernels and screens and filters the text information through a gating mechanism. The invention can effectively extract multi-dimensional global features and local features, and the mixed word-vector table contains more accurate and richer semantic information; the method automatically extracts the features of the effective data in the text sequence, without manual information screening or vocabulary replacement; the accuracy of multi-class attack detection is further improved, the false-alarm rate is reduced, and the security of the Web server system can be fully protected.
Description
Technical Field
The invention relates to the technical field of network maintenance, in particular to a Web attack detection method based on a gated Transformer.
Background
With the rapid development of science and technology, Web applications have become the primary channel for acquiring information on devices such as computers and mobile terminals, and the browser is one of the most frequently used applications for people in almost every industry. This convenience, however, brings a variety of security problems. Attack methods against Web applications are continuously evolving; once an attack succeeds, the user's daily applications are directly threatened, information may be leaked and privacy compromised, and, without protection, serious losses can result. Common injection-type attacks include SQL injection, XSS attacks and command injection attacks.
With the continuous development of machine learning and big-data analysis, deep learning has gradually been applied to attack detection, but existing methods still have shortcomings. Models such as CNNs are strong at extracting local sequence features but weak at perceiving global information and modeling text-sequence dependencies, while models such as LSTM and RNN perform poorly on long-distance global dependencies, and the text of HTTP messages is long and complex. In attack detection the model must produce results as quickly as possible; the Transformer model removes the limitation that LSTM- and RNN-type models cannot be computed in parallel, so results can be obtained in a shorter time. Relatively long HTTP messages also contain much content, such as symbols and numeric identifiers, that carries no useful information, so how to automatically screen and extract the key effective information from this complex input while ignoring irrelevant information is another key problem for further improving the accuracy of attack detection.
Chinese patent CN113691542A provides a Web attack detection method based on HTTP request text and related equipment; that method reduces the dictionary space by substituting an expert dictionary and a special dictionary table, and finally performs multi-category attack-detection classification. The multi-head-attention BiLSTM model in that patent performs poorly on long-distance global dependency features and cannot overcome the limitation on parallel computation. The text of an HTTP message is long and complex and contains much invalid information such as symbols and numeric identifiers; methods that filter and screen this invalid information through vocabulary replacement, regular-expression matching and the like depend on expert rule bases and replacement word lists, and therefore cannot truly extract, screen and filter the effective information at the semantic level.
Disclosure of Invention
In view of the defects of the prior art, the technical problem to be solved by the invention is to provide a Web attack detection method based on a gated Transformer, so that the security of the Web server system is effectively protected.
In order to solve the above technical problems, the invention adopts the following technical scheme:
A Web attack detection method based on a gated Transformer comprises the following steps:
Step 1: traffic collection is carried out through the sniff module of the python scapy library; a pcap traffic file is collected and the application-layer data is extracted from the pcap traffic file;
Step 2: URL decoding is carried out on the message text, and the text information of the URL, the parameter list and the user-agent, cookie and referer fields is segmented by predefined special characters;
Step 3: the mixed word-embedding module enhances the robustness of the vector representation by fusing word-embedding tables generated in two different ways; the two word-embedding tables of the mixed word-embedding module are a Word2Vec word-embedding table based on the continuous bag-of-words model CBOW and a word-embedding table based on an Embedding layer; the Embedding-layer-based word-embedding table is initialized with the Xavier_uniform distribution, and its distributed representation of the word vectors is continuously updated during training; the Word2Vec word-embedding table based on the continuous bag-of-words model CBOW must be generated before the model enters the training stage; the Embedding-layer-based table and the CBOW-based Word2Vec table map word vectors into different discrete spaces, giving the distributed vector representation of the words;
Step 4: the text information of the HTTP message is converted into a series of word vectors after being processed by the mixed word-embedding module; the series of word vectors is input into the Transformer Encoder model for global attention-feature extraction; the Transformer Encoder model comprises three parts: a position-coding module, a multi-head self-attention module and a residual and layer-normalization module; the Transformer Encoder model first adds position-coding information to the vectors processed by the mixed word-embedding module, then extracts multi-dimensional sequence features through the multi-head self-attention module, and finally feeds them into the feed-forward neural-network module;
Step 5: the output of the Transformer Encoder model is input into the gated convolution model, which extracts local features of the data within a local receptive-field range; non-key data are dynamically screened and filtered through the gated convolution model;
Step 6: the output result of the gated convolution model is classified by a final classifier module; the output is converted into a probability distribution by the softmax function, each dimension of the probability distribution corresponds to an attack category, and the attack category corresponding to the index of the maximum probability value in the probability distribution is the final attack-detection classification result.
Further, the specific method of step 1 is as follows:
Step 1.1: start the sniff network-card sniffing module of the python scapy library, collect traffic from the network and store it as a pcap file;
Step 1.2: read and parse the collected pcap file through the rdpcap module of the python scapy library, extract the application-layer data from the pcap file and perform text parsing of the data, where the text information comprises the URL (uniform resource locator), the parameter list and the user-agent, cookie and referer fields.
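For illustration only, the following sketch shows how steps 1.1 and 1.2 could be realised with the scapy library; the interface name, packet count and the simplified field parsing are assumptions and are not fixed by the method.

```python
from scapy.all import sniff, wrpcap, rdpcap, Raw

def capture_traffic(iface="eth0", count=1000, out_file="capture.pcap"):
    # Step 1.1: sniff packets from the monitored network card and store them as a pcap file.
    packets = sniff(iface=iface, count=count)
    wrpcap(out_file, packets)
    return out_file

def extract_http_records(pcap_path):
    # Step 1.2: read the pcap file back with rdpcap and pull the application-layer
    # text fields used later (URL, user-agent, cookie, referer).
    records = []
    for pkt in rdpcap(pcap_path):
        if not pkt.haslayer(Raw):
            continue
        payload = bytes(pkt[Raw].load).decode("utf-8", errors="ignore")
        if not payload.startswith(("GET ", "POST ")):
            continue  # keep only HTTP request messages
        request_line, _, rest = payload.partition("\r\n")
        url = request_line.split(" ")[1] if " " in request_line else ""
        headers = dict(line.split(": ", 1) for line in rest.split("\r\n") if ": " in line)
        records.append({
            "url": url,
            "user-agent": headers.get("User-Agent", ""),
            "cookie": headers.get("Cookie", ""),
            "referer": headers.get("Referer", ""),
        })
    return records
```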
Further, in the word-segmentation process of step 2, because the number of words in the input sequence that the Transformer Encoder model can process is limited, an upper limit is placed on the number of words after text segmentation. This problem is solved by setting a maximum sentence length: the words exceeding the maximum sentence length are removed, and if the number of words in a sentence is smaller than the maximum length, the sentence is filled with padding.
Further, the specific method of step 3 is as follows:
Step 3.1: one-hot encode each input HTTP text word as X ∈ R^V;
Step 3.2: multiply the one-hot code X of each word by the input weight matrix W ∈ R^(V×N); the input weight matrix W is shared by all input words; add and average the resulting vectors to obtain the hidden-layer vector H ∈ R^N;
Step 3.3: multiply the hidden-layer vector by the output weight matrix W' ∈ R^(V×N) to obtain an output vector, convert the output vector into a probability distribution through the softmax activation function, and take the index position of the maximum probability as the predicted central word; in the training stage, a cross-entropy loss function is adopted for model training and the Word2Vec model is iteratively updated;
Step 3.4: multiply each input word by the shared input weight matrix W to obtain the word-embedding vector of that word, and adopt the matrix W as the word-embedding table T_a of the mixed word-embedding module;
Step 3.5: initialize the Embedding-layer-based word-embedding table T_b with the Xavier_uniform distribution; the word-embedding table T_a is trained before the gated Transformer model is trained, while T_b is iteratively updated along with the training process of the gated Transformer model; the final mixed word-embedding table T_f is generated from the two word-embedding tables T_a and T_b by average pooling, as shown in formula (1);
T_f = (T_a + T_b)/2    (1)
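Purely as an illustration (not part of the patented method), the CBOW training of steps 3.1-3.4 can be carried out with the gensim library; the library choice and the hyper-parameters below are assumptions, and gensim version 4 or later is assumed.

```python
from gensim.models import Word2Vec

def build_word2vec_table(token_lists):
    # sg=0 selects the continuous bag-of-words (CBOW) objective; gensim trains the
    # input weight matrix W internally, and the learned vectors play the role of T_a.
    model = Word2Vec(sentences=token_lists, vector_size=300,
                     window=3, min_count=10, sg=0, epochs=10)
    return model.wv  # one dense embedding per vocabulary word
```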
Further, the specific method of step 4 is as follows:
Step 4.1: before being input into the Transformer Encoder model, the text of the HTTP data is processed by the mixed word-embedding module of step 3, and the text words are converted into the distributed numerical vector representation X_embedding;
Step 4.2: because the word-order position information of the words is not considered when a series of word vectors is input, position-coding information is periodically added to the text words using sine and cosine functions; the position-coding information generated for each word at each position is fused into the original text word, and the word vector X_embedding-pe after fusion of the position-coding information is generated as shown in formula (2) and formula (3);
X_embedding-pe = X_embedding + X_pos    (2)
X_pos(pos, 2i) = sin(pos/10000^(2i/d_emb)),  X_pos(pos, 2i+1) = cos(pos/10000^(2i/d_emb))    (3)
where pos represents the word-order position of a word in the text, and the value of pos is an integer between 0 and the maximum sequence length; in order to add the position-coding information, the position-coding vector X_pos generated for a word has the same dimension as the word vector, i.e. the word vector X_embedding processed by the mixed word-embedding module and the position-coding vector X_pos both have dimension d_emb; 2i+1 and 2i respectively denote the odd and even positions in the word vector X_embedding and the position-coding vector X_pos, so the value range of i is [0, d_emb/2), and d_emb is the dimension of the word vector X_embedding;
Step 4.3: the multi-head self-attention module extracts global sequence features from multiple dimensions of the text; the dimension of the output result is the same as that of the input data, and every word in the text is fused with the global features. The word vector X_embedding-pe is multiplied by three different linear mapping matrices W_Q, W_K and W_V to generate three kinds of key information: the query information (Q), the word keys (K) and the word values (V), as shown in formula (4);
Q = X_embedding-pe·W_Q,  K = X_embedding-pe·W_K,  V = X_embedding-pe·W_V    (4)
When the global attention features are extracted, the attention score is calculated only from the query information Q and the K corresponding to each word in the sentence; the attention-score calculation is essentially the calculation of the correlation coefficients between words, after which V is weighted and summed with the inter-word attention scores as weights. This process is the principle of the self-attention mechanism. The attention-score calculation adopts the scaled dot product shown in formula (5);
Attention(Q, K, V) = softmax(Q·K^T/√d_k)·V    (5)
where the denominator √d_k prevents the dot product from becoming too large, which would otherwise lead to overly extreme values after the softmax function, and the subscript k denotes the dimension of the Q, K, V matrices;
The multi-head self-attention mechanism performs the self-attention calculation on the word vector X_embedding-pe from different subspaces of multiple dimensions. When the self-attention calculation is to be performed from h subspaces of different dimensions, the linear mapping matrices are split into h blocks, and the h blocks of linear mapping matrices correspond respectively to the self-attention calculations head_s of the h different subspaces, where s denotes the attention of one subspace, s ∈ [1, h];
The output of the multi-head self-attention module extracts global attention features of the HTTP text words from the h subspaces of different dimensions and concatenates the self-attention outputs head_s of the h different heads; the output X_multihead of the multi-head self-attention mechanism is calculated as shown in formula (6), where s denotes the attention mechanism of a certain subspace;
X_multihead = Concat(head_1, …, head_h)    (6)
Step 4.4: residual connection and layer-normalization module;
The residual connection adds the position-coded word-embedding vector X_embedding-pe, as it was before entering the multi-head self-attention module, to the output result of the multi-head self-attention module; the output data of the multi-head self-attention module are then standardized with layer normalization (LayerNorm). The calculation that applies the residual connection and layer-normalization module to the multi-head attention output X_multihead is shown in formula (7); the result X_multihead-rn obtained by processing the output X_multihead of the multi-head self-attention module through the residual connection and layer normalization is the final output of the residual connection and layer-normalization module;
X_multihead-rn = LayerNorm(X_embedding-pe + X_multihead)    (7)
Step 4.5: the output result X_multihead-rn of the residual connection and layer-normalization module is further processed through a fully connected neural network to extract richer semantic information; the final output of the Transformer Encoder model is X_encoded, computed by the feed-forward neural network as shown in formula (8), where Relu is a nonlinear activation function.
Further, the specific method of step 5 is as follows:
Step 5.1: the output X_encoded obtained after the global sequence-feature extraction of the Transformer Encoder model is input into the gated convolution module for information filtering and screening; the gated convolution module contains c one-dimensional convolution kernels Kernel_j (j ∈ [1, c]) of different scales, and the calculation of a single convolution kernel is shown in formula (9);
g_j = Relu(Conv(Kernel_j, X_encoded) + b_j)    (9)
where g_j is the output of a single convolution block, Relu is a nonlinear activation function, Conv represents the convolution operation, and b_j is the bias corresponding to the convolution kernel Kernel_j;
Step 5.2: the output results of the convolutions of different scales are feature-concatenated, and the value of the multi-scale gated convolution is mapped into the range 0-1 through a Sigmoid activation function to give the gating value Gatesv, calculated according to formula (10); the value range of Gatesv is between 0 and 1; a gating value close to 0 represents information that is almost unimportant and is filtered out and ignored, while a gating value close to 1 represents data that is key information and is completely retained;
The output X_encoded of the Transformer Encoder model is multiplied element-wise by the gating value Gatesv, which completes the filtering and screening of the encoded information X_encoded and yields the output result X_gated of the gated convolution module; the information-filtering method is shown in formula (11);
X_gated = X_encoded ⊙ Gatesv,  Gatesv ∈ [0, 1]    (11)
where the symbol ⊙ represents element-level multiplication;
Further, the specific method of step 6 is as follows:
The output result X_gated of the gated convolution module is input into a Classifier composed of two fully connected layers; the output of the Classifier network is converted into a probability distribution through the softmax function, each dimension of the probability distribution corresponds to an attack category, and the attack category corresponding to the index of the maximum probability value in the probability distribution is the final attack-detection classification result X_pred; the attack-detection classification process is shown in formula (12);
X_pred = argmax(Softmax(Classifier(X_gated)))    (12)
The beneficial effects produced by adopting the above technical solution are as follows: the invention provides a Web attack detection method based on a gated Transformer and a network model that can effectively extract multi-dimensional global features and local features; the word-embedding method is improved so that the mixed word-vector table contains more accurate and richer semantic information; the method can automatically extract the features of the effective data in the text sequence without manual information screening or vocabulary replacement; the accuracy of multi-class attack detection of the model is further improved, the false-alarm rate is reduced, and the security of the Web server system can be fully protected.
Drawings
FIG. 1 is a diagram of an overall architecture of a network model according to an embodiment of the present invention;
FIG. 2 is a diagram of a mixed word embedding table model structure provided by an embodiment of the present invention;
FIG. 3 is a diagram of a gating convolutional network model according to an embodiment of the present invention;
FIG. 4 is a diagram of a simulation architecture of a network attack data set according to an embodiment of the present invention;
fig. 5 is a diagram of an example of network attack load provided in an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
As shown in fig. 1, which gives the overall structure of the network model, the method of the present embodiment is as follows.
Step 1: traffic collection is performed through the sniff module of the python scapy library, and the pcap traffic file is collected and the application layer data is extracted from the pcap traffic file. The specific method comprises the following steps:
step 1.1: starting an sniff network card sniffing module of the scapy library, setting a monitoring network card iface and a processing function write_cap, collecting some flow from a network and storing the flow as a pcap file.
Step 1.2: and reading and analyzing the collected pcap file by using an rd_pcap module of a python scapy library, and extracting application layer data from the collected pcap file. The application layer data text to be analyzed includes URL, parameter list, user-agent, cookie, referer fields. Since injection attacks may also exist in the user-agent, cookie, referer field, the request parameter information for such field is likewise not negligible.
Step 2: preprocessing the text of the message, and segmenting the URL, the parameter list and the user-agent, cookie, referer field by predefined special characters.
URL decoding is performed on the message text, and the URL, parameter list and user-agent, cookie and referer fields are segmented into words by predefined special characters such as "/", "&", "=", ":", "?", "+", "-", "<", ">", "%", "(", ")" and "_". The number of words produced by segmentation must be limited, because the Transformer Encoder model can only process an input sequence with a limited number of words. This problem is solved by setting a maximum sentence length: the words exceeding the maximum sentence length are removed, and if the number of words in a sentence is smaller than the maximum length, the sentence is filled with padding. In this embodiment the maximum length is set to 60 and the padding value is 0.
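A minimal sketch of this preprocessing is given below; the exact delimiter set and the pad token name are assumptions, while max_len = 60 follows the embodiment.

```python
import re
from urllib.parse import unquote

SPECIAL = r"[/&=:\?\+\-<>%\(\)_\s]+"   # predefined special characters (assumed set)

def tokenize(text, max_len=60, pad="<pad>"):
    # URL-decode the field, split it on the special characters, then truncate or pad
    # to the fixed sentence length expected by the Transformer Encoder.
    tokens = [t for t in re.split(SPECIAL, unquote(text)) if t]
    tokens = tokens[:max_len]
    tokens += [pad] * (max_len - len(tokens))
    return tokens
```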
Step 3: the mixed word embedding module enhances the robustness of the vector representation by fusing two differently generated word embedding tables. The two Word Embedding tables of the mixed Word Embedding module are a Word2Vec Word Embedding table based on a continuous Word bag model (Cbow) and a Word Embedding table based on an Embedding layer respectively. The Word Embedding table based on the Embedding layer is initialized through Xavier_unique distribution, and the distributed representation of the Word vectors is continuously updated in the training process, and the Word Embedding table based on Word2Vec needs to be generated before the model enters the training stage. Word Embedding tables based on the Embedding layer and Word Embedding tables based on the Word2Vec map Word vectors to different discrete spaces respectively so as to perform distributed vector representation of words. The word embedding mode has lower dimensionality and contains more semantic information compared with the traditional single-hot coding mode. The mixed word embedding table is adopted to more effectively learn the semantic relation among HTTP text keywords, and the semantic relation contains richer semantic information, as shown in figure 2. The specific method comprises the following steps:
step 3.1: one-hot encoding X ε R of an input HTTP text word V ;
Step 3.2: the single thermal code X of each word is respectively combined with the input weight matrix W E R V×N Multiplying, sharing input weight matrix W for all input words, and adding and averaging the obtained vectors to obtain hidden layer vector H E R N ;
Step 3.3: the hidden layer vector is multiplied by the output weight matrix W' E R V×N And obtaining an output vector, converting the output vector into probability distribution through a softmax activation function, and obtaining the index position where the maximum probability is located, namely the predicted central word. In the training stage, a cross entropy loss function is adopted to carry out model training and the Word2Vec model is iteratively updated;
step 3.4: each input word is multiplied by a shared input weight matrix W to obtain a word embedding vector of the word, and the matrix W is used as a word embedding table T of the mixed word embedding module a . In this embodiment, the vocabulary Ta words number 2000, the word vector length 300, min_count=10, window=3. The method comprises the steps of carrying out a first treatment on the surface of the
Step 3.5: initializing an Embedding layer-based word Embedding table T with Xavier_uniform uniform distribution b And T is b Iterative updates are performed simultaneously with the training process of the gated transducer model, but the word embedding table T a Is trained before the gated transducer model is trained. Final mixed word embedding table T f Embedding a table T from two words a And T b The result of the averaging pooling is generated as shown in formula (1). In this embodiment, the word Embedding table based on the Embedding layer also maps words into 300-dimensional vector representations, resulting in word vector dimensions identical to Ta.
T f =(T a +T b )/2 (1)
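A hedged PyTorch sketch of the mixed embedding table is shown below; the framework choice is an assumption, while the averaging of the two tables, the 2000-word vocabulary and the 300-dimensional vectors follow the embodiment.

```python
import torch
import torch.nn as nn

class MixedEmbedding(nn.Module):
    def __init__(self, w2v_weights, vocab_size=2000, dim=300):
        super().__init__()
        # T_a: pre-trained CBOW Word2Vec table, fixed before the gated Transformer is trained
        self.t_a = nn.Embedding.from_pretrained(
            torch.as_tensor(w2v_weights, dtype=torch.float32), freeze=True)
        # T_b: Embedding-layer table, Xavier_uniform initialised, updated during training
        self.t_b = nn.Embedding(vocab_size, dim)
        nn.init.xavier_uniform_(self.t_b.weight)

    def forward(self, token_ids):
        # T_f = (T_a + T_b) / 2, formula (1)
        return (self.t_a(token_ids) + self.t_b(token_ids)) / 2
```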
Step 4: the text information of the HTTP message is processed by the mixed word embedding module and then is converted into a series of word vectors; a series of word vectors are input to the Transformer Encoder model for global attention feature extraction. The Transformer Encoder model includes a 3-part structure: the device comprises a position coding module, a multi-head self-attention module and a residual error layer normalization module. The Transformer Encoder model firstly adds position coding information into the vector processed by the mixed word embedding module, extracts multidimensional sequence characteristics through the multi-head self-attention mechanism module, and finally inputs the multidimensional sequence characteristics into the feedforward neural network module. In this embodiment, the number of layers of the Transformer Encoder model is two. The method comprises the following specific steps:
step 4.1: before text input of HTTP data into Transformer Encoder model, the text word is converted into distributed numerical vector X through mixed word embedding module embedding . X in the present embodiment embedding Is a 300-dimensional vector;
step 4.2: since the word sequence position information of the word is not considered when a series of word vectors are input, position coding information is periodically added to the text word by adopting a sine and cosine function; fusing position coding information generated for each word at each position into the original text word, and fusing word vector X after position coding information embedding-pe The generation method of (2) is shown as formula (2) and formula (3). Where pos represents the ordinal position of a word in the text and its value ranges from 0 to the integer between the maximum length of the sequence. To be able to add position-coding information, a position-coding vector X is generated for a word pos The dimension of the word is the same as the dimension of the word vector, and the word vector X processed by the mixed word embedding module embedding Dimension and position encoding X of (2) pos Are all d in dimension emb Wherein 2i+1 and 2i respectively represent the word vector X embedding And a position-coding vector X pos The odd and even positions in (a) so that the value range of i isWherein d is emb Representative word vector X embedding Is a dimension of (2);
X embedding-pe =X embedding +X pos (2)
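The sinusoidal position code can be sketched as follows; PyTorch is assumed, and max_len = 60 and d_emb = 300 follow the embodiment.

```python
import torch

def positional_encoding(max_len=60, d_emb=300):
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # word-order positions
    i = torch.arange(0, d_emb, 2, dtype=torch.float32)              # even indices 2i
    angle = pos / torch.pow(torch.tensor(10000.0), i / d_emb)
    pe = torch.zeros(max_len, d_emb)
    pe[:, 0::2] = torch.sin(angle)   # even dimensions use the sine term
    pe[:, 1::2] = torch.cos(angle)   # odd dimensions use the cosine term
    return pe                        # X_pos, added to X_embedding to give X_embedding-pe
```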
Step 4.3: the multi-head self-attention module extracts global sequence features from multiple dimensions; the output result has the same dimension as the input data, but every word in the text is fused with the global features. The attention mechanism follows the idea of a search query: the word vector X_embedding-pe is multiplied by three different linear mapping matrices W_Q, W_K and W_V to generate three kinds of key information, namely the query information (Q), the word keys (K) and the word values (V), as shown in formula (4).
When the global attention features are extracted, the attention score is calculated only from the query information Q and the K corresponding to each word in the sentence; the attention-score calculation is essentially the calculation of the correlation coefficients between words, after which V is weighted and summed with the inter-word attention scores as weights. This process is the principle of the self-attention mechanism. The attention-score calculation adopts the scaled dot product shown in formula (5), in which the denominator √d_k prevents the dot product from becoming too large, which would otherwise lead to overly extreme values after the softmax function; the subscript k denotes the dimension of the Q, K, V matrices.
The multi-head self-attention mechanism performs the self-attention calculation on the word vector X_embedding-pe from different subspaces of multiple dimensions. When the self-attention calculation is to be performed from h subspaces of different dimensions, the linear mapping matrices are split into h blocks, and the h blocks of linear mapping matrices correspond respectively to the self-attention calculations head_s of the h different subspaces, where s denotes the attention of one subspace, s ∈ [1, h];
The output of the multi-head self-attention module extracts global attention features of the HTTP text words from the h subspaces of different dimensions and concatenates the self-attention outputs head_s of the h different heads; the output X_multihead of the multi-head self-attention mechanism is calculated as shown in formula (6), where s denotes the attention mechanism of a certain subspace. In the present embodiment, the number of attention heads is h = 6.
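As an illustration only, the scaled dot product of formula (5) and the h = 6 attention heads could be realised in PyTorch as sketched below; the built-in module and the batch-first layout are assumptions.

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, i.e. formula (5)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ v

# The multi-head module projects Q, K and V internally and concatenates the h head outputs.
attn = nn.MultiheadAttention(embed_dim=300, num_heads=6, batch_first=True)

def multi_head_self_attention(x_embedding_pe):
    # Self-attention: query, key and value all come from the same position-coded input.
    x_multihead, _ = attn(x_embedding_pe, x_embedding_pe, x_embedding_pe)
    return x_multihead
```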
Step 4.4: residual connection and layer-normalization (LayerNorm) module;
The residual connection adds the position-coded word-embedding vector X_embedding-pe, as it was before entering the multi-head self-attention module, to the output result X_multihead of the multi-head self-attention module; the residual connection lets the gradient propagate directly during back-propagation, avoiding the gradient instability caused by an overly deep model hierarchy. To accelerate model convergence and prevent vanishing and exploding gradients, layer normalization (LayerNorm) is adopted to standardize the data; the calculation that applies the residual connection and layer-normalization module to the multi-head attention output X_multihead is shown in formula (7), and the result X_multihead-rn obtained by processing the output X_multihead of the multi-head self-attention module through the residual connection and layer normalization is the final output of the residual connection and layer-normalization module.
X_multihead-rn = LayerNorm(X_embedding-pe + X_multihead)    (7)
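A hedged sketch of one encoder layer, covering the residual connection and layer normalization of step 4.4 together with the feed-forward output described in step 4.5 below, is given here; PyTorch, the feed-forward hidden size and its two-layer form are assumptions.

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_emb=300, n_heads=6, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_emb, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_emb)
        # fully connected network of step 4.5 with the Relu nonlinearity
        self.ffn = nn.Sequential(nn.Linear(d_emb, d_ff), nn.ReLU(), nn.Linear(d_ff, d_emb))

    def forward(self, x_embedding_pe):
        x_multihead, _ = self.attn(x_embedding_pe, x_embedding_pe, x_embedding_pe)
        # formula (7): residual connection followed by layer normalization
        x_multihead_rn = self.norm(x_embedding_pe + x_multihead)
        # feed-forward processing yields X_encoded (formula (8)); in this embodiment
        # two such encoder layers are stacked.
        x_encoded = self.ffn(x_multihead_rn)
        return x_encoded
```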
Step 4.5: finally, the output result X_multihead-rn of the residual connection and layer-normalization module is further processed through the fully connected neural network Linear to extract richer semantic information. The final output of the Transformer Encoder is X_encoded; the neural-network calculation is shown in formula (8), where Relu is a nonlinear activation function.
Step 5: the HTTP text processed by the mixed word embedding module is subjected to global attention feature extraction through a Transformer Encoder model, then the output of the Transformer Encoder model is input into a gating convolution model again, and the gating convolution model is used for carrying out local feature extraction on data in a local receptive field range; the non-key information is dynamically screened and filtered through the gating convolution model, as shown in fig. 3, the model effectively solves the problem that all dimensions of input data are regarded as effective data through traditional convolution, and can automatically filter the data and further improve the attack detection accuracy. The method comprises the following specific steps:
Step 5.1: the output of the Transformer Encoder model after global sequence-feature extraction is X_encoded; X_encoded is then input into the gated convolution module for information filtering and screening. The gated convolution module contains c one-dimensional convolution kernels Kernel_j (j ∈ [1, c]) of different scales; the calculation of a single convolution kernel is shown in formula (9), where g_j is the output of a single convolution block, Relu is a nonlinear activation function, Conv represents the convolution operation, and b_j is the bias corresponding to the convolution kernel Kernel_j. In this embodiment, convolution kernels of three different scales, 10, 15 and 25, are used.
g_j = Relu(Conv(Kernel_j, X_encoded) + b_j)    (9)
Step 5.2: the output results g_j of the convolutions of different scales are feature-concatenated, and the value of the multi-scale gated convolution is mapped into the range 0-1 through a Sigmoid activation function to give the gating value Gatesv, calculated according to formula (10); the value range of Gatesv is between 0 and 1. A gating value close to 0 represents information that is almost unimportant and is filtered out and ignored; a value close to 1 represents data that is key information and will be fully retained. The output X_encoded of the Transformer Encoder model is multiplied element-wise by the gating value Gatesv, which completes the filtering and screening of the encoded information X_encoded and yields the output result X_gated of the gated convolution module; the information-filtering method is shown in formula (11), where the symbol ⊙ denotes element-level multiplication.
X_gated = X_encoded ⊙ Gatesv,  Gatesv ∈ [0, 1]    (11)
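A hedged PyTorch sketch of this gated convolution module follows; the kernel scales 10, 15 and 25 come from the embodiment, while the channel split and the 'same' padding are assumptions made to keep the gate shape aligned with X_encoded.

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    def __init__(self, d_emb=300, kernel_sizes=(10, 15, 25)):
        super().__init__()
        out_ch = d_emb // len(kernel_sizes)  # concatenated channels add back up to d_emb
        self.convs = nn.ModuleList(
            nn.Conv1d(d_emb, out_ch, k, padding="same") for k in kernel_sizes)

    def forward(self, x_encoded):                         # (batch, seq_len, d_emb)
        x = x_encoded.transpose(1, 2)                     # Conv1d expects (batch, channels, seq)
        g = [torch.relu(conv(x)) for conv in self.convs]  # formula (9), one output per scale
        gate = torch.sigmoid(torch.cat(g, dim=1))         # formula (10): Gatesv in (0, 1)
        return x_encoded * gate.transpose(1, 2)           # formula (11): element-wise filtering
```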
Step 6: output result X of gating convolution module gated The data dimension is increased and then decreased by inputting the data dimension into a Classifier class consisting of two layers of fully connected networks Linear1 and Linear2, the meaning of the dimension increase is to combine various features, and the dimension decrease represents the process of information fusion of the combined features. Finally, converting the output of the fully connected network layer into probability distribution through a softmax function, wherein each dimension in the probability distribution corresponds to an attack category, and the index of the maximum probability in the probability distribution corresponds to the attack categoryI.e. the classification result X of the final attack detection pred The attack detection classification prediction method is shown in formula (12). In this implementation, the attack classification task is 10 classification.
X pred =argmax(Softmax(Lineart1(Lineart2(X gated )))) (12)
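A minimal sketch of this classification step is given below; the hidden width and the mean-pooling of the sequence into a single vector are assumptions, while the 10 output classes follow the embodiment.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(300, 600),   # dimension increase: combine the extracted features
    nn.ReLU(),
    nn.Linear(600, 10),    # dimension decrease: fuse the combined features into 10 class logits
)

def predict(x_gated):
    pooled = x_gated.mean(dim=1)                       # collapse the sequence dimension
    probs = torch.softmax(classifier(pooled), dim=-1)  # probability distribution over classes
    return probs.argmax(dim=-1)                        # formula (12): index of the maximum probability
```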
The network model based on the gated Transformer combines the Transformer with the gated convolution module: the Transformer extracts global semantic information across different spatial dimensions through the multi-head self-attention mechanism, the gated convolution extracts local spatial information through one-dimensional convolution kernels, and the gating mechanism is adopted to screen and filter the text information. The model mainly has the following advantages:
(1) The following advantages are achieved by using and improving the Transformer model: the word-embedding layer of the Transformer is initialized in two ways, namely CBOW-based word2vec and xavier_uniform initialization, and an average-pooling operation is applied, so that the word vectors are trained more fully and robustness is improved; the multi-head self-attention mechanism can effectively extract multi-dimensional global sequence features, has no time-step dependence compared with other sequence models such as RNN and LSTM, can exploit GPU parallel computation to shorten the training time, and does not introduce excessive computational complexity.
(2) The gated convolution module further extracts local n-gram features and performs effective information masking and screening. It has the following advantages: the convolution gating unit processes the data output by the Transformer with convolution operations; since the multi-head self-attention mechanism of the Transformer extracts global information features from multiple spatial dimensions, its extraction of local features may be somewhat lacking, and because the parameters in a URL are n-gram local features of the parameter=value type, a gated convolution with shared parameters can extract the local feature information more fully. Several one-dimensional convolution kernels of different scales cope more effectively with words of different lengths, and the output vectors of the different-scale kernels are finally concatenated, which avoids the information loss caused by a single-scale kernel failing to match words of different lengths and by insufficient local-feature extraction. For longer HTTP messages, which contain much content such as symbols and numeric identifiers that carries no information, no substitution-rule dictionary is needed to reduce the vocabulary space; the key effective information is automatically screened and extracted from the complex input while irrelevant information is ignored, which further improves the accuracy of attack detection.
This embodiment conducted test experiments on the public CSIC 2010 dataset and on network traffic collected from simulated network attacks. The HTTP CSIC 2010 dataset was released as an attachment to a paper by the Spanish National Research Council (CSIC); the collected Web traffic is a record of normal accesses and Web attacks against an e-commerce website and contains 36000 normal requests and 25000 attack requests. The abnormal request samples include attack samples of SQL injection, file traversal, CRLF injection, XSS, SSI and so on. To improve the generalization of the model, verify its effect in a real network environment and learn the features of the various attack types more fully, a network-attack simulation experiment was carried out with Kali Linux and the traffic generated by the attacks was collected; the main simulated access target is a Web site, and the traffic contains both normal access requests and attack requests. As shown in fig. 4, the attacker keeps Wireshark open to collect and store the traffic. The attacker attacks the website built on the target host, mainly using tools such as sqlmap, nmap and the Metasploit framework; the data contain 50000 normal requests and 100000 attack requests with 10 classification labels: normal, abnormal, SQL injection, buffer overflow, format string, SSI, XPATH, XSS, CRLFi and LDAP injection. The detailed attack payloads are shown in fig. 5.
In this embodiment, three performance-evaluation indexes commonly used in attack-detection systems are adopted: the accuracy (Accuracy), the recall (Recall) and the F1 score (F1 Score); their calculation formulas are shown in formulas (11)-(13).
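For illustration only, these three indexes could be computed with scikit-learn as sketched below; macro averaging over the 10 classes is an assumption, since the patent only names the indexes.

```python
from sklearn.metrics import accuracy_score, recall_score, f1_score

def evaluate(y_true, y_pred):
    # Accuracy, Recall and F1 Score over the 10-class attack-detection predictions.
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
    }
```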
On the CSIC 2010 dataset and the network traffic collected from the simulated network attacks, comparison experiments were carried out against several baseline models such as CNN, LSTM and BiLSTM; the experiments evaluate the accuracy, recall and F1 score of each model. The ten-class attack-detection experimental results are shown in Table 1 and Table 2, whose experimental data are the CSIC 2010 dataset and the traffic generated by the simulated attacks, respectively.
Table 1: attack-detection results on the CSIC 2010 dataset (10 classes)
Table 2: attack-detection results on the simulated network-attack dataset (10 classes)
Model | Accuracy | F1 | Recall |
DT | 87.87% | 89.35% | 94.54% |
Linear SVM | 87.23% | 88.51% | 88.15% |
BiLSTM+CNN | 94.54% | 94.12% | 94.98% |
BiLSTM | 93.15% | 91.34% | 93.46% |
CNN | 92.87% | 93.61% | 94.31% |
LSTM+CNN | 93.43% | 93.51% | 93.63% |
LSTM | 91.71% | 92.8% | 92.96% |
LSTM+GatedCNN | 93.15% | 92.31% | 93.54% |
Transformer | 94.43% | 94.32% | 94.45% |
Gated Transformer | 96.64% | 96.51% | 97.54% |
The method of the invention is superior to the comparison models on all three indexes, which shows a clear improvement in attack-detection effectiveness. The comparison experiments show that the improved gated Transformer network model can effectively extract global and local features, can automatically extract the effective features in the text sequence, omits the manual step of replacing HTTP keyword lists, and can effectively protect the security of the Web server system.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.
Claims (7)
1. A Web attack detection method based on a gated Transformer, characterized by comprising the following steps:
Step 1: traffic collection is carried out through the sniff module of the python scapy library; a pcap traffic file is collected and the application-layer data is extracted from the pcap traffic file;
Step 2: URL decoding is carried out on the message text, and the text information of the URL, the parameter list and the user-agent, cookie and referer fields is segmented by predefined special characters;
Step 3: the mixed word-embedding module enhances the robustness of the vector representation by fusing word-embedding tables generated in two different ways; the two word-embedding tables of the mixed word-embedding module are a Word2Vec word-embedding table based on the continuous bag-of-words model CBOW and a word-embedding table based on an Embedding layer; the Embedding-layer-based word-embedding table is initialized with the Xavier_uniform distribution, and its distributed representation of the word vectors is continuously updated during training; the Word2Vec word-embedding table based on the continuous bag-of-words model CBOW must be generated before the model enters the training stage; the Embedding-layer-based table and the CBOW-based Word2Vec table map word vectors into different discrete spaces, giving the distributed vector representation of the words;
Step 4: the text information of the HTTP message is converted into a series of word vectors after being processed by the mixed word-embedding module; the series of word vectors is input into the Transformer Encoder model for global attention-feature extraction; the Transformer Encoder model comprises three parts: a position-coding module, a multi-head self-attention module and a residual and layer-normalization module; the Transformer Encoder model first adds position-coding information to the vectors processed by the mixed word-embedding module, then extracts multi-dimensional sequence features through the multi-head self-attention module, and finally feeds them into the feed-forward neural-network module;
Step 5: the output of the Transformer Encoder model is input into the gated convolution model, which extracts local features of the data within a local receptive-field range; non-key data are dynamically screened and filtered through the gated convolution model;
Step 6: the output result of the gated convolution model is classified by a final classifier module; the output is converted into a probability distribution by the softmax function, each dimension of the probability distribution corresponds to an attack category, and the attack category corresponding to the index of the maximum probability value in the probability distribution is the final attack-detection classification result.
2. The Web attack detection method based on a gated Transformer according to claim 1, characterized in that the specific method of step 1 is as follows:
Step 1.1: start the sniff network-card sniffing module of the python scapy library, collect traffic from the network and store it as a pcap file;
Step 1.2: read and parse the collected pcap file through the rdpcap module of the python scapy library, extract the application-layer data from the pcap file and perform text parsing of the data, where the text information comprises the URL (uniform resource locator), the parameter list and the user-agent, cookie and referer fields.
3. The Web attack detection method based on a gated Transformer according to claim 1, characterized in that, in the word-segmentation process of step 2, because the number of words in the input sequence that the Transformer Encoder model can process is limited, an upper limit is placed on the number of words after text segmentation; this problem is solved by setting a maximum sentence length: the words exceeding the maximum sentence length are removed, and if the number of words in a sentence is smaller than the maximum length, the sentence is filled with padding.
4. The method for detecting Web attack based on gated fransformer according to claim 1, wherein the method comprises the steps of: the specific method of the step 3 is as follows:
step 3.1: one-hot encode each input HTTP text word as X ∈ R^V;
step 3.2: multiply the one-hot code X of each word by the input weight matrix W ∈ R^(V×N); the input weight matrix W is shared by all input words; the resulting vectors are summed and averaged to obtain the hidden-layer vector H ∈ R^N;
step 3.3: multiply the hidden-layer vector by the output weight matrix W' ∈ R^(V×N) to obtain the output vector, convert the output vector into a probability distribution through the softmax activation function, and take the index of the maximum probability as the predicted centre word; in the training stage, the cross-entropy loss function is used to train and iteratively update the Word2Vec model;
step 3.4: multiply each input word by the shared input weight matrix W to obtain the word embedding vector of that word, and adopt the matrix W as the word embedding table T_a of the mixed word embedding module;
step 3.5: initialize the Embedding-layer word embedding table T_b with the Xavier_uniform distribution; the word embedding table T_a finishes training before the gated Transformer model is trained, while T_b is iteratively updated along with the training process of the gated Transformer model; the final mixed word embedding table T_f is generated by average pooling the two word embedding tables T_a and T_b, as shown in formula (1);
T_f = (T_a + T_b) / 2    (1).
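Steps 3.1-3.5 amount to averaging a pre-trained CBOW Word2Vec table with a trainable, Xavier-initialized Embedding-layer table. The sketch below uses gensim for the CBOW table (gensim's `sg=0` selects CBOW training) and PyTorch for the Embedding layer; the corpus, vector size and vocabulary handling are illustrative assumptions:

```python
# Sketch of the mixed word embedding table T_f = (T_a + T_b) / 2 (formula (1));
# corpus, sizes and vocabulary handling are illustrative assumptions.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

corpus = [["get", "index.php", "id=1", "union", "select"], ["post", "login.php"]]
d_emb = 64

# T_a: CBOW Word2Vec table, trained before the gated Transformer model (sg=0 -> CBOW)
w2v = Word2Vec(sentences=corpus, vector_size=d_emb, sg=0, min_count=1)
vocab = w2v.wv.index_to_key
t_a = torch.tensor(w2v.wv[vocab])                      # (|V|, d_emb), fixed before training

# T_b: Embedding-layer table, Xavier_uniform initialised, updated during training
embedding = nn.Embedding(len(vocab), d_emb)
nn.init.xavier_uniform_(embedding.weight)
t_b = embedding.weight

# formula (1): average pooling of the two tables gives the mixed table T_f
t_f = (t_a + t_b) / 2
```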
5. The method for detecting Web attacks based on a gated Transformer according to claim 1, wherein the specific method of step 4 is as follows:
step 4.1: before being input into the Transformer Encoder model, the HTTP data text is processed by the mixed word embedding module of step 3, and the text words are converted into the distributed numerical vector representation X_embedding;
step 4.2: since the input sequence of word vectors carries no information about the position of each word in the sequence, position encoding information is added to the text words periodically using sine and cosine functions; the position encoding generated for the word at each position is fused into the original text word, and the word vector X_embedding-pe obtained after fusing the position encoding information is generated as shown in formula (2) and formula (3);
X_embedding-pe = X_embedding + X_pos    (2)
where pos denotes the position of a word in the text, and pos ranges over the integers between 0 and the maximum sequence length; to allow the position encoding information to be added, the position encoding vector X_pos generated for a word has the same dimension as the word vector: both the word vector X_embedding produced by the mixed word embedding module and the position encoding vector X_pos have dimension d_emb; 2i and 2i+1 index the even and odd dimensions of the word vector X_embedding and the position encoding vector X_pos respectively, with i ranging over [0, d_emb/2); d_emb denotes the dimension of the word vector X_embedding;
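Formula (3), the sine/cosine position encoding, is referenced above; a minimal NumPy sketch of the standard sinusoidal encoding that matches the pos / 2i / 2i+1 / d_emb description is given below. The function name and the base constant 10000 follow the common Transformer formulation and are assumptions, not quoted from the patent:

```python
# Sketch of sinusoidal position encoding (standard Transformer formulation,
# assumed to correspond to the patent's formula (3)); d_emb assumed even.
import numpy as np

def position_encoding(max_len: int, d_emb: int) -> np.ndarray:
    """Return X_pos with shape (max_len, d_emb)."""
    pos = np.arange(max_len)[:, None]                 # word positions 0..max_len-1
    i = np.arange(d_emb // 2)[None, :]                # dimension index i
    angle = pos / np.power(10000.0, 2 * i / d_emb)    # pos / 10000^(2i/d_emb)
    x_pos = np.zeros((max_len, d_emb))
    x_pos[:, 0::2] = np.sin(angle)                    # even dimensions 2i
    x_pos[:, 1::2] = np.cos(angle)                    # odd dimensions 2i+1
    return x_pos

# formula (2): X_embedding-pe = X_embedding + X_pos
x_embedding = np.random.randn(128, 64)                # (seq_len, d_emb), illustrative
x_embedding_pe = x_embedding + position_encoding(128, 64)
```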
step 4.3: the multi-head self-attention module extracts global sequence features from multiple dimensions of the text; the dimension of its output is the same as the dimension of the input data, and every word in the text is fused with the global features; the word vector X_embedding-pe is passed through three different linear mapping matrices W_Q, W_K and W_V to generate three kinds of key information: the query information Q, the word key K and the word value V, as shown in formula (4);
when performing global attention feature extraction, the attention score is computed only from the query information Q and the key K corresponding to each word in the sentence; the attention score calculation essentially computes the correlation coefficients between words, after which the values V are weighted and summed using the attention scores between words as weights, which is the principle of the self-attention mechanism; the scaled dot product adopted for the attention score calculation is shown in formula (5);
where the denominator √d_k prevents the dot product from becoming too large, which would in turn make the values after the softmax function too extreme; the subscript k denotes the dimension of the Q, K, V matrices;
the multi-head self-attention mechanism performs the self-attention calculation on the word vector X_embedding-pe from different subspaces of multiple dimensions; when the self-attention calculation is to be performed from h subspaces of different dimensions, the linear mapping matrices are each split into h blocks, and the h blocks of linear mapping matrices correspond to the self-attention calculations of the h different subspaces, where s denotes the attention of one subspace, s ∈ [1, h];
the output of the multi-head self-attention module is obtained by extracting global attention features of the HTTP text words from the h subspaces of different dimensions and concatenating the self-attention outputs head_s of the h different heads; the calculation of the multi-head self-attention output X_multihead is shown in formula (6), where s denotes the attention mechanism of a given subspace;
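Formulas (4)-(6) are referenced above; the PyTorch sketch below of scaled dot-product attention and head concatenation assumes that the patent's formulas follow the usual Transformer definitions, and the matrix names and sizes are illustrative:

```python
# Sketch of scaled dot-product attention and multi-head concatenation
# (assumed to correspond to formulas (4)-(6)); sizes are illustrative.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_emb); w_q/w_k/w_v: (d_emb, d_k)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # formula (4): Q, K, V projections
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5    # scaled dot product, formula (5)
    return F.softmax(scores, dim=-1) @ v             # weighted sum of V

def multi_head(x, heads):
    """heads: list of (w_q, w_k, w_v) triples, one per subspace s in [1, h]."""
    outs = [self_attention(x, *h) for h in heads]    # head_s for each subspace
    return torch.cat(outs, dim=-1)                   # feature concatenation, formula (6)

# Usage: h = 4 heads, each projecting d_emb = 64 down to d_k = 16
x = torch.randn(128, 64)
heads = [tuple(torch.randn(64, 16) for _ in range(3)) for _ in range(4)]
x_multihead = multi_head(x, heads)                   # (128, 64)
```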
step 4.4: residual connection and layer normalization module;
the residual connection adds the position-encoded word embedding vector X_embedding-pe, taken before it enters the multi-head self-attention module, to the output of the multi-head self-attention module; the layer normalization (LayerNorm) method is used to standardize the data output by the multi-head self-attention module; the application of the residual connection and layer normalization module to the multi-head attention output X_multihead is given by formula (7), and the result X_multihead-rn obtained after processing the multi-head self-attention output X_multihead through the residual connection and layer normalization is the final output of the residual connection and layer normalization module;
X_multihead-rn = LayerNorm(X_embedding-pe + X_multihead)    (7)
step 4.5: the output X_multihead-rn of the residual connection and layer normalization module is further processed by a fully connected neural network to extract richer semantic information; the final output of the Transformer Encoder model is X_encoded, and the calculation formula of this neural network is shown in formula (8);
wherein Relu is a nonlinear activation function.
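Formulas (7) and (8) describe the residual connection, layer normalization and feed-forward sub-layer; a compact PyTorch sketch follows, assuming the common LayerNorm(x + Sublayer(x)) arrangement and a two-layer Relu feed-forward network, with the hidden size d_ff as an illustrative assumption:

```python
# Sketch of the residual connection + LayerNorm (formula (7)) and the
# feed-forward network producing X_encoded (formula (8)); sizes are illustrative.
import torch
import torch.nn as nn

d_emb, d_ff, seq_len = 64, 256, 128
layer_norm = nn.LayerNorm(d_emb)
feed_forward = nn.Sequential(nn.Linear(d_emb, d_ff), nn.ReLU(), nn.Linear(d_ff, d_emb))

x_embedding_pe = torch.randn(seq_len, d_emb)      # input to the attention module
x_multihead = torch.randn(seq_len, d_emb)         # output of multi-head attention

# formula (7): residual connection followed by layer normalization
x_multihead_rn = layer_norm(x_embedding_pe + x_multihead)

# formula (8): fully connected network with Relu, output of the Transformer Encoder
x_encoded = feed_forward(x_multihead_rn)
```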
6. The method for detecting Web attacks based on a gated Transformer according to claim 5, wherein the specific method of step 5 is as follows:
step 5.1: the output X_encoded obtained after global sequence feature extraction by the Transformer Encoder model is input into the gated convolution module for information filtering and screening; the gated convolution module contains c one-dimensional convolution kernels Kernel_j (j ∈ [1, c]) of different scales, and the calculation formula of a single convolution kernel is shown in formula (9);
g_j = Relu(Conv(Kernel_j, X_encoded) + b_j)    (9)
where g_j is the output of a single convolution block; Relu is a nonlinear activation function; Conv denotes the convolution operation; and b_j is the bias corresponding to the convolution kernel Kernel_j;
step 5.2: the outputs of the convolutions of different scales are concatenated, and the multi-scale gated convolution values are mapped into the range 0 to 1 by the Sigmoid activation function to give the gating value Gatesv; the calculation of the gating value Gatesv is given by formula (10), and Gatesv takes values between 0 and 1; a gating value close to 0 indicates information that is hardly important and is filtered out, while a gating value close to 1 indicates that the corresponding data is key information and is retained in full;
the output X_encoded of the Transformer Encoder model is multiplied element-wise with the gating value Gatesv, which completes the filtering and screening of the encoded information X_encoded and yields the output X_gated of the gated convolution module; the information filtering method is shown in formula (11);
X_gated = X_encoded ⊙ Gatesv,  Gatesv ∈ [0, 1]    (11)
where the symbol ⊙ denotes element-wise multiplication.
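Formulas (9)-(11) describe the gating computation; a minimal PyTorch sketch is given below. It assumes c = 3 one-dimensional convolution kernels of sizes 3, 5 and 7, and a linear projection back to d_emb before the Sigmoid; that projection is an illustrative assumption, since the patent only states that the multi-scale outputs are concatenated and mapped through a Sigmoid:

```python
# Sketch of the gated convolution module (formulas (9)-(11)); kernel sizes,
# channel counts and the projection back to d_emb are illustrative assumptions.
import torch
import torch.nn as nn

d_emb, seq_len = 64, 128
kernel_sizes = [3, 5, 7]                                       # c = 3 scales
convs = nn.ModuleList(
    [nn.Conv1d(d_emb, d_emb, k, padding=k // 2) for k in kernel_sizes])
project = nn.Linear(len(kernel_sizes) * d_emb, d_emb)          # map concat back to d_emb

x_encoded = torch.randn(1, seq_len, d_emb)                     # (batch, seq_len, d_emb)
x = x_encoded.transpose(1, 2)                                  # Conv1d expects (batch, d_emb, seq_len)

# formula (9): g_j = Relu(Conv(Kernel_j, X_encoded) + b_j); bias b_j is inside Conv1d
g = [torch.relu(conv(x)) for conv in convs]

# formula (10): concatenate the multi-scale outputs and squash with Sigmoid
gatesv = torch.sigmoid(project(torch.cat(g, dim=1).transpose(1, 2)))

# formula (11): element-wise filtering of the encoded features
x_gated = x_encoded * gatesv                                   # X_gated = X_encoded ⊙ Gatesv
```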
7. The method for detecting Web attacks based on a gated Transformer according to claim 6, wherein the specific method of step 6 is as follows:
the output X_gated of the gated convolution module is input into a Classifier consisting of two fully connected layers; the output of the Classifier network is converted into a probability distribution by the softmax function, each dimension of the probability distribution corresponds to an attack category, and the attack category corresponding to the index of the maximum probability in the distribution is the final attack detection classification result X_pred; the attack detection classification process is shown in formula (12);
X_pred = argmax(Softmax(Classifier(X_gated)))    (12).
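A minimal PyTorch sketch of the two-layer classifier and the argmax step of formula (12) is given below; the hidden size, the pooling over the sequence dimension and the number of attack categories are illustrative assumptions:

```python
# Sketch of the final classifier (formula (12)); hidden size, pooling and
# the number of attack classes are illustrative assumptions.
import torch
import torch.nn as nn

d_emb, n_classes = 64, 5                          # number of attack categories (assumed)
classifier = nn.Sequential(nn.Linear(d_emb, 128), nn.ReLU(), nn.Linear(128, n_classes))

x_gated = torch.randn(1, 128, d_emb)              # output of the gated convolution module
pooled = x_gated.mean(dim=1)                      # collapse the sequence dimension (assumed)
probs = torch.softmax(classifier(pooled), dim=-1) # probability distribution over categories
x_pred = probs.argmax(dim=-1)                     # index of maximum probability = predicted class
```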
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310460958.8A CN116527357A (en) | 2023-04-26 | 2023-04-26 | Web attack detection method based on gate control converter |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116527357A true CN116527357A (en) | 2023-08-01 |
Family
ID=87389673
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310460958.8A Pending CN116527357A (en) | 2023-04-26 | 2023-04-26 | Web attack detection method based on gate control converter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116527357A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116913459A (en) * | 2023-09-12 | 2023-10-20 | 神州医疗科技股份有限公司 | Medicine recommendation method and system based on deep convolution network control gate model |
CN116913459B (en) * | 2023-09-12 | 2023-12-15 | 神州医疗科技股份有限公司 | Medicine recommendation method and system based on deep convolution network control gate model |
CN116992888A (en) * | 2023-09-25 | 2023-11-03 | 天津华来科技股份有限公司 | Data analysis method and system based on natural semantics |
CN117236323A (en) * | 2023-10-09 | 2023-12-15 | 青岛中企英才集团商业管理有限公司 | Information processing method and system based on big data |
CN117236323B (en) * | 2023-10-09 | 2024-03-29 | 京闽数科(北京)有限公司 | Information processing method and system based on big data |
CN118101349A (en) * | 2024-04-26 | 2024-05-28 | 西安交通大学城市学院 | Network security visual monitoring method based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543084B (en) | Method for establishing detection model of hidden sensitive text facing network social media | |
CN116527357A (en) | Web attack detection method based on gate control converter | |
CN110162785B (en) | Data processing method and pronoun digestion neural network training method | |
CN113596007B (en) | Vulnerability attack detection method and device based on deep learning | |
CN111371806A (en) | Web attack detection method and device | |
CN113569001A (en) | Text processing method and device, computer equipment and computer readable storage medium | |
CN111984791A (en) | Long text classification method based on attention mechanism | |
Chen et al. | Malicious URL detection based on improved multilayer recurrent convolutional neural network model | |
CN116722992A (en) | Fraud website identification method and device based on multi-mode fusion | |
CN112182275A (en) | Trademark approximate retrieval system and method based on multi-dimensional feature fusion | |
CN114169447B (en) | Event detection method based on self-attention convolution bidirectional gating cyclic unit network | |
CN114881172A (en) | Software vulnerability automatic classification method based on weighted word vector and neural network | |
CN117729003A (en) | Threat information credibility analysis system and method based on machine learning | |
CN115481313A (en) | News recommendation method based on text semantic mining | |
CN115795037B (en) | Multi-label text classification method based on label perception | |
CN111782811A (en) | E-government affair sensitive text detection method based on convolutional neural network and support vector machine | |
CN116628594A (en) | User information management system and method based on AI intelligence | |
Müller-Budack et al. | Finding person relations in image data of news collections in the internet archive | |
CN114416925B (en) | Sensitive word recognition method, device, equipment, storage medium and program product | |
Yang et al. | Asymmetric deep semantic quantization for image retrieval | |
CN111984800B (en) | Hash cross-modal information retrieval method based on dictionary pair learning | |
CN117077680A (en) | Question and answer intention recognition method and device | |
CN114595324A (en) | Method, device, terminal and non-transitory storage medium for power grid service data domain division | |
CN114048749A (en) | Chinese named entity recognition method suitable for multiple fields | |
CN113822018A (en) | Entity relation joint extraction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||