CN109981631A - An XSS attack detection method based on deep learning - Google Patents
- Publication number
- CN109981631A (application number CN201910210329.3A)
- Authority
- CN
- China
- Prior art keywords
- xss
- deep learning
- data
- word
- sample data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computer Hardware Design (AREA)
- Signal Processing (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an XSS attack detection method based on deep learning. The main steps of the method are: S1. build a simulated network environment and collect sample data of XSS attacks; S2. preprocess the collected sample data, then tokenize it and form embedded word vectors from the tokenization result; S3. input the embedded word vectors into a deep learning model to obtain an XSS detection model; S4. perform real-time detection on traffic data and store the detection results in a database.
Description
Technical field
The present invention relates to the technical field of information security, and in particular to an XSS attack detection method based on deep learning.
Background technique
Deep learning has made great progress in fields such as computer vision, natural language processing and artificial intelligence, and it is also beginning to stand out in the security field, where it has moved into practical application.
In the Internet+ era, the importance of Web security is increasingly prominent. XSS (cross-site scripting) is a typical attack method, and how to detect XSS attacks effectively has become a problem in the field.
Summary of the invention
The main object of the present invention is to propose an XSS attack detection method based on deep learning, aiming to solve the problem of how to detect XSS attack events automatically and efficiently.
To achieve the above object, the present invention provides an XSS attack detection method based on deep learning, whose main steps are:
S1. build a simulated network environment and collect sample data of XSS attacks;
S2. preprocess the collected sample data, then tokenize it and form embedded word vectors from the tokenization result;
S3. input the embedded word vectors into a deep learning model to obtain an XSS detection model;
S4. perform real-time detection on traffic data and store the detection results in a database.
Preferably, step S1 further includes: for samples crawled from public Internet datasets, removing redundant information in the URL and retaining the payload part.
Preferably, in step S2, numbers and hyperlinks in the sample data are generalized during preprocessing: numbers are replaced with "0" and hyperlinks are replaced with "http://u"; the data is URL-encoded and then stored in a csv file.
Preferably, in step S2, the XSS semantic model is built with the word2vec class of the gensim module, and the word space has 128 dimensions.
Preferably, the deep learning model in step S3 uses any one of three algorithms: multi-layer perceptron (MLP), recurrent neural network, or convolutional neural network.
The XSS attack detection method based on deep learning proposed by the present invention preprocesses XSS traffic, tokenizes it, forms embedded word vectors from the tokenization result, and inputs the embedded word vectors into a deep learning model to obtain an XSS detection model. Traffic data is then detected in real time and the detection results are stored in a database, so that XSS attack behavior can be detected effectively and the network security of users is improved.
Detailed description of the invention
Fig. 1 is a flow chart of the method of the present invention.
Specific embodiment
The main steps of the XSS attack detection method based on deep learning provided by the present invention are:
S1. build a simulated network environment and collect sample data of XSS attacks;
Because public datasets in the security field are very scarce, the experimental data provided herein consists of two parts. First, samples crawled from a well-known open source are used as positive samples, more than 50,000 in total. Second, 300,000 normal HTTP GET request records are collected as negative samples. Information such as the host and path in each URL is removed and only the payload part is retained. The above data is URL-encoded and stored in csv; because part of the original data had already been URL-encoded, the stored data must be URL-decoded twice before it can be used.
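The payload extraction and double decoding described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the code of the disclosure: the function names `extract_payload` and `decode_twice` and the example URL are hypothetical, and the payload is assumed to be the query string of the request.

```python
from urllib.parse import quote, unquote, urlsplit


def extract_payload(url: str) -> str:
    """Drop scheme, host and path; keep only the query string (assumed payload)."""
    return urlsplit(url).query


def decode_twice(stored: str) -> str:
    """Undo the encoding applied before storage, then the encoding that
    was already present in the raw record."""
    return unquote(unquote(stored))


# Hypothetical raw record whose payload is already URL-encoded once.
raw = "http://example.com/search.php?q=%3Cscript%3Ealert(1)%3C/script%3E"
payload = extract_payload(raw)
stored = quote(payload, safe="")  # encoded again before being written to csv
restored = decode_twice(stored)
print(restored)
```

Note that decoding twice is only lossless for records that really were encoded twice; a record that was encoded once and contains a literal `%xx` sequence would be over-decoded.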
S2. preprocess the collected sample data, then tokenize it and form embedded word vectors from the tokenization result;
The tokenization rules supported in this application are: content enclosed in single or double quotes ('xss'), http/https links, <> tags (<script>), <> tag openings (<h1), parameter names (topic=), function bodies (alert(), and words made up of letters and digits. During preprocessing, numbers and hyperlinks in the sample data are generalized: numbers are replaced with "0" and hyperlinks are replaced with "http://u"; the data is URL-encoded and then stored in a csv file. The XSS semantic model is built with the word2vec class of the gensim module, and the word space has 128 dimensions.
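One possible reading of the generalization and tokenization rules is sketched below with a single regular expression. The exact patterns are assumptions, since the disclosure lists the token categories but not the expressions; in particular, this sketch replaces every run of digits, so a tag opening such as `<h1` would become `<h0`.

```python
import re

# One alternative per token category named in the text; order matters.
TOKEN_RE = re.compile(
    r"""
      "[^"]*"                # content inside double quotes
    | '[^']*'                # content inside single quotes
    | https?://[^\s"'<>]+    # http/https links
    | </?\w+>                # complete tags such as <script> or </script>
    | <\w+                   # tag openings such as <h1
    | \w+=                   # parameter names such as topic=
    | \w+\(                  # function bodies such as alert(
    | \w+                    # words made of letters and digits
    """,
    re.VERBOSE,
)


def generalize(payload: str) -> str:
    """Replace hyperlinks with 'http://u' and digit runs with '0'."""
    payload = re.sub(r"https?://[^\s\"'<>]+", "http://u", payload)
    return re.sub(r"\d+", "0", payload)


def tokenize(payload: str) -> list:
    """Return tokens in order of appearance; unmatched characters are skipped."""
    return TOKEN_RE.findall(payload)


sample = "topic=<script>alert('xss')</script>&id=42&u=http://evil.example/a.js"
tokens = tokenize(generalize(sample))
print(tokens)
```

Characters that match no rule, such as `)` and `&`, are simply dropped, which keeps the vocabulary small.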
To turn the tokenized text into a machine learning problem, the first step is to find a way to represent the words numerically. The most common method is one-hot encoding, which represents each word as a very long vector in which exactly one dimension has the value 1 and all others are 0; for example, "<script>" is represented as [0,0,0,1,0,0,0,0,...]. This method has an important problem: the vectors that make up the text are extremely sparse, and words are mutually independent, so a machine learning model cannot understand the semantics of the words. Embedded word vectors instead are learned from text and use a vector to characterize the semantic information of each word, so that semantically similar words are close together in the embedding space. Such a space can express synonyms such as "microphone" and "mic", and words such as "cat", "dog" and "fish" also cluster together in it.
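The contrast between one-hot vectors and embedded word vectors can be made concrete with a small sketch. The five-word vocabulary and the two-dimensional dense vectors below are hand-picked for illustration and are not learned from data.

```python
import math

vocab = ["<script>", "alert(", "<h1", "topic=", "http://u"]


def one_hot(token, vocab):
    """Vector with a single 1 at the token's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(token)] = 1
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Any two distinct one-hot vectors are orthogonal: no notion of similarity.
sim_one_hot = cosine(one_hot("<script>", vocab), one_hot("alert(", vocab))

# Toy dense vectors (assumed values): script-related tokens point the same way.
dense = {"<script>": [0.9, 0.1], "alert(": [0.8, 0.3], "topic=": [0.1, 0.9]}
sim_related = cosine(dense["<script>"], dense["alert("])
sim_unrelated = cosine(dense["<script>"], dense["topic="])
print(sim_one_hot, sim_related > sim_unrelated)
```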
Here an embedded word vector model is used to build an XSS semantic model, so that the machine can understand HTML-language constructs such as <script> and alert(). The 3,000 most frequent words in the positive samples form the vocabulary, and all other words are labeled "UKN". The model is built with the word2vec class of the gensim module, and the word space has 128 dimensions.
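The vocabulary step can be sketched as follows. The helper names are hypothetical. The gensim call is kept inside a function so that the vocabulary helpers run without gensim installed; `min_count=1` is an assumption, and the keyword for the dimension is `vector_size` in gensim 4.x (earlier 3.x releases called it `size`).

```python
from collections import Counter

VOCAB_SIZE = 3000   # number of most frequent words kept, as stated in the text
UNKNOWN = "UKN"     # label used for all other words
EMBED_DIM = 128     # word space dimension stated in the text


def build_vocab(tokenized_positive, size=VOCAB_SIZE):
    """Return the `size` most frequent tokens across the positive samples."""
    counts = Counter(tok for sample in tokenized_positive for tok in sample)
    return {tok for tok, _ in counts.most_common(size)}


def replace_rare(sample, vocab):
    """Map every out-of-vocabulary token to the UNKNOWN label."""
    return [tok if tok in vocab else UNKNOWN for tok in sample]


def train_embeddings(sentences, dim=EMBED_DIM):
    """Train word2vec on token lists (requires gensim 4.x)."""
    from gensim.models import Word2Vec  # lazy import, optional dependency
    return Word2Vec(sentences=sentences, vector_size=dim, min_count=1)


# Tiny hypothetical corpus, with a vocabulary of 2 instead of 3000.
samples = [["<script>", "alert(", "</script>"],
           ["<script>", "alert("],
           ["<script>", "topic="]]
vocab = build_vocab(samples, size=2)
cleaned = [replace_rare(s, vocab) for s in samples]
print(cleaned)
```

After training, `model.wv["<script>"]` would return the 128-dimensional vector for that token.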
S3. input the embedded word vectors into a deep learning model to obtain an XSS detection model;
In this application, the deep learning model uses any one of three algorithms: multi-layer perceptron (MLP), recurrent neural network, or convolutional neural network.
In this application, the three algorithms were compared experimentally. A multi-layer perceptron (MLP) comprises an input layer, an output layer and several hidden layers. With Keras using TensorFlow as the backend, a multi-layer perceptron is easy to implement; the final accuracy of the algorithm is 99.9% and the recall is 97.5%. A recurrent neural network is a time-recursive neural network that can understand contextual knowledge in a sequence; the network is likewise built with Keras, and the final model has an accuracy of 99.5% and a recall of 98.7%. Compared with an MLP network, a convolutional neural network (CNN) reduces the number of parameters to be trained and the amount of computation, and it can also extract and analyze deep features. Here a one-dimensional convolutional neural network similar to the VGG network is used, comprising four convolutional layers, two max pooling layers and one fully connected layer; its final accuracy is 99.5% and its recall is 98.3%.
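A one-dimensional CNN with the stated layer counts could be laid out as in the sketch below. Only the counts (four convolutional layers, two max pooling layers, one fully connected layer) come from the text; the filter sizes, kernel width, sequence length, optimizer and loss are assumptions, and the input is assumed to be a sequence of 128-dimensional word vectors. TensorFlow is imported lazily inside the builder.

```python
# (layer kind, argument): conv -> filters, pool -> pool size, dense -> units.
# Filter counts are assumed; only the number of layers comes from the text.
VGG_LIKE_PLAN = [
    ("conv", 64), ("conv", 64), ("pool", 2),
    ("conv", 128), ("conv", 128), ("pool", 2),
    ("dense", 1),
]


def build_cnn(seq_len=200, embed_dim=128, plan=VGG_LIKE_PLAN):
    """Build a binary classifier over sequences of word vectors."""
    from tensorflow import keras  # lazy import, optional dependency

    layers = [keras.Input(shape=(seq_len, embed_dim))]
    for kind, arg in plan:
        if kind == "conv":
            layers.append(keras.layers.Conv1D(
                arg, 3, padding="same", activation="relu"))
        elif kind == "pool":
            layers.append(keras.layers.MaxPooling1D(arg))
        elif kind == "dense":
            # Flatten the feature maps before the single fully connected layer.
            layers.append(keras.layers.Flatten())
            layers.append(keras.layers.Dense(arg, activation="sigmoid"))
    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", keras.metrics.Recall()])
    return model


kinds = [kind for kind, _ in VGG_LIKE_PLAN]
print(kinds.count("conv"), kinds.count("pool"), kinds.count("dense"))
```

Tracking both accuracy and recall during training mirrors the two figures reported for each model above.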
S4. perform real-time detection on traffic data and store the detection results in a database.
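Step S4 can be illustrated with a minimal sketch that scores incoming payloads and writes the results to a database. The disclosure does not name a database or a schema, so SQLite, the `detections` table and the 0.5 threshold are assumptions, and `looks_like_xss` is a hypothetical stand-in for the trained detection model.

```python
import sqlite3
from datetime import datetime, timezone


def looks_like_xss(payload: str) -> float:
    """Hypothetical stand-in for the trained model; returns a score in [0, 1]."""
    return 1.0 if "<script" in payload.lower() else 0.0


def open_store(path=":memory:"):
    """Open the result database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detections ("
        "ts TEXT, payload TEXT, score REAL, is_xss INTEGER)"
    )
    return conn


def detect_and_store(conn, payloads, score_fn=looks_like_xss, threshold=0.5):
    """Score each payload and persist the verdict with a UTC timestamp."""
    for payload in payloads:
        score = score_fn(payload)
        conn.execute(
            "INSERT INTO detections VALUES (?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), payload, score,
             int(score >= threshold)),
        )
    conn.commit()


conn = open_store()
detect_and_store(conn, ["q=hello", "q=<script>alert(0)</script>"])
flagged = conn.execute(
    "SELECT payload FROM detections WHERE is_xss = 1").fetchall()
total = conn.execute("SELECT COUNT(*) FROM detections").fetchone()[0]
print(flagged, total)
```

In a real deployment `score_fn` would apply the same preprocessing, tokenization and embedding as in training before calling the model.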
The embodiments of the present invention have been described above with reference to the accompanying drawing, but the invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims, and all of these fall within the protection of the present invention.
Claims (5)
1. An XSS attack detection method based on deep learning, the main steps of which are:
S1. build a simulated network environment and collect sample data of XSS attacks;
S2. preprocess the collected sample data, then tokenize it and form embedded word vectors from the tokenization result;
S3. input the embedded word vectors into a deep learning model to obtain an XSS detection model;
S4. perform real-time detection on traffic data and store the detection results in a database.
2. The method of claim 1, characterized in that: in step S1, for samples crawled from public Internet datasets, redundant information in the URL is removed and the payload part is retained.
3. The method of claim 1, characterized in that: in step S2, numbers and hyperlinks in the sample data are generalized during preprocessing, numbers being replaced with "0" and hyperlinks being replaced with "http://u", and the data is URL-encoded and then stored in a csv file.
4. The method of claim 1, characterized in that: in step S2, the XSS semantic model is built with the word2vec class of the gensim module, and the word space has 128 dimensions.
5. The method of claim 1, characterized in that: the deep learning model in step S3 uses any one of three algorithms: MLP, recurrent neural network, or convolutional neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910210329.3A CN109981631A (en) | 2019-03-20 | 2019-03-20 | A kind of XSS attack detection method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910210329.3A CN109981631A (en) | 2019-03-20 | 2019-03-20 | A kind of XSS attack detection method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109981631A (en) | 2019-07-05 |
Family
ID=67079611
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910210329.3A Pending CN109981631A (en) | 2019-03-20 | 2019-03-20 | A kind of XSS attack detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109981631A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111818080A (en) * | 2020-07-22 | 2020-10-23 | 中国工商银行股份有限公司 | Injection attack detection model construction method and device |
US10817665B1 (en) * | 2020-05-08 | 2020-10-27 | Coupang Corp. | Systems and methods for word segmentation based on a competing neural character language model |
CN113536678A (en) * | 2021-07-19 | 2021-10-22 | 中国人民解放军国防科技大学 | XSS risk analysis method and device based on Bayesian network and STRIDE model |
CN113596007A (en) * | 2021-07-22 | 2021-11-02 | 广东电网有限责任公司 | Vulnerability attack detection method and device based on deep learning |
WO2021258479A1 (en) * | 2020-06-22 | 2021-12-30 | 网宿科技股份有限公司 | Graph neural network-based method, system, and apparatus for detecting network attack |
-
2019
- 2019-03-20 CN CN201910210329.3A patent/CN109981631A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10817665B1 (en) * | 2020-05-08 | 2020-10-27 | Coupang Corp. | Systems and methods for word segmentation based on a competing neural character language model |
US11113468B1 (en) * | 2020-05-08 | 2021-09-07 | Coupang Corp. | Systems and methods for word segmentation based on a competing neural character language model |
WO2021258479A1 (en) * | 2020-06-22 | 2021-12-30 | 网宿科技股份有限公司 | Graph neural network-based method, system, and apparatus for detecting network attack |
CN111818080A (en) * | 2020-07-22 | 2020-10-23 | 中国工商银行股份有限公司 | Injection attack detection model construction method and device |
CN113536678A (en) * | 2021-07-19 | 2021-10-22 | 中国人民解放军国防科技大学 | XSS risk analysis method and device based on Bayesian network and STRIDE model |
CN113536678B (en) * | 2021-07-19 | 2022-04-19 | 中国人民解放军国防科技大学 | XSS risk analysis method and device based on Bayesian network and STRIDE model |
CN113596007A (en) * | 2021-07-22 | 2021-11-02 | 广东电网有限责任公司 | Vulnerability attack detection method and device based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109981631A (en) | A kind of XSS attack detection method based on deep learning | |
CN112487203B (en) | Relation extraction system integrated with dynamic word vector | |
Sun et al. | A general framework for content-enhanced network representation learning | |
CN111259987B (en) | Method for extracting event main body by multi-model fusion based on BERT | |
CN110032648A (en) | A kind of case history structuring analytic method based on medical domain entity | |
CN110598000A (en) | Relationship extraction and knowledge graph construction method based on deep learning model | |
CN112084331A (en) | Text processing method, text processing device, model training method, model training device, computer equipment and storage medium | |
Yang et al. | Personalized response generation by dual-learning based domain adaptation | |
CN109413028A (en) | SQL injection detection method based on convolutional neural networks algorithm | |
CN109766693A (en) | A kind of cross-site scripting attack detection method based on deep learning | |
CN110502626A (en) | A kind of aspect grade sentiment analysis method based on convolutional neural networks | |
CN106611055A (en) | Chinese hedge scope detection method based on stacked neural network | |
CN109558492A (en) | A kind of listed company's knowledge mapping construction method and device suitable for event attribution | |
CN112183747A (en) | Neural network training method, neural network compression method and related equipment | |
CN109344404A (en) | The dual attention natural language inference method of context aware | |
CN113254652B (en) | Social media posting authenticity detection method based on hypergraph attention network | |
CN116204674B (en) | Image description method based on visual concept word association structural modeling | |
Zhou et al. | ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge | |
CN113657105A (en) | Medical entity extraction method, device, equipment and medium based on vocabulary enhancement | |
CN114490954B (en) | Document level generation type event extraction method based on task adjustment | |
CN116628186A (en) | Text abstract generation method and system | |
CN111241843B (en) | Semantic relation inference system and method based on composite neural network | |
CN112149423B (en) | Corpus labeling method and system for domain entity relation joint extraction | |
CN117235261A (en) | Multi-modal aspect-level emotion analysis method, device, equipment and storage medium | |
CN116956925A (en) | Electronic medical record named entity identification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190705 |