CN109981631A - An XSS attack detection method based on deep learning

Info

Publication number: CN109981631A
Application number: CN201910210329.3A
Authority: CN
Inventors: 孙波; 李应博; 张伟; 司成祥; 张建松; 李胜男; 毛蔚轩; 盖伟麟; 侯美佳; 董建武; 张泽亚; 刘云昊; 亓培锋
Original assignee: National Computer Network and Information Security Management Center
Current assignee: National Computer Network and Information Security Management Center
Priority date: 2019-03-20
Filing date: 2019-03-20
Publication date: 2019-07-05

Abstract

The present invention provides an XSS attack detection method based on deep learning. The main steps of the method include: S1. building a network simulation environment and collecting sample data of XSS attacks; S2. preprocessing the collected sample data, then tokenizing it and forming word embedding vectors from the tokenization result; S3. inputting the word embedding vectors into a deep learning model to obtain an XSS detection model; S4. detecting traffic data in real time and storing the detection results in a database.

Description

An XSS attack detection method based on deep learning

Technical field

The present invention relates to the technical field of information security, and in particular to an XSS attack detection method based on deep learning.

Background art

Deep learning has made great progress in fields such as computer vision, natural language processing, and artificial intelligence. It has also begun to stand out in the security field, where it is moving toward practical application.

In the Internet+ era, the importance of Web security has become even more prominent. XSS (cross-site scripting) is a typical attack technique, and how to detect XSS attacks effectively has become a problem in the field.

Summary of the invention

The main object of the present invention is to propose an XSS attack detection method based on deep learning, aiming to solve the problem of how to detect XSS attack events automatically and efficiently.

To achieve the above object, the present invention provides an XSS attack detection method based on deep learning, the main steps of which include:

S1. building a network simulation environment and collecting sample data of XSS attacks;

S2. preprocessing the collected sample data, then tokenizing it and forming word embedding vectors from the tokenization result;

S3. inputting the word embedding vectors into a deep learning model to obtain an XSS detection model;

S4. detecting traffic data in real time and storing the detection results in a database.

Preferably, step S1 further includes: for samples crawled from public Internet datasets, removing the redundant information in the URL and retaining the payload part.

Preferably, in step S2, numbers and hyperlinks in the sample data are generalized during preprocessing: numbers are replaced with "0" and hyperlinks with "http://u", and the data are URL-encoded and then stored in a CSV file.

Preferably, in step S2, an XSS semantic model is built with the word2vec class of the gensim module, with a word space of 128 dimensions.

Preferably, the deep learning model in step S3 uses any one of three algorithms: MLP, recurrent neural network, or convolutional neural network.

The XSS attack detection method based on deep learning proposed by the present invention preprocesses XSS traffic, tokenizes it, and forms word embedding vectors from the tokenization result. The word embedding vectors are then input into a deep learning model to obtain an XSS detection model. Traffic data are subsequently detected in real time and the detection results are stored in a database, so that XSS attack behavior can be detected effectively and the network security of users is improved.
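Step S4 (real-time detection with results written to a database) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the patent does not specify a database engine, a schema, or a scoring interface. The sketch uses an in-memory SQLite database, a hypothetical `detections` table, and a hypothetical `classify` placeholder that stands in for the trained XSS detection model.

```python
import sqlite3
from datetime import datetime, timezone


def classify(payload: str) -> float:
    """Hypothetical stand-in for the trained model: returns a score in [0, 1]."""
    # A real deployment would call the trained detection model here.
    return 1.0 if "<script" in payload.lower() else 0.0


def store_result(conn, payload: str, score: float, threshold: float = 0.5) -> None:
    """Write one detection result to the (hypothetical) detections table."""
    conn.execute(
        "INSERT INTO detections (seen_at, payload, score, is_xss) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), payload, score, int(score >= threshold)),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE detections (seen_at TEXT, payload TEXT, score REAL, is_xss INTEGER)"
)

# Simulated stream of request payloads taken from traffic data.
for payload in ["q=<script>alert(0)</script>", "topic=weather"]:
    store_result(conn, payload, classify(payload))
conn.commit()

rows = conn.execute("SELECT payload, is_xss FROM detections ORDER BY rowid").fetchall()
print(rows)
```

Keeping the score alongside the binary decision is a design choice of this sketch; it allows the threshold to be revisited later without re-running detection.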

Brief description of the drawings

Fig. 1 is a flowchart of the method of the present invention.

Specific embodiment

The XSS attack detection method based on deep learning provided by the present invention comprises the main steps S1 to S4 set out above; an embodiment is described in detail below.

Because public datasets in the security field are very scarce, the experimental data provided herein consist of two parts. First, samples crawled from a well-known open source are used as positive samples, more than 50,000 in total. In addition, 300,000 normal HTTP GET request records are collected as negative samples; information such as the host and path in the URL is removed, and only the payload part is retained. The above data are URL-encoded and then stored in a CSV file; because part of the raw data had already been URL-encoded, the data must be URL-decoded twice before use.
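The payload extraction and double URL decoding described above can be sketched as follows. This is a minimal illustration using Python's standard `urllib.parse`; it assumes that the "payload" is the query string of the request URL, which the patent does not state explicitly, and the function names and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, unquote


def extract_payload(url: str) -> str:
    """Drop scheme, host and path; keep only the query string (assumed payload)."""
    parts = urlsplit(url)
    # Fall back to the whole string when the record is already a bare payload.
    return parts.query if parts.query else url


def decode_twice(text: str) -> str:
    """Undo two layers of URL encoding."""
    return unquote(unquote(text))


# Hypothetical record whose payload was URL-encoded twice.
raw = "http://example.com/search.php?q=%253Cscript%253Ealert(1)%253C%252Fscript%253E"
payload = decode_twice(extract_payload(raw))
print(payload)  # q=<script>alert(1)</script>
```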

The tokenization rules supported in the present application cover: content enclosed in single or double quotes, such as 'xss'; http/https links; <> tags, such as <script>; <> tag openings, such as <h1; parameter names, such as topic=; function bodies, such as alert(; and words made up of letters and digits. During preprocessing, numbers and hyperlinks in the sample data are generalized: numbers are replaced with "0" and hyperlinks with "http://u", and the data are URL-encoded and then stored in a CSV file. An XSS semantic model is built with the word2vec class of the gensim module, with a word space of 128 dimensions.
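The generalization and tokenization rules listed above can be approximated with regular expressions. The sketch below is one possible reading, not the patent's actual implementation: the exact patterns, their priority order, and the decision to replace only standalone numbers (so that a tag such as <h1 is left intact) are assumptions.

```python
import re

LINK = r"https?://[^\s<>\"']+"

# One pattern per rule, tried in this order at each position.
TOKEN_RULES = [
    r'"[^"]*"',   # content enclosed in double quotes
    r"'[^']*'",   # content enclosed in single quotes, e.g. 'xss'
    LINK,         # http/https links
    r"</?\w+>",   # complete tags, e.g. <script>
    r"<\w+",      # tag openings, e.g. <h1
    r"\w+=",      # parameter names, e.g. topic=
    r"\w+\(",     # function bodies, e.g. alert(
    r"\w+",       # words made up of letters and digits
]
TOKEN_PATTERN = re.compile("|".join(TOKEN_RULES), re.IGNORECASE)


def generalize(text: str) -> str:
    """Replace hyperlinks with "http://u" and standalone numbers with "0"."""
    text = re.sub(LINK, "http://u", text, flags=re.IGNORECASE)
    return re.sub(r"\b\d+\b", "0", text)


def tokenize(text: str) -> list:
    """Generalize a payload, then split it into tokens using the rules above."""
    return TOKEN_PATTERN.findall(generalize(text))


print(tokenize("q=<script>alert(1)</script>"))
# ['q=', '<script>', 'alert(', '0', '</script>']
print(tokenize("topic=http://evil.example/x.js"))
# ['topic=', 'http://u']
```

Characters that match no rule, such as a closing parenthesis, are simply dropped in this sketch.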

To turn the tokenized text into a machine learning problem, the first step is to find a way to express these words mathematically. The most common method is one-hot encoding, which represents each word as a very long vector in which exactly one dimension is 1 and all others are 0; for example, "<script>" is represented as [0,0,0,1,0,0,0,0,...]. This method has an important drawback: the vectors that make up a text are extremely sparse and words are mutually independent, so a machine learning model cannot grasp the semantics of a word. Word embedding vectors instead represent the semantic information of words by learning from text, placing semantically similar words close to one another in the embedding space. The vector space can express synonyms such as "microphone" and "mic", and words such as "cat", "dog", and "fish" also cluster together in the space.
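The limitation of one-hot encoding can be seen in a small worked example. The eight-word vocabulary below is hypothetical and chosen only so that "<script>" lands in the fourth position, matching the vector given in the text.

```python
def one_hot(word, vocabulary):
    """Return a vector with a single 1 at the word's index in the vocabulary."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(word)] = 1
    return vec


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vocabulary; a real one would hold thousands of tokens.
vocabulary = ["alert(", "topic=", "<h1", "<script>", "</script>", "0", "http://u", "UKN"]

script_vec = one_hot("<script>", vocabulary)
close_vec = one_hot("</script>", vocabulary)

print(script_vec)                  # [0, 0, 0, 1, 0, 0, 0, 0]
print(dot(script_vec, close_vec))  # 0
```

The dot product of any two distinct one-hot vectors is 0, so even closely related tokens such as "<script>" and "</script>" look entirely unrelated. Dense embedding vectors do not have this restriction.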

Here we use a word embedding model to build an XSS semantic model, so that the machine can understand HTML-language constructs such as <script> and alert(). The 3,000 most frequent words in the positive samples make up the vocabulary, and all other words are labeled "UKN". The model is built with the word2vec class of the gensim module, with a word space of 128 dimensions.
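The vocabulary step can be sketched with the standard library alone. The function names and the toy samples are hypothetical; only the 3,000-word limit and the "UKN" label come from the text.

```python
from collections import Counter


def build_vocabulary(tokenized_samples, max_words=3000):
    """Keep the max_words most frequent tokens found in the positive samples."""
    counts = Counter(tok for sample in tokenized_samples for tok in sample)
    return {word for word, _ in counts.most_common(max_words)}


def replace_rare(sample, vocabulary, unknown="UKN"):
    """Map every out-of-vocabulary token to the unknown label."""
    return [tok if tok in vocabulary else unknown for tok in sample]


# Toy tokenized positive samples; max_words is shrunk to 2 for the demonstration.
samples = [
    ["<script>", "alert(", "0", "</script>"],
    ["<script>", "alert(", "xss"],
]
vocab = build_vocabulary(samples, max_words=2)
mapped = [replace_rare(s, vocab) for s in samples]
print(mapped[0])  # ['<script>', 'alert(', 'UKN', 'UKN']
```

Assuming gensim 4.x is installed, the mapped samples could then be passed to `gensim.models.Word2Vec(sentences=mapped, vector_size=128, min_count=1)` to train 128-dimensional vectors; in gensim 3.x the same parameter is named `size`. The patent does not give the other training settings.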

In the present application, the deep learning model uses any one of three algorithms: MLP, recurrent neural network, or convolutional neural network.

In the present application, the three algorithms were compared side by side through experiments. A multilayer perceptron (MLP) consists of an input layer, an output layer, and several hidden layers. A multilayer perceptron can be implemented easily in Keras with TensorFlow as the backend; the final accuracy of the whole algorithm is 99.9% and the recall is 97.5%. A recurrent neural network is a time-recursive neural network that can capture contextual knowledge in a sequence; the network is likewise built with Keras, and the final model reaches an accuracy of 99.5% and a recall of 98.7%. Compared with an MLP network, a convolutional neural network (CNN) reduces the number of parameters that must be trained and the amount of computation, while also refining and analyzing deep features. Here a one-dimensional convolutional neural network similar to VGG is used, comprising four convolutional layers, two max-pooling layers, and one fully connected layer; the final accuracy is 99.5% and the recall is 98.3%.
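The two metrics reported for each model can be computed from predicted and true labels as sketched below. The labels are toy values made up for illustration and are not the experimental data; treating XSS samples as the positive class (label 1) follows the dataset description, and reading "accuracy" as overall classification accuracy is an assumption about the translated term.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with label 1 = XSS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all samples that were classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def recall(y_true, y_pred):
    """Share of real XSS samples that the model caught."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 0.75
print(recall(y_true, y_pred))
```

With 300,000 negative samples against roughly 50,000 positive ones, accuracy alone can look high even when attacks are missed, which is presumably why recall is reported alongside it.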

The embodiments of the present invention have been described above with reference to the accompanying drawing, but the invention is not limited to the specific embodiments described above, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims

1. An XSS attack detection method based on deep learning, the main steps of which include:

S2. preprocessing the collected sample data, then tokenizing it and forming word embedding vectors from the tokenization result;

2. The method according to claim 1, characterized in that: in step S1, for samples crawled from public Internet datasets, the redundant information in the URL is removed and the payload part is retained.

3. The method according to claim 1, characterized in that: in step S2, numbers and hyperlinks in the sample data are generalized during preprocessing, numbers being replaced with "0" and hyperlinks with "http://u", and the data are URL-encoded and then stored in a CSV file.

4. The method according to claim 1, characterized in that: in step S2, an XSS semantic model is built with the word2vec class of the gensim module, with a word space of 128 dimensions.

5. The method according to claim 1, characterized in that: the deep learning model in step S3 uses any one of three algorithms: MLP, recurrent neural network, or convolutional neural network.