CN109981631A - A kind of XSS attack detection method based on deep learning - Google Patents

Info

Publication number
CN109981631A
Authority
CN
China
Prior art keywords
xss
deep learning
data
word
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910210329.3A
Other languages
Chinese (zh)
Inventor
孙波
李应博
张伟
司成祥
张建松
李胜男
毛蔚轩
盖伟麟
侯美佳
董建武
张泽亚
刘云昊
亓培锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN201910210329.3A
Publication of CN109981631A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an XSS attack detection method based on deep learning. The main steps of the method are: S1. build a network simulation environment and collect sample data of XSS attacks; S2. preprocess the collected sample data, then tokenize it, and form embedded word vectors from the tokenization results; S3. input the embedded word vectors into a deep learning model to obtain an XSS detection model; S4. detect traffic data in real time and store the detection results in a database.

Description

An XSS attack detection method based on deep learning
Technical field
The present invention relates to the field of information security technology, and in particular to an XSS attack detection method based on deep learning.
Background art
Deep learning has made great progress in fields such as computer vision, natural language processing, and artificial intelligence. It has also begun to show its strength in the security field, where it is moving into practical application.
In the Internet+ era, the importance of Web security is increasingly prominent. XSS (cross-site scripting) is a typical attack method, and how to detect XSS attacks effectively has become a problem in the field.
Summary of the invention
The main object of the present invention is to propose an XSS attack detection method based on deep learning, intended to solve the problem of how to detect XSS attack events automatically and efficiently.
To achieve the above object, the present invention provides an XSS attack detection method based on deep learning, the main steps of which are:
S1. Build a network simulation environment and collect sample data of XSS attacks;
S2. Preprocess the collected sample data, then tokenize it, and form embedded word vectors from the tokenization results;
S3. Input the embedded word vectors into a deep learning model to obtain an XSS detection model;
S4. Detect traffic data in real time and store the detection results in a database.
Preferably, step S1 further includes: for samples crawled from public Internet datasets, removing the redundant information in the URL and retaining only the payload part.
Preferably, in step S2, numbers and hyperlinks in the sample data are generalized during preprocessing: numbers are replaced with "0" and hyperlinks are replaced with "http://u". The URL is then encoded and stored in a csv file.
Preferably, in step S2, an XSS semantic model is built using the word2vec class of the gensim module, with a word-space dimension of 128.
Preferably, the deep learning model in step S3 uses any one of three algorithms: MLP, recurrent neural network, or convolutional neural network.
The XSS attack detection method based on deep learning proposed by the present invention preprocesses XSS traffic data, tokenizes it, forms embedded word vectors from the tokenization results, and inputs the embedded word vectors into a deep learning model to obtain an XSS detection model. Traffic data is then detected in real time and the detection results are stored in a database, so that XSS attack behavior can be detected effectively and the Internet security of users is improved.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description of the embodiments
The XSS attack detection method based on deep learning provided by the present invention comprises the following main steps:
S1. Build a network simulation environment and collect sample data of XSS attacks;
Because public datasets in the security field are very scarce, the experimental data provided herein consists of two parts. First, samples crawled from a well-known open source are used as positive samples, totaling more than 50,000. In addition, 300,000 normal HTTP GET request records are collected as negative samples. Information such as the host and path is removed from each URL, and only the payload part is retained. The above data is URL-encoded and stored in csv. Because part of the raw data had already been URL-encoded, the data must be URL-decoded twice before it can be used.
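The payload extraction and double URL decoding described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes that the "payload" is the query string of the request URL, and the function names and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, unquote


def extract_payload(url: str) -> str:
    """Drop scheme, host and path; keep only the query string.

    Assumption: the "payload" of the text is the query string.
    """
    return urlsplit(url).query


def decode_twice(text: str) -> str:
    """Undo two rounds of URL encoding, as the text says is required."""
    return unquote(unquote(text))


# Hypothetical record that was URL-encoded twice before being stored.
raw = "http://example.com/search?q=%253Cscript%253Ealert(1)%253C%252Fscript%253E"
payload = decode_twice(extract_payload(raw))
print(payload)
```

Decoding a string that was encoded only once a second time is harmless unless the decoded text itself contains a literal percent sign followed by two hexadecimal digits, so a real pipeline may need to track how many times each record was encoded.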
S2. Preprocess the collected sample data, then tokenize it, and form embedded word vectors from the tokenization results;
The tokenization rules supported in this application are as follows: content enclosed in single or double quotation marks, e.g. 'xss'; http/https links; <> tags, e.g. <script>; tag openings beginning with <, e.g. <h1; parameter names, e.g. topic=; function bodies, e.g. alert(; and words composed of letters and digits. During preprocessing, numbers and hyperlinks in the sample data are generalized: numbers are replaced with "0" and hyperlinks are replaced with "http://u". The URL is then encoded and stored in a csv file. An XSS semantic model is built using the word2vec class of the gensim module, with a word-space dimension of 128.
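The generalization and tokenization rules listed above could be approximated with regular expressions as in the following sketch. The exact patterns are not given in the text, so the regular expressions, the function names, and the example payload are all assumptions made for illustration.

```python
import re

# Assumed link pattern: scheme, then everything up to whitespace, quote or bracket.
LINK = r'''https?://[^\s"'<>]+'''


def generalize(payload: str) -> str:
    """Replace hyperlinks with "http://u" and runs of digits with "0"."""
    payload = re.sub(LINK, "http://u", payload)
    return re.sub(r"\d+", "0", payload)


# One alternative per tokenization rule; earlier alternatives take priority.
TOKEN_PATTERN = re.compile(
    r"""
      "[^"]*"                  # content enclosed in double quotes
    | '[^']*'                  # content enclosed in single quotes
    | https?://[^\s"'<>]+      # http/https links
    | </?\w+>                  # complete tags
    | <\w+                     # tag openings
    | \w+=                     # parameter names
    | \w+\(                    # function openings
    | \w+                      # words made of letters and digits
    """,
    re.VERBOSE,
)


def tokenize(payload: str) -> list[str]:
    """Generalize the payload, then split it into tokens."""
    return TOKEN_PATTERN.findall(generalize(payload))


sample = "topic=<script>alert('xss')</script>&id=123&u=http://evil.example/a.js"
tokens = tokenize(sample)
print(tokens)
```

Characters that match none of the rules, such as ")" and "&", are simply dropped in this sketch. Whether the original method keeps them is not stated.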
To turn the tokenized text into a machine learning problem, the first step is to find a way to represent these words mathematically. The most common method is one-hot encoding, which represents each word as a very long vector in which only one dimension has the value 1 and all others are 0; for example, "<script>" is represented as [0,0,0,1,0,0,0,0,...]. This method has an important problem: the vectors that make up a text are extremely sparse, the words are independent of one another, and the machine learning model cannot understand the semantics of the words. An embedded word vector instead represents the semantic information of a word as a vector learned from text, placing semantically similar words close together in the embedding space. The vector space can express synonyms such as "microphone" and "mic", and words such as "cat", "dog", and "fish" also cluster together in the space.
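The difference between the two representations can be made concrete with a small sketch. The four-word vocabulary and the two-dimensional dense vectors below are invented for illustration only; real embedded word vectors are learned from data and, in this application, have 128 dimensions.

```python
import math

# Toy vocabulary (hypothetical).
vocab = ["<script>", "alert(", "topic=", "'xss'"]


def one_hot(word: str) -> list[int]:
    """Vector with a single 1 at the word's index in the vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def cosine(a, b) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hand-picked dense vectors, NOT learned values.
toy_embedding = {
    "<script>": [0.9, 0.1],
    "alert(": [0.8, 0.3],
    "topic=": [0.1, 0.9],
}

# Any two different one-hot vectors are orthogonal, so similarity is always 0.
print(cosine(one_hot("<script>"), one_hot("alert(")))
# Dense vectors can place related tokens closer together than unrelated ones.
print(cosine(toy_embedding["<script>"], toy_embedding["alert("]))
print(cosine(toy_embedding["<script>"], toy_embedding["topic="]))
```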
Here the embedded word vector model is used to build an XSS semantic model, so that the machine can understand HTML-language constructs such as <script> and alert(). The 3,000 words that occur most frequently in the positive samples form the vocabulary, and all other words are labeled "UKN". The model is built using the word2vec class of the gensim module, with a word-space dimension of 128.
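The vocabulary construction described above can be sketched with the standard library alone. The function names and the tiny sample set are hypothetical, and the vocabulary size is reduced to 2 in the usage example so that the effect is visible.

```python
from collections import Counter


def build_vocab(tokenized_samples, max_size=3000):
    """Keep the max_size most frequent tokens across all samples."""
    counts = Counter(tok for sample in tokenized_samples for tok in sample)
    return {tok for tok, _ in counts.most_common(max_size)}


def replace_rare(sample, vocab, unknown="UKN"):
    """Map every out-of-vocabulary token to the unknown marker."""
    return [tok if tok in vocab else unknown for tok in sample]


# Hypothetical tokenized positive samples.
samples = [
    ["<script>", "alert(", "'xss'"],
    ["<script>", "alert(", "</script>"],
    ["<script>"],
]
vocab = build_vocab(samples, max_size=2)
print(replace_rare(samples[0], vocab))
```

The resulting token lists could then be passed to gensim's Word2Vec class as training sentences. In gensim 4.x the dimension is set with the vector_size parameter (size in older releases), so a 128-dimensional model would use vector_size=128. Other training settings are not specified in the text.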
S3. Input the embedded word vectors into a deep learning model to obtain an XSS detection model;
In this application, the deep learning model uses any one of three algorithms: MLP, recurrent neural network, or convolutional neural network.
In this application, the three algorithms were compared experimentally. A multi-layer perceptron (MLP) comprises an input layer, an output layer, and several hidden layers. Keras, with TensorFlow as its backend, makes it easy to implement a multi-layer perceptron; the final accuracy of the algorithm is 99.9% and the recall is 97.5%. A recurrent neural network is a time-recursive neural network that can understand contextual knowledge in a sequence. The network is likewise built with Keras, and the final model has an accuracy of 99.5% and a recall of 98.7%. Compared with an MLP network, a convolutional neural network (CNN) reduces the number of parameters that need training and the amount of computation, while also refining and analyzing deep features. Here a one-dimensional convolutional neural network similar to Google VGG is used, comprising four convolutional layers, two max-pooling layers, and one fully connected layer; the final accuracy is 99.5% and the recall is 98.3%.
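For reference, the metrics quoted above can be computed from confusion-matrix counts as in the sketch below. The counts are invented for illustration and are not the experimental data. The text does not say whether its first figure is accuracy over all samples or precision over flagged samples, so both are shown.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of samples flagged as XSS that really were XSS."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of real XSS samples that the model flagged."""
    return tp / (tp + fn)


# Hypothetical counts: 100 XSS samples and 900 normal requests.
tp, fn, fp, tn = 90, 10, 5, 895
print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```

With a negative class several times larger than the positive class, as in the dataset described here, accuracy can remain high even when recall is noticeably lower, which is why the two figures are reported together.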
S4. Detect traffic data in real time and store the detection results in a database.
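Step S4 is not detailed further in the text. One possible shape of it is sketched below, assuming an SQLite database and a stand-in classifier. The table layout, the function names, and the keyword check are all hypothetical; in practice the classifier would be the model obtained in step S3.

```python
import sqlite3
from datetime import datetime, timezone


def toy_classifier(payload: str) -> bool:
    """Stand-in for the trained model: flags payloads containing a script tag."""
    return "<script" in payload.lower()


def detect_and_store(conn, payloads, classify=toy_classifier):
    """Classify each payload and store the result with a UTC timestamp."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detections "
        "(seen_at TEXT, payload TEXT, is_xss INTEGER)"
    )
    for payload in payloads:
        seen_at = datetime.now(timezone.utc).isoformat()
        conn.execute(
            "INSERT INTO detections VALUES (?, ?, ?)",
            (seen_at, payload, int(classify(payload))),
        )
    conn.commit()


# In-memory database used only for the demonstration.
conn = sqlite3.connect(":memory:")
detect_and_store(conn, ["q=hello", "q=<script>alert(0)</script>"])
rows = conn.execute(
    "SELECT payload, is_xss FROM detections ORDER BY rowid"
).fetchall()
print(rows)
```

Parameterized INSERT statements are used so that a malicious payload is stored as data and cannot alter the SQL statement itself.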
The embodiments of the present invention have been described above with reference to the drawing, but the invention is not limited to the specific embodiments described, which are merely illustrative and not restrictive. Inspired by the present invention, those of ordinary skill in the art may devise many other forms without departing from the purpose of the invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (5)

1. An XSS attack detection method based on deep learning, the main steps of which are:
S1. Build a network simulation environment and collect sample data of XSS attacks;
S2. Preprocess the collected sample data, then tokenize it, and form embedded word vectors from the tokenization results;
S3. Input the embedded word vectors into a deep learning model to obtain an XSS detection model;
S4. Detect traffic data in real time and store the detection results in a database.
2. The method of claim 1, characterized in that: in step S1, for samples crawled from public Internet datasets, the redundant information in the URL is removed and only the payload part is retained.
3. The method of claim 1, characterized in that: in step S2, numbers and hyperlinks in the sample data are generalized during preprocessing, numbers being replaced with "0" and hyperlinks being replaced with "http://u", and the URL is encoded and then stored in a csv file.
4. The method of claim 1, characterized in that: in step S2, an XSS semantic model is built using the word2vec class of the gensim module, with a word-space dimension of 128.
5. The method of claim 1, characterized in that: the deep learning model in step S3 uses any one of three algorithms: MLP, recurrent neural network, or convolutional neural network.
CN201910210329.3A 2019-03-20 2019-03-20 A kind of XSS attack detection method based on deep learning Pending CN109981631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910210329.3A CN109981631A (en) 2019-03-20 2019-03-20 A kind of XSS attack detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910210329.3A CN109981631A (en) 2019-03-20 2019-03-20 A kind of XSS attack detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN109981631A true CN109981631A (en) 2019-07-05

Family

ID=67079611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910210329.3A Pending CN109981631A (en) 2019-03-20 2019-03-20 A kind of XSS attack detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN109981631A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111818080A (en) * 2020-07-22 2020-10-23 中国工商银行股份有限公司 Injection attack detection model construction method and device
US10817665B1 (en) * 2020-05-08 2020-10-27 Coupang Corp. Systems and methods for word segmentation based on a competing neural character language model
CN113536678A (en) * 2021-07-19 2021-10-22 中国人民解放军国防科技大学 XSS risk analysis method and device based on Bayesian network and STRIDE model
CN113596007A (en) * 2021-07-22 2021-11-02 广东电网有限责任公司 Vulnerability attack detection method and device based on deep learning
WO2021258479A1 (en) * 2020-06-22 2021-12-30 网宿科技股份有限公司 Graph neural network-based method, system, and apparatus for detecting network attack

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10817665B1 (en) * 2020-05-08 2020-10-27 Coupang Corp. Systems and methods for word segmentation based on a competing neural character language model
US11113468B1 (en) * 2020-05-08 2021-09-07 Coupang Corp. Systems and methods for word segmentation based on a competing neural character language model
WO2021258479A1 (en) * 2020-06-22 2021-12-30 网宿科技股份有限公司 Graph neural network-based method, system, and apparatus for detecting network attack
CN111818080A (en) * 2020-07-22 2020-10-23 中国工商银行股份有限公司 Injection attack detection model construction method and device
CN113536678A (en) * 2021-07-19 2021-10-22 中国人民解放军国防科技大学 XSS risk analysis method and device based on Bayesian network and STRIDE model
CN113536678B (en) * 2021-07-19 2022-04-19 中国人民解放军国防科技大学 XSS risk analysis method and device based on Bayesian network and STRIDE model
CN113596007A (en) * 2021-07-22 2021-11-02 广东电网有限责任公司 Vulnerability attack detection method and device based on deep learning

Similar Documents

Publication Publication Date Title
CN109981631A (en) A kind of XSS attack detection method based on deep learning
CN112487203B (en) Relation extraction system integrated with dynamic word vector
Sun et al. A general framework for content-enhanced network representation learning
CN111259987B (en) Method for extracting event main body by multi-model fusion based on BERT
CN110032648A (en) A kind of case history structuring analytic method based on medical domain entity
CN110598000A (en) Relationship extraction and knowledge graph construction method based on deep learning model
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
Yang et al. Personalized response generation by dual-learning based domain adaptation
CN109413028A (en) SQL injection detection method based on convolutional neural networks algorithm
CN109766693A (en) A kind of cross-site scripting attack detection method based on deep learning
CN110502626A (en) A kind of aspect grade sentiment analysis method based on convolutional neural networks
CN106611055A (en) Chinese hedge scope detection method based on stacked neural network
CN109558492A (en) A kind of listed company's knowledge mapping construction method and device suitable for event attribution
CN112183747A (en) Neural network training method, neural network compression method and related equipment
CN109344404A (en) The dual attention natural language inference method of context aware
CN113254652B (en) Social media posting authenticity detection method based on hypergraph attention network
CN116204674B (en) Image description method based on visual concept word association structural modeling
Zhou et al. ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge
CN113657105A (en) Medical entity extraction method, device, equipment and medium based on vocabulary enhancement
CN114490954B (en) Document level generation type event extraction method based on task adjustment
CN116628186A (en) Text abstract generation method and system
CN111241843B (en) Semantic relation inference system and method based on composite neural network
CN112149423B (en) Corpus labeling method and system for domain entity relation joint extraction
CN117235261A (en) Multi-modal aspect-level emotion analysis method, device, equipment and storage medium
CN116956925A (en) Electronic medical record named entity identification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190705
