CN109670307A - SQL injection recognition method based on CNN and massive logs - Google Patents
SQL injection recognition method based on CNN and massive logs
- Publication number
- CN109670307A (application number CN201811473916.3A)
- Authority
- CN
- China
- Prior art keywords
- cnn
- sql injection
- sample
- model
- recognition methods
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
- G06F21/563—Static detection by source code analysis
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a SQL injection recognition method based on CNN and massive logs, comprising the following steps. Step 1: extract URL access records from web logs and extract the query parameter part of each URL. Step 2: preprocess the data extracted in step 1. Step 3: build a CNN network and train it on the data obtained in step 2 to obtain a CNN model. Step 4: use the model obtained in step 3 to detect query-based SQL injection attacks in real time. The invention automatically extracts the hidden common features of SQL injection, detects SQL injection effectively and quickly, and has low false negative and false positive rates.
Description
Technical field
The present invention relates to SQL injection recognition methods, and in particular to a SQL injection recognition method based on CNN and massive logs.
Background art
SQL injection takes many forms, and new methods that bypass traditional detection inevitably appear. Existing detection methods mainly include support vector machines (SVM), regular-expression matching, decision trees, and naive Bayes. Of these, regular-expression matching is the most widely used and works comparatively well. However, because SQL injection takes many forms, new methods that bypass regular-expression matching inevitably appear, and a method based on existing rules has difficulty identifying new attacks. Traditional machine learning techniques (statistical learning methods such as SVM, naive Bayes, and decision trees) require hand-crafted features. The best features are hard to obtain, and it is hard to judge whether a given feature is the best one or whether a better one exists. As a result, recognition accuracy is low, and large numbers of false positives and false negatives are likely.
Summary of the invention
The present invention provides a SQL injection recognition method based on CNN and massive logs that can extract the hidden common features of SQL injection and identify SQL injection attack traffic that bypasses regular expressions.
The technical solution adopted by the present invention is a SQL injection recognition method based on CNN and massive logs, comprising the following steps:
Step 1: extract URL access records from web logs and extract the query parameter part of each URL;
Step 2: preprocess the data extracted in step 1;
Step 3: build a CNN network and train it on the data obtained in step 2 to obtain a CNN model;
Step 4: use the model obtained in step 3 to detect query-based SQL injection attacks in real time.
Further, the URL access records extracted in step 1 are labeled records in the web logs that contain a query string and whose request method is GET.
Further, after the query parameter part of each URL is extracted in step 1, the samples also need to be deduplicated.
Further, the preprocessing in step 2 includes the following procedure:
S1: compute the average length of the samples, i.e. the number of characters per sample;
S2: train word vectors on the samples to obtain a word vector model;
S3: vectorize the samples with the model from S2, converting each sample into a two-dimensional matrix;
S4: convert the two-dimensional matrix obtained in S3 into a matrix of the specified size.
Further, step 3 proceeds as follows:
S11: shuffle the positive and negative samples, then use part of the samples as the training set and the remainder as the validation set;
S12: build the CNN network and train it;
S13: test the model obtained in S12 on the validation set; stop if the error meets the preset requirement, otherwise adjust the CNN network parameters and continue training.
The beneficial effects of the present invention are:
(1) the invention automatically extracts the hidden common features of SQL injection and quickly identifies SQL injection attack traffic that bypasses regular expressions;
(2) the invention vectorizes single characters, so the vocabulary is small while all information in the query string is retained, which reduces training difficulty and training cost.
Brief description of the drawings
Fig. 1 is a flow diagram of the present invention.
Fig. 2 is the ROC curve in the embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a SQL injection recognition method based on CNN and massive logs comprises the following steps:
Step 1: extract URL access records from web logs and extract the query parameter part of each URL (data cleansing).
Extract the labeled URL access records in the web logs that contain a query string and whose request method is GET, including both normal records and SQL injection records.
For example, in http://www.abc.com/a/b/?query#fragment, the URL carries a query parameter, which is the string after the question mark and before the pound sign (normally in the format key1=value1&key2=value2 ...). To make the samples sufficiently representative, a large amount of data is needed (hundreds of thousands to millions of records); if the volume of labeled logs is too large, it can be sampled uniformly over time.
Then extract the query parameter part of each URL; for http://www.abc.com/a/b/?query#fragment, only the query part is retained.
Both positive and negative samples may contain duplicates. Deduplication reduces the redundancy that duplicates introduce and improves the quality of the training samples.
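A minimal Python sketch of step 1, assuming each labeled log record has already been reduced to a request method, a URL, and a label. The record layout, the label convention (1 for SQL injection, 0 for normal), and the function names `extract_query` and `build_samples` are illustrative assumptions, not part of the patent. The sketch keeps only GET requests that carry a query string, takes the text between `?` and `#` without URL-decoding it, and deduplicates the samples.

```python
from urllib.parse import urlsplit


def extract_query(url):
    """Return the raw (still URL-encoded) query part of a URL, or '' if none."""
    # urlsplit separates the query from the fragment and does not decode it.
    return urlsplit(url).query


def build_samples(records):
    """Keep GET records that carry a query string and drop duplicates.

    `records` is an iterable of (method, url, label) tuples; this layout is
    an assumption made for the example.
    """
    seen = set()
    samples = []
    for method, url, label in records:
        if method.upper() != "GET":
            continue
        query = extract_query(url)
        if not query or (query, label) in seen:
            continue
        seen.add((query, label))
        samples.append((query, label))
    return samples


records = [
    ("GET", "http://www.abc.com/a/b/?id=1&name=x#top", 0),
    ("GET", "http://www.abc.com/a/b/?id=1&name=x#top", 0),   # duplicate
    ("POST", "http://www.abc.com/a/b/?id=2", 0),             # not GET
    ("GET", "http://www.abc.com/a/b/", 0),                   # no query
    ("GET", "http://www.abc.com/a/b/?id=1%27%20or%201=1--", 1),
]
print(build_samples(records))
```

Deduplicating on the (query, label) pair rather than on the query alone keeps a query that was labeled both ways visible for manual review; that choice is also an assumption.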
Step 2: preprocess the data extracted in step 1 (data preprocessing).
The preprocessing includes the following procedure:
S1: compute the average length of the samples, i.e. the number of characters per sample. The average sample length determines the number of matrix rows, which is set approximately equal to the average length; for example, the length of abc is 3.
S2: train word vectors on the samples to obtain a word vector model.
The samples are trained with the word2vec method (the word vector method open-sourced by Google) to obtain a word vector model, which stores the vector corresponding to each character. The vector dimension should preferably not exceed the average sample length and is preferably a power of 2, such as 16, 32, or 64. Each single character is one vector unit, and the samples are not URL-decoded: decoding would greatly increase the number of distinct characters (a large number of Chinese characters would appear, and filtering them out would lose useful information, because some SQL injections exploit Chinese characters).
For example, "a" is represented as the 5-element vector [0.1,1.2,0,1,3].
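The bookkeeping in S1 and S2 can be sketched in Python as follows. The helper names are hypothetical, and picking the largest power of two that does not exceed the average length is only one way to satisfy the stated preference; the patent does not fix a rule. The character lists produced at the end are the kind of input a word-vector trainer would consume; the training itself is not shown.

```python
def average_length(samples):
    """Average number of characters per sample (S1)."""
    return sum(len(s) for s in samples) / len(samples)


def choose_dimension(avg_len):
    """Largest power of two that does not exceed the average length.

    One possible reading of the preference for a power of 2 no larger than
    the average sample length; this rule is an assumption.
    """
    dim = 1
    while dim * 2 <= avg_len:
        dim *= 2
    return dim


def to_characters(sample):
    """Split a sample into single characters without URL-decoding it."""
    return list(sample)


samples = ["id=1&name=x", "id=1%27%20or%201=1--", "q=abc"]
avg = average_length(samples)
rows = round(avg)                 # number of matrix rows, about the average
dim = choose_dimension(avg)       # word-vector dimension
sentences = [to_characters(s) for s in samples]  # input for word-vector training
print(avg, rows, dim, sentences[2])
```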
S3: vectorize the samples with the model from S2, converting each sample into a two-dimensional matrix.
If a character in a sample does not appear in the model, it is filled with an all-ones vector, i.e. every element of its row vector is 1. Each sample is thus converted into a two-dimensional matrix whose number of columns is the vector dimension of each character and whose number of rows is the number of characters in the sample.
For example, the vectorization of abc:
a[0.1,1.2,0,1,3]
b[0,1.6,10,11,2.3]
c[0.8,1.2,0.67,1.983,1.41]
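The lookup in S3 can be sketched in Python with a plain dictionary standing in for the trained word vector model. The dictionary and the function name `vectorize` are assumptions made for illustration; the vectors are the example values above, and the character z plays the role of a character missing from the model.

```python
def vectorize(sample, char_vectors, dim):
    """Convert a sample into a list of row vectors, one row per character."""
    unknown = [1.0] * dim  # all-ones vector for characters not in the model
    return [list(char_vectors.get(ch, unknown)) for ch in sample]


# Toy lookup table standing in for a trained word-vector model (assumption).
char_vectors = {
    "a": [0.1, 1.2, 0, 1, 3],
    "b": [0, 1.6, 10, 11, 2.3],
    "c": [0.8, 1.2, 0.67, 1.983, 1.41],
}

matrix = vectorize("abz", char_vectors, dim=5)
for row in matrix:
    print(row)
```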
S4: convert the two-dimensional matrix obtained in S3 into a matrix of the specified size.
If the matrix has fewer rows than the specified value, zero vectors are appended at the bottom until it reaches the specified size; if it has more rows than the specified value, the last rows are removed so that it becomes a matrix of the specified size.
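A short Python sketch of this padding and truncation, with matrices represented as lists of rows. The function name `fit_rows` is hypothetical.

```python
def fit_rows(matrix, rows, dim):
    """Pad with zero vectors at the bottom or drop trailing rows."""
    fitted = [list(r) for r in matrix[:rows]]       # truncate if too long
    while len(fitted) < rows:
        fitted.append([0.0] * dim)                  # pad if too short
    return fitted


short = [[1.0, 2.0], [3.0, 4.0]]
long_ = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(fit_rows(short, rows=3, dim=2))
print(fit_rows(long_, rows=3, dim=2))
```

Fixing the row count this way gives every sample the same shape, which a CNN with a fixed input size requires.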
Step 3: build a CNN network and train it on the data obtained in step 2 to obtain a CNN model (model training).
The process is as follows:
S11: shuffle the positive and negative samples, then use part of the samples as the training set and the remainder as the validation set.
Because all samples are fed into training in batches, shuffling the positive and negative samples prevents an entire training batch from consisting only of positive samples or only of negative samples. Part of the labeled positive and negative samples (about 100,000 to 200,000) is randomly selected as the training set, and the remainder is used as the validation set.
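The shuffle and split in S11 might look like the following Python sketch. The function name, the fixed seed, and the toy data are assumptions; the seed is there only to make the example reproducible.

```python
import random


def shuffle_and_split(samples, train_size, seed=0):
    """Shuffle labeled samples, then split into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded only for reproducibility
    return shuffled[:train_size], shuffled[train_size:]


# Toy data: 6 normal samples (label 0) followed by 4 injections (label 1).
data = [("normal%d" % i, 0) for i in range(6)] + [("inj%d" % i, 1) for i in range(4)]
train, val = shuffle_and_split(data, train_size=7)
print(len(train), len(val))
```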
S12: build the CNN network and train it.
In general, the more training data the better, but training cost rises accordingly.
S13: test the model obtained in S12 on the validation set; stop if the error meets the preset requirement, otherwise adjust the CNN network parameters and continue training.
The trained model is tested on the validation set. If the error metrics (accuracy, loss function value, etc.) are within the acceptable range, training stops; otherwise the neural network parameters are adjusted and training continues. The parameters include the CNN structure, the number of network layers, the number of training iterations, the convolution kernel size and number, the pooling function, the classification function, and the activation function. A trained model usually reaches a classification accuracy of 99% or higher on the validation set.
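The stop-or-adjust loop of S12 and S13 can be sketched in Python independently of any deep learning framework. Here `train_fn` and `evaluate_fn` are hypothetical callables that would wrap real CNN training and validation, and the candidate list and the 0.99 threshold are assumptions chosen to mirror the accuracy figure mentioned above.

```python
def train_until_acceptable(candidates, train_fn, evaluate_fn, min_accuracy=0.99):
    """Try parameter settings in turn until validation accuracy is acceptable.

    candidates  : iterable of parameter dicts (layers, kernel size, ...)
    train_fn    : callable(params) -> trained model            (hypothetical)
    evaluate_fn : callable(model) -> validation accuracy 0..1  (hypothetical)
    Returns (model, params, accuracy) for the first acceptable setting, or the
    best setting seen if none reaches the threshold.
    """
    best = None
    for params in candidates:
        model = train_fn(params)
        accuracy = evaluate_fn(model)
        if best is None or accuracy > best[2]:
            best = (model, params, accuracy)
        if accuracy >= min_accuracy:
            break
    return best


# Stand-ins for real training and evaluation, for illustration only.
fake_scores = {3: 0.97, 5: 0.995, 7: 0.999}
candidates = [{"kernel_size": k} for k in (3, 5, 7)]
model, params, acc = train_until_acceptable(
    candidates,
    train_fn=lambda p: p["kernel_size"],
    evaluate_fn=lambda m: fake_scores[m],
)
print(params, acc)
```

The loop stops at the first setting that meets the threshold rather than searching for the best one, which matches the "stop if the error meets the preset requirement" wording.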
Step 4: use the model obtained in step 3 to detect query-based SQL injection attacks in real time.
Deployed in a server environment, the model can detect query-based SQL injection attacks in real time; the invention can also be used to examine historical logs.
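As a rough Python sketch of how a deployed detector could be called per request, assume a hypothetical `score_fn` that wraps the preprocessing of step 2 and the trained model of step 3 and returns a probability of injection. The 0.5 threshold and the toy scorer are illustrative assumptions only; the toy scorer is a placeholder and does not represent the CNN.

```python
from urllib.parse import urlsplit


def detect(url, score_fn, threshold=0.5):
    """Return True if the URL's query string is classified as SQL injection.

    score_fn is a hypothetical callable that maps a raw query string to a
    probability of injection; in practice it would wrap the preprocessing of
    step 2 and the trained CNN model of step 3.
    """
    query = urlsplit(url).query
    if not query:
        return False            # nothing to classify
    return score_fn(query) >= threshold


# Toy scorer for illustration only; it is NOT the trained model.
def toy_score(query):
    return 0.9 if "%27" in query else 0.1


print(detect("http://www.abc.com/a/b/?id=1", toy_score))
print(detect("http://www.abc.com/a/b/?id=1%27%20or%201=1--", toy_score))
print(detect("http://www.abc.com/a/b/", toy_score))
```

The same function can be mapped over stored log lines to examine historical logs.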
Embodiment
The ROC curve of the method is shown in Fig. 2, which indicates that the classification performance of the invention is good. The model built by the invention still detects well those SQL injections that look random, i.e. have no obvious visible features, and that can bypass traditional regular-expression feature matching (which checks whether the URL contains sensitive strings, so a variety of bypass tricks exist). SQL injections that traditional regular-expression matching cannot detect can still be detected by the model, and detection is fast.
Fig. 2 shows that the false negative rate and false positive rate of the invention are very low. Tests show that the accuracy of the invention exceeds the average accuracy in image recognition, the field CNN was originally designed for: tested on 3,000,000 records, the accuracy is 99.7%.
The ROC curve is the receiver operating characteristic curve. It is a composite indicator of sensitivity and specificity for a continuous variable and reveals the relationship between the two graphically. Multiple different thresholds are set on the continuous variable, a sensitivity and specificity pair is computed for each, and the curve is drawn with sensitivity as the ordinate and 1 - specificity as the abscissa. The larger the area under the curve, the higher the diagnostic accuracy. On the ROC curve, the point closest to the upper-left corner of the plot is the threshold at which both sensitivity and specificity are high.
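To make the construction concrete, the Python sketch below computes ROC points and the area under the curve for a handful of made-up scores. It follows the usual convention of plotting sensitivity (true positive rate) against 1 - specificity (false positive rate). The labels and scores are invented for illustration and are not data from the patent.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points, one per threshold, from high to low.

    labels: 1 for the positive class, 0 for the negative class
    scores: higher means more likely positive
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(labels, scores)
print(pts)
print(auc(pts))
```

For these toy values the area is 8/9, which equals the fraction of positive and negative pairs that the scores rank in the right order.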
In further testing on 2,110,000 positive samples, 1,180,000 negative samples, and 3,300,000 samples in total (2,117,860 positive and 1,187,504 negative), the accuracy is 99.97%, 99.9%, and 99.95% respectively.
The terms used in the present invention are as follows:
SQL injection: inserting SQL commands into the query string of a web form submission, an entered domain name, or a page request, so as to trick the server into executing malicious SQL commands. Specifically, it exploits an existing application's ability to pass (malicious) SQL commands to the back-end database engine for execution: by entering (malicious) SQL statements into a web form, an attacker can obtain the database of a website that has the security vulnerability, instead of having SQL statements executed as the designer intended.
CNN: a convolutional neural network (Convolutional Neural Network) is a deep feed-forward artificial neural network. CNN was designed for image recognition problems and imitates the multi-layer process by which humans recognize images.
URL: uniform resource locator.
SQL: structured query language (Structured Query Language).
The present invention is mainly used to identify SQL injection carried out through URL query strings. It performs binary classification of URL query strings into normal and SQL injection. Normal query strings and SQL injection query strings differ in common features that are not obvious, and classification can be based on these differences.
SQL injection query strings often contain a great deal of redundancy, which matches CNN's ability to remove redundancy. The features that distinguish normal query strings from SQL injection query strings are hidden and not obvious, so good classification features cannot be supplied by hand, and bad features lead to a large number of recognition errors. Because the differences often lie in inconspicuous features that cannot be found by observation, and CNN can automatically extract features that humans cannot recognize, CNN can be used to extract the hidden common features.
Existing SQL injection detection mainly uses regular-expression matching, which is accurate and fast but cannot identify new attacks; new bypass methods that evade the regular expressions inevitably appear, for example encoding the URL multiple times. The present invention instead automatically extracts the hidden common features of SQL injection and quickly identifies SQL injection attack traffic that bypasses regular expressions. Traditional machine learning methods require features to be screened by hand; good features are hard to obtain, and feature quality directly affects recognition accuracy. Some deep learning approaches require cumbersome URL preprocessing and also lose important information: for example, URL decoding may produce Chinese characters, and keeping them greatly increases the vocabulary while removing them loses important feature information of some SQL injections. The present invention vectorizes single characters, so the vocabulary is small yet all information in the query string is retained, and the word2vec method also preserves combination information between characters, such as positional information; the small vocabulary reduces training difficulty and training cost.
Claims (5)
1. A SQL injection recognition method based on CNN and massive logs, characterized by comprising the following steps:
Step 1: extract URL access records from web logs and extract the query parameter part of each URL;
Step 2: preprocess the data extracted in step 1;
Step 3: build a CNN network and train it on the data obtained in step 2 to obtain a CNN model;
Step 4: use the model obtained in step 3 to detect query-based SQL injection attacks in real time.
2. The SQL injection recognition method based on CNN and massive logs according to claim 1, characterized in that the URL access records extracted in step 1 are labeled records in the web logs that contain a query string and whose request method is GET.
3. The SQL injection recognition method based on CNN and massive logs according to claim 1, characterized in that, after the query parameter part of each URL is extracted in step 1, the samples also need to be deduplicated.
4. The SQL injection recognition method based on CNN and massive logs according to claim 1, characterized in that the preprocessing in step 2 includes the following procedure:
S1: compute the average length of the samples, i.e. the number of characters per sample;
S2: train word vectors on the samples to obtain a word vector model;
S3: vectorize the samples with the model from S2, converting each sample into a two-dimensional matrix;
S4: convert the two-dimensional matrix obtained in S3 into a matrix of the specified size.
5. The SQL injection recognition method based on CNN and massive logs according to claim 1, characterized in that step 3 proceeds as follows:
S11: shuffle the positive and negative samples, then use part of the samples as the training set and the remainder as the validation set;
S12: build the CNN network and train it;
S13: test the model obtained in S12 on the validation set; stop if the error meets the preset requirement, otherwise adjust the CNN network parameters and continue training.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811473916.3A CN109670307A (en) | 2018-12-04 | 2018-12-04 | SQL injection recognition method based on CNN and massive logs |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811473916.3A CN109670307A (en) | 2018-12-04 | 2018-12-04 | SQL injection recognition method based on CNN and massive logs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109670307A | 2019-04-23 |
Family
ID=66143542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811473916.3A Pending CN109670307A (en) | SQL injection recognition method based on CNN and massive logs | 2018-12-04 | 2018-12-04 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109670307A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107908965A (en) * | 2017-11-14 | 2018-04-13 | 北京知道创宇信息技术有限公司 | The detection method and device of doubtful SQL injection type |
CN108173708A (en) * | 2017-12-18 | 2018-06-15 | 北京天融信网络安全技术有限公司 | Anomalous traffic detection method, device and storage medium based on incremental learning |
CN108259494A (en) * | 2018-01-17 | 2018-07-06 | 北京邮电大学 | A kind of network attack detecting method and device |
CN108667816A (en) * | 2018-04-19 | 2018-10-16 | 重庆邮电大学 | A kind of the detection localization method and system of Network Abnormal |
Non-Patent Citations (1)
Title |
---|
MING ZHANG,ET AL.: "A Deep Learning Method to Detect Web Attacks Using a Specially Designed CNN", <ICONIP 2017:NEURAL INFORMATION PROCESSING> * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135166A (en) * | 2019-05-08 | 2019-08-16 | 北京国舜科技股份有限公司 | A kind of detection method and system for the attack of service logic loophole |
CN110135166B (en) * | 2019-05-08 | 2021-03-30 | 北京国舜科技股份有限公司 | Detection method and system for service logic vulnerability attack |
CN113821791A (en) * | 2020-06-18 | 2021-12-21 | 中国电信股份有限公司 | Method, system, storage medium and apparatus for detecting SQL injection |
CN112019497A (en) * | 2020-07-10 | 2020-12-01 | 上海大学 | Word embedding-based multi-stage network attack detection method |
CN112019497B (en) * | 2020-07-10 | 2021-12-03 | 上海大学 | Word embedding-based multi-stage network attack detection method |
CN113254942A (en) * | 2021-05-24 | 2021-08-13 | 浙江网商银行股份有限公司 | Data processing method, system and device |
US20230169180A1 (en) * | 2021-11-30 | 2023-06-01 | Oracle International Corporation | Detecting security vulnerabilities in queries via an information flow graph |
US11860679B2 (en) * | 2021-11-30 | 2024-01-02 | Oracle International Corporation | Detecting security vulnerabilities in queries via an information flow graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670307A (en) | A kind of SQL injection recognition methods based on CNN and massive logs | |
Long et al. | Understanding membership inferences on well-generalized learning models | |
CN106096405B (en) | A kind of Android malicious code detecting method abstract based on Dalvik instruction | |
CN110233849B (en) | Method and system for analyzing network security situation | |
Pirscoveanu et al. | Analysis of malware behavior: Type classification using machine learning | |
CN109547423B (en) | WEB malicious request deep detection system and method based on machine learning | |
CN108833409B (en) | Webshell detection method and device based on deep learning and semi-supervised learning | |
CN109005145B (en) | Malicious URL detection system and method based on automatic feature extraction | |
CN107786575A (en) | A kind of adaptive malice domain name detection method based on DNS flows | |
CN108920954B (en) | Automatic malicious code detection platform and method | |
CN109859029A (en) | Abnormal application detection method, device, computer equipment and storage medium | |
CN108200054A (en) | A kind of malice domain name detection method and device based on dns resolution | |
CN108667816A (en) | A kind of the detection localization method and system of Network Abnormal | |
CN109684840A (en) | Based on the sensitive Android malware detection method for calling path | |
CN109194677A (en) | A kind of SQL injection attack detection, device and equipment | |
CN105069355A (en) | Static detection method and apparatus for webshell deformation | |
CN107360152A (en) | A kind of Web based on semantic analysis threatens sensory perceptual system | |
CN108616545A (en) | A kind of detection method, system and electronic equipment that network internal threatens | |
CN110266675A (en) | A kind of xss attack automated detection method based on deep learning | |
CN111324797B (en) | Method and device for precisely acquiring data at high speed | |
US10990672B2 (en) | Method and apparatus for obtaining virus library, device, server, and system | |
CN108229170B (en) | Software analysis method and apparatus using big data and neural network | |
CN108718306A (en) | A kind of abnormal flow behavior method of discrimination and device | |
CN110191096A (en) | A kind of term vector homepage invasion detection method based on semantic analysis | |
CN115361176B (en) | SQL injection attack detection method based on FlexUDA model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190423 |