CN116781330A - SQL injection detection method of improved Bayesian theory and electronic equipment

Info

Publication number: CN116781330A
Application number: CN202310614671.6A
Authority: CN
Inventors: 凌颖; 黎新; 宾冬梅; 谢铭; 杨春燕; 韩松明; 明少锋; 卢杰科; 唐福川; 崔志美; 黄伟翔
Original assignee: Electric Power Research Institute of Guangxi Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Guangxi Power Grid Co Ltd
Priority date: 2023-05-29
Filing date: 2023-05-29
Publication date: 2023-09-19

Abstract

The application discloses an SQL injection detection method based on improved Bayesian theory, comprising the following steps: collecting SQL request data; preprocessing the collected SQL request data; performing feature selection on the preprocessed SQL request data; counting feature quantities; storing the SQL request data in a database; performing secondary processing on the SQL request data in the database to avoid overfitting and excessive complexity; establishing an improved SQL injection detection model; training and optimizing the model to obtain a trained improved SQL injection detection model; and, for new SQL request data, judging through the trained improved SQL injection detection model whether the new SQL request exhibits SQL injection attack behavior. Compared with the prior art, the technical scheme offers fast learning and high accuracy, can effectively identify and intercept SQL injection attacks, and helps safeguard enterprise security and user privacy. The electronic equipment has the same advantages.

Description

SQL injection detection method of improved Bayesian theory and electronic equipment

Technical Field

The application relates to the technical field of network communication, in particular to an SQL injection detection method based on an improved Bayesian theory.

Background

With the rapid development of the Internet, Web applications continue to emerge and bring great convenience to enterprises and individuals. Network attacks have inevitably followed, and SQL injection is among the most common of them, so an effective SQL injection detection method is very important.

In the prior art, SQL injection detection methods based on Bayesian theory generally adopt the Naive Bayes algorithm, which in practical application suffers from a high misjudgment rate and a long learning time.

Therefore, how to provide an SQL injection detection method based on improved Bayesian theory that overcomes these problems and offers fast learning and high accuracy is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

To solve the above technical problems, the application provides an SQL injection detection method based on improved Bayesian theory. The method offers fast learning and high accuracy, is suitable for online processing of data streams, can effectively identify and intercept SQL injection attacks, and thereby helps safeguard enterprise security and user privacy.

The technical scheme provided by the application is as follows:

The application provides an SQL injection detection method based on improved Bayesian theory, comprising the following steps: collecting SQL request data; preprocessing the collected SQL request data; performing feature selection on the preprocessed SQL request data; counting feature quantities; storing the SQL request data in a database; performing secondary processing on the SQL request data in the database to avoid overfitting and excessive complexity; establishing an improved SQL injection detection model; training and optimizing the improved SQL injection detection model to obtain a trained improved SQL injection detection model; and, for new SQL request data, predicting through the trained improved SQL injection detection model and judging whether the new SQL request exhibits SQL injection attack behavior.

When learning from the data, effective features need to be selected to improve model accuracy. During feature selection, a specific algorithm selects features from the preprocessed data so as to avoid overfitting and excessive complexity.

The data involved in the application are stored and managed in a database for use in model training and prediction. This work requires specialized database management techniques to ensure the security and reliability of the data.

Further, in a preferred mode of the present application, the step of "preprocessing the collected SQL request data" includes: character set conversion, HTML tag removal, transcoding, URL extraction.

Further, in a preferred mode of the present application, the feature quantities in the step of "counting feature quantities" include: category; attribute information; and feature distribution.

Further, in a preferred form of the application, the "category" includes: SQL attack / normal request; the "attribute information" includes: request type and number of parameters; and the "feature distribution" includes: the probability distribution of SQL injection occurrences.

In establishing the improved Bayesian algorithm, the feature quantities needed to calculate conditional probabilities must be counted. These feature quantities include categories (SQL attack / normal request), attribute information (request type, number of parameters, etc.), feature distributions (the probability distribution of SQL injection occurrences, etc.), and the like.

Further, in a preferred mode of the present application, the step of "performing secondary processing on the SQL request data in the database to avoid overfitting and excessive complexity" includes the steps of: extracting features by adopting a method comprising character features, bag-of-word model features and semantic features; selecting the characteristics by an embedded characteristic selecting method; and adopting a PCA method to perform characteristic dimension reduction.

Further, in a preferred form of the application, the step of "establishing an improved SQL injection detection model" comprises: introducing prior knowledge into a traditional SQL injection detection model based on Bayesian theory; and using a Gaussian process regression method to optimize the performance of the Bayesian algorithm.

Further, in a preferred form of the application, the step of "training and optimizing the improved SQL injection detection model" comprises the steps of: training the model with labeled data and calculating the probability of each parameter; and testing the performance of the model with a test set and calculating the test indexes.

Further, in a preferred mode of the present application, the test index includes: classification accuracy, recall, F1 value.

Further, in a preferred form of the application, the data sources from which the SQL request data are collected comprise: access logs of the Web server and message data captured by a network sniffing tool. The aim is to collect SQL request data from the network; when collecting, care should be taken that the data volume is sufficient and that as many application scenarios as possible are covered.

In addition, the application also provides electronic equipment, which comprises: a computer program for executing the SQL injection detection method of the improved Bayesian theory as described above; a memory for storing a computer program; a processor for executing a computer program.

The SQL injection detection method based on improved Bayesian theory provided by the application comprises the following steps: collecting SQL request data; preprocessing the collected SQL request data; performing feature selection on the preprocessed SQL request data; counting feature quantities; storing the SQL request data in a database; performing secondary processing on the SQL request data in the database to avoid overfitting and excessive complexity; establishing an improved SQL injection detection model; training and optimizing the improved SQL injection detection model to obtain a trained improved SQL injection detection model; and, for new SQL request data, predicting through the trained improved SQL injection detection model and judging whether the new SQL request exhibits SQL injection attack behavior. Compared with the prior art, the technical scheme offers fast learning and high accuracy, is suitable for online processing of data streams, and can effectively identify and intercept SQL injection attacks, thereby helping safeguard enterprise security and user privacy. In addition, the application also relates to electronic equipment, which has the same beneficial effects.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is an overall flow diagram of an SQL injection detection method of the improved bayesian theory according to an embodiment of the present application.

FIG. 2 is a flow chart of a method for performing secondary processing on SQL request data in a database to avoid overfitting and excessive complexity according to an embodiment of the application.

Detailed Description

In order that those skilled in the art will better understand the technical solutions of the present application, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

It will be understood that when an element is referred to as being "fixed" or "disposed" on another element, it can be directly on the other element or be indirectly on the other element; when an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element.

It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "first," "second," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are merely for convenience in describing and simplifying the description based on the orientation or positional relationship shown in the drawings, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus are not to be construed as limiting the application.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" or "a number" means two or more, unless specifically defined otherwise.

It should be understood that the structures, proportions, sizes, etc. shown in the drawings are provided only to aid understanding and reading of the disclosure and are not intended to limit the scope of the application, which is defined by the claims. Unless otherwise indicated, any structural modifications, proportional changes, or dimensional adjustments that would be apparent to those skilled in the art may be made without departing from the spirit and scope of the application.

Referring to Fig. 1 and Fig. 2, an embodiment of the present application provides an SQL injection detection method based on improved Bayesian theory, including the steps of: collecting SQL request data; preprocessing the collected SQL request data; performing feature selection on the preprocessed SQL request data; counting feature quantities; storing the SQL request data in a database; performing secondary processing on the SQL request data in the database to avoid overfitting and excessive complexity; establishing an improved SQL injection detection model; training and optimizing the improved SQL injection detection model to obtain a trained improved SQL injection detection model; and, for new SQL request data, predicting through the trained improved SQL injection detection model and judging whether the new SQL request exhibits SQL injection attack behavior. Compared with the prior art, the technical scheme offers fast learning and high accuracy, is suitable for online processing of data streams, and can effectively identify and intercept SQL injection attacks, thereby helping safeguard enterprise security and user privacy. In addition, the application also relates to electronic equipment, which has the same beneficial effects.

Specifically, in an embodiment of the present application, the step of "preprocessing the collected SQL request data" includes: character set conversion, HTML tag removal, transcoding, URL extraction.

Specifically, in an embodiment of the present application, the feature quantities in the step of "counting feature quantities" include: category; attribute information; and feature distribution.

Specifically, in an embodiment of the present application, the "category" includes: SQL attack / normal request; the "attribute information" includes: request type and number of parameters; and the "feature distribution" includes: the probability distribution of SQL injection occurrences.

Specifically, in an embodiment of the present application, the step of "performing secondary processing on SQL request data in a database to avoid overfitting and excessive complexity" includes the steps of: extracting features by adopting a method comprising character features, bag-of-word model features and semantic features; adopting an embedded feature selection method to select features; and adopting a PCA method to perform characteristic dimension reduction.

Specifically, in an embodiment of the present application, the step of "establishing an improved SQL injection detection model" includes: introducing prior knowledge into a traditional SQL injection detection model based on Bayesian theory; and using a Gaussian process regression method to optimize the performance of the Bayesian algorithm.

Specifically, in an embodiment of the present application, the step of "training and optimizing the improved SQL injection detection model" includes the steps of: training the model with labeled data and calculating the probability of each parameter; and testing the performance of the model with a test set and calculating the test indexes.

Specifically, in an embodiment of the present application, the test index includes: classification accuracy, recall, F1 value.

Specifically, in an embodiment of the present application, the data sources from which the SQL request data are collected include: access logs of the Web server and message data captured by a network sniffing tool. The aim is to collect SQL request data from the network; when collecting, care should be taken that the data volume is sufficient and that as many application scenarios as possible are covered. In addition, an embodiment of the present application further provides an electronic device, including: a computer program for executing the SQL injection detection method based on improved Bayesian theory as described above; a memory for storing the computer program; and a processor for executing the computer program.

More specifically, with the rapid development of the Internet, Web applications continue to emerge and bring great convenience to enterprises and individuals. Network attacks have inevitably followed, and SQL injection is among the most common of them, so adopting an effective SQL injection detection method is very important.

The prior art generally adopts the Naive Bayes algorithm. The algorithm is credited with advantages such as high speed, the ability to handle high-dimensional data, and high accuracy, but in practical application it shows defects such as a high misjudgment rate and a long learning time, so an improved SQL injection detection method based on Bayesian theory is needed.

The implementation process of the SQL injection detection method based on the improved Bayesian theory mainly comprises the following steps:

1. Data acquisition and preprocessing.

SQL request data are collected, then filtered and formatted to improve the efficiency of data processing. This mainly comprises the following steps:

(1) Data acquisition

The aim is to collect SQL request data from the network. The data sources may be access logs on the Web server side, message data captured by a network sniffing tool, and the like. When collecting data, the data volume should be sufficient and should cover as many application scenarios as possible.
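
As an illustration of the acquisition step, the sketch below extracts the request method, path, and query string from a Web server access-log line. It assumes the logs follow the Common Log Format; the function name `parse_access_log_line` and the sample line are hypothetical and do not come from the application.

```python
import re
from urllib.parse import urlsplit

# Matches the quoted request field of a Common Log Format line,
# e.g. "GET /item?id=1 HTTP/1.1". The log format is an assumption.
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def parse_access_log_line(line):
    """Return a dict with method, path and query for one log line, or None."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    parts = urlsplit(match.group("target"))
    return {
        "method": match.group("method"),
        "path": parts.path,
        "query": parts.query,
    }


# Hypothetical log line carrying a percent-encoded injection attempt.
sample = (
    '10.0.0.5 - - [29/May/2023:10:00:00 +0800] '
    '"GET /item?id=1%27%20OR%201=1 HTTP/1.1" 200 512'
)
record = parse_access_log_line(sample)
```

The query string is left percent-encoded here; decoding belongs to the preprocessing step that follows.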

(2) Data preprocessing

The raw data collected for SQL injection detection exist in text form and must be preprocessed before further processing. Preprocessing includes character set conversion, removal of HTML tags, transcoding, extraction of URLs, and the like.
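
The four preprocessing operations named above can be sketched as follows. This is a minimal illustration using only the Python standard library; it assumes that "transcoding" means undoing percent-encoding and HTML entities, and the function name `preprocess`, the regular expressions, and the sample payload are all hypothetical.

```python
import html
import re
from urllib.parse import unquote_plus

TAG_RE = re.compile(r"<[^>]+>")                 # naive HTML tag matcher
URL_RE = re.compile(r"https?://[^\s\"'<>]+")    # naive absolute-URL matcher


def preprocess(raw, encoding="utf-8"):
    """Normalize one raw request payload (bytes) into clean text plus URLs."""
    # Character set conversion: decode bytes, replacing undecodable sequences.
    text = raw.decode(encoding, errors="replace")
    # Transcoding (assumed meaning): undo percent-encoding and HTML entities.
    text = html.unescape(unquote_plus(text))
    # URL extraction, done before tags are stripped.
    urls = URL_RE.findall(text)
    # HTML tag removal.
    text = TAG_RE.sub(" ", text)
    # Collapse whitespace and lowercase for later tokenization.
    text = re.sub(r"\s+", " ", text).strip().lower()
    return text, urls


payload = b"q=%3Cb%3E1%27+OR+%271%27%3D%271%3C%2Fb%3E+http%3A%2F%2Fexample.com%2Fa"
text, urls = preprocess(payload)
```

A production pipeline would likely need repeated decoding to handle double-encoded payloads; that is outside this sketch.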

(3) Feature selection

When learning from the data, effective features need to be selected to improve model accuracy. During feature selection, a specific algorithm selects features from the preprocessed data so as to avoid overfitting and excessive complexity.

(4) Statistics of feature quantity

In establishing the improved Bayesian algorithm, the feature quantities needed to calculate conditional probabilities must be counted. These feature quantities include categories (SQL attack / normal request), attribute information (request type, number of parameters, etc.), feature distributions (the probability distribution of SQL injection occurrences, etc.), and so on.
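
To make the counting step concrete, the sketch below tallies class counts and per-class attribute counts and derives a relative-frequency estimate of a conditional probability from them. The record layout `(label, request_type, n_params)`, the function names, and the four sample records are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def count_feature_statistics(samples):
    """samples: iterable of (label, request_type, n_params) tuples.

    Returns the class counts and, per class, the counts of each
    (attribute, value) pair.
    """
    class_counts = Counter()
    attr_counts = defaultdict(Counter)
    for label, request_type, n_params in samples:
        class_counts[label] += 1
        attr_counts[label][("request_type", request_type)] += 1
        attr_counts[label][("n_params", n_params)] += 1
    return class_counts, attr_counts


def conditional_probability(attr_counts, class_counts, label, attribute, value):
    """Relative-frequency estimate of P(attribute = value | label)."""
    return attr_counts[label][(attribute, value)] / class_counts[label]


# Hypothetical labeled records.
samples = [
    ("attack", "GET", 3),
    ("attack", "POST", 3),
    ("normal", "GET", 1),
    ("normal", "GET", 2),
]
class_counts, attr_counts = count_feature_statistics(samples)
p_get_given_attack = conditional_probability(
    attr_counts, class_counts, "attack", "request_type", "GET"
)
```

With these records, one of the two attack requests uses GET, so the estimate is 0.5.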

(5) Data storage

The collected data are stored and managed in a database for use in model training and prediction. This work requires specialized database management techniques to ensure the security and reliability of the data.

2. Feature extraction and selection.

Features are extracted from the input data by a specific algorithm and then selected, so as to avoid overfitting and excessive complexity. This mainly comprises the following steps:

(1) Feature extraction

The purpose of feature extraction is to convert the raw data into feature vectors for subsequent analysis and processing. Many feature extraction methods exist; the application extracts character features, bag-of-words model features, and semantic features.
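
A minimal sketch of the first two feature families, character features and bag-of-words features, is given below. The particular statistics, the keyword vocabulary `SQL_KEYWORDS`, and the function names are illustrative assumptions; the application does not list them, and semantic features are not covered here.

```python
import re
from collections import Counter

# Illustrative vocabulary only; not taken from the application.
SQL_KEYWORDS = ("select", "union", "or", "and", "drop", "insert", "sleep")


def char_features(text):
    """Simple character-level statistics of a request string."""
    return {
        "length": len(text),
        "n_quotes": text.count("'") + text.count('"'),
        "n_comment_marks": text.count("--") + text.count("#"),
        "n_digits": sum(ch.isdigit() for ch in text),
    }


def bag_of_words(text, vocabulary=SQL_KEYWORDS):
    """Count occurrences of each vocabulary word in the lowercased text."""
    tokens = Counter(re.findall(r"[a-z_]+", text.lower()))
    return [tokens[word] for word in vocabulary]


def to_vector(text):
    """Concatenate character features (sorted by name) and word counts."""
    chars = char_features(text)
    return [chars[name] for name in sorted(chars)] + bag_of_words(text)


request = "id=1' OR '1'='1' -- "
vector = to_vector(request)
```

Each request becomes a fixed-length numeric vector (4 character features plus 7 word counts here), which is the form the later selection and dimension-reduction steps expect.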

(2) Feature selection

The purpose of feature selection is to select an optimal feature subset from among all possible features to improve the accuracy and efficiency of the model. The present application uses an embedded feature selection method.

The embedded feature selection method integrates feature selection into the training process of the classification algorithm, which avoids the low search efficiency caused by a large feature search space.
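
The application does not name a specific embedded method. One common instance is L1-regularized logistic regression, where training itself drives the weights of uninformative features to exactly zero. The sketch below implements it with proximal gradient descent in plain Python; the omission of an intercept, the hyperparameter values, and the toy data are simplifying assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a weight toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_logistic_select(X, y, lam=0.13, lr=0.5, epochs=500):
    """Train L1-penalized logistic regression (no intercept) by proximal
    gradient descent; return the weights and the indices of kept features."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    return w, selected


# Toy data: feature 0 equals the label, feature 1 is unrelated to it.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
weights, selected = l1_logistic_select(X, y)
```

On this toy data the weight of the unrelated feature stays at exactly zero, so only feature 0 is kept; no separate search over feature subsets is needed.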

(3) Feature dimension reduction

Feature dimension reduction is the mapping of high-dimensional feature vectors to low-dimensional space to reduce computation effort and complexity of the model. The application adopts a Principal Component Analysis (PCA) method to perform characteristic dimension reduction.
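
As a small illustration of PCA, the sketch below finds the first principal component of a data set by power iteration on the sample covariance matrix and projects each row onto it. It is written in plain Python for self-containment; a practical system would normally use a numerical library and keep several components. The function names and the toy data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction) of the first principal component."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, direction):
    """Coordinate of one row along the principal direction."""
    return sum((x - m) * u for x, m, u in zip(row, mean, direction))


# Toy 3-dimensional data that varies only along the direction (1, 2, 0).
rows = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
mean, direction = first_principal_component(rows)
scores = [project(r, mean, direction) for r in rows]
```

Here three input dimensions are reduced to a single score per row without losing any of the variance in the toy data.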

3. Establishment of an improved Bayesian algorithm.

In a traditional SQL injection detection model based on Bayesian theory, techniques such as conditional probability are used to perform correlation calculations on the features, yielding the final labels and the statistics required for evaluation. The improved Bayesian algorithm introduces prior knowledge and model optimization techniques to increase the stability and accuracy of the model, and adjusts the model parameters to further improve accuracy.

(1) Introduction of a priori knowledge

When handling a practical problem, the model is adjusted according to prior knowledge from the relevant field, which improves its accuracy. In a Bayesian algorithm, an empirical prior can be determined from historical data or expert knowledge and then added to the model. In addition, a Bayesian optimization algorithm can incorporate the empirical prior into the optimization process to improve the optimization result.
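
One standard way to add an empirical prior to a Bayesian estimate is through pseudo-counts: under a Beta(a, b) prior, the posterior mean of a Bernoulli probability after observing s successes in n trials is (s + a) / (n + a + b). The application does not state which form of prior it uses, so the sketch below is only an assumed example; the function name and the numbers are hypothetical.

```python
def posterior_mean(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Bernoulli parameter under a Beta(prior_a, prior_b) prior."""
    return (successes + prior_a) / (trials + prior_a + prior_b)


# Hypothetical observation: the token "union" appeared in 0 of 5 labeled
# attack requests, so the plain relative frequency is zero.
relative_frequency = 0 / 5

# Hypothetical expert prior worth 10 pseudo-observations, 8 of which
# contain "union". The estimate no longer collapses to zero.
with_prior = posterior_mean(0, 5, prior_a=8.0, prior_b=2.0)
```

With little data the prior dominates the estimate; as labeled data accumulate, the observed counts take over.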

(2) Introduction of model optimization techniques

The Gaussian Process Regression (GPR) method is used to optimize the performance of the Bayesian algorithm.
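
The application does not describe how GPR is applied. One plausible reading, assumed here, is that a Gaussian process serves as a surrogate model of validation performance as a function of a hyperparameter, so that promising settings can be chosen without retraining at every candidate. The sketch below computes the GPR posterior mean with an RBF kernel in plain Python; the hyperparameter (log smoothing strength), the F1 values, and all function names are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel for scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(xs, ys, noise=1e-6, length_scale=1.0):
    """Return the posterior-mean function of a zero-mean GP fitted to (xs, ys)."""
    K = [
        [rbf(a, b, length_scale) + (noise if i == j else 0.0)
         for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    alpha = solve_cholesky(cholesky(K), ys)
    return lambda x: sum(rbf(x, xi, length_scale) * ai for xi, ai in zip(xs, alpha))


# Hypothetical validation F1 scores measured at three smoothing strengths.
log_alphas = [-2.0, 0.0, 2.0]
f1_scores = [0.80, 0.90, 0.70]
predict = gp_fit(log_alphas, f1_scores)
candidates = [x / 10 for x in range(-20, 21)]
best = max(candidates, key=predict)
```

A full Bayesian optimization loop would also use the posterior variance to decide where to evaluate next; only the mean is shown here.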

4. Model training and prediction.

The model is trained using labeled data, and the probability of each parameter is calculated, in the same way as in the basic Bayesian algorithm.

The performance of the model is tested using a test set, and indexes such as classification accuracy, recall, and F1 value are calculated. Based on the test results, the algorithm can be adjusted and optimized to improve performance.

Part of the data is used as learning samples to train the model, and the model is optimized to obtain the final SQL injection detection model. For new SQL request data, the trained model makes a prediction and judges whether the request exhibits SQL injection attack behavior.

Compared with the Naive Bayes algorithm, the improved Bayesian algorithm remedies its shortcomings while offering advantages such as fast learning, high accuracy, and suitability for online processing of data streams. The method can effectively identify and intercept SQL injection attacks, thereby helping safeguard enterprise security and user privacy.
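
The three test indexes can be computed from predicted and true labels as sketched below, treating "attack" as the positive class. The function name and the five sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="attack"):
    """Return accuracy, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Hypothetical test-set labels and predictions.
y_true = ["attack", "attack", "normal", "normal", "attack"]
y_pred = ["attack", "normal", "normal", "attack", "attack"]
metrics = classification_metrics(y_true, y_pred)
```

In this example there are 2 true positives, 1 false positive, and 1 false negative, giving an accuracy of 0.6 and recall and F1 of 2/3 each.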
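
The train-then-predict flow can be sketched with a basic token-level Bayesian classifier. This is the baseline algorithm with additive smoothing, not the improved model of the application, whose details are not disclosed; the class name `TokenBayesDetector`, the tokenizer, and the four training requests are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a request into lowercase words and a few SQL-relevant symbols."""
    return re.findall(r"[a-z_]+|['\"=;#-]", text.lower())


class TokenBayesDetector:
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_counts = Counter()
        self.token_counts = {}
        self.vocabulary = set()

    def fit(self, texts, labels):
        """Count classes and per-class tokens from labeled requests."""
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            counts = self.token_counts.setdefault(label, Counter())
            for token in tokenize(text):
                counts[token] += 1
                self.vocabulary.add(token)
        return self

    def log_posterior(self, text, label):
        """Unnormalized log P(label | text) with additive smoothing."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.smoothing * len(self.vocabulary)
        for token in tokenize(text):
            if token in self.vocabulary:  # ignore tokens never seen in training
                score += math.log((counts[token] + self.smoothing) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda label: self.log_posterior(text, label))


# Hypothetical labeled learning samples.
train_texts = [
    "id=1' or '1'='1",
    "name=x' union select password from users --",
    "id=42",
    "name=alice&page=2",
]
train_labels = ["attack", "attack", "normal", "normal"]
detector = TokenBayesDetector().fit(train_texts, train_labels)
verdict = detector.predict("id=7' or 'a'='a")
```

The new request is classified as an attack because its quote and `or` tokens are far more frequent in the attack samples than in the normal ones.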

It is to be understood that the construction and arrangement of the application herein shown in the various exemplary embodiments is illustrative only. Although only a few embodiments have been described in detail in this disclosure, those skilled in the art who review this disclosure will readily appreciate that many modifications are possible (e.g., variations in sizes, dimensions, structures, shapes and proportions of the various elements, values of parameters (e.g., temperature, pressure, etc.), mounting arrangements, use of materials, colors, orientations, etc.) without materially departing from the novel teachings and advantages of the subject matter described in this application. For example, elements shown as integrally formed may be constructed of multiple parts or elements, the position of elements may be reversed or otherwise varied, and the nature or number of discrete elements or positions may be altered or varied. Accordingly, all such modifications are intended to be included within the scope of present application. The order or sequence of any process or method steps may be varied or re-sequenced according to alternative embodiments. In the claims, any means-plus-function clause is intended to cover the structures described herein as performing the recited function and not only structural equivalents but also equivalent structures. Other substitutions, modifications, changes and omissions may be made in the design, operating conditions and arrangement of the exemplary embodiments without departing from the scope of the present applications. Therefore, the application is not limited to the specific embodiments, but extends to various modifications that nevertheless fall within the scope of the appended claims.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An SQL injection detection method based on improved Bayesian theory, characterized by comprising the following steps:

collecting SQL request data;

preprocessing the collected SQL request data;

performing feature selection on the preprocessed SQL request data;

counting feature quantities;

storing the SQL request data in a database;

performing secondary processing on the SQL request data in the database to avoid overfitting and excessive complexity;

establishing an improved SQL injection detection model;

training and optimizing the improved SQL injection detection model to obtain a trained improved SQL injection detection model;

and, for new SQL request data, predicting through the trained improved SQL injection detection model and judging whether the new SQL request exhibits SQL injection attack behavior.

2. The SQL injection detection method based on improved Bayesian theory according to claim 1, wherein the step of preprocessing the collected SQL request data comprises: character set conversion, HTML tag removal, transcoding, and URL extraction.

3. The SQL injection detection method based on improved Bayesian theory according to claim 2, wherein the feature quantities in the step of counting feature quantities comprise: category; attribute information; and feature distribution.

4. The SQL injection detection method based on improved Bayesian theory according to claim 3, wherein the "category" comprises: SQL attack / normal request; the "attribute information" comprises: request type and number of parameters; and the "feature distribution" comprises: the probability distribution of SQL injection occurrences.

5. The SQL injection detection method based on improved Bayesian theory according to claim 4, wherein the step of performing secondary processing on the SQL request data in the database to avoid overfitting and excessive complexity comprises the steps of: extracting features using character features, bag-of-words model features, and semantic features; selecting features using an embedded feature selection method; and performing feature dimension reduction using a PCA method.

6. The SQL injection detection method based on improved Bayesian theory according to claim 3, wherein the step of establishing an improved SQL injection detection model comprises: introducing prior knowledge into a traditional SQL injection detection model based on Bayesian theory; and using a Gaussian process regression method to optimize the performance of the Bayesian algorithm.

7. The SQL injection detection method based on improved Bayesian theory according to claim 5, wherein the step of training and optimizing the improved SQL injection detection model comprises the steps of: training the model with labeled data and calculating the probability of each parameter; and testing the performance of the model with a test set and calculating the test indexes.

8. The SQL injection detection method based on improved Bayesian theory according to claim 7, wherein the test indexes comprise: classification accuracy, recall, and F1 value.

9. The SQL injection detection method based on improved Bayesian theory according to any one of claims 1 to 8, wherein the data sources from which the SQL request data are collected comprise: access logs of the Web server and message data captured by a network sniffing tool.

10. An electronic device, comprising:

a computer program for executing the SQL injection detection method based on improved Bayesian theory according to any one of claims 1 to 9;

a memory for storing a computer program;

a processor for executing a computer program.