CN117370975B

CN117370975B - SQL injection detection method and system based on deep learning

Info

Publication number: CN117370975B
Application number: CN202311678372.5A
Authority: CN
Inventors: 李东明; 高云; 肖振峰
Original assignee: Guoren Property Insurance Co ltd
Current assignee: Guoren Property Insurance Co ltd
Priority date: 2023-12-08
Filing date: 2023-12-08
Publication date: 2024-03-26
Anticipated expiration: 2043-12-08
Also published as: CN117370975A

Abstract

The invention discloses a deep-learning-based SQL injection detection method and system. The method comprises the following steps: acquiring a data set of SQL query statements; for each SQL query statement, selecting the most relevant feature subset using a sparse group lasso method; each participant building a local deep learning model based on the selected feature subset, wherein the parameters of each deep learning model are regularized using the sparse group lasso method; aggregating the model parameters of the participants by federated learning; each participant training its own deep learning model; and detecting new SQL query statements based on the trained deep learning model. The invention protects data privacy, reduces the risk of data leakage and attack, and enhances data security; it also improves the interpretability and generalization ability of the model and reduces the risk of overfitting.

Description

SQL injection detection method and system based on deep learning

Technical Field

The invention belongs to the field of computers, and particularly relates to an SQL injection detection method and system based on deep learning.

Background

With the rapid development of information technology, network security has become a problem that must be faced. Frequent network attacks cause significant losses. Intrusion detection is a common network security defense technique: network traffic is analyzed to identify traffic whose characteristics differ from normal traffic, in particular attack behavior from various malicious programs. Deep learning has been proposed to learn features from large amounts of unordered, high-dimensional data; its advantage is that, with reasonable training parameters, a model can be built that selects suitable features. Much current research applies deep learning to SQL injection detection systems for security defense. Deep learning requires a large data set for training, and model performance generally improves as the training set grows. The richer the network traffic data set, the higher the accuracy of the final intrusion detection model, but collecting network traffic centrally raises privacy concerns. Existing deep-learning-based intrusion detection relies on local network traffic for model training, and different network operators and organizations typically do not share their traffic to construct a complete intrusion detection data set.

Many deep-learning-based SQL injection detection methods exist, and their high detection accuracy demonstrates the feasibility of deep learning for SQL injection detection. However, most methods still suffer from problems such as insufficient data sets. Common centralized deep learning methods require network traffic to be collected from various organizations, which can lead to privacy concerns. Approaches that change the model structure or generate new data sets are complicated and difficult to apply in real network scenarios.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides an sql injection detection method based on deep learning, which comprises the following steps:

step S101, acquiring a data set of SQL query statements, wherein the data set comprises normal query statements and malicious SQL query statements;

step S103, extracting features related to sql injection attack for each sql query statement, and selecting a most related feature subset by using a sparse group lasso method;

step S105, each participant builds a local deep learning model based on the selected feature subset, wherein the parameters of each deep learning model are regularized by using a sparse group lasso method;

step S107, aggregating the model parameters of all participants by federated learning;

step S109, each participant trains the respective deep learning model;

and step S1011, detecting a new sql query statement based on the trained deep learning model.

The features related to the sql injection attack at least comprise keywords, special characters and query structures.

Wherein selecting the most relevant feature subset using the sparse group lasso method in step S103 comprises: calculating the feature weights using the sparse group lasso method, and selecting the features with the largest weights as the most relevant feature subset.

Calculating the feature weights using the sparse group lasso method and selecting the features with the largest weights as the most relevant feature subset specifically comprises the following steps:

the features in the dataset are organized into a feature matrix, each row representing a query, and each column representing a feature;

for each query, determining the category to which the query belongs, and encoding the category as a target vector;

constructing an optimization problem, and adding a sparse group lasso penalty term into an objective function;

solving the optimization problem by gradient descent to obtain the feature weight vector w;

and setting a threshold according to the weight, and selecting the feature exceeding the threshold as the most relevant feature.

Wherein the objective function is expressed as follows:

$$\min_{w} \; \|y - Xw\|_2^2 + \lambda_1 \sum_{j} w_j^2 + \lambda_2 \sum_{k} \|w_k\|_2$$

wherein y is the target vector; X is the feature matrix; w is the feature weight vector; $\sum_{j} w_j^2$ denotes summing over the weight $w_j$ of each feature and applying L2 regularization; $\sum_{k}$ denotes summing over the different feature groups, which correspond to the keywords, the special characters and the query structure, where j denotes the j-th feature and k denotes the k-th feature group; $\|w_k\|_2$ denotes the L2 norm within a feature group and measures the weight of that group; and $\lambda_1$ and $\lambda_2$ are hyperparameters controlling the strength of the L2 regularization and of the sparse group lasso penalty, adjusted as required.
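The feature selection procedure described above can be illustrated with a minimal sketch. It assumes a squared-error data term averaged over the queries (the averaging is an assumption made here for a stable step size), an L2 penalty on each weight, and an L2-norm penalty per feature group, minimized by plain (sub)gradient descent. The function names, the toy data, the hyperparameter values and the threshold are all hypothetical and are not taken from the patent.

```python
import math

def sparse_group_lasso_weights(X, y, groups, lam1=0.01, lam2=0.01, lr=0.1, steps=500):
    """Minimize mean squared error + lam1 * sum_j w_j^2 + lam2 * sum_k ||w_k||_2
    by (sub)gradient descent. `groups` lists the column indices of each feature group."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        # Residual of the linear prediction for every query.
        resid = [sum(row[j] * w[j] for j in range(d)) - target
                 for row, target in zip(X, y)]
        # Gradient of the data term plus the per-weight L2 penalty.
        grad = [2.0 / n * sum(resid[i] * X[i][j] for i in range(n)) + 2.0 * lam1 * w[j]
                for j in range(d)]
        # Subgradient of the group-wise L2 norm (zero when the group is exactly zero).
        for group in groups:
            norm = math.sqrt(sum(w[j] ** 2 for j in group))
            if norm > 0.0:
                for j in group:
                    grad[j] += lam2 * w[j] / norm
        w = [w[j] - lr * grad[j] for j in range(d)]
    return w

def select_features(w, threshold):
    """Return the indices of features whose absolute weight exceeds the threshold."""
    return [j for j, wj in enumerate(w) if abs(wj) > threshold]

# Toy data: each row is a query, each column a feature; the label equals column 0.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 0 + 1, 0, 0]
weights = sparse_group_lasso_weights(X, y, groups=[[0, 1], [2]])
selected = select_features(weights, threshold=0.5)
```

In this toy case only the first column carries the label, so it is the only feature expected to pass the threshold. A proximal-gradient solver would produce exactly zero weights for discarded groups; plain gradient descent, as named in the text, only drives them close to zero, which is why a threshold is applied afterwards.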

The local deep learning model is an initialized global model that is distributed to all participants by a central server before federated learning begins.

Wherein, the step S105 includes:

distributing the selected feature subset to each participant;

each participant uses the distributed feature subsets to construct a local deep learning model;

defining a cross entropy loss function for the local model of each participant;

each participant trains the local model using a local training set to optimize the loss function with sparse group lasso penalty terms.

Wherein optimizing the loss function with sparse group lasso penalty term comprises:

adding a sparse group lasso penalty term to the local loss function to regularize the model parameters, according to the following formula:

$$L = L_{local} + \lambda_1 \sum_{j} |w_j| + \lambda_2 \|w\|_2$$

wherein $L_{local}$ denotes the local loss function; $\sum_{j} |w_j|$ denotes summing over the weights $w_j$ of the model parameters and applying L1 regularization; $\|w\|_2$ denotes the L2 norm of the model parameters and measures the weight of the model; and $\lambda_1$ and $\lambda_2$ are hyperparameters controlling the strength of the L1 and L2 regularization, adjusted as required.
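As a small illustration of the penalized local loss, the sketch below combines a binary cross-entropy term with an L1 term and an L2-norm term over a flat list of model weights. The function names and default hyperparameter values are hypothetical; treating the model parameters as a single flat list is a simplifying assumption.

```python
import math

def cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(labels)

def regularized_loss(probs, labels, weights, lam1=0.01, lam2=0.01):
    """Local loss plus lam1 * sum_j |w_j| plus lam2 * ||w||_2."""
    l1_term = sum(abs(w) for w in weights)
    l2_norm = math.sqrt(sum(w * w for w in weights))
    return cross_entropy(probs, labels) + lam1 * l1_term + lam2 * l2_norm

# Two queries predicted at probability 0.5, with model weights (3, -4).
loss = regularized_loss([0.5, 0.5], [1, 0], [3.0, -4.0])
```

For the weights (3, -4) the L1 term is 7 and the L2 norm is 5, so with both hyperparameters at 0.01 the penalty adds 0.12 to the cross-entropy of ln 2.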

Wherein, the step S107 includes:

after each participant completes the local model training, model parameters are sent back to a central server for aggregation;

after receiving the model parameters of each participant, the central server aggregates the model parameters according to a preset aggregation method to obtain an updated global model;

distributing the updated global model to each participant;

repeating the above three steps until reaching the iteration stop condition.
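The train, aggregate and redistribute loop of step S107 can be sketched as follows. The text only refers to a preset aggregation method; the sample-size-weighted average used here (federated averaging) is an assumption, as are the function names, the `local_train` callback and the stop condition based on the largest parameter change.

```python
def federated_average(client_params, client_sizes):
    """Average parameter vectors, weighting each client by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(params[j] * size for params, size in zip(client_params, client_sizes)) / total
            for j in range(dim)]

def run_federated_rounds(global_params, clients, local_train, max_rounds=10, tol=1e-6):
    """Repeat: local training on each client, aggregation on the server, redistribution.
    Stops after max_rounds or when the global parameters change by less than tol."""
    for _ in range(max_rounds):
        # Each participant starts from a copy of the current global model.
        updates = [local_train(list(global_params), data) for data in clients]
        sizes = [len(data) for data in clients]
        new_params = federated_average(updates, sizes)
        change = max(abs(a - b) for a, b in zip(new_params, global_params))
        global_params = new_params  # the updated global model is redistributed
        if change < tol:
            break
    return global_params

def toy_train(params, data):
    """Hypothetical local training: move the single parameter halfway to the local mean."""
    mean = sum(data) / len(data)
    return [params[0] + 0.5 * (mean - params[0])]

result = run_federated_rounds([0.0], [[2.0, 2.0], [8.0]], toy_train, max_rounds=50)
```

In the toy run the single parameter converges toward 4.0, the sample-weighted mean of the two local data sets, without either data set leaving its participant.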

The invention also provides an sql injection detection system based on deep learning, which is characterized by comprising:

the acquisition module is used for acquiring a data set of the sql query statement, wherein the data set comprises a normal query statement and a malicious sql query statement;

the feature subset extraction module is used for extracting features related to the sql injection attack for each sql query statement and selecting the most relevant feature subset by using a sparse group lasso method;

the model construction module is used for constructing a local deep learning model by each participant based on the selected feature subset, wherein the parameters of the respective deep learning model are regularized by using a sparse group lasso method;

the parameter aggregation module is used for aggregating the model parameters of all participants by federated learning;

the model training module is used for training the respective deep learning model by each participant;

the detection module is used for detecting a new sql query statement based on a trained deep learning model;

and the central server is used for assisting the modules to update parameters of the deep learning model of each participant.

Compared with the prior art, the invention has the following beneficial effects:

1. Privacy protection: federated learning allows participants to train models locally without sharing sensitive raw data. Data privacy is thus effectively protected, which is particularly important for tasks involving sensitive information (such as SQL injection detection);

2. Data security: federated learning does not require raw data to be transmitted to a central server; only model parameters are transmitted. This reduces the risk of data leakage and attack and enhances data security;

3. Distributed feature selection: the sparse group lasso can be used to perform feature selection on each participant's local data. By selecting the features with the largest weights, the most relevant feature subset is obtained. This reduces the feature dimension and improves the interpretability and generalization ability of the model;

4. Model generalization ability: the sparse group lasso method regularizes the model parameters, encouraging the model to produce sparse weights. This improves the generalization ability of the model and reduces the risk of overfitting;

5. Merging multi-party models: federated learning allows the participants' model parameters to be aggregated on a central server to generate a global model. Combining the multi-party models fuses the knowledge and features of all parties and further improves model performance.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

Fig. 1 is a flowchart of an SQL injection detection method based on deep learning according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "plurality" generally means at least two.

It should be understood that although the terms first, second, third, etc. may be used to describe … … in embodiments of the present invention, these … … should not be limited to these terms. These terms are only used to distinguish … …. For example, the first … … may also be referred to as the second … …, and similarly the second … … may also be referred to as the first … …, without departing from the scope of embodiments of the present invention.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B may represent: A exists alone, A and B both exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

The word "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event)", depending on the context.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a commodity or device comprising such element.

Alternative embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Embodiment 1

As shown in fig. 1, the invention discloses an sql injection detection method based on deep learning, which comprises the following steps:

step S109, each participant trains the respective deep learning model;

In one embodiment, the aggregated model is used to detect new SQL query statements. Each participant feeds the query statement to the aggregated model, which outputs a prediction indicating whether an SQL injection attack is present.

Embodiment 2

The invention provides an sql injection detection method based on deep learning, which comprises the following steps:

step S109, each participant trains the respective deep learning model;

Features associated with SQL injection attacks are extracted locally by each participant. These features may include the following:

Keywords: keywords commonly used in SQL query statements, such as SELECT, INSERT, UPDATE and DELETE. These keywords are often associated with SQL injection attacks.

Special characters: special characters in SQL query statements, such as quotation marks (') and semicolons (;). These characters are often used in SQL injection attacks to bypass input validation and inject malicious code.

Query structure: the structure and components of SQL query statements, such as table names, column names, operators and logical conditions. SQL injection attacks typically attempt to modify the query structure to achieve the attack.

Each participant (e.g., a different organization or device) collects a local data set containing normal and malicious SQL query statements. These query statements should be labeled and vectorized.

In one embodiment, the features for SQL injection detection are typically extracted as follows:

Keyword feature extraction:

One-hot encoding: each keyword is mapped to a binary vector in which the position corresponding to the keyword is 1 and all other positions are 0.

TF-IDF (term frequency-inverse document frequency): measures the importance of a keyword in a query statement and is calculated as follows:

TF = (number of occurrences of the keyword in the query statement) / (total number of words in the query statement)

IDF = log((total number of query statements) / (number of query statements containing the keyword))

TF-IDF = TF * IDF
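The keyword encodings above can be made concrete with a short sketch. It assumes a simple tokenizer that upper-cases the query and keeps runs of letters and underscores, and it uses the natural logarithm for IDF; the tokenizer, the function names and the sample queries are illustrative assumptions rather than part of the described method.

```python
import math
import re

def tokenize(query):
    """Split a query into upper-cased word tokens (letters and underscores only)."""
    return re.findall(r"[A-Za-z_]+", query.upper())

def one_hot(keyword, vocabulary):
    """Binary vector with a 1 at the position of the keyword in the vocabulary."""
    return [1 if entry == keyword else 0 for entry in vocabulary]

def tf_idf(keyword, query, corpus):
    """TF-IDF of a keyword for one query, following the three formulas in the text."""
    tokens = tokenize(query)
    tf = tokens.count(keyword) / len(tokens) if tokens else 0.0
    containing = sum(1 for q in corpus if keyword in tokenize(q))
    idf = math.log(len(corpus) / containing) if containing else 0.0
    return tf * idf

corpus = [
    "SELECT name FROM users",
    "SELECT id FROM users UNION SELECT password FROM admin",
    "DELETE FROM logs",
]
union_score = tf_idf("UNION", corpus[1], corpus)  # rare keyword, positive score
from_score = tf_idf("FROM", corpus[1], corpus)    # appears in every query, IDF is 0
```

A keyword that occurs in every query, such as FROM here, gets an IDF of zero and therefore carries no weight, while a keyword confined to a few queries, such as UNION, is emphasized.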

Special character feature extraction:

Occurrence count: the number of times special characters appear in the query statement is used as a feature.

Proportion: the ratio of special characters to the total number of characters in the query statement is used as a feature.

Query structure feature extraction:

N-gram features: the query statement is divided into subsequences of N consecutive elements, each of which is used as a feature.

Syntax parse tree: the syntactic structure of the query statement is parsed, and the nodes and edges of the resulting tree are extracted as features.
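The special character and N-gram features can be sketched in a few lines. The set of characters treated as special, the whitespace tokenization and the function names below are assumptions chosen for illustration; the syntax parse tree features are not covered because they depend on a specific SQL parser.

```python
from collections import Counter

# Hypothetical set of characters treated as special: quotes, semicolon, dash, hash.
SPECIAL_CHARS = "'\";-#"

def special_char_features(query):
    """Return (occurrence count, proportion of all characters) for special characters."""
    count = sum(1 for ch in query if ch in SPECIAL_CHARS)
    ratio = count / len(query) if query else 0.0
    return count, ratio

def token_ngrams(query, n=2):
    """Count the subsequences of n consecutive whitespace-separated tokens."""
    tokens = query.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

count, ratio = special_char_features("a'b;")
bigrams = token_ngrams("a b a b", n=2)
```

Character-level N-grams are an equally plausible reading of the text; replacing `query.split()` with `list(query)` would give that variant.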

For each query, its category (normal query or SQL injection query) is determined and encoded as a target vector. For example, a normal query may be represented by 0 and an SQL injection query by 1.

Wherein the objective function is expressed as follows:

$$\min_{w} \; \|y - Xw\|_2^2 + \lambda_1 \sum_{j} w_j^2 + \lambda_2 \sum_{k} \|w_k\|_2$$

wherein y is the target vector; X is the feature matrix; w is the feature weight vector; $\sum_{j} w_j^2$ denotes summing over the weight $w_j$ of each feature and applying L2 regularization; $\sum_{k}$ denotes summing over the different feature groups, which correspond to the keywords, the special characters and the query structure, where j denotes the j-th feature and k denotes the k-th feature group; $\|w_k\|_2$ denotes the L2 norm within a feature group and measures the weight of that group; and $\lambda_1$ and $\lambda_2$ are hyperparameters controlling the strength of the L2 regularization and of the sparse group lasso penalty, adjusted as required.

Common deep learning models, such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), may be used to learn patterns and relationships between features.

Wherein, the step S105 includes:

distributing the selected feature subset to each participant;

defining a cross entropy loss function for the local model of each participant;

Wherein, the step S107 includes:

distributing the updated global model to each participant;

repeating the above three steps until reaching the iteration stop condition.

In one embodiment, to protect the data privacy of the participants in federated learning, a secure aggregation algorithm such as encrypted aggregation or differentially private aggregation is often employed.

Encrypted aggregation:

Encrypted aggregation uses encryption techniques to protect the model parameters of the participants. Common encrypted aggregation techniques include homomorphic encryption and secure multi-party computation.

Homomorphic encryption: homomorphic encryption is an encryption method that allows computation on data while it remains encrypted. Each participant encrypts its model parameters with homomorphic encryption and sends the ciphertexts to the aggregator, which performs the aggregation directly on the ciphertexts without learning the original parameter values.

Secure multi-party computation: secure multi-party computation (SMPC) is a class of protocols that allows multiple parties to compute jointly without revealing their private data. Each participant protects its model parameters using the SMPC protocol and sends the protected parameters to the aggregator, which carries out the aggregation under the same protocol.
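One simple way to realize aggregation without revealing individual parameters is additive secret sharing, a basic building block of secure multi-party computation. The sketch below is an assumption about how such a protocol could look and is not the scheme specified in the text: the modulus, the fixed-point scale and the function names are hypothetical, and all parties are simulated inside one process.

```python
import secrets

MODULUS = 2**61 - 1   # hypothetical prime modulus for the share arithmetic
SCALE = 10**6         # fixed-point scale used to encode real-valued parameters

def encode(value):
    """Map a real number to a residue modulo MODULUS (negatives wrap around)."""
    return int(round(value * SCALE)) % MODULUS

def decode(residue):
    """Inverse of encode: residues above MODULUS // 2 represent negative numbers."""
    if residue > MODULUS // 2:
        residue -= MODULUS
    return residue / SCALE

def make_shares(value, n_shares):
    """Split a value into n random shares that sum to its encoding modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    shares.append((encode(value) - sum(shares)) % MODULUS)
    return shares

def secure_sum(values):
    """Each participant shares its value; party j only ever sees the j-th shares."""
    n = len(values)
    all_shares = [make_shares(v, n) for v in values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return decode(sum(partial_sums) % MODULUS)

total = secure_sum([0.5, -1.25, 2.0])
```

Any single share is a uniformly random residue and reveals nothing about the parameter it came from; only the sum of all partial sums decodes to the aggregate, here 1.25. Dividing that sum by the number of participants gives the averaged parameter.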

Differentially private aggregation:

Differentially private aggregation protects the privacy of participants by adding noise. Differential privacy is a privacy protection method that protects individual data by introducing a controlled degree of uncertainty into the computed results. Each participant applies a differential privacy mechanism locally to its model parameters and then sends the noisy parameters to the aggregator for aggregation.

The specific aggregation formula depends on the differential privacy mechanism used. Common mechanisms include the Laplace mechanism and the Gaussian mechanism, which achieve privacy protection by adding noise drawn from a Laplace or Gaussian distribution to the model parameters.

Taking the Laplace mechanism as an example, differential privacy is achieved by adding Laplace noise to the model parameters. The noise follows the Laplace distribution, whose probability density function is:

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$

where x is the noise value, $\mu$ is the center (location) of the noise, and b is the scale parameter of the noise.

In differentially private aggregation, each participant applies the Laplace mechanism locally to its model parameters and then sends the noisy parameters to the aggregator for aggregation. The aggregation is calculated as follows:

$$\text{Aggregated\_parameter} = \frac{1}{N} \sum_{i=1}^{N} \left(\text{Private\_parameter}_i + \text{Laplace\_noise}_i\right)$$

where N is the number of participants, $\sum$ denotes the summation, Private_parameter is the private parameter of a participant, and Laplace_noise is noise sampled from the Laplace distribution.
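The Laplace-based aggregation can be sketched with the standard library alone. The sketch assumes zero-centered noise and samples it as the difference of two exponential variates, which follows a Laplace distribution with the same scale; the function names, the noise scale and the number of simulated participants are illustrative assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_laplace_noise(params, scale, rng):
    """Participant side: perturb every model parameter before it leaves the device."""
    return [p + laplace_noise(scale, rng) for p in params]

def aggregate(noisy_params):
    """Aggregator side: average the noisy parameter vectors of all participants."""
    n = len(noisy_params)
    dim = len(noisy_params[0])
    return [sum(params[j] for params in noisy_params) / n for j in range(dim)]

rng = random.Random(42)
# 2000 simulated participants that all hold the same true parameter value 1.0.
noisy = [add_laplace_noise([1.0], scale=0.1, rng=rng) for _ in range(2000)]
estimate = aggregate(noisy)
```

Because the noise is zero-mean, it largely cancels in the average as the number of participants grows, so the aggregate stays close to the true value while each individual upload is perturbed. In the Laplace mechanism the scale b is usually set to the sensitivity of the released quantity divided by the privacy budget epsilon.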

Embodiment 3

Embodiment 4

The disclosed embodiments provide a non-transitory computer storage medium storing computer executable instructions that perform the method steps described in the embodiments above.

It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.

The foregoing description of the preferred embodiments of the present invention has been presented for purposes of clarity and understanding, and is not intended to limit the invention to the particular embodiments disclosed, but is intended to cover all modifications, alternatives, and improvements within the spirit and scope of the invention as outlined by the appended claims.

Claims

1. An sql injection detection method based on deep learning, which is characterized by comprising the following steps:

step S109, each participant trains the respective deep learning model;

step S1011, detecting a new sql query statement based on a trained deep learning model;

the local deep learning model is an initialized global model that is distributed to all participants by a central server before federated learning begins;

wherein the step S105 includes:

distributing the selected feature subset to each participant;

defining a cross entropy loss function for the local model of each participant;

each participant trains the local model using a local training set, optimizing a loss function with a sparse group lasso penalty term according to the following formula:

$$L = L_{local} + \lambda_1 \sum_{j} |w_j| + \lambda_2 \|w\|_2$$

wherein $L_{local}$ denotes the local loss function; $\sum_{j} |w_j|$ denotes summing over the weights $w_j$ of the model parameters and applying L1 regularization; $\|w\|_2$ denotes the L2 norm of the model parameters and measures the weight of the model; and $\lambda_1$ and $\lambda_2$ are hyperparameters controlling the strength of the L1 and L2 regularization, adjusted as required.

2. The method of claim 1, wherein the features related to the SQL injection attack include at least keywords, special characters, and query structures.

3. The method of claim 2, wherein selecting the most relevant feature subset using the sparse group lasso method in step S103 comprises: calculating the feature weights using the sparse group lasso method, and selecting the features with the largest weights as the most relevant feature subset.

4. A method according to claim 3, wherein the feature weights are calculated using a sparse group lasso method, and the feature with the greatest weight is selected as the most relevant feature subset, comprising in particular:

5. The method of claim 1, wherein said step S107 comprises:

distributing the updated global model to each participant;

repeating the above three steps until reaching the iteration stop condition.

6. An sql injection detection system based on deep learning, the system comprising:

the central server is used for assisting the modules to update parameters of the deep learning model of each participant;

the model construction module is specifically used for:

distributing the selected feature subset to each participant;

defining a cross entropy loss function for the local model of each participant;