CN113965377A

CN113965377A - Attack behavior detection method and device

Info

Publication number: CN113965377A
Application number: CN202111226985.6A
Authority: CN
Inventors: 杨鹤
Original assignee: Beijing Topsec Technology Co Ltd; Beijing Topsec Network Security Technology Co Ltd; Beijing Topsec Software Co Ltd
Current assignee: Beijing Topsec Technology Co Ltd; Beijing Topsec Network Security Technology Co Ltd; Beijing Topsec Software Co Ltd
Priority date: 2021-10-21
Filing date: 2021-10-21
Publication date: 2022-01-21

Abstract

The application provides an attack behavior detection method and device, which are applied to the field of network security, and the method comprises the following steps: acquiring URL data to be detected, and extracting a parameter domain field to be detected in the URL data to be detected; preprocessing a parameter domain field to be detected so as to convert the parameter domain field to be detected into a corresponding characteristic vector to be detected; and inputting the characteristic vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected. In the above scheme, the parameter domain field to be detected is preprocessed, so that the parameter domain field to be detected is converted into the corresponding feature vector to be detected, and the detection of the attack behavior is performed based on the feature vector to be detected by using the attack behavior detection model. The parameter domain field in the URL data is directly detected in the detection process, so that the detection accuracy can be improved.

Description

Attack behavior detection method and device

Technical Field

The application relates to the field of network security, in particular to an attack behavior detection method and device.

Background

With the growing popularity of network services, attacks carried out over networks are increasingly common. A common means of network attack is to inject specially crafted parameters into the parameter field of Uniform Resource Locator (URL) data, thereby carrying out attacks such as Structured Query Language (SQL) injection attacks, Cross-Site Scripting (XSS) attacks, and information leakage attacks.

In the prior art, detection methods such as blacklist, rule matching, machine learning and the like are generally adopted to detect the attack behavior. However, the accuracy of the above-described various detection methods is low.

Disclosure of Invention

An object of the embodiments of the present application is to provide a method and an apparatus for detecting an attack behavior, so as to solve the technical problem that the detection method provided in the prior art is low in accuracy.

In a first aspect, an embodiment of the present application provides an attack behavior detection method, including: acquiring Uniform Resource Locator (URL) data to be detected, and extracting a parameter domain field to be detected from the URL data to be detected; preprocessing the parameter domain field to be detected so as to convert it into a corresponding feature vector to be detected; and inputting the feature vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected. In the above scheme, the parameter domain field to be detected is extracted from the URL data, so that detection operates on the parameter domain field itself. Preprocessing converts the parameter domain field to be detected into a corresponding feature vector to be detected, on which the attack behavior detection model performs attack behavior detection. Because the parameter domain field in the URL data is detected directly, the detection accuracy can be improved.

In an optional embodiment, the preprocessing the parameter domain field to be detected to convert the parameter domain field to be detected into a corresponding feature vector to be detected includes: converting the parameter domain field to be detected into parameter domain mapping data to be detected by using a predetermined skip-gram data mapping table; and extracting the characteristic vector to be detected corresponding to the parameter domain mapping data to be detected by using a pre-trained characteristic extraction model. In the scheme, the parameter domain field to be detected in the character form can be converted into the detection parameter domain mapping data in the vector form by using the skip-gram data mapping table, and the relation between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is reduced. In addition, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

In an optional implementation manner, before the acquiring of the URL data to be detected, the method further includes: acquiring sample data, where the sample data comprises positive sample data and negative sample data, the positive sample data comprises normal parameter domain fields, and the negative sample data comprises abnormal parameter domain fields; preprocessing the sample data to convert the sample data into corresponding sample feature vectors; and training a random forest algorithm model by using the sample feature vectors to obtain the attack behavior detection model. In the scheme, before attack detection is carried out, the random forest algorithm model can be trained by using sample data, so that a pre-trained attack behavior detection model is obtained. The random forest algorithm model tolerates noise and outliers well, is less prone to the over-fitting of a single decision tree, and scales and parallelizes well on high-dimensional classification problems, so that the accuracy of attack behavior detection can be improved. In addition, the random forest algorithm model is a data-driven non-parametric classification method that requires no prior knowledge of the classification during training, so that the maintenance cost is low.

In an optional embodiment, the training of a random forest algorithm model by using the sample feature vectors to obtain the attack behavior detection model includes: performing random sampling with replacement on the sample feature vectors by using the bootstrap method to generate n training sets, where n is a positive integer; respectively training n decision tree models by using the n training sets to obtain n trained decision tree models; and forming the attack behavior detection model from the n trained decision tree models. In the scheme, before attack detection is carried out, the random forest algorithm model can be trained by using sample data, so that a pre-trained attack behavior detection model is obtained. The random forest algorithm model tolerates noise and outliers well, is less prone to the over-fitting of a single decision tree, and scales and parallelizes well on high-dimensional classification problems, so that the accuracy of attack behavior detection can be improved. In addition, the random forest algorithm model is a data-driven non-parametric classification method that requires no prior knowledge of the classification during training, so that the maintenance cost is low.
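As an illustration only, the sampling step can be sketched in Python as follows. The function name, the fixed seed and the toy samples are assumptions made for this sketch and are not part of the application; each training set is assumed to have the same size as the original sample set, which is the usual bootstrap convention.

```python
import random


def bootstrap_training_sets(samples, n, seed=0):
    """Draw n bootstrap training sets, each the same size as `samples`.

    Sampling is with replacement, so one sample may appear several times
    in a training set and be absent from another.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return [rng.choices(samples, k=len(samples)) for _ in range(n)]


# Hypothetical (feature_vector, label) pairs: 0 = normal, 1 = abnormal.
samples = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.2, 0.1], 0), ([0.7, 0.9], 1)]
training_sets = bootstrap_training_sets(samples, n=3)
```

Each of the n training sets would then be used to train one decision tree model.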

In an optional embodiment, after the obtaining of the sample data, the method further comprises: generating a corresponding character table according to the URL parameter domain field; training a skip-gram encoder by using the character table and the sample data to obtain a trained skip-gram encoder; and generating the skip-gram data mapping table according to the skip-gram encoder. In the scheme, the skip-gram data mapping table generated by the skip-gram encoder can be used for converting the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relationship between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is reduced.

In an optional embodiment, after the obtaining of the sample data, the method further comprises: inputting the sample data into a pre-trained skip-gram encoder to obtain a sample vector; training a skip-gram model by using the sample vector to obtain a trained skip-gram model; and generating the skip-gram data mapping table according to the skip-gram encoder and the skip-gram model. In the scheme, the skip-gram data mapping table generated by the skip-gram encoder and the skip-gram model can be used for converting the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relationship between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is shortened.

In an optional embodiment, after the obtaining of the sample data, the method further comprises: converting the sample data into sample mapping data by using a predetermined skip-gram data mapping table; and training a convolutional neural network model by using the sample mapping data to obtain the feature extraction model. In the scheme, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

In a second aspect, an embodiment of the present application provides an attack behavior detection apparatus, including: a first acquisition module, used for acquiring Uniform Resource Locator (URL) data to be detected and extracting a parameter domain field to be detected from the URL data to be detected; a first preprocessing module, used for preprocessing the parameter domain field to be detected so as to convert the parameter domain field to be detected into a corresponding feature vector to be detected; and a first input module, used for inputting the feature vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected. In the above scheme, the parameter domain field to be detected is extracted from the URL data, so that detection operates on the parameter domain field itself. Preprocessing converts the parameter domain field to be detected into a corresponding feature vector to be detected, on which the attack behavior detection model performs attack behavior detection. Because the parameter domain field in the URL data is detected directly, the detection accuracy can be improved.

In an optional embodiment, the first preprocessing module is specifically configured to: converting the parameter domain field to be detected into parameter domain mapping data to be detected by using a predetermined skip-gram data mapping table; and extracting the characteristic vector to be detected corresponding to the parameter domain mapping data to be detected by using a pre-trained characteristic extraction model. In the scheme, the parameter domain field to be detected in the character form can be converted into the detection parameter domain mapping data in the vector form by using the skip-gram data mapping table, and the relation between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is reduced. In addition, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

In an optional embodiment, the attack behavior detection apparatus further includes: a second acquisition module, used for acquiring sample data, where the sample data comprises positive sample data and negative sample data, the positive sample data comprises normal parameter domain fields, and the negative sample data comprises abnormal parameter domain fields; a second preprocessing module, used for preprocessing the sample data so as to convert the sample data into corresponding sample feature vectors; and a first training module, used for training a random forest algorithm model by using the sample feature vectors to obtain the attack behavior detection model. In the scheme, before attack detection is carried out, the random forest algorithm model can be trained by using sample data, so that a pre-trained attack behavior detection model is obtained. The random forest algorithm model tolerates noise and outliers well, is less prone to the over-fitting of a single decision tree, and scales and parallelizes well on high-dimensional classification problems, so that the accuracy of attack behavior detection can be improved. In addition, the random forest algorithm model is a data-driven non-parametric classification method that requires no prior knowledge of the classification during training, so that the maintenance cost is low.

In an optional embodiment, the first training module is specifically configured to: perform random sampling with replacement on the sample feature vectors by using the bootstrap method to generate n training sets, where n is a positive integer; respectively train n decision tree models by using the n training sets to obtain n trained decision tree models; and form the attack behavior detection model from the n trained decision tree models. In the scheme, before attack detection is carried out, the random forest algorithm model can be trained by using sample data, so that a pre-trained attack behavior detection model is obtained. The random forest algorithm model tolerates noise and outliers well, is less prone to the over-fitting of a single decision tree, and scales and parallelizes well on high-dimensional classification problems, so that the accuracy of attack behavior detection can be improved. In addition, the random forest algorithm model is a data-driven non-parametric classification method that requires no prior knowledge of the classification during training, so that the maintenance cost is low.

In an optional embodiment, the attack behavior detection apparatus further includes: the first generation module is used for generating a corresponding character table according to the URL parameter domain field; the second training module is used for training the skip-gram encoder by utilizing the character table and the sample data to obtain a trained skip-gram encoder; and the second generation module is used for generating the skip-gram data mapping table according to the skip-gram encoder. In the scheme, the skip-gram data mapping table generated by the skip-gram encoder can be used for converting the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relationship between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is reduced.

In an optional embodiment, the attack behavior detection apparatus further includes: the second input module is used for inputting the sample data into a pre-trained skip-gram encoder to obtain a sample vector; the third training module is used for training the skip-gram model by using the sample vector to obtain the trained skip-gram model; and the third generation module is used for generating the skip-gram data mapping table according to the skip-gram encoder and the skip-gram model. In the scheme, the skip-gram data mapping table generated by the skip-gram encoder and the skip-gram model can be used for converting the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relationship between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is shortened.

In an optional embodiment, the attack behavior detection apparatus further includes: the conversion module is used for converting the sample data into sample mapping data by utilizing a predetermined skip-gram data mapping table; and the fourth training module is used for training a convolutional neural network model by using the sample mapping data to obtain the feature extraction model. In the scheme, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory, and a bus; the processor and the memory are communicated with each other through the bus; the memory stores program instructions executable by the processor, the processor invoking the program instructions to be able to perform the attack behavior detection method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions, which, when executed by a computer, cause the computer to perform the method for detecting an attack behavior according to the first aspect.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 is a flowchart of an attack behavior detection method provided in an embodiment of the present application;

Fig. 2 is a block diagram of an attack behavior detection apparatus according to an embodiment of the present application;

Fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The inventor notices that, among the various attack behaviors carried out through URL data, SQL injection attacks and XSS attacks implemented through the URL parameter domain account for a large proportion and cause serious harm. To detect SQL injection attacks, XSS attacks and the like in URL data, blacklist detection, rule matching detection and machine learning detection are generally adopted.

Blacklist detection means that security personnel predefine blacklists of sensitive characters commonly used to construct malicious attacks such as SQL injection and XSS attacks; after URL data provided by a user is sent to a server, the URL data is first matched against the sensitive characters in the blacklist. If a sensitive character in the blacklist is matched, the URL data is considered to carry attack risk, and the server side discards the request data; if no sensitive character is matched, the server side parses and responds to the request. Blacklist detection is prone to false positives and false negatives; meanwhile, the sensitive character blacklist is costly to maintain and slow to update.

Rule detection means that security personnel construct a rule base from the format characteristics of URLs under malicious attacks such as SQL injection and XSS attacks, and match the format of the URL data provided by the user against the rules. If the matching succeeds, the URL data is considered to carry attack risk, and the server side discards the request data; if the matching fails, the server side parses and responds to the request. The drawbacks of rule detection are that the whole URL data must be examined, which involves considerable redundant data, and that only known attack characteristics can be detected, so the detection accuracy is low.

Machine learning detection uses statistical characteristics of sensitive words or characters as feature attributes, such as the length of the URL, the entropy of the URL, the number and type of sensitive words, and the number and type of parameters, and detects malicious attack behaviors with a decision tree, a support vector machine, a multi-layer neural network or a similar algorithm. Because the sensitive words or characters predefined for machine learning detection may not be comprehensive enough, the detection accuracy is low.
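For illustration, statistical features of the kind used by such prior-art machine learning detection might be computed as in the following Python sketch. The sensitive word list, the example URL and the function names are hypothetical; the entropy is the Shannon entropy of the character distribution, which is one common reading of "entropy of the URL" and is an assumption here.

```python
import math
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def shannon_entropy(text):
    """Shannon entropy (in bits) of the character distribution of `text`."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def statistical_features(url, sensitive_words):
    """Compute a few hand-crafted statistical features for one URL."""
    lowered = url.lower()
    query = urlsplit(url).query
    return {
        "url_length": len(url),
        "url_entropy": shannon_entropy(url),
        "sensitive_word_count": sum(lowered.count(w) for w in sensitive_words),
        "parameter_count": len(parse_qsl(query, keep_blank_values=True)),
    }


# Hypothetical predefined sensitive words and a hypothetical request URL.
SENSITIVE_WORDS = ("select", "union", "<script")
EXAMPLE_URL = "http://example.com/item?id=1 union select 1&page=2"
features = statistical_features(EXAMPLE_URL, SENSITIVE_WORDS)
```

The sketch makes the limitation visible: an attack that avoids every word in the predefined list contributes nothing to the sensitive word count.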

In order to solve the problem of low accuracy of detection of the attack behavior, the inventor finds that detection of the attack behavior can be realized by detecting parameter domain data in the URL data, so that the accuracy of detection of the attack behavior is improved.

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Referring to fig. 1, fig. 1 is a flowchart of an attack behavior detection method according to an embodiment of the present application, where the attack behavior detection method may be applied to an electronic device. The attack behavior detection method can comprise the following steps:

Step S101: acquiring URL data to be detected, and extracting the parameter domain field to be detected from the URL data to be detected.

Step S102: preprocessing the parameter domain field to be detected so as to convert the parameter domain field to be detected into a corresponding feature vector to be detected.

Step S103: inputting the feature vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected.

Specifically, URL data is a unique address that identifies a resource on the internet, and generally includes three parts: the resource type, the host domain name where the resource is stored, and the resource file name. As an embodiment, the format of URL data may be represented as protocol://hostname[:port]/path/[?query] (IETF Request for Comments 1738, RFC 1738), where "[ ]" denotes optional content.

There are various ways for the electronic device to acquire the URL data to be detected, for example: receiving URL data sent by an external device; collecting real-time Hypertext Transfer Protocol (HTTP) data and extracting URL data from the HTTP data; or reading URL data stored in a cloud database or locally. The embodiments of the present application do not specifically limit this, and those skilled in the art can make an appropriate selection according to actual situations.

After the electronic device acquires the URL data to be detected, the parameter domain field to be detected in the URL data to be detected may be extracted. The URL data can be divided into five parts, i.e., a protocol domain, a host name domain, a path name domain, a file name domain and a parameter domain, according to the structural characteristics of the URL data, and the extracted parts in step S101 are the parameter domain parts in the URL data.

It can be understood that a piece of URL data may include only one parameter domain, in which case the electronic device extracts only that parameter domain; the URL data may also include a plurality of parameter domains, in which case the electronic device needs to extract all of them from the URL data.

For example, assume that a piece of URL data is expressed as: http://www.topsec.comc.cn/product/test/testhtml?user=blackbox&room=5ca481e99a88ab02be37bdf3&container=true. Here, http is the protocol domain, www.topsec.comc.cn is the host name domain, product/test is the path name domain, testhtml is the file name domain, user=blackbox is the first parameter domain, room=5ca481e99a88ab02be37bdf3 is the second parameter domain, and container=true is the third parameter domain. As can be seen, the URL data includes three parameter domains.
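A minimal Python sketch of the parameter domain extraction is given below for illustration. The function name and the example.com URL are hypothetical; splitting the query string on "&" is an assumption about how parameter domains are delimited, consistent with the example above.

```python
from urllib.parse import urlsplit


def extract_parameter_domains(url):
    """Return the raw 'name=value' parameter domain fields of a URL."""
    query = urlsplit(url).query
    # Keep each field as raw text, without decoding, so that later steps
    # see exactly the characters that were sent.
    return [field for field in query.split("&") if field]


# Hypothetical URL with three parameter domains.
url = "http://example.com/product/test/page?user=blackbox&room=5&flag=true"
fields = extract_parameter_domains(url)
```

A URL without a query part yields an empty list, which corresponds to the case where there is no parameter domain to detect.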

As an implementation manner, after the step S101, the method for detecting an attack behavior provided by the embodiment of the present application may include the following steps:

judging whether the extracted parameter domain field to be detected is complete;

if the extracted parameter domain field to be detected is complete, executing step S102;

and if the extracted parameter domain field to be detected is incomplete, re-extracting the parameter domain field to be detected in the URL data to be detected.

It can be understood that the above-mentioned determining whether the extracted parameter domain field to be detected is complete includes two cases: in the first case, whether each field of the parameter domain to be detected is complete is judged; in the second case, when the URL data to be detected includes a plurality of parameter domain fields to be detected, it is determined whether all the parameter domain fields to be detected are extracted. The skilled person can select to execute the first case or the second case, or to execute the first case and the second case according to actual situations, and the embodiments of the present application are not limited specifically.
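The application does not define what "complete" means, so the following Python sketch is only one possible reading. It assumes that a field is complete when it has a non-empty name followed by "=", and that all fields have been extracted when the extracted list equals the "&"-separated parts of the query string. All function names are hypothetical.

```python
from urllib.parse import urlsplit


def field_is_complete(field):
    """First case: one field has a non-empty name and an '=' separator."""
    name, separator, _value = field.partition("=")
    return bool(name) and separator == "="


def all_fields_extracted(url, fields):
    """Second case: every parameter domain field in the URL was extracted."""
    expected = [part for part in urlsplit(url).query.split("&") if part]
    return fields == expected


def extraction_is_complete(url, fields):
    """Apply both cases; a caller would re-extract when this returns False."""
    return all_fields_extracted(url, fields) and all(
        field_is_complete(field) for field in fields
    )
```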

After the parameter domain field to be detected is extracted, the electronic device can preprocess it, so that the parameter domain field to be detected is converted into a corresponding feature vector to be detected. The embodiment of the present application does not specifically limit the implementation of the preprocessing. For example, the parameter domain field to be detected may be preprocessed by a convolutional neural network, by a skip-gram encoder, or by a skip-gram model, and a person skilled in the art may make an appropriate selection according to the actual situation.

Then, the electronic device can detect the feature vector to be detected by using the pre-trained attack behavior detection model, so as to obtain an attack behavior detection result corresponding to the URL data to be detected. Wherein, the attack behavior detection result has a plurality of expression forms.

As an embodiment, the attack behavior detection result may include two results: an attack behavior exists, or no attack behavior exists. As another embodiment, the attack behavior detection result may identify the specific attack behavior; for example, it may include: no attack behavior, SQL injection attack behavior, XSS attack behavior, and other attack behavior. Those skilled in the art can also appropriately adjust the specific implementation manner of the attack behavior detection result according to the actual situation.

In the above scheme, the parameter domain field to be detected in the URL data is extracted, so that the parameter domain field in the URL data is detected. The parameter domain field to be detected is converted into a corresponding characteristic vector to be detected by preprocessing the parameter domain field to be detected, so that the attack behavior detection model is utilized to detect the attack behavior based on the characteristic vector to be detected. The parameter domain field in the URL data is directly detected in the detection process, so that the detection accuracy can be improved.

Further, the step S102 may specifically include the following steps:

Converting the parameter domain field to be detected into parameter domain mapping data to be detected by using a predetermined skip-gram data mapping table.

And extracting the feature vector to be detected corresponding to the parameter domain mapping data to be detected by using a pre-trained feature extraction model.

Specifically, the electronic device may store a predetermined skip-gram data mapping table, or the electronic device may receive a predetermined skip-gram data mapping table sent by an external device, and convert the parameter domain field to be detected into parameter domain mapping data to be detected according to the skip-gram data mapping table.
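The table lookup can be sketched in Python as follows, for illustration only. The table contents, the two-dimensional vectors, the fallback vector for unknown characters and the fixed length with padding are all assumptions of this sketch; in practice the table would be generated from a trained skip-gram model.

```python
# Hypothetical mapping table: each character maps to a real-valued vector.
MAPPING_TABLE = {
    "a": [0.12, -0.40],
    "=": [0.55, 0.10],
    "1": [-0.30, 0.25],
    "'": [0.90, -0.75],
}
UNKNOWN = [0.0, 0.0]  # assumed fallback for characters missing from the table


def map_field(field, table=MAPPING_TABLE, max_len=8):
    """Convert a parameter domain field into a fixed-length list of vectors."""
    vectors = [list(table.get(ch, UNKNOWN)) for ch in field[:max_len]]
    # Pad short fields so that every field yields max_len vectors.
    vectors += [list(UNKNOWN) for _ in range(max_len - len(vectors))]
    return vectors


mapped = map_field("a=1'")
```

The resulting list of vectors plays the role of the parameter domain mapping data that is passed to the feature extraction model.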

Skip-gram is an encoding method based on the idea of distributed representation. It has strong representation capability: it maps the minimum semantic unit of a text corpus to a real-valued vector and represents semantic similarity by the spatial distance between the vectors of different semantic units. It can also predict context information from a given target character, has a wide application range, and is suitable for operation on large data sets. Therefore, the embodiment of the application can process the parameter domain field to be detected by adopting the skip-gram idea, and complete the information conversion of the parameter domain field to be detected.
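To make the idea of predicting context from a given target character concrete, the following Python sketch generates the (target, context) training pairs that a character-level skip-gram model would learn from. The window size and the function name are assumptions of this sketch; the training of the model itself is not shown.

```python
def skipgram_pairs(text, window=2):
    """Generate (target, context) character pairs within `window` positions."""
    pairs = []
    for i, target in enumerate(text):
        start = max(0, i - window)
        stop = min(len(text), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a character is not its own context
                pairs.append((target, text[j]))
    return pairs


# With window=1, each character is paired with its immediate neighbours.
pairs = skipgram_pairs("id=1", window=1)
```

Characters that tend to appear in similar contexts receive similar vectors after training, which is what allows spatial distance to stand for semantic similarity.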

Similarly, a pre-trained feature extraction model may be stored in the electronic device, or the electronic device may receive the pre-trained feature extraction model sent by the external device, and may extract the feature vector to be detected corresponding to the parameter domain mapping data to be detected by using the feature extraction model. As an embodiment, the feature extraction model may employ a convolutional neural network.

The convolutional neural network is a feedforward neural network that includes convolution computation and has a deep structure. The convolutional neural network may comprise three structures, namely convolution, activation and pooling; it has strong representation learning capability and can perform translation-invariant classification of input information according to its hierarchical structure.

It should be noted that the specific implementation of the skip-gram data mapping table and the specific implementation of the pre-trained feature extraction model will be described in detail in the following embodiments, which will not be described here.

In the scheme, the parameter domain field to be detected in the character form can be converted into the detection parameter domain mapping data in the vector form by using the skip-gram data mapping table, and the relation between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is reduced. In addition, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

Further, as an implementation manner, the attack behavior detection model may adopt a random forest algorithm model. The random forest algorithm model is a statistical learning method that uses decision trees as basic classifiers; the final classification result is determined by voting over the outputs of the individual decision trees.

In this embodiment, before step S101, the random forest algorithm model may be trained in advance to obtain a pre-trained attack behavior detection model. That is to say, before step S101, the attack behavior detection method provided in the embodiment of the present application may further include the following steps:

and acquiring sample data.

And preprocessing the sample data to convert the sample data into corresponding sample characteristic vectors.

And training the random forest algorithm model by using the sample feature vector to obtain an attack behavior detection model.

Specifically, the sample data may include two parts: positive sample data including normal parameter domain fields, and negative sample data including abnormal parameter domain fields. It is understood that the normal parameter domain fields may be extracted from normal URL data (i.e., URL data without attack behavior), and the abnormal parameter domain fields may be extracted from abnormal URL data (i.e., URL data with attack behavior).

The normal URL data may include URL data that is accumulated in a plurality of ways in advance and does not have an attack behavior. For example, the normal URL data may include URL data in HTTP data extracted from the firewall gateway; alternatively, the normal URL data may include URL data extracted from the traffic data, and the like.

Similarly, the abnormal URL data may include URL data in which an attack behavior exists, which is accumulated in advance in various ways. For example, the abnormal URL data may include URL data obtained by filling a data field having an attack behavior in the normal URL data according to the rule and manner of the URL parameter domain attack principle; alternatively, the abnormal URL data may include URL data extracted from the collected related open source data.

It is to be understood that, in the embodiment of the present application, a specific obtaining manner of the normal URL data and the abnormal URL data is not specifically limited, and those skilled in the art may appropriately select the data according to actual situations.

After obtaining the sample data, the electronic device may perform preprocessing on the sample data. The method for preprocessing the sample data should be the same as the method for preprocessing the parameter domain field to be detected in step S102, so as to convert the sample data into a sample feature vector having the same form as the feature vector to be detected. Therefore, the input of the random forest algorithm model to be trained can be ensured to be the same as the input of the pre-trained attack behavior detection model, and the accuracy of the detection result is ensured.

After the electronic equipment obtains the sample feature vector, the random forest algorithm model can be trained by using the sample feature vector to obtain an attack behavior detection model. Because the random forest algorithm model comprises a plurality of decision trees, in the trained attack behavior detection model, the attack behavior detection result corresponding to the URL data to be detected can be obtained by voting according to the classification result of each decision tree.
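The voting step can be illustrated with the following sketch. The three "trees" are hypothetical stand-in functions rather than trained decision trees, and the labels `"attack"` and `"normal"` are assumed names for the two classes.

```python
from collections import Counter

# Hypothetical stand-ins for trained decision trees: each maps a feature
# vector to a class label.
trees = [
    lambda x: "attack" if x[0] > 0.5 else "normal",
    lambda x: "attack" if x[1] > 0.5 else "normal",
    lambda x: "attack" if x[0] + x[1] > 1.2 else "normal",
]


def forest_predict(trees, x):
    """Return the label that receives the most votes from the trees."""
    votes = Counter(tree(x) for tree in trees)
    # most_common(1) yields the (label, count) pair with the highest count
    return votes.most_common(1)[0][0]


result = forest_predict(trees, [0.9, 0.2])
```

With an odd number of trees and two classes, as here, a tie cannot occur; other configurations would need an explicit tie-breaking rule.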

In the above scheme, before attack detection is carried out, the random forest algorithm model can be trained by using sample data, so that a pre-trained attack behavior detection model is obtained. The random forest algorithm model tolerates noise and outliers well, is less prone to the over-fitting problem of a single decision tree, and offers good scalability and parallelism for high-dimensional data classification problems, so that the accuracy of attack behavior detection can be improved. In addition, the random forest algorithm model is a data-driven non-parametric classification method that requires no prior knowledge of the classification during training, so that the maintenance cost is low.

Further, a specific implementation of training the random forest algorithm model is described below. The step of training the random forest algorithm model by using the sample feature vector to obtain the attack behavior detection model specifically includes the following steps:

Performing random sampling with replacement on the sample feature vectors by using the bootstrap method to generate n training sets, wherein n is a positive integer.

And respectively training n decision tree models by using n training sets to obtain n trained decision tree models.

And forming an attack behavior detection model according to the n trained decision tree models.

Specifically, for the sample feature vectors, the bootstrap method may be adopted to perform random sampling with replacement: m samples are drawn from the sample feature vectors each time, and the sampling is repeated n times in total, so that n training sets are generated.

The bootstrap method is a resampling method, also used for model evaluation, that is based on bootstrap sampling, i.e., sampling with replacement (repeated sampling). As an embodiment, the bootstrap method may specifically be as follows: in a data set (corresponding to the sample feature vectors in the embodiment of the application), one sample is randomly selected at a time, used as a training sample, and then put back into the data set; this is repeated m times to generate a data set of the same size as the original data set, and the new data set is a training set. Repeating these steps n times yields n training sets.
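The sampling procedure described above might be sketched as follows. The function name `bootstrap_training_sets`, the fixed seed and the integer stand-ins for sample feature vectors are assumptions made purely for illustration.

```python
import random
from typing import List, Sequence


def bootstrap_training_sets(samples: Sequence, n: int, seed: int = 0) -> List[list]:
    """Draw n training sets, each by sampling len(samples) items with replacement."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    m = len(samples)
    # rng.choices samples with replacement, so an item may appear several times
    return [rng.choices(samples, k=m) for _ in range(n)]


samples = list(range(10))  # stand-ins for ten sample feature vectors
training_sets = bootstrap_training_sets(samples, n=5)
```

Each of the resulting training sets would then be used to train one decision tree.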

Then, n decision tree models can be trained respectively by using the n training sets obtained in the above steps, that is, each training set is used to train one decision tree model. For a single decision tree model, assuming that the number of training sample features is n, the best feature may be selected at each split based on the Gini index. For example, for a general decision tree with a total of K classes, where the probability that a sample belongs to class k is p_k, the Gini index Gini(p) can be expressed as:

$$\mathrm{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$

Each decision tree is split until all training examples at a node belong to the same class. Pruning need not be performed during the splitting of the decision tree.
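For illustration, the Gini index of a node and the weighted Gini index of a candidate split could be computed as in the sketch below. It assumes the standard definition Gini(p) = 1 − Σ p_k², with p_k estimated as the fraction of samples of class k at the node; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence) -> float:
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def split_gini(left: Sequence, right: Sequence) -> float:
    """Size-weighted Gini index of the two child nodes of a candidate split."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)
```

A split with a lower weighted Gini index separates the classes more cleanly, and a value of zero means that both child nodes are pure, which matches the stopping condition described above.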

Then, the n decision tree models trained by the method form an attack behavior detection model.

Further, a specific embodiment of predetermining the skip-gram data mapping table is described below. After the step of obtaining sample data, the method for detecting an attack behavior provided by the embodiment of the present application may further include the following steps:

and generating a corresponding character table according to the URL parameter domain field.

And training the skip-gram encoder by using the character table and the sample data to obtain the trained skip-gram encoder.

And generating a skip-gram data mapping table according to the skip-gram encoder.

Specifically, a character table V may first be constructed from all characters that may occur in the URL parameter domain field. Let ω be a character in the character table V and d be the dimension of the character vector; then w ∈ R^d is the vector representation of ω ∈ V. Training data (ω, c) are obtained according to the size of the sliding window, and the skip-gram encoder can be trained with these training data to obtain the trained skip-gram encoder.
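The construction of the character table and of the (ω, c) training pairs might be sketched as follows. The window size, the function names and the choice of indexing characters in sorted order are assumptions of this sketch and are not specified by the embodiment.

```python
from typing import Dict, Iterable, List, Tuple


def build_char_table(fields: Iterable[str]) -> Dict[str, int]:
    """Assign an index to every distinct character seen in the fields."""
    chars = sorted(set("".join(fields)))
    return {ch: index for index, ch in enumerate(chars)}


def window_pairs(field: str, window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) character pairs within a sliding window."""
    pairs = []
    for i, target in enumerate(field):
        lo = max(0, i - window)
        hi = min(len(field), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a character is not its own context
                pairs.append((target, field[j]))
    return pairs


table = build_char_table(["id=1", "q=a"])
pairs = window_pairs("abc", window=1)
```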

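The probability of observing character c in the context of the target character ω may be expressed as:

$$\Pr(D = 1 \mid \omega, c) = \frac{1}{1 + e^{-w \cdot c}}$$

The probability that character c is not observed in the context of the target character ω is then:

$$\Pr(D = 0 \mid \omega, c) = 1 - \Pr(D = 1 \mid \omega, c) = \frac{1}{1 + e^{w \cdot c}}$$

where the vector w and the vector c are the vector representations of the character ω and the character c, respectively.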
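As a hedged sketch, the two probabilities could be evaluated as shown below. It assumes the logistic (sigmoid) form of the inner product w·c that is commonly used with skip-gram and negative sampling; this form is an assumption of the sketch, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def prob_observed(w: Sequence[float], c: Sequence[float]) -> float:
    """Assumed Pr(D=1 | w, c): sigmoid of the inner product w . c."""
    return 1.0 / (1.0 + math.exp(-dot(w, c)))


def prob_not_observed(w: Sequence[float], c: Sequence[float]) -> float:
    """Assumed Pr(D=0 | w, c): complement of prob_observed."""
    return 1.0 - prob_observed(w, c)
```

Under this assumption, vectors with a large positive inner product correspond to characters that are likely to appear in each other's context.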

In the scheme, the skip-gram data mapping table generated by the skip-gram encoder can be used for converting the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relationship between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is reduced.

Further, another specific embodiment of predetermining the skip-gram data mapping table is described below. After the step of obtaining sample data, the method for detecting an attack behavior provided by the embodiment of the present application may further include the following steps:

and inputting the sample data into a pre-trained skip-gram encoder to obtain a sample vector.

And training the skip-gram model by using the sample vector to obtain the trained skip-gram model.

And generating a skip-gram data mapping table according to the skip-gram encoder and the skip-gram model.

Specifically, the skip-gram encoder may be trained first, where the way of training the skip-gram encoder is similar to that in the above embodiments, and is not described here again. Then, inputting the sample data into a pre-trained skip-gram encoder to obtain a sample vector as the sample data for training the skip-gram model. In the embodiment of the application, the skip-gram model can further enhance the capability of feature expression output by a skip-gram encoder.

After the electronic device obtains the sample vectors, it may train the skip-gram model on them directly; alternatively, it may first sample the sample vectors using negative sampling and then train the skip-gram model with the sampled data, so that the true target is assigned as high a probability as possible.

As an embodiment, the structure of the skip-gram model may include: an input layer, a hidden layer (linear neurons without activation function) and an output layer (using softmax activation function), there being a hidden layer weight matrix between the input layer and the hidden layer.

As another embodiment, Noise Contrastive Estimation (NCE) may be used to predict the target character. Let D be the sample vectors and D′ the corresponding randomly drawn negative samples; then the objective function J(θ) of the skip-gram model can be expressed as:

$$J(\theta) = \sum_{(\omega, c) \in D} \Pr(D = 1 \mid \omega, c) + \sum_{(\omega, c) \in D'} \Pr(D = 0 \mid \omega, c)$$

After the skip-gram model is trained, the trained skip-gram model and the hidden layer weight matrix in the skip-gram model can be obtained. Based on the hidden layer weight matrix, the dense encoded vector of a character can be calculated:

$$w' = w \cdot W_h$$

where the vector w′ is the dense encoded vector of the character ω, the vector w is the one-hot vector of the character ω, and W_h is the hidden layer weight matrix obtained after the training of the skip-gram model is finished.
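The following sketch illustrates this calculation with a hypothetical 3 × 2 hidden layer weight matrix; the values are illustrative and are not trained weights. Because w is a one-hot vector, the product simply selects one row of W_h, which is why the result can be stored as a lookup table.

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """One-hot row vector with a 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def dense_vector(w: List[float], W_h: List[List[float]]) -> List[float]:
    """Row vector times matrix: w' = w . W_h."""
    columns = len(W_h[0])
    return [sum(w[i] * W_h[i][j] for i in range(len(w))) for j in range(columns)]


# Hypothetical hidden layer weights for a 3-character table, dimension 2.
W_h = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]
w_dense = dense_vector(one_hot(1, 3), W_h)
```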

Therefore, after the parameter domain field to be detected is input to the skip-gram encoder and the skip-gram model, dense coding of the parameter domain field to be detected can be achieved, and the coded vector is obtained.

In the scheme, the skip-gram data mapping table generated by the skip-gram encoder and the skip-gram model can be used for converting the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relationship between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is shortened.

Further, a specific embodiment of the pre-training feature extraction model is described below. After the step of obtaining sample data, the method for detecting an attack behavior provided by the embodiment of the present application may further include the following steps:

and converting the sample data into sample mapping data by using a predetermined skip-gram data mapping table.

And training the convolutional neural network model by using the sample mapping data to obtain a feature extraction model.

Specifically, after the skip-gram data mapping table is determined, sample data can be input into the skip-gram data mapping table to convert the sample data into sample mapping data, and then the sample mapping data can be used for training a convolutional neural network model to obtain a feature extraction model.

The structure of the convolutional neural network model may include: input layer, convolutional layer (activation function is ReLU), and pooling layer (with average pooling). Wherein, as an embodiment, the number of the convolution layers may be one; as another embodiment, the number of convolutional layers may be plural, for example: two convolutional layers, three convolutional layers, etc., which are not specifically limited in the embodiments of the present application.

Taking two convolutional layers as an example, the input of the convolutional neural network model can be set as sample mapping data X. Since the length of the sample mapping data X is n, the sample mapping data X can be expressed as X_1:n = X₁X₂…X_n (X_i ∈ Rⁿ). The feature vectors obtained after the two convolutions are C and C′ respectively, and the feature vector obtained after pooling is P. In the calculation of C, C′ and P, c₁ and c₂ are the convolution kernel sizes, step₁ is the convolution stride, the weight matrices are those of the convolution kernels, b₁ ∈ R and b₂ ∈ R are the convolution bias terms, c₃ is the size of the pooling filter, and step₂ is the pooling stride.
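Purely as an illustration of how such quantities interact, a two-layer one-dimensional convolution with ReLU activation followed by average pooling might be sketched as follows. The single filter per layer, the kernel values and the toy input are assumptions of this sketch and should not be read as the calculation formula of the embodiment.

```python
from typing import List


def relu(value: float) -> float:
    return value if value > 0 else 0.0


def conv1d(seq: List[List[float]], kernel: List[List[float]], bias: float, step: int = 1) -> List[float]:
    """Slide one kernel (a list of c vectors) over a sequence of vectors."""
    c = len(kernel)
    out = []
    for start in range(0, len(seq) - c + 1, step):
        total = bias
        for k in range(c):
            total += sum(a * b for a, b in zip(seq[start + k], kernel[k]))
        out.append(relu(total))
    return out


def avg_pool(values: List[float], size: int, step: int) -> List[float]:
    """Average pooling over windows of the given size and stride."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values) - size + 1, step)]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]  # toy mapped field
W1 = [[1.0, 0.0], [0.0, 1.0]]  # first kernel, size c1 = 2
C = conv1d(X, W1, bias=0.0, step=1)

W2 = [[0.5], [0.5]]  # second kernel, size c2 = 2, applied to the scalar sequence C
C2 = conv1d([[v] for v in C], W2, bias=0.0, step=1)

P = avg_pool(C2, size=2, step=1)  # pooling filter size c3 = 2, pooling stride 1
```

A practical feature extraction model would normally use many filters per layer and learned weights; the sketch keeps one filter per layer so that the roles of kernel size, stride, bias and pooling size remain visible.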

In the scheme, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

Furthermore, when the number of the parameter domain fields to be detected is multiple, after the parameter domain fields to be detected are preprocessed, merging and vectorizing data obtained after preprocessing the multiple parameter domain fields to be detected, so as to obtain the feature vectors to be detected corresponding to the multiple parameter domain fields to be detected.
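The embodiment does not fix how the merging is carried out. One possible approach, shown below purely as an assumption, is to average the per-field feature vectors element-wise so that the detection model always receives a vector of the same length regardless of the number of fields; the function name `merge_field_vectors` is hypothetical.

```python
from typing import List


def merge_field_vectors(vectors: List[List[float]]) -> List[float]:
    """Merge per-field feature vectors by element-wise averaging (assumed strategy)."""
    if not vectors:
        raise ValueError("at least one field vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all field vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


merged = merge_field_vectors([[1.0, 2.0], [3.0, 4.0]])
```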

Referring to fig. 2, fig. 2 is a block diagram of an attack behavior detection apparatus according to an embodiment of the present disclosure. The attack behavior detection apparatus 200 may include: a first obtaining module 201, configured to obtain Uniform Resource Locator (URL) data to be detected and extract a parameter domain field to be detected from the URL data to be detected; a first preprocessing module 202, configured to preprocess the parameter domain field to be detected, so as to convert the parameter domain field to be detected into a corresponding feature vector to be detected; and a first input module 203, configured to input the feature vector to be detected into a pre-trained attack behavior detection model, so as to obtain an attack behavior detection result corresponding to the URL data to be detected.

In the embodiment of the application, the parameter domain field to be detected in the URL data to be detected is extracted, so that the detection of the parameter domain field in the URL data is realized. The parameter domain field to be detected is converted into a corresponding characteristic vector to be detected by preprocessing the parameter domain field to be detected, so that the attack behavior detection model is utilized to detect the attack behavior based on the characteristic vector to be detected. The parameter domain field in the URL data is directly detected in the detection process, so that the detection accuracy can be improved.

Further, the first preprocessing module 202 is specifically configured to: converting the parameter domain field to be detected into parameter domain mapping data to be detected by using a predetermined skip-gram data mapping table; and extracting the characteristic vector to be detected corresponding to the parameter domain mapping data to be detected by using a pre-trained characteristic extraction model.

In the embodiment of the application, the parameter domain field to be detected in the character form can be converted into the detection parameter domain mapping data in the vector form by using the skip-gram data mapping table, and the relation between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time of feature extraction is reduced. In addition, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

Further, the attack behavior detection apparatus 200 further includes: the second acquisition module is used for acquiring sample data; the sample data comprises positive sample data and negative sample data, wherein the positive sample data comprises a normal parameter domain field, and the negative sample data comprises an abnormal parameter domain field; the second preprocessing module is used for preprocessing the sample data so as to convert the sample data into corresponding sample characteristic vectors; and the first training module is used for training a random forest algorithm model by using the sample characteristic vector to obtain the attack behavior detection model.

In the embodiment of the application, before attack detection is carried out, the random forest algorithm model can be trained by using sample data, so that a pre-trained attack behavior detection model is obtained. The random forest algorithm model tolerates noise and outliers well, is less prone to the over-fitting problem of a single decision tree, and offers good scalability and parallelism for high-dimensional data classification problems, so that the accuracy of attack behavior detection can be improved. In addition, the random forest algorithm model is a data-driven non-parametric classification method that requires no prior knowledge of the classification during training, so that the maintenance cost is low.

Further, the first training module is specifically configured to: perform random sampling with replacement on the sample feature vectors by using the bootstrap method to generate n training sets, wherein n is a positive integer; respectively train n decision tree models by using the n training sets to obtain n trained decision tree models; and form the attack behavior detection model according to the n trained decision tree models.

Further, the attack behavior detection apparatus 200 further includes: the first generation module is used for generating a corresponding character table according to the URL parameter domain field; the second training module is used for training the skip-gram encoder by utilizing the character table and the sample data to obtain a trained skip-gram encoder; and the second generation module is used for generating the skip-gram data mapping table according to the skip-gram encoder.

In the embodiment of the application, the skip-gram data mapping table generated by the skip-gram encoder can be used for converting the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relation between the character meaning and the character can be fully expressed by using the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time for feature extraction is reduced.

Further, the attack behavior detection apparatus 200 further includes: the second input module is used for inputting the sample data into a pre-trained skip-gram encoder to obtain a sample vector; the third training module is used for training the skip-gram model by using the sample vector to obtain the trained skip-gram model; and the third generation module is used for generating the skip-gram data mapping table according to the skip-gram encoder and the skip-gram model.

In the embodiment of the application, the skip-gram data mapping table generated by the skip-gram encoder and the skip-gram model can convert the parameter domain field to be detected in the character form into the detection parameter domain mapping data in the vector form, and the relationship between the character meaning and the character can be fully expressed by the skip-gram data mapping table, so that the complexity of feature extraction of the feature extraction model can be reduced, and the time of feature extraction is shortened.

Further, the attack behavior detection apparatus 200 further includes: the conversion module is used for converting the sample data into sample mapping data by utilizing a predetermined skip-gram data mapping table; and the fourth training module is used for training a convolutional neural network model by using the sample mapping data to obtain the feature extraction model.

In the embodiment of the application, the feature extraction model can be used for extracting the features of the parameter domain mapping data to be detected, and the feature extraction model has strong representation learning capacity, so that the accuracy rate can be improved by performing attack detection based on the features.

Referring to fig. 3, fig. 3 is a block diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device 300 includes: at least one processor 301, at least one communication interface 302, at least one memory 303, and at least one communication bus 304. Wherein the communication bus 304 is used for realizing direct connection communication of these components, the communication interface 302 is used for communicating signaling or data with other node devices, and the memory 303 stores machine readable instructions executable by the processor 301. When the electronic device 300 is in operation, the processor 301 and the memory 303 communicate via the communication bus 304, and the machine-readable instructions, when called by the processor 301, perform the above-described attack behavior detection method.

For example, the processor 301 of the embodiment of the present application may implement the following method by reading the computer program from the memory 303 through the communication bus 304 and executing the computer program: step S101: and acquiring URL data to be detected, and extracting parameter domain fields to be detected in the URL data to be detected. Step S102: and preprocessing the parameter domain field to be detected so as to convert the parameter domain field to be detected into a corresponding characteristic vector to be detected. Step S103: and inputting the characteristic vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected.

The processor 301 may be an integrated circuit chip having signal processing capabilities. The processor 301 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 303 may include, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.

It will be appreciated that the configuration shown in fig. 3 is merely illustrative and that electronic device 300 may include more or fewer components than shown in fig. 3 or have a different configuration than shown in fig. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof. In the embodiment of the present application, the electronic device 300 may be, but is not limited to, an entity device such as a desktop, a notebook computer, a smart phone, an intelligent wearable device, and a vehicle-mounted device, and may also be a virtual device such as a virtual machine. In addition, the electronic device 300 is not necessarily a single device, but may also be a combination of multiple devices, such as a server cluster, and the like.

Embodiments of the present application further provide a computer program product, including a computer program stored on a non-transitory computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can perform the steps of the attack behavior detection method in the foregoing embodiments, for example, including: acquiring Uniform Resource Locator (URL) data to be detected, and extracting a parameter domain field to be detected in the URL data to be detected; preprocessing the parameter domain field to be detected so as to convert the parameter domain field to be detected into a corresponding feature vector to be detected; and inputting the feature vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. An attack behavior detection method, comprising:

acquiring Uniform Resource Locator (URL) data to be detected, and extracting a parameter domain field to be detected in the URL data to be detected;

preprocessing the parameter domain field to be detected so as to convert the parameter domain field to be detected into a corresponding characteristic vector to be detected;

and inputting the characteristic vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected.

2. The method according to claim 1, wherein the preprocessing the parameter domain field to be detected to convert the parameter domain field to be detected into a corresponding feature vector to be detected comprises:

converting the parameter domain field to be detected into parameter domain mapping data to be detected by using a predetermined skip-gram data mapping table;

and extracting the characteristic vector to be detected corresponding to the parameter domain mapping data to be detected by using a pre-trained characteristic extraction model.

3. The method according to claim 2, wherein before the acquiring of the URL data to be detected, the method further comprises:

acquiring sample data; the sample data comprises positive sample data and negative sample data, wherein the positive sample data comprises a normal parameter domain field, and the negative sample data comprises an abnormal parameter domain field;

preprocessing the sample data to convert the sample data into corresponding sample feature vectors;

and training a random forest algorithm model by using the sample feature vectors to obtain the attack behavior detection model.

4. The method according to claim 3, wherein the training of the random forest algorithm model by using the sample feature vectors to obtain the attack behavior detection model comprises:

performing random sampling with replacement on the sample feature vectors by using the bootstrap method to generate n training sets; wherein n is a positive integer;

respectively training n decision tree models by using the n training sets to obtain n trained decision tree models;

and forming the attack behavior detection model according to the n trained decision tree models.
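As a non-limiting illustration of claim 4, the three steps can be sketched with the Python standard library alone. All names here (`bootstrap_sets`, `train_stump`, `train_forest`, `predict`) and the toy two-dimensional sample data are assumptions. To keep the sketch short, each "decision tree" is reduced to a one-level tree (a decision stump), and the n trained trees are combined by majority vote, which is a common way of forming a random forest but is not stated in the claim.

```python
import random
from collections import Counter


def bootstrap_sets(samples, n, rng):
    """Draw n training sets, each by sampling len(samples) items with replacement."""
    return [[rng.choice(samples) for _ in samples] for _ in range(n)]


def train_stump(train_set):
    """Fit a one-level decision tree: the (feature, threshold, polarity) with fewest errors."""
    best = None
    for f in range(len(train_set[0][0])):
        for x, _ in train_set:
            t = x[f]
            # Errors of the rule "predict 1 (attack) when feature f exceeds t"
            errors = sum((xi[f] > t) != bool(yi) for xi, yi in train_set)
            # polarity 0 is the inverted rule, which errs on the complementary samples
            for polarity, err in ((1, errors), (0, len(train_set) - errors)):
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    _, f, t, polarity = best
    return lambda x: polarity if x[f] > t else 1 - polarity


def train_forest(samples, n, seed=0):
    """Train n stumps, one per bootstrap training set."""
    rng = random.Random(seed)
    return [train_stump(s) for s in bootstrap_sets(samples, n, rng)]


def predict(forest, x):
    """Combine the individual tree outputs by majority vote."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]


# Toy sample feature vectors (assumed): label 0 = normal, label 1 = abnormal
samples = [
    ([0.10, 0.20], 0), ([0.20, 0.10], 0), ([0.15, 0.30], 0),
    ([0.90, 0.80], 1), ([0.80, 0.90], 1), ([0.85, 0.70], 1),
]
forest = train_forest(samples, n=25)
```

Using an odd n avoids ties in the two-class vote; a production system would more likely rely on an existing random forest implementation with full decision trees.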

5. The method according to claim 3 or 4, wherein after the acquiring of the sample data, the method further comprises:

generating a corresponding character table according to the URL parameter domain field;

training a skip-gram encoder by using the character table and the sample data to obtain a trained skip-gram encoder;

and generating the skip-gram data mapping table according to the skip-gram encoder.
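As a non-limiting illustration of the first two steps of claim 5, the sketch below builds a character table from parameter domain fields and derives the (center, context) pairs on which a skip-gram encoder is typically trained. The function names, the window size, and the example fields are assumptions; the training of the encoder itself and the generation of the mapping table are not shown.

```python
def build_char_table(fields):
    """Map each distinct character to an integer index (sorted for determinism)."""
    chars = sorted(set("".join(fields)))
    return {ch: i for i, ch in enumerate(chars)}


def skipgram_pairs(field, table, window=2):
    """Return (center, context) index pairs for characters within `window` positions."""
    ids = [table[ch] for ch in field]
    pairs = []
    for i, center in enumerate(ids):
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        pairs.extend((center, ids[j]) for j in range(lo, hi) if j != i)
    return pairs


# Example parameter domain fields (assumed)
fields = ["id=1", "q=a"]
table = build_char_table(fields)
pairs = skipgram_pairs("q=a", table, window=1)
```

After training, each character index would be associated with a learned vector, and the collection of those associations would play the role of the data mapping table.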

6. The method according to claim 3 or 4, wherein after the acquiring of the sample data, the method further comprises:

inputting the sample data into a pre-trained skip-gram encoder to obtain a sample vector;

training a skip-gram model by using the sample vector to obtain a trained skip-gram model;

and generating the skip-gram data mapping table according to the skip-gram encoder and the skip-gram model.

7. The method according to claim 3 or 4, wherein after the acquiring of the sample data, the method further comprises:

converting the sample data into sample mapping data by using a predetermined skip-gram data mapping table;

and training a convolutional neural network model by using the sample mapping data to obtain the feature extraction model.

8. An attack behavior detection apparatus, comprising:

a first acquisition module, configured to acquire uniform resource locator (URL) data to be detected and to extract a parameter domain field to be detected from the URL data to be detected;

a first preprocessing module, configured to preprocess the parameter domain field to be detected so as to convert the parameter domain field to be detected into a corresponding feature vector to be detected;

and a first input module, configured to input the feature vector to be detected into a pre-trained attack behavior detection model to obtain an attack behavior detection result corresponding to the URL data to be detected.

9. An electronic device, comprising: a processor, a memory, and a bus;

the processor and the memory communicate with each other through the bus;

the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the attack behavior detection method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the attack behavior detection method according to any one of claims 1 to 7.