CN117349834A - Malicious request detection method, device, equipment and storage medium - Google Patents

Malicious request detection method, device, equipment and storage medium

Info

Publication number
CN117349834A
CN117349834A (application number CN202311351778.2A)
Authority
CN
China
Prior art keywords
malicious
request
log
classification model
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311351778.2A
Other languages
Chinese (zh)
Inventor
程艳宇
许祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongdian Cloud Computing Technology Co ltd
Original Assignee
Zhongdian Cloud Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongdian Cloud Computing Technology Co ltd filed Critical Zhongdian Cloud Computing Technology Co ltd
Priority to CN202311351778.2A priority Critical patent/CN117349834A/en
Publication of CN117349834A publication Critical patent/CN117349834A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/552Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer And Data Communications (AREA)

Abstract

The disclosure relates to a malicious request detection method, device, equipment and storage medium. The malicious request detection method includes: acquiring a request log to be detected; predicting an initial malicious score for the request log through a pre-trained binary classification model; when the initial malicious score is greater than a first preset threshold, predicting the malicious type of the request log through a pre-trained multi-classification model, and outputting a probability value that the request log belongs to a target malicious type; and calculating a weighted malicious score according to the initial malicious score and the probability value, and alarming on and intercepting the request log if the weighted malicious score is greater than a second preset threshold. The method provided by the disclosure can detect malicious requests with low cost, high precision and high performance.

Description

Malicious request detection method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of data detection, and in particular relates to a malicious request detection method, a malicious request detection device, malicious request detection equipment and a storage medium.
Background
Currently, malicious request detection is widely applied in traditional security products such as WAF, NTA, and RASP.
Traditional detection methods are mainly rule-based. In a WAF, for example, rules often have to be adapted one by one to detect specific or generic attack types, and the maintenance and development costs of such rules are high.
Detection methods based on algorithm models are now commonly adopted, but they also have various defects. Some deep learning models detect too slowly to be deployed in scenarios with strict real-time requirements, such as a WAF (Web Application Firewall). Among machine learning models, because the underlying algorithms differ in design, some models detect accurately but cannot identify the malicious request type, while others can identify the type but detect poorly; and simply chaining several models together, rather than combining them effectively, instead degrades detection performance and increases the missed-report rate.
Therefore, it is desirable to provide a high-precision, high-performance malicious request detection method that fuses the advantages of multiple algorithm models while avoiding a combination of their disadvantages, thereby improving the overall detection capability.
In addition, in alarm response, especially the response to high-confidence malicious requests, existing approaches suffer from missed interception, mistaken interception, or untimely interception.
Disclosure of Invention
To solve the above technical problems, the disclosure provides a malicious request detection method, device, equipment and storage medium that can detect malicious requests with low cost, high precision and high performance.
In a first aspect, an embodiment of the present disclosure provides a malicious request detection method, including:
acquiring a request log to be detected;
predicting an initial malicious score for the request log through a pre-trained binary classification model;
when the initial malicious score is greater than a first preset threshold, predicting the malicious type of the request log through a pre-trained multi-classification model, and outputting a probability value that the request log belongs to a target malicious type;
and calculating a weighted malicious score according to the initial malicious score and the probability value, and alarming and intercepting the request log if the weighted malicious score is larger than a second preset threshold value.
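The description does not fix the exact weighting formula for combining the two models' outputs. As an illustrative sketch, assume the weighted malicious score is a convex combination of the binary model's score S and the multi-class probability p; the default weights and thresholds below are hypothetical:

```python
def weighted_malicious_score(initial_score: float, type_prob: float,
                             w_binary: float = 0.5, w_type: float = 0.5) -> float:
    """Combine the binary model's malicious score with the multi-class
    model's type probability. A convex combination is an assumption; the
    description does not fix the exact formula."""
    return w_binary * initial_score + w_type * type_prob


def decide(initial_score: float, type_prob: float,
           first_threshold: float = 0.5, second_threshold: float = 0.8) -> str:
    """Return the action for a request log, following the two-threshold
    flow: 'pass', 'alarm', or 'alarm+intercept'."""
    if initial_score <= first_threshold:
        return "pass"                    # binary model sees nothing malicious
    ws = weighted_malicious_score(initial_score, type_prob)
    if ws > second_threshold:
        return "alarm+intercept"         # high-confidence malicious request
    return "alarm"                       # alarm only, no automatic interception
```

Only the relative ordering of the two thresholds matters for the flow; concrete values would be tuned on validation data.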
Optionally, after the obtaining the request log to be detected, the method further includes:
cleaning the request log, and removing error data and null data to obtain a cleaned request log;
extracting a request header and a request body from the cleaned request log to form target data, wherein the extracted request header comprises a first line, a cookie line and a content-length line;
and replacing target characters in the target data with corresponding preset characters to obtain a preprocessed request log.
Optionally, the binary classification model includes a first classification module, and predicting the initial malicious score of the request log through the pre-trained binary classification model includes:
word segmentation is carried out on the preprocessed request log, and a first word segmentation list is obtained;
extracting features of the first word segmentation list to obtain a first feature vector;
and calling the first classification module to predict the first feature vector to obtain the initial malicious score of the request log.
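The character-level word segmentation step above can be sketched as a simple 3-gram splitter; the function name and the fallback for texts shorter than n are assumptions:

```python
def char_ngrams(text: str, n: int = 3) -> list:
    """Character-level n-gram segmentation (the 3-gram 'word segmentation'
    step). Returns the list of overlapping n-grams; a text shorter than n
    is returned as a single token."""
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

The resulting token list would then feed the feature extraction module (e.g., a TF-IDF vectorizer) to produce the first feature vector.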
Optionally, the multi-classification model includes a second classification module that predicts multiple types of malicious requests.
Optionally, the predicting the malicious type of the request log through a pre-trained multi-classification model, outputting a probability value that the request log belongs to a target malicious type, includes:
converting the preprocessed request log into a second feature vector;
and calling the second classification module to predict the malicious type from the second feature vector, and outputting a probability value that the request log belongs to a target malicious type.
Optionally, the binary classification model is trained through the following steps:
preprocessing the acquired request logs to obtain a plurality of first samples, and marking at least one log type label for each first sample, wherein the log type labels comprise non-malicious request logs and malicious request logs;
for each first sample, if the first sample is marked with the log type labels of both a non-malicious request log and a malicious request log, determining the non-malicious request log as the final log type label of the first sample, to obtain a cleaned first sample;
and training the binary classification model according to the plurality of cleaned first samples to obtain a trained binary classification model.
Optionally, the binary classification model includes a first word segmentation module, a first feature extraction module and a first classification module, and the training of the classification model according to the plurality of cleaned first samples to obtain a trained classification model includes:
dividing the plurality of cleaned first samples into a first training set and a first testing set according to a preset proportion;
the first training set is segmented through the first segmentation module to obtain a sample segmentation list;
training the first feature extraction module by using the sample word segmentation list to obtain a trained first feature extraction module, and converting each first sample in the first training set and the first test set into a sample feature vector through the trained first feature extraction module;
and training and testing the first classification module according to the sample feature vector to obtain a trained classification model.
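The division of the cleaned samples into a training set and a test set "according to a preset proportion" might look like the following sketch, assuming the 4:1 ratio used later in the description and a shuffled split (the shuffle is an assumption):

```python
import random


def split_samples(samples, train_ratio: float = 0.8, seed: int = 42):
    """Divide cleaned samples into a training set and a test set by a
    preset proportion (4:1 by default). Shuffling before the split, and
    the fixed seed, are illustrative assumptions."""
    rng = random.Random(seed)
    shuffled = samples[:]           # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```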
Optionally, the multi-classification model is obtained through training of the following steps:
preprocessing the acquired request logs to obtain a plurality of second samples, and marking at least one log type label for each second sample, wherein the log type label comprises a non-malicious request log and a plurality of types of malicious request logs;
for each second sample, if the second sample is marked with the log type labels of both a non-malicious request log and a malicious request log, determining the log type label of the second sample as the non-malicious request log; or, if the second sample is marked with a plurality of malicious request log types, determining the most frequent type among them as the final target type label of the second sample, to obtain a cleaned second sample;
and training the multi-classification model according to the plurality of cleaned second samples to obtain a trained multi-classification model.
In a second aspect, an embodiment of the present disclosure provides a malicious request detection apparatus, including:
the acquisition unit is used for acquiring a request log to be detected;
a first prediction unit, configured to predict an initial malicious score of the request log through a pre-trained binary classification model;
the second prediction unit is used for predicting the malicious type of the request log through a pre-trained multi-classification model under the condition that the initial malicious score is larger than a first preset threshold value, and outputting a probability value that the request log belongs to a target malicious type;
and the detection unit is used for calculating a weighted malicious score according to the initial malicious score and the probability value, and alarming and intercepting the request log if the weighted malicious score is larger than a second preset threshold value.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the malicious request detection method as described in any one of the above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a malicious request detection method as described in any of the above.
The embodiment of the disclosure provides a malicious request detection method, including: acquiring a request log to be detected; predicting an initial malicious score for the request log through a pre-trained binary classification model; when the initial malicious score is greater than a first preset threshold, predicting the malicious type of the request log through a pre-trained multi-classification model, and outputting a probability value that the request log belongs to a target malicious type; and calculating a weighted malicious score according to the initial malicious score and the probability value, and alarming on and intercepting the request log if the weighted malicious score is greater than a second preset threshold. The method provided by the disclosure can detect malicious requests with low cost, high precision and high performance.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a training method for a classification model according to an embodiment of the disclosure;
FIG. 2 is a flow chart of a multi-classification model training method according to an embodiment of the disclosure;
fig. 3 is a flow chart of a malicious request detection method according to an embodiment of the disclosure;
FIG. 4 is a weighted malicious score line graph provided by an embodiment of the present disclosure;
fig. 5 is a flowchart illustrating a malicious request detection method according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a malicious request detection apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
To address the above technical problems, the disclosed embodiments provide a malicious request detection method. The method collects and cleans request logs from systems or platforms such as a Web Application Firewall (WAF), Network Traffic Analysis (NTA), and Runtime Application Self-Protection (RASP), extracts the required data from the request header and request body of each request log, and filters and preprocesses the data to obtain preprocessed request logs. The preprocessed historical data enters a model training platform, where the data is first labeled. For the binary classification model, the data is labeled 0 or 1, after which binary-sample cross cleaning, training/test set division, training-set sample enhancement, feature extraction, and binary model training are performed to obtain a binary classification model for malicious request detection. For the multi-classification model, assuming there are n malicious request types, the data is labeled 0, 1, 2, ..., n, after which multi-class sample cross cleaning, training/test set division, training-set sample enhancement, multi-class feature extraction, and multi-classification model training are performed to obtain a multi-classification model for malicious request detection.
Then, request logs are acquired in real time and preprocessed to obtain preprocessed real-time data, which is detected by the trained binary classification model. If the malicious score S output by the binary model exceeds a certain threshold, detection continues with the multi-classification model; otherwise no alarm is raised and the multi-classification model need not be run. If the multi-classification result identifies a specific malicious request type, an alarm is raised for that type, the malicious score output by the binary model is weighted, and the weighted malicious score wS is output; if the result does not match any specific malicious type, a generic malicious-request alarm is raised and the malicious score S is output. If the weighted malicious score in the alarm reaches a certain threshold, the confidence of the malicious request is considered high, and an automatic interception response is performed. This is described in more detail in one or more of the following examples.
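The real-time detection flow just described can be sketched end to end. Here `binary_model` and `multi_model` are caller-supplied stand-ins for the trained models, and the thresholds and weighting formula are assumptions, since the description leaves the exact values open:

```python
def detect(log: str, binary_model, multi_model,
           t1: float = 0.5, t2: float = 0.8, w: float = 0.5) -> dict:
    """End-to-end detection flow. `binary_model(log)` returns the malicious
    score S; `multi_model(log)` returns (type_name, probability), with
    type_name None when no known malicious type matches. Both callables,
    the thresholds t1/t2, and the weight w are illustrative assumptions."""
    s = binary_model(log)
    if s <= t1:
        return {"action": "none", "score": s}      # no alarm, skip multi-class
    mal_type, p = multi_model(log)
    if mal_type is None:                           # not a known malicious type
        return {"action": "alarm", "score": s}     # generic alarm with score S
    ws = w * s + (1 - w) * p                       # weighted malicious score wS
    action = "intercept" if ws > t2 else "alarm"
    return {"action": action, "type": mal_type, "score": ws}
```

In practice the stand-ins would wrap the trained binary and multi-class models' predict calls.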
Specifically, the malicious request detection method may be executed by a terminal or a server, either of which can perform malicious-request detection on request logs through the classification models. The execution subject of the classification model training method and that of the malicious request detection method may be the same or different.
For example, in one application scenario, a server trains the classification model; a terminal acquires the trained model from the server and uses it to perform malicious-request detection on request logs, which the terminal obtains itself or from other devices. In another application scenario, the server both trains the classification model and uses it to detect request logs; the server may obtain request logs in a manner similar to the terminal described above, which is not repeated here. In yet another application scenario, the terminal both trains the classification model and uses it for detection. It can be appreciated that the classification model training method and the malicious request detection method provided by the embodiments of the present disclosure are not limited to these scenarios. Since the trained classification model is applied in the malicious request detection method, the training method is introduced first.
The following describes a training method of the classification model, namely a training process of the classification model, taking a server training classification model as an example. It can be appreciated that the classification model training method is also applicable to the scene of training the classification model by the terminal.
Fig. 1 is a flow chart of a training method for a classification model according to an embodiment of the present disclosure, which is applied to a server, and specifically includes the following steps S101 to S103 shown in fig. 1:
s101, preprocessing the acquired request logs to obtain a plurality of first samples, and marking at least one log type label for each first sample.
Wherein the log type tag includes a non-malicious request log and a malicious request log.
It can be understood that a plurality of request logs sent or received by systems or platforms such as WAF, NTA, and RASP are collected as the first samples, which can be understood as binary classification samples. All collected request logs are cleaned for the first time: the first cleaning removes erroneous data and records whose request header and request body are both empty (a record is not removed when only one of the two is empty), and cleaning of other data types is also supported. After the first cleaning, data is extracted from the cleaned request logs; specifically, the first (POST) line, the Cookie line, and the Content-Length line of the request header, together with the request body, are extracted. For example, suppose the original request header includes the following 5 lines:
Line 1: POST /dashboard/uploadID.php HTTP/1.1
Line 2: Host: 220.202.55.18:443
Line 3: Cookie: lang=en-US; i_like_gogits=../../../../etc/passwd
Line 4: Content-Length: 265
Line 5: X-Requested-With: XMLHttpRequest
According to the extraction principle for the request header described above, 3 of the 5 lines are extracted: the first (POST) line, the Cookie line, and the Content-Length line.
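The request-header extraction rule above (keep the first line, the Cookie line, and the Content-Length line) can be sketched as:

```python
def extract_header_lines(header_lines):
    """Keep only the first (request) line, the Cookie line, and the
    Content-Length line of a request header, matching the example above.
    Header-name matching is case-insensitive, an assumption consistent
    with HTTP header semantics."""
    kept = []
    for i, line in enumerate(header_lines):
        name = line.split(":", 1)[0].strip().lower()
        if i == 0 or name in ("cookie", "content-length"):
            kept.append(line)
    return kept
```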
As another example, the original request body includes 1 line of data: a multipart/form-data upload whose file content embeds a PHP webshell. The full line is reproduced in the extracted data shown next.
Based on the above example, the extracted data includes data of 3 lines of request header and data of 1 line of request body, specifically as follows:
POST/dashboard/uploadID.php HTTP/1.1
Cookie:lang=en-US;i_like_gogits=../../../../etc/passwd
Content-Length:265
-----------------------------5825462663702204104870787337\r\nContent-Disposition:form-data;name=\"employee_ID\";filename=\"poc.php\"\r\nContent-Type:image/png\r\n\r\n<?php\r\n$cmd=$_GET[\'cmd\'];\r\nsystem($cmd);\r\n?>\r\n-----------------------------5825462663702204104870787337--\r\n
It can be understood that after data extraction is completed, the extracted 4 lines are filtered by length: a line is kept only if its length exceeds a certain threshold, which may be 20; that is, any line among the 4 whose length exceeds 20 is retained, and shorter lines are discarded. Other forms of filtering are also supported.
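Under the reading that lines exceeding the threshold are kept, the length filter reduces to the following sketch (the threshold of 20 comes from the example above):

```python
def filter_short_lines(lines, min_len: int = 20):
    """Keep only lines longer than `min_len` characters; shorter lines
    (likely uninformative) are dropped. The threshold 20 is the example
    value from the description."""
    return [line for line in lines if len(line) > min_len]
```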
It can be understood that after the data filtering is completed, rule matching and symbol replacement are performed on special data types in the filtered data. Specifically, IP addresses, random characters, random numbers, times, and dates are each replaced by a preset code: an IP address (e.g., 220.181.111.147) is replaced with "{ip}", a random character string of a certain length (e.g., 15-76 characters) with "{hash}", a random number of a certain length (e.g., 6-7 digits) with "{randnum}", time data (e.g., 12:10:02) with "{HMS}", and date data with "{YMD}". Likewise, a run of consecutive spaces is replaced with 1 space, a run of consecutive tabs with 1 tab, and a run of consecutive line feeds with 1 line feed. Custom rules and codes can also be configured to automatically recognize and replace special data types, and other forms of preprocessing are supported. It is understood that the replaced data can additionally be deduplicated.
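A minimal sketch of the rule matching and symbol replacement using regular expressions; the exact patterns (IPv4 form, 6-7 digit numbers, 15-76 character hex-like tokens, HH:MM:SS times, YYYY-MM-DD dates) and their ordering are assumptions that approximate the replacements described above:

```python
import re

# Ordered substitution rules; order matters (e.g., IPs must be replaced
# before bare digit runs). All patterns are illustrative assumptions.
_RULES = [
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "{ip}"),       # IP address
    (re.compile(r"\b\d{1,2}:\d{2}:\d{2}\b"), "{HMS}"),          # time
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "{YMD}"),            # date (assumed format)
    (re.compile(r"\b[0-9a-fA-F]{15,76}\b"), "{hash}"),          # random chars
    (re.compile(r"\b\d{6,7}\b"), "{randnum}"),                  # random number
    (re.compile(r" {2,}"), " "),                                # collapse spaces
    (re.compile(r"\t{2,}"), "\t"),                              # collapse tabs
    (re.compile(r"\n{2,}"), "\n"),                              # collapse line feeds
]


def replace_special(text: str) -> str:
    """Replace special data types with their placeholder codes."""
    for pattern, repl in _RULES:
        text = pattern.sub(repl, text)
    return text
```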
It can be understood that after preprocessing of the binary classification samples is completed, each first sample is marked with at least one log type label. Since the binary model does not need to consider the malicious request type, the log type labels of the first samples comprise only non-malicious request log and malicious request log: data extracted from a non-malicious request log is a white sample labeled 0, and data extracted from a malicious request log is a black sample labeled 1.
S102, for each first sample, if the first sample is marked with the log type labels of both a non-malicious request log and a malicious request log, determining the non-malicious request log as the final log type label of the first sample, to obtain a cleaned first sample.
It can be understood that, on the basis of S101, after the binary samples are preprocessed and labeled, all of them are cross cleaned. Specifically, for each first sample marked with multiple labels, if those labels include both malicious request log and non-malicious request log, the non-malicious request log is taken as the final label of the first sample. This completes the second cleaning and yields the cleaned first samples; that is, if the same data is labeled both 0 and 1, 0 is taken.
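The binary cross-cleaning rule (label 0 wins over 1 when the same data carries conflicting labels) reduces to:

```python
def cross_clean_binary(labels_by_sample: dict) -> dict:
    """Resolve conflicting binary labels: if a sample carries both 0
    (non-malicious) and 1 (malicious), the non-malicious label 0 wins.
    `labels_by_sample` maps each sample to the list of labels it received;
    that input shape is an illustrative assumption."""
    return {sample: 0 if 0 in labels else 1
            for sample, labels in labels_by_sample.items()}
```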
S103, training the classification model according to the plurality of cleaned first samples to obtain a trained classification model.
The classification model comprises a first word segmentation module, a first feature extraction module and a first classification module.
Optionally, in S103, training the classification model according to the plurality of cleaned first samples to obtain a trained classification model may be implemented through the following steps:
dividing the plurality of cleaned first samples into a first training set and a first testing set according to a preset proportion; the first training set is segmented through the first segmentation module to obtain a sample segmentation list; training the first feature extraction module by using the sample word segmentation list to obtain a trained first feature extraction module, and converting each first sample in the first training set and the first test set into a sample feature vector by the trained first feature extraction module; and training and testing the first classification module according to the sample feature vector to obtain a trained classification model.
It can be appreciated that, on the basis of S102, all labeled and cross-cleaned first samples are divided into a training set and a test set according to a preset ratio (e.g., 4:1), namely the first training set and the first test set, and the training set is then sample-enhanced so that black and white samples are balanced. For example, if the training set has one black sample (data_black: 1) and three white samples (data_white0: 0, data_white1: 0, data_white2: 0), the enhanced black sample set becomes (data_black: 1, data_black: 1, data_black: 1) while the white sample set remains (data_white0: 0, data_white1: 0, data_white2: 0).
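The sample enhancement step can be sketched as simple oversampling of the minority class, matching the (data_black: 1) example above; duplicating samples round-robin is an assumption, since the description does not specify the enhancement technique:

```python
def enhance_minority(black: list, white: list):
    """Balance the two classes by duplicating samples of the smaller class
    until its size matches the larger one (simple oversampling). In the
    example above, one black sample is repeated to match three white
    samples."""
    if len(black) < len(white):
        black = [black[i % len(black)] for i in range(len(white))]
    elif len(white) < len(black):
        white = [white[i % len(white)] for i in range(len(black))]
    return black, white
```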
It can be understood that the data in the first training set is segmented at the character level by the first word segmentation module, which may be an n-gram language model (e.g., a 3-gram model), to obtain the sample word segmentation list. The first feature extraction module, which may be a TF-IDF model, is then trained on the sample word segmentation list to obtain a trained first feature extraction module. The trained first feature extraction module then converts the data in the enhanced first training set and the first test set into sample feature vectors, i.e., performs feature extraction on both sets. Finally, the first classification module is trained and tested on the feature-extracted first training set and first test set to obtain a trained first classification module; the first classification module may be a logistic regression model, or another model with equivalent effect and performance.
With the above binary classification model training method, the request header and request body of the request log are extracted, cleaned, filtered and preprocessed into feature vectors suitable for high-performance detection and accurate training, and the model samples are additionally cross cleaned and sample-enhanced, so the trained binary classification model achieves high accuracy.
Fig. 2 is a flow chart of a multi-classification model training method provided by an embodiment of the present disclosure, where the multi-classification model includes a second word segmentation module, a second feature extraction module, and a second classification module, and the method specifically includes the following steps S201 to S203 shown in fig. 2:
S201, preprocessing the acquired request logs to obtain a plurality of second samples, and marking at least one log type label for each second sample.
Wherein the log type tag includes a non-malicious request log and a plurality of types of malicious request logs.
It can be understood that the obtained plurality of request logs are preprocessed to obtain a plurality of second samples, which can be understood as multi-classification samples; the preprocessing step can refer to the preprocessing of the two-classification samples and is not repeated herein. The training data of the multi-classification model may be the same as that of the two-classification model. Data extracted from a non-malicious request log is a white sample with tag 0; data extracted from a malicious request log is a black sample, and if there are n malicious request types, such as SQL injection, code execution, file upload, information leakage, scanning and permission bypass, the tags are 1, 2, 3, ..., n respectively.
S202, for each second sample, if the second sample is marked with log type labels of both a non-malicious request log and a malicious request log, determining the log type label of the second sample as the non-malicious request log; or, if the second sample is marked with a plurality of malicious request log labels, determining the most frequent type among the plurality of malicious request log labels as the final target type label of the second sample, so as to obtain a cleaned second sample.
It can be understood that, on the basis of S201, the multi-classification samples are cross-cleaned. Specifically, for each second sample, if the second sample is marked with a plurality of labels that include both the non-malicious request log and a malicious request log, the label of the second sample is determined to be the non-malicious request log; or, if the plurality of labels include only malicious request logs, the malicious request type occurring most often is determined as the final target type label of the second sample, so as to obtain a cleaned second sample. For example, if the same data is marked both as 0 and as non-0 (1 to n), 0 is taken; if the same data is marked only with non-0 labels, such as 1 and 2, the label with the largest count is taken.
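The cross-cleaning rule above (a non-malicious mark wins outright; otherwise the most frequent malicious type is kept) can be written as a small helper; the function name is illustrative:

```python
from collections import Counter

def clean_labels(labels):
    """Resolve multiple log type labels attached to one sample:
    0 (non-malicious) wins outright; otherwise keep the most
    frequent malicious type among the non-zero labels."""
    if 0 in labels:
        return 0
    return Counter(labels).most_common(1)[0][0]

assert clean_labels([0, 3]) == 0     # marked 0 and non-0 -> take 0
assert clean_labels([1, 2, 2]) == 2  # only non-0 labels -> majority type
```

When non-zero counts are tied, `most_common` keeps the first label encountered; the text does not specify a tie-breaking rule.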
And S203, training the multi-classification model according to the plurality of cleaned second samples to obtain a trained multi-classification model.
It can be appreciated that, on the basis of S202, after the cross cleaning of the multi-classification samples is completed, the labeled multi-classification samples are divided into a training set and a test set according to a certain ratio (e.g. 4:1) to obtain a second training set and a second test set. The second training set is then subjected to sample enhancement so that the black and white samples, as well as the various types of black samples, are balanced. Character-level word segmentation is performed on the data in the second training set through the second word segmentation module, and the second feature extraction module is then trained on the segmented training set to obtain a trained second feature extraction module, where the second feature extraction module may be a TF-IDF model. Finally, the trained second feature extraction module converts the data in the enhanced second training set and in the second test set into feature vectors for multi-classification model training. Specifically, the second classification module is trained and tested with the feature-extracted second training set and second test set to obtain a trained multi-classification model, where the second classification module may be a logistic regression model. Optionally, the second classification module is not limited to a logistic regression model; other models of equivalent effect and performance may also be used, which is not repeated herein.
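A multi-class counterpart of the earlier sketch, again assuming scikit-learn; labels 1 and 2 arbitrarily stand in for two malicious types and the sample strings are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 0 = non-malicious, 1 = SQL-injection-like, 2 = command-execution-like (toy data).
texts = ["GET /index.html", "id=1' OR '1'='1", "cmd=cat%20/etc/passwd",
         "POST /login ok=1", "id=2' UNION SELECT", "cmd=wget%20evil.sh"]
labels = [0, 1, 2, 0, 1, 2]

# Second word segmentation + feature extraction, then second classification module.
vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 3))
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

# proba[k] is the probability that the request belongs to type k; the argmax is
# the target malicious type and its probability is the P used later in S303.
proba = clf.predict_proba(vec.transform(["id=3' OR '3'='3"]))[0]
```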
According to the multi-classification model training method, the multi-classification samples are subjected to cross cleaning and sample enhancement, so that the training precision of the multi-classification model is effectively improved.
Fig. 3 is a flowchart of a malicious request detection method according to an embodiment of the present disclosure, which specifically includes steps S301 to S304 shown in fig. 3:
S301, acquiring a request log to be detected.
It can be appreciated that the request log to be detected is obtained in real time.
Optionally, after obtaining the request log to be detected, the method further includes:
cleaning the request log, and removing error data and null data to obtain a cleaned request log; extracting a request head and a request body from the cleaned request log to form target data, wherein the request head comprises a head row, a complete row and a content length row; and replacing the target characters in the target data with corresponding preset characters to obtain a preprocessed request log.
It can be understood that the real-time request log is preprocessed to obtain the real-time data, and specific description of preprocessing the real-time request log refers to the above embodiment, which is not described herein.
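A minimal sketch of this preprocessing, assuming the log arrives as a dict; all field names (head_line, complete_line, content_length_line, body) and the concrete replacement rules are illustrative assumptions:

```python
import re

def preprocess_log(raw):
    """Clean a raw request log entry and normalize volatile tokens.
    Returns None for error/empty entries (they are cleaned away)."""
    if not raw or raw.get("error"):
        return None
    header = raw.get("header", {})
    # Keep only the header rows named in the text plus the request body.
    target = " ".join([
        header.get("head_line", ""),
        header.get("complete_line", ""),
        header.get("content_length_line", ""),
        raw.get("body", ""),
    ])
    # Replace target characters with preset placeholders, e.g. URL-escape
    # sequences -> "~" and digit runs -> "0", to reduce feature sparsity.
    target = re.sub(r"%[0-9a-fA-F]{2}", "~", target)
    target = re.sub(r"\d+", "0", target)
    return target

log = {"header": {"head_line": "POST /login HTTP/1.1",
                  "complete_line": "Host: example.com",
                  "content_length_line": "Content-Length: 27"},
       "body": "user=alice&pass=%27abc123"}
```

The same routine serves both the offline training samples and the real-time logs, so train-time and detect-time features stay consistent.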
S302, predicting the initial malicious score of the request log through a pre-trained two-classification model.
The classification model comprises a first feature extraction module and a first classification module.
Optionally, the step S302 may be specifically implemented by the following steps:
word segmentation is carried out on the preprocessed request log, and a first word segmentation list is obtained; extracting features of the word segmentation list to obtain a first feature vector; and calling the first classification module to predict the first feature vector to obtain the initial malicious score of the request log.
It can be understood that, on the basis of S301, the trained first feature extraction module is invoked to convert the preprocessed real-time data into feature vectors, and then the trained first classification module is invoked to predict the feature vectors, and quantize the prediction result into the percentage malicious score S, so as to obtain the initial malicious score.
S303, under the condition that the initial malicious score is larger than a first preset threshold value, predicting the malicious type of the request log through a pre-trained multi-classification model, and outputting a probability value that the request log belongs to a target malicious type.
Wherein the multi-classification model comprises a second classification model.
It can be appreciated that, based on the above step S302, it is determined whether the initial malicious score is greater than a first preset threshold, where the first preset threshold may be 60. If the initial malicious score is greater than the first preset threshold, the multi-classification model is invoked for further detection; if it is less than or equal to the first preset threshold, no alarm is raised.
Optionally, in S303, the malicious type of the request log is predicted by a pre-trained multi-classification model, and a probability value that the request log belongs to the target malicious type is output, which specifically may be implemented by the following steps:
converting the preprocessed request log into a second feature vector; and calling the second classification model to predict the malicious type of the second feature vector, and outputting a probability value that the request log belongs to a target malicious type.
It can be understood that, under the condition that the initial malicious score is greater than the first preset threshold, the multi-classification model is invoked to further detect the real-time data that the two-classification model predicted as a malicious request. Specifically, the trained second feature extraction module converts the preprocessed real-time data into a feature vector, the trained second classification module predicts on the feature vector, and a probability value that the real-time data belongs to each malicious request type or to the non-malicious request type is output. If the detection result output by the multi-classification model is a non-malicious request, that is, the two-classification model outputs a malicious request but the multi-classification model outputs a non-malicious request, the detection result of the two-classification model is adopted directly: an alarm is raised for a malicious request together with the initial malicious score S. If the detection result output by the multi-classification model is a malicious request type, an alarm is raised for that malicious request type together with a weighted malicious score wS calculated from the probability value and the initial malicious score.
S304, calculating weighted malicious scores according to the initial malicious scores and the probability values, and alarming and intercepting the request log if the weighted malicious scores are larger than a second preset threshold.
Optionally, the calculating a weighted malicious score according to the initial malicious score and the probability value is specifically implemented by the following steps:
calculating weights according to the initial malicious score, the maximum malicious score and the probability value; and weighting the initial malicious score based on the weight to obtain a weighted malicious score.
It can be understood that, on the basis of S303, a weight is calculated from the initial malicious score output by the two-classification model, the preset maximum malicious score and the probability value output by the multi-classification model, and the initial malicious score is then weighted based on this weight to obtain the weighted malicious score. The weighted malicious score is calculated as shown in formula (1):

wS = S × [1 + (100 − S)/100 × (e^(P−1) − e^(−1))]   (1)

where wS is the weighted malicious score, S is the initial malicious score of the request log output by the two-classification model, P is the probability value, output by the multi-classification model, that the request log belongs to a malicious request type, and 100 is the maximum malicious score.
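Under the assumption that the weighting takes the closed form wS = S × [1 + (100 − S)/100 × (e^(P−1) − e^(−1))] — a reconstruction consistent with the boundary and monotonicity properties stated in the surrounding text — the score can be computed and checked as:

```python
import math

S_MAX = 100  # preset maximum malicious score

def weighted_score(s, p):
    """Weighted malicious score wS from the binary-model score s in [0, 100]
    and the multi-class probability p in [0, 1] (reconstructed formula)."""
    return s * (1 + (S_MAX - s) / S_MAX * (math.exp(p - 1) - math.exp(-1)))

# Boundary and monotonicity checks matching the stated properties:
assert weighted_score(0, 0.5) == 0              # S = 0  gives wS = 0
assert weighted_score(100, 0.5) == 100          # S = 100 gives wS = 100
assert weighted_score(70, 0.9) > weighted_score(70, 0.1)  # wS increases with P
assert 70 <= weighted_score(70, 0.5) <= 100     # S <= wS <= 100
```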
It can be understood that the higher the probability value P that the request log belongs to a malicious request, the greater the weighting applied to the initial malicious score S detected by the two-classification model. Specifically, when S = 0, wS = 0; when S = 100, wS = 100; and when S ∈ (0, 100), the partial derivative of wS with respect to P is S × (100 − S)/100 × e^(P−1) > 0 because e^(P−1) > 0, so wS increases monotonically with P.
It can be appreciated that the weighted malicious score wS is at least S and at most 100. Specifically, wS − S = S × (100 − S)/100 × (e^(P−1) − e^(−1)) ≥ 0 for P ∈ [0, 1], so wS ≥ S, with equality when P = 0. For the upper bound, taking P = 1 gives wS = S + (1 − e^(−1)) × S × (100 − S)/100, which as a function of S is a downward-opening parabola whose vertex S₀ satisfies S₀ > 100; hence on S ∈ [0, 100] it increases monotonically and reaches its maximum 100 at S = 100. Therefore S ≤ wS ≤ 100.
It can be appreciated that, within the weighted malicious score, the contribution of the multi-classification model detection result does not exceed that of the two-classification model detection result. Specifically, when S ∈ [0, 100], (100 − S)/100 ∈ [0, 1], and when P ∈ [0, 1], e^(P−1) − e^(−1) ∈ [0, 1 − e^(−1)]; thus wS − S = S × (100 − S)/100 × (e^(P−1) − e^(−1)) ≤ (1 − e^(−1)) × S. Because 1 − e^(−1) < 1, it follows that wS − S < S for S > 0, i.e. the weighting added by the multi-classification model is smaller than the initial score from the two-classification model.
For example, referring to fig. 4, fig. 4 shows weighted malicious score line charts provided by an embodiment of the present disclosure: (a) in fig. 4 is the line chart for S = 70 and P = 0:0.01:1, where the initial malicious score S is greater than 60, i.e. multi-classification detection is required to detect the specific malicious type and output the weighted malicious score; (b) in fig. 4 is the line chart for S = 85 and P = 0:0.01:1, reflecting the relationship between the weighted malicious score and the malicious probability; and (c) in fig. 4 is the line chart for S = 90 and P = 0:0.01:1. It can be seen from fig. 4 that the weighted malicious score increases monotonically, is greater than 60, and is less than the maximum value 100.
According to the above malicious request detection method, whether a request exhibits the characteristics of a malicious request is identified through extraction, processing, training and detection on the request log. By fusing the two-classification model and the multi-classification model and calculating a weighted malicious score, malicious requests can be identified accurately, and malicious requests with high confidence are further quantified and automatically intercepted. Computational complexity is fully considered in the algorithm design: feature extraction modules and classification detection models with higher performance are selected, and during fused detection the multi-classification model is only applied to the smaller number of requests that the two-classification model flags as malicious, which solves the problem that malicious requests previously could not be identified both accurately and with high performance.
On the basis of the above embodiments, fig. 5 is a schematic flow chart of a malicious request detection method according to an embodiment of the present disclosure. A request log is obtained, and operations such as data cleaning, data extraction, data filtering and data preprocessing are performed on it. If historical data is obtained after preprocessing, the historical data is input into the model training platform. The data is first tagged as two-classification samples, specifically with the two-classification labels 0 and 1, where 0 represents a non-malicious request and 1 represents a malicious request; the two-classification samples are then cross-cleaned, divided into a two-classification training set and test set, the enhanced two-classification training set samples are feature-extracted, and the two-classification model is trained. The data is also tagged with the multi-classification labels 0 to n, where 1 to n represent specific malicious request types; multi-classification sample cross cleaning, training/test set division, training set sample enhancement and feature extraction are then performed to complete the training of the multi-classification model. If real-time data is obtained after preprocessing, the real-time data is input into the model prediction platform: the feature extraction module of the trained two-classification model extracts a feature vector, the classification module of the two-classification model predicts malicious versus non-malicious requests based on this feature vector, and an initial malicious score is output. It is then judged whether the initial malicious score S is greater than the first preset threshold. If S is less than or equal to the first preset threshold, no alarm is generated; if S is greater than the first preset threshold, the trained multi-classification model predicts the probability that the real-time data belongs to a certain malicious request type. Specifically, the feature extraction module is invoked to extract features from the real-time data, and the classification module then predicts, based on the feature vector, the probability of each malicious request type. If the multi-classification detection result is a non-malicious request, a malicious request alarm is raised with the initial malicious score; if the multi-classification detection result is a certain malicious request type, a malicious request type alarm is raised with the weighted malicious score wS. It is then judged whether the weighted malicious score is greater than the second preset threshold: if so, the request log is automatically intercepted; otherwise the alarm is handled by manual review and disposal.
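The two-stage decision flow of fig. 5 can be condensed into a small routine; the second threshold value and the returned action strings are illustrative (the text only gives 60 as an example first threshold):

```python
FIRST_THRESHOLD = 60    # example value given in the text
SECOND_THRESHOLD = 85   # illustrative; the text does not fix this value

def detect(initial_score, multiclass_label, weighted):
    """Fused decision logic: multiclass_label is 0 for non-malicious or
    1..n for a malicious type; weighted is the weighted malicious score."""
    if initial_score <= FIRST_THRESHOLD:
        return "no alarm"
    if multiclass_label == 0:
        # Models disagree: fall back to the binary verdict, alarm with S only.
        return "alarm: malicious request, score %.0f" % initial_score
    if weighted > SECOND_THRESHOLD:
        return "alarm + auto-intercept: type %d" % multiclass_label
    return "alarm, manual review: type %d" % multiclass_label
```

Note that the multi-classification model only ever runs on requests that already crossed the first threshold, which keeps the per-request cost low.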
It will be appreciated that the detailed description of the steps shown in fig. 5 is given with reference to the above embodiments and is not repeated herein.
Fig. 6 is a schematic structural diagram of a malicious request detection apparatus according to an embodiment of the present disclosure. The malicious request detection apparatus provided in the embodiment of the present disclosure may execute a processing flow provided in the embodiment of the malicious request detection method, as shown in fig. 6, where the malicious request detection apparatus 600 includes an obtaining unit 601, a first prediction unit 602, a second prediction unit 603, and a detection unit 604, where:
an obtaining unit 601, configured to obtain a request log to be detected;
a first prediction unit 602, configured to predict an initial malicious score of the request log through a pre-trained classification model;
a second prediction unit 603, configured to predict, when the initial malicious score is greater than a first preset threshold, a malicious type of the request log through a multi-classification model trained in advance, and output a probability value that the request log belongs to a target malicious type;
and the detection unit 604 is configured to calculate a weighted malicious score according to the initial malicious score and the probability value, and if the weighted malicious score is greater than a second preset threshold, alarm and intercept the request log.
Optionally, the apparatus 600 is further configured to:
cleaning the request log, and removing error data and null data to obtain a cleaned request log;
extracting a request head and a request body from the cleaned request log to form target data, wherein the request head comprises a head row, a complete row and a content length row;
and replacing the target characters in the target data with corresponding preset characters to obtain a preprocessed request log.
Optionally, the classification model in the apparatus 600 includes a first classification module.
Optionally, the first prediction unit 602 is configured to:
word segmentation is carried out on the preprocessed request log, and a first word segmentation list is obtained;
extracting features of the word segmentation list to obtain a first feature vector;
and calling the first classification module to predict the first feature vector to obtain the initial malicious score of the request log.
Optionally, the multiple classification model in the apparatus 600 includes a second classification model, where the multiple classification model may predict multiple malicious request types.
Optionally, the second prediction unit 603 is configured to:
converting the preprocessed request log into a second feature vector;
And calling the second classification model to predict the malicious type of the second feature vector, and outputting a probability value that the request log belongs to a target malicious type.
Optionally, the apparatus 600 is further configured to:
preprocessing the acquired request logs to obtain a plurality of first samples, and marking at least one log type label for each first sample, wherein the log type labels comprise non-malicious request logs and malicious request logs;
for each first sample, if the first samples are marked with log type labels of a non-malicious request log and a malicious request log at the same time, determining the non-malicious request log as a final log type label of the first samples, and obtaining a cleaned first sample;
and training the classification model according to the plurality of cleaned first samples to obtain a trained classification model.
Optionally, the classification model in the apparatus 600 includes a first word segmentation module, a first feature extraction module, and a first classification module.
Optionally, the apparatus 600 is further configured to:
dividing the plurality of cleaned first samples into a first training set and a first testing set according to a preset proportion;
the first training set is segmented through the first segmentation module to obtain a sample segmentation list;
Training the first feature extraction module by using the sample word segmentation list to obtain a trained first feature extraction module, and converting each first sample in the first training set and the first test set into a sample feature vector by the trained first feature extraction module;
and training and testing the first classification module according to the sample feature vector to obtain a trained classification model.
Optionally, the apparatus 600 is further configured to:
preprocessing the acquired request logs to obtain a plurality of second samples, and marking at least one log type label for each second sample, wherein the log type label comprises a non-malicious request log and a plurality of types of malicious request logs;
for each second sample, if the second samples are marked with log type labels of non-malicious request logs and malicious request logs at the same time, determining the log type labels of the second samples as the non-malicious request logs; or if the second samples are marked with a plurality of malicious request logs at the same time, determining the most types in the plurality of malicious request logs as a final target type label of each second sample, and obtaining cleaned second samples;
And training the multi-classification model according to the plurality of cleaned second samples to obtain a trained multi-classification model.
The malicious request detection apparatus of the embodiment shown in fig. 6 may be used to implement the technical solution of the above method embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. Referring now in particular to fig. 7, a schematic diagram of an electronic device 700 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 700 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), wearable electronic devices, and the like, and fixed terminals such as digital TVs, desktop computers, smart home devices, and the like. The electronic device shown in fig. 7 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 7, an electronic device 700 may include a processing means (e.g., a central processor, a graphics processor, etc.) 701 that may perform various suitable actions and processes to implement the malicious request detection method of embodiments as described in the present disclosure according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage means 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 are also stored. The processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
In general, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 shows an electronic device 700 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flowchart, thereby implementing the malicious request detection method as described above. In such an embodiment, the computer program may be downloaded and installed from a network via communication device 709, or installed from storage 708, or installed from ROM 702. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 701.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, the clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
Alternatively, the electronic device may perform other steps described in the above embodiments when the above one or more programs are executed by the electronic device.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus comprising that element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting a malicious request, comprising:
acquiring a request log to be detected;
predicting an initial malicious score of the request log through a pre-trained binary classification model;
when the initial malicious score is greater than a first preset threshold, predicting the malicious type of the request log through a pre-trained multi-classification model, and outputting a probability value that the request log belongs to a target malicious type;
and calculating a weighted malicious score from the initial malicious score and the probability value, and raising an alarm and intercepting the request log if the weighted malicious score is greater than a second preset threshold.
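As a non-limiting illustration (not part of the claim), the two-stage scoring of claim 1 can be sketched as below. The threshold values, the linear weighting formula, and the callable model interfaces are all assumptions made for the example; the claim fixes none of them.

```python
# Illustrative sketch of the two-stage detection in claim 1.
FIRST_THRESHOLD = 0.5    # first preset threshold (assumed value)
SECOND_THRESHOLD = 0.7   # second preset threshold (assumed value)
ALPHA = 0.6              # assumed weight on the initial malicious score

def detect(request_log, binary_model, multi_model):
    """Return ('pass' | 'alert_and_intercept', score) for one request log."""
    # Stage 1: initial malicious score from the binary classification model.
    initial_score = binary_model(request_log)
    if initial_score <= FIRST_THRESHOLD:
        return "pass", initial_score
    # Stage 2: probability that the log belongs to the target malicious type.
    prob_target = multi_model(request_log)
    # Weighted malicious score (a linear combination is an assumed scheme).
    weighted = ALPHA * initial_score + (1 - ALPHA) * prob_target
    if weighted > SECOND_THRESHOLD:
        return "alert_and_intercept", weighted
    return "pass", weighted
```

The second model is only invoked when the cheap binary score crosses the first threshold, which keeps the multi-class model off the hot path for clearly benign traffic.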
2. The method of claim 1, wherein after acquiring the request log to be detected, the method further comprises:
cleaning the request log and removing error data and null data to obtain a cleaned request log;
extracting a request header and a request body from the cleaned request log to form target data, wherein the request header comprises a header line, a complete line, and a content-length line;
and replacing target characters in the target data with corresponding preset characters to obtain a preprocessed request log.
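As a non-limiting illustration of the preprocessing in claim 2, the sketch below cleans a batch of logs and normalizes characters. The dict layout of a raw log and the digit-to-`N` replacement table are assumptions made for the example; the claim only requires that "target characters" be replaced by "preset characters".

```python
import re

def preprocess(raw_logs):
    """Clean logs, keep header+body as target data, normalize characters."""
    processed = []
    for log in raw_logs:
        # Cleaning: drop null entries and entries flagged as error data.
        if not log or log.get("error"):
            continue
        # Target data: request header concatenated with request body.
        target = log.get("header", "") + "\n" + log.get("body", "")
        # Replace target characters (here: digits) with a preset character.
        processed.append(re.sub(r"\d", "N", target))
    return processed
```

Normalizing volatile tokens such as digits reduces feature sparsity before tokenization, so semantically identical requests with different IDs map to the same features.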
3. The method of claim 2, wherein the binary classification model comprises a first classification module, and wherein predicting the initial malicious score of the request log through the pre-trained binary classification model comprises:
performing word segmentation on the preprocessed request log to obtain a first word segmentation list;
extracting features from the first word segmentation list to obtain a first feature vector;
and calling the first classification module to predict on the first feature vector, obtaining the initial malicious score of the request log.
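As a non-limiting illustration, the three steps of claim 3 can be sketched with toy stand-ins. The tokenization rule, the fixed vocabulary, and the logistic weights below are assumptions for the example; a deployed system would use a trained vectorizer and classifier.

```python
import math
import re
from collections import Counter

def word_segment(log_text):
    # Word segmentation: split on non-alphanumeric characters (assumed rule).
    return [t for t in re.split(r"\W+", log_text.lower()) if t]

def extract_features(tokens, vocab):
    # Feature extraction: bag-of-words counts over a fixed vocabulary.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def initial_malicious_score(vector, weights, bias):
    # First classification module: logistic-regression-style score in (0, 1).
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

A request containing SQL keywords scores high under this toy weighting, which mirrors how the first classification module separates malicious from benign logs.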
4. The method according to claim 2, wherein the multi-classification model comprises a second classification module and is capable of predicting a plurality of malicious request types, and wherein predicting the malicious type of the request log through the pre-trained multi-classification model and outputting a probability value that the request log belongs to a target malicious type comprises:
converting the preprocessed request log into a second feature vector;
and calling the second classification module to predict the malicious type from the second feature vector, and outputting the probability value that the request log belongs to the target malicious type.
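As a non-limiting illustration of claim 4's multi-class prediction, a common head is one linear score per malicious type normalized with softmax. The type names and weight rows below are assumptions for the example; the claim does not enumerate the types.

```python
import math

MALICIOUS_TYPES = ["sql_injection", "xss", "path_traversal"]  # assumed label set

def type_probabilities(feature_vector, weight_rows):
    """Return {type: probability}, one weight row per malicious type."""
    logits = [sum(w * x for w, x in zip(row, feature_vector))
              for row in weight_rows]
    m = max(logits)                      # subtract max for numeric stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {t: e / total for t, e in zip(MALICIOUS_TYPES, exps)}
```

The probability of the highest-scoring (target) type is what feeds the weighted malicious score of claim 1.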
5. The method according to claim 1, wherein the binary classification model is trained by:
preprocessing the acquired request logs to obtain a plurality of first samples, and marking each first sample with at least one log type label, wherein the log type labels comprise non-malicious request log and malicious request log;
for each first sample, if the first sample is simultaneously marked with both the non-malicious request log label and the malicious request log label, determining the non-malicious request log as the final log type label of the first sample, to obtain cleaned first samples;
and training the binary classification model on the plurality of cleaned first samples to obtain the trained binary classification model.
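As a non-limiting illustration, the conflict-resolution rule of claim 5 (a sample annotated both ways keeps the non-malicious label) can be sketched as below. The label strings are assumed names.

```python
def clean_binary_labels(samples):
    """Resolve conflicting annotations: non-malicious wins on conflict.

    `samples` is a list of (log, set_of_labels) pairs; the label strings
    "non_malicious" / "malicious" are assumed names.
    """
    cleaned = []
    for log, labels in samples:
        if "non_malicious" in labels:
            # Covers both the pure non-malicious case and the conflict case.
            final = "non_malicious"
        else:
            final = "malicious"
        cleaned.append((log, final))
    return cleaned
```

Biasing conflicts toward the non-malicious label trades some recall for a lower false-positive rate on the trained binary model.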
6. The method of claim 5, wherein the binary classification model comprises a first word segmentation module, a first feature extraction module, and a first classification module, and wherein training the binary classification model according to the plurality of cleaned first samples to obtain a trained binary classification model comprises:
dividing the plurality of cleaned first samples into a first training set and a first test set according to a preset proportion;
segmenting the first training set through the first word segmentation module to obtain a sample word segmentation list;
training the first feature extraction module with the sample word segmentation list to obtain a trained first feature extraction module, and converting each first sample in the first training set and the first test set into a sample feature vector through the trained first feature extraction module;
and training and testing the first classification module on the sample feature vectors to obtain the trained binary classification model.
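As a non-limiting illustration of claim 6's first step, a preset-proportion split can be sketched as below. The 80/20 ratio and the fixed seed are assumed values; the claim only requires "a preset proportion".

```python
import random

def split_train_test(samples, train_ratio=0.8, seed=0):
    """Shuffle and split cleaned samples by a preset proportion."""
    rng = random.Random(seed)        # fixed seed for a reproducible split
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

The segmentation, feature-extraction, and classification modules of claim 6 are then fitted on the training portion only, with the test portion held out for evaluation.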
7. The method of claim 1, wherein the multi-classification model is trained by:
preprocessing the acquired request logs to obtain a plurality of second samples, and marking each second sample with at least one log type label, wherein the log type labels comprise non-malicious request log and a plurality of types of malicious request log;
for each second sample, if the second sample is simultaneously marked with both a non-malicious request log label and a malicious request log label, determining the non-malicious request log as the log type label of the second sample; or, if the second sample is marked with a plurality of malicious request log labels, determining the most frequent type among them as the final target type label of the second sample, to obtain cleaned second samples;
and training the multi-classification model according to the plurality of cleaned second samples to obtain the trained multi-classification model.
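As a non-limiting illustration, the two-part label-resolution rule of claim 7 (non-malicious wins on conflict; otherwise keep the most frequent malicious type) can be sketched as below. The label names are assumptions.

```python
from collections import Counter

def resolve_multiclass_label(labels):
    """Resolve one sample's annotations (label names assumed)."""
    if "non_malicious" in labels:
        # Any non-malicious annotation overrides malicious ones.
        return "non_malicious"
    # Otherwise keep the most frequent malicious type.
    return Counter(labels).most_common(1)[0][0]
```

This majority rule gives each training sample exactly one target label, which is what a standard multi-class loss expects.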
8. A malicious request detection apparatus, comprising:
the acquisition unit is used for acquiring a request log to be detected;
a first prediction unit, configured to predict an initial malicious score of the request log through a pre-trained binary classification model;
a second prediction unit, configured to, when the initial malicious score is greater than a first preset threshold, predict the malicious type of the request log through a pre-trained multi-classification model and output a probability value that the request log belongs to a target malicious type;
and a detection unit, configured to calculate a weighted malicious score from the initial malicious score and the probability value, and to raise an alarm and intercept the request log if the weighted malicious score is greater than a second preset threshold.
9. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the malicious request detection method according to any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the malicious request detection method according to any one of claims 1 to 7.
CN202311351778.2A 2023-10-17 2023-10-17 Malicious request detection method, device, equipment and storage medium Pending CN117349834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311351778.2A CN117349834A (en) 2023-10-17 2023-10-17 Malicious request detection method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117349834A true CN117349834A (en) 2024-01-05

Family

ID=89362722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311351778.2A Pending CN117349834A (en) 2023-10-17 2023-10-17 Malicious request detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117349834A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination