CN118041708A

CN118041708A - Data processing method, device and server for access request

Info

Publication number: CN118041708A
Application number: CN202410447166.1A
Authority: CN
Inventors: 陈福祥
Original assignee: CCB Finetech Co Ltd
Current assignee: CCB Finetech Co Ltd
Priority date: 2024-04-15
Filing date: 2024-04-15
Publication date: 2024-05-14

Abstract

This specification provides a data processing method, device, and server for access requests, which can be used in the technical field of big data. A preset decision model comprising at least a state representation network and an action value network is obtained in advance through multiple rounds of iterative training according to a preset training rule. The preset decision model then processes the historical feature vector of the previous time period, which is derived from the historical access requests of that period, to determine a suitable decision result. The firewall rules of the system are dynamically adjusted according to the decision result to obtain firewall rules of the current time period that match the current access traffic scenario, and these rules are then used to detect and process the access requests received in the current time period. The method can thereby effectively reduce the missed-detection and false-alarm rates, accurately identify access requests that carry a network attack risk, process them promptly and in a targeted manner, and better protect the data security of the system.

Description

Data processing method, device and server for access request

Technical Field

The specification belongs to the technical field of big data, and particularly relates to a data processing method, device and server for an access request.

Background

In internet scenarios, Web applications, websites, and the like are often subject to numerous network attacks.

Existing methods mostly require technicians to preset fixed firewall rules based on personal knowledge and experience. The firewall rules are used to detect the access requests received by the Web application or website; when an access request is detected to be a network attack, it is processed as the firewall rules prescribe.

In practice, however, such firewall rules are affected by the subjective judgment of the technicians, so the rules that are set are sometimes inaccurate or unreasonable, and missed detections and false alarms occur easily when network attacks are detected on their basis. In addition, in a real service environment, network attacks, Web applications, and websites evolve continuously and rapidly, while the firewall rules in use stay fixed. After the rules have been in use for some time, missed detections and false alarms therefore become more serious, which further affects the data security of the system.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

This specification provides a data processing method, device, and server for access requests, which can effectively reduce the missed-detection rate and the false-alarm rate in network attack detection by intelligently and dynamically adjusting firewall rules, accurately identify access requests that carry a network attack risk, and process those requests promptly and in a targeted manner, thereby better protecting the data security of the system.

The specification provides a data processing method of an access request, which comprises the following steps:

acquiring a historical access request of a previous time period;

according to a preset feature processing rule, processing a historical access request of the previous time period to obtain a corresponding historical feature vector of the previous time period;

processing the historical feature vector of the previous time period by using a preset decision model to obtain a corresponding decision result; the preset decision model comprises at least a state representation network and an action value network; the state representation network is used for mapping the historical feature vector of the previous time period into a corresponding state representation; the action value network is used for determining, according to the state representation, a plurality of candidate decision actions and the reward values corresponding to the candidate decision actions; the preset decision model is obtained through multiple rounds of iterative training according to a preset training rule;

according to the decision result, determining candidate decision actions meeting the requirements as target decision actions;

According to the target decision action, adjusting the firewall rules of the system to obtain the firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.
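The interplay of the two networks can be illustrated with a minimal sketch. It assumes single-layer networks with hand-picked toy weights (`W_STATE`, `W_ACTION`) and assumes that "meeting the requirements" means having the highest reward value; the specification fixes neither the architecture nor the selection criterion, and all names here are hypothetical.

```python
import math

def state_representation(features, weights):
    """Map a feature vector to a state representation (one tanh layer)."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]

def action_values(state, weights):
    """Estimate a reward value for each candidate decision action."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def select_target_action(values):
    """Assumed criterion: pick the candidate with the highest reward value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical toy weights: 3 input features -> 2 state units -> 3 candidate actions.
W_STATE = [[0.5, -0.2, 0.1], [0.0, 0.3, 0.4]]
W_ACTION = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

features = [1.0, 0.0, 2.0]
state = state_representation(features, W_STATE)
values = action_values(state, W_ACTION)
target_action = select_target_action(values)
```

In a trained model the weights would come from the iterative training described later rather than being written by hand.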

In one embodiment, the preset decision model includes at least one of: a firewall rule decision model for SQL injection attacks, a firewall rule decision model for cross-site scripting attacks, and a firewall rule decision model for cross-site request forgery attacks.

In one embodiment, when the preset decision model includes a firewall rule decision model for SQL injection attack, according to a preset feature processing rule, processing a historical access request of a previous time period to obtain a corresponding historical feature vector of the previous time period, including:

determining, according to the preset feature processing rule, a first feature by detecting whether SQL keywords exist in the historical access request of the previous time period; determining a second feature according to the request length of the historical access request of the previous time period; and counting the number of specified characters contained in the historical access request of the previous time period as a third feature;

obtaining a corresponding cross feature through feature crossing processing according to the first feature and the second feature;

obtaining a corresponding aggregation feature through feature aggregation processing according to the first feature and the third feature;

and generating the corresponding historical feature vector of the previous time period by combining the cross feature and the aggregation feature.

In one embodiment, the method further comprises:

performing binarization processing on the second feature to obtain a binarized second feature;

Correspondingly, according to the first feature and the second feature, corresponding crossing features are obtained through feature crossing processing, and the method comprises the following steps:

and according to the first characteristic and the binarized second characteristic, obtaining a corresponding crossing characteristic through characteristic crossing processing.
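As an illustration of this embodiment, the sketch below binarizes the request length against a threshold and then crosses it with the keyword flag. The threshold of 100 characters and the encoding of the cross feature as a category index from 0 to 3 are assumptions made for the example; the specification does not state either.

```python
def binarize_length(request_length, threshold=100):
    """Return 1 if the request is longer than the (assumed) threshold, else 0."""
    return 1 if request_length > threshold else 0

def cross_feature(has_sql_keyword, long_request):
    """Cross two binary features into one of four categories (0..3)."""
    return has_sql_keyword * 2 + long_request

first = 1                       # first feature: a SQL keyword is present
second = binarize_length(250)   # binarized second feature for a 250-character request
crossed = cross_feature(first, second)
```

Binarizing first keeps the cross feature on a small, fixed set of values instead of letting raw request lengths dominate it.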

In one embodiment, adjusting firewall rules of a system based on a target decision action includes:

According to the target decision action, the firewall rules of the system are adjusted in at least one of the following ways:

Determining a first type of firewall rule in a preset firewall rule set; setting the state parameters of the first type firewall rules to be enabled;

determining a second type of firewall rule in a preset firewall rule set; setting the state parameters of the second type of firewall rule to disabled;

Determining a third type of firewall rule in a preset firewall rule set; and modifying the third type firewall rules, and setting the state parameters of the modified third type firewall rules to be enabled.
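The three adjustment modes can be sketched as operations on a small rule set. The `FirewallRule` fields, the action dictionary layout, and the example patterns are hypothetical; the specification only states that rules are enabled, disabled, or modified and then enabled.

```python
from dataclasses import dataclass

@dataclass
class FirewallRule:
    rule_id: str
    pattern: str
    enabled: bool = False

def apply_action(rules, action):
    """Apply one decision action to the rule set, in place."""
    rule = rules[action["rule_id"]]
    kind = action["kind"]
    if kind == "enable":          # first type: switch the rule on
        rule.enabled = True
    elif kind == "disable":       # second type: switch the rule off
        rule.enabled = False
    elif kind == "modify":        # third type: change the rule, then switch it on
        rule.pattern = action["new_pattern"]
        rule.enabled = True
    else:
        raise ValueError(f"unknown action kind: {kind}")
    return rules

rules = {
    "r1": FirewallRule("r1", r"union\s+select"),
    "r2": FirewallRule("r2", r"--", enabled=True),
    "r3": FirewallRule("r3", r"or 1=1"),
}
apply_action(rules, {"kind": "enable", "rule_id": "r1"})
apply_action(rules, {"kind": "disable", "rule_id": "r2"})
apply_action(rules, {"kind": "modify", "rule_id": "r3", "new_pattern": r"or\s+\d+=\d+"})
```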

In one embodiment, the method further comprises:

Constructing an initial decision model; wherein the initial decision model comprises at least an initial state representation network and an initial action value network; the initial action value network is at least connected with an initial first Q network structure;

connecting the initial decision model with a preset firewall rule set; configuring a preset action selection rule for the initial decision model;

acquiring a test sample request; processing the test sample request according to a preset feature processing rule to obtain a corresponding test sample feature vector;

And carrying out multiple rounds of iterative training on the initial decision model by utilizing the characteristic vector of the test sample and the test sample request according to a preset training rule so as to obtain a preset decision model meeting the requirements.
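The specification does not say what the preset action selection rule is. One common choice in value-based reinforcement learning is epsilon-greedy selection, sketched below purely as an assumption; the function name and the `epsilon` parameter are illustrative.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore a random action; otherwise exploit the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

rng = random.Random(0)
q_values = [0.1, 0.7, 0.2]

# epsilon = 0.0 never explores, so the highest-valued action is always chosen.
greedy_choice = epsilon_greedy(q_values, epsilon=0.0, rng=rng)

# epsilon = 1.0 always explores, so any valid action index may be returned.
explored = [epsilon_greedy(q_values, epsilon=1.0, rng=rng) for _ in range(20)]
```

A training loop would typically start with a larger `epsilon` and reduce it over the rounds, so that early rounds try many rule adjustments and later rounds settle on the ones that scored well.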

In one embodiment, performing multiple rounds of iterative training on an initial decision model according to a preset training rule using test sample feature vectors and test sample requests, including:

According to the following mode, performing iterative training of the current round according to a preset training rule:

Acquiring a decision model of the previous round;

Processing the characteristic vector of the test sample of the previous round by using the decision model of the previous round, and determining the decision action of the current round; the characteristic vector of the test sample of the previous round is generated according to the test sample request of the previous round;

determining firewall rules of the current round according to the decision action of the current round;

detecting and processing a test sample request of the current round according to the firewall rules of the current round to obtain a detection processing result of the current round;

generating feedback data of the current round according to the detection processing result of the current round;

and according to the feedback data of the current round, adjusting the model parameters of the decision model of the previous round to obtain the decision model of the current round.
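How feedback data might be derived from a detection processing result can be sketched as follows. The example assumes that each test sample request carries a ground-truth label, that detection is a simple substring match, and that the reward is the number of correct decisions minus missed detections and false alarms; none of these details is given in the specification.

```python
def detect(request, enabled_patterns):
    """Flag a request as an attack if any enabled pattern occurs in it."""
    text = request.lower()
    return any(p in text for p in enabled_patterns)

def round_feedback(samples, enabled_patterns):
    """Compare detections against labels and summarize them as feedback data."""
    missed = false_alarms = correct = 0
    for request, is_attack in samples:
        flagged = detect(request, enabled_patterns)
        if flagged == is_attack:
            correct += 1
        elif is_attack:
            missed += 1          # attack that the rules let through
        else:
            false_alarms += 1    # benign request that the rules flagged
    reward = correct - missed - false_alarms  # assumed reward shape
    return {"correct": correct, "missed": missed,
            "false_alarms": false_alarms, "reward": reward}

# Hypothetical labeled test sample requests: (request text, is it an attack?).
samples = [
    ("id=1 union select password from users", True),
    ("q=select a seat", False),
    ("name=alice", False),
    ("id=1 or 1=1", True),
]
feedback = round_feedback(samples, enabled_patterns=["union select", "select"])
```

Here the overly broad `select` pattern causes one false alarm and the missing `or 1=1` pattern causes one missed detection, which is the kind of signal the next round of parameter adjustment would act on.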

In one embodiment, adjusting model parameters of a decision model of a previous round based on feedback data of a current round includes:

detecting whether the current round satisfies a preset reset condition;

when it is determined that the current round satisfies the preset reset condition, adjusting, according to the feedback data of the current round, the network parameters of the state representation network and of the action value network in the decision model of the previous round, as well as the network parameters of a second Q network structure; wherein the second Q network structure corresponds to the first Q network structure; the second Q network structure is not connected with the decision model of the previous round;

And copying the adjusted second Q network structure to the first Q network structure to obtain a decision model of the current round.

In one embodiment, after detecting whether the current round satisfies the preset reset condition, the method further comprises:

when it is determined that the current round does not satisfy the preset reset condition, adjusting, according to the feedback data of the current round, the network parameters of the state representation network and of the action value network in the decision model of the previous round, as well as the network parameters of the second Q network structure, to obtain the decision model of the current round.
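The two-network update can be sketched with small Q tables standing in for the Q network structures. This is a simplification under several assumptions: tables replace neural networks, the reset condition is "every third round", and the first Q table supplies the temporal-difference target, as in the usual target-network arrangement. The specification states only that the second structure is adjusted each round and copied to the first when the reset condition is met.

```python
import copy

def train_round(first_q, second_q, state, action, reward, next_state,
                round_index, reset_every=3, lr=0.5, gamma=0.9):
    """Update the second Q table; copy it into the first one on reset rounds."""
    # Assumed target: reward plus discounted best value from the first Q table.
    target = reward + gamma * max(first_q[next_state])
    second_q[state][action] += lr * (target - second_q[state][action])
    if round_index % reset_every == 0:   # hypothetical reset condition
        first_q = copy.deepcopy(second_q)
    return first_q, second_q

first_q = {"s0": [0.0, 0.0], "s1": [0.0, 0.0]}
second_q = copy.deepcopy(first_q)
history = []
for i in range(1, 4):
    first_q, second_q = train_round(first_q, second_q, "s0", 1, 1.0, "s1", i)
    history.append(first_q["s0"][1])
```

Between resets the first table stays unchanged, so the values used as training targets do not move while the second table is being adjusted.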

In one embodiment, the method further comprises:

combining the test sample feature vector of the previous round, the test sample request of the current round, and the feedback data of the current round as experience data; and storing the experience data in a preset buffer.
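A preset buffer of this kind can be sketched as a fixed-capacity queue. The capacity, the eviction of the oldest entry when full, and the random sampling method are assumptions borrowed from common experience-replay practice; the specification only states that the combined experience data is stored.

```python
import random
from collections import deque

class ExperienceBuffer:
    """Fixed-capacity buffer; the oldest experience is dropped when full."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def store(self, prev_features, current_request, feedback):
        """Store one (features, request, feedback) tuple as experience data."""
        self._items.append((prev_features, current_request, feedback))

    def sample(self, batch_size, rng):
        """Draw a random batch without replacement (assumed usage)."""
        return rng.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self):
        return len(self._items)

buffer = ExperienceBuffer(capacity=2)
buffer.store([1, 0, 3], "id=1 or 1=1", {"reward": -1})
buffer.store([0, 0, 0], "name=alice", {"reward": 1})
buffer.store([1, 1, 5], "id=1 union select 1", {"reward": 2})  # evicts the first entry
batch = buffer.sample(2, random.Random(0))
```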

The present specification also provides a data processing apparatus for an access request, including:

the acquisition module is used for acquiring a historical access request of the previous time period;

the processing module is used for processing the history access request of the previous time period according to a preset characteristic processing rule to obtain a corresponding history characteristic vector of the previous time period;

the decision module is used for processing the historical feature vector of the previous time period by using a preset decision model to obtain a corresponding decision result; the preset decision model comprises at least a state representation network and an action value network; the state representation network is used for mapping the historical feature vector of the previous time period into a corresponding state representation; the action value network is used for determining, according to the state representation, a plurality of candidate decision actions and the reward values corresponding to the candidate decision actions; the preset decision model is obtained through multiple rounds of iterative training according to a preset training rule;

The determining module is used for determining candidate decision actions meeting the requirements as target decision actions according to the decision results;

the adjusting module is used for adjusting firewall rules of the system according to the target decision action to obtain firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

The present specification also provides a server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the relevant steps of a data processing method for the access request.

The present specification also provides a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the relevant steps of a data processing method of the access request.

The present specification also provides a computer program product comprising a computer program which, when executed by a processor, implements the relevant steps of the data processing method of an access request.

With the data processing method, device, and server for access requests provided by this specification, a well-performing preset decision model can be obtained before deployment through multiple rounds of iterative training according to a preset training rule. In operation, the historical access requests of the previous time period are acquired first and processed according to a preset feature processing rule to obtain the corresponding historical feature vector of the previous time period. The preset decision model then processes this feature vector, learning from the latest network attack patterns and features of the previous time period to determine a suitable decision result. According to the decision result, the firewall rules of the system are intelligently and dynamically adjusted to obtain firewall rules of the current time period that match the current access traffic scenario, and these rules are used to detect and process the access requests received by the system in the current time period. The method can therefore effectively reduce the missed-detection rate and the false-alarm rate in network attack detection, accurately identify access requests that carry a network attack risk, and process them promptly and in a targeted manner, thereby better protecting the data security of the system.

Drawings

In order to more clearly illustrate the embodiments of the present disclosure, the drawings that are required for the embodiments will be briefly described below, and the drawings described below are only some embodiments described in the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.

FIG. 1 is a flow diagram of a method for processing data of an access request according to one embodiment of the present disclosure;

FIG. 2 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 3 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 4 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 5 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 6 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 7 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 8 is a flow diagram of a method for processing data of an access request according to one embodiment of the present disclosure;

FIG. 9 is a schematic diagram of the structural composition of a server provided in one embodiment of the present disclosure;

FIG. 10 is a schematic diagram showing the structural composition of a data processing apparatus for access request according to one embodiment of the present disclosure;

FIG. 11 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 12 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario;

FIG. 13 is a schematic diagram of one embodiment of a data processing method for an access request provided by embodiments of the present disclosure, in one example scenario.

Detailed Description

In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

It should be noted that the information and data referred to in this specification are information and data authorized by the user or fully authorized by all relevant parties. The collection, storage, use, processing, transmission, provision, disclosure, and application of such data comply with the relevant laws, regulations, and standards of the relevant countries and regions; necessary security measures are taken; public order and good morals are not violated; and corresponding operation entries are provided for users or relevant parties to grant or refuse authorization.

It should also be noted that the embodiments of this specification may mention existing industry solutions such as software, components, and models. These should be regarded as exemplary: they serve only to illustrate the feasibility of implementing the technical solution of this specification and do not imply that the applicant has used or necessarily uses those solutions.

Referring to FIG. 1, an embodiment of the present disclosure provides a data processing method for an access request. The method can be applied on the side of a server or a network security device (e.g., a Web Application Firewall, WAF). In particular implementations, the method may include the following:

S101: acquiring a historical access request of a previous time period;

S102: according to a preset feature processing rule, processing a historical access request of the previous time period to obtain a corresponding historical feature vector of the previous time period;

S103: processing the historical feature vector of the previous time period by using a preset decision model to obtain a corresponding decision result; the preset decision model comprises at least a state representation network and an action value network; the state representation network is used for mapping the historical feature vector of the previous time period into a corresponding state representation; the action value network is used for determining, according to the state representation, a plurality of candidate decision actions and the reward values corresponding to the candidate decision actions; the preset decision model is obtained through multiple rounds of iterative training according to a preset training rule;

S104: according to the decision result, determining candidate decision actions meeting the requirements as target decision actions;

S105: according to the target decision action, adjusting the firewall rules of the system to obtain the firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

The preset decision model may be understood as a neural network model obtained in advance through multiple rounds of iterative training according to a preset training rule. Given the feature vector of the access requests of an input time period, the model learns the specific characteristics of the access traffic scenario that those requests represent, and determines and outputs a preferred decision action for adjusting the firewall rules that matches that scenario.

Based on the above embodiment, a matching target decision action can be determined automatically by using the preset decision model to process the historical feature vector extracted from the historical access requests of the previous time period. The firewall rules of the system can then be intelligently and dynamically adjusted according to the target decision action, and the access requests received in the current time period can be detected and processed according to the adjusted firewall rules of the current time period. The method can thus automatically learn new network attack patterns and features and adaptively adjust the firewall rules in use. With the dynamically adjusted rules, access requests that carry a network attack risk can be identified accurately and processed promptly and in a targeted manner, which effectively reduces the missed-detection rate and the false-alarm rate in network attack detection.
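Steps S101 to S105 can be tied together in a short sketch of one time period. The `extract`, `decide`, and `adjust` functions are hypothetical stand-ins for the preset feature processing rule, the preset decision model, and the rule adjustment; a real decision model would be a trained network rather than an `if` statement, and detection here is a plain substring match.

```python
def run_period(previous_requests, current_requests, extract, decide, adjust, rules):
    """One pass through S101-S105 for a single time period."""
    features = extract(previous_requests)      # S101 + S102: history -> feature vector
    action = decide(features)                  # S103 + S104: model -> target action
    current_rules = adjust(rules, action)      # S105: adjusted rules for this period
    blocked = [r for r in current_requests
               if any(p in r.lower() for p in current_rules)]
    return current_rules, blocked

# Hypothetical stand-ins, chosen only to keep the example small.
def extract(requests):
    return [sum("select" in r.lower() for r in requests)]

def decide(features):
    return "enable_select_rule" if features[0] > 0 else "no_change"

def adjust(rules, action):
    return rules | {"select"} if action == "enable_select_rule" else set(rules)

rules, blocked = run_period(
    previous_requests=["id=1 UNION SELECT 1", "name=bob"],
    current_requests=["q=1; select * from t", "name=carol"],
    extract=extract, decide=decide, adjust=adjust, rules=set(),
)
```

The point of the structure is that the rules applied to the current period are derived only from traffic observed in the previous period.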

In some embodiments, the data processing method for access requests may be applied on the risk control server side of a transaction platform, as shown in FIG. 2. A preset decision model and a preset firewall rule set are deployed on the risk control server.

The transaction platform can be an online service platform of a transaction mechanism. The user can use the client to send a corresponding access request so as to process specific transaction business data through the transaction platform.

The risk control server may be a back-end server deployed on the transaction platform side that implements functions such as data transmission and data processing. Specifically, the risk control server may be, for example, an electronic device with data computation, storage, and network interaction capabilities. Alternatively, the risk control server may be a software program that runs on such an electronic device and supports data processing, storage, and network interaction. In this embodiment, the number of servers included in the risk control server is not specifically limited. The risk control server may be one server, several servers, or a server cluster formed by several servers.

In particular implementations, the transaction platform receives a large number of access requests (e.g., HTTP access requests). The risk control server can detect, in real time and according to the corresponding firewall rules, whether the access requests received by the transaction platform carry a network attack risk. When the detection result indicates that an access request carries such a risk, the request is processed in a targeted manner according to the corresponding firewall rules: for example, the request is intercepted, or it is released after a risk tag has been set for it.
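The two handling options mentioned here, interception and release with a risk tag, can be sketched as follows. The rule fields (`name`, `pattern`, `on_match`) and the request dictionary are hypothetical, and matching is reduced to a case-insensitive substring test for brevity.

```python
def handle_request(request, rules):
    """Return (decision, request) where decision is 'block', 'tag', or 'allow'."""
    text = request["url"].lower()
    for rule in rules:
        if rule["pattern"] in text:
            if rule["on_match"] == "intercept":
                return "block", request
            # Release the request, but attach a risk tag for downstream systems.
            tagged = dict(request, risk_tag=rule["name"])
            return "tag", tagged
    return "allow", request

rules = [
    {"name": "sqli-union", "pattern": "union select", "on_match": "intercept"},
    {"name": "sqli-comment", "pattern": "--", "on_match": "tag"},
]
d1, _ = handle_request({"url": "/item?id=1 UNION SELECT 1"}, rules)
d2, r2 = handle_request({"url": "/item?id=1--"}, rules)
d3, r3 = handle_request({"url": "/item?id=1"}, rules)
```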

In specific implementation, the risk control server may automatically acquire the historical access requests of the previous time period at every preset time interval (for example, every 24 hours) and process them according to a preset feature processing rule to obtain a historical feature vector of the previous time period that reflects the latest characteristics of the access traffic scenario.

The risk control server may then invoke the preset decision model to process the historical feature vector of the previous time period. In doing so, the model analyzes and takes into account the latest characteristics of the access traffic scenario (for example, the latest distribution of the number of access requests received by the transaction platform, and the latest attack patterns and attack types of the network attacks directed at the transaction platform) and determines and outputs a corresponding decision result.

Furthermore, according to the preferred target decision action indicated by the decision result, the risk control server can intelligently and dynamically adjust the firewall rules on the basis of the preset firewall rule set, obtaining adjusted firewall rules that suit the latest access traffic scenario and serve as the firewall rules of the current time period. According to these rules, the access requests received by the transaction platform in the current time period are then automatically checked for network attack risk and processed accordingly in time.

In this way, in each preset time period the risk control server can automatically learn the latest attack patterns and attack features of the network attacks encountered by the transaction platform, combine them with the latest running state of the transaction platform, and adaptively adjust the firewall rules in use so that they always match the latest access traffic scenario. Using the adaptively adjusted firewall rules, the risk control server can accurately identify access requests that carry a network attack risk and process them promptly and in a targeted manner, so that the data security of the transaction platform is better protected.

In some embodiments, the obtaining the historical access request of the previous time period may include: acquiring access request logs of a system every preset time period; and screening historical access requests of the last time period from the access request log of the system according to the receiving time of the access requests. The specific duration of the preset time period can be set according to the current running state characteristics of the system and specific safety requirements.
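Screening by receive time can be sketched as a filter over log entries. The log entry layout (`received_at`, `url`), the 24-hour default, and the half-open interval are assumptions for the example.

```python
from datetime import datetime, timedelta

def previous_period_requests(log, period_end, period=timedelta(hours=24)):
    """Return log entries received in [period_end - period, period_end)."""
    start = period_end - period
    return [e for e in log if start <= e["received_at"] < period_end]

# Hypothetical access request log.
log = [
    {"received_at": datetime(2024, 4, 13, 23, 59), "url": "/a"},
    {"received_at": datetime(2024, 4, 14, 0, 0), "url": "/b"},
    {"received_at": datetime(2024, 4, 14, 18, 30), "url": "/c"},
    {"received_at": datetime(2024, 4, 15, 0, 0), "url": "/d"},
]
history = previous_period_requests(log, period_end=datetime(2024, 4, 15))
```

Using a half-open interval means a request received exactly on a period boundary is counted in one period only.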

In some embodiments, the preset decision model may include at least one of the following: a firewall rule decision model for SQL injection attacks, a firewall rule decision model for cross-site scripting attacks (abbreviated XSS), a firewall rule decision model for cross-site request forgery attacks (abbreviated CSRF), and so on.

An SQL injection attack is a network attack in which SQL code is inserted into or appended to the input parameters of an application, and the parameters are then passed to the back-end SQL server to be parsed and executed. SQL stands for Structured Query Language.

The cross-site scripting attack may specifically be a network attack that uses vulnerabilities left when developing a web page or the like to inject malicious instruction codes onto the web page, so that a user loads and executes the web page or the like maliciously manufactured by an attacker.

A cross-site request forgery attack is a network attack that coerces a user into performing unintended operations on a Web application to which the user is currently logged in.

It should be noted that the above listed types of network attacks are only illustrative. In specific implementation, the preset decision model may further include a firewall rule decision model for other types of network attacks according to specific application scenarios and processing requirements. The present specification is not limited to this.

In particular, there are often differences in attack patterns, attack features, etc. of different types of network attacks. Accordingly, firewall rules used in detecting and handling different types of network attacks may also vary.

The firewall rule decision model for SQL injection attacks may be understood as a neural network model that is trained for SQL injection attacks according to a preset training rule and that can intelligently, automatically, and adaptively decide on and adjust the firewall rules related to SQL injection attacks.

The firewall rule decision model for cross-site scripting attacks may be understood as a neural network model that is trained for cross-site scripting attacks according to a preset training rule and that can intelligently, automatically, and adaptively decide on and adjust the firewall rules related to cross-site scripting attacks.

The firewall rule decision model for cross-site request forgery attacks may be understood as a neural network model that is trained for cross-site request forgery attacks according to a preset training rule and that can intelligently, automatically, and adaptively decide on and adjust the firewall rules related to cross-site request forgery attacks.

Before specific implementation, preset decision models for different types of network attacks can each be trained according to the preset training rule, taking into account the attack patterns and attack features of the respective attack type. This specification mainly uses the firewall rule decision model for SQL injection attacks as an example. For the training and use of the other types of preset decision models, reference may be made to the embodiments of the firewall rule decision model for SQL injection attacks; the details are not repeated here.

In some embodiments, when the preset decision model includes a firewall rule decision model for SQL injection attack, referring to fig. 3, the above processing a historical access request of a previous time period according to a preset feature processing rule to obtain a corresponding historical feature vector of the previous time period may include the following when implemented:

S1: determining a first feature by detecting whether SQL keywords exist in a historical access request of a previous time period according to a preset feature processing rule; determining and obtaining a second characteristic according to the request length of the historical access request of the previous time period; counting the number of the appointed characters contained in the historical access request of the last time period as a third characteristic;

s2: according to the first feature and the second feature, corresponding crossing features are obtained through feature crossing processing;

S3: according to the first feature and the third feature, obtaining corresponding aggregation features through feature aggregation processing;

s4: and generating a history feature vector of the last corresponding time period by combining the cross feature and the aggregation feature.

The preset feature processing rule may correspond to a network attack type, and may be used to process and extract a feature vector of the corresponding attack type.

Based on the above embodiment, according to the preset feature processing rule, by processing the historical access request in the previous time period, the historical feature vector corresponding to the SQL injection attack can be extracted, and the attack mode, the attack feature and the like of the SQL injection attack can be reflected more accurately and finely. Similarly, by using a preset feature processing rule corresponding to cross-site scripting attack or cross-site request forgery attack, a historical feature vector corresponding to cross-site scripting attack or cross-site request forgery attack can be extracted by processing a historical access request of a previous time period.

In some embodiments, whether an SQL keyword exists in the historical access request of the previous time period may be determined by searching that request for fields matching a preset SQL keyword table. When at least one SQL keyword is found in the historical access request of the previous time period, a first value (e.g., 1) is generated as the first feature; conversely, when no SQL keyword is found, a second value (e.g., 0) is generated as the first feature.

The preset SQL keyword table may include a plurality of risky SQL keywords.

Specifically, the preset SQL keyword table may be constructed as follows: acquiring historical access requests; screening out, from the historical access requests, those with SQL injection attack risk as positive sample requests and those without network attack risk as negative sample requests; performing field extraction on each positive sample request to obtain a corresponding positive sample field group, and on each negative sample request to obtain a corresponding negative sample field group; clustering the positive sample field groups to obtain common fields and combining them into a first keyword group, and likewise clustering the negative sample field groups to obtain common fields and combining them into a second keyword group; comparing the second keyword group with the first keyword group to determine repeated keywords; removing the repeated keywords from the first keyword group; and constructing the preset SQL keyword table from the first keyword group after removal.
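The construction of the keyword table can be illustrated with a minimal sketch. It is not the implementation of this specification: the clustering step is replaced here by a simple frequency count (a field is treated as "common" when it appears in at least a given share of the field groups), and the names `common_fields`, `build_keyword_table` and `min_share` are hypothetical.

```python
from collections import Counter

def common_fields(field_groups, min_share=0.5):
    """Return fields that occur in at least `min_share` of the groups.

    Assumption: a frequency count stands in for the clustering step.
    """
    counts = Counter(f for group in field_groups for f in set(group))
    needed = min_share * len(field_groups)
    return {f for f, c in counts.items() if c >= needed}

def build_keyword_table(positive_groups, negative_groups):
    """First keyword group minus the keywords repeated in the second group."""
    first_group = common_fields(positive_groups)   # from positive samples
    second_group = common_fields(negative_groups)  # from negative samples
    return first_group - second_group

positive = [["select", "id", "union"],
            ["select", "id", "from"],
            ["union", "select", "name"]]
negative = [["id", "page"], ["id", "sort"]]
table = build_keyword_table(positive, negative)
```

In this toy data the field `id` is common to both sample sets and is therefore removed, which is the purpose of comparing the two keyword groups.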

Similarly, the historical access request of the previous time period may be searched according to a preset special character table, and the number of specified characters contained in the historical access request of the previous time period is counted as the third feature.

The preset special character table may include a plurality of special characters, such as "," and ";". For the construction of the preset special character table, reference may be made to the construction of the preset SQL keyword table, which is not described again in this specification.

In implementation, the request length of the history access request of the previous time period may be directly determined as the second feature.

In the specific implementation, considering the interaction relation between the first feature and the second feature when reflecting SQL injection attack, feature cross processing can be realized by multiplying the first feature and the second feature according to a preset feature processing rule, and the obtained product is used as a cross feature. The cross feature obtained in this way can be used for characterizing specific data values of the first feature and the second feature to a certain extent, and can be used for characterizing potential interaction relation between the first feature and the second feature.

In the specific implementation, considering the combined effect of the first feature and the third feature in reflecting an SQL injection attack, feature aggregation processing can be realized by adding the first feature and the third feature according to the preset feature processing rule, and the obtained sum is used as the aggregate feature. The aggregate feature obtained in this way can characterize, to a certain extent, the specific data values of the first feature and the third feature, and can also characterize the potential combined action relationship between the first feature and the third feature.

In specific implementation, the cross feature and the aggregate feature may be combined to obtain a corresponding historical feature vector. In addition, the cross features and the aggregate features can be combined with the first features and/or the second features to obtain relatively more comprehensive historical feature vectors.
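The feature processing described above (first, second and third features, then the cross feature as a product and the aggregate feature as a sum) can be sketched as follows. This is an illustrative sketch for a single request only; the keyword table, the special character table and the function names are hypothetical and are not taken from this specification.

```python
import re

# Hypothetical tables; the specification builds them from historical samples.
SQL_KEYWORDS = {"select", "union", "drop", "insert", "delete"}
SPECIAL_CHARS = {",", ";", "'", "-"}

def extract_features(request: str) -> dict:
    """Compute the first, second and third features of one access request."""
    tokens = set(re.findall(r"[a-z_]+", request.lower()))
    first = 1 if tokens & SQL_KEYWORDS else 0                # SQL keyword present?
    second = len(request)                                    # request length
    third = sum(request.count(ch) for ch in SPECIAL_CHARS)   # special characters
    return {"first": first, "second": second, "third": third}

def history_feature_vector(request: str) -> list:
    """Combine the cross feature and the aggregate feature into a vector."""
    f = extract_features(request)
    cross = f["first"] * f["second"]      # feature crossing: product
    aggregate = f["first"] + f["third"]   # feature aggregation: sum
    return [cross, aggregate]

suspicious = "id=1' UNION SELECT 1; --"
benign = "page=2"
```

Because the first feature is 0 or 1, the cross feature is zero for requests without SQL keywords and equals the request length otherwise, so the raw length can dominate the vector unless it is rescaled.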

In specific implementation, for different types of network attacks and processing requirements, one or more of the following listed features can be extracted from the historical access request in the previous time period according to a preset feature processing rule to serve as a first feature, a second feature and a third feature: traffic type, source IP, destination IP, transport layer protocol, load characteristics; and combining the features according to a preset feature processing rule to obtain corresponding historical feature vectors.

In some embodiments, the method may further include: performing binarization processing on the second feature to obtain a binarized second feature.

Correspondingly, obtaining the corresponding cross feature through feature crossing processing according to the first feature and the second feature may include: obtaining the corresponding cross feature through feature crossing processing according to the first feature and the binarized second feature.

Based on the above embodiment, after the second feature is obtained, the binarization processing may be performed on the second feature, so as to unify the data value of the second feature to the scale adapted to the first feature and the third feature, so as to avoid errors caused by too large or too small data value of the second feature on the generated feature vector.

In particular, when the binarization processing is performed, the data value of the second feature may be numerically compared with a preset request length threshold (for example, 100 fields); according to the comparison result, when the data value of the second feature is larger than or equal to a preset request length threshold value, modifying the data value of the second feature to be 1 to serve as the binarized second feature; in contrast, when it is determined that the data value of the second feature is smaller than the preset request length threshold, the data value of the second feature is modified to 0 as the binarized second feature.

In addition, in some cases, the data value of the second feature may be mapped, according to a corresponding mapping rule, to a numerical range greater than or equal to 0 and less than or equal to 1, so as to implement normalization processing and obtain a normalized second feature. In this case, the corresponding cross feature can be obtained through feature crossing processing according to the first feature and the normalized second feature.
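A minimal sketch of the two rescaling options follows. The threshold of 100 comes from the example above; the mapping rule for normalization is not specified in this document, so the clipped min-max mapping and the `max_length` value used here are assumptions for illustration only.

```python
def binarize_length(length: int, threshold: int = 100) -> int:
    """Binarized second feature: 1 if length >= threshold, otherwise 0."""
    return 1 if length >= threshold else 0

def normalize_length(length: int, max_length: int = 2048) -> float:
    """Normalized second feature in [0, 1].

    Assumption: clip to [0, max_length] and divide by max_length.
    """
    clipped = min(max(length, 0), max_length)
    return clipped / max_length
```

For example, `binarize_length(99)` gives 0 and `binarize_length(100)` gives 1, while `normalize_length(1024)` gives 0.5 under the assumed `max_length`.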

In some embodiments, the historical feature vector of the previous time period may be fed into the preset decision model as the model input, and the preset decision model may be run to obtain the output decision result. Based on the decision result, the decision action with the highest reward value (i.e., the preferred decision action) among a plurality of candidate decision actions for the current access traffic scenario can be determined as the target decision action.

The preset decision model includes at least a state representation network and an action value network. Further, the action value network is connected at least with a first Q network structure and a preset firewall rule set.

Further, the preset firewall rule set may include pre-configured firewall rules covering each dimension. The first Q network structure can be understood as a reward model that predicts and evaluates the effect of a candidate decision action output by the action value network and determines, according to the evaluation result, the reward value matching that candidate decision action.

When the preset decision model is run, the state representation network maps the input historical feature vector of the previous time period into a corresponding state representation; the state representation is then passed to the action value network.

The action value network receives the state representation, learns from it the characteristics of the latest access traffic scenario, and predicts a plurality of matching candidate decision actions; it then predicts the reward value of each candidate decision action by using the first Q network; finally, the action value network outputs the plurality of candidate decision actions and the corresponding reward values as the decision result.

In specific implementation, the action value network can use the first Q network to predict the effect that each candidate decision action would have on detecting and processing network attacks after being implemented, so as to obtain a corresponding prediction result, and determine the reward value of each candidate decision action according to the prediction result. For example, according to the prediction result, the first Q network increases the corresponding reward value when the accuracy of network attack detection after the candidate decision action is implemented is determined to be high; conversely, it deducts from the corresponding reward value when that accuracy is determined to be low. For another example, according to the prediction result, the first Q network increases the corresponding reward value when the processing of detected network attacks after the candidate decision action is implemented is determined to be timely and accurate; conversely, it deducts from the corresponding reward value when that processing is determined to involve errors or delays. Finally, the action value network tallies and outputs the final reward value of each candidate decision action through the first Q network.

Further, according to the decision result, the candidate decision action with the highest reward value can be selected from the plurality of candidate decision actions as the target decision action meeting the requirements.
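Selecting the target decision action from the decision result amounts to taking the candidate with the highest reward value. The sketch below assumes, purely for illustration, that the decision result is a list of (candidate action, reward value) pairs; the action names are hypothetical.

```python
def select_target_action(decision_result):
    """Return the candidate decision action with the highest reward value.

    Assumption: `decision_result` is a list of (action, reward_value) pairs.
    """
    action, _reward = max(decision_result, key=lambda pair: pair[1])
    return action

decision_result = [
    ("enable_keyword_rule", 0.72),
    ("lower_length_threshold", 0.91),
    ("disable_legacy_rule", 0.15),
]
target = select_target_action(decision_result)
```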

In some embodiments, the firewall rules of the adjustment system according to the target decision action may include the following when implemented:

The firewall rules of the system can be understood as the firewall rules that are currently in the enabled state in the system.

Specifically, for example, according to the target decision action, a firewall rule that was previously in the disabled state is determined to be a first-type firewall rule, and its state parameter is changed from disabled to enabled. For another example, according to the target decision action, a firewall rule that was previously in the enabled state is determined to be a second-type firewall rule, and its state parameter is changed from enabled to disabled. For another example, according to the target decision action, a firewall rule that was previously in the disabled or enabled state is determined to be a third-type firewall rule; rule parameters such as condition parameters and threshold parameters in that rule are modified according to the target decision action; and after the modification is completed, the state parameter of the rule is set to enabled.

Based on the embodiment, according to the target decision action, the firewall rules of the system can be flexibly adjusted by adopting diversified adjustment modes so as to obtain firewall rules which are matched with the current access traffic scene and have relatively good effects.
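The three kinds of adjustment (enabling, disabling, and modifying rule parameters) can be sketched as below. The `FirewallRule` data structure, the shape of the `action` dictionary and the rule names are hypothetical; this specification does not define a concrete representation.

```python
from dataclasses import dataclass, field

@dataclass
class FirewallRule:
    name: str
    enabled: bool = False
    params: dict = field(default_factory=dict)

def apply_decision(rules: dict, action: dict) -> dict:
    """Apply one target decision action to the rule set, in place."""
    for name in action.get("enable", []):       # first-type rules
        rules[name].enabled = True
    for name in action.get("disable", []):      # second-type rules
        rules[name].enabled = False
    for name, new_params in action.get("modify", {}).items():  # third-type rules
        rules[name].params.update(new_params)
        rules[name].enabled = True              # enabled after modification
    return rules

rules = {
    "sqli_keyword": FirewallRule("sqli_keyword", enabled=False),
    "legacy": FirewallRule("legacy", enabled=True),
    "length_limit": FirewallRule("length_limit", params={"threshold": 100}),
}
action = {"enable": ["sqli_keyword"], "disable": ["legacy"],
          "modify": {"length_limit": {"threshold": 80}}}
apply_decision(rules, action)
```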

In some embodiments, the preset firewall rule set may specifically include a plurality of firewall rule sets for different network attack types, for example, a firewall rule set for SQL injection attack, a firewall rule set for cross-site scripting attack, and a firewall rule set for cross-site request forgery attack.

Specifically, the preset firewall rule set at least includes a corresponding detection rule of the network attack, a corresponding processing rule of the network attack, and the like.

In some embodiments, prior to implementation, a preset firewall rule set may be constructed as follows: collecting detection records and processing records of corresponding network attacks; and clustering the detection records and the processing records respectively to determine the commonality rules for detection and the commonality rules for processing, and combining to obtain a corresponding preset firewall rule set.

In some embodiments, referring to fig. 4, after adjusting the firewall rules of the system according to the target decision action to obtain the firewall rules of the current time period, when implementing, the method may further include the following:

S1: detecting, according to the firewall rules of the current time period, whether the access request in the current time period has a network attack risk, and obtaining a corresponding detection result;

S2: according to the detection result, when the access request in the current time period is determined to have a network attack risk, processing the access request accordingly according to the firewall rules of the current time period.

In some embodiments, referring to fig. 5, before implementation, the method may further include the following:

S1: constructing an initial decision model; wherein the initial decision model includes at least an initial state representation network and an initial action value network, and the initial action value network is connected at least with an initial first Q network structure;

S2: connecting the initial decision model with the preset firewall rule set, and configuring a preset action selection rule for the initial decision model;

S3: acquiring test sample requests, and processing the test sample requests according to the preset feature processing rule to obtain corresponding test sample feature vectors;

S4: performing multiple rounds of iterative training on the initial decision model according to the preset training rule by using the test sample feature vectors and the test sample requests, so as to obtain a preset decision model meeting the requirements.

Based on the embodiment, a decision model with smaller error and better effect can be obtained by performing multiple rounds of iterative training according to a preset training rule and used as a preset decision model meeting the requirements.

The preset training rule may be a model training rule that is based on a deep reinforcement learning algorithm such as Deep Q Network (DQN) and has been modified to suit firewall rule adjustment in the network security field.

When the initial decision model is built, a state space (State Space) corresponding to the initial state representation network can be defined according to the initial state representation network and the preset feature processing rule; meanwhile, an action space (Action Space) corresponding to the initial action value network can be defined according to the initial action value network and the preset firewall rule set. A mapping relation between the state space and the action space is then constructed.

When the initial action value network is built, two identical nonlinear transformation structures, each including at least 3 convolution layers and 2 fully connected layers, can first be built and denoted the first Q network structure and the second Q network structure respectively; the first Q network structure is connected with the initial action value network, while the second Q network structure is independent and is not connected with the initial action value network. Meanwhile, a preset action selection rule can be configured for the initial decision model, so that the initial action value network tries as many decision actions as possible according to the preset action selection rule.

The first Q network structure is used in place of the Q function. The preset action selection rule may include a selection rule based on the Epsilon-Greedy policy. Under this rule, an action is selected at random with a certain probability (epsilon) so as to explore new actions; otherwise, with probability 1-epsilon, the action with the highest current reward value is selected as the recommended action.
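The Epsilon-Greedy selection rule can be sketched in a few lines. The function name and the representation of reward values as a plain list indexed by action are assumptions for illustration.

```python
import random

def epsilon_greedy(reward_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: choose any action uniformly at random.
        return rng.randrange(len(reward_values))
    # Exploitation: choose the action with the highest current reward value.
    return max(range(len(reward_values)), key=reward_values.__getitem__)

greedy_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
explored_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=1.0)
```

With `epsilon=0.0` the call always returns the index of the highest reward value; with `epsilon=1.0` it always returns a random index. A common design choice, not stated in this document, is to start with a large epsilon and reduce it as training progresses.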

In some embodiments, referring to fig. 6, when the initial decision model is subjected to multiple rounds of iterative training according to the preset training rule by using the test sample feature vectors and the test sample requests, the iterative training of the current round may be performed according to the preset training rule in the following manner:

S1: acquiring the decision model of the previous round;

S2: processing the test sample feature vector of the previous round by using the decision model of the previous round, and determining the decision action of the current round; wherein the test sample feature vector of the previous round is generated according to the test sample request of the previous round;

S3: determining the firewall rules of the current round according to the decision action of the current round;

S4: detecting and processing the test sample request of the current round according to the firewall rules of the current round to obtain the detection processing result of the current round;

S5: generating the feedback data of the current round according to the detection processing result of the current round;

S6: adjusting the model parameters of the decision model of the previous round according to the feedback data of the current round to obtain the decision model of the current round.

In this way, the iterative training of the current round can be completed and the decision model of the current round obtained.

After the decision model of the current round is obtained, whether an ending condition is met is detected. If the ending condition is met, the iterative training ends and the decision model of the current round is determined as the preset decision model meeting the requirements; if the ending condition is not met, the above steps are repeated for the next round of iterative training until the ending condition is met.

In the specific implementation, a technician can act as an observer and check the detection processing result of the current round against the test sample request of the current round to obtain a corresponding check result; corresponding feedback data is then generated according to the check result.

Specifically, based on the corresponding feedback rule, it can be judged from the detection processing result whether the detection of network attack risk in the test sample request of the current round, performed with the firewall rules of the current round, was accurate: if the detection result is accurate, a corresponding first reward score is set; if the detection result is wrong, a corresponding first penalty score is set. Meanwhile, based on the corresponding feedback rule, the accuracy and timeliness of the processing of test sample requests with network attack risk under the firewall rules of the current round can also be judged: if the accuracy is high and the timeliness is good, a corresponding second reward score is set; if the timeliness is poor (for example, the processing is delayed), a corresponding second penalty score is set according to the delay; if the accuracy is poor (for example, the processing is erroneous), a corresponding third penalty score is set according to the degree of deviation of the processing. The reward scores and penalty scores are then combined to obtain corresponding, more effective feedback data.
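How reward scores and penalty scores might be combined into one feedback value can be sketched as follows. The document does not give numeric scores or a combination formula, so every default weight here, the linear scaling of the delay and deviation penalties, and the function name `feedback_score` are assumptions for illustration.

```python
def feedback_score(detection_correct, handled_correctly, delay_seconds=0.0,
                   deviation=0.0, reward_1=1.0, penalty_1=1.0, reward_2=1.0,
                   penalty_2_per_second=0.1, penalty_3=1.0):
    """Combine reward and penalty scores into a single feedback value."""
    # First reward / first penalty: was the detection result accurate?
    score = reward_1 if detection_correct else -penalty_1
    # Second reward: processing was both accurate and timely.
    if handled_correctly and delay_seconds == 0:
        score += reward_2
    # Second penalty: scaled by the processing delay (assumed linear).
    if delay_seconds > 0:
        score -= penalty_2_per_second * delay_seconds
    # Third penalty: scaled by the degree of deviation of the processing.
    if not handled_correctly:
        score -= penalty_3 * deviation
    return score
```

For example, a correct detection with correct and timely processing scores 2.0 under these assumed weights, while a correct detection that is processed correctly after a 5-second delay scores 0.5.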

In specific implementation, the loss function value of the model can be calculated according to the feedback data; the model parameters of the decision model of the previous round are then adjusted according to the loss function value to obtain the decision model of the current round.

In the specific implementation, to determine whether the ending condition is met, it can be detected, according to the preset training rule, whether the number of rounds completed so far has reached a preset number of iterative training rounds; when it has, the ending condition is determined to be met.

Alternatively, to determine whether the ending condition is met, the current loss function value can be calculated based on the decision model of the current round according to the preset training rule, and it can be detected whether the current loss function value is less than or equal to a preset loss function threshold; when it is, the ending condition is determined to be met.

Alternatively, to determine whether the ending condition is met, a plurality of test sample requests can be randomly extracted from the test sample requests of the current round to serve as check requests; the check requests are processed according to the preset feature processing rule to obtain corresponding check feature vectors; the check feature vectors are processed by using the decision model of the current round to obtain corresponding check decision actions; the firewall rules of the system are adjusted according to the check decision actions to obtain check firewall rules; and it is judged whether the difference value between the firewall rules of the current round and the check firewall rules is less than or equal to a preset tolerance threshold. When the difference value is less than or equal to the preset tolerance threshold, the ending condition is determined to be met; conversely, when the difference value is greater than the preset tolerance threshold, the ending condition is determined not to be met.
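The first two ending conditions can be sketched as a simple check. Combining them with a logical OR, as well as the default values of `max_rounds` and `loss_threshold`, are assumptions; the text presents the conditions as alternatives and does not state how, or whether, they are combined.

```python
def training_should_end(round_index, loss, max_rounds=1000, loss_threshold=0.01):
    """Return True when either ending condition is met.

    round_index: number of completed training rounds.
    loss: current loss function value of the decision model.
    """
    reached_round_limit = round_index >= max_rounds
    loss_small_enough = loss <= loss_threshold
    return reached_round_limit or loss_small_enough
```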

In some embodiments, referring to fig. 7, adjusting the model parameters of the decision model of the previous round according to the feedback data of the current round may include the following:

S1: detecting whether the current round meets a preset reset condition;

S2: when the current round is determined to meet the preset reset condition, adjusting, according to the feedback data of the current round, the network parameters of the state representation network and the network parameters of the action value network in the decision model of the previous round, as well as the network parameters of the second Q network structure; wherein the second Q network structure corresponds to the first Q network structure, and the second Q network structure is not connected with the decision model of the previous round;

S3: copying the adjusted second Q network structure to the first Q network structure to obtain the decision model of the current round.

Meeting the preset reset condition may mean that the cumulative number of rounds of iterative training from the last reset to the current round has reached a preset reset round value.

The first Q network structure is connected with the action value network and directly participates in the operation of the decision model; the second Q network structure is not connected with the action value network and is independent of the decision model.

Specifically, for example, the preset reset round value may be 10 rounds. That is, starting from the last reset, the cumulative number of rounds is increased by 1 each time one round of iterative training is completed; when the cumulative number of rounds is found to have reached 10 at the current round, the preset reset condition is determined to be met.

In some embodiments, in the case of detecting whether the current round meets the preset reset condition, the method may further include the following when implemented:

Based on the above embodiment, each time the model parameters are modified, if the current round of iterative training is found not to meet the preset reset condition, only the network parameters of the state representation network and of the action value network in the decision model are adjusted, together with the network parameters of the second Q network, which is not connected with the action value network and does not participate in the operation of the decision model; the network parameters of the first Q network are not adjusted. In this way, while the preset reset condition is not met, the first Q network that directly participates in the operation of the decision model stays fixed and only the second Q network is updated, which reduces the complexity of model training and the amount of related data processing. When the current round is found to meet the preset reset condition, the network parameters of the state representation network, of the action value network and of the second Q network structure are adjusted, and the latest adjusted second Q network structure is copied directly to the first Q network structure, thereby adjusting and updating the first Q network structure. The first Q network structure is therefore adjusted and updated periodically, whenever the preset reset condition is met, so that the decision model can be updated at relatively low data processing cost and the iterative training process is relatively more stable.
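The reset schedule can be sketched without a deep learning framework by treating each Q network as a dictionary of parameters. The class name, the dictionary representation and the additive update are assumptions for illustration; only the schedule itself (update the second Q network every round, copy it to the first Q network when the reset round value is reached) follows the description above.

```python
import copy

class QNetworks:
    """Tracks the first (in-use) and second (independent) Q network parameters."""

    def __init__(self, params, reset_every=10):
        self.first_q = copy.deepcopy(params)    # participates in decisions
        self.second_q = copy.deepcopy(params)   # updated every round
        self.reset_every = reset_every          # preset reset round value
        self.rounds_since_reset = 0

    def finish_round(self, second_q_update):
        """Apply one round's update; return True if a reset (copy) happened."""
        for key, delta in second_q_update.items():
            self.second_q[key] += delta
        self.rounds_since_reset += 1
        if self.rounds_since_reset >= self.reset_every:
            # Reset condition met: copy the second Q network to the first.
            self.first_q = copy.deepcopy(self.second_q)
            self.rounds_since_reset = 0
            return True
        return False

nets = QNetworks({"w": 0.0}, reset_every=3)
history = [nets.finish_round({"w": 1.0}) for _ in range(3)]
```

In this run the first Q network keeps its initial parameters for two rounds and takes over the accumulated value of the second Q network on the third round.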

In some embodiments, the method may further include the following when implemented:

Specifically, when the current round of iterative training is performed, 1 or more pieces of experience data can be randomly extracted from a preset buffer area and added into the current round of iterative training, so that the situation that the model training is too focused on samples involved in the current round of iterative training and overfitting occurs is avoided.

In addition, in the process of model multi-round iterative training, when a specific trigger event is currently detected (for example, a loss function value calculated in a certain round of iterative training is greater than a certain specific value, etc.), random extraction of a plurality of experience data from a preset buffer area can be triggered; and utilizing the plurality of experience data to adaptively adjust and train the current model.
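Random extraction of experience data from a preset buffer area can be sketched with a bounded queue. The class name, the capacity and the form of an experience record are hypothetical; the document does not describe how the buffer is filled or what one record contains.

```python
import random
from collections import deque

class ExperienceBuffer:
    """Bounded buffer of past experience data with uniform random sampling."""

    def __init__(self, capacity=1000):
        # The oldest entries are discarded once the capacity is reached.
        self.items = deque(maxlen=capacity)

    def add(self, experience):
        self.items.append(experience)

    def sample(self, k, rng=random):
        """Draw up to k distinct experiences uniformly at random."""
        k = min(k, len(self.items))
        return rng.sample(list(self.items), k)

buffer = ExperienceBuffer(capacity=3)
for round_index in range(5):
    buffer.add({"round": round_index})
batch = buffer.sample(2)
```

Mixing such randomly drawn records into the current round reduces the weight of the most recent samples, which is the overfitting concern mentioned above.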

In some embodiments, after detecting and processing the access request received in the current time period according to the firewall rule of the current time period, when the method is implemented, the method may further include: and recording the access request of the current time period and the detection processing result of the access request of the current time period.

Further, the firewall rules of the next time period can be determined by using the preset decision model according to the access requests of the current time period.

In addition, the recorded detection processing results of the access requests can be checked and evaluated periodically to obtain corresponding feedback data; the recorded access requests, the detection processing results for those access requests and the corresponding feedback data can then be collected and used to train and update the preset decision model currently in use, so as to continuously improve its accuracy and keep it matched to the latest access traffic scenario.

Specifically, for example, at intervals of a specified time period (for example, 1 month), the access requests recorded in the most recent specified time period, the detection processing results for those access requests and the corresponding feedback data may be acquired and used to train and update the preset decision model currently in use, so that the preset decision model is updated on a regular schedule during use.

From the above, according to the data processing method for an access request provided in the embodiments of this specification, a preset decision model with a better effect can be obtained before implementation through multiple rounds of iterative training according to the preset training rule. In implementation, the historical access request of the previous time period is acquired first; the historical access request of the previous time period is processed according to the preset feature processing rule to obtain the corresponding historical feature vector of the previous time period; the historical feature vector of the previous time period is then processed by using the preset decision model to obtain a corresponding decision result; the firewall rules of the system are dynamically adjusted according to the decision result to obtain matching firewall rules of the current time period; and the firewall rules of the current time period are then used to detect and process the access requests received by the system in the current time period. Because the preset decision model processes the historical feature vector obtained from the historical access request of the previous time period, a suitable decision result can be learned and determined from the latest network attack patterns and features of the previous time period; the firewall rules of the system are then adjusted intelligently and dynamically according to the decision result to obtain firewall rules of the current time period that match the current access traffic scenario, and these rules are used to detect and process the access requests received in the current time period.
Therefore, the method can effectively reduce the missed-detection rate and the false alarm rate of network attack detection, accurately identify access requests that carry network attack risk, and process them promptly and in a targeted manner, thereby better protecting the data security of the system.

Referring to fig. 8, another method for processing data of an access request is also provided in the present specification. The method can be implemented by the following steps:

S801: acquiring a historical access request of a previous time period;

S802: according to a preset feature processing rule, processing a historical access request of the previous time period to obtain a corresponding historical feature vector of the previous time period;

S803: processing the historical feature vector of the previous time period by using a comprehensive decision model to obtain a corresponding comprehensive decision result; the comprehensive decision model at least comprises a preset first decision model, a preset second decision model and a preset third decision model which are connected in parallel; the preset first decision model, the preset second decision model and the preset third decision model are respectively a firewall rule decision model for SQL injection attacks, a firewall rule decision model for cross-site scripting attacks and a firewall rule decision model for cross-site request forgery attacks; the comprehensive decision model is obtained through multiple rounds of iterative training according to a preset training rule;

S804: according to the comprehensive decision result, determining candidate decision actions meeting the requirements as target decision actions;

S805: according to the target decision action, adjusting the firewall rules of the system to obtain the firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

The preset feature processing rules include: feature processing rules for SQL injection attacks, feature processing rules for cross-site scripting attacks, and feature processing rules for cross-site request forgery attacks.

Accordingly, the obtained historical feature vector of the previous time period may include: a plurality of historical feature vectors such as a historical feature vector for SQL injection attack, a historical feature vector for cross-site scripting attack, a historical feature vector for cross-site request forgery attack and the like.

In the implementation, when the historical feature vector of the previous time period is generated, the attack type of the network attack corresponding to each of the historical feature vectors is marked.

After the comprehensive decision model receives the input historical feature vectors of the previous time period, each historical feature vector can be input into the corresponding decision model for processing according to its marked attack type.

Correspondingly, the comprehensive decision result output by the comprehensive decision model includes: a first decision result for SQL injection attacks output by the preset first decision model, a second decision result for cross-site scripting attacks output by the preset second decision model, and a third decision result for cross-site request forgery attacks output by the preset third decision model.

In implementation, corresponding target decision actions can be determined from the first decision result, the second decision result and the third decision result respectively; these are used to adjust, respectively, a first firewall rule subset for SQL injection attacks, a second firewall rule subset for cross-site scripting attacks and a third firewall rule subset for cross-site request forgery attacks in the preset firewall rule set, so as to finally obtain firewall rules of the current time period that match the latest access traffic scenario and can simultaneously detect and process multiple different types of network attacks.

The access requests received in the current time period can then be detected and processed according to the firewall rules of the current time period.
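The routing of tagged feature vectors to parallel sub-models described above can be sketched as follows. This is a minimal illustration only: the attack-type labels, the function name `route_and_decide`, and the stand-in lambda "models" are hypothetical and are not taken from the specification, which uses trained decision models in their place.

```python
from typing import Callable, Dict, List

# Hypothetical attack-type labels; the specification names the attack types
# but does not define these identifiers.
SQLI, XSS, CSRF = "sql_injection", "xss", "csrf"

DecisionModel = Callable[[List[float]], str]


def route_and_decide(
    tagged_vectors: Dict[str, List[float]],
    models: Dict[str, DecisionModel],
) -> Dict[str, str]:
    """Send each tagged feature vector to the sub-model for its attack type."""
    results: Dict[str, str] = {}
    for attack_type, vector in tagged_vectors.items():
        model = models.get(attack_type)
        if model is None:
            continue  # no sub-model registered for this attack type
        results[attack_type] = model(vector)
    return results


# Stand-in sub-models (assumptions); real ones would be trained decision models.
models: Dict[str, DecisionModel] = {
    SQLI: lambda v: "enable_rule" if v[0] > 0.5 else "keep",
    XSS: lambda v: "keep",
    CSRF: lambda v: "disable_rule" if v[0] < 0.1 else "keep",
}

tagged = {SQLI: [1.0, 0.8], XSS: [0.2], CSRF: [0.0]}
decisions = route_and_decide(tagged, models)
```

Keeping one sub-model per attack type means an additional attack type can be supported by registering another entry in `models`, which mirrors the parallel structure of the comprehensive decision model.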

In some embodiments, the integrated decision model may further include other decision models for other types of network attacks in addition to the preset first decision model, the preset second decision model, and the preset third decision model.

Based on the above embodiment, network attacks of various types can be identified comprehensively and accurately and handled promptly in a targeted manner, so that the data security of the system is protected comprehensively.

The embodiments of the present specification further provide a server, as shown in fig. 9. The server may specifically include a network communication port 901, a processor 902, and a memory 903, which are connected by internal cables so that they can exchange data with one another.

The network communication port 901 may be specifically configured to obtain a historical access request of a previous time period.

The processor 902 may be specifically configured to process the historical access request of the previous time period according to a preset feature processing rule, so as to obtain a corresponding historical feature vector of the previous time period; process the historical feature vector of the previous time period by using a preset decision model to obtain a corresponding decision result; the preset decision model at least comprises a state representation network and an action value network; the state representation network is used for mapping the historical feature vector of the previous time period into a corresponding state representation; the action value network is used for determining, according to the state representation, a plurality of candidate decision actions and the reward values corresponding to the candidate decision actions; the preset decision model is obtained through multiple rounds of iterative training according to a preset training rule; according to the decision result, determine candidate decision actions meeting the requirements as target decision actions; according to the target decision action, adjust the firewall rules of the system to obtain the firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

The memory 903 may be specifically configured to cache the historical access request of the previous time period, the decision result, and the preset firewall rule set, and to store the corresponding instruction program.

In this embodiment, the network communication port 901 may be a virtual port bound to different communication protocols, so that different data may be sent or received. For example, the network communication port may be a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. The network communication port may also be a physical communication interface or a communication chip. For example, it may be a wireless mobile network communication chip, such as GSM, CDMA, etc.; it may also be a Wi-Fi chip; it may also be a Bluetooth chip.

In this embodiment, the processor 902 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so on. The present specification is not limited in this respect.

In this embodiment, the memory 903 may include multiple levels, and in a digital system, the memory may be any memory as long as it can hold binary data; in an integrated circuit, a circuit with a memory function without a physical form is also called a memory, such as a RAM, a FIFO, etc.; in the system, the storage device in physical form is also called a memory, such as a memory bank, a TF card, and the like.

Based on the algorithm, the related structural performance of the server can be effectively utilized, the data processing speed of the electronic equipment is improved, and the data processing of the access request is efficiently realized.

The embodiments of the present specification also provide a computer-readable storage medium storing computer program instructions which, when executed, implement the above data processing method for an access request: acquiring a historical access request of a previous time period; according to a preset feature processing rule, processing the historical access request of the previous time period to obtain a corresponding historical feature vector of the previous time period; processing the historical feature vector of the previous time period by using a preset decision model to obtain a corresponding decision result; the preset decision model at least comprises a state representation network and an action value network; the state representation network is used for mapping the historical feature vector of the previous time period into a corresponding state representation; the action value network is used for determining, according to the state representation, a plurality of candidate decision actions and the reward values corresponding to the candidate decision actions; the preset decision model is obtained through multiple rounds of iterative training according to a preset training rule; according to the decision result, determining candidate decision actions meeting the requirements as target decision actions; according to the target decision action, adjusting the firewall rules of the system to obtain the firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

In the present embodiment, the storage medium includes, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a cache (Cache), a hard disk (Hard Disk Drive, HDD), or a memory card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.

In this embodiment, the functions and effects of the program instructions stored in the computer-readable storage medium may be understood with reference to the other embodiments, and are not described again herein.

The embodiments of the present specification also provide a computer program product comprising at least a computer program which, when executed by a processor, performs the following method steps: acquiring a historical access request of a previous time period; according to a preset feature processing rule, processing the historical access request of the previous time period to obtain a corresponding historical feature vector of the previous time period; processing the historical feature vector of the previous time period by using a preset decision model to obtain a corresponding decision result; the preset decision model at least comprises a state representation network and an action value network; the state representation network is used for mapping the historical feature vector of the previous time period into a corresponding state representation; the action value network is used for determining, according to the state representation, a plurality of candidate decision actions and the reward values corresponding to the candidate decision actions; the preset decision model is obtained through multiple rounds of iterative training according to a preset training rule; according to the decision result, determining candidate decision actions meeting the requirements as target decision actions; according to the target decision action, adjusting the firewall rules of the system to obtain the firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

Referring to fig. 10, the embodiment of the present disclosure further provides a data processing apparatus for an access request, where the apparatus may specifically include the following structural modules:

The obtaining module 1001 may be specifically configured to obtain a historical access request of a previous time period;

The processing module 1002 may be specifically configured to process the history access request of the previous time period according to a preset feature processing rule, so as to obtain a corresponding history feature vector of the previous time period;

The decision module 1003 may be specifically configured to process the historical feature vector of the previous time period by using a preset decision model to obtain a corresponding decision result; the preset decision model at least comprises a state representation network and an action value network; the state representation network is used for mapping the historical feature vector of the previous time period into a corresponding state representation; the action value network is used for determining, according to the state representation, a plurality of candidate decision actions and the reward values corresponding to the candidate decision actions; the preset decision model is obtained through multiple rounds of iterative training according to a preset training rule;

The determining module 1004 may be specifically configured to determine, according to the decision result, a candidate decision action meeting the requirement as a target decision action;

The adjusting module 1005 may be specifically configured to adjust firewall rules of the system according to the target decision action to obtain firewall rules in the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

In some embodiments, the preset decision model may specifically include at least one of the following: firewall rule decision models for SQL injection attacks, firewall rule decision models for cross-site scripting attacks, firewall rule decision models for cross-site request forgery attacks, etc.

In some embodiments, when the preset decision model includes a firewall rule decision model for SQL injection attacks, the processing module 1002 may process the historical access request of the previous time period according to the preset feature processing rule in the following manner to obtain the corresponding historical feature vector of the previous time period: determining a first feature by detecting, according to the preset feature processing rule, whether SQL keywords exist in the historical access request of the previous time period; determining a second feature according to the request length of the historical access request of the previous time period; counting the number of specified characters contained in the historical access request of the previous time period as a third feature; obtaining a corresponding cross feature through feature crossing according to the first feature and the second feature; obtaining a corresponding aggregation feature through feature aggregation according to the first feature and the third feature; and generating the corresponding historical feature vector of the previous time period by combining the cross feature and the aggregation feature.

In some embodiments, the processing module 1002 may be further configured to: perform binarization processing on the second feature to obtain a binarized second feature; and obtain the corresponding cross feature through feature crossing according to the first feature and the binarized second feature.

In some embodiments, the adjusting module 1005 may adjust the firewall rules of the system according to the target decision action in at least one of the following ways: determining a first type of firewall rule in a preset firewall rule set, and setting the state parameter of the first type of firewall rule to enabled; determining a second type of firewall rule in the preset firewall rule set, and setting the state parameter of the second type of firewall rule to disabled; determining a third type of firewall rule in the preset firewall rule set, modifying the third type of firewall rule, and setting the state parameter of the modified third type of firewall rule to enabled.
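The three adjustment modes (enable, disable, modify and then enable) can be sketched as below. This is only an illustrative sketch: the `Rule` fields, the action names and the use of a regular-expression `pattern` are assumptions, since the specification does not define how a firewall rule is represented.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule layout; the specification only mentions a state parameter.
    rule_id: str
    pattern: str
    enabled: bool


def apply_action(
    rules: Dict[str, Rule],
    action: str,
    rule_id: str,
    new_pattern: Optional[str] = None,
) -> Dict[str, Rule]:
    """Return a new rule set with one rule enabled, disabled, or modified and enabled."""
    rule = rules[rule_id]
    if action == "enable":
        updated = replace(rule, enabled=True)
    elif action == "disable":
        updated = replace(rule, enabled=False)
    elif action == "modify":
        if new_pattern is None:
            raise ValueError("modify requires new_pattern")
        # A modified rule is switched on, as in the third adjustment mode.
        updated = replace(rule, pattern=new_pattern, enabled=True)
    else:
        raise ValueError(f"unknown action: {action}")
    return {**rules, rule_id: updated}


rule_set = {
    "sqli-1": Rule("sqli-1", r"union\s+select", enabled=False),
    "sqli-2": Rule("sqli-2", r"or\s+1=1", enabled=True),
}
current = apply_action(rule_set, "enable", "sqli-1")
current = apply_action(current, "modify", "sqli-2", new_pattern=r"or\s+\d+=\d+")
```

Returning a new dictionary rather than mutating the preset rule set keeps the rule set of the previous time period available for comparison or rollback.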

In some embodiments, the apparatus may specifically further include a training module to construct an initial decision model; wherein the initial decision model comprises at least an initial state representation network and an initial action value network; the initial action value network is at least connected with an initial first Q network structure; connecting the initial decision model with a preset firewall rule set; configuring a preset action selection rule for the initial decision model; acquiring a test sample request; processing the test sample request according to a preset feature processing rule to obtain a corresponding test sample feature vector; and carrying out multiple rounds of iterative training on the initial decision model by utilizing the characteristic vector of the test sample and the test sample request according to a preset training rule so as to obtain a preset decision model meeting the requirements.

In some embodiments, the training module may perform the current round of the multiple rounds of iterative training according to the preset training rule in the following manner: acquiring the decision model of the previous round; processing the test sample feature vector of the previous round by using the decision model of the previous round to determine the decision action of the current round, where the test sample feature vector of the previous round is generated from the test sample request of the previous round; determining the firewall rules of the current round according to the decision action of the current round; detecting and processing the test sample request of the current round according to the firewall rules of the current round to obtain the detection processing result of the current round; generating the feedback data of the current round according to the detection processing result of the current round; and adjusting the model parameters of the decision model of the previous round according to the feedback data of the current round to obtain the decision model of the current round.

In some embodiments, the training module may adjust the model parameters of the decision model of the previous round according to the feedback data of the current round in the following manner: detecting whether the current round meets a preset reset condition; when it is determined that the current round meets the preset reset condition, adjusting, according to the feedback data of the current round, the network parameters of the state representation network and of the action value network in the decision model of the previous round, and the network parameters of a second Q network structure; wherein the second Q network structure corresponds to the first Q network structure and is not connected with the decision model of the previous round; and copying the adjusted second Q network structure to the first Q network structure to obtain the decision model of the current round.

In some embodiments, after detecting whether the current round meets the preset reset condition, the training module may be further configured to: when it is determined that the current round does not meet the preset reset condition, adjust, according to the feedback data of the current round, the network parameters of the state representation network and of the action value network in the decision model of the previous round, and the network parameters of the second Q network structure, to obtain the decision model of the current round.

In some embodiments, the training module may also be configured to: combine the test sample feature vector of the previous round, the test sample request of the current round and the feedback data of the current round as experience data; and store the experience data in a preset buffer.

The present specification also provides another data processing apparatus for an access request, including: an acquisition module, used for acquiring a historical access request of the previous time period; a processing module, used for processing the historical access request of the previous time period according to a preset feature processing rule to obtain a corresponding historical feature vector of the previous time period; a decision module, used for processing the historical feature vector of the previous time period by using a comprehensive decision model to obtain a corresponding comprehensive decision result; the comprehensive decision model at least comprises a preset first decision model, a preset second decision model and a preset third decision model which are connected in parallel; the preset first decision model, the preset second decision model and the preset third decision model are respectively a firewall rule decision model for SQL injection attacks, a firewall rule decision model for cross-site scripting attacks and a firewall rule decision model for cross-site request forgery attacks; the comprehensive decision model is obtained through multiple rounds of iterative training according to a preset training rule; a determining module, used for determining candidate decision actions meeting the requirements as target decision actions according to the comprehensive decision result; and an adjusting module, used for adjusting the firewall rules of the system according to the target decision action to obtain the firewall rules of the current time period; the firewall rule of the current time period is used for detecting and processing the access request received in the current time period.

It should be noted that, the units, devices, or modules described in the above embodiments may be implemented by a computer chip or entity, or may be implemented by a product having a certain function. For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, when the present description is implemented, the functions of each module may be implemented in the same piece or pieces of software and/or hardware, or a module that implements the same function may be implemented by a plurality of sub-modules or a combination of sub-units, or the like. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

From the above, the data processing device for access requests provided in the embodiments of the present disclosure can effectively reduce the rate of missing report and false report of network attack detection, accurately identify access requests for which a risk of network attack is found, and timely and pointedly process such access requests, so as to better protect the data security of the system.

In a specific scenario example, the data processing method of the access request provided in the specification can be applied to realize automatic optimization adjustment of the Web application firewall rule.

In this scenario example, reinforcement learning may be combined with the optimization of WAF rules (e.g., firewall rules of a network security device) to obtain the preset training rule. An optimal strategy may be learned through the interaction of an agent (e.g., the preset decision model) with the environment so as to maximize the cumulative reward. In the WAF rule optimization process, the WAF system may be regarded as the reinforcement learning environment, network traffic data (e.g., access requests) as the environment state, and the WAF rules as the agent's policy. The agent chooses which rule to execute based on the current state and then observes the feedback (reward) of the environment to update the policy. In particular, the following steps may be included.

S1: feature extraction and conversion (i.e., processing the access request according to a preset feature processing rule to extract the corresponding feature vector).

In this scenario example, meaningful feature vectors may be extracted from the original network traffic data (e.g., historical access requests of the previous time period) by preprocessing. These feature vectors may cover the traffic type, source IP, destination IP, transport layer protocol, payload features, etc. The network traffic data is then converted into a data form suitable for input to the model (e.g., the preset decision model). The specific process can be seen in fig. 11.

Feature extraction and transformation will be specifically described below using SQL injection as an example.

In this scenario example, the following three features may be used to represent whether an HTTP request has an SQL injection vulnerability:

Feature 1 (e.g., first feature): whether the request contains SQL keywords (a binary feature): 0 indicates that no SQL keyword is included in the request, and 1 indicates that an SQL keyword is included in the request.

Feature 2 (e.g., second feature): request length (a numerical type feature) is used to represent the length of the request.

Feature 3 (e.g., third feature): the number of special characters (a numerical feature), used to indicate the number of special characters in the request, such as ', ", ; etc.

In practice, the above features may be subjected to feature transformation and feature combination in the following manner to obtain a desired feature vector.

The feature conversion may specifically include: binarization and/or normalization.

The binarization may be, for example, binarizing the feature 2 (request length), and it is assumed that a request with a length greater than 100 is marked with 1 and a request with a length not greater than 100 is marked with 0.

Normalization, for example, may specifically be normalization of feature 2 (request length), converting the length to a value between 0 and 1.

The above feature combination may specifically include: feature intersection and feature aggregation.

Feature crossing may, for example, cross feature 1 (contains SQL keywords) with the binarized feature 2 (request length) to generate a new combined feature that indicates whether the request both contains an SQL keyword and has a length greater than 100.

Feature aggregation may, for example, aggregate feature 1 (contains SQL keywords) and feature 3 (number of special characters) by calculating the sum of the two features, to obtain a new feature representing the number of requests including SQL keywords.

Referring to table 1, a table may be used to show the original value of each feature and the new feature values after conversion and combination.

TABLE 1

In table 1, each line may represent characteristic information of one HTTP request. Original features include feature 1 (containing SQL keywords), feature 2 (request length), and feature 3 (number of special characters). Then, feature 2 may be binarized and normalized, resulting in binarized feature 2 and normalized feature 2. Then, feature crossing and feature aggregation are performed, and new features after feature crossing and new features after feature aggregation are generated.
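The feature conversion and combination steps for SQL injection can be sketched as follows. The keyword list, the normalization cap `MAX_LENGTH` and the dictionary keys are assumptions made for illustration; only the length threshold of 100 and the three special characters come from the text.

```python
import re

# Assumed keyword list; the specification does not enumerate the SQL keywords.
SQL_KEYWORDS = re.compile(
    r"\b(select|union|insert|update|delete|drop)\b", re.IGNORECASE
)
SPECIAL_CHARS = "'\";"    # the special characters named in the text
LENGTH_THRESHOLD = 100    # binarization threshold used in the text
MAX_LENGTH = 1000         # assumed cap for scaling the length into [0, 1]


def extract_features(request: str) -> dict:
    """Compute the three raw features plus the converted and combined ones."""
    has_keyword = 1 if SQL_KEYWORDS.search(request) else 0           # feature 1
    length = len(request)                                            # feature 2
    special_count = sum(request.count(ch) for ch in SPECIAL_CHARS)   # feature 3

    length_bin = 1 if length > LENGTH_THRESHOLD else 0    # binarization
    length_norm = min(length, MAX_LENGTH) / MAX_LENGTH    # normalization

    return {
        "has_keyword": has_keyword,
        "length": length,
        "special_count": special_count,
        "length_bin": length_bin,
        "length_norm": length_norm,
        "cross": has_keyword * length_bin,           # feature crossing (logical AND)
        "aggregate": has_keyword + special_count,    # feature aggregation (sum)
    }


sample = extract_features("GET /item?id=1' UNION SELECT password FROM users;--")
```

For the sample request, the keyword flag is 1 and two special characters are counted, so the aggregated feature is 3; the request is shorter than 100 characters, so the cross feature is 0.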

S2: the DQN network structure is built (i.e. an initial decision model is built according to preset training rules).

In this scenario example, a Deep Q Network (DQN) may be employed as the reinforcement learning algorithm to build a state representation network and an action value network (i.e., to build an initial decision model). The state representation network is used to process the feature vector and map it to a state representation. The action value network is used to estimate the value (Q value) of each action, i.e., the expected reward of executing a certain rule. The construction of the DQN network structure is described in detail below, taking an SQL injection attack as an example.

S2-1: a state space (State Space) is defined.

Specifically, for SQL injection attack detection, the state space may include the following features: a binary feature indicating whether the HTTP request contains an SQL keyword (e.g., 0 indicates not contained, 1 indicates contained); and a normalized request-length feature, obtained by scaling the request length to a value between 0 and 1. The definition of the state space can be seen in table 2.

TABLE 2

S2-2: an action space (Action Space) is defined.

Specifically, for WAF rule optimization, the action space may include the following operations: enabling or disabling specific WAF rules for SQL injection attacks; adjusting the matching threshold of WAF rules to allow or block certain types of SQL injection attacks; and so on. The definition of the action space can be seen in table 3.

TABLE 3

S2-3: the neural network structure is designed (e.g., an initial decision model is built).

In this scenario example, based on DQN, a neural network may be used to approximate the Q function; in particular, a multi-layer fully connected neural network (e.g., the first Q network structure) may be designed for this purpose. The input layer receives the feature vector of the state space as input, and the output layer gives the Q value corresponding to each action.
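A forward pass through such a fully connected Q network can be sketched in plain Python as below. The layer sizes, the ReLU activation, the random initialization range and the three-action output are assumptions chosen for illustration; the specification does not state them.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create one dense layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers: List[Layer], state: List[float]) -> List[float]:
    """Map a state feature vector to one Q value per action."""
    activations = list(state)
    for index, (weights, biases) in enumerate(layers):
        activations = [
            sum(w * x for w, x in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        if index < len(layers) - 1:
            activations = [max(0.0, a) for a in activations]  # ReLU on hidden layers
    return activations


rng = random.Random(0)
STATE_DIM, HIDDEN, N_ACTIONS = 2, 8, 3  # assumed sizes
q_net = [init_layer(STATE_DIM, HIDDEN, rng), init_layer(HIDDEN, N_ACTIONS, rng)]

# State: [contains SQL keyword, normalized request length]
q_values = forward(q_net, [1.0, 0.35])
```

In practice a deep learning framework would supply the layers and the gradient updates; the plain-Python version only shows the shape of the mapping from state features to per-action Q values.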

S2-4: an experience replay buffer (Experience Replay Buffer) is set.

In this scenario example, each observed state, action, reward, and next state (together corresponding to one piece of experience data) may be stored in an experience replay buffer (e.g., the preset buffer) while interacting with the environment during training.
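A fixed-capacity buffer of this kind can be sketched as follows. The class name, the capacity handling (oldest entries dropped first) and the optional seed are assumptions for illustration.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-size store of experiences; the oldest entries are dropped first."""

    def __init__(self, capacity: int, seed=None):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state) -> None:
        self._items.append(Experience(state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Draw a random batch without replacement (smaller if the buffer is short)."""
        size = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), size)

    def __len__(self) -> int:
        return len(self._items)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(state=[step], action=step % 2, reward=float(step), next_state=[step + 1])
batch = buffer.sample(2)
```

Sampling at random rather than replaying experiences in order breaks the correlation between consecutive requests, which is the usual reason for using a replay buffer in DQN training.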

S2-5: training and Q value updating are carried out.

In this scenario example, a batch of samples may be randomly drawn from the experience replay buffer, a target Q value (e.g., from the first Q network) and a current Q value (e.g., from the second Q network) are calculated, and the neural network parameters are then updated using a mean square error or other loss function so that the predicted Q value moves closer to the target Q value.
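The target and loss computation can be sketched as below, using the standard DQN form in which the target is the reward plus the discounted maximum Q value of the next state. The discount factor `gamma` and the stub Q functions are assumptions; the specification does not give a discount factor or network weights.

```python
from typing import Callable, List, Sequence, Tuple

# (state, action, reward, next_state)
Transition = Tuple[Sequence[float], int, float, Sequence[float]]
QFunction = Callable[[Sequence[float]], List[float]]


def td_targets(batch: List[Transition], target_q: QFunction, gamma: float) -> List[float]:
    """Target Q value: reward plus discounted best next-state value from the target network."""
    return [reward + gamma * max(target_q(next_state))
            for (_, _, reward, next_state) in batch]


def mse_loss(batch: List[Transition], main_q: QFunction, targets: List[float]) -> float:
    """Mean squared error between the current Q value of the taken action and its target."""
    errors = [(main_q(state)[action] - target) ** 2
              for (state, action, _, _), target in zip(batch, targets)]
    return sum(errors) / len(errors)


# Stub Q functions standing in for the two networks (assumptions).
target_q = lambda state: [1.0, 2.0]
main_q = lambda state: [0.5, 1.5]

batch = [((0.0,), 1, 1.0, (1.0,))]
targets = td_targets(batch, target_q, gamma=0.9)   # 1.0 + 0.9 * 2.0
loss = mse_loss(batch, main_q, targets)            # (1.5 - 2.8) ** 2
```

Gradient descent on this loss with respect to the parameters of the network that produces the current Q value is what moves the prediction toward the target.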

S2-6: setting an Epsilon-Greedy policy (e.g., configuring action selection rules).

In this scenario example, an Epsilon-Greedy strategy may be used to select actions during the training process. For example, an action may be selected at random with a certain probability (epsilon) in order to explore new strategies; with probability 1-epsilon, the action with the highest current Q value is selected in order to exploit the best strategy known so far.
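The selection rule can be sketched in a few lines. The function name and the explicit random generator argument are illustrative choices, and the Q values shown are made up for the example.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


rng = random.Random(42)
q_values = [0.1, 0.7, 0.3]  # illustrative Q values for three candidate actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
random_action = epsilon_greedy(q_values, epsilon=1.0, rng=rng)
```

A common design choice, not stated in the specification, is to start with a large epsilon and decay it over training so that exploration dominates early and exploitation dominates later.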

S2-7: determining a target Q network.

In this scenario example, a target Q network is introduced alongside the main Q network to improve the stability of training. The target Q network has the same structure as the main Q network, but its parameters are not updated in real time; instead, they are periodically copied from the main Q network to reduce the variance of the target Q value estimate.
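The periodic copy can be sketched as below. The class name `TargetSync`, the step-count trigger `sync_every`, and the representation of parameters as a nested Python structure are assumptions; the specification only says that the copy happens periodically.

```python
import copy


class TargetSync:
    """Keeps a frozen copy of the main network parameters, refreshed every N steps."""

    def __init__(self, main_params, sync_every):
        self.main_params = main_params                   # updated by training elsewhere
        self.target_params = copy.deepcopy(main_params)  # frozen between syncs
        self.sync_every = sync_every
        self.steps = 0

    def step(self):
        """Call once per training step; returns True when a copy was made."""
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target_params = copy.deepcopy(self.main_params)
            return True
        return False
```

Using a deep copy keeps the target parameters unchanged while the main parameters continue to be updated in place.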

S3: training the agent (i.e., iteratively training the initial decision model according to preset training rules).

In this scenario example, a reinforcement learning algorithm (e.g., a preset training rule) may be used so that the agent continually interacts with the environment and learns an optimal rule selection strategy based on the reward signal. The training process involves a large number of iterations and repeated sampling. The agent selects an action to execute according to the current state and the action value network, and classifies each request in the network traffic data as either a normal request or a malicious attack. The classification is then compared with the real result, the reward is calculated, and the parameters of the action value network are updated using gradient descent or a similar method. For the specific rewards, see fig. 12.
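The comparison with the real result can be sketched as a reward lookup over the four classification outcomes. The numeric values below are placeholders chosen for illustration and are not the values of fig. 12; the names `REWARDS`, `classification_reward` and `batch_reward` are likewise assumptions.

```python
# Hypothetical reward values; the actual rewards are those given in fig. 12.
REWARDS = {
    (True, True): 1.0,     # attack correctly classified as malicious
    (False, False): 0.5,   # normal request correctly passed
    (True, False): -1.0,   # false alarm: normal request flagged as an attack
    (False, True): -2.0,   # missed attack: malicious request passed
}


def classification_reward(predicted_malicious, actually_malicious):
    """Reward for one request, comparing the agent's label with the real result."""
    return REWARDS[(bool(predicted_malicious), bool(actually_malicious))]


def batch_reward(predictions, labels):
    """Total reward over a sequence of requests from the traffic data."""
    return sum(classification_reward(p, y) for p, y in zip(predictions, labels))
```

Penalising a missed attack more heavily than a false alarm is one design option for a WAF; the relative weights would need tuning against real traffic.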

In this way, the mapping between the state space and the corresponding action selection can be learned and established during training; see Table 4.

Table 4

S4: implementing the adaptive function.

In this scenario example, to implement the adaptive function, the training data may be updated periodically and the DQN model retrained. Because network traffic and attack types change over time, the agent needs to adapt to new attack patterns and environmental changes. A retraining schedule may be set, for example retraining at fixed intervals or when a particular event is triggered, so that the WAF system can adapt to new attacks.
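A retraining trigger of this kind can be sketched as a single predicate that combines a fixed interval with an event condition. The specification does not say which event is used; a rise in the recent miss rate above a limit is assumed here purely as an example, and every parameter name is hypothetical.

```python
def should_retrain(now, last_trained, interval_seconds,
                   recent_miss_rate, miss_rate_limit):
    """Return True when the interval has elapsed or the assumed event has fired.

    now and last_trained are timestamps in seconds; recent_miss_rate is the
    fraction of known attacks that the current rules failed to detect.
    """
    interval_elapsed = (now - last_trained) >= interval_seconds
    event_triggered = recent_miss_rate > miss_rate_limit
    return interval_elapsed or event_triggered
```

For example, with a daily interval of 86400 seconds and a limit of 0.05, retraining would start either one day after the last run or as soon as more than 5% of known attacks are missed.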

S5: online deployment and application.

In this scenario example, the trained agent (e.g., the preset decision model) may be deployed online in the actual WAF system to detect and classify network traffic in real time. The WAF dynamically adjusts its firewall rules according to the output of the agent, thereby implementing an adaptive defense strategy. The agent can optimize itself over time to adapt to new attack types and changes in the network environment. See fig. 13.

In general, the application and implementation of reinforcement learning in WAF rule optimization involves building an agent, defining a reward function, selecting an appropriate reinforcement learning algorithm (e.g., DQN), and performing adaptive training. In this way, the WAF system can constantly learn and optimize the defense rules, improving the detection and defense capabilities for new attacks.

This scenario example verifies that the data processing method for an access request provided by the specification can effectively overcome the limitations of a traditional WAF rule set and realize adaptive defense against novel network attacks. By learning from actual network traffic data, the WAF can continuously optimize its rules, thereby reducing the false alarm rate and the miss rate and improving its ability to detect and block various network attacks. In addition, the technical scheme can provide more intelligent network security protection, giving the WAF a higher level of autonomous learning and adaptation and improving the security and reliability of the Web application.

Although the present specification provides the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When implemented by an actual apparatus or client product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded processing environment, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises a described element is not excluded. Terms such as first and second are used to denote names and do not indicate any particular order.

Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included therein for implementing various functions can also be regarded as structures within the hardware component. The means for implementing the various functions may even be regarded both as software modules implementing the methods and as structures within the hardware component.

The specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer-readable storage media including memory storage devices.

From the above description of embodiments, it will be apparent to those skilled in the art that the present description may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be embodied essentially in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments of the present specification.

The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments. The specification is operational with numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

Although the present specification has been described by way of example, it will be appreciated by those skilled in the art that there are many variations and modifications to the specification without departing from the spirit of the specification, and it is intended that the appended claims encompass such variations and modifications as do not depart from the spirit of the specification.

Claims

1. A method of processing data for an access request, comprising:

acquiring a historical access request of a previous time period;

2. The method of claim 1, wherein the preset decision model comprises at least one of: a firewall rule decision model for SQL injection attacks, a firewall rule decision model for cross-site scripting attacks, and a firewall rule decision model for cross-site request forgery attacks.

3. The method according to claim 2, wherein, in the case where the preset decision model includes a firewall rule decision model for SQL injection attack, processing the historical access request of the previous time period according to a preset feature processing rule to obtain a corresponding historical feature vector of the previous time period, including:

4. A method according to claim 3, characterized in that the method further comprises:

5. The method of claim 1, wherein adjusting firewall rules of the system based on the target decision action comprises:

6. The method according to claim 1, wherein the method further comprises:

7. The method of claim 6, wherein performing multiple rounds of iterative training on the initial decision model using the test sample feature vector and the test sample request according to a preset training rule comprises:

acquiring a decision model of the previous round;

8. The method of claim 7, wherein adjusting model parameters of the decision model of the previous round based on feedback data of the current round comprises:

detecting whether the current round meets a preset reset condition;

9. The method according to claim 8, wherein, in the case of detecting whether the current round satisfies a preset reset condition, the method further comprises:

10. The method of claim 7, wherein the method further comprises:

11. A data processing apparatus for an access request, comprising:

12. A server comprising a processor and a memory for storing processor-executable instructions, which when executed by the processor implement the steps of the method of any one of claims 1 to 10.

13. A computer readable storage medium, having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 10.

14. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 10.