CN116346397A - Network request abnormality detection method and device, equipment, medium and product thereof

Info

Publication number: CN116346397A
Application number: CN202211643742.7A
Authority: CN
Inventors: 吴智东
Original assignee: Guangzhou Huanju Shidai Information Technology Co Ltd
Current assignee: Guangzhou Huanju Shidai Information Technology Co Ltd
Priority date: 2022-12-20
Filing date: 2022-12-20
Publication date: 2023-06-27

Abstract

The application relates to a network request anomaly detection method and a corresponding device, equipment, medium and product. The method comprises the following steps: acquiring slice log data corresponding to a preset duration, wherein the slice log data is incremental data in an application log file and comprises event log records corresponding to network requests from a plurality of source addresses; performing feature extraction on the event log records in the slice log data to obtain a slice feature vector for each source address; and inputting each slice feature vector into an anomaly identification model to identify the security degree of the corresponding source address. By reading slice log data in stepwise increments, the application captures the network requests that form the context of each source address and obtains a slice feature vector representing that context information. With the semantics provided by the context information, the anomaly identification model determines the security degree of each source address more accurately and more systematically.

Description

Network request abnormality detection method and device, equipment, medium and product thereof

Technical Field

The present disclosure relates to network security technologies, and in particular, to a method and apparatus for detecting network request abnormality, a device, a medium, and a product thereof.

Background

With the continuous development of information technology, network security problems have become increasingly prominent. Many attackers exploit vulnerabilities in Web applications and intrude into them through forged network requests, for example by SQL injection or XSS attacks. Once a network attack succeeds, enterprise information can be stolen and the normal service capability of enterprise applications can be destroyed. Such security problems affect the normal operation of an Internet platform and can cause losses to the platform that are difficult to estimate. It is therefore important to monitor, by technical means, whether network access is abnormal.

Among conventional techniques for identifying whether network access is secure, common methods include blocking an attacker's access by blocking the source IP, and identifying abnormal requests by matching specific request parameters against regular expressions. Each method has advantages and disadvantages: blocking IPs easily intercepts normal user access by mistake, and rule-based matching easily produces a large number of misjudgments. In addition, many network attacks are carried out through the cooperation of multiple network requests, so a method that identifies anomalies in a single request can cover only part of the abnormal behavior.

It can be seen that, whatever the attack mode, a successful attack affects the normal services of the platform application. Improving the accuracy and systematicness with which abnormal requests are recognized is therefore particularly important.

Disclosure of Invention

An object of the present application is to solve the above-mentioned problems and provide a network request anomaly detection method and corresponding apparatus, device, non-volatile readable storage medium, and computer program product.

According to one aspect of the present application, there is provided a network request anomaly detection method, including the steps of:

acquiring slice log data corresponding to preset duration, wherein the slice log data belongs to incremental data in an application log file and comprises event log records corresponding to network requests of a plurality of source addresses;

performing feature extraction on event log records in the slice log data to obtain slice feature vectors corresponding to each source address;

and inputting the slice feature vectors into an anomaly identification model, and identifying the safety degree of the source address corresponding to each slice feature vector.

Optionally, feature extraction is performed on the event log record in the slice log data to obtain slice feature vectors corresponding to each source address, including:

For the slice log data, dividing an event log record set of each source address by taking the source address of the event log record in the slice log data as a unit;

based on the same characteristics of the event log records, carrying out characteristic induction on each event log record set to obtain characteristic values of each characteristic in each source address;

and constructing a slice feature vector of the corresponding source address according to the feature value of each source address.

Optionally, the same feature includes any of a source address access feature, a user agent feature, a service host feature, an access address feature, a request method feature, a request jump feature, a request status feature, a security parameter feature.

Optionally, after inputting the slice feature vectors into the anomaly identification model and identifying the security degree of the source address corresponding to each slice feature vector, the method includes:

determining corresponding security state labels according to the security degree of each source address, wherein optional members of the security state labels comprise security labels and abnormal labels;

submitting the source address marked as the abnormal label to a network security identification interface to request confirmation;

adding each source address confirmed as abnormal to a blacklist, and rejecting subsequent network requests from source addresses in the blacklist.
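
By way of a non-limiting illustration, the label, confirm and blacklist steps above might be sketched as follows. The function names, the 0.5 threshold and the `confirm` callable (a stand-in for the network security identification interface) are all assumptions made for this sketch and are not specified by the application.

```python
def update_blacklist(degrees, confirm, blacklist, threshold=0.5):
    """Label each source address and blacklist the confirmed abnormal ones.

    degrees:   {source_address: security degree}, read here as the probability
               of the security-label class (assumption).
    confirm:   hypothetical callable standing in for the network security
               identification interface; returns True when it confirms the anomaly.
    blacklist: a mutable set of blocked source addresses.
    """
    labels = {}
    for ip, degree in degrees.items():
        labels[ip] = "security" if degree >= threshold else "abnormal"
        # Only addresses that the external interface confirms are blocked.
        if labels[ip] == "abnormal" and confirm(ip):
            blacklist.add(ip)
    return labels


def should_reject(ip, blacklist):
    """Subsequent requests from blacklisted source addresses are rejected."""
    return ip in blacklist
```

Routing abnormal labels through a confirmation step before blocking keeps a single model misjudgment from cutting off a normal user.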

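Optionally, after inputting the slice feature vectors into the anomaly identification model and identifying the security degree of the source address corresponding to each slice feature vector, the method further includes:

determining corresponding security state labels according to the security degree of each source address, forming a training sample from the slice feature vector of each source address, and storing the training sample mapped to its security state label in a training data set of the anomaly identification model;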

in response to a timer arrival event of a timed task, restarting training of the anomaly identification model with the training data set, and retraining the anomaly identification model to convergence;

and deploying the retrained anomaly identification model online in place of the original anomaly identification model.

Optionally, before obtaining the slice log data corresponding to the preset duration, the method includes:

invoking a single training sample in a training data set and a pre-marked security state label thereof, wherein the training sample comprises a slice feature vector of a single source address obtained by feature extraction from slice log data of preset duration, and optional members of the security state label comprise a security label and an abnormal label;

inputting the training sample into the anomaly identification model for training, and obtaining a security state label predicted by the anomaly identification model;

and calculating a classification loss value of the predicted security state label against the security state label pre-marked for the training sample, performing a gradient update on the anomaly identification model according to the classification loss value, and iterating the above process until the anomaly identification model is judged, according to the classification loss value, to have reached a convergence state.
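
The training procedure above (predict, compute a classification loss, update by gradient, repeat until convergence) can be illustrated with a deliberately small stand-in model. The sketch below uses plain logistic regression with binary cross-entropy, which is an assumption chosen for brevity and not the model of the application; the learning rate, tolerance and function names are likewise hypothetical.

```python
import math


def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability that slice feature vector x belongs to the security-label class."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, labels, lr=0.5, tol=0.05, max_epochs=2000):
    """samples: slice feature vectors; labels: 1 = security label, 0 = abnormal label."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        total_loss = 0.0
        for x, y in zip(samples, labels):       # one training sample at a time
            p = min(max(predict(w, b, x), 1e-12), 1 - 1e-12)
            total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            grad = p - y                        # d(loss)/d(logit) for cross-entropy
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
        if total_loss / len(samples) < tol:     # convergence judged from the loss
            break
    return w, b
```

For example, `train([[0.1, 0.0], [1.0, 0.9]], [1, 0])` returns weights under which the first vector scores above 0.5 and the second below it.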

Optionally, before invoking the single training sample in the training data set and the pre-marked security state label thereof, the method includes:

acquiring an application log file, and arranging event log records in the application log file in time sequence;

performing sliding framing on the application log file with a sliding window whose length is the preset duration and which advances by a preset step, to obtain slice log data corresponding to a plurality of data frames, wherein adjacent data frames contain partially identical event log records because of the frame shift;

acquiring security state labels provided by the network security identification interface corresponding to the source addresses;

and taking the slice characteristic vector of the source address as a training sample, and mapping and storing the safety state label of the source address and the training sample in the training data set.

According to another aspect of the present application, there is provided a network request abnormality detection apparatus including:

the incremental reading module is used for acquiring slice log data corresponding to preset duration, wherein the slice log data belongs to incremental data in an application log file and comprises event log records corresponding to network requests of a plurality of source addresses;

the feature extraction module is used for extracting features of log records in the slice log data to obtain slice feature vectors corresponding to each source address;

the anomaly detection module is used for inputting the slice feature vectors into an anomaly identification model and identifying the safety degree of the source address corresponding to each slice feature vector.

According to another aspect of the present application, there is provided a network request abnormality detection apparatus including a central processor and a memory, the central processor being configured to invoke execution of a computer program stored in the memory to perform the steps of the network request abnormality detection method described herein.

According to another aspect of the present application, there is provided a non-volatile readable storage medium that stores, in the form of computer-readable instructions, a computer program implementing the network request anomaly detection method; when invoked and run by a computer, the computer program executes the steps of the method.

According to another aspect of the present application, there is provided a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the method as described in any of the embodiments of the present application.

Compared with the prior art, the present application obtains incremental slice log data within a preset duration from the application log file and performs feature processing on the event log records therein to obtain a slice feature vector for each source address. Each slice feature vector is a feature representation of a single source address within a single time period and therefore provides sufficient context information. An anomaly identification model performs semantic recognition on the slice feature vector to determine the security degree of the source address to which it belongs; because the semantics provided by the context information are taken into account, the security degree is determined systematically. Applying corresponding security control to each source address according to the obtained security degree helps ensure the security of network services.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a network architecture schematic diagram of an exemplary application environment of the present application;

FIG. 2 is a flow chart of an embodiment of a network request anomaly detection method of the present application;

FIG. 3 is a time structure diagram of slice log data acquired by the present application;

FIG. 4 is a flow chart of constructing a slice feature vector corresponding to a source address in slice log data according to an embodiment of the present application;

FIG. 5 is a flow chart of implementing network security control using security state tags determined by an anomaly identification model in an embodiment of the present application;

FIG. 6 is a schematic flow chart of an iterative training anomaly identification model based on a self-learning mechanism in an embodiment of the present application;

FIG. 7 is a flow chart of training an anomaly detection model in an embodiment of the present application;

FIG. 8 is a flow chart of constructing a training data set of an anomaly detection model in an embodiment of the present application;

FIG. 9 is a functional block diagram of a network request anomaly detection device of the present application;

FIG. 10 is a schematic structural diagram of a network request abnormality detection apparatus used in the present application.

Detailed Description

Referring to fig. 1, an exemplary network architecture of the present application is suitable for an e-commerce platform scenario, and includes a terminal device 80, a security server 81, and a service server 82.

The security server 81 may serve as the main execution body of the network request anomaly detection method of the present application. It runs a computer program product programmed according to the method and, by running that product, executes each step of the method, thereby providing an anomaly detection service and achieving the technical purpose of the present application.

The service server 82 may be configured to deploy various services of the Internet platform, such as a front-end interface service or a database service, and to receive and respond to various network requests, completing the processing tasks that correspond to them. For example, the service server 82 may be used to deploy an online store of an independent site in an e-commerce platform together with its commodity database, and the various network requests that access the online store are answered by the corresponding application services provided by the service server 82. Each application service may maintain its own application log file, in which it stores event log records of the network requests it processes.

The terminal device 80 may be configured to trigger network requests that use the various application services in the service server 82, for example to browse the online store and place orders for commodity items.

The security server 81 may read the application log files generated by the various application services in the service server 82, identify on that basis the security degree of each source address corresponding to the network requests, and feed the security degrees back to the service server 82. For a network request initiated by the terminal device 80, the service server 82 may then decide whether to respond according to the source address carried by the request and the security degree that the security server 81 obtained for that source address, so that network requests are effectively managed and controlled according to the security identification result.

Of course, in an alternative network architecture, the security server 81 may be deployed on the same server as the service server 82, or other gateway servers may be introduced to provide richer collaboration information for implementing the present application. Those skilled in the art can implement such variations flexibly.

Referring to fig. 2, a method for detecting network request abnormality according to the present application includes the following steps in one embodiment:

Step S1100, obtaining slice log data corresponding to preset duration, wherein the slice log data belongs to incremental data in an application log file and comprises event log records corresponding to network requests of a plurality of source addresses;

Network attacks are complex to carry out, and an attacker usually implements an attack through several cooperating network requests within a specific time period. In view of this characteristic, a time period, namely the preset duration, is set for the log data to be identified, and all event log records falling within that duration are acquired to form the slice log data corresponding to the preset duration. The security of each source address is then identified on the basis of the slice log data. This ensures that each source address has enough event log records to provide sufficient context information, so that its security is identified more accurately. The source address is typically expressed as a network address, i.e. an IP address.

The slice log data may be read from the application log file of the relevant application service. Security identification of the same application log file continues by reading slice log data from it once every interval duration. Thus, each time slice log data is read, it includes incremental data formed by the event log records newly added to the application log file.

In one embodiment, the interval duration may be equal to the preset duration that determines the span of the slice log data. With the preset duration set to 15 minutes, for example, this amounts to acquiring, every 15 minutes, all event log records from the most recent 15 minutes to form the corresponding slice log data. Each acquired piece of slice log data is then the latest incremental data in the application log file.

In another embodiment, the interval duration may be less than the preset duration that determines the span of the slice log data. As shown in FIG. 3, the preset duration may be 15 minutes and the interval duration 5 minutes, so that successively read pieces of slice log data share some overlapping event log records, while the later piece also contains event log records that are incremental relative to the previous piece. In this way, the event log records within each piece of slice log data are closely correlated in time sequence, and covering a moderate amount of incremental data each time forms sufficient context.
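
As a hedged illustration of the two embodiments above, the following sketch cuts a time-ordered list of records into slices of a preset duration, advancing by an interval duration. The `(timestamp, payload)` tuple layout and the function name `slice_windows` are assumptions made for this sketch; it works offline on an in-memory list rather than on a live log file.

```python
from datetime import datetime, timedelta


def slice_windows(records, window=timedelta(minutes=15), step=timedelta(minutes=5)):
    """Yield (start, end, records_in_window) for each slice.

    records: list of (timestamp, payload) tuples (assumed layout).
    window:  the preset duration; step: the interval duration.
    With step == window the slices do not overlap; with step < window
    adjacent slices share part of their event log records.
    """
    if not records:
        return
    records = sorted(records, key=lambda r: r[0])
    start, last = records[0][0], records[-1][0]
    while start <= last:
        end = start + window
        # Half-open interval [start, end) so a record is never counted twice
        # when the slices are non-overlapping.
        yield start, end, [r for r in records if start <= r[0] < end]
        start += step
```

With records at minutes 0, 4, 7, 12 and 16 and the default 15-minute window and 5-minute step, the slices start at minutes 0, 5, 10 and 15, and each later slice reuses part of the previous one while adding the newest records.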

Because the platform serves a massive number of users, the slice log data for a time period of the preset duration generally contains event log records generated by network requests from a large number of source addresses, and a single source address may trigger several network requests within the period and thus generate several event log records.

Step S1200, performing feature extraction on the event log records in the slice log data to obtain slice feature vectors corresponding to each source address;

The event log records in the slice log data are typically generated according to the specification of the corresponding network protocol, such as the HTTP protocol. For each network request, text is generated that represents various features and their result values, including but not limited to features related to source address access, the user agent, the service host, the access address, the request method, the request jump, the request status and the security parameters. Although these features are numerous, each has a specific role and can therefore be clearly identified.

To extract features from the event log records in the slice log data, all event log records corresponding to the same source address can be processed feature by feature. For example, for a given feature, the number of times the same result information appears across all event log records of the source address is counted to determine the feature value, yielding the relation between the feature and its feature value. All feature values of each source address are then combined in a preset order to construct the corresponding slice feature vector.

It is to be understood that, for the slice log data, the slice feature vector corresponding to each source address can be obtained after feature processing is performed on the event log record corresponding to each source address and the corresponding slice feature vector is constructed.
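
A minimal sketch of this counting and ordered combination is given below. The record keys `method` and `uri`, the four chosen features and their order in `FEATURE_ORDER` are assumptions made purely for illustration; the application leaves the concrete features and order to the implementer.

```python
from collections import Counter

# Assumed preset order in which feature values are placed in the vector.
FEATURE_ORDER = ["access_count", "get_count", "post_count", "distinct_uri_count"]


def build_slice_vector(records):
    """Build the slice feature vector for ONE source address.

    records: list of dicts with the (assumed) keys 'method' and 'uri',
    one dict per event log record of that source address in the slice.
    """
    methods = Counter(r["method"].upper() for r in records)
    values = {
        "access_count": len(records),
        "get_count": methods["GET"],
        "post_count": methods["POST"],
        "distinct_uri_count": len({r["uri"] for r in records}),
    }
    # Combining values in a fixed order keeps every position of the vector
    # tied to the same feature across source addresses and slices.
    return [float(values[name]) for name in FEATURE_ORDER]
```

Because the order is fixed, vectors from different source addresses and different slices are directly comparable by position.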

Step S1300, inputting the slice feature vectors into an anomaly identification model, and identifying the security degree of the source address corresponding to each slice feature vector.

The present application prepares an anomaly identification model, which may be a machine learning model or a deep learning model, and trains it in advance to a convergence state with a sufficient number of training samples, so that it can predict the security degree corresponding to a given slice feature vector. The security degree may be expressed as a classification probability: the probability, computed by a classifier from the slice feature vector or from its deep semantic information, that the vector maps to the class corresponding to the security label.

It will be understood that the security degree can be converted into a corresponding security state label. The security degree, i.e. the classification probability, is computed by the classifier. When the classification probability of the class corresponding to the security label is the largest among all classes of the classifier, the source address to which the slice feature vector belongs may be marked with the security label, indicating that the source address is a secure address. Otherwise, the security degree is insufficient to earn the security label, and the source address may be marked with the abnormal label. The security label and the abnormal label are both members of the security state label; for a given source address, the security state label determined after the anomaly identification model identifies its slice feature vector is typically one of the two.

In an alternative embodiment, the classifier may be a multi-class classifier, so that the slice feature vector is mapped to several classes and a classification probability is obtained for each class. Since the classification probabilities of all classes sum to 1, each class can correspond to a different security level or security type, and the security degree of the slice feature vector is then expressed by that security level or security type. For example, several security levels may be set in advance, such as a high-risk level, a suspicious level and a safety level. After the classifier computes the classification probability of each level, the level with the largest classification probability is the security level of the slice feature vector, that is, of the corresponding source address. Expressing the security degree by security type works in the same way and is not described again.
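
The multi-class case can be illustrated as follows. The sketch assumes that the classifier emits one raw score per level and that a softmax turns the scores into probabilities summing to 1; both the softmax and the level names in `LEVELS` are assumptions for illustration only.

```python
import math

# Assumed level names and order; the application only requires preset levels.
LEVELS = ["high_risk", "suspicious", "safe"]


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def security_level(scores):
    """Return (level with the largest classification probability, all probabilities)."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LEVELS[best], probs
```

For instance, `security_level([2.0, 0.5, -1.0])` selects `"high_risk"`, since the first score yields the largest probability.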

The anomaly identification model thus performs security identification on the slice feature vector of each source address in the slice log data one by one to determine the corresponding security degree, from which related information such as the security state label, security level and security type can be further determined. The resulting security identification information for each source address can be provided to the application service that generated the application log file, or to other business links such as a gateway service, for further network security control. For example, source addresses carrying the abnormal label are blocked, and different permission control measures are applied to source addresses of different security levels, making the platform network more secure.

The anomaly identification model adopted by the present application may be a traditional machine learning model, such as a decision-tree-based LightGBM classification model, or a deep learning model implemented as a recurrent neural network followed by a classifier. What these have in common is that they model the correspondence between a slice feature vector and its security degree. Therefore, any mathematical model that can model this relation, and so infer the corresponding security degree from the slice feature vector or its deep semantic information, can serve as the anomaly identification model of the present application.

According to the above embodiment, the present application obtains incremental slice log data within the preset duration from the application log file and performs feature processing on the event log records therein to obtain a slice feature vector for each source address. Each slice feature vector is a feature representation of a single source address within a single time period and provides sufficient context information. The anomaly identification model performs semantic recognition on the slice feature vector to determine the security degree of the source address to which it belongs; with the help of the semantics provided by the context information, the security degree is determined systematically. Security control is then applied to the corresponding source address, thereby helping to ensure the security of the network service.

On the basis of any embodiment of the present application, referring to FIG. 4, performing feature extraction on the event log records in the slice log data to obtain the slice feature vector corresponding to each source address includes:

Step S1210, dividing the slice log data into an event log record set for each source address, taking the source address of each event log record as the unit;

When feature processing is performed on a single piece of slice log data, the slice log data may first be split and merged. Specifically, for each source address appearing in the slice log data, all event log records containing that source address are collected into the same event log record set, so that an event log record set is obtained for each source address.
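
Step S1210 amounts to a group-by on the source address. A minimal sketch follows, assuming each event log record has already been parsed into a dict with an `ip` key (the key name and function name are hypothetical):

```python
from collections import defaultdict


def group_by_source(slice_records):
    """Split one slice into an event log record set per source address.

    slice_records: iterable of dicts, each with the (assumed) key 'ip'.
    Returns {source_address: [records...]}, preserving record order.
    """
    groups = defaultdict(list)
    for record in slice_records:
        groups[record["ip"]].append(record)
    return dict(groups)
```

Each resulting list is the event log record set on which step S1220 then summarizes feature values.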

Step S1220, based on the same characteristics of the event log records, carrying out characteristic induction on each event log record set to obtain characteristic values of each characteristic in each source address;

On the basis of the event log record set of each source address, the features required for feature processing are determined according to the specification of the corresponding network protocol, such as the HTTP protocol, and the feature value of each feature is then summarized over the event log records in the set. As mentioned above, the network protocol may define many features; for anomaly detection, a subset of features may be selected manually and their feature values extracted to construct the corresponding slice feature vector.

Typical features include, but are not limited to, the following: a source address access feature, a user agent feature, a service host feature, an access address feature, a request method feature, a request jump feature, a request status feature, and a security parameter feature. Several of these can be selected as the objects of feature processing.

The source address access feature, denoted F_ip, refers to various specific features related to the access behavior of the source address in the network request, such as: the number of accesses of the source address in the slice log data, whether the source address is in a blacklist, whether the source address is a proxy address, etc. One or more of these specific features can be determined as needed.

The user agent feature, denoted F_ua, refers to various specific features corresponding to the user agent on the terminal device used by the user in the network request, such as: the number of types of different user agents, whether the user agent appears as a null value, the number of occurrences of different operating system types, the number of occurrences of different CPU types, the number of occurrences of different browser languages, etc. One or more of these specific features can be determined as needed.

The service host feature, denoted F_host, refers to various specific features corresponding to the host (HOST) of the network request, such as: the number of types of different HOSTs, and the maximum, minimum, average, median, variance and standard deviation of the number of accesses of different HOSTs, etc. One or more of these specific features can be determined as needed.

The access address feature, denoted F_uri, refers to various specific features corresponding to the address (URI) accessed by the network request, such as: the number of types of different URIs, and the maximum, minimum, average, median, variance and standard deviation of the number of accesses of different URIs, etc. One or more of these specific features can be determined as needed.
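
The access-count statistics listed for F_uri (and, in the same form, for F_host) can be sketched with the Python standard library. Whether the variance is a population or a sample variance is not stated in the application; the population form is assumed here, and the function name and dictionary keys are hypothetical.

```python
from collections import Counter
import statistics


def access_count_stats(values):
    """Summarize how often each distinct URI (or HOST) was accessed.

    values: non-empty list holding the URI (or HOST) of every event log
    record of one source address within the slice.
    """
    counts = list(Counter(values).values())   # accesses per distinct value
    return {
        "distinct": len(counts),
        "max": max(counts),
        "min": min(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "variance": statistics.pvariance(counts),  # population variance (assumption)
        "std": statistics.pstdev(counts),
    }
```

A source address that hits one URI far more often than the rest produces a large maximum and variance, which is the kind of signal these statistics are meant to expose.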

The request method feature, denoted F_method, refers to various specific features corresponding to the request method adopted by the network request, such as: the number of POST requests, the number of GET requests, the number of HEAD requests, the number of requests using other methods, etc. One or more of these specific features can be determined as needed.

The request jump feature, denoted F_referer, refers to various specific features corresponding to network requests that arrive through a jump, such as: the number of times the request comes from a common website host (e.g., Baidu, Google, etc.), the number of times it comes from the current business host, etc. One or more of these specific features can be determined as needed.
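
As an illustrative sketch of F_referer, the host of each Referer value can be extracted and compared against a list of common website hosts and against the current business host. The contents of `COMMON_HOSTS`, the treatment of empty or `-` referers and the function name are assumptions made for this example.

```python
from urllib.parse import urlparse

# Assumed list of "common website hosts"; a real deployment would maintain its own.
COMMON_HOSTS = {"www.baidu.com", "www.google.com"}


def referer_features(referers, business_host):
    """Count jumps from common website hosts and from the current business host.

    referers: Referer values of one source address's event log records;
    empty strings or '-' (a common log placeholder) are treated as no referer.
    """
    from_common = from_business = 0
    for ref in referers:
        host = urlparse(ref).hostname if ref else None
        if host in COMMON_HOSTS:
            from_common += 1
        elif host == business_host:
            from_business += 1
    return {"from_common": from_common, "from_business": from_business}
```

Requests with no recognizable referer count toward neither feature in this sketch.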

The request status feature, denoted F_status, refers to specific features of the request status (Status) returned for the network requests, such as: the number of distinct Status values, the number of occurrences of each Status value, and so on. One or more of these specific features may be selected as required.

The security parameter feature, denoted F_param, refers to features abstracted from the parameters (Param) carried in a network request, defined according to the attack principles of some common attack types.

For example, according to the principle of SQL injection attacks, the following specific features may be included: the number of SQL insert, delete, and update keywords contained in the parameters, and the number of distinct types of such keywords; the number of SQL additional-operation keywords and the number of their distinct types; the number of SQL aggregate function keywords and the number of their distinct types; the number of SQL string function keywords and the number of their distinct types; the number of other common SQL function keywords and the number of their distinct types; and so on. One or more of these specific features may be selected as required.

As another example, according to other attack means such as XSS attacks and command injection attacks, the following specific features may be included: the number of non-HTTP special characters and the number of their types; whether the URI is a dynamic web page (ASP/PHP/JSP); the number of HTML keywords contained in the parameters and the number of distinct types of such keywords; the number of special characters involved in HTML escaping contained in the parameters and the number of their distinct types; the number of distinct types of special characters contained in the parameters; and so on. One or more of these specific features may be selected as required.
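As an illustration of how such parameter features might be counted, the sketch below tallies keyword occurrences and distinct keyword types in a request parameter string. The keyword sets, the special-character set, and the function names are hypothetical placeholders chosen for the example; the application does not enumerate them.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical keyword and character sets; a real deployment would define its own.
SQL_DML_KEYWORDS = {"insert", "update", "delete"}
HTML_KEYWORDS = {"script", "iframe", "img"}
SPECIAL_CHARS = set("'\"<>;()")


def keyword_features(param, keywords):
    """Return (total keyword hits, number of distinct keywords hit)."""
    tokens = re.findall(r"[a-z_]+", unquote_plus(param).lower())
    hits = [t for t in tokens if t in keywords]
    return len(hits), len(set(hits))


def special_char_features(param, specials=SPECIAL_CHARS):
    """Return (total special characters, number of distinct special characters)."""
    hits = [c for c in unquote_plus(param) if c in specials]
    return len(hits), len(set(hits))


sqli = "id=1%27%3B+DELETE+FROM+users%3B+delete+from+logs--"
xss = "q=<script>alert(1)</script>"

sql_counts = keyword_features(sqli, SQL_DML_KEYWORDS)
html_counts = keyword_features(xss, HTML_KEYWORDS)
sqli_specials = special_char_features(sqli)
xss_specials = special_char_features(xss)
```

Decoding the parameter before counting is a design choice made here so that URL-encoded payloads are counted the same way as plain ones.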

It can be seen that, when the slice feature vector of a source address is constructed, embedding the feature values of features defined according to the attack principles of specific attack types gives the anomaly identification model more specific and sufficient reference semantics for identifying the security of the source address, so that more accurate identification results are obtained.

During feature processing, once the features exemplified above have been determined in advance, the event log records in the event log record set of each source address are counted and summarized against those features to determine the feature value of each feature. In this way, for each source address in the slice log data, a corresponding set of feature values can be determined from its event log record set.

Step S1230, the slice feature vector of the corresponding source address is constructed according to the feature value of each source address.

After determining the feature value set corresponding to each source address, the feature values corresponding to each source address may be orderly arranged according to a preset ordering relationship to form a corresponding slice feature vector for each source address, so as to implement encoding of the whole event log record set of the source address, thereby obtaining the corresponding slice feature vector for each source address in the slice log data, as exemplified by the following:

{F_ip, F_ua, F_host, F_uri, F_method, F_referer, F_status, F_param}
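The grouping, feature induction, and ordered arrangement described above can be sketched as follows. The record fields (`ip`, `uri`, `method`, `status`) and the five features in `FEATURE_ORDER` are assumptions made for the example; they stand in for the much larger feature groups named in the text.

```python
from collections import Counter, defaultdict

# Hypothetical preset ordering of features within the slice feature vector.
FEATURE_ORDER = ["request_count", "uri_kinds", "post_count", "get_count", "status_kinds"]


def group_by_source(records):
    """Split the slice log data into one event log record set per source address."""
    groups = defaultdict(list)
    for record in records:
        groups[record["ip"]].append(record)
    return groups


def feature_values(records):
    """Compute the feature values for the record set of one source address."""
    methods = Counter(r["method"].upper() for r in records)
    return {
        "request_count": len(records),
        "uri_kinds": len({r["uri"] for r in records}),
        "post_count": methods["POST"],
        "get_count": methods["GET"],
        "status_kinds": len({r["status"] for r in records}),
    }


def slice_feature_vectors(records):
    """Arrange each source address's feature values in the preset order."""
    vectors = {}
    for ip, group in group_by_source(records).items():
        values = feature_values(group)
        vectors[ip] = [values[name] for name in FEATURE_ORDER]
    return vectors


records = [
    {"ip": "10.0.0.1", "uri": "/a", "method": "GET", "status": 200},
    {"ip": "10.0.0.1", "uri": "/b", "method": "POST", "status": 404},
    {"ip": "10.0.0.2", "uri": "/a", "method": "GET", "status": 200},
]
vectors = slice_feature_vectors(records)
```

Keeping the order in a single list means every source address yields a vector with the same layout, which is what lets the vectors be fed to one model.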

According to this embodiment, in-depth feature processing is performed on the event log records of each source address in the slice log data to obtain the slice feature vector of each source address. The slice feature vector effectively represents multiple aspects of the behavior of the corresponding source address; these behavior features provide the salient information of the series of accesses made by the source address within the preset duration, together with the associations among that information, which enriches the semantics contained in the slice feature vector. The anomaly identification model can therefore determine the security degree of the corresponding source address accurately and systematically, making anomaly identification of network requests more efficient and accurate, providing a reliable criterion for network security monitoring of the internet platform, and improving the security of the platform.

On the basis of any embodiment of the present application, referring to fig. 5, after inputting each slice feature vector into the anomaly identification model and identifying the security degree of the source address corresponding to each slice feature vector, the method includes:

Step S2100, determining a corresponding security state label according to the security degree of each source address, wherein the optional members of the security state label include a security label and an abnormal label;

The security degree that the anomaly identification model determines from the slice feature vector of each source address can be expressed as the classification probability of the positive class in the classification space of a binary classifier, the positive class being trained under supervision with the corresponding positive sample labels. Accordingly, when this classification probability is the largest among all classes, the corresponding source address is assigned the security label among the optional members of the security state label; otherwise it is assigned the abnormal label. In this way every source address is labeled.

In one embodiment, to enhance the reliability of the security tag, a verification threshold may be further set, and the classification probability characterizing the security level is compared with the verification threshold, and when the classification probability is higher than the verification threshold, the corresponding source address is marked as the security tag, otherwise, the source address is marked as the abnormal tag.
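The two labeling rules just described, the largest-probability rule and the optional verification threshold, can be sketched as below. The label strings and the function name are hypothetical, and treating a tie between the two classes as abnormal is an assumption, since the text does not address ties.

```python
SECURITY_LABEL = "security"
ABNORMAL_LABEL = "abnormal"


def label_source(p_security, verification_threshold=None):
    """Map the positive-class probability of one source address to a label.

    p_security: classification probability of the positive (security) class.
    verification_threshold: optional stricter cut-off for the security label.
    """
    if verification_threshold is None:
        # Binary case: the security class wins only if it is the larger of the two.
        is_security = p_security > 1.0 - p_security
    else:
        is_security = p_security > verification_threshold
    return SECURITY_LABEL if is_security else ABNORMAL_LABEL


labels = {
    "default_0.6": label_source(0.6),
    "strict_0.6": label_source(0.6, verification_threshold=0.9),
    "strict_0.95": label_source(0.95, verification_threshold=0.9),
}
```

With a threshold of 0.9, a source address scored 0.6 is no longer trusted, which is the tightening effect the verification threshold is meant to provide.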

Step S2200, submitting the source address marked as the abnormal label to a network security identification interface to request confirmation;

In general, an internet platform also runs a number of security services for identifying the security of network requests, and these security services expose corresponding network security identification interfaces through which the security of individual source addresses can be identified or confirmed. Such a security service may be, for example, a service deployed in a gateway to check the source addresses of related network requests. Accordingly, this embodiment further submits each source address marked with the abnormal label to one or more network security identification interfaces for security confirmation, and only when all of the network security identification interfaces confirm the abnormal label is the source address confirmed as abnormal.

Step S2300, adding the confirmed source address to a blacklist, and rejecting subsequent network requests from source addresses in the blacklist.

Source addresses that the network security identification interfaces have confirmed as carrying the abnormal label may be added to a blacklist, which stores all source addresses to which access is denied. Accordingly, the various services, including the application service that generates the application log file, the gateway service, or other related services, can consult the blacklist before responding to a network request. When the source address of the network request is a member of the blacklist, the request is refused, so that all network requests initiated by that source address are blocked and network security is ensured.
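A minimal sketch of the confirmation and blacklist steps follows. The network security identification interfaces are modeled as plain callables that return True when they consider an address abnormal; these callables, the function names, and the example addresses are stand-ins invented for illustration, not an actual interface of any security service.

```python
def confirm_and_blacklist(suspects, interfaces, blacklist):
    """Add a suspect address to the blacklist only if every interface confirms it."""
    for source_address in suspects:
        # Require at least one interface; an empty list confirms nothing.
        if interfaces and all(is_abnormal(source_address) for is_abnormal in interfaces):
            blacklist.add(source_address)
    return blacklist


def is_allowed(source_address, blacklist):
    """Check performed by a service before responding to a network request."""
    return source_address not in blacklist


# Hypothetical stand-ins for two network security identification interfaces.
def gateway_check(source_address):
    return source_address in {"203.0.113.7", "203.0.113.9"}


def second_check(source_address):
    return source_address in {"203.0.113.7"}


blacklist = confirm_and_blacklist(
    ["203.0.113.7", "203.0.113.9"], [gateway_check, second_check], set()
)
```

Here only the address confirmed by both stand-in interfaces ends up blocked; the other stays reachable because confirmation was not unanimous.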

According to this embodiment, the security degree corresponding to the slice feature vector of each source address is converted into a security state label, and each source address carrying the abnormal label is submitted to one or more network security identification interfaces for confirmation, which achieves multi-party verification. Once confirmed, the source address is added to the blacklist and denied access. The detection results of the anomaly identification model are thereby put to effective use, the normal operation of the various services is comprehensively protected, the services are shielded from unnecessary network attacks, and platform security is ensured.

On the basis of any embodiment of the present application, referring to fig. 6, after inputting each slice feature vector into the anomaly identification model and identifying the security degree of the source address corresponding to each slice feature vector, the method includes:

Step S3100, determining a corresponding security state label according to the security degree of each source address, taking the slice feature vector of each source address as a training sample, and storing the training sample mapped to its security state label in the training data set of the anomaly identification model;

After the security state label is determined according to the security degree of each source address, mapping relation data between each source address and its security state label is in effect obtained. The slice feature vector of the source address can therefore serve as a training sample and the security state label as the supervision label of that training sample, and this mapping relation data is stored in the training data set used to train the anomaly identification model of the present application. The training data set is thereby expanded automatically and its total number of samples is enriched.

Step S3200, in response to a timing arrival event of a timing task, restarting training of the anomaly identification model using the training data set, and retraining the anomaly identification model to convergence;

To realize a self-learning mechanism for the anomaly identification model, the model is iteratively retrained through a timing task while its training data set keeps expanding. The timing task may be set to run daily, every few days, every half month, and so on, so that a timing arrival event is triggered at intervals. In response to the timing arrival event, training of the anomaly identification model is restarted using the training data set, and the model is retrained to a converged state with the sample data in the training data set, thereby obtaining a new version of the anomaly identification model.

Step S3300, putting the retrained anomaly identification model online in place of the original anomaly identification model.

After the new version of the anomaly identification model is obtained, it can replace the original anomaly identification model that is currently running, which is thereafter regarded as the old version. Specifically, the old version may be stopped and the new version then started to provide service. If a smooth transition is required, both versions may run simultaneously for a period of time, and the old version is terminated once the new version has fully taken over the service; this can be implemented flexibly by a person skilled in the art.
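The cycle of steps S3100 to S3300 can be sketched as a small class, under the assumption that the timing task simply calls a method when it fires. The class name `SelfLearningLoop`, the injected `train_fn`, and the dictionary used as a stand-in model are all hypothetical; scheduling and the smooth-transition variant are left out.

```python
class SelfLearningLoop:
    """Collect model-labelled samples and swap in a retrained model when a timer fires."""

    def __init__(self, model, train_fn):
        self.model = model        # version currently serving requests
        self.train_fn = train_fn  # hypothetical: training data set -> new model
        self.dataset = []         # (slice feature vector, security state label) pairs

    def record(self, vector, label):
        """S3100: store the vector with its label as a new training sample."""
        self.dataset.append((list(vector), label))

    def on_timer(self):
        """S3200 and S3300: retrain on the expanded data set, then replace the old model."""
        new_model = self.train_fn(self.dataset)
        old_model, self.model = self.model, new_model
        return old_model  # returned so the caller can shut the old version down


# Toy stand-in: the "model" is a dict recording how many samples it was trained on.
loop = SelfLearningLoop(model={"version": 0}, train_fn=lambda data: {"version": len(data)})
loop.record([2, 2, 1, 1, 2], "abnormal")
loop.record([1, 1, 0, 1, 1], "security")
retired = loop.on_timer()
```

Returning the old model rather than discarding it leaves room for the caller to keep both versions alive briefly if a smooth handover is wanted.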

According to the above embodiment, a self-learning mechanism is constructed for the anomaly identification model. The old version of the model continuously predicts the security state labels for the slice feature vectors of the source addresses in the incremental log data; the slice feature vectors and their security state labels are then assembled into sample data that expands the training data set of the model. Training is restarted periodically on the expanded training data set, the model is retrained to convergence, and the retrained model replaces the old version. As this cycle repeats, the inference capability of the model in service grows stronger: increasingly accurate prior knowledge is continuously used to determine posterior knowledge, which improves the accuracy of security identification of source addresses and yields long-term benefit without additional cost.

On the basis of any embodiment of the present application, referring to fig. 7, before the slice log data corresponding to the preset duration is obtained, or when training needs to be performed on the anomaly identification model at any time, the training process of this embodiment may be implemented, where the training process includes:

Step S4100, calling a single training sample in a training data set together with its pre-marked security state label, wherein the training sample comprises a slice feature vector of a single source address obtained by feature extraction from slice log data of a preset duration, and the optional members of the security state label include a security label and an abnormal label;

As described above, the training data set is prepared for training the anomaly identification model and includes a plurality of sample data. Each sample datum includes a training sample expressed as a slice feature vector, marked with the security state label corresponding to the security degree of that slice feature vector. The training sample serves as the input of the anomaly identification model, and the security state label serves as the supervision label used to calculate the loss value of the model's prediction.

When the sample data of the training data set is initially prepared, the same preset duration that is used to intercept slice log data in the online inference stage of the anomaly identification model may be applied: corresponding slice log data is obtained and features are extracted to generate the slice feature vectors of a plurality of source addresses. The feature extraction is as described above and is not repeated here.

For ease of understanding, in the description of this embodiment the anomaly identification model of the present application is simplified to a model that performs classification mapping through a classifier to obtain the classification probabilities of two classes. One class is trained under supervision with the supervision labels of positive samples and becomes the positive class; the other is trained under supervision with the supervision labels of negative samples and becomes the negative class.

The security state label corresponds to the binary classifier, and its optional members likewise comprise only a security label and an abnormal label. The security label indicates that the corresponding source address is a secure address, and thus that the corresponding training sample is in effect a positive sample; the abnormal label indicates that the corresponding source address is an unsafe address, and thus that the corresponding training sample is in effect a negative sample.

The process of training the anomaly identification model is implemented by iteratively calling each training sample in the training data set as an input, so that only one training sample is called as an input during each iteration, and the corresponding security state label is also called for supervising the prediction result of the anomaly identification model.

Step S4200, inputting the training sample into the anomaly identification model for training, and obtaining the security state label predicted by the anomaly identification model;

After the training sample is input into the anomaly identification model, the model derives the corresponding deep semantic information from it according to a preset mathematical modeling process, and a linear layer in the classifier then maps that information onto the two classes of the classification space to obtain the classification probability of each class, which forms the prediction result. Since the class with the highest classification probability in the prediction result determines the security state label, the anomaly identification model in effect outputs the security state label predicted from the slice feature vector.

Step S4300, calculating a classification loss value of the predicted security state label using the security state label pre-marked for the training sample, performing a gradient update on the anomaly identification model according to the classification loss value, and iterating the above process until the anomaly identification model is judged, according to the classification loss value, to have reached a converged state.

To supervise the anomaly identification model and correct its weights, the security state label pre-marked for the training sample is used to calculate the loss of the security state label that the model predicts from that training sample. In this embodiment, a cross-entropy loss function is used to calculate the classification loss value between the two security state labels. The classification loss value is then compared with a target threshold used to judge whether the model has converged. When the classification loss value reaches the target threshold, the model is considered to have converged and training may be terminated. Otherwise the model has not converged: a gradient update is performed according to the classification loss value, the weight parameters of each part of the model are corrected through back propagation so that the model further approaches convergence, and the next iteration continues from step S4100, and so on until the model is trained to convergence.
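The training loop of steps S4100 to S4300 can be illustrated with a deliberately small stand-in: a single linear layer with a sigmoid output, trained one sample at a time with a cross-entropy loss. The application does not disclose the model architecture, so this logistic classifier, the learning rate, the toy data, and the choice to test convergence on the mean loss over one pass of the data are all assumptions of the sketch.

```python
import math


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp within range
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy between predicted probability p and label y (1 or 0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def predict(w, b, x):
    """Probability that the source address behind vector x is secure."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, lr=0.5, target_loss=0.05, max_epochs=5000):
    """Iterate over single samples, updating weights until the loss target is met."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(max_epochs):
        total = 0.0
        for x, y in samples:                 # one training sample per iteration
            p = predict(w, b, x)
            total += cross_entropy(p, y)     # classification loss for this sample
            grad = p - y                     # gradient of the loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
        if total / len(samples) <= target_loss:  # assumed convergence test
            break
    return w, b


# Toy slice feature vectors: label 1 = security (positive), 0 = abnormal (negative).
samples = [([0.1, 0.0], 1), ([0.2, 0.1], 1), ([0.9, 0.8], 0), ([1.0, 0.9], 0)]
w, b = train(samples)
```

After training, `predict(w, b, x)` plays the role of the security degree, and thresholding it as described for step S2100 yields the predicted security state label.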

According to the above embodiment, by training on the corresponding training data set, the anomaly identification model of the present application acquires inference capability from the semantics and context information contained in the rich features that the slice feature vectors of the training samples provide. It can therefore predict the security degree of the corresponding source address from a slice feature vector, and with the help of the highly condensed slice feature vector its security identification is stronger and more accurate.

On the basis of any embodiment of the present application, referring to fig. 8, before calling a single training sample in the training data set together with its pre-marked security state label, the method includes:

Step S5100, acquiring an application log file, and arranging event log records in the application log file according to time sequence;

To prepare the training data set required for training the anomaly identification model, the sample data may be prepared from the application log files generated by the various application services of the internet platform. The anomaly identification model may be trained specifically for one application service, in which case the training data set is prepared from the application log file generated by that application service. In other embodiments, the model may be trained for a plurality of application services, in which case the training data set is prepared from the application log files generated by those application services. For ease of understanding, this embodiment is described below by taking the processing of a single application log file as an example.

After an application log file generated by an application service is obtained, the event log records in the obtained application log file are sorted by time to facilitate subsequent computer processing.

Step S5200, framing the application log file by sliding a window corresponding to the preset duration over it with a preset step length, to obtain slice log data corresponding to a plurality of data frames, wherein adjacent data frames contain some of the same event log records because of the frame shift;

The application log file generally includes a large number of event log records of network requests accumulated over a long period; the time span is relatively large, and the network requests come from a large number of source addresses. To keep the features valid and to match the requirements of the online inference stage of the anomaly identification model, the application log file may be segmented by the same preset duration that is used to determine slice log data, thereby framing the file into data frames that each span the preset duration. Each data frame is in effect one piece of slice log data.

To enrich the correlation between the sample data and the data before and after it, a frame shift may be superimposed when framing the application log file, so that two data frames adjacent in time, that is, two pieces of slice log data, contain some of the same event log records. To this end, a sliding window corresponding to the preset duration may be set so that it covers the event log records within a time span equal to the preset duration. The sliding window is then advanced by a preset step length that is smaller than the preset duration, so that after the application log file is divided, the slice log data of adjacent data frames share some event log records. The preset step length is thus a fixed interval: the event log records in the application log file are sampled once per interval, and each sample covers a time range equal to the preset duration of the sliding window, so that all event log records within that range are obtained to form the slice log data.

As shown in fig. 3, the preset duration may be set to 15 minutes, that is, each sliding window covers an event log record with 15 minutes duration, and the preset step length may be set to every 5 minutes, so that slice log data corresponding to 15 minutes is collected every 5 minutes, and so on, to obtain each slice log data.
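The 15-minute window with a 5-minute step can be sketched as follows. The record layout (a dictionary with a `time` field), the function name, and the half-open window `[start, start + window)` are assumptions of the example; the text does not specify how boundary records are assigned.

```python
from datetime import datetime, timedelta


def sliding_frames(records, window=timedelta(minutes=15), step=timedelta(minutes=5)):
    """Frame time-sorted event log records with an overlapping sliding window."""
    records = sorted(records, key=lambda r: r["time"])
    if not records:
        return []
    frames = []
    start, last = records[0]["time"], records[-1]["time"]
    while start <= last:
        end = start + window
        # Assumed half-open interval: start inclusive, end exclusive.
        frames.append([r for r in records if start <= r["time"] < end])
        start += step
    return frames


t0 = datetime(2022, 12, 20, 0, 0)
records = [{"time": t0 + timedelta(minutes=m), "ip": "10.0.0.1"} for m in (0, 4, 7, 12, 16)]
frames = sliding_frames(records)
frame_sizes = [len(frame) for frame in frames]
```

Because the step is shorter than the window, the record at minute 7 appears in both the first and the second frame, which is the overlap the frame shift is meant to create.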

It can be understood from the above principle that dividing the application log file into slice log data is in effect a sliding-window process similar to a convolution operation, and that dividing the application log file on this principle can be implemented efficiently through computer programming.

Step S5300, extracting features of event log records in the slice log data to obtain slice feature vectors corresponding to each source address;

Feature extraction on the event log records of each piece of slice log data when preparing training samples for the training data set is the same process as feature extraction on the slice log data formed from incremental data when the anomaly identification model is used in the online inference stage. For this step, reference may therefore be made to the embodiments of feature extraction described above, which are not repeated here.

In summary, it will be appreciated that for each source address in each slice log data of the application log file, the corresponding slice feature vector is obtained by the feature extraction operation, and the slice feature vector corresponding to the source address can be used as a training sample in the training data set.

Step S5400, obtaining security status labels provided by the network security identification interface corresponding to the source addresses;

To construct usable sample data, each training sample needs a corresponding supervision label, specifically a security state label indicating whether the corresponding source address is a secure address.

In one embodiment, the security state label of each training sample, that is, of each slice feature vector determined from the application log file, may be determined by manual labeling. As described above, the security state label may be either the security label or the abnormal label.

In another embodiment, the identification result that the network security identification interface described in the present application returns for each source address may be used as the supervision label of the slice feature vector of that source address. Specifically, for the source address to which each slice feature vector belongs, the network security identification interface is requested to identify whether the source address carries the security label or the abnormal label; the interface returns the corresponding result, and the slice feature vector is then marked with that result.

In an alternative embodiment, to enhance the reliability of the security state labels acquired through the network security identification interface, the security state label of the same source address may be acquired through a plurality of network security identification interfaces, and when all returned security state labels are consistent, the slice feature vector of that source address is marked with that security state label. Slice feature vectors for which no consistent result is obtained can be supplemented by manual labeling.
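The consistency check across several interfaces might look like the sketch below. The interfaces are again modeled as hypothetical callables that return a label string for a source address; returning `None` to signal that manual labeling is needed is a convention chosen for this example.

```python
def consensus_label(source_address, interfaces):
    """Return the label all interfaces agree on, or None if they disagree."""
    labels = {query(source_address) for query in interfaces}
    return next(iter(labels)) if len(labels) == 1 else None


def label_vectors(vectors, interfaces):
    """Split slice feature vectors into auto-labelled samples and a manual queue."""
    labelled, needs_manual = [], []
    for source_address, vector in vectors.items():
        label = consensus_label(source_address, interfaces)
        if label is None:
            needs_manual.append(source_address)
        else:
            labelled.append((vector, label))
    return labelled, needs_manual


# Hypothetical stand-ins for two network security identification interfaces.
def interface_a(source_address):
    return "abnormal" if source_address.startswith("203.") else "security"


def interface_b(source_address):
    return "abnormal" if source_address == "203.0.113.7" else "security"


vectors = {"10.0.0.1": [1, 1], "203.0.113.7": [9, 4], "203.0.113.9": [3, 2]}
labelled, needs_manual = label_vectors(vectors, [interface_a, interface_b])
```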

Step S5500, using the slice feature vector of the source address as a training sample, and mapping the security state tag of the source address and the training sample to store in the training data set.

Through the above process, security state labels are obtained for the large number of slice feature vectors generated from the slice log data of the application log file. The slice feature vectors serve as training samples and the security state labels of the source addresses to which they belong serve as supervision labels; the training samples and supervision labels are assembled into mapping relation data and stored in the training data set of the anomaly identification model. The training data set can then be used to train the anomaly identification model, and its total number of samples can subsequently be expanded through the self-learning mechanism of the model.

According to the above embodiment, the present application frames the application log file generated by an application service of the internet platform to obtain a large amount of slice log data, performs feature extraction on the event log records of each source address in the slice log data to determine the corresponding slice feature vectors, and uses these slice feature vectors as training samples. The security state labels of the training samples are then obtained through the network security identification interface and used as supervision labels. This reduces the cost of manual labeling as far as possible and yields a training data set large enough to train the anomaly identification model, so that the model can be prepared at low cost. In addition, the model acquires strong security identification capability from the rich sample features in the training data set and can serve the internet platform, thereby achieving economies of scale.

Referring to fig. 9, a network request abnormality detection apparatus provided according to an aspect of the present application includes an increment reading module 1100, a feature extraction module 1200, and an abnormality detection module 1300, where the increment reading module 1100 is configured to obtain slice log data corresponding to a preset duration, where the slice log data belongs to increment data in an application log file, and includes event log records corresponding to network requests with multiple source addresses; the feature extraction module 1200 is configured to perform feature extraction on the log records in the slice log data, so as to obtain slice feature vectors corresponding to each source address; the anomaly detection module 1300 is configured to input the feature vectors of the respective slices into an anomaly identification model, and identify the security level of the source address corresponding to each of the feature vectors of the slices.

On the basis of any embodiment of the present application, the feature extraction module 1200 includes: an address diversity unit configured to segment, for the slice log data, an event log record set of each source address in units of source addresses of event log records therein; the feature extraction unit is used for carrying out feature induction on each event log record set based on the same features of the event log records to obtain feature values of each feature in each source address; and a vector construction unit configured to construct a slice feature vector of the respective source address based on the feature values of the respective source addresses.

On the basis of any embodiment of the present application, the same features include any one or more of a source address access feature, a user agent feature, a service host feature, an access address feature, a request method feature, a request jump feature, a request status feature, and a security parameter feature.

On the basis of any embodiment of the present application, a network request abnormality detection device of the present application includes: the result labeling module is used for determining corresponding security state labels according to the security degree of each source address, and optional members of the security state labels comprise security labels and abnormal labels; the label confirming module is used for submitting the source address marked as the abnormal label to the network security identification interface to request confirmation; the security control module is configured to add the source address confirmed to the blacklist and reject subsequent network requests from the source address in the blacklist.

On the basis of any embodiment of the present application, the network request abnormality detection apparatus of the present application further includes: a sample conversion module configured to determine a corresponding security state label according to the security degree of each source address, form a training sample from the slice feature vector of each source address, and store the training sample in association with its security state label in a training data set of the anomaly identification model; a restart training module configured to, in response to a timer arrival event of a scheduled task, restart training of the anomaly identification model with the training data set until the model is retrained to convergence; and a re-online module configured to bring the retrained anomaly identification model online in place of the original anomaly identification model.

On the basis of any embodiment of the present application, the network request abnormality detection apparatus of the present application further includes: a sample calling module configured to retrieve a single training sample and its security state label from a training data set, where the training sample includes the slice feature vector of a single source address obtained by feature extraction from slice log data of the preset duration, and the possible values of the security state label include a security label and an abnormal label; a training inference module configured to input the training sample into the anomaly identification model for training, so as to obtain the security state label predicted by the anomaly identification model; and a gradient update module configured to calculate a classification loss value of the predicted security state label against the security state label pre-marked for the training sample, perform a gradient update on the anomaly identification model according to the classification loss value, and iterate the above process until the anomaly identification model is judged, according to the classification loss value, to have reached a convergence state.
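The iterative training described above can be illustrated with a deliberately small stand-in model. The present application does not state the architecture of the anomaly identification model; the sketch below swaps in plain logistic regression with a binary cross-entropy loss and per-sample gradient updates, and treats a change in mean loss below `tol` as the convergence test. The label encoding (1 for abnormal, 0 for security), the learning rate, and all function names are assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, x):
    """Predicted probability that the slice feature vector x is abnormal."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, labels, lr=0.5, tol=1e-4, max_epochs=1000):
    """Per-sample gradient updates until the mean loss stops changing."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    previous_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for x, y in zip(samples, labels):      # call one training sample at a time
            p = predict(weights, bias, x)      # model prediction for this sample
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
            # Binary cross-entropy classification loss against the pre-marked label.
            total += -(y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe))
            grad = p - y                       # gradient of the loss w.r.t. the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
        mean_loss = total / len(samples)
        if abs(previous_loss - mean_loss) < tol:   # convergence judged from the loss
            break
        previous_loss = mean_loss
    return weights, bias
```

A production model would likely be a neural network trained with a framework optimizer; the loop structure (sample, predict, loss, gradient update, convergence check) is what the sketch is meant to show.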

On the basis of any embodiment of the present application, the network request abnormality detection apparatus of the present application further includes: a log combing module configured to acquire an application log file and arrange the event log records in the application log file in chronological order; a framing processing module configured to perform sliding framing on the application log file, using a sliding window whose length equals the preset duration and which is moved by a preset step length, so as to obtain slice log data corresponding to a plurality of data frames, where adjacent data frames contain some identical event log records because of the frame shift; the feature extraction module 1200, configured to perform feature extraction on the event log records in the slice log data to obtain a slice feature vector corresponding to each source address; a label acquisition module configured to acquire the security state label that the network security identification interface provides for each source address; and a sample storage module configured to take the slice feature vector of each source address as a training sample and store the security state label of the source address in association with the training sample in the training data set.
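The sliding framing performed by the framing processing module can be sketched as follows, assuming each event log record carries a numeric timestamp under a hypothetical key `ts` (for example, seconds). The function name `sliding_frames` and the choice to drop empty frames are also assumptions made for illustration.

```python
def sliding_frames(records, window, step):
    """Split time-ordered records into overlapping frames.

    window: frame length (the preset duration), in the same unit as 'ts'.
    step:   frame shift; when step < window, adjacent frames share records.
    """
    ordered = sorted(records, key=lambda r: r["ts"])  # chronological order
    if not ordered:
        return []
    frames = []
    start = ordered[0]["ts"]
    last = ordered[-1]["ts"]
    while start <= last:
        # Half-open interval [start, start + window).
        frame = [r for r in ordered if start <= r["ts"] < start + window]
        if frame:  # skip windows that contain no records
            frames.append(frame)
        start += step
    return frames
```

With a window of 3 and a step of 2, for instance, a record stamped at time 2 falls into both the frame starting at 0 and the frame starting at 2, which is the overlap between adjacent data frames that the paragraph above refers to.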

Another embodiment of the present application further provides a network request anomaly detection device, the internal structure of which is shown schematically in fig. 10. The network request anomaly detection device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer-readable non-volatile storage medium of the device stores an operating system, a database, and computer-readable instructions, where the database can store information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a network request abnormality detection method.

The processor of the network request anomaly detection device is configured to provide computing and control capabilities to support the operation of the entire network request anomaly detection device. The memory of the network request anomaly detection device may store computer-readable instructions that, when executed by the processor, cause the processor to perform the network request anomaly detection method of the present application. The network interface of the network request anomaly detection device is used for connection and communication with a terminal.

It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of the part of the structure related to the present application and does not limit the network request abnormality detection device to which the present application is applied; a specific network request abnormality detection device may include more or fewer components than those shown in the drawing, may combine some components, or may have a different arrangement of components.

The processor in this embodiment is configured to perform the specific functions of each module in fig. 9, and the memory stores the program codes and the various types of data required for executing those modules or sub-modules. The network interface is used for data transmission with user terminals or servers. The non-volatile readable storage medium in this embodiment stores the program codes and data necessary for executing all modules of the network request abnormality detection apparatus of the present application, and the server can invoke these program codes and data to execute the functions of all the modules.

The present application also provides a non-transitory readable storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the network request anomaly detection method of any embodiment of the present application.

The present application also provides a computer program product comprising computer programs/instructions which when executed by one or more processors implement the steps of the method described in any of the embodiments of the present application.

It will be appreciated by those skilled in the art that all or part of the above-described methods according to the embodiments of the present application may be implemented by a computer program stored in a non-transitory readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

In summary, the present application acquires the contextual network requests of each source address step by step through incrementally read slice log data, obtains for each source address a feature vector representing the relevant context information, and determines the security degree of each source address through the anomaly identification model with the help of the semantics provided by that context information. The identification is therefore more accurate, so that corresponding security control can be applied to the relevant source addresses and the security of the network services provided by the internet platform can be ensured.

Claims

1. A network request anomaly detection method, comprising:

2. The network request anomaly detection method of claim 1, wherein performing feature extraction on event log records in the slice log data to obtain slice feature vectors corresponding to respective source addresses comprises:

3. The network request anomaly detection method of claim 2, wherein the same characteristics include any of a source address access characteristic, a user agent characteristic, a service host characteristic, an access address characteristic, a request method characteristic, a request jump characteristic, a request status characteristic, a security parameter characteristic.

4. A network request abnormality detection method according to any one of claims 1 to 3, characterized in that, after inputting the respective slice feature vectors into an abnormality recognition model and recognizing the security degree of the source address corresponding to each slice feature vector, it includes:

5. A network request abnormality detection method according to any one of claims 1 to 3, characterized in that, after inputting the respective slice feature vectors into an abnormality recognition model and recognizing the security degree of the source address corresponding to each slice feature vector, it includes:

6. A network request abnormality detection method according to any one of claims 1 to 3, characterized by comprising, before acquiring slice log data corresponding to a preset time length:

7. The method of claim 6, wherein before invoking the single training sample in the training dataset and its pre-labeled security tag, comprising:

8. A network request abnormality detection apparatus, comprising:

9. A network request anomaly detection device comprising a central processor and a memory, characterized in that the central processor is adapted to invoke the execution of a computer program stored in the memory to perform the steps of the method according to any of claims 1 to 7.

10. A non-transitory readable storage medium, characterized in that it stores in form of computer readable instructions a computer program implemented according to the method of any one of claims 1 to 7, which when invoked by a computer, performs the steps comprised by the corresponding method.