WO2017124942A1 - Method and apparatus for abnormal access detection - Google Patents
- Publication number
- WO2017124942A1 (PCT/CN2017/070798)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- access request
- abnormal
- sample
- sample access
- access
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/40—Network security protocols
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Debugging And Monitoring (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present application discloses an abnormal access detection method. The value of a label is obtained for each sample access request by extracting features from the time-series data corresponding to that request, and a detection parameter is generated from the label values and attribute data of the sample access requests. After the attribute data of an access request to be detected is acquired, an abnormal probability corresponding to the access request is generated from the attribute data and the detection parameter. The abnormal probability is compared with a preset abnormal threshold, and the comparison result determines whether the access request is an abnormal access request. Abnormal access requests can thus be identified among a huge number of access requests and processed, which helps ensure the stability and security of the network.
Description
The present application relates to the field of Internet technologies, and in particular to an abnormal access detection method. The present application also relates to an abnormal access detection device.
Data mining is the process of extracting latent, implicit, and valuable knowledge, patterns, or rules from large data sets. The patterns mined from large data sets generally fall into five categories: association rules, classification and prediction, clustering, evolution analysis, and outlier detection. Outlier mining consists of two parts: outlier detection and outlier analysis. Outliers are data that are inconsistent with the general behavior or model of the data; they stand apart from the rest of the data set and are not random deviations but arise from entirely different mechanisms. Outlier mining has a wide range of applications: fraud detection, in which outlier detection is used to spot unusual credit card usage or telecommunication service usage; forecasting market trends; analyzing abnormal behavior such as customer churn in market analysis; and discovering unusual responses to various treatments in medical analysis. Studying such data reveals abnormal behaviors and patterns, which is the purpose of outlier mining.
FIG. 1 is a schematic diagram of how existing outlier monitoring techniques, which are now widely used, are applied to the service response problem. In this problem, multiple users submit service requests to a server; some of these requests are normal and some are abnormal. If the server accepts an abnormal request, the server's operation is seriously affected, and other, normal requests are affected as well.
To solve the above problem, in the prior art the system decides whether to respond to a user request based on the request and the user's information records. Machine learning algorithms are introduced into this decision. Commonly used methods include constructing a Mahalanobis distance from user attributes to find users that are outliers, and discriminating outliers by the frequency at which a user submits requests. The specific discrimination processes are as follows:
(1) When discriminating outliers by Mahalanobis distance, the covariance matrix of the user attributes is computed first. It is defined as:
$$\Sigma = E\{(X - E[X])(X - E[X])^{T}\}$$
The Mahalanobis distance is then computed from this covariance matrix. It is defined as:
$$M_a = (X - \mu)^{T}\,\Sigma^{-1}\,(X - \mu)$$
Finally, points are judged by the size of this distance: points whose distance is too large are classified as outliers.
(2) When discriminating outliers by request frequency, a user whose number of requests per unit time exceeds a certain threshold is directly classified as an outlier.
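The distance-based method (1) can be illustrated with a minimal Python sketch. It is restricted to two user attributes so that the inverse covariance matrix has a closed form. The function names, the population (divide-by-n) covariance, and the distance threshold are assumptions made for illustration and are not taken from the application.

```python
from statistics import mean

def mahalanobis_sq_2d(points):
    """Return M_a = (x - mu)^T Sigma^{-1} (x - mu) for each 2-attribute point."""
    n = len(points)
    mu_x = mean(p[0] for p in points)
    mu_y = mean(p[1] for p in points)
    # Entries of the 2x2 covariance matrix Sigma (population form, divide by n).
    sxx = sum((p[0] - mu_x) ** 2 for p in points) / n
    syy = sum((p[1] - mu_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mu_x) * (p[1] - mu_y) for p in points) / n
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mu_x, y - mu_y
        # Closed-form inverse of the 2x2 matrix applied to (dx, dy).
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def flag_outliers(points, distance_threshold):
    """Mark points whose distance exceeds the (hypothetical) threshold."""
    return [d > distance_threshold for d in mahalanobis_sq_2d(points)]

users = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
flags = flag_outliers(users, distance_threshold=3.0)
```

Here only the last user, whose attributes lie far from the others, is flagged. The result depends entirely on the chosen threshold, which is the manual tuning that the application criticizes.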
How to use existing access data and user information to identify abnormal requests more accurately and take corresponding measures therefore bears directly on the stability and economy of service resource allocation, and is a very important issue in service response strategy.
However, in the course of implementing the present application, the inventors found that existing outlier detection algorithms for data with time series either use only the feature data of the accessing users themselves for clustering, which reflects only the users' attribute features, or use only the time-series data of the accesses with a manually set threshold to find outliers (that is, to confirm that the current access is abnormal). Neither approach makes full use of the data, and the results are often not very accurate or effective.
Summary of the invention
The present application provides an abnormal access detection method for improving the efficiency and accuracy of abnormal access detection. The method includes the following steps:
acquiring attribute data of an access request to be detected;
generating an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of the sample access requests;
determining whether the abnormal probability is greater than a preset abnormal threshold;
if so, confirming that the access request is an abnormal access request;
if not, confirming that the access request is a normal access request.
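The steps above can be sketched in Python as follows. The steps themselves do not state how the probability is computed. The sketch assumes the standard logistic model (a sigmoid of the dot product of the detection parameter and the attribute vector), because the description names logistic regression as the training method. The function names and example values are hypothetical.

```python
import math

def abnormal_probability(attributes, w):
    """Assumed logistic model: sigmoid of the dot product w . x."""
    z = sum(wi * xi for wi, xi in zip(w, attributes))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def is_abnormal(attributes, w, abnormal_threshold):
    """Abnormal if and only if the probability is strictly greater than the threshold."""
    return abnormal_probability(attributes, w) > abnormal_threshold

# Hypothetical detection parameter and attribute vector.
w = [1.0, 1.0]
request_attributes = [1.0, 2.0]
decision = is_abnormal(request_attributes, w, abnormal_threshold=0.9)
```

Only the attribute vector of the new request is needed at detection time; the time-series data enters solely through the detection parameter learned beforehand.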
Preferably, before the attribute data of the access request to be detected is acquired, the method further includes:
determining whether each sample access request is abnormal according to the access frequency information of each sample access request;
assigning labels with different values to normal sample access requests and abnormal sample access requests, respectively;
generating an original detection parameter according to the label values and attribute data of the sample access requests;
generating the detection parameter according to the original detection parameter.
Preferably, the access frequency information includes the user identifier and the access time corresponding to the sample access request, and determining whether each sample access request is abnormal according to its access frequency information specifically includes:
acquiring, according to the user identifier, a first quantity of sample access requests submitted by the same user within a time window before the access time, and a second quantity of sample access requests submitted by the same user within the time window after the access time;
determining whether the sum of the first quantity and the second quantity is greater than a preset count threshold;
if so, confirming that the sample access request is an abnormal sample access request;
if not, confirming that the sample access request is a normal sample access request.
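The window-based labeling above can be sketched in Python as follows. Several details are assumptions, since the text does not fix them: the label values 1 (abnormal) and 0 (normal), the half-open window boundaries, and the choice not to count the request itself or other requests with the identical timestamp. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def label_samples(samples, window, count_threshold):
    """samples: list of (user_id, access_time). Returns 1 (abnormal) or 0 (normal) per sample."""
    times_by_user = defaultdict(list)
    for user_id, t in samples:
        times_by_user[user_id].append(t)
    for times in times_by_user.values():
        times.sort()

    labels = []
    for user_id, t in samples:
        times = times_by_user[user_id]
        # First quantity: same-user requests in [t - window, t).
        before = bisect_left(times, t) - bisect_left(times, t - window)
        # Second quantity: same-user requests in (t, t + window].
        after = bisect_right(times, t + window) - bisect_right(times, t)
        labels.append(1 if before + after > count_threshold else 0)
    return labels

samples = [("u1", 0), ("u1", 1), ("u1", 2), ("u1", 3), ("u2", 0), ("u2", 100)]
labels = label_samples(samples, window=5, count_threshold=2)
```

In this example every request from `u1` has three neighbours from the same user inside the window and is labeled abnormal, while the two widely spaced requests from `u2` are labeled normal.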
Preferably, the original detection parameter is generated according to the following formula:
where $\arg\min_w$ is the value function of the original detection parameter, $w$ is the original detection parameter, taken as the value at which the summation term is minimized, $N$ is the number of sample access requests, and $V_i$ is the label value of each sample access request.
Preferably, the abnormal threshold is generated as follows:
acquiring the percentage of abnormal sample access requests among all sample access requests;
acquiring the abnormal probability corresponding to each sample access request according to the detection parameter;
sorting the abnormal probabilities corresponding to the sample access requests from smallest to largest;
determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
Correspondingly, the present application further provides an abnormal access detection device, characterized by including:
an acquisition module, which acquires attribute data of an access request to be detected;
a first generation module, which generates an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of the sample access requests;
a judging module, which determines whether the abnormal probability is greater than a preset abnormal threshold;
if so, the judging module confirms that the access request is an abnormal access request;
if not, the judging module confirms that the access request is a normal access request.
Preferably, the device further includes:
a determining module, which determines whether each sample access request is abnormal according to the access frequency information of each sample access request;
an allocation module, which assigns labels with different values to normal sample access requests and abnormal sample access requests, respectively;
a second generation module, which generates an original detection parameter according to the label values and attribute data of the sample access requests;
a third generation module, which generates the detection parameter according to the original detection parameter.
Preferably, the access frequency information includes the user identifier (ID) and the access time corresponding to the sample access request, and the determining module is specifically configured to:
acquire, according to the user ID, a first quantity of sample access requests submitted by the same user within a time window before the access time, and a second quantity of sample access requests submitted by the same user within the time window after the access time;
determine whether the sum of the first quantity and the second quantity is greater than a preset count threshold;
if so, confirm that the sample access request is an abnormal sample access request;
if not, confirm that the sample access request is a normal sample access request.
Preferably, the original detection parameter is generated according to the following formula:
where $\arg\min_w$ is the value function of the original detection parameter, $w$ is the original detection parameter, taken as the value at which the summation term is minimized, $N$ is the number of sample access requests, and $V_i$ is the label value of each sample access request.
Preferably, the abnormal threshold is generated as follows:
acquiring the percentage of abnormal sample access requests among all sample access requests;
acquiring the abnormal probability corresponding to each sample access request according to the detection parameter;
sorting the abnormal probabilities corresponding to the sample access requests from smallest to largest;
determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
It can be seen that, by applying the technical solution of the present application, after the attribute data of the access request to be detected is acquired, an abnormal probability corresponding to the access request is generated according to the attribute data and the detection parameter. Because the detection parameter is generated according to the label values and attribute data of the sample access requests, comparing the abnormal probability with the preset abnormal threshold is enough to confirm whether the access request is an abnormal access request. Abnormal access requests can therefore be accurately identified and processed among massive numbers of access requests, which helps ensure the stability and security of the network.
FIG. 1 is a schematic diagram of the application of anomaly detection to service response in the prior art;
FIG. 2 is a schematic flowchart of an abnormal access detection method proposed by the present application;
FIG. 3 is a flowchart of outlier detection based on time-series feature extraction in a specific embodiment of the present application;
FIG. 4 is a schematic diagram of feature extraction from time-series data in a specific embodiment of the present application;
FIG. 5 is a schematic diagram of the threshold calculation process in a specific embodiment of the present application;
FIG. 6 is a schematic structural diagram of an abnormal access detection device proposed by the present application.
As described in the background, further improving the accuracy and effectiveness of outlier detection for request data that carries time series is a key issue for the accurate and efficient operation of the system, and is the technical problem to be solved by the present application.
To solve this technical problem, the present application proposes an outlier detection method that combines user statistical data with time-series access data: a preliminary label is assigned by rule from the time-series data, and logistic regression is trained on the preliminary labels and user attributes to produce the final result, which further improves the outlier determination.
FIG. 2 is a schematic flowchart of the outlier detection method proposed by the present application, which includes the following steps:
S201: Acquire attribute data of the access request to be detected.
In the embodiments of the present application, once the model and the detection parameter have been generated, the prediction for each new access request, that is, the determination of whether the access request is abnormal, depends only on the attributes of that access request. The anomaly detection problem is thereby converted into a classification problem, for which it suffices to acquire the attribute data of the access request to be detected and form the full attribute vector. In other words, the time-series data of the new access request does not need to be acquired in this step.
Therefore, before predicting whether a new access request is abnormal, the embodiments of the present application first perform logistic regression training on the preliminary labels and user attributes corresponding to the sample access requests, to obtain the classification model and the detection parameter. This achieves the goal of combining user data with time-series access data. The logistic regression training and the acquisition of the detection parameter proceed as follows:
a) determining whether each sample access request is abnormal according to the access frequency information of each sample access request;
b) assigning labels with different values to normal sample access requests and abnormal sample access requests, respectively;
c) generating an original detection parameter according to the label values and attribute data of the sample access requests;
d) generating the detection parameter according to the original detection parameter.
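Step c) can be sketched in Python as follows. The sketch assumes the standard logistic regression objective (mean cross-entropy over the N labeled samples) and minimizes it with plain batch gradient descent. The function names, learning rate, epoch count, labels 1/0, and the constant first attribute used as a bias term are all assumptions. How step d) derives the final detection parameter from the original one is not covered here.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_original_parameter(X, V, learning_rate=0.1, epochs=2000):
    """Approximate argmin_w of the mean logistic loss by batch gradient descent.

    X: list of attribute vectors; V: list of labels (1 abnormal, 0 normal).
    """
    n, dim = len(X), len(X[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, v in zip(X, V):
            # Gradient of the logistic loss for one sample: (p - v) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - v
            for j in range(dim):
                grad[j] += err * x[j] / n
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]
    return w

# Hypothetical samples: constant bias attribute plus one user attribute.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
V = [0, 0, 1, 1]  # preliminary labels from the time-series rule
w = train_original_parameter(X, V)
```

The labels produced by the frequency rule serve only as training targets; the learned `w` then scores new requests from their attributes alone.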
In addition, as the above steps show, accurately determining whether a sample access request is abnormal is an important factor in the accuracy of the classification model and the detection parameter. The embodiments of the present application therefore propose the following specific steps for determining whether each sample access request is abnormal:
a) acquiring, according to the user identifier, a first quantity of sample access requests submitted by the same user within a time window before the access time, and a second quantity of sample access requests submitted by the same user within the time window after the access time;
b) determining whether the sum of the first quantity and the second quantity is greater than a preset count threshold;
c) if so, confirming that the sample access request is an abnormal sample access request;
d) if not, confirming that the sample access request is a normal sample access request.
In the embodiments of the present application, the access frequency information includes the user identifier and the access time corresponding to the sample access request. The user identifier serves as the credential that distinguishes users; it can take many forms, provided that different users have different identifiers. For example, the user identifier may be the MAC address of the user's terminal or the user's registration ID at the service terminal. The access time is the point in time, recorded by the server, at which the access request was made.
It should be noted that the specific examples of user identifiers above are only examples given by preferred embodiments of the present application. Other types of user identifiers may also be chosen so that the present application applies to more fields; all such variations fall within the scope of protection of the present invention.
It should also be noted that the above method for determining whether a sample access request is abnormal is only one preferred solution proposed by the embodiments of the present application. Provided that a certain accuracy is ensured, those skilled in the art may use other approaches, all of which fall within the scope of protection of the present application.
S202: Generate an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of the sample access requests.
In an embodiment of the present application, the abnormal threshold should be tuned on the basis of long-term experience until it reaches a suitable range. If the threshold is too large, some abnormal points are judged to be normal accesses, so many abnormal points may be missed; conversely, if the threshold is too small, some normal points are judged to be abnormal, which affects normal users. Obtaining a suitable abnormal threshold through tuning is therefore essential to the accuracy of abnormal point detection, and the present application generates the abnormal threshold as follows:
a) obtaining the percentage of abnormal sample access requests among all sample access requests;
b) obtaining, according to the detection parameter, the abnormal probability corresponding to each sample access request;
c) sorting the abnormal probabilities of the sample access requests in ascending order;
d) determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
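The percentage-based threshold selection in steps a) to d) might look like the sketch below. The function name `abnormal_threshold` and the 1/0 label encoding are hypothetical. Step d) is read here as "choose the threshold so that roughly the abnormal share of training samples scores above it"; that reading is an assumption, since the text does not spell out which end of the sorted list the percentage is counted from.

```python
def abnormal_threshold(probabilities, labels):
    """Pick an abnormal threshold from training-set probabilities.

    probabilities: abnormal probability computed by the model for each sample.
    labels: 1 for abnormal samples, 0 for normal samples (assumed encoding).
    """
    if not probabilities or len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must be non-empty and of equal length")

    n = len(probabilities)
    # a) Share of abnormal samples, kept as a count: share * n == n_abnormal.
    n_abnormal = sum(labels)
    # c) Sort the probabilities in ascending order.
    ordered = sorted(probabilities)
    # d) Assumed reading: take the value below the top `n_abnormal` entries,
    #    so that (ties aside) exactly that many samples lie strictly above it.
    index = max(n - n_abnormal - 1, 0)
    return ordered[index]
```

With this choice, a detector that flags probabilities strictly greater than the threshold flags about the same fraction of the training set as the time-window labeling did.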
In a specific embodiment of the present application, a reference formula for generating the original detection parameter is as follows:
where $\arg\min_w$ denotes taking the value of $w$ that minimizes the summation term, $w$ is the original detection parameter, $N$ is the number of sample access requests, and $V_i$ is the label value of the $i$-th sample access request.
With the above reference formula, the computed parameter $w$ is the original detection parameter. In subsequent processing, $w$ can be applied to every new access request, and comparing the result with the abnormal threshold predicts whether the new access request is abnormal.
It should be noted that the above formula is only a preferred solution proposed in a specific embodiment of the present application. Provided that the result can still serve as the original detection parameter, those skilled in the art may modify or transform the formula, and all such variants fall within the protection scope of the present application.
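The reference formula itself is not reproduced in this text. For orientation only, a standard logistic-regression objective that uses the symbols defined here is shown below. It assumes labels $V_i \in \{-1, +1\}$ and writes $x_i$ for the attribute vector of the $i$-th sample access request; both are assumptions, and the expression in the application may differ.

```latex
w = \arg\min_{w} \sum_{i=1}^{N} \log\left(1 + e^{-V_i \, w^{T} x_i}\right)
```

Each term is small when the sign of $w^{T} x_i$ agrees with the label $V_i$ and grows when it disagrees, so minimizing the sum pushes $w$ toward separating normal from abnormal samples.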
S203: Determine whether the abnormal probability is greater than a preset abnormal threshold.
In an embodiment of the present application, when a new access request arrives, a classification model predicts whether it is an abnormal access request. Specifically, substituting the attribute data of the new access request into the classification model yields the probability that the access is an abnormal access request, namely the abnormal probability. This abnormal probability is then compared with the preset abnormal threshold. If the abnormal probability of the new access request is greater than the abnormal threshold, the request is judged to be an abnormal access request and S204 is executed; if it is less than the abnormal threshold, the request is judged to be a normal access request and S205 is executed.
S204: If so, determine that the access request is an abnormal access request.
S205: If not, determine that the access request is a normal access request.
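Steps S202 to S205 can be illustrated with the short sketch below. It assumes the classification model is a logistic model, so the abnormal probability is the logistic function applied to $w^{T}x$; the names `sigmoid`, `abnormal_probability` and `is_abnormal` are hypothetical. A probability exactly equal to the threshold is treated as normal, following the "if not" branch of S205.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def abnormal_probability(weights, attributes):
    """S202: abnormal probability from the detection parameter and attribute data."""
    score = sum(w * x for w, x in zip(weights, attributes))
    return sigmoid(score)


def is_abnormal(weights, attributes, threshold):
    """S203 to S205: True for an abnormal access request, False for a normal one."""
    return abnormal_probability(weights, attributes) > threshold
```

For example, `is_abnormal([2.0, -1.0], [1.0, 0.5], 0.7)` evaluates the logistic function at 1.5, which is above 0.7, so the request would be flagged.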
As can be seen, by applying the above technical solution, after the attribute data of the access request to be detected is obtained, an abnormal probability corresponding to the access request is generated according to the attribute data and the detection parameter. Because the detection parameter is generated from the label values and attribute data of the sample access requests, comparing the abnormal probability with the preset abnormal threshold determines whether the access request is abnormal. Abnormal access requests can thus be identified accurately among massive numbers of access requests, which safeguards the stability and security of the network.
To further explain the technical idea of the present application, the technical solution is now described with reference to the specific application scenario shown in FIG. 2. This abnormal point detection flow, based on time-series feature extraction, detects abnormal points in three steps: time-series analysis, linear classifier training, and prediction. The three steps are described below:
(1) Generating labels from the time series
Given the characteristics of time series, all user access data in the training set are first sorted in chronological order. After sorting, a sliding window is set and moved backward to traverse every access in order, comparing the user ID of each access. For each access, if the number of accesses submitted by the same user in the half window before it and the half window after it is greater than a certain threshold, the access is marked as an abnormal point. The set of labels of the abnormal points can then be written as:
where $V_i$ denotes the label of the $i$-th access, $w$ is the window size parameter, and $t_h$ is the threshold parameter; a schematic diagram is shown in FIG. 3.
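The expression for the label set is not reproduced in this text. One way to write the rule described above, under assumed notation, is given below. The symbols $u_i$ and $t_i$ (the user ID and access time of the $i$-th access) and the label values 1 and 0 are assumptions introduced here, and $w/2$ reflects the half window on each side of the access.

```latex
V_i =
\begin{cases}
1, & \bigl|\{\, j \neq i : u_j = u_i,\ |t_j - t_i| \le w/2 \,\}\bigr| > t_h \\
0, & \text{otherwise}
\end{cases}
```

Note that $w$ in this step is the window size, not the classifier parameter $w$ obtained in step (2).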
(2) Linear classifier training
After all access labels have been generated, whether each access is abnormal is taken to be determined entirely by the attributes of that access. The problem thus becomes a classification problem, for which the time-series data is no longer needed. Logistic regression is trained on the other attribute features and the label of each access to obtain a classification model. The result of the model is the parameter $w$, which satisfies:
where $\arg\min_w$ selects the value of the parameter $w$ that minimizes the summation on the right-hand side, $N$ is the total number of training samples, $V_i$ is the abnormal point label from the previous step, and $w^{T}$ is the transpose of $w$. In practice, the L-BFGS algorithm is used to accelerate the logistic regression training.
(3) Predicting new accesses
When a new access arrives, the classification model can predict whether it is an abnormal point. Substituting the new access data into the classification model yields the probability that the access is an abnormal point. A threshold is set, and when the probability that the access is abnormal is greater than the threshold, the access is judged to be an abnormal point. The set of all abnormal new accesses is expressed as:
$$\{V_i \mid w^{T} x_i > p_t\}$$
where $V_i$ denotes the $i$-th access, $x_i$ is the vector of all attributes of that access, and $p_t$ is the threshold for judging abnormal points. The threshold should be tuned on the basis of long-term experience until it reaches a suitable value. If it is too large, many abnormal points are missed and judged as normal accesses; if it is too small, many normal points are judged as abnormal, which affects normal users. Tuning a suitable threshold is therefore essential. Here it can be set by percentage: first find the percentage of abnormal points in the overall training data, then feed the training data into the model to compute a probability for each sample, sort these probabilities, and take the probability at the position corresponding to the abnormal-point percentage as the threshold. A schematic diagram is shown in FIG. 5.
In the technical solution of the above application scenario, the time-series features of the sample data provide training labels for the classification model, and the detection parameter is then generated from the label values and attribute data of the sample access requests. After the attribute data of the access request to be detected is obtained, an abnormal probability corresponding to the access request is generated according to the attribute data and the detection parameter, and comparing this probability with the preset abnormal threshold determines whether the access request is abnormal. Abnormal access requests can thus be identified accurately among massive numbers of access requests, which safeguards the stability and security of the network.
To achieve the above technical purpose, the present application further provides an abnormal access detection device which, as shown in FIG. 6, includes the following modules:
an obtaining module 610, which obtains attribute data of an access request to be detected;
a first generation module 620, which generates an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of the sample access requests;
a judgment module 630, which determines whether the abnormal probability is greater than a preset abnormal threshold;
if so, the judgment module 630 determines that the access request is an abnormal access request;
if not, the judgment module 630 determines that the access request is a normal access request.
In a specific application scenario, the device further includes:
a determining module, which determines, according to access frequency information of each sample access request, whether each sample access request is abnormal;
an allocation module, which assigns labels with different values to normal sample access requests and abnormal sample access requests respectively;
a second generation module, which generates an original detection parameter according to the label values and attribute data of the sample access requests;
a third generation module, which generates the detection parameter according to the original detection parameter.
In a specific application scenario, the access frequency information includes the user ID and the access time corresponding to the sample access request, and the determining module is specifically configured to:
obtain, according to the user ID, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time;
determine whether the sum of the first number and the second number is greater than a preset count threshold;
if so, determine that the sample access request is an abnormal sample access request;
if not, determine that the sample access request is a normal sample access request.
In a specific application scenario, the original detection parameter is generated according to the following formula:
where $\arg\min_w$ denotes taking the value of $w$ that minimizes the summation term, $w$ is the original detection parameter, $N$ is the number of sample access requests, and $V_i$ is the label value of the $i$-th sample access request.
In a specific application scenario, the abnormal threshold is generated as follows:
obtaining the percentage of abnormal sample access requests among all sample access requests;
obtaining, according to the detection parameter, the abnormal probability corresponding to each sample access request;
sorting the abnormal probabilities of the sample access requests in ascending order;
determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
By applying the technical solution of the present application, after the attribute data of the access request to be detected is obtained, an abnormal probability corresponding to the access request is generated according to the attribute data and the detection parameter. Because the detection parameter is generated from the label values and attribute data of the sample access requests, comparing the abnormal probability with the preset abnormal threshold determines whether the access request is abnormal. Abnormal access requests can thus be identified accurately among massive numbers of access requests, which safeguards the stability and security of the network.
From the description of the above embodiments, those skilled in the art will clearly understand that the present application may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present application may be embodied as a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the implementation scenarios of the present application.
Those skilled in the art will understand that the drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required for implementing the present application.
Those skilled in the art will understand that the modules of the apparatus in an implementation scenario may be distributed within that apparatus as described, or may be relocated, with corresponding changes, to one or more apparatuses different from the one in this implementation scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.
The serial numbers used above in the present application are for description only and do not indicate the relative merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of the present application. The present application is not limited to them, however, and any variation that those skilled in the art can conceive of shall fall within the protection scope of the present application.
Claims (10)
- An abnormal access detection method, characterized by comprising: obtaining attribute data of an access request to be detected; generating an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of sample access requests; determining whether the abnormal probability is greater than a preset abnormal threshold; if so, determining that the access request is an abnormal access request; if not, determining that the access request is a normal access request.
- The method according to claim 1, characterized in that, before obtaining the attribute data of the access request to be detected, the method further comprises: determining, according to access frequency information of each sample access request, whether each sample access request is abnormal; assigning labels with different values to normal sample access requests and abnormal sample access requests respectively; generating an original detection parameter according to the label values and attribute data of the sample access requests; generating the detection parameter according to the original detection parameter.
- The method according to claim 2, characterized in that the access frequency information includes the user identifier and the access time corresponding to the sample access request, and determining, according to the access frequency information of each sample access request, whether each sample access request is abnormal specifically comprises: obtaining, according to the user identifier, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time; determining whether the sum of the first number and the second number is greater than a preset count threshold; if so, determining that the sample access request is an abnormal sample access request; if not, determining that the sample access request is a normal sample access request.
- The method according to claim 2, characterized in that the original detection parameter is generated according to the following formula: where $\arg\min_w$ denotes taking the value of $w$ that minimizes the summation term, $w$ is the original detection parameter, $N$ is the number of sample access requests, and $V_i$ is the label value of the $i$-th sample access request.
- The method according to any one of claims 1 to 4, characterized in that the abnormal threshold is generated as follows: obtaining the percentage of abnormal sample access requests among all sample access requests; obtaining, according to the detection parameter, the abnormal probability corresponding to each sample access request; sorting the abnormal probabilities of the sample access requests in ascending order; determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
- An abnormal access detection device, characterized by comprising: an obtaining module, which obtains attribute data of an access request to be detected; a first generation module, which generates an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of sample access requests; a judgment module, which determines whether the abnormal probability is greater than a preset abnormal threshold; if so, the judgment module determines that the access request is an abnormal access request; if not, the judgment module determines that the access request is a normal access request.
- The device according to claim 6, characterized by further comprising: a determining module, which determines, according to access frequency information of each sample access request, whether each sample access request is abnormal; an allocation module, which assigns labels with different values to normal sample access requests and abnormal sample access requests respectively; a second generation module, which generates an original detection parameter according to the label values and attribute data of the sample access requests; a third generation module, which generates the detection parameter according to the original detection parameter.
- The device according to claim 7, characterized in that the access frequency information includes the user ID and the access time corresponding to the sample access request, and the determining module is specifically configured to: obtain, according to the user ID, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time; determine whether the sum of the first number and the second number is greater than a preset count threshold; if so, determine that the sample access request is an abnormal sample access request; if not, determine that the sample access request is a normal sample access request.
- The device according to claim 7, characterized in that the original detection parameter is generated according to the following formula: where $\arg\min_w$ denotes taking the value of $w$ that minimizes the summation term, $w$ is the original detection parameter, $N$ is the number of sample access requests, and $V_i$ is the label value of the $i$-th sample access request.
- The device according to any one of claims 6 to 10, characterized in that the abnormal threshold is generated as follows: obtaining the percentage of abnormal sample access requests among all sample access requests; obtaining, according to the detection parameter, the abnormal probability corresponding to each sample access request; sorting the abnormal probabilities of the sample access requests in ascending order; determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610035487.6A CN106982196B (en) | 2016-01-19 | 2016-01-19 | Abnormal access detection method and equipment |
CN201610035487.6 | 2016-01-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017124942A1 true WO2017124942A1 (en) | 2017-07-27 |
Family
ID=59341062
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/070798 WO2017124942A1 (en) | 2016-01-19 | 2017-01-10 | Method and apparatus for abnormal access detection |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN106982196B (en) |
TW (1) | TW201730766A (en) |
WO (1) | WO2017124942A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020007367A1 (en) * | 2018-07-06 | 2020-01-09 | 北京白山耘科技有限公司 | Method for inspecting abnormal web access, device, medium, and equipment |
CN111476610A (en) * | 2020-04-16 | 2020-07-31 | 腾讯科技(深圳)有限公司 | Information detection method and device and computer readable storage medium |
CN112001596A (en) * | 2020-07-27 | 2020-11-27 | 北京科技大学 | Method and system for detecting abnormal point of time series data |
WO2021017284A1 (en) * | 2019-07-30 | 2021-02-04 | 平安科技(深圳)有限公司 | Cortex-learning-based anomaly detection method and apparatus, terminal device, and storage medium |
CN112511538A (en) * | 2020-11-30 | 2021-03-16 | 杭州安恒信息技术股份有限公司 | Network security detection method based on time sequence and related components |
CN114500004A (en) * | 2022-01-05 | 2022-05-13 | 北京理工大学 | Anomaly detection method based on conditional diffusion probability generation model |
CN116016274A (en) * | 2022-12-29 | 2023-04-25 | 南京融军建科技有限公司 | Abnormal communication detection method and system |
CN117424764A (en) * | 2023-12-19 | 2024-01-19 | 中关村科学城城市大脑股份有限公司 | System resource access request information processing method and device, electronic equipment and medium |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107659566B (en) * | 2017-09-20 | 2021-01-19 | 深圳市创梦天地科技股份有限公司 | Method and device for determining identification frequency of abnormal access of server and server |
JP6698956B2 (en) * | 2017-10-11 | 2020-05-27 | 三菱電機株式会社 | Sample data generation device, sample data generation method, and sample data generation program |
CN107678928B (en) * | 2017-10-31 | 2021-06-01 | 聚好看科技股份有限公司 | Application program processing method and server |
CN107819631B (en) * | 2017-11-23 | 2021-03-02 | 东软集团股份有限公司 | Equipment anomaly detection method, device and equipment |
CN108200008A (en) * | 2017-12-05 | 2018-06-22 | 阿里巴巴集团控股有限公司 | The recognition methods and device that abnormal data accesses |
CN108268632A (en) * | 2018-01-16 | 2018-07-10 | 中国人民解放军海军航空大学 | Abnormal information data identifies machine learning method |
CN108681542A (en) * | 2018-02-12 | 2018-10-19 | 阿里巴巴集团控股有限公司 | A kind of method and device of abnormality detection |
CN108449342B (en) * | 2018-03-20 | 2020-11-27 | 北京云站科技有限公司 | Malicious request detection method and device |
CN109145030B (en) * | 2018-06-26 | 2022-07-22 | 创新先进技术有限公司 | Abnormal data access detection method and device |
CN108667855B (en) * | 2018-07-19 | 2021-12-03 | 百度在线网络技术(北京)有限公司 | Network flow abnormity monitoring method and device, electronic equipment and storage medium |
CN109194539B (en) * | 2018-08-13 | 2022-01-28 | 中国平安人寿保险股份有限公司 | Data management and control method and device, computer equipment and storage medium |
CN109543404B (en) * | 2018-12-03 | 2019-10-25 | 北京芯盾时代科技有限公司 | A kind of methods of risk assessment and device of access behavior |
CN109766244A (en) * | 2019-01-04 | 2019-05-17 | 中国银行股份有限公司 | A kind of distributed system CPU method for detecting abnormality, device and storage medium |
CN109873812B (en) * | 2019-01-28 | 2020-06-23 | 腾讯科技(深圳)有限公司 | Anomaly detection method and device and computer equipment |
CN111835696B (en) * | 2019-04-23 | 2023-05-09 | 阿里巴巴集团控股有限公司 | Method and device for detecting abnormal request individuals |
CN112148763A (en) * | 2019-06-28 | 2020-12-29 | 京东数字科技控股有限公司 | Unsupervised data anomaly detection method and device and storage medium |
CN110417744B (en) * | 2019-06-28 | 2021-12-24 | 平安科技(深圳)有限公司 | Security determination method and device for network access |
CN110351299B (en) * | 2019-07-25 | 2022-04-22 | 新华三信息安全技术有限公司 | Network connection detection method and device |
CN110675228B (en) * | 2019-09-27 | 2021-05-28 | 支付宝(杭州)信息技术有限公司 | User ticket buying behavior detection method and device |
CN111177513B (en) * | 2019-12-31 | 2023-10-31 | 北京百度网讯科技有限公司 | Determination method and device of abnormal access address, electronic equipment and storage medium |
CN113076349B (en) * | 2020-01-06 | 2024-06-11 | 阿里巴巴集团控股有限公司 | Data anomaly detection method, device and system and electronic equipment |
CN115277439B (en) * | 2021-04-30 | 2023-09-19 | 中国移动通信集团有限公司 | Network service detection method and device, electronic equipment and storage medium |
CN113282433B (en) * | 2021-06-10 | 2023-04-28 | 天翼云科技有限公司 | Cluster anomaly detection method, device and related equipment |
CN113360348B (en) * | 2021-06-30 | 2022-09-09 | 北京字节跳动网络技术有限公司 | Abnormal request processing method and device, electronic equipment and storage medium |
TWI789075B (en) * | 2021-10-26 | 2023-01-01 | 中華電信股份有限公司 | Electronic device and method for detecting abnormal execution of application program |
CN117579400B (en) * | 2024-01-17 | 2024-03-29 | 国网四川省电力公司电力科学研究院 | Industrial control system network safety monitoring method and system based on neural network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009211725A (en) * | 2009-06-18 | 2009-09-17 | Toshiba Corp | Abnormal data detecting system, abnormal data detecting method, abnormal data detecting program |
CN105187242A (en) * | 2015-08-20 | 2015-12-23 | 中国人民解放军国防科学技术大学 | Method for detecting abnormal user behaviours mined on the basis of variable-length sequence mode |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8683591B2 (en) * | 2010-11-18 | 2014-03-25 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
CN103198711B (en) * | 2013-03-21 | 2014-12-17 | 东南大学 | Vehicle regulating and controlling method of lowering probability of traffic accidents of different severity |
- 2016-01-19: CN application CN201610035487.6A (CN106982196B), status: Active
- 2017-01-10: WO application PCT/CN2017/070798 (WO2017124942A1), status: Application Filing
- 2017-01-17: TW application TW106101584A (TW201730766A), status: unknown
Non-Patent Citations (1)
Title |
---|
DING, JIE ET AL.: "An Anomaly Detection System on Big Data", NATURAL SCIENCE JOURNAL OF HAINAN UNIVERSITY, vol. 33, no. 1, 31 March 2015 (2015-03-31), pages 24 - 27 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020007367A1 (en) * | 2018-07-06 | 2020-01-09 | 北京白山耘科技有限公司 | Method for inspecting abnormal web access, device, medium, and equipment |
WO2021017284A1 (en) * | 2019-07-30 | 2021-02-04 | 平安科技(深圳)有限公司 | Cortex-learning-based anomaly detection method and apparatus, terminal device, and storage medium |
CN111476610B (en) * | 2020-04-16 | 2023-06-09 | 腾讯科技(深圳)有限公司 | Information detection method, device and computer readable storage medium |
CN111476610A (en) * | 2020-04-16 | 2020-07-31 | 腾讯科技(深圳)有限公司 | Information detection method and device and computer readable storage medium |
CN112001596B (en) * | 2020-07-27 | 2023-10-31 | 北京科技大学 | Method and system for detecting abnormal points of time sequence data |
CN112001596A (en) * | 2020-07-27 | 2020-11-27 | 北京科技大学 | Method and system for detecting abnormal point of time series data |
CN112511538B (en) * | 2020-11-30 | 2022-10-18 | 杭州安恒信息技术股份有限公司 | Network security detection method based on time sequence and related components |
CN112511538A (en) * | 2020-11-30 | 2021-03-16 | 杭州安恒信息技术股份有限公司 | Network security detection method based on time sequence and related components |
CN114500004A (en) * | 2022-01-05 | 2022-05-13 | 北京理工大学 | Anomaly detection method based on conditional diffusion probability generation model |
CN116016274A (en) * | 2022-12-29 | 2023-04-25 | 南京融军建科技有限公司 | Abnormal communication detection method and system |
CN116016274B (en) * | 2022-12-29 | 2023-11-24 | 天航长鹰(江苏)科技有限公司 | Abnormal communication detection method and system |
CN117424764A (en) * | 2023-12-19 | 2024-01-19 | 中关村科学城城市大脑股份有限公司 | System resource access request information processing method and device, electronic equipment and medium |
CN117424764B (en) * | 2023-12-19 | 2024-02-23 | 中关村科学城城市大脑股份有限公司 | System resource access request information processing method and device, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN106982196B (en) | 2020-07-31 |
TW201730766A (en) | 2017-09-01 |
CN106982196A (en) | 2017-07-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017124942A1 (en) | | Method and apparatus for abnormal access detection |
US10686829B2 (en) | | Identifying changes in use of user credentials |
Sukumar et al. | | Network intrusion detection using improved genetic k-means algorithm |
US10140576B2 (en) | | Computer-implemented system and method for detecting anomalies using sample-based rule identification |
Shafeeq et al. | | Dynamic clustering of data with modified k-means algorithm |
US20170063893A1 (en) | | Learning detector of malicious network traffic from weak labels |
US10685008B1 (en) | | Feature embeddings with relative locality for fast profiling of users on streaming data |
Cao et al. | | Machine learning to detect anomalies in web log analysis |
CN111709028B (en) | | Network security state evaluation and attack prediction method |
US11516240B2 (en) | | Detection of anomalies associated with fraudulent access to a service platform |
US20210385253A1 (en) | | Cluster detection and elimination in security environments |
Grill et al. | | Learning combination of anomaly detectors for security domain |
US20180032917A1 (en) | | Hierarchical classifiers |
CN110855648B (en) | | Early warning control method and device for network attack |
CN112926045B (en) | | Group control equipment identification method based on logistic regression model |
Powell et al. | | A cross-comparison of feature selection algorithms on multiple cyber security data-sets. |
CN103530312A (en) | | User identification method and system using multifaceted footprints |
Jordaney et al. | | Misleading metrics: On evaluating machine learning for malware with confidence |
US10372702B2 (en) | | Methods and apparatus for detecting anomalies in electronic data |
CN110097120B (en) | | Network flow data classification method, equipment and computer storage medium |
US20220327394A1 (en) | | Learning support apparatus, learning support methods, and computer-readable recording medium |
CN110414229B (en) | | Operation command detection method, device, computer equipment and storage medium |
Iskhakov et al. | | Enhanced user authentication algorithm based on behavioral analytics in Web-based cyberphysical systems |
Gana et al. | | Machine learning classification algorithms for phishing detection: A comparative appraisal and analysis |
CN111224919A (en) | | DDOS (distributed denial of service) identification method and device, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17740973; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17740973; Country of ref document: EP; Kind code of ref document: A1 |