WO2017124942A1 - Method and apparatus for abnormal access detection - Google Patents


Info

Publication number
WO2017124942A1
Authority
WO
WIPO (PCT)
Prior art keywords
access request
abnormal
sample
sample access
access
Application number
PCT/CN2017/070798
Other languages
French (fr)
Chinese (zh)
Inventor
付子豪
张凯
蔡宁
杨旭
褚崴
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
付子豪
张凯
蔡宁
杨旭
褚崴
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited), 付子豪, 张凯, 蔡宁, 杨旭, 褚崴
Publication of WO2017124942A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection


Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application discloses an abnormal access detection method. A label value is obtained for each sample access request on the basis of features extracted from its time series data, and a detection parameter is then generated from the label values and attribute data of the sample access requests. After the attribute data of an access request to be detected is obtained, an abnormal probability corresponding to that access request is generated from the attribute data and the detection parameter. The method then determines whether the abnormal probability is greater than a preset abnormal threshold, and the comparison result decides whether the access request is an abnormal access request. Abnormal access requests can thus be identified and handled among a huge number of access requests, ensuring the stability and security of the network.

Description

Abnormal access detection method and device

Technical field
The present application relates to the field of Internet technologies, and in particular to an abnormal access detection method. The application also relates to an abnormal access detection device.
Background art
Data mining is the process of extracting potential, implicit, and valuable knowledge, patterns, or rules from large data sets. Patterns mined from large-scale data sets generally fall into five categories: association rules, classification and prediction, clustering, evolution analysis, and outlier detection. Outlier mining comprises two parts: outlier detection and outlier analysis. Outliers are data that are inconsistent with the general behavior or model of the data. They stand out from the rest of the data set, and they are not random deviations but are produced by entirely different mechanisms. Outlier mining has a wide range of applications. Examples include fraud detection (using outlier detection to find unusual credit card usage or telecommunication service usage), forecasting market trends, analyzing abnormal behaviors such as customer churn in market analysis, and discovering unusual responses to various treatments in medical analysis. Studying such data reveals abnormal behaviors and patterns, which is the purpose of outlier mining.
FIG. 1 is a schematic diagram of existing outlier monitoring techniques applied to a service response problem; such techniques are currently widely used. In this problem, multiple users submit service requests to a server. Some of these requests are normal and some are abnormal. If the server accepts an abnormal request, its operation will be seriously affected, and other normal requests will also be impacted to some extent.
To solve this technical problem, prior-art systems decide whether to respond to a user request according to the request itself and the user's information record. Machine learning algorithms are introduced into this decision process. Commonly used methods include constructing a Mahalanobis distance from user attributes to find users who are outliers, and discriminating outliers according to the frequency at which a user submits requests. The discrimination process is as follows:
(1) In outlier discrimination based on the Mahalanobis distance, the covariance matrix of the user attributes is calculated first, defined as follows:
$$\Sigma = E\{(X - E[X])(X - E[X])^T\}$$
The Mahalanobis distance is then calculated from the covariance matrix, defined as follows:
$$M_a = (X - \mu)^T \Sigma^{-1} (X - \mu)$$
Finally, points are discriminated by the size of this distance, and points whose distance is too large are judged to be outliers.
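The sketch below makes the prior-art Mahalanobis step concrete. It estimates μ = E[X] and Σ from a small set of user attribute vectors and evaluates M_a for a query point. Note that M_a as written is the quadratic form, i.e. the square of the usual Mahalanobis distance. This is a minimal stdlib-only illustration, not the prior-art systems' implementation. It uses the population (1/n) covariance to mirror the expectation in the definition and assumes Σ is invertible. All function names are hypothetical.

```python
def mean_vector(X):
    # mu = E[X], estimated column by column.
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

def covariance_matrix(X):
    # Sigma = E{(X - E[X])(X - E[X])^T}, population estimate (divide by n).
    n, mu = len(X), mean_vector(X)
    d = len(mu)
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / n
             for j in range(d)] for i in range(d)]

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting; returns y such that A y = b.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def mahalanobis(x, X):
    # M_a = (x - mu)^T Sigma^{-1} (x - mu), computed by solving Sigma y = (x - mu).
    mu = mean_vector(X)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(covariance_matrix(X), diff)
    return sum(di * yi for di, yi in zip(diff, y))
```

For the four points (0,0), (2,0), (0,2), (2,2), Σ is the identity, so `mahalanobis([3.0, 1.0], X)` returns 4.0. Comparing such values against a cutoff reproduces step (1).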
(2) In the method that discriminates outliers by request frequency, a user whose number of requests per unit time exceeds a certain threshold is directly judged to be an outlier.
Therefore, an important issue in service response strategy is how to use existing access data and user information to identify abnormal requests more accurately and take corresponding measures. This directly affects the stability and economy of service resource allocation.
However, in the course of implementing the present application, the inventors found that existing outlier detection algorithms for time-series data fall into two kinds. Some cluster only the feature data of the accessing users themselves, which reflects only the characteristics of user attributes. Others use only the time series data of the accesses and set thresholds manually to find outliers (i.e., to confirm that the current access is abnormal). Neither approach fully exploits the value of the data, and the results are often not very accurate or effective.
Summary of the invention
The present application provides an abnormal access detection method for improving the efficiency and accuracy of abnormal access detection. The method includes the following steps:
Obtaining attribute data of the access request to be detected;
Generating an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of the sample access requests;
Determining whether the abnormal probability is greater than a preset abnormal threshold;
If yes, confirming that the access request is an abnormal access request;
If not, confirming that the access request is a normal access request.
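The decision flow above can be sketched as follows. The text does not state how the abnormal probability is computed from the attribute data and the detection parameter. Because the detailed description trains a logistic regression model, this sketch assumes the probability is the logistic sigmoid of the dot product w·x, where x is the attribute vector and w the detection parameter. The names `sigmoid`, `abnormal_probability`, and `is_abnormal` are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def abnormal_probability(x: list[float], w: list[float]) -> float:
    # Assumed model: P(abnormal | x) = sigmoid(w . x), with w the detection parameter.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def is_abnormal(x: list[float], w: list[float], threshold: float) -> bool:
    # Abnormal if the probability is strictly greater than the preset threshold;
    # otherwise normal, matching the "if yes / if not" branches above.
    return abnormal_probability(x, w) > threshold
```

With w = [1.0, -1.0] and a threshold of 0.5, the attribute vector [2.0, 0.0] is classified as abnormal and [0.0, 2.0] as normal.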
Preferably, before the attribute data of the access request to be detected is obtained, the method further includes:
Determining, according to the access frequency information of each sample access request, whether that sample access request is abnormal;
Assigning labels with different values to normal sample access requests and abnormal sample access requests;
Generating an original detection parameter according to the label values and attribute data of the sample access requests;
Generating the detection parameter according to the original detection parameter.
Preferably, the access frequency information includes the user identifier and the access time corresponding to each sample access request. Determining whether each sample access request is abnormal according to its access frequency information specifically comprises:
Acquiring, according to the user identifier, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time;
Determining whether the sum of the first number and the second number is greater than a preset count threshold;
If yes, confirming that the sample access request is an abnormal sample access request;
If not, confirming that the sample access request is a normal sample access request.
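The windowed counting rule above can be sketched as below. Several details are assumptions not fixed by the text. Requests are given as (user_id, access_time) pairs with numeric times. The two counts are taken over [t - window, t) and (t, t + window], so the request itself, and any other request at exactly the same timestamp, is excluded. Labels are encoded as 1 for abnormal and 0 for normal. The function name `label_samples` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

def label_samples(requests, window, count_threshold):
    """requests: list of (user_id, access_time); returns 1 (abnormal) or 0 (normal) per request."""
    # Group and sort access times per user so window counts become binary searches.
    times_by_user = defaultdict(list)
    for user_id, t in requests:
        times_by_user[user_id].append(t)
    for times in times_by_user.values():
        times.sort()

    labels = []
    for user_id, t in requests:
        times = times_by_user[user_id]
        # First number: same-user requests in [t - window, t).
        first = bisect_left(times, t) - bisect_left(times, t - window)
        # Second number: same-user requests in (t, t + window].
        second = bisect_right(times, t + window) - bisect_right(times, t)
        labels.append(1 if first + second > count_threshold else 0)
    return labels
```

For example, with requests from user u1 at times 0, 10, and 20, a window of 15, and a count threshold of 1, only the request at time 10 has neighbours on both sides (one before, one after). It is therefore the only one labelled abnormal.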
Preferably, the original detection parameter is generated according to the following formula:
[Formula image PCTCN2017070798-appb-000001, not reproduced]
where $\arg\min_w$ is the function that determines the value of the original detection parameter, and $w$ is the original detection parameter, namely the value at which the summation term attains its minimum. $N$ is the number of sample access requests, and $V_i$ is the label value of the $i$-th sample access request.
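The formula itself survives only as an image reference, so its exact form cannot be recovered here. The detailed description states that logistic regression is used. One common objective consistent with the listed symbols is $w = \arg\min_w \sum_{i=1}^{N} \log(1 + e^{-V_i w^T x_i})$ with labels $V_i \in \{+1, -1\}$, and the sketch below minimizes that objective by batch gradient descent. The objective, the ±1 label encoding, the constant bias feature, and the learning-rate and epoch defaults are all assumptions. The step that derives the final detection parameter from the original one is not specified in the text and is not shown. Function names are hypothetical.

```python
import math

def logistic_loss(w, X, V):
    # Assumed objective: sum_i log(1 + exp(-V_i * w.x_i)), labels V_i in {+1, -1}.
    return sum(math.log1p(math.exp(-v * sum(a * b for a, b in zip(w, x))))
               for x, v in zip(X, V))

def train_original_parameter(X, V, lr=0.1, epochs=500):
    """Batch gradient descent on the assumed logistic objective; returns w."""
    d, n = len(X[0]), len(X)
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, v in zip(X, V):
            margin = v * sum(wj * xj for wj, xj in zip(w, x))
            # d/dw log(1 + exp(-margin)) = -v * x / (1 + exp(margin));
            # the exponent is capped to avoid overflow for very confident samples.
            coef = -v / (1.0 + math.exp(min(margin, 700.0)))
            for j in range(d):
                grad[j] += coef * x[j]
        # Step along the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w
```

In practice each attribute vector would carry a trailing constant 1.0 so that w includes a bias term. `logistic_loss` is provided only to check that training reduced the objective.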
Preferably, the abnormal threshold is generated as follows:
Obtaining the percentage of abnormal sample access requests among all sample access requests;
Obtaining the abnormal probability corresponding to each sample access request according to the detection parameter;
Sorting the abnormal probabilities of the sample access requests in ascending order;
Determining, from the sorting result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
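One reading of the threshold procedure is sketched below. Suppose a fraction p of the samples is abnormal. The threshold is then placed, in the ascending order, just below the k = round(p·N) largest sample probabilities, so that (absent ties) exactly k samples satisfy probability > threshold. The text does not say which sorted position "corresponds to" the percentage, so this index choice is an assumption, as is the function name `abnormal_threshold`.

```python
def abnormal_threshold(probs: list[float], abnormal_ratio: float) -> float:
    n = len(probs)
    k = round(abnormal_ratio * n)  # number of samples expected to be abnormal
    ranked = sorted(probs)         # ascending order, as in the described procedure
    # Value just below the top-k probabilities. The index is clamped so that k == n still
    # returns a value (in that case the smallest probability is not counted as abnormal).
    idx = max(n - k - 1, 0)
    return ranked[idx]
```

With probabilities [0.1, 0.9, 0.3, 0.8, 0.2] and an abnormal share of 0.4, the threshold is 0.3, which leaves the two highest-scoring samples above it.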
Correspondingly, the present application further provides an abnormal access detection device, comprising:
An acquisition module, configured to obtain attribute data of an access request to be detected;
A first generation module, configured to generate an abnormal probability corresponding to the access request according to the attribute data and a detection parameter, where the detection parameter is generated according to the label values and attribute data of the sample access requests;
A judgment module, configured to determine whether the abnormal probability is greater than a preset abnormal threshold;
If yes, the judgment module confirms that the access request is an abnormal access request;
If not, the judgment module confirms that the access request is a normal access request.
Preferably, the device further comprises:
A determination module, configured to determine, according to the access frequency information of each sample access request, whether that sample access request is abnormal;
An allocation module, configured to assign labels with different values to normal sample access requests and abnormal sample access requests;
A second generation module, configured to generate an original detection parameter according to the label values and attribute data of the sample access requests;
A third generation module, configured to generate the detection parameter according to the original detection parameter.
Preferably, the access frequency information includes the user identifier (ID) and the access time corresponding to each sample access request, and the determination module is specifically configured to perform the following:
Acquiring, according to the user ID, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time;
Determining whether the sum of the first number and the second number is greater than a preset count threshold;
If yes, confirming that the sample access request is an abnormal sample access request;
If not, confirming that the sample access request is a normal sample access request.
Preferably, the original detection parameter is generated according to the following formula:
[Formula image PCTCN2017070798-appb-000002, not reproduced]
where $\arg\min_w$ is the function that determines the value of the original detection parameter, and $w$ is the original detection parameter, namely the value at which the summation term attains its minimum. $N$ is the number of sample access requests, and $V_i$ is the label value of the $i$-th sample access request.
Preferably, the abnormal threshold is generated as follows:
Obtaining the percentage of abnormal sample access requests among all sample access requests;
Obtaining the abnormal probability corresponding to each sample access request according to the detection parameter;
Sorting the abnormal probabilities of the sample access requests in ascending order;
Determining, from the sorting result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
It can thus be seen that, with the technical solution of the present application, the attribute data of an access request to be detected is obtained first. An abnormal probability corresponding to the access request is then generated according to the attribute data and the detection parameter. Because the detection parameter is generated from the label values and attribute data of the sample access requests, comparing the abnormal probability with the preset abnormal threshold is enough to confirm whether the access request is abnormal. Abnormal access requests can therefore be accurately identified and handled among massive numbers of access requests, ensuring the stability and security of the network.
Description of the drawings
FIG. 1 is a schematic diagram of the application of anomaly detection to service response in the prior art;
FIG. 2 is a schematic flowchart of the abnormal access detection method proposed by the present application;
FIG. 3 is a flowchart of outlier detection based on time series feature extraction in a specific embodiment of the present application;
FIG. 4 is a schematic diagram of feature extraction from time series data in a specific embodiment of the present application;
FIG. 5 is a schematic diagram of the threshold calculation process in a specific embodiment of the present application;
FIG. 6 is a schematic structural diagram of the abnormal access detection device proposed by the present application.
Detailed description
As described in the background art, time-series request data poses a specific challenge for outlier detection. Further improving the accuracy and effectiveness of that detection is a key issue for the accurate and efficient operation of the system, and is the technical problem the present application solves.
To solve this technical problem, the present application proposes an outlier detection method that combines user statistical data with time-series access data. Preliminary labels are assigned from the time series data according to rules. A logistic regression model is then trained on these preliminary labels and the user attributes to produce the final result, further improving the accuracy of outlier determination.
FIG. 2 is a schematic flowchart of the outlier detection method proposed by the present application. The method includes the following steps:
S201: Obtain attribute data of the access request to be detected.
In the embodiments of the present application, once the model and the detection parameter have been generated, predicting each new access request (that is, determining whether it is abnormal) depends only on the attributes of that access request. The anomaly detection problem thus becomes a classification problem. For this classification problem, it suffices to obtain the attribute data of the access request to be detected and form its complete attribute vector; the time series data of the new access request does not need to be acquired in this step.
Therefore, before predicting whether new access requests are abnormal, the embodiments first perform logistic regression training on the preliminary labels and user attributes of the sample access requests. This yields the classification model and the detection parameter, and combines user data with time-series access data. The logistic regression training and detection parameter acquisition proceed as follows:
a) Determining, according to the access frequency information of each sample access request, whether that sample access request is abnormal;
b) Assigning labels with different values to normal sample access requests and abnormal sample access requests;
c) Generating the original detection parameter according to the label values and attribute data of the sample access requests;
d) Generating the detection parameter according to the original detection parameter.
As the above steps show, accurately determining whether each sample access request is abnormal is an important factor in the accuracy of the classification model and the detection parameter. The specific embodiment of the present application therefore determines whether each sample access request is abnormal as follows:
a) Acquiring, according to the user identifier, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time;
b) Determining whether the sum of the first number and the second number is greater than a preset count threshold;
c) If yes, confirming that the sample access request is an abnormal sample access request;
d) If not, confirming that the sample access request is a normal sample access request.
在本申请的实施方式中,所述访问频次信息包括所述样本访问请求对应的用户标识以及访问时间。其中,用户标识是作为区分不同用户的凭证,只要保证不同用户对应有不同的用户标识即可,故可能会出现多种形式和内容。举例来说,用户标识可以为用户对应终端的MAC地址,也可以为用户在服务终端的注册ID。访问时间为由服务器记录的该访问请求的访问时间点。In an embodiment of the present application, the access frequency information includes a user identifier corresponding to the sample access request and an access time. The user identifier is used as a credential for distinguishing different users. As long as different users have different user identifiers, different forms and contents may appear. For example, the user identifier may be the MAC address of the user-compatible terminal or the registration ID of the user at the service terminal. The access time is the access time point of the access request recorded by the server.
It should be noted that the above examples of user identifiers are only examples given in the preferred embodiment of the present application. Other types of user identifier may be chosen on this basis so that the application suits more fields; such improvements all fall within the protection scope of the present invention.
It should be noted that the above method of determining whether a sample access request is abnormal is only one preferred solution proposed in the specific embodiments of the present application. Provided a certain level of accuracy is maintained, those skilled in the art may also use other methods, all of which fall within the protection scope of the present application.
S202: generate, according to the attribute data and the detection parameter, an abnormal probability corresponding to the access request, where the detection parameter is generated according to the label values and the attribute data of the sample access requests.
In an embodiment of the present application, the abnormal threshold should be tuned based on long-term experience until it falls within a suitable range. If the threshold is too large, some abnormal points are judged to be normal accesses, so many abnormal points may be missed; conversely, if it is too small, some normal points are judged to be abnormal, disrupting normal users. Choosing a suitable threshold is therefore crucial to the accuracy of abnormal-point detection, and the present application generates the abnormal threshold as follows:
a) obtain the percentage of all sample access requests that are abnormal sample access requests;
b) obtain, according to the detection parameter, the abnormal probability corresponding to each sample access request;
c) sort the abnormal probabilities of the sample access requests in ascending order;
d) determine, from the sorted result, the abnormal probability corresponding to the percentage, and use it as the abnormal threshold.
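A minimal sketch of steps a) to d), assuming labels are 1 (abnormal) / 0 (normal) and that "the abnormal probability corresponding to the percentage" means the cut point above which that percentage of samples lies. The text fixes neither the exact index nor the rounding, so both are assumptions here, as is the function name.

```python
def abnormal_threshold(labels, probabilities):
    """Derive the abnormal threshold from training labels (1/0) and model probabilities."""
    if not labels or len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must be non-empty and aligned")
    # Step a): share of abnormal samples among all samples.
    fraction = sum(labels) / len(labels)
    # Step b) is assumed done: `probabilities` holds each sample's abnormal probability.
    # Step c): sort in ascending order.
    ranked = sorted(probabilities)
    n = len(ranked)
    k = round(n * fraction)  # expected number of abnormal samples
    # Step d): take the value just below the top-k block, so that roughly k
    # samples have a probability strictly greater than the threshold.
    return ranked[max(n - k - 1, 0)]


print(abnormal_threshold([0, 1, 0, 1, 0], [0.1, 0.9, 0.3, 0.7, 0.5]))  # 0.5
```

With tied probabilities at the cut point, fewer than k samples end up strictly above the threshold; a production version would need a tie-breaking rule.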
In a specific embodiment of the present application, one reference formula for generating the original detection parameter is as follows:
Figure PCTCN2017070798-appb-000003
where argmin_w returns the value of the original detection parameter w that minimizes the summation, N is the number of sample access requests, and V_i is the label value of the i-th sample access request.
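The formula itself is only available as an image. Assuming the labels are encoded as V_i ∈ {+1, -1} (abnormal / normal) and x_i is the attribute vector of the i-th sample (the notation used in the application scenario), the standard unregularized logistic-regression objective consistent with this description would read as follows. It is a reconstruction under those assumptions, not a transcription of the patent's figure:

```latex
w = \arg\min_{w} \sum_{i=1}^{N} \log\left(1 + \exp\left(-V_i \, w^{T} x_i\right)\right),
\qquad V_i \in \{+1, -1\}
```

Under this reading, the abnormal probability of a request with attribute vector x is 1 / (1 + exp(-w^T x)), and labels stored as 1/0 are mapped to +1/-1 before training.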
Solving the above reference formula yields the parameter w, which is the original detection parameter. In the subsequent process, w is applied to every new access request, and comparing the result with the abnormal threshold predicts whether the new access request is abnormal.
It should be noted that the above formula is only one preferred solution proposed in the specific embodiments of the present application. Provided the result can still serve as the original detection parameter, those skilled in the art may modify or adapt the formula; such variations all fall within the protection scope of the present application.
S203: determine whether the abnormal probability is greater than the preset abnormal threshold.
In an embodiment of the present application, when a new access request arrives, the classification model predicts whether it is an abnormal access request. Specifically, substituting the attribute data of the new access request into the classification model yields the probability that this access is abnormal, i.e., the abnormal probability, which is then compared with the preset abnormal threshold. If the abnormal probability of the new access request is greater than the threshold, the request is judged to be an abnormal access request and S204 is executed; otherwise, it is judged to be a normal access request and S205 is executed.
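The S203 to S205 decision can be sketched as follows, assuming the classification model is logistic regression (as in the application scenario), so that the abnormal probability is the sigmoid of w^T x. The function names are hypothetical. A probability exactly equal to the threshold falls into the normal branch, matching the "if no" case of S205.

```python
import math


def abnormal_probability(w, x):
    """Sigmoid of the linear score w^T x, computed without overflow."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def classify_request(w, x, threshold):
    """S203: compare with the threshold; S204 -> abnormal, S205 -> normal."""
    return "abnormal" if abnormal_probability(w, x) > threshold else "normal"


print(classify_request([1.0, -2.0], [3.0, 1.0], 0.5))  # score 1.0, p ~ 0.73 -> abnormal
```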
S204: if yes, confirm that the access request is an abnormal access request.
S205: if no, confirm that the access request is a normal access request.
It can thus be seen that, with the above technical solution, once the attribute data of the access request to be detected has been obtained, the abnormal probability corresponding to the access request is generated from the attribute data and the detection parameter. Because the detection parameter is generated from the label values and attribute data of the sample access requests, comparing the abnormal probability with the preset abnormal threshold is enough to confirm whether the access request is abnormal. Abnormal access requests can therefore be accurately identified among massive numbers of access requests, safeguarding the stability and security of the network.
To further illustrate the technical idea of the present application, the technical solution is now described with reference to the specific application scenario shown in FIG. 2. This abnormal-point detection flow, based on time-series feature extraction, detects abnormal points in three steps: time-series analysis, linear classifier training, and prediction. The three steps are described below:
(1) Generating labels from the time series
Exploiting the properties of the time series, all user access records in the training set are first sorted in chronological order. After sorting, we compare the user ID of each access: a sliding window is moved forward through the sequence, visiting every access in order. For each access, if the number of accesses submitted by the same user in the first half and the second half of its window exceeds a certain threshold, the access is marked as an abnormal point. The set of abnormal-point labels can then be written as:
Figure PCTCN2017070798-appb-000004
where V_i denotes the label of the i-th access,
Figure PCTCN2017070798-appb-000005
w is the window size parameter,
Figure PCTCN2017070798-appb-000006
and th is the threshold parameter. A schematic diagram is shown in FIG. 3.
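The time-sorted sliding-window pass can be sketched as below. The window formula itself is only available as an image, so this sketch assumes the window is measured in positions of the chronologically sorted sequence (h accesses before and h after the current one) rather than in time units; `half_window` and `th` are hypothetical parameter names. Note that the text also uses w for the window size, which is unrelated to the classifier parameter w. A running `Counter` keeps each step O(1) after sorting.

```python
from collections import Counter


def label_by_sliding_window(records, half_window, th):
    """records: (access_time, user_id) pairs. Returns (time-sorted records, labels)."""
    ordered = sorted(records, key=lambda r: r[0])  # chronological order
    users = [user for _, user in ordered]
    n = len(users)
    # Window for position 0 covers positions 0 .. half_window.
    counts = Counter(users[: half_window + 1])
    labels = []
    for i in range(n):
        # Same-user accesses in positions i-h .. i+h, excluding access i itself.
        same_user = counts[users[i]] - 1
        labels.append(1 if same_user > th else 0)
        # Slide the window one step: drop position i-h, add position i+h+1.
        if i - half_window >= 0:
            counts[users[i - half_window]] -= 1
        if i + half_window + 1 < n:
            counts[users[i + half_window + 1]] += 1
    return ordered, labels


records = [(4, "a"), (1, "a"), (3, "b"), (2, "a"), (5, "c")]
print(label_by_sliding_window(records, half_window=2, th=1)[1])  # [0, 1, 0, 0, 0]
```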
(2) Linear classifier training
Once all access labels have been generated, we assume that whether an access is abnormal is determined entirely by the attributes of that access. The problem thus becomes a classification problem, which no longer needs the time-series data. Logistic regression is trained on the other attribute features and the label of each access to obtain a classification model. The result of this model is the parameter w, which satisfies:
Figure PCTCN2017070798-appb-000007
where argmin_w returns the value of the parameter w that minimizes the summation on the right-hand side, N is the total number of training samples, V_i is the abnormal-point label from the previous step, and w^T denotes the transpose of w. In the actual logistic regression training, the L-BFGS algorithm is used to speed up the optimization.
(3) Prediction for new accesses
When a new access arrives, the classification model predicts whether it is an abnormal point. Substituting the new access data into the classification model gives the probability that this access is an abnormal point. A threshold is set, and when the probability that the access is abnormal exceeds it, the access is judged to be an abnormal point. The set of all abnormal new accesses is expressed as:
{V_i | w^T x_i > p_t}
where V_i denotes the i-th access, x_i is the attribute vector of that access, and p_t is the threshold for judging abnormal points. The threshold should be tuned based on long-term experience until it reaches a suitable value. If it is too large, many abnormal points are missed and judged to be normal accesses; if it is too small, many normal points are judged to be abnormal, disrupting normal users. Setting a suitable threshold is therefore essential, and here it can be set by percentage: first find the percentage of the training data made up of abnormal points, then feed the training data into the model to compute each sample's probability, sort these probabilities, and take the probability at the position corresponding to the abnormal-point percentage as the threshold. The schematic diagram is shown in FIG. 5.
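In the set notation, the linear score w^T x_i is compared directly with p_t, whereas the surrounding text treats p_t as a probability threshold. If p_t is read as a probability, the equivalent test on the linear score uses its logit, because the sigmoid is strictly increasing. A small sketch under that assumption (function names are hypothetical):

```python
import math


def score_threshold(p_t):
    """Convert a probability threshold p_t into the equivalent threshold on w^T x."""
    if not 0.0 < p_t < 1.0:
        raise ValueError("p_t must lie strictly between 0 and 1")
    return math.log(p_t / (1.0 - p_t))  # logit(p_t)


def flag_abnormal(w, X, p_t):
    """Indices i with sigmoid(w^T x_i) > p_t, tested on the linear score."""
    s_t = score_threshold(p_t)
    return [i for i, x in enumerate(X) if sum(wj * xj for wj, xj in zip(w, x)) > s_t]


print(flag_abnormal([1.0], [[0.0], [1.0], [-1.0]], 0.5))  # [1]
```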
In the technical solution of the above application scenario, the time-series features of the sample data provide training labels for the classification model, and the detection parameter is then generated from the label values and attribute data of the sample access requests. Once the attribute data of an access request to be detected has been obtained, the abnormal probability corresponding to the access request is generated from the attribute data and the detection parameter, so comparing the abnormal probability with the preset abnormal threshold is enough to confirm whether the access request is abnormal. Abnormal access requests can therefore be accurately identified among massive numbers of access requests, safeguarding the stability and security of the network.
To achieve the above technical objective, the present application further provides an abnormal access detection device which, as shown in FIG. 6, includes the following modules:
an obtaining module 610, which obtains the attribute data of the access request to be detected;
a first generation module 620, which generates, according to the attribute data and the detection parameter, an abnormal probability corresponding to the access request, where the detection parameter is generated according to the label values and attribute data of the sample access requests;
a judgment module 630, which determines whether the abnormal probability is greater than the preset abnormal threshold;
if yes, the judgment module 630 confirms that the access request is an abnormal access request;
if no, the judgment module 630 confirms that the access request is a normal access request.
In a specific application scenario, the device further includes:
a determination module, which determines, according to the access frequency information of each sample access request, whether that sample access request is abnormal;
an allocation module, which assigns labels with different values to normal sample access requests and abnormal sample access requests, respectively;
a second generation module, which generates the original detection parameter according to the label values and attribute data of the sample access requests;
a third generation module, which generates the detection parameter according to the original detection parameter.
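One way to map the device modules onto code is a single class whose methods play the roles of the obtaining module 610, the first generation module 620 and the judgment module 630. This is a hypothetical sketch: the attribute names, the dict-based request format and the logistic form of the probability are assumptions, and the training-time modules (determination, allocation, second and third generation) are left out for brevity.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class AbnormalAccessDetector:
    """Hypothetical mapping of the device modules onto one class."""
    feature_names: List[str]  # attributes the obtaining module extracts (assumed)
    w: List[float]            # detection parameter
    threshold: float          # abnormal threshold

    def obtain(self, request: Dict[str, float]) -> List[float]:
        """Obtaining module 610: pull the attribute vector out of a request record."""
        return [float(request.get(name, 0.0)) for name in self.feature_names]

    def generate_probability(self, x: Sequence[float]) -> float:
        """First generation module 620: linear score -> abnormal probability."""
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        if score >= 0:
            return 1.0 / (1.0 + math.exp(-score))
        z = math.exp(score)
        return z / (1.0 + z)

    def judge(self, probability: float) -> bool:
        """Judgment module 630: True means an abnormal access request."""
        return probability > self.threshold

    def detect(self, request: Dict[str, float]) -> bool:
        return self.judge(self.generate_probability(self.obtain(request)))


detector = AbnormalAccessDetector(["req_rate", "fail_ratio"], [1.0, 2.0], 0.5)
print(detector.detect({"req_rate": 1.0, "fail_ratio": 0.5}))  # True
```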
In a specific application scenario, the access frequency information includes the user ID and the access time corresponding to the sample access request, and the determination module is specifically configured to:
obtain, according to the user ID, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time;
determine whether the sum of the first number and the second number is greater than a preset count threshold;
if yes, confirm that the sample access request is an abnormal sample access request;
if no, confirm that the sample access request is a normal sample access request.
In a specific application scenario, the original detection parameter is generated according to the following formula:
Figure PCTCN2017070798-appb-000008
where argmin_w returns the value of the original detection parameter w that minimizes the summation, N is the number of sample access requests, and V_i is the label value of the i-th sample access request.
In a specific application scenario, the abnormal threshold is generated as follows:
obtain the percentage of all sample access requests that are abnormal sample access requests;
obtain, according to the detection parameter, the abnormal probability corresponding to each sample access request;
sort the abnormal probabilities of the sample access requests in ascending order;
determine, from the sorted result, the abnormal probability corresponding to the percentage, and use it as the abnormal threshold.
By applying the technical solution of the present application, once the attribute data of the access request to be detected has been obtained, the abnormal probability corresponding to the access request is generated from the attribute data and the detection parameter. Because the detection parameter is generated from the label values and attribute data of the sample access requests, comparing the abnormal probability with the preset abnormal threshold is enough to confirm whether the access request is abnormal. Abnormal access requests can therefore be accurately identified among massive numbers of access requests, safeguarding the stability and security of the network.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present application may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) and includes instructions that cause a computer device (such as a personal computer, server, or network device) to execute the methods described in the implementation scenarios of the present application.
Those skilled in the art will understand that the drawings are only schematic diagrams of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to implement the present application.
Those skilled in the art will understand that the modules of the device in an implementation scenario may be distributed within that device as described, or, with corresponding changes, may be located in one or more devices different from that of the present scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.
The serial numbers of the present application above are for description only and do not indicate the relative merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of the present application. The present application is not limited to them, however, and any variation that those skilled in the art can conceive shall fall within the protection scope of the present application.

Claims (10)

  1. An abnormal access detection method, comprising:
    obtaining attribute data of an access request to be detected;
    generating, according to the attribute data and a detection parameter, an abnormal probability corresponding to the access request, the detection parameter being generated according to the label values and attribute data of the sample access requests;
    determining whether the abnormal probability is greater than a preset abnormal threshold;
    if yes, confirming that the access request is an abnormal access request;
    if no, confirming that the access request is a normal access request.
  2. The method according to claim 1, further comprising, before obtaining the attribute data of the access request to be detected:
    determining, according to the access frequency information of each sample access request, whether that sample access request is abnormal;
    assigning labels with different values to normal sample access requests and abnormal sample access requests, respectively;
    generating an original detection parameter according to the label values and attribute data of the sample access requests;
    generating the detection parameter according to the original detection parameter.
  3. The method according to claim 2, wherein the access frequency information includes the user identifier and the access time corresponding to the sample access request, and determining, according to the access frequency information of each sample access request, whether that sample access request is abnormal specifically comprises:
    obtaining, according to the user identifier, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time;
    determining whether the sum of the first number and the second number is greater than a preset count threshold;
    if yes, confirming that the sample access request is an abnormal sample access request;
    if no, confirming that the sample access request is a normal sample access request.
  4. The method according to claim 2, wherein the original detection parameter is generated according to the following formula:
    Figure PCTCN2017070798-appb-100001
    where argmin_w returns the value of the original detection parameter w that minimizes the summation, N is the number of sample access requests, and V_i is the label value of the i-th sample access request.
  5. The method according to any one of claims 1 to 4, wherein the abnormal threshold is generated as follows:
    obtaining the percentage of all sample access requests that are abnormal sample access requests;
    obtaining, according to the detection parameter, the abnormal probability corresponding to each sample access request;
    sorting the abnormal probabilities of the sample access requests in ascending order;
    determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
  6. An abnormal access detection device, comprising:
    an obtaining module, which obtains attribute data of an access request to be detected;
    a first generation module, which generates, according to the attribute data and a detection parameter, an abnormal probability corresponding to the access request, the detection parameter being generated according to the label values and attribute data of the sample access requests;
    a judgment module, which determines whether the abnormal probability is greater than a preset abnormal threshold;
    if yes, the judgment module confirms that the access request is an abnormal access request;
    if no, the judgment module confirms that the access request is a normal access request.
  7. The device according to claim 6, further comprising:
    a determination module, which determines, according to the access frequency information of each sample access request, whether that sample access request is abnormal;
    an allocation module, which assigns labels with different values to normal sample access requests and abnormal sample access requests, respectively;
    a second generation module, which generates an original detection parameter according to the label values and attribute data of the sample access requests;
    a third generation module, which generates the detection parameter according to the original detection parameter.
  8. The device according to claim 7, wherein the access frequency information includes the user ID and the access time corresponding to the sample access request, and the determination module is specifically configured to:
    obtain, according to the user ID, a first number of sample access requests submitted by the same user within a time window before the access time, and a second number of sample access requests submitted by the same user within the time window after the access time;
    determine whether the sum of the first number and the second number is greater than a preset count threshold;
    if yes, confirm that the sample access request is an abnormal sample access request;
    if no, confirm that the sample access request is a normal sample access request.
  9. The device according to claim 7, wherein the original detection parameter is generated according to the following formula:
    Figure PCTCN2017070798-appb-100002
    where argmin_w returns the value of the original detection parameter w that minimizes the summation, N is the number of sample access requests, and V_i is the label value of the i-th sample access request.
  10. The device according to any one of claims 6 to 10, wherein the abnormal threshold is generated as follows:
    obtaining the percentage of all sample access requests that are abnormal sample access requests;
    obtaining, according to the detection parameter, the abnormal probability corresponding to each sample access request;
    sorting the abnormal probabilities of the sample access requests in ascending order;
    determining, from the sorted result, the abnormal probability corresponding to the percentage, and using that abnormal probability as the abnormal threshold.
PCT/CN2017/070798 2016-01-19 2017-01-10 Method and apparatus for abnormal access detection WO2017124942A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610035487.6A CN106982196B (en) 2016-01-19 2016-01-19 Abnormal access detection method and equipment
CN201610035487.6 2016-01-19

Publications (1)

Publication Number Publication Date
WO2017124942A1 true WO2017124942A1 (en) 2017-07-27

Family

ID=59341062

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/070798 WO2017124942A1 (en) 2016-01-19 2017-01-10 Method and apparatus for abnormal access detection

Country Status (3)

Country Link
CN (1) CN106982196B (en)
TW (1) TW201730766A (en)
WO (1) WO2017124942A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020007367A1 (en) * 2018-07-06 2020-01-09 北京白山耘科技有限公司 Method for inspecting abnormal web access, device, medium, and equipment
CN111476610A (en) * 2020-04-16 2020-07-31 腾讯科技(深圳)有限公司 Information detection method and device and computer readable storage medium
CN112001596A (en) * 2020-07-27 2020-11-27 北京科技大学 Method and system for detecting abnormal point of time series data
WO2021017284A1 (en) * 2019-07-30 2021-02-04 平安科技(深圳)有限公司 Cortex-learning-based anomaly detection method and apparatus, terminal device, and storage medium
CN112511538A (en) * 2020-11-30 2021-03-16 杭州安恒信息技术股份有限公司 Network security detection method based on time sequence and related components
CN114500004A (en) * 2022-01-05 2022-05-13 北京理工大学 Anomaly detection method based on conditional diffusion probability generation model
CN116016274A (en) * 2022-12-29 2023-04-25 南京融军建科技有限公司 Abnormal communication detection method and system
CN117424764A (en) * 2023-12-19 2024-01-19 中关村科学城城市大脑股份有限公司 System resource access request information processing method and device, electronic equipment and medium

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107659566B (en) * 2017-09-20 2021-01-19 深圳市创梦天地科技股份有限公司 Method and device for determining identification frequency of abnormal access of server and server
JP6698956B2 (en) * 2017-10-11 2020-05-27 三菱電機株式会社 Sample data generation device, sample data generation method, and sample data generation program
CN107678928B (en) * 2017-10-31 2021-06-01 聚好看科技股份有限公司 Application program processing method and server
CN107819631B (en) * 2017-11-23 2021-03-02 东软集团股份有限公司 Equipment anomaly detection method, device and equipment
CN108200008A (en) * 2017-12-05 2018-06-22 阿里巴巴集团控股有限公司 The recognition methods and device that abnormal data accesses
CN108268632A (en) * 2018-01-16 2018-07-10 中国人民解放军海军航空大学 Abnormal information data identifies machine learning method
CN108681542A (en) * 2018-02-12 2018-10-19 阿里巴巴集团控股有限公司 A kind of method and device of abnormality detection
CN108449342B (en) * 2018-03-20 2020-11-27 北京云站科技有限公司 Malicious request detection method and device
CN109145030B (en) * 2018-06-26 2022-07-22 创新先进技术有限公司 Abnormal data access detection method and device
CN108667855B (en) * 2018-07-19 2021-12-03 百度在线网络技术(北京)有限公司 Network flow abnormity monitoring method and device, electronic equipment and storage medium
CN109194539B (en) * 2018-08-13 2022-01-28 中国平安人寿保险股份有限公司 Data management and control method and device, computer equipment and storage medium
CN109543404B (en) * 2018-12-03 2019-10-25 北京芯盾时代科技有限公司 Risk assessment method and device for access behavior
CN109766244A (en) * 2019-01-04 2019-05-17 中国银行股份有限公司 CPU anomaly detection method and device for a distributed system, and storage medium
CN109873812B (en) * 2019-01-28 2020-06-23 腾讯科技(深圳)有限公司 Anomaly detection method and device and computer equipment
CN111835696B (en) * 2019-04-23 2023-05-09 阿里巴巴集团控股有限公司 Method and device for detecting abnormal request individuals
CN112148763A (en) * 2019-06-28 2020-12-29 京东数字科技控股有限公司 Unsupervised data anomaly detection method and device and storage medium
CN110417744B (en) * 2019-06-28 2021-12-24 平安科技(深圳)有限公司 Security determination method and device for network access
CN110351299B (en) * 2019-07-25 2022-04-22 新华三信息安全技术有限公司 Network connection detection method and device
CN110675228B (en) * 2019-09-27 2021-05-28 支付宝(杭州)信息技术有限公司 User ticket buying behavior detection method and device
CN111177513B (en) * 2019-12-31 2023-10-31 北京百度网讯科技有限公司 Determination method and device of abnormal access address, electronic equipment and storage medium
CN113076349B (en) * 2020-01-06 2024-06-11 阿里巴巴集团控股有限公司 Data anomaly detection method, device and system and electronic equipment
CN115277439B (en) * 2021-04-30 2023-09-19 中国移动通信集团有限公司 Network service detection method and device, electronic equipment and storage medium
CN113282433B (en) * 2021-06-10 2023-04-28 天翼云科技有限公司 Cluster anomaly detection method, device and related equipment
CN113360348B (en) * 2021-06-30 2022-09-09 北京字节跳动网络技术有限公司 Abnormal request processing method and device, electronic equipment and storage medium
TWI789075B (en) * 2021-10-26 2023-01-01 中華電信股份有限公司 Electronic device and method for detecting abnormal execution of application program
CN117579400B (en) * 2024-01-17 2024-03-29 国网四川省电力公司电力科学研究院 Neural-network-based network security monitoring method and system for industrial control systems

Citations (2)

Publication number Priority date Publication date Assignee Title
JP2009211725A (en) * 2009-06-18 2009-09-17 Toshiba Corp Abnormal data detecting system, abnormal data detecting method, abnormal data detecting program
CN105187242A (en) * 2015-08-20 2015-12-23 中国人民解放军国防科学技术大学 Method for detecting abnormal user behaviours mined on the basis of variable-length sequence mode

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US8683591B2 (en) * 2010-11-18 2014-03-25 Nant Holdings Ip, Llc Vector-based anomaly detection
CN103198711B (en) * 2013-03-21 2014-12-17 东南大学 Vehicle regulating and controlling method of lowering probability of traffic accidents of different severity

Non-Patent Citations (1)

Title
DING, JIE ET AL.: "An Anomaly Detection System on Big Data", NATURAL SCIENCE JOURNAL OF HAINAN UNIVERSITY, vol. 33, no. 1, 31 March 2015 (2015-03-31), pages 24 - 27 *

Cited By (13)

Publication number Priority date Publication date Assignee Title
WO2020007367A1 (en) * 2018-07-06 2020-01-09 北京白山耘科技有限公司 Method for inspecting abnormal web access, device, medium, and equipment
WO2021017284A1 (en) * 2019-07-30 2021-02-04 平安科技(深圳)有限公司 Cortex-learning-based anomaly detection method and apparatus, terminal device, and storage medium
CN111476610B (en) * 2020-04-16 2023-06-09 腾讯科技(深圳)有限公司 Information detection method, device and computer readable storage medium
CN111476610A (en) * 2020-04-16 2020-07-31 腾讯科技(深圳)有限公司 Information detection method and device and computer readable storage medium
CN112001596B (en) * 2020-07-27 2023-10-31 北京科技大学 Method and system for detecting abnormal points of time sequence data
CN112001596A (en) * 2020-07-27 2020-11-27 北京科技大学 Method and system for detecting abnormal point of time series data
CN112511538B (en) * 2020-11-30 2022-10-18 杭州安恒信息技术股份有限公司 Network security detection method based on time sequence and related components
CN112511538A (en) * 2020-11-30 2021-03-16 杭州安恒信息技术股份有限公司 Network security detection method based on time sequence and related components
CN114500004A (en) * 2022-01-05 2022-05-13 北京理工大学 Anomaly detection method based on conditional diffusion probability generation model
CN116016274A (en) * 2022-12-29 2023-04-25 南京融军建科技有限公司 Abnormal communication detection method and system
CN116016274B (en) * 2022-12-29 2023-11-24 天航长鹰(江苏)科技有限公司 Abnormal communication detection method and system
CN117424764A (en) * 2023-12-19 2024-01-19 中关村科学城城市大脑股份有限公司 System resource access request information processing method and device, electronic equipment and medium
CN117424764B (en) * 2023-12-19 2024-02-23 中关村科学城城市大脑股份有限公司 System resource access request information processing method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN106982196B (en) 2020-07-31
TW201730766A (en) 2017-09-01
CN106982196A (en) 2017-07-25

Similar Documents

Publication Publication Date Title
WO2017124942A1 (en) Method and apparatus for abnormal access detection
US10686829B2 (en) Identifying changes in use of user credentials
Sukumar et al. Network intrusion detection using improved genetic k-means algorithm
US10140576B2 (en) Computer-implemented system and method for detecting anomalies using sample-based rule identification
Shafeeq et al. Dynamic clustering of data with modified k-means algorithm
US20170063893A1 (en) Learning detector of malicious network traffic from weak labels
US10685008B1 (en) Feature embeddings with relative locality for fast profiling of users on streaming data
Cao et al. Machine learning to detect anomalies in web log analysis
CN111709028B (en) Network security state evaluation and attack prediction method
US11516240B2 (en) Detection of anomalies associated with fraudulent access to a service platform
US20210385253A1 (en) Cluster detection and elimination in security environments
Grill et al. Learning combination of anomaly detectors for security domain
US20180032917A1 (en) Hierarchical classifiers
CN110855648B (en) Early warning control method and device for network attack
CN112926045B (en) Group control equipment identification method based on logistic regression model
Powell et al. A cross-comparison of feature selection algorithms on multiple cyber security data-sets.
CN103530312A (en) User identification method and system using multifaceted footprints
Jordaney et al. Misleading metrics: On evaluating machine learning for malware with confidence
US10372702B2 (en) Methods and apparatus for detecting anomalies in electronic data
CN110097120B (en) Network flow data classification method, equipment and computer storage medium
US20220327394A1 (en) Learning support apparatus, learning support methods, and computer-readable recording medium
CN110414229B (en) Operation command detection method, device, computer equipment and storage medium
Iskhakov et al. Enhanced user authentication algorithm based on behavioral analytics in Web-based cyberphysical systems
Gana et al. Machine learning classification algorithms for phishing detection: A comparative appraisal and analysis
CN111224919A (en) DDOS (distributed denial of service) identification method and device, electronic equipment and medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 17740973
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 17740973
Country of ref document: EP
Kind code of ref document: A1