WO2021073114A1 - Statistics-based abnormal traffic monitoring method, apparatus, device, and storage medium - Google Patents

Statistics-based abnormal traffic monitoring method, apparatus, device, and storage medium

Info

Publication number
WO2021073114A1
WO2021073114A1 PCT/CN2020/093392 CN2020093392W WO2021073114A1 WO 2021073114 A1 WO2021073114 A1 WO 2021073114A1 CN 2020093392 W CN2020093392 W CN 2020093392W WO 2021073114 A1 WO2021073114 A1 WO 2021073114A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
gaussian distribution
user access
statistical
multivariate gaussian
Prior art date
Application number
PCT/CN2020/093392
Other languages
English (en)
French (fr)
Inventor
刘玉洁
杨冬艳
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021073114A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Definitions

  • This application relates to the field of network security technology, and in particular to a method, device, equipment, and storage medium for monitoring abnormal traffic based on statistics.
  • Abnormal network traffic monitoring has always been an important part of the information security field.
  • Abnormal network traffic refers to irregular and significant changes in traffic in the network.
  • Behind such changes there may be problems such as high-frequency operations, access at abnormal times, abnormal files, or abnormal access objects. Whatever the type of problem, it can degrade service quality, affect normal user access, and raise network security issues.
  • Abnormal traffic monitoring is usually implemented based on machine learning.
  • The inventor realizes that this not only requires building a corresponding technical system and deploying a monitoring model, but also requires professional algorithm technicians to perform operation and maintenance, so the implementation is somewhat complicated and costly.
  • the main purpose of this application is to provide a statistically-based method, device, equipment, and storage medium for abnormal traffic monitoring, aiming to solve the technical problems of cumbersome deployment and high implementation cost of the existing network abnormal traffic monitoring.
  • This application provides a method for monitoring abnormal traffic based on statistics, including:
  • according to the multivariate Gaussian distribution density function, calculating the Gaussian distribution probability values corresponding to the statistical characteristics of the user access log records corresponding to the current network traffic in each time dimension;
  • if the Gaussian distribution probability value is less than the preset alarm threshold in the time dimension where the current network traffic is located, determining that the current network traffic is abnormal traffic.
  • This application also provides a statistics-based abnormal traffic monitoring device, including:
  • the collection module is used to collect user access log records within a preset time period based on preset tracking points (instrumentation points);
  • the standardized processing module is used to clean and transform the original data in the user access log records to generate standard user access data that meets statistical requirements;
  • the statistical module is used to slide according to the time window corresponding to the day, week, and month, and to calculate the distribution of the statistical characteristics corresponding to the standard user access data in different time dimensions;
  • the mapping module is configured to map the distributions of the statistical features in different time dimensions into corresponding multivariate Gaussian distributions and respectively perform parameter estimation to obtain corresponding multivariate Gaussian distribution density functions;
  • the calculation module is configured to calculate the Gaussian distribution probability values corresponding to the statistical characteristics of the user access log records corresponding to the current network traffic in each time dimension according to the multivariate Gaussian distribution density function;
  • the judgment module is used to judge whether the Gaussian distribution probability value is less than the preset alarm threshold in the time dimension where the current network traffic is located; if the Gaussian distribution probability value is less than the preset alarm threshold in the time dimension where the current network traffic is located, it determines that the current network traffic is abnormal traffic.
  • the application also provides an abnormal flow monitoring device based on statistics.
  • the abnormal flow monitoring device includes a memory, a processor, and an abnormal flow monitoring program stored in the memory and runnable on the processor. When the abnormal flow monitoring program is executed by the processor, the steps of the abnormal flow monitoring method described in any one of the above are implemented.
  • the present application also provides a computer-readable storage medium on which an abnormal flow monitoring program is stored; when the abnormal flow monitoring program is executed by a processor, the steps of the abnormal flow monitoring method described in any one of the above are implemented.
  • This application performs abnormal traffic detection based on a statistical probability analysis method, fits the statistical feature distribution corresponding to the user's access record into a multivariate Gaussian distribution, and implements abnormal traffic detection based on the characteristics of the multivariate Gaussian distribution.
  • This embodiment only needs to go through certain data preprocessing and data standardization and fit the data into a Gaussian distribution to easily perform traffic alarms.
  • The method is concise, does not involve complex algorithms, and is easy to deploy and implement, and the threshold can be adjusted dynamically with the real-time characteristics of the traffic data, which not only avoids the inflexibility of rule-based alarms, but also alleviates the high implementation cost of complex algorithms such as machine learning.
  • FIG. 1 is a schematic structural diagram of an operating environment of an abnormal flow monitoring device involved in a scheme of an embodiment of this application;
  • FIG. 2 is a schematic flowchart of an embodiment of a method for monitoring abnormal traffic based on statistics according to this application;
  • Fig. 3 is a detailed flowchart of an embodiment of step S40 in Fig. 2;
  • FIG. 4 is a detailed flowchart of an embodiment of step S20 in FIG. 2;
  • FIG. 5 is a schematic diagram of functional modules of an embodiment of an abnormal traffic monitoring device based on statistics according to the present application.
  • This application provides an abnormal traffic monitoring device based on statistics.
  • FIG. 1 is a schematic structural diagram of the operating environment of the abnormal flow monitoring device involved in the solution of the embodiment of the application.
  • the abnormal traffic monitoring device includes: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • The hardware structure of the abnormal flow monitoring device shown in FIG. 1 does not constitute a limitation on the device, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • The memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and an abnormal traffic monitoring program based on statistics.
  • The operating system is a program that manages and controls the abnormal flow monitoring device and its software resources, and supports the operation of the statistics-based abnormal flow monitoring program and other software and/or programs.
  • The network interface 1004 is mainly used to access the network; the user interface 1003 is mainly used to detect confirmation commands, edit commands, and the like; and the processor 1001 can be used to call the statistics-based abnormal flow monitoring program stored in the memory 1005 and execute the operations of the following embodiments of the statistics-based abnormal flow monitoring method.
  • FIG. 2 is a schematic flowchart of an embodiment of a method for monitoring abnormal traffic based on statistics according to this application.
  • the abnormal flow monitoring method includes the following steps:
  • Step S10 Collect user access log records within a preset time period based on preset tracking points (instrumentation points);
  • Network traffic has certain characteristics, and these characteristics follow a normal distribution.
  • The characteristics of network traffic include user access time, user stay time, user end access time, access abnormalities, and so on. Therefore, to obtain these characteristics, this embodiment uses preset tracking points, such as tracking points in a log database, to collect user access log records within a preset time period. To fit the characteristics of network traffic more realistically, it is preferable to collect user access log records covering at least one month.
  • the user access log record includes at least: user ID, user IP address, server IP address, user access start time, user access stay time, user access end time, and access exception code.
  • Step S20 Perform cleaning and transformation processing on the original data in the user access log records to generate standard user access data that meets statistical requirements;
  • the original data in the collected user access log records are cleaned and transformed in advance, so as to generate standard user access data that meets the statistical requirements.
  • This embodiment does not limit the specific methods used for cleaning and transformation.
  • Data cleaning refers to filtering out data that does not meet the requirements, which falls into three main categories: incomplete data, incorrect data, and duplicate data. Incomplete data is data in which some required information is missing; such data needs to be eliminated or completed through interpolation. Incorrect data is data with errors such as an incorrect field format or an incorrect business meaning. Duplicate data needs to be eliminated.
  • Data transformation mainly converts inconsistent data, such as unifying the same type of data across different business systems. For example, if the code of the same supplier is XX0001 in system A and YY0001 in system B, such data needs to be converted to a single unified code.
  • Data transformation also includes the calculation of business rules. For example, different business systems have different business rules and use different data indicators, and these indicators need to be calculated according to the corresponding business rules before they can be used.
  • the standard user access data obtained by cleaning and transforming the original data in the user access log record is valid data and can be used for subsequent statistical processing.
  • Step S30 sliding according to the time window corresponding to the day, week, and month respectively, to calculate the distribution of the statistical characteristics corresponding to the standard user access data in different time dimensions;
  • multiple time dimensions are selected to count traffic characteristics, that is, statistical characteristics corresponding to standard user access data are calculated on different time dimensions.
  • The statistical features include at least: user visits, abnormal visits, abnormal type, visit time, and whether the user is newly added or lost. Each of these features is computed in the different time dimensions (day, week, month) to obtain the distribution of each statistical feature in each time dimension. For example, within a day, visits are concentrated between 9:00 and 12:00 and between 19:00 and 23:00; within a week, the visit volume is low from Monday to Friday and high on Saturday and Sunday.
  • A new user is identified by checking whether the user already existed one day, one week, or one month earlier: if so, it is an old user; otherwise it is a new user.
  • The statistical feature uses the following format: the combined key formed by the user's IP address and the server's IP address serves as the user ID, and the specific features include user access volume, abnormal access volume, abnormal type, access time, and whether the user is newly added or lost.
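  • A minimal sketch of how two of these features (visit count and abnormal visit count) could be aggregated per time dimension under the combined key of user IP and server IP. The record keys `user_ip`, `server_ip`, `start`, and `exception_code` are hypothetical names, and fixed calendar buckets are used here as a simplification of the sliding windows described in step S30.

```python
from collections import Counter
from datetime import datetime


def window_key(ts: datetime, dimension: str) -> str:
    """Label the calendar bucket (day, ISO week, or month) a timestamp falls in."""
    if dimension == "day":
        return ts.strftime("%Y-%m-%d")
    if dimension == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if dimension == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown dimension: {dimension}")


def count_visits(records, dimension):
    """Count visits and abnormal visits per (user IP, server IP, bucket)."""
    visits, abnormal = Counter(), Counter()
    for rec in records:
        key = (rec["user_ip"], rec["server_ip"], window_key(rec["start"], dimension))
        visits[key] += 1
        if rec["exception_code"] is not None:
            abnormal[key] += 1
    return visits, abnormal


# Hypothetical sample records (addresses and codes are made up for illustration).
records = [
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": datetime(2020, 6, 1, 9, 30), "exception_code": None},
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": datetime(2020, 6, 1, 20, 0), "exception_code": "E500"},
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": datetime(2020, 6, 3, 10, 0), "exception_code": None},
]

daily_visits, daily_abnormal = count_visits(records, "day")
weekly_visits, _ = count_visits(records, "week")
```

  • The same records yield different counts per dimension, which is what allows a separate distribution to be fitted for day, week, and month.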
  • Step S40 mapping the distributions of the statistical features in different time dimensions into corresponding multivariate Gaussian distributions and respectively performing parameter estimation to obtain corresponding multivariate Gaussian distribution density functions;
  • To fit the network traffic, the characteristics of the network traffic need to be fitted to the corresponding multivariate Gaussian distribution density functions.
  • Specifically, the distributions of the statistical features of the same standard user access data in the three time dimensions of day, week, and month are respectively mapped to corresponding multivariate Gaussian distributions, and the parameters of each multivariate Gaussian distribution are then estimated to determine the parameter values in its density function.
  • The statistical features in this embodiment need to be preset. More discriminative statistical features can be designed so that the feature distribution better matches a Gaussian distribution, for example by combining multiple associated features into a new feature.
  • Step S50 According to the multivariate Gaussian distribution density function, calculate the Gaussian distribution probability values corresponding to the statistical characteristics of the user access log records corresponding to the current network traffic in each time dimension;
  • the calculated multivariate Gaussian distribution density function is as follows:
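  • For reference, the standard form of a p-variate Gaussian density is sketched here. The symbols are assumptions chosen to match step S40: x is the statistical feature vector of the traffic being evaluated, and μ and Σ are the estimated mean vector and covariance matrix. Reading the "Gaussian distribution probability value" as the value of f at x is also an assumption.

```latex
f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}
       \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\mathsf T}\,\Sigma^{-1}\,(x-\mu)\right)
```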
  • The multivariate Gaussian distribution density function can be used to fit the Gaussian distribution of user access log records in different time dimensions. Therefore, the Gaussian distribution probability value corresponding to the current user access log record in each time dimension can be calculated in real time through the multivariate Gaussian distribution density function. Since the statistical feature distribution corresponding to the user access log records has Gaussian distribution characteristics, abnormal traffic can be detected through the Gaussian distribution probability values corresponding to the user access log records.
  • Step S60 judging whether the Gaussian distribution probability value is less than a preset alarm threshold in the time dimension where the current network traffic is located;
  • Step S70 If the Gaussian distribution probability value is less than the preset alarm threshold in the time dimension where the current network traffic is located, it is determined that the current network traffic is abnormal traffic.
  • The alarm thresholds corresponding to different time dimensions are set according to the Gaussian distribution characteristics and practical experience. For example, if the monitoring time dimension is one day, the alarm threshold is set to five thousandths, that is, data whose Gaussian distribution probability value is lower than five thousandths is treated as alarm data; if the monitoring time dimension is one week, the alarm threshold is set to three thousandths, that is, data whose Gaussian distribution probability value is lower than three thousandths is treated as alarm data. If the calculated Gaussian distribution probability value is less than the preset alarm threshold in the time dimension where the current network traffic is located, it is determined that the current network traffic is abnormal traffic.
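  • Steps S50 to S70 can be sketched as follows, assuming the "Gaussian distribution probability value" is the value of the standard multivariate Gaussian density at the feature vector. The function names and the dictionary of thresholds are hypothetical; the day and week thresholds are the example values from the text, and no month value is given there. The density is evaluated with plain Gaussian elimination so that the sketch needs only the standard library.

```python
import math

# Example thresholds from the text: five thousandths (day), three thousandths (week).
ALARM_THRESHOLDS = {"day": 0.005, "week": 0.003}


def gaussian_density(x, mean, cov):
    """Evaluate the p-variate Gaussian density at x via Gaussian elimination."""
    p = len(x)
    # Augment the covariance matrix with the centered vector d = x - mean.
    rows = [list(cov[i]) + [x[i] - mean[i]] for i in range(p)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= rows[col][col]
        for r in range(col + 1, p):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, p + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution yields y with cov @ y = d, so d . y is the quadratic form.
    y = [0.0] * p
    for i in reversed(range(p)):
        s = rows[i][p] - sum(rows[i][j] * y[j] for j in range(i + 1, p))
        y[i] = s / rows[i][i]
    quad = sum((x[i] - mean[i]) * y[i] for i in range(p))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** p * det)


def is_abnormal(x, mean, cov, dimension):
    """Flag traffic whose density value is below the threshold of its dimension."""
    return gaussian_density(x, mean, cov) < ALARM_THRESHOLDS[dimension]
```

  • With a zero mean and identity covariance in two dimensions, for instance, the point (3, 3) has a density far below either threshold and would be flagged, while the mean itself would not.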
  • The statistical features of the user access log records corresponding to the network traffic can be plotted as Gaussian distribution curves in each time dimension, and the contour line corresponding to the alarm threshold is marked on each Gaussian distribution curve. According to the Gaussian distribution curve for the time dimension of the current network traffic, it is judged whether the Gaussian distribution probability value of the current network traffic is less than the preset alarm threshold in that time dimension: if the traffic data falls inside the contour line, the Gaussian distribution probability value is greater than the preset alarm threshold and the data is regarded as normal traffic data; if it falls outside the contour line, the Gaussian distribution probability value is less than the preset alarm threshold and the data is regarded as abnormal traffic data.
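  • The contour line for an alarm threshold can be related to the density function as follows. Assuming the standard p-variate Gaussian density with mean vector μ and covariance matrix Σ, and writing ε as a hypothetical symbol for the preset alarm threshold, the points whose density equals ε lie on an ellipsoid of constant Mahalanobis distance:

```latex
f(x) = \varepsilon
\iff
(x-\mu)^{\mathsf T}\,\Sigma^{-1}\,(x-\mu)
  = -2\ln\!\left(\varepsilon\,(2\pi)^{p/2}\,|\Sigma|^{1/2}\right),
\qquad
0 < \varepsilon \le \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}
```

  • Points for which the quadratic form on the left is smaller than the right-hand side lie inside the contour (density above ε, normal traffic); points for which it is larger lie outside (density below ε, abnormal traffic).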
  • This embodiment performs abnormal traffic detection based on the method of statistical probability analysis, fits the statistical feature distribution corresponding to the user's access record to a multivariate Gaussian distribution, and implements abnormal traffic detection based on the characteristics of the multivariate Gaussian distribution.
  • This embodiment only needs to go through certain data preprocessing and data standardization and fit the data into a Gaussian distribution to easily perform traffic alarms.
  • The method is concise, does not involve complex algorithms, and is easy to deploy and implement, and the threshold can be adjusted dynamically with the real-time characteristics of the traffic data, which not only avoids the inflexibility of rule-based alarms, but also alleviates the high implementation cost of complex algorithms such as machine learning.
  • In addition, this embodiment deploys the user access log record data separately in the day, week, and month dimensions, so that dynamic alarms can be raised in each time dimension according to its own alarm threshold, thereby reducing errors caused by differences between time dimensions.
  • Because multiple time dimensions are considered, this embodiment can flexibly raise real-time alarms for abnormal traffic in different time periods and different business scenarios.
  • FIG. 3 is a detailed flowchart of an embodiment of step S40 in FIG. 2. Based on the foregoing embodiment, in this embodiment, the foregoing step S40 further includes:
  • Step S401 performing normalization processing or data transformation processing on the corresponding statistical features in different time dimensions, respectively, so as to map the distribution of the statistical features in different time dimensions into corresponding multivariate Gaussian distributions;
  • where μ and σ are the mean and variance of the original data set, respectively, and S is the normalized data;
  • Through logarithmic transformation of the original data, values in originally dense intervals are spread out as much as possible and values in originally sparse intervals are pulled together as much as possible, which brings the data distribution closer to a normal distribution and makes the data independent of the mean of the distribution.
  • the data exhibits Gaussian distribution characteristics through normalization processing or data transformation processing, and then the data can be fitted into a Gaussian distribution to facilitate abnormal traffic monitoring.
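  • A minimal sketch of the two preprocessing options. The normalization assumes the usual z-score form, in which the divisor is the standard deviation (the square root of the variance mentioned in the text). The exact logarithmic formula is not recoverable from the text, so `log(1 + x)` is used purely as an illustrative stand-in; both function names are hypothetical.

```python
import math
import statistics


def z_score(values):
    """Normalize values to zero mean and unit standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; cannot normalize")
    return [(v - mu) / sigma for v in values]


def log_transform(values):
    """Compress large values with log(1 + x); an assumed, illustrative form."""
    return [math.log1p(v) for v in values]


normalized = z_score([2, 4, 4, 4, 5, 5, 7, 9])
compressed = log_transform([0.0, 9.0, 99.0, 999.0])
```

  • The logarithm compresses large values, which is one common way to make a right-skewed count feature look closer to a Gaussian before fitting.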
  • Step S402 taking the data corresponding to each multivariate Gaussian distribution as a sample, and using maximum likelihood estimation to solve the respective mean estimator and covariance matrix estimator corresponding to each multivariate Gaussian distribution;
  • where μ and Σ are the sample mean vector and sample covariance matrix corresponding to the p-variate Gaussian distribution, X_i represents the i-th statistical feature sample vector, n represents the total number of statistical feature sample vectors, L represents the likelihood function, and f represents the probability density function.
  • The mean estimator of the multivariate Gaussian distribution is the sample mean vector, and the covariance matrix estimator is the sample covariance matrix.
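  • Written out under the usual assumption that the n sample vectors are independent and identically distributed, the likelihood function and the resulting maximum likelihood estimators take the following standard form. Hats mark estimators; note that the maximum likelihood covariance estimator uses the factor 1/n rather than 1/(n-1).

```latex
L(\mu, \Sigma) = \prod_{i=1}^{n} f(X_i;\, \mu, \Sigma),
\qquad
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat{\mu})(X_i - \hat{\mu})^{\mathsf T}
```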
  • Step S403 Generate a multivariate Gaussian distribution density function corresponding to each multivariate Gaussian distribution based on the mean estimator and the covariance matrix estimator corresponding to each multivariate Gaussian distribution.
  • the multivariate Gaussian distribution density function corresponding to each multivariate Gaussian distribution can be generated based on the respective sample data of each multivariate Gaussian distribution. For example, based on the sample data within a day, the sample mean vector and the sample covariance matrix corresponding to the multivariate Gaussian distribution are calculated, so as to generate the multivariate Gaussian distribution density function corresponding to the time dimension of the day.
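  • The parameter estimation for one time dimension can be sketched as follows, assuming each sample is a list of p numeric feature values. The function name `fit_gaussian` and the sample data are hypothetical; the covariance uses the 1/n maximum likelihood form.

```python
def fit_gaussian(samples):
    """Return the sample mean vector and the 1/n sample covariance matrix."""
    n = len(samples)
    p = len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(p)]
    cov = [
        [sum((s[j] - mean[j]) * (s[k] - mean[k]) for s in samples) / n
         for k in range(p)]
        for j in range(p)
    ]
    return mean, cov


# Hypothetical feature vectors, e.g. (visit count, abnormal visit count) per day.
samples_by_dimension = {
    "day": [[1, 1], [2, 3], [3, 2]],
}

# One (mean, covariance) pair per time dimension defines one density function.
models = {dim: fit_gaussian(samples) for dim, samples in samples_by_dimension.items()}
```

  • Keeping one fitted model per dimension mirrors the text: the day, week, and month densities are estimated from their own samples and evaluated independently.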
  • the user access log record data is separately deployed in different dimensions of day, week, and month to achieve the purpose of dynamic alarms in different time dimensions according to different alarm thresholds, thereby reducing false alarms due to different time dimensions.
  • it can flexibly respond to real-time alarms of abnormal traffic in different time periods and different business scenarios.
  • FIG. 4 is a detailed flowchart of an embodiment of step S20 in FIG. 2. Based on the foregoing embodiment, in this embodiment, the foregoing step S20 further includes:
  • Step S201 detecting whether there are missing values in the original data in the user access log records;
  • The user access log uses multiple fields to record a variety of information, such as user ID, user and server IP addresses, user access time, user stay time, user end access time, access exceptions, access status, anomaly type code, and anomaly type description. If the corresponding field of a record has no value, it is determined that the record has a missing value.
  • Step S202 If there are missing values, calculate the missing value ratio corresponding to each field, and perform missing value cleaning according to the missing value ratio and the importance of the field, where the missing value cleaning includes deleting missing value fields and using interpolation to complete missing values;
  • The proportion of missing values is calculated for each field. For example, if there are 100 user access log records and a field is missing in 10 of them, the proportion of missing values for that field is 10%.
  • Different fields have different degrees of importance in actual application scenarios. For example, the user's IP address is more important than the server's IP address, and the user's access time is more important than the user's stay time.
  • Fields of different importance use different cleaning strategies. For example, if the proportion of missing values is high and the importance of the field is low, the field is deleted directly; if the proportion of missing values is low and the importance of the field is high, interpolation is used to complete the missing values.
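  • A minimal sketch of this strategy, under several assumptions that the text does not fix: records are dictionaries with numeric values, a missing value is a `None` or absent key, the cut-off `max_ratio` of 0.5 is arbitrary, and mean imputation stands in for interpolation. The field names are hypothetical. Cases the text does not cover (for example a sparse but important field) are simply filled here.

```python
def missing_ratio(records, field):
    """Share of records in which the field is absent or None."""
    return sum(1 for r in records if r.get(field) is None) / len(records)


def clean_missing(records, important_fields, max_ratio=0.5):
    """Delete sparse low-importance fields; fill the gaps in the other fields."""
    cleaned = [dict(r) for r in records]
    for field in sorted({f for r in records for f in r}):
        ratio = missing_ratio(records, field)
        if ratio == 0:
            continue
        if ratio > max_ratio and field not in important_fields:
            for r in cleaned:
                r.pop(field, None)  # delete the whole field
            continue
        present = [r[field] for r in records if r.get(field) is not None]
        if not present:
            continue  # nothing to estimate a fill value from
        fill = sum(present) / len(present)  # mean as a stand-in for interpolation
        for r in cleaned:
            if r.get(field) is None:
                r[field] = fill
    return cleaned


records = [
    {"stay_seconds": 30, "status_detail": None},
    {"stay_seconds": None, "status_detail": None},
    {"stay_seconds": 50, "status_detail": 1},
    {"stay_seconds": 40, "status_detail": None},
]
cleaned = clean_missing(records, important_fields={"stay_seconds"})
```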
  • Step S203 Sort the original data in the user access log records, and calculate the similarity between each sorted record and adjacent records;
  • Step S204 if the similarity between different records exceeds a preset threshold, it is determined as a duplicate record and redundant data is deleted;
  • The records are further deduplicated. Specifically, all the original data in the user access log records are sorted first, for example by the value of a certain field such as access time, and the similarity between each sorted record and its adjacent records is then calculated, for example using a field matching algorithm or the standardized Euclidean distance. If the similarity between different records exceeds a preset threshold (for example, 90%), they are determined to be duplicate records and the redundant data is deleted.
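  • The sort-then-compare deduplication can be sketched as follows. The similarity here is a simple field-matching score (the fraction of fields with equal values), which is one of the options the text names; the record keys and function names are hypothetical, and each record is compared with the most recently kept record.

```python
def field_similarity(a, b):
    """Fraction of fields (over the union of keys) whose values match."""
    fields = set(a) | set(b)
    same = sum(1 for f in fields if a.get(f) == b.get(f))
    return same / len(fields)


def deduplicate(records, sort_field="start", threshold=0.9):
    """Sort records, then drop any record too similar to the last kept one."""
    kept = []
    for rec in sorted(records, key=lambda r: r[sort_field]):
        if kept and field_similarity(kept[-1], rec) > threshold:
            continue  # treated as a duplicate of the previous record
        kept.append(rec)
    return kept


records = [
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": "2020-06-01T09:30:00", "exception_code": None},
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": "2020-06-01T09:30:00", "exception_code": None},
    {"user_ip": "10.0.0.2", "server_ip": "10.0.1.1",
     "start": "2020-06-01T09:31:00", "exception_code": None},
]
unique_records = deduplicate(records)
```

  • Sorting first means that likely duplicates end up next to each other, so only adjacent comparisons are needed instead of comparing every pair of records.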
  • Step S205 Perform transformation processing on the cleaned data to generate user access standard data that meets statistical requirements.
  • the transformation processing includes one or more of data type transformation, logarithmic transformation, and data discretization.
  • the cleaned data is further transformed and processed to generate user access standard data that meets the statistical requirements.
  • For continuous data such as time, intervals can be used to analyze the characteristics of the data. For example, continuous data can be discretized with equal width, such as dividing time into morning, noon, afternoon, evening, and late night.
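  • A minimal sketch of equal-width discretization applied to the hour of day. To keep the bins equal in width, this example assumes four 6-hour periods with hypothetical labels rather than the five periods listed in the text; the function name is also hypothetical.

```python
def equal_width_bin(value, low, high, bins):
    """Return the index of the equal-width bin that contains value."""
    if not low <= value <= high:
        raise ValueError("value outside the binning range")
    width = (high - low) / bins
    # The upper bound itself is assigned to the last bin.
    return min(int((value - low) / width), bins - 1)


# Assumed labels for four 6-hour bins over a 24-hour day.
PERIODS = ["late night", "morning", "afternoon", "evening"]


def period_of_day(hour):
    """Map an hour in [0, 24] to one of the assumed period labels."""
    return PERIODS[equal_width_bin(hour, 0, 24, len(PERIODS))]
```

  • For example, an access at 9:30 (hour 9.5) falls in the second bin and is labelled as morning under these assumed boundaries.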
  • the data is cleaned and transformed to obtain standard data that meets the statistical requirements, which not only facilitates statistical analysis, but also further improves the accuracy of traffic monitoring.
  • the application also provides a statistics-based abnormal flow monitoring device.
  • FIG. 5 is a schematic diagram of functional modules of an embodiment of an abnormal traffic monitoring device based on statistics according to the present application.
  • the abnormal flow monitoring device includes:
  • the collection module 10 is used to collect user access log records within a preset time period based on preset tracking points (instrumentation points);
  • the standardized processing module 20 is used to clean and transform the original data in the user access log records to generate standard user access data that meets statistical requirements;
  • the statistics module 30 is configured to slide according to the time windows corresponding to days, weeks, and months, and to calculate the distribution of statistical characteristics corresponding to standard user access data in different time dimensions;
  • the mapping module 40 is configured to map the distribution of the statistical features in different time dimensions into corresponding multivariate Gaussian distributions and respectively perform parameter estimation to obtain the corresponding multivariate Gaussian distribution density function;
  • the calculation module 50 is configured to calculate the Gaussian distribution probability values corresponding to the statistical characteristics of the user access log records corresponding to the current network traffic in each time dimension according to the multivariate Gaussian distribution density function;
  • the judging module 60 is configured to judge whether the Gaussian distribution probability value is less than the preset alarm threshold in the time dimension of the current network traffic; if the Gaussian distribution probability value is less than the preset alarm threshold in the time dimension of the current network traffic, it determines that the current network traffic is abnormal traffic.
  • the mapping module 40 includes:
  • the preprocessing unit is configured to perform normalization processing or data transformation processing on the corresponding statistical features in different time dimensions, so as to map the distribution of the statistical features in different time dimensions to corresponding multivariate Gaussian distributions;
  • the estimation unit is used to take the data corresponding to each multivariate Gaussian distribution as a sample, and use maximum likelihood estimation to solve the mean estimator and covariance matrix estimator corresponding to each multivariate Gaussian distribution;
  • the generating unit is configured to generate the multivariate Gaussian distribution density function corresponding to each multivariate Gaussian distribution based on the mean estimator and the covariance matrix estimator corresponding to each multivariate Gaussian distribution.
  • the original data set corresponding to the statistical feature is normalized by the following formula:
  • where μ and σ are the mean and variance of the original data set, and S is the normalized data;
  • the logarithmic transformation process is performed on the original data set corresponding to the statistical feature by the following formula:
  • x is the original data
  • y is the data after logarithmic transformation
  • is set to 1
  • c is set to the maximum value of the transformed data.
  • the following function is used as the likelihood function corresponding to the p-variate Gaussian distribution:
  • where μ and Σ are the sample mean vector and sample covariance matrix corresponding to the p-variate Gaussian distribution, X_i represents the i-th statistical feature sample vector, n represents the total number of statistical feature sample vectors, L represents the likelihood function, and f represents the probability density function.
  • the standardized processing module 20 includes:
  • the cleaning unit is used to detect whether there are missing values in the original data in the user access log records; if there are missing values, calculate the missing value ratio corresponding to each field, and perform missing value cleaning according to the missing value ratio and the importance of the field, where the missing value cleaning includes deleting missing value fields and using interpolation to complete missing values;
  • the sorting unit is used to sort the original data in the user access log records and calculate the similarity between each sorted record and adjacent records; if the similarity between different records exceeds a preset threshold, they are judged to be duplicate records and the redundant data is deleted;
  • the transformation unit is used to transform the cleaned data to generate user access standard data that meets statistical requirements.
  • the transformation processing includes one or more of data type transformation, logarithmic transformation, and data discretization.
  • the user access log record includes at least: user ID, user IP address, server IP address, user access start time, user access stay time, user access end time, access exception code;
  • the statistical features include at least: user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed;
  • the statistical features use the following format: the user IP address and the server IP address together serve as the user ID, and the specific features include at least the user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed.
  • the judgment module 60 is specifically configured to:
  • according to the Gaussian distribution curve for the time dimension of the current network traffic, determine whether the Gaussian probability value of the current network traffic is less than the preset alarm threshold for that time dimension;
  • if the traffic data falls outside the contour line, the Gaussian probability value is determined to be less than the preset alarm threshold for the time dimension of the current network traffic.
  • This embodiment performs abnormal traffic detection based on the method of statistical probability analysis, fits the statistical feature distribution corresponding to the user's access record to a multivariate Gaussian distribution, and implements abnormal traffic detection based on the characteristics of the multivariate Gaussian distribution.
  • This embodiment only needs some data preprocessing and standardization, plus fitting the data to a Gaussian distribution, to raise traffic alarms conveniently.
  • the method is compact, involves no complex algorithms, and is easy to deploy, and its thresholds can be adjusted dynamically to the real-time characteristics of the traffic data.
  • it thereby avoids the inflexibility of rule-based alarms and reduces the high implementation cost of complex algorithms such as machine learning.
  • this embodiment deploys the user access log data separately for the day, week, and month dimensions, so that alarms are raised dynamically against different thresholds in different time dimensions, which reduces false alarms caused by differing time dimensions.
  • because multiple time dimensions are considered, this embodiment can respond flexibly with real-time abnormal-traffic alarms across different time periods and business scenarios.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • an abnormal traffic monitoring program is stored on the computer-readable storage medium.
  • when the program is executed by a processor, the steps of the statistics-based abnormal traffic monitoring method described in any of the above embodiments are implemented.
  • for the method implemented when the abnormal traffic monitoring program is executed by the processor, refer to the embodiments of the statistics-based abnormal traffic monitoring method of the present application; it is not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

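This application relates to the field of big data and discloses a statistics-based abnormal traffic monitoring method, including: collecting user access log records within a preset time period and cleaning and transforming them to generate standard user access data; computing the distribution of the statistical features of the standard user access data in different time dimensions; mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters; calculating the Gaussian probability value of the statistical features of the current network traffic in each time dimension; determining whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic; and, if so, determining that the current network traffic is abnormal traffic. This application also discloses a statistics-based abnormal traffic monitoring device, equipment, and storage medium. The application is easy to deploy, has a low implementation cost, and can respond flexibly with real-time abnormal-traffic alarms across different time periods and business scenarios.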

Description

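Statistics-based abnormal traffic monitoring method, device, equipment, and storage medium
This application claims priority to Chinese patent application No. 201910991150.6, filed with the China Patent Office on October 18, 2019 and entitled "Statistics-based abnormal traffic monitoring method, device, equipment, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of network security technology, and in particular to a statistics-based abnormal traffic monitoring method, device, equipment, and storage medium.
Background
With the arrival of the information age, monitoring abnormal network traffic has become an important part of information security. Abnormal network traffic refers to irregular and significant changes in the traffic on a network. A sudden change in network traffic over a short period may be caused by high-frequency operations, access at unusual times, abnormal files, or abnormal access targets. Any of these problems can degrade service quality, affect access by normal users, and create network security risks.
At present, abnormal traffic monitoring is usually implemented with machine learning. The inventor realized that this requires not only building the corresponding technical system and deploying a monitoring model, but also specialized algorithm engineers for operation and maintenance, which makes it rather complex and costly to implement.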
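Technical Problem
The main purpose of this application is to provide a statistics-based abnormal traffic monitoring method, device, equipment, and storage medium, aiming to solve the technical problem that existing abnormal network traffic monitoring is cumbersome to deploy and costly to implement.
Technical Solution
This application provides a statistics-based abnormal traffic monitoring method, including:
collecting, based on preset tracking points, user access log records within a preset time period;
cleaning and transforming the raw data in the user access log records to generate standard user access data that meets statistical requirements;
sliding time windows corresponding to day, week, and month respectively, and computing the distribution of the statistical features of the standard user access data in each time dimension;
mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters, to obtain the corresponding multivariate Gaussian density function;
calculating, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;
determining whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic;
if the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, determining that the current network traffic is abnormal traffic.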
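This application also provides a statistics-based abnormal traffic monitoring device, including:
a collection module, configured to collect, based on preset tracking points, user access log records within a preset time period;
a standardization module, configured to clean and transform the raw data in the user access log records to generate standard user access data that meets statistical requirements;
a statistics module, configured to slide time windows corresponding to day, week, and month respectively, and compute the distribution of the statistical features of the standard user access data in each time dimension;
a mapping module, configured to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimate its parameters, to obtain the corresponding multivariate Gaussian density function;
a calculation module, configured to calculate, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;
a judgment module, configured to determine whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, and, if it is, to determine that the current network traffic is abnormal traffic.
This application also provides statistics-based abnormal traffic monitoring equipment, which includes a memory, a processor, and an abnormal traffic monitoring program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the abnormal traffic monitoring method described in any of the above.
This application also provides a computer-readable storage medium on which an abnormal traffic monitoring program is stored; when executed by a processor, the program implements the steps of the abnormal traffic monitoring method described in any of the above.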
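Beneficial Effects
This application detects abnormal traffic by statistical probability analysis: it fits the distribution of the statistical features of user access records to a multivariate Gaussian distribution and detects abnormal traffic from the properties of that distribution. Only some data preprocessing and standardization, plus fitting the data to a Gaussian distribution, are needed to raise traffic alarms conveniently. The method is compact, involves no complex algorithms, and is easy to deploy, and its thresholds can be adjusted dynamically to the real-time characteristics of the traffic data. It thus avoids the inflexibility of rule-based alarms and reduces the high implementation cost of complex algorithms such as machine learning.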
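Brief Description of the Drawings
Figure 1 is a schematic structural diagram of the operating environment of the abnormal traffic monitoring equipment involved in the embodiments of this application;
Figure 2 is a schematic flowchart of an embodiment of the statistics-based abnormal traffic monitoring method of this application;
Figure 3 is a detailed flowchart of an embodiment of step S40 in Figure 2;
Figure 4 is a detailed flowchart of an embodiment of step S20 in Figure 2;
Figure 5 is a schematic diagram of the functional modules of an embodiment of the statistics-based abnormal traffic monitoring device of this application.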
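Best Mode for Carrying Out the Invention
This application provides statistics-based abnormal traffic monitoring equipment.
Referring to Figure 1, Figure 1 is a schematic structural diagram of the operating environment of the abnormal traffic monitoring equipment involved in the embodiments of this application.
As shown in Figure 1, the abnormal traffic monitoring equipment includes a processor 1001 (for example a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 connects these components and carries the communication between them. The user interface 1003 may include a display and an input unit such as a keyboard; the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. The memory 1005 may optionally also be a storage device separate from the processor 1001.
Those skilled in the art will understand that the hardware structure shown in Figure 1 does not limit the abnormal traffic monitoring equipment, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in Figure 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and the statistics-based abnormal traffic monitoring program. The operating system is a program that manages and controls the abnormal traffic monitoring equipment and its software resources, and supports the running of the statistics-based abnormal traffic monitoring program and other software and/or programs.
In the hardware structure shown in Figure 1, the network interface 1004 is mainly used to access the network; the user interface 1003 is mainly used to detect confirmation instructions, editing instructions, and the like; and the processor 1001 can call the statistics-based abnormal traffic monitoring program stored in the memory 1005 and perform the operations of the following embodiments of the statistics-based abnormal traffic monitoring method.
Based on the hardware structure of the abnormal traffic monitoring equipment described above, the embodiments of the statistics-based abnormal traffic monitoring method of this application are proposed.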
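Referring to Figure 2, Figure 2 is a schematic flowchart of an embodiment of the statistics-based abnormal traffic monitoring method of this application. In this embodiment, the abnormal traffic monitoring method includes the following steps:
Step S10: based on preset tracking points, collect user access log records within a preset time period;
Network traffic generally has certain features, and these features follow a normal distribution. They include the user's access time, stay time, access end time, access exceptions, and so on. To obtain these features, this embodiment sets tracking points in advance, for example in the log database, and thereby collects user access log data within a preset time period. To fit the features of the network traffic more realistically, log records covering at least one month are preferably collected.
Optionally, in a specific embodiment, the user access log records include at least: user ID, user IP address, server IP address, access start time, access stay time, access end time, and access exception code.
Step S20: clean and transform the raw data in the user access log records to generate standard user access data that meets statistical requirements;
In this embodiment, to facilitate subsequent processing, the raw data in the collected user access log records is cleaned and transformed in advance to generate standard user access data that meets statistical requirements. This embodiment does not restrict how cleaning and transformation are performed.
Data cleaning filters out data that does not meet requirements, which falls into three main categories: incomplete data, erroneous data, and duplicate data. Incomplete data is data in which required information is missing; it must be removed or completed by interpolation. Erroneous data is data with an incorrect format, such as a wrongly formatted field, or data whose business meaning is incorrect. Duplicate data must be removed.
Data transformation mainly converts inconsistent data, for example by unifying data of the same type from different business systems: if the same supplier is coded XX0001 in system A and YY0001 in system B, the two must be converted to a single code. It also includes computing business rules: different business systems have different business rules and use different data metrics, and these metrics can only be used after being computed according to the corresponding rules.
In this embodiment, only the standard user access data obtained by cleaning and transforming the raw data in the user access log records is valid data that can be used for subsequent statistical processing.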
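Step S30: slide time windows corresponding to day, week, and month respectively, and compute the distribution of the statistical features of the standard user access data in each time dimension;
In this embodiment, to monitor abnormal traffic more flexibly, several time dimensions are used to compute traffic features; that is, the distribution of the statistical features of the standard user access data is computed in each time dimension.
Optionally, in a specific embodiment, the statistical features include at least: user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed. Each feature is computed per time dimension (day, week, month), giving the distribution of each statistical feature in each time dimension. For example, within a day, access times are concentrated between 9:00 and 12:00 and between 19:00 and 23:00; within a week, visits are lower from Monday to Friday and higher on Saturday and Sunday. A new user is defined by whether the user already existed one day, one week, or one month earlier: if so, the user is an existing user; otherwise, the user is a new user.
Optionally, in one embodiment, the statistical features use the following format: the composite key formed from the user IP address and the server IP address serves as the user ID, and the specific features include the user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed.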
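The per-dimension counting described in step S30 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record field names (`user_ip`, `server_ip`, `start`, `error_code`) and the helper names are assumptions, and fixed calendar buckets stand in for the sliding windows that the text describes.

```python
from collections import defaultdict
from datetime import datetime

def window_key(ts: datetime, dimension: str) -> str:
    """Label the day, ISO week, or month that a timestamp falls in."""
    if dimension == "day":
        return ts.strftime("%Y-%m-%d")
    if dimension == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if dimension == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown dimension: {dimension}")

def count_features(records, dimension):
    """Count visits and abnormal visits per (user IP, server IP, window)."""
    stats = defaultdict(lambda: {"visits": 0, "abnormal": 0})
    for rec in records:
        # The composite key of user IP and server IP plays the role of the user ID.
        key = (rec["user_ip"], rec["server_ip"], window_key(rec["start"], dimension))
        stats[key]["visits"] += 1
        if rec["error_code"] is not None:
            stats[key]["abnormal"] += 1
    return dict(stats)

# Hypothetical log records, for illustration only.
records = [
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": datetime(2020, 5, 4, 9, 30), "error_code": None},
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": datetime(2020, 5, 4, 20, 5), "error_code": "E500"},
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": datetime(2020, 5, 6, 10, 0), "error_code": None},
]
daily = count_features(records, "day")
weekly = count_features(records, "week")
```

Keeping one result set per dimension mirrors the text's idea of separate day, week, and month distributions that are later fitted and thresholded independently.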
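Step S40: map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimate its parameters, to obtain the corresponding multivariate Gaussian density function;
In this embodiment, to fit the network traffic, the traffic features are fitted to corresponding multivariate Gaussian density functions. Specifically, the distributions of the statistical features of the same standard user access data in the day, week, and month dimensions are each mapped to a corresponding multivariate Gaussian distribution, and parameter estimation is then performed on each distribution to determine the parameter values of its density function.
In this embodiment, normalization or data transformation is preferably used to unify the statistical distribution of the samples so that the data exhibits Gaussian characteristics. Note that the statistical features in this embodiment must be set in advance. When setting them, if a selected feature does not follow a Gaussian distribution, its histogram can be examined and various functions tried to transform the data until the histogram matches a Gaussian distribution. In addition, if the selected features do not discriminate well enough, more discriminative features can be designed and added so that the feature distribution fits a Gaussian distribution better, for example by combining several related features into a new one.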
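The text suggests inspecting histograms while trying several transforms. As a rough numeric stand-in for that visual check, the sketch below scores each candidate transform by the absolute sample skewness of the transformed data and keeps the one closest to zero. Using skewness instead of a histogram, the candidate list, and the function names are all assumptions made for illustration.

```python
import math

def skewness(values):
    """Sample skewness (third standardized moment); 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Candidate transforms to try on a non-negative feature (hypothetical list).
CANDIDATES = {
    "identity": lambda v: v,
    "log1p": math.log1p,
    "sqrt": math.sqrt,
}

def pick_transform(values):
    """Return the candidate whose output is least skewed, plus all scores."""
    scores = {
        name: abs(skewness([fn(v) for v in values]))
        for name, fn in CANDIDATES.items()
    }
    return min(scores, key=scores.get), scores

# Made-up, strongly right-skewed visit counts.
best, scores = pick_transform([1, 2, 4, 8, 16, 32, 64])
```

Low skewness is only a necessary condition for a Gaussian shape, so a histogram or a formal normality test would still be worth checking before relying on the fit.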
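In this embodiment, after the statistical feature distributions are mapped to the corresponding multivariate Gaussian distributions, parameter estimation is further required to fit the data, computing the parameter values of each multivariate Gaussian density function, such as the feature means, the feature covariances, and the quantiles of the probability distribution.
Step S50: calculate, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;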
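In this embodiment, the calculated multivariate Gaussian density function is as follows:
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$
where $\mathbf{x}$ is a vector of dimension $D$, $\boldsymbol{\mu}$ is the mean of these vectors, and $\Sigma$ is the covariance matrix of all vectors $\mathbf{x}$.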
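In this embodiment, the multivariate Gaussian density function can fit the Gaussian distribution of the user access log records in each time dimension. The Gaussian probability value of the current user access log records in each time dimension can therefore be calculated in real time from the density function. Because the distribution of the statistical features of the user access log records has Gaussian characteristics, abnormal traffic can be detected from the corresponding Gaussian probability values.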
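To make the "Gaussian probability value" concrete, the sketch below evaluates the standard multivariate Gaussian density for one feature vector using only the Python standard library. It assumes that the probability value in the text means the density value at the observed feature vector; the function names are hypothetical, and a production system would more likely use a numerical library.

```python
import math

def solve_and_det(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination; also return det(matrix)."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    det = 1.0
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (a[i][n] - tail) / a[i][i]
    return z, det

def gaussian_density(x, mean, cov):
    """Density of N(mean, cov) evaluated at the feature vector x."""
    dim = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    z, det = solve_and_det(cov, diff)  # z = cov^{-1} (x - mean)
    mahalanobis_sq = sum(d * zi for d, zi in zip(diff, z))
    norm = math.sqrt((2 * math.pi) ** dim * det)
    return math.exp(-0.5 * mahalanobis_sq) / norm
```

Solving the linear system instead of inverting the covariance matrix keeps the code short and avoids forming the inverse explicitly.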
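Step S60: determine whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic;
Step S70: if the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, determine that the current network traffic is abnormal traffic.
In this embodiment, the alarm thresholds for the different time dimensions are set according to the properties of the Gaussian distribution and practical experience. For example, if the monitoring time dimension is one day, the alarm threshold is set to five per thousand, so data with a Gaussian probability value below five per thousand is treated as alarm data; if the monitoring time dimension is one week, the alarm threshold is set to three per thousand, so data with a Gaussian probability value below three per thousand is treated as alarm data. If the calculated Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, the current network traffic is determined to be abnormal traffic.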
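The per-dimension comparison in steps S60 and S70 amounts to a table lookup and one inequality. The sketch below uses the two example thresholds quoted in the text; no value is given there for the month dimension, so none is invented here. The names `ALARM_THRESHOLDS` and `is_abnormal` are hypothetical.

```python
# Example thresholds from the text: five per thousand for a day,
# three per thousand for a week. A month entry would be added once chosen.
ALARM_THRESHOLDS = {"day": 0.005, "week": 0.003}

def is_abnormal(probability_value, dimension, thresholds=ALARM_THRESHOLDS):
    """Flag traffic whose Gaussian probability value is below the threshold."""
    if dimension not in thresholds:
        raise KeyError(f"no alarm threshold configured for {dimension!r}")
    return probability_value < thresholds[dimension]
```

Because the thresholds live in a plain mapping, they can be replaced at run time, which is one simple way to realize the dynamic adjustment that the text mentions.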
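Optionally, in a specific embodiment, to make network traffic easier to monitor visually, Gaussian distribution curves of the statistical features of the user access log records can be plotted for each time dimension, with the contour line corresponding to the alarm threshold marked on each plot. Based on the plot for the time dimension of the current network traffic, it is determined whether the Gaussian probability value of the current traffic is less than the preset alarm threshold for that time dimension: if the traffic data falls inside the contour line, the Gaussian probability value is greater than the preset alarm threshold and the data is treated as normal traffic; if the traffic data falls outside the contour line, the Gaussian probability value is less than the preset alarm threshold and the data is treated as abnormal traffic.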
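For a Gaussian density, the contour at a given threshold is an ellipsoid of constant Mahalanobis distance, so "outside the contour" can be checked without plotting. The sketch below derives the squared radius from the standard density formula; it assumes the alarm threshold is compared against the density value, and the function names are hypothetical.

```python
import math

def contour_radius_sq(threshold, dim, det_cov):
    """Squared Mahalanobis radius of the contour where the density equals threshold."""
    peak = 1.0 / math.sqrt((2 * math.pi) ** dim * det_cov)  # density at the mean
    if threshold >= peak:
        return 0.0  # the contour collapses: no point has a density above the threshold
    return -2.0 * math.log(threshold / peak)

def is_outside_contour(mahalanobis_sq, threshold, dim, det_cov):
    """True when a point lies outside the alarm contour, i.e. its density is below threshold."""
    return mahalanobis_sq > contour_radius_sq(threshold, dim, det_cov)
```

For a two-dimensional feature with unit covariance determinant and the five-per-thousand threshold, the contour sits at a squared Mahalanobis distance of about 6.92.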
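This embodiment detects abnormal traffic by statistical probability analysis: it fits the distribution of the statistical features of user access records to a multivariate Gaussian distribution and detects abnormal traffic from the properties of that distribution. Only some data preprocessing and standardization, plus fitting the data to a Gaussian distribution, are needed to raise traffic alarms conveniently. The method is compact, involves no complex algorithms, and is easy to deploy, and its thresholds can be adjusted dynamically to the real-time characteristics of the traffic data. It thus avoids the inflexibility of rule-based alarms and reduces the high implementation cost of complex algorithms such as machine learning.
In addition, this embodiment deploys the user access log data separately for the day, week, and month dimensions, so that alarms are raised dynamically against different thresholds in different time dimensions. This reduces false alarms caused by differing time dimensions, and because multiple time dimensions are considered, the method can respond flexibly with real-time abnormal-traffic alarms across different time periods and business scenarios.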
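Referring to Figure 3, Figure 3 is a detailed flowchart of an embodiment of step S40 in Figure 2. Based on the above embodiment, in this embodiment step S40 further includes:
Step S401: normalize or transform the statistical features in each time dimension, so as to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution;
In this embodiment, because the data may not directly follow a Gaussian distribution, some processing is needed to make it do so. This embodiment preferably uses normalization or data transformation to map the computed statistical feature distributions to the corresponding multivariate Gaussian distributions.
(1) The original data set corresponding to the statistical features is normalized by the following formula:
Figure PCTCN2020093392-appb-000005
where μ and σ are the mean and variance of the original data set, and S is the normalized data;
(2) The original data set corresponding to the statistical features is log-transformed by the following formula:
$y = \log_c(1 + \lambda x)$;
where x is the original data, y is the data after the logarithmic transformation, λ is set to 1, and c is set to the maximum value of the transformed data. The logarithmic transformation spreads out values in originally dense ranges and pulls together values in originally sparse ranges, which brings the data distribution closer to a normal distribution and makes the data independent of the mean of the distribution.
In this embodiment, normalization or data transformation makes the data exhibit Gaussian characteristics, so the data can be fitted to a Gaussian distribution for abnormal traffic monitoring.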
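The two preprocessing options in step S401 can be sketched as below. Two assumptions are made and should be checked against the original formula image, which is not reproduced in this text: the normalization is taken to be the usual z-score (x − μ)/σ with σ computed as the standard deviation, although the text calls σ the variance; and the base c is taken to be the maximum of 1 + λx, which scales the result into [0, 1] for non-negative data.

```python
import math
import statistics

def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("constant data cannot be standardized")
    return [(v - mu) / sigma for v in values]

def log_transform(values, lam=1.0):
    """Compute y = log_c(1 + lam * x), with c assumed to be max(1 + lam * x)."""
    shifted = [1.0 + lam * v for v in values]
    if min(shifted) <= 0:
        raise ValueError("1 + lam * x must be positive for every x")
    base = max(shifted)
    if base == 1.0:
        raise ValueError("a logarithm with base 1 is undefined")
    return [math.log(s, base) for s in shifted]

# Made-up visit counts spanning two orders of magnitude.
scaled = log_transform([0, 9, 99])
```

With that choice of base, the largest value maps to 1 and a zero count maps to 0, so features with very different ranges end up on a comparable scale.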
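Step S402: taking the data of each multivariate Gaussian distribution as samples, solve for the mean estimator and the covariance matrix estimator of each multivariate Gaussian distribution by maximum likelihood estimation;
Suppose $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are the samples obtained when the statistical features are mapped to the p-variate Gaussian distribution $N_p(\mu, \Sigma)$. The likelihood function of the multivariate Gaussian distribution is constructed as:
$$L(\mu,\Sigma) = \prod_{i=1}^{n} f(X_{(i)};\mu,\Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{n}(X_{(i)}-\mu)^{T}\Sigma^{-1}(X_{(i)}-\mu)\right)$$
where μ and Σ are the sample mean vector and sample covariance matrix corresponding to the p-variate Gaussian distribution, $X_{(i)}$ is the i-th statistical-feature sample vector, n is the total number of statistical-feature sample vectors, L is the likelihood function, and f is the probability density function.
To find the estimators of μ and Σ that maximize the expression above, take the logarithm of both sides:
$$\ln L(\mu,\Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(X_{(i)}-\mu)^{T}\Sigma^{-1}(X_{(i)}-\mu)$$
Because the logarithm is a strictly increasing function, the estimators of μ and Σ can be obtained by maximizing $\ln L(\mu,\Sigma)$. Taking the partial derivatives of the log-likelihood with respect to μ and Σ and setting them to zero gives:
$$\frac{\partial \ln L}{\partial \mu} = \sum_{i=1}^{n}\Sigma^{-1}(X_{(i)}-\mu) = 0, \qquad \frac{\partial \ln L}{\partial \Sigma} = -\frac{n}{2}\Sigma^{-1} + \frac{1}{2}\Sigma^{-1}\left[\sum_{i=1}^{n}(X_{(i)}-\mu)(X_{(i)}-\mu)^{T}\right]\Sigma^{-1} = 0$$
Solving these equations gives the maximum likelihood estimators of μ and Σ:
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_{(i)}, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(X_{(i)}-\bar{X})(X_{(i)}-\bar{X})^{T}$$
It follows that the mean estimator of the multivariate Gaussian distribution is the sample mean vector, and the covariance matrix estimator is the sample covariance matrix.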
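Step S403: based on the mean estimator and covariance matrix estimator of each multivariate Gaussian distribution, generate the multivariate Gaussian density function of each distribution.
In this embodiment, once the parameter estimators of each multivariate Gaussian distribution have been obtained, the multivariate Gaussian density function of each distribution can be generated from its own sample data. For example, from the sample data within one day, the sample mean vector and sample covariance matrix of the corresponding multivariate Gaussian distribution are computed, which yields the density function for the day dimension.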
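The estimation in steps S402 and S403 reduces to computing a mean vector and a covariance matrix per time dimension. The sketch below uses the maximum likelihood form, which divides by n rather than n − 1; the feature vectors are made-up (visit count, abnormal visit count) pairs, and the names `fit_gaussian` and `samples_by_dimension` are hypothetical.

```python
def fit_gaussian(samples):
    """Maximum likelihood estimates (mean vector, covariance matrix) for row vectors."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    dim = len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    cov = [
        [
            sum((s[j] - mean[j]) * (s[k] - mean[k]) for s in samples) / n
            for k in range(dim)
        ]
        for j in range(dim)
    ]
    return mean, cov

# Made-up feature vectors: [visit count, abnormal visit count] per window.
samples_by_dimension = {
    "day": [[120, 3], [100, 1], [140, 2]],
    "week": [[800, 12], [760, 9], [840, 16]],
}
# One fitted (mean, covariance) pair per time dimension.
models = {dim: fit_gaussian(rows) for dim, rows in samples_by_dimension.items()}
```

Each fitted pair fully determines the density function for its dimension, so refitting on fresh data is all that is needed to keep the model current.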
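This embodiment deploys the user access log data separately for the day, week, and month dimensions, so that alarms are raised dynamically against different thresholds in different time dimensions. This reduces false alarms caused by differing time dimensions, and because multiple time dimensions are considered, the method can respond flexibly with real-time abnormal-traffic alarms across different time periods and business scenarios.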
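Referring to Figure 4, Figure 4 is a detailed flowchart of an embodiment of step S20 in Figure 2. Based on the above embodiment, in this embodiment step S20 further includes:
Step S201: detect whether the raw data in the user access log records contains missing values;
In this embodiment, the user access log uses multiple fields to record various information, such as the user ID, the user and server IP addresses, the access time, the stay time, the access end time, access exceptions, the access status, the exception type code, and the exception type description. If a field of a record has no value, the record is determined to contain a missing value.
Step S202: if there are missing values, calculate the missing-value ratio of each field and clean the missing values according to that ratio and the importance of the field, where missing-value cleaning includes deleting fields with missing values and completing missing values by interpolation;
In this embodiment, if one or more fields in the user access log records have missing values, the missing-value ratio of each field is calculated. For example, given 100 user access log records, if a field is missing in 10 of them, the missing-value ratio of that field is 10%.
In this embodiment, different fields differ in importance in practical scenarios. For example, the user IP address is more important than the server IP address, and the access time is more important than the stay time. Fields of different importance are cleaned with different strategies: if the missing-value ratio is high and the field is of low importance, the field is deleted outright; if the ratio is low and the field is of high importance, the missing values are completed by interpolation.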
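The cleaning rule in step S202 can be sketched as follows. The 50% cut-off between a "high" and a "low" missing-value ratio, the importance labels, linear interpolation as the interpolation method, and all field and function names are assumptions for illustration; the text only fixes the two cases handled here.

```python
def missing_ratios(records, fields):
    """Fraction of records in which each field is missing (None or absent)."""
    total = len(records)
    return {f: sum(1 for r in records if r.get(f) is None) / total for f in fields}

def interpolate(values):
    """Fill None gaps linearly between known neighbours; copy the nearest value at the edges."""
    known = [i for i, v in enumerate(values) if v is not None]
    filled = values[:]
    if not known:
        return filled
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled

def clean_missing(records, importance, high_ratio=0.5):
    """Drop low-importance fields that are mostly missing; interpolate important sparse gaps."""
    ratios = missing_ratios(records, list(importance))
    cleaned = [dict(r) for r in records]
    for field, level in importance.items():
        if ratios[field] == 0:
            continue
        if ratios[field] >= high_ratio and level == "low":
            for r in cleaned:
                r.pop(field, None)
        elif ratios[field] < high_ratio and level == "high":
            column = interpolate([r.get(field) for r in cleaned])
            for r, value in zip(cleaned, column):
                r[field] = value
    return cleaned, ratios

# Hypothetical records with a numeric stay time and a mostly empty note field.
records = [
    {"stay_seconds": 10, "note": None},
    {"stay_seconds": None, "note": None},
    {"stay_seconds": 30, "note": "x"},
]
cleaned, ratios = clean_missing(records, {"stay_seconds": "high", "note": "low"})
```

Combinations that the text does not cover, such as an important field that is mostly missing, are left untouched here so that the decision stays with the operator.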
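Step S203: sort the raw data in the user access log records and calculate the similarity between each sorted record and its adjacent records;
Step S204: if the similarity between records exceeds a preset threshold, judge them to be duplicate records and delete the redundant data;
In this embodiment, duplicate records are further removed. Specifically, all raw data in the user access log records is first sorted, for example by the value of a field such as the access time; the similarity between each sorted record and its adjacent records is then calculated, for example with a field-matching algorithm or the standardized Euclidean distance. If the similarity between records exceeds the preset threshold (for example 90%), they are judged to be duplicate records and the redundant data is deleted.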
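The sort-then-compare-neighbours de-duplication in steps S203 and S204 can be sketched with a simple field-matching similarity: the share of fields on which two records agree. The field names and the comparison against the last kept record are illustrative assumptions; the 90% threshold is the example value from the text.

```python
def field_match_similarity(a, b, fields):
    """Fraction of the given fields on which two records have equal values."""
    return sum(1 for f in fields if a.get(f) == b.get(f)) / len(fields)

def drop_duplicates(records, fields, sort_field, threshold=0.9):
    """Sort records, then drop each record that is too similar to the last kept one."""
    ordered = sorted(records, key=lambda r: r[sort_field])
    kept = []
    for rec in ordered:
        if kept and field_match_similarity(kept[-1], rec, fields) > threshold:
            continue  # judged a duplicate of its neighbour
        kept.append(rec)
    return kept

# Hypothetical records; the first two are identical.
FIELDS = ["user_ip", "server_ip", "start", "error_code"]
records = [
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": "2020-05-04T09:30:00", "error_code": None},
    {"user_ip": "10.0.0.1", "server_ip": "10.0.1.1",
     "start": "2020-05-04T09:30:00", "error_code": None},
    {"user_ip": "10.0.0.2", "server_ip": "10.0.1.1",
     "start": "2020-05-04T09:31:00", "error_code": "E500"},
]
unique = drop_duplicates(records, FIELDS, sort_field="start")
```

Sorting first means that only neighbouring records need to be compared, which keeps the pass linear after the sort instead of comparing every pair.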
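Step S205: transform the cleaned data to generate standard user access data that meets statistical requirements, where the transformation includes one or more of data type conversion, logarithmic transformation, and data discretization.
In this embodiment, to make the data easier to analyze statistically, the cleaned data is further transformed to generate standard user access data that meets statistical requirements.
A. Convert data types, for example from floating-point to integer data, to simplify calculation;
B. Apply a logarithmic transformation to the original data, which spreads out values in originally dense ranges and pulls together values in originally sparse ranges, bringing the data distribution closer to a normal distribution and making the data independent of the mean of the distribution.
C. Discretize continuous data such as time, so that intervals can be used to analyze the features of the data; for example, apply equal-width discretization to continuous data, such as dividing time into morning, noon, afternoon, evening, and late night.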
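Equal-width discretization, as in item C, maps a continuous value to one of several bins of the same width. The sketch below splits the 24-hour day into five equal bins and attaches the five period names from the text; which name goes with which bin is an assumption, as are the function names.

```python
def equal_width_bin(value, low, high, labels):
    """Return the label of the equal-width bin of [low, high) that contains value."""
    if not low <= value < high:
        raise ValueError(f"value {value} is outside [{low}, {high})")
    width = (high - low) / len(labels)
    # min() guards against floating-point rounding at the upper edge.
    index = min(int((value - low) / width), len(labels) - 1)
    return labels[index]

# Five 4.8-hour bins starting at midnight (assumed label order).
PERIODS = ["late night", "morning", "noon", "afternoon", "evening"]

def period_of_day(hour):
    """Map an hour of the day in [0, 24) to a named period."""
    return equal_width_bin(hour, 0.0, 24.0, PERIODS)
```

If the business meaning of "morning" or "evening" matters more than equal widths, explicit bin edges could replace the computed width.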
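In this embodiment, cleaning and transforming the data yields standard data that meets statistical requirements, which not only simplifies statistical analysis but also further improves the accuracy of traffic monitoring.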
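This application also provides a statistics-based abnormal traffic monitoring device.
Referring to Figure 5, Figure 5 is a schematic diagram of the functional modules of an embodiment of the statistics-based abnormal traffic monitoring device of this application. In this embodiment, the abnormal traffic monitoring device includes:
a collection module 10, configured to collect, based on preset tracking points, user access log records within a preset time period;
a standardization module 20, configured to clean and transform the raw data in the user access log records to generate standard user access data that meets statistical requirements;
a statistics module 30, configured to slide time windows corresponding to day, week, and month respectively, and compute the distribution of the statistical features of the standard user access data in each time dimension;
a mapping module 40, configured to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimate its parameters, to obtain the corresponding multivariate Gaussian density function;
a calculation module 50, configured to calculate, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;
a judgment module 60, configured to determine whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, and, if it is, to determine that the current network traffic is abnormal traffic.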
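Optionally, the mapping module 40 includes:
a preprocessing unit, configured to normalize or transform the statistical features in each time dimension, so as to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution;
an estimation unit, configured to take the data of each multivariate Gaussian distribution as samples and solve for the mean estimator and covariance matrix estimator of each distribution by maximum likelihood estimation;
a generation unit, configured to generate the multivariate Gaussian density function of each distribution based on its mean estimator and covariance matrix estimator.
Optionally, the original data set corresponding to the statistical features is normalized by the following formula:
Figure PCTCN2020093392-appb-000010
where μ and σ are the mean and variance of the original data set, and S is the normalized data;
the original data set corresponding to the statistical features is log-transformed by the following formula:
$y = \log_c(1 + \lambda x)$;
where x is the original data, y is the data after the logarithmic transformation, λ is set to 1, and c is set to the maximum value of the transformed data.
Optionally, the following function is used as the likelihood function of the P-variate Gaussian distribution:
Figure PCTCN2020093392-appb-000011
where μ and Σ are the sample mean vector and sample covariance matrix corresponding to the P-variate Gaussian distribution, X_i is the i-th statistical-feature sample vector, n is the total number of statistical-feature sample vectors, L is the likelihood function, and f is the probability density function.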
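Optionally, the standardization module 20 includes:
a cleaning unit, configured to detect whether the raw data in the user access log records contains missing values and, if so, to calculate the missing-value ratio of each field and clean the missing values according to that ratio and the importance of the field, where missing-value cleaning includes deleting fields with missing values and completing missing values by interpolation;
a sorting unit, configured to sort the raw data in the user access log records and calculate the similarity between each sorted record and its adjacent records, and, if the similarity between records exceeds a preset threshold, to judge them to be duplicate records and delete the redundant data;
a transformation unit, configured to transform the cleaned data to generate standard user access data that meets statistical requirements, where the transformation includes one or more of data type conversion, logarithmic transformation, and data discretization.
Optionally, the user access log records include at least: user ID, user IP address, server IP address, access start time, access stay time, access end time, and access exception code; the statistical features include at least: user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed;
the statistical features use the following format: the user IP address and the server IP address together serve as the user ID, and the specific features include at least the user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed.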
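Optionally, the judgment module 60 is specifically configured to:
plot Gaussian distribution curves of the statistical features of the user access log records of the network traffic for each time dimension, and mark on each plot the contour line corresponding to each preset alarm threshold;
determine, based on the plot for the time dimension of the current network traffic, whether the Gaussian probability value of the current network traffic is less than the preset alarm threshold for that time dimension;
if the traffic data falls outside the contour line, determine that the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic.
Because the description is the same as for the embodiments of the abnormal traffic monitoring method above, the embodiments of the abnormal traffic monitoring device are not described again here.
This embodiment detects abnormal traffic by statistical probability analysis: it fits the distribution of the statistical features of user access records to a multivariate Gaussian distribution and detects abnormal traffic from the properties of that distribution. Only some data preprocessing and standardization, plus fitting the data to a Gaussian distribution, are needed to raise traffic alarms conveniently. The method is compact, involves no complex algorithms, and is easy to deploy, and its thresholds can be adjusted dynamically to the real-time characteristics of the traffic data. It thus avoids the inflexibility of rule-based alarms and reduces the high implementation cost of complex algorithms such as machine learning.
In addition, this embodiment deploys the user access log data separately for the day, week, and month dimensions, so that alarms are raised dynamically against different thresholds in different time dimensions. This reduces false alarms caused by differing time dimensions, and because multiple time dimensions are considered, the method can respond flexibly with real-time abnormal-traffic alarms across different time periods and business scenarios.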
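This application also provides a computer-readable storage medium, which may be non-volatile or volatile.
In this embodiment, an abnormal traffic monitoring program is stored on the computer-readable storage medium; when executed by a processor, the program implements the steps of the statistics-based abnormal traffic monitoring method described in any of the above embodiments. For the method implemented when the program is executed by the processor, refer to the embodiments of the statistics-based abnormal traffic monitoring method of this application; it is not repeated here.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. On this understanding, the part of the technical solution of this application that is essential, or that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM) and includes instructions that cause a terminal (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of this application.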

Claims (20)

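  1. A statistics-based abnormal traffic monitoring method, wherein the abnormal traffic monitoring method includes the following steps:
    collecting, based on preset tracking points, user access log records within a preset time period;
    cleaning and transforming the raw data in the user access log records to generate standard user access data that meets statistical requirements;
    sliding time windows corresponding to day, week, and month respectively, and computing the distribution of the statistical features of the standard user access data in each time dimension;
    mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters, to obtain the corresponding multivariate Gaussian density function;
    calculating, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;
    determining whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic;
    if the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, determining that the current network traffic is abnormal traffic.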
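  2. The statistics-based abnormal traffic monitoring method according to claim 1, wherein mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters, to obtain the corresponding multivariate Gaussian density function, includes:
    normalizing or transforming the statistical features in each time dimension, so as to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution;
    taking the data of each multivariate Gaussian distribution as samples, and solving for the mean estimator and covariance matrix estimator of each distribution by maximum likelihood estimation;
    generating the multivariate Gaussian density function of each distribution based on its mean estimator and covariance matrix estimator.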
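  3. The statistics-based abnormal traffic monitoring method according to claim 2, wherein the original data set corresponding to the statistical features is normalized by the following formula:
    Figure PCTCN2020093392-appb-100001
    where μ and σ are the mean and variance of the original data set, and S is the normalized data;
    the original data set corresponding to the statistical features is log-transformed by the following formula:
    $y = \log_c(1 + \lambda x)$;
    where x is the original data, y is the data after the logarithmic transformation, λ is set to 1, and c is set to the maximum value of the transformed data.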
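  4. The statistics-based abnormal traffic monitoring method according to claim 2 or 3, wherein the following function is used as the likelihood function of the P-variate Gaussian distribution:
    Figure PCTCN2020093392-appb-100002
    where μ and Σ are the sample mean vector and sample covariance matrix corresponding to the P-variate Gaussian distribution, X_i is the i-th statistical-feature sample vector, n is the total number of statistical-feature sample vectors, L is the likelihood function, and f is the probability density function.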
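  5. The statistics-based abnormal traffic monitoring method according to claim 1, wherein cleaning and transforming the raw data in the user access log records to generate standard user access data that meets statistical requirements includes:
    detecting whether the raw data in the user access log records contains missing values;
    if there are missing values, calculating the missing-value ratio of each field and cleaning the missing values according to that ratio and the importance of the field, where missing-value cleaning includes deleting fields with missing values and completing missing values by interpolation;
    sorting the raw data in the user access log records and calculating the similarity between each sorted record and its adjacent records;
    if the similarity between records exceeds a preset threshold, judging them to be duplicate records and deleting the redundant data;
    transforming the cleaned data to generate standard user access data that meets statistical requirements, where the transformation includes one or more of data type conversion, logarithmic transformation, and data discretization.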
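  6. The statistics-based abnormal traffic monitoring method according to claim 1, wherein the user access log records include at least: user ID, user IP address, server IP address, access start time, access stay time, access end time, and access exception code; the statistical features include at least: user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed;
    the statistical features use the following format: the user IP address and the server IP address together serve as the user ID, and the specific features include at least the user visit count, abnormal visit count, exception type, access time, and whether the user is newly added or removed.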
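  7. The statistics-based abnormal traffic monitoring method according to claim 1, 5, or 6, wherein determining whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic includes:
    plotting Gaussian distribution curves of the statistical features of the user access log records of the network traffic for each time dimension, and marking on each plot the contour line corresponding to each preset alarm threshold;
    determining, based on the plot for the time dimension of the current network traffic, whether the Gaussian probability value of the current network traffic is less than the preset alarm threshold for that time dimension;
    if the traffic data falls outside the contour line, determining that the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic.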
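  8. A statistics-based abnormal traffic monitoring device, wherein the abnormal traffic monitoring device includes:
    a collection module, configured to collect, based on preset tracking points, user access log records within a preset time period;
    a standardization module, configured to clean and transform the raw data in the user access log records to generate standard user access data that meets statistical requirements;
    a statistics module, configured to slide time windows corresponding to day, week, and month respectively, and compute the distribution of the statistical features of the standard user access data in each time dimension;
    a mapping module, configured to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimate its parameters, to obtain the corresponding multivariate Gaussian density function;
    a calculation module, configured to calculate, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;
    a judgment module, configured to determine whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, and, if it is, to determine that the current network traffic is abnormal traffic.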
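  9. The statistics-based abnormal traffic monitoring device according to claim 8, wherein the mapping module includes:
    a preprocessing unit, configured to normalize or transform the statistical features in each time dimension, so as to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution;
    an estimation unit, configured to take the data of each multivariate Gaussian distribution as samples and solve for the mean estimator and covariance matrix estimator of each distribution by maximum likelihood estimation;
    a generation unit, configured to generate the multivariate Gaussian density function of each distribution based on its mean estimator and covariance matrix estimator.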
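  10. The statistics-based abnormal traffic monitoring device according to claim 9, in which the original data set corresponding to the statistical features is normalized by the following formula:
    Figure PCTCN2020093392-appb-100003
    where μ and σ are the mean and variance of the original data set, and S is the normalized data;
    the original data set corresponding to the statistical features is log-transformed by the following formula:
    $y = \log_c(1 + \lambda x)$;
    where x is the original data, y is the data after the logarithmic transformation, λ is set to 1, and c is set to the maximum value of the transformed data.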
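  11. Statistics-based abnormal traffic monitoring equipment, which includes a memory, a processor, and an abnormal traffic monitoring program stored in the memory and executable on the processor, the program implementing a statistics-based abnormal traffic monitoring method when executed by the processor, wherein
    the abnormal traffic monitoring method includes:
    collecting, based on preset tracking points, user access log records within a preset time period;
    cleaning and transforming the raw data in the user access log records to generate standard user access data that meets statistical requirements;
    sliding time windows corresponding to day, week, and month respectively, and computing the distribution of the statistical features of the standard user access data in each time dimension;
    mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters, to obtain the corresponding multivariate Gaussian density function;
    calculating, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;
    determining whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic;
    if the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, determining that the current network traffic is abnormal traffic.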
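  12. The statistics-based abnormal traffic monitoring equipment according to claim 11, in which mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters, to obtain the corresponding multivariate Gaussian density function, includes:
    normalizing or transforming the statistical features in each time dimension, so as to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution;
    taking the data of each multivariate Gaussian distribution as samples, and solving for the mean estimator and covariance matrix estimator of each distribution by maximum likelihood estimation;
    generating the multivariate Gaussian density function of each distribution based on its mean estimator and covariance matrix estimator.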
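  13. The statistics-based abnormal traffic monitoring equipment according to claim 12, in which the original data set corresponding to the statistical features is normalized by the following formula:
    Figure PCTCN2020093392-appb-100004
    where μ and σ are the mean and variance of the original data set, and S is the normalized data;
    the original data set corresponding to the statistical features is log-transformed by the following formula:
    $y = \log_c(1 + \lambda x)$;
    where x is the original data, y is the data after the logarithmic transformation, λ is set to 1, and c is set to the maximum value of the transformed data.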
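  14. The statistics-based abnormal traffic monitoring equipment according to claim 12 or 13, in which the following function is used as the likelihood function of the P-variate Gaussian distribution:
    Figure PCTCN2020093392-appb-100005
    where μ and Σ are the sample mean vector and sample covariance matrix corresponding to the P-variate Gaussian distribution, X_i is the i-th statistical-feature sample vector, n is the total number of statistical-feature sample vectors, L is the likelihood function, and f is the probability density function.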
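  15. The statistics-based abnormal traffic monitoring equipment according to claim 11, in which cleaning and transforming the raw data in the user access log records to generate standard user access data that meets statistical requirements includes:
    detecting whether the raw data in the user access log records contains missing values;
    if there are missing values, calculating the missing-value ratio of each field and cleaning the missing values according to that ratio and the importance of the field, where missing-value cleaning includes deleting fields with missing values and completing missing values by interpolation;
    sorting the raw data in the user access log records and calculating the similarity between each sorted record and its adjacent records;
    if the similarity between records exceeds a preset threshold, judging them to be duplicate records and deleting the redundant data;
    transforming the cleaned data to generate standard user access data that meets statistical requirements, where the transformation includes one or more of data type conversion, logarithmic transformation, and data discretization.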
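  16. A computer-readable storage medium on which an abnormal traffic monitoring program is stored, the program implementing a statistics-based abnormal traffic monitoring method when executed by a processor, wherein the abnormal traffic monitoring method includes:
    collecting, based on preset tracking points, user access log records within a preset time period;
    cleaning and transforming the raw data in the user access log records to generate standard user access data that meets statistical requirements;
    sliding time windows corresponding to day, week, and month respectively, and computing the distribution of the statistical features of the standard user access data in each time dimension;
    mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters, to obtain the corresponding multivariate Gaussian density function;
    calculating, according to the multivariate Gaussian density function, the Gaussian probability value of the statistical features of the user access log records of the current network traffic in each time dimension;
    determining whether the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic;
    if the Gaussian probability value is less than the preset alarm threshold for the time dimension of the current network traffic, determining that the current network traffic is abnormal traffic.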
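  17. The computer-readable storage medium according to claim 16, in which mapping the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution and estimating its parameters, to obtain the corresponding multivariate Gaussian density function, includes:
    normalizing or transforming the statistical features in each time dimension, so as to map the distribution of the statistical features in each time dimension to a corresponding multivariate Gaussian distribution;
    taking the data of each multivariate Gaussian distribution as samples, and solving for the mean estimator and covariance matrix estimator of each distribution by maximum likelihood estimation;
    generating the multivariate Gaussian density function of each distribution based on its mean estimator and covariance matrix estimator.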
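  18. The computer-readable storage medium according to claim 17, in which the original data set corresponding to the statistical features is normalized by the following formula:
    Figure PCTCN2020093392-appb-100006
    where μ and σ are the mean and variance of the original data set, and S is the normalized data;
    the original data set corresponding to the statistical features is log-transformed by the following formula:
    $y = \log_c(1 + \lambda x)$;
    where x is the original data, y is the data after the logarithmic transformation, λ is set to 1, and c is set to the maximum value of the transformed data.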
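  19. The computer-readable storage medium according to claim 17 or 18, wherein the following function is used as the likelihood function of the P-variate Gaussian distribution:
    Figure PCTCN2020093392-appb-100007
    where μ and Σ are the sample mean vector and sample covariance matrix corresponding to the P-variate Gaussian distribution, X_i is the i-th statistical-feature sample vector, n is the total number of statistical-feature sample vectors, L is the likelihood function, and f is the probability density function.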
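  20. The computer-readable storage medium according to claim 16, wherein cleaning and transforming the raw data in the user access log records to generate standard user access data that meets statistical requirements includes:
    detecting whether the raw data in the user access log records contains missing values;
    if there are missing values, calculating the missing-value ratio of each field and cleaning the missing values according to that ratio and the importance of the field, where missing-value cleaning includes deleting fields with missing values and completing missing values by interpolation;
    sorting the raw data in the user access log records and calculating the similarity between each sorted record and its adjacent records;
    if the similarity between records exceeds a preset threshold, judging them to be duplicate records and deleting the redundant data;
    transforming the cleaned data to generate standard user access data that meets statistical requirements, where the transformation includes one or more of data type conversion, logarithmic transformation, and data discretization.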
PCT/CN2020/093392 2019-10-18 2020-05-29 Statistics-based abnormal traffic monitoring method, device, equipment, and storage medium WO2021073114A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910991150.6 2019-10-18
CN201910991150.6A CN110830450A (zh) 2019-10-18 2019-10-18 Statistics-based abnormal traffic monitoring method, device, equipment, and storage medium

Publications (1)

Publication Number Publication Date
WO2021073114A1 true WO2021073114A1 (zh) 2021-04-22

Family

ID=69549556

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093392 WO2021073114A1 (zh) 2019-10-18 2020-05-29 Statistics-based abnormal traffic monitoring method, device, equipment, and storage medium

Country Status (2)

Country Link
CN (1) CN110830450A (zh)
WO (1) WO2021073114A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
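CN113225339A (zh) * 2021-05-07 2021-08-06 恒安嘉新(北京)科技股份公司 Network security monitoring method and apparatus, computer device, and storage medium
CN115174358A (zh) * 2022-09-08 2022-10-11 浪潮电子信息产业股份有限公司 Monitoring and processing method, system, device, and storage medium for storage cluster interfaces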

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
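CN110830450A (zh) * 2019-10-18 2020-02-21 平安科技(深圳)有限公司 Statistics-based abnormal traffic monitoring method, device, equipment, and storage medium
CN111447228A (zh) * 2020-03-27 2020-07-24 四川虹美智能科技有限公司 Smart home appliance access request processing method and system, cloud server, and smart air conditioner
CN112148747A (zh) * 2020-09-08 2020-12-29 银清科技有限公司 R-language-based transaction system log analysis method and apparatus
CN112818244A (zh) * 2021-02-24 2021-05-18 深圳市网联安瑞网络科技有限公司 Method, system, and terminal for determining the activity level of channels, groups, and group users
CN113852591B (zh) * 2021-06-08 2023-09-22 天翼数字生活科技有限公司 Camera abnormal-access identification and alarm method based on an improved interquartile range method
CN113542236A (zh) * 2021-06-28 2021-10-22 中孚安全技术有限公司 Abnormal user detection method based on kernel density estimation and an exponential smoothing algorithm
CN113271322B (zh) * 2021-07-20 2021-11-23 北京明略软件系统有限公司 Abnormal traffic detection method and apparatus, electronic device, and storage medium
CN114079624B (zh) * 2022-01-18 2022-04-08 广东道一信息技术股份有限公司 Architecture data flow monitoring method and system based on multi-user access
CN114745328B (zh) * 2022-02-16 2023-12-26 多点生活(成都)科技有限公司 Gateway dynamic rate-limiting method and real-time rate-limiting method built on it
CN114692058B (zh) * 2022-06-01 2022-08-02 天津市普迅电力信息技术有限公司 Automated tracking-point method, system, and electronic device under the Vue architecture
CN115174254B (zh) * 2022-07-22 2023-10-31 科来网络技术股份有限公司 Traffic anomaly alarm method and apparatus, electronic device, and storage medium
TWI820973B 2022-10-18 2023-11-01 財團法人資訊工業策進會 Information security early-warning device and method
CN117527611B (zh) * 2023-11-07 2024-06-07 北京太极信息系统技术有限公司 Gaussian-distribution-based dynamic fault prediction method, system, electronic device, and storage medium
CN117290380B (zh) * 2023-11-14 2024-02-06 华青融天(北京)软件股份有限公司 Abnormal dimension data generation method, apparatus, device, and computer-readable medium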

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
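CN102014031A (zh) * 2010-12-31 2011-04-13 湖南神州祥网科技有限公司 Network traffic anomaly detection method and system
CN107086944A (zh) * 2017-06-22 2017-08-22 北京奇艺世纪科技有限公司 Anomaly detection method and apparatus
CN107154950A (zh) * 2017-07-24 2017-09-12 深信服科技股份有限公司 Log stream anomaly detection method and system
CN107302547A (zh) * 2017-08-21 2017-10-27 深信服科技股份有限公司 Web service anomaly detection method and apparatus
CN107370766A (zh) * 2017-09-07 2017-11-21 杭州安恒信息技术有限公司 Network traffic anomaly detection method and system
CN110830450A (zh) * 2019-10-18 2020-02-21 平安科技(深圳)有限公司 Statistics-based abnormal traffic monitoring method, device, equipment, and storage medium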

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
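CN105243301B (zh) * 2014-07-09 2019-01-18 阿里巴巴集团控股有限公司 Keyboard input anomaly detection method and apparatus, and security prompt method and apparatus
CN105516127B (zh) * 2015-12-07 2019-01-25 中国科学院信息工程研究所 User cross-domain behavior pattern mining method for insider threat detection
CN107222472A (zh) * 2017-05-26 2017-09-29 电子科技大学 User behavior anomaly detection method for Hadoop clusters
CN107168854B (zh) * 2017-06-01 2020-06-30 北京京东尚科信息技术有限公司 Internet advertisement abnormal click detection method, apparatus, device, and readable storage medium
CN107341095B (zh) * 2017-06-27 2020-07-28 北京优特捷信息技术有限公司 Method and apparatus for intelligent analysis of log data
CN107483455B (zh) * 2017-08-25 2020-07-14 国家计算机网络与信息安全管理中心 Flow-based network node anomaly detection method and system
CN108234463B (zh) * 2017-12-22 2021-02-02 杭州安恒信息技术股份有限公司 User risk assessment and analysis method based on a multi-dimensional behavior model
US11128648B2 (en) * 2018-01-02 2021-09-21 Maryam AMIRMAZLAGHANI Generalized likelihood ratio test (GLRT) based network intrusion detection system in wavelet domain
CN109614576B (zh) * 2018-12-11 2022-08-30 福建工程学院 Transformer anomaly detection method based on multi-dimensional Gaussian distribution and trend segmentation
CN109960631B (zh) * 2019-03-19 2020-01-03 山东九州信泰信息科技股份有限公司 Real-time detection method for security event anomalies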

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
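CN113225339A (zh) * 2021-05-07 2021-08-06 恒安嘉新(北京)科技股份公司 Network security monitoring method and apparatus, computer device, and storage medium
CN113225339B (zh) * 2021-05-07 2023-04-07 恒安嘉新(北京)科技股份公司 Network security monitoring method and apparatus, computer device, and storage medium
CN115174358A (zh) * 2022-09-08 2022-10-11 浪潮电子信息产业股份有限公司 Monitoring and processing method, system, device, and storage medium for storage cluster interfaces
CN115174358B (zh) * 2022-09-08 2023-01-17 浪潮电子信息产业股份有限公司 Monitoring and processing method, system, device, and storage medium for storage cluster interfaces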

Also Published As

Publication number Publication date
CN110830450A (zh) 2020-02-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20877464

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20877464

Country of ref document: EP

Kind code of ref document: A1