WO2021072887A1 - Abnormal traffic monitoring method, device, equipment, and storage medium - Google Patents

Abnormal traffic monitoring method, device, equipment, and storage medium

Info

Publication number
WO2021072887A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
wavelet coefficients
time series
coefficients
wavelet
Prior art date
Application number
PCT/CN2019/119204
Other languages
English (en)
French (fr)
Inventor
刘玉洁
杨冬艳
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021072887A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour

Definitions

  • This application relates to the field of network security technology, and in particular to an abnormal traffic monitoring method, device, equipment, and storage medium.
  • Abnormal network traffic monitoring has always been an important part of the information security field.
  • Abnormal network traffic refers to irregular and significant changes in traffic in the network.
  • Behind such changes there may be problems such as high-frequency operations, access at abnormal times, or abnormal files or access objects. Whatever the type of problem, the result may be degraded service quality that affects normal user access, as well as network security issues.
  • Traditional abnormal traffic monitoring is based on basic features such as the request volume, the length of traffic data packets, and the size of network traffic, as well as combinations of these features. This method is suitable for small-scale, simple network monitoring.
  • Today's network structures are not only large in scale, with many branches and business types, but also serve many complex application scenarios. For example, online shopping involves not only customer service communication and third-party payment, but also third-party logistics, subsequent after-sales service, supplier management, and so on. Traditional abnormal traffic monitoring is therefore not suited to the current complex network structures and application scenarios.
  • The main purpose of this application is to provide an abnormal traffic monitoring method, device, equipment and storage medium, aiming to solve the technical problem that existing network abnormal traffic monitoring methods cannot adapt to the current complex network structures and application scenarios.
  • the abnormal flow monitoring method includes the following steps:
  • Multi-scale wavelet decomposition is performed on the first flow time series data through a preset low-pass filter and a high-pass filter using a multi-resolution analysis algorithm to obtain wavelet coefficients corresponding to each layer of wavelet decomposition;
  • if the actual flow value is within the confidence interval of the predicted flow value, it is determined that the current network flow is normal; if the actual flow value exceeds the confidence interval of the predicted flow value, it is determined that the current network flow is abnormal.
  • the cleaning and statistical processing of the original data in the user access record to generate the first traffic time series data corresponding to the amount of access includes:
  • the missing value cleaning includes: deleting missing value fields and using interpolation to complete missing values;
  • using the wavelet coefficients of each layer as an analysis object, respectively establishing corresponding stationary time series models, and predicting the wavelet coefficients of each layer through the stationary time series model, and obtaining the predicted wavelet coefficients corresponding to each layer includes:
  • the difference operation is performed on the one or more layers of wavelet coefficients until the wavelet coefficients of every layer form a stationary time series;
  • the white noise detection is performed on the wavelet coefficients of each layer respectively;
  • the autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer are calculated respectively;
  • if the wavelet coefficients of each layer are suitable for the autoregressive moving average model, the order of the autoregressive moving average model to be constructed is determined based on the preset order criterion;
  • the wavelet coefficients of each layer are respectively predicted, and the predicted wavelet coefficients corresponding to each layer are obtained.
  • the determination of the suitable stationary time series model of the wavelet coefficients of each layer according to the respective autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer includes:
  • the method further includes:
  • if the wavelet coefficients of each layer are suitable for the autoregressive model, the order of the autoregressive model to be constructed is determined based on the preset order criterion;
  • the autoregressive models corresponding to the wavelet coefficients of each layer are respectively constructed;
  • the wavelet coefficients of each layer are respectively predicted, and the predicted wavelet coefficients corresponding to each layer are obtained.
  • the method further includes:
  • if the wavelet coefficients of each layer are suitable for the moving average model, the order of the moving average model to be constructed is determined based on the preset order criterion;
  • the wavelet coefficients of each layer are respectively predicted, and the predicted wavelet coefficients corresponding to each layer are obtained.
  • $cA_{j+1} = H\,cA_j$
  • $cD_{j+1} = G\,cA_j$
  • $j = 0, 1, \ldots, J-1$;
  • H and G are decomposition operators
  • H represents a low-pass filter
  • G represents a high-pass filter
  • $H^*$ and $G^*$ are the dual operators of the decomposition operators H and G, respectively
  • $cA_0$ represents the original signal data
  • $cA_j$ and $cD_j$ respectively represent the low-frequency signal part and the high-frequency signal part of the original signal data at a resolution of $2^{-j}$
  • J represents the maximum number of decomposition layers.
  • the present application also provides an abnormal flow monitoring device, the abnormal flow monitoring device includes:
  • the collection module is used to collect user visit records within a preset time period based on preset embedding points
  • the preprocessing module is configured to clean and statistically process the original data in the user access record, and generate first traffic time series data corresponding to the amount of visits, and the first flow time series data reflects the corresponding relationship between the amount of visits and time;
  • the decomposition module is used to perform multi-scale wavelet decomposition on the first flow time series data through a preset low-pass filter and high-pass filter using a multi-resolution analysis algorithm to obtain wavelet coefficients corresponding to each layer of wavelet decomposition;
  • the prediction module is used to take the wavelet coefficients of each layer as the analysis object, respectively establish corresponding stationary time series models, and predict the wavelet coefficients of each layer through the stationary time series model to obtain the predicted wavelet coefficients corresponding to each layer;
  • a reconstruction module configured to use inverse wavelet transform to perform wavelet reconstruction on the predicted wavelet coefficients corresponding to each layer to obtain second traffic time series data
  • the comparison module is configured to compare the actual flow value corresponding to the same time with the flow prediction value using the second flow time series data as the flow prediction value;
  • the judging module is used for judging that the current network flow is normal if the actual flow value is within the confidence interval of the predicted flow value; if the actual flow value exceeds the confidence interval of the predicted flow value, judging that the current network flow is abnormal.
  • the present application also provides an abnormal flow monitoring device that includes a memory, a processor, and computer-readable instructions stored in the memory and runnable on the processor. When the computer-readable instructions are executed by the processor, the steps of the abnormal flow monitoring method described in any one of the above are implemented.
  • the present application also provides a computer-readable storage medium on which computer-readable instructions are stored.
  • when the computer-readable instructions are executed by a processor, the steps of the abnormal flow monitoring method described in any one of the above are implemented.
  • This application performs flow data processing based on wavelet analysis to highlight the localized information of the flow.
  • the hidden laws and characteristics of the original signal can be found, that is, the predicted values of the wavelet coefficients of each layer are obtained, and the obtained predicted values are then reconstructed by wavelet reconstruction to obtain the predicted flow time series data.
  • the normal sequence and the abnormal sequence can be distinguished, and the abnormal flow can be identified and alarmed.
  • this application can not only remove the prediction misjudgments caused by noise in the flow time series data, but also retain both the time-domain and frequency-domain information in the signal.
  • the ARMA time series model is established to predict the wavelet coefficients. Wavelet reconstruction then yields a series of predicted traffic data that retains the periodic time-series characteristics, and different threshold ranges can be set according to the predicted traffic value and the business scenario to identify and alert on abnormal traffic.
  • FIG. 1 is a schematic structural diagram of an operating environment of an abnormal flow monitoring device involved in a scheme of an embodiment of this application;
  • FIG. 2 is a schematic flowchart of an embodiment of a method for monitoring abnormal traffic according to this application;
  • FIG. 3 is a detailed flowchart of an embodiment of step S20 in FIG. 2;
  • FIG. 4 is a detailed flowchart of an embodiment of step S40 in FIG. 2;
  • FIG. 5 is a schematic diagram of functional modules of an embodiment of an abnormal flow monitoring device according to the present application.
  • FIG. 6 is a schematic diagram of detailed functional modules of an embodiment of the preprocessing module 20 in FIG. 5;
  • FIG. 7 is a schematic diagram of detailed functional modules of an embodiment of the prediction module 40 in FIG. 5.
  • This application provides an abnormal flow monitoring device.
  • FIG. 1 is a schematic structural diagram of the operating environment of the abnormal flow monitoring device involved in the solution of the embodiment of the application.
  • the abnormal traffic monitoring device includes: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • the hardware structure of the abnormal flow monitoring device shown in FIG. 1 does not constitute a limitation on the abnormal flow monitoring device, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • a memory 1005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and computer-readable instructions.
  • the operating system is a program that manages and controls abnormal flow monitoring equipment and software resources, and supports the operation of computer-readable instructions and other software and/or programs.
  • the network interface 1004 is mainly used to access the network; the user interface 1003 is mainly used to detect confirmation commands, edit commands, and the like; and the processor 1001 can be used to call the computer-readable instructions stored in the memory 1005 and execute the operations of the following embodiments of the abnormal flow monitoring method.
  • FIG. 2 is a schematic flowchart of an embodiment of an abnormal flow monitoring method according to the present application.
  • the abnormal flow monitoring method includes the following steps:
  • Step S10 based on the preset buried point, collect user access records within a preset time period
  • preset tracking points (buried points), such as tracking points in a log database, are used to collect user access record data within a preset time period. To fit the visit-volume characteristics of the network traffic more realistically, it is preferable to collect user visit records over a period of at least one month.
  • the user access record includes user ID, user and server IP addresses, user access time, user stay time, user end access time, and other information.
  • Step S20 Perform cleaning and statistical processing on the original data in the user access record to generate first traffic time series data corresponding to the number of visits, where the first traffic time series data reflects the corresponding relationship between the number of visits and time;
  • the original data in the collected user access records is cleaned and statistically processed in advance, so as to generate traffic time series data corresponding to the amount of access.
  • Data cleaning refers to filtering out data that does not meet the requirements. There are three main categories: incomplete data, incorrect data, and duplicate data. Incomplete data is data in which some information that should be present is missing; such data needs to be eliminated or completed through interpolation. Incorrect data is data with format errors, such as an incorrect field format or an incorrect business meaning. Duplicate data needs to be eliminated. Data statistics refers to counting the number of system visits in different periods of time, so that time series data corresponding to the visit volume, that is, the first traffic time series data, can be obtained.
  • the traffic time series data corresponding to the visit volume is generated, that is, the time series data composed of the visit volume sets corresponding to different time points.
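The statistics step can be pictured with a minimal sketch. The hourly bucket size, the ISO-8601 timestamp format, and the function name `visits_per_hour` are illustrative assumptions, not details given in the application.

```python
from collections import Counter
from datetime import datetime

def visits_per_hour(access_times):
    """Count visits per hourly bucket and return (bucket_start, count) pairs in time order."""
    buckets = Counter(
        datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        for ts in access_times
    )
    return sorted(buckets.items())

# Hypothetical cleaned access times taken from user access records.
records = ["2019-10-01T08:05:00", "2019-10-01T08:40:12", "2019-10-01T09:01:30"]
series = visits_per_hour(records)
```

Hours with no visits do not appear in this output; a real pipeline would probably fill them with zero so that the series is evenly spaced.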
  • Step S30 using a multi-resolution analysis algorithm to perform multi-scale wavelet decomposition on the first flow time series data through a preset low-pass filter and a high-pass filter to obtain wavelet coefficients corresponding to each layer of wavelet decomposition;
  • Wavelet decomposition refers to the expansion of the original signal according to a family of wavelet functions, that is, the original signal is expressed as a linear combination of wavelet functions with different scales and different time shifts. The coefficient of each term is called a wavelet coefficient.
  • At a given scale, the linear combination of all wavelet functions with different time shifts is called the wavelet component of the signal at that scale.
  • the wavelet coefficients measure how similar the wavelet basis functions are to the original signal. Because network traffic data is discrete, the discrete wavelet transform is the appropriate choice. Also, because many wavelet functions are not orthogonal functions, the wavelet transform requires a scaling function; that is, the original signal function can be decomposed into a linear combination of scaling functions and wavelet functions, in which the scaling function produces the low-frequency part and the wavelet function produces the high-frequency part. The wavelet coefficients therefore include the detail coefficients corresponding to the high-frequency part of the flow time series data and the approximation coefficients corresponding to the low-frequency part.
  • the scaling function can be realized by a low-pass filter, and the wavelet function can be realized by a high-pass filter.
  • a filter bank constitutes the frame of the original signal decomposition.
  • the scaling function of the low-pass filter can be used as the generating function of the wavelet function and the scaling function of the next stage.
  • a multi-resolution analysis algorithm is used to decompose the original signal function into corresponding spaces by layer.
  • the multi-resolution analysis algorithm uses two filters constructed to divide the frequency band of the original signal. The algorithm is expressed as follows:
  • $cA_{j+1} = H\,cA_j$
  • $cD_{j+1} = G\,cA_j$
  • $j = 0, 1, \ldots, J-1$;
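As an illustration of the layer-by-layer decomposition, the following sketch uses the Haar wavelet, whose low-pass and high-pass filters are the simplest possible pair. The choice of Haar, the function names, and the requirement that the signal length be divisible by 2^J are assumptions made for brevity; the application only speaks of preset filters.

```python
import math

def haar_step(approx):
    """One decomposition layer: low-pass (H) and high-pass (G) filtering plus downsampling by 2."""
    s = math.sqrt(2.0)
    low = [(approx[i] + approx[i + 1]) / s for i in range(0, len(approx) - 1, 2)]
    high = [(approx[i] - approx[i + 1]) / s for i in range(0, len(approx) - 1, 2)]
    return low, high

def decompose(signal, levels):
    """Return (cA_J, [cD_1, ..., cD_J]) for a signal whose length is divisible by 2**levels."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)  # each layer filters the previous approximation
        details.append(detail)
    return approx, details

cA, cDs = decompose([4.0, 2.0, 5.0, 5.0], levels=2)
```

Each layer halves the number of approximation coefficients, so the detail coefficients `cDs[0]` describe the fastest fluctuations and `cA` the slowest trend.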
  • Step S40 using the wavelet coefficients of each layer as the analysis object, respectively establish corresponding stationary time series models, and predict the wavelet coefficients of each layer through the stationary time series model to obtain the predicted wavelet coefficients corresponding to each layer;
  • the wavelet coefficients (including the detail coefficients corresponding to the high-frequency part and the approximation coefficients corresponding to the low-frequency part) obtained by wavelet decomposition of the actual flow are used as actual data to initialize the stationary time series model (preferably the ARMA model).
  • Step S50 using inverse wavelet transform to perform wavelet reconstruction on the predicted wavelet coefficients corresponding to each layer to obtain second traffic time series data
  • inverse wavelet transform is used to perform wavelet reconstruction on the predicted wavelet coefficients of each layer, that is, the prediction results of the detail coefficients and approximation coefficients are superimposed (the original signal equals the superposition of the high-frequency part of each decomposition layer and the low-frequency part of the last layer), which finally yields the reconstructed flow time series data.
  • H and G are decomposition operators
  • H represents a low-pass filter
  • G represents a high-pass filter
  • $H^*$ and $G^*$ are the dual operators of the decomposition operators H and G, respectively
  • $cA_0$ represents the original signal data
  • $cA_j$ and $cD_j$ respectively represent the low-frequency signal part and the high-frequency signal part of the original signal data at a resolution of $2^{-j}$
  • J represents the maximum number of decomposition layers
  • $cD_1 = G\,cA_0$.
  • the approximate part and detail part of the original signal can be reconstructed by the above reconstruction algorithm.
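The reconstruction formula itself is not present in this extract. A common form, assumed here, rebuilds each layer from the next coarser one by applying the dual operators to the approximation and detail coefficients and summing the results. The sketch below does this for the Haar wavelet; the wavelet choice and function names are illustrative assumptions.

```python
import math

def inverse_haar_step(approx, detail):
    """Rebuild one finer layer from approximation and detail coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def reconstruct(approx, details):
    """details = [cD_1, ..., cD_J]; start from the coarsest layer and work back to the signal."""
    for detail in reversed(details):
        approx = inverse_haar_step(approx, detail)
    return approx

# Predicted coefficients would be passed in here; these values are illustrative.
rebuilt = reconstruct([8.0], [[math.sqrt(2.0), 0.0], [-2.0]])
```

With these coefficients the rebuilt series is approximately `[4.0, 2.0, 5.0, 5.0]`, up to floating-point rounding.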
  • Step S60 using the second flow time series data as the flow prediction value, and comparing the actual flow value corresponding to the same time with the flow prediction value;
  • after the reconstructed predicted flow time series data is obtained, it is compared with the actual flow data, and a difference threshold between the predicted value and the true value is set for abnormal flow judgment and alarm.
  • step S70 if the actual flow value is within the confidence interval of the predicted flow value, it is determined that the current network flow is normal; if the actual flow value exceeds the confidence interval of the predicted flow value, it is judged that the current network flow is abnormal.
  • a range above and below the predicted value can be set as a confidence interval, such as five thousandths; however, in actual business, the tolerances for upward and downward fluctuations may differ, for example one threshold for upward fluctuations (a surge in visits may be due to malicious visits or attacks, so the risk is higher) and a threshold of five thousandths for downward fluctuations (higher tolerance for reduced user visits).
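The judgment step reduces to a band check around the predicted value. In this sketch the function name, the default tolerances, and the assumption that the predicted value is non-negative are illustrative; the separate `up_tol` and `down_tol` parameters model the asymmetric tolerances described above.

```python
def classify_flow(actual, predicted, up_tol=0.005, down_tol=0.005):
    """Label a flow value by comparing it with a relative band around the prediction."""
    upper = predicted * (1 + up_tol)    # highest value still treated as normal
    lower = predicted * (1 - down_tol)  # lowest value still treated as normal
    return "normal" if lower <= actual <= upper else "abnormal"

status_small_rise = classify_flow(1004, 1000)
status_surge = classify_flow(1010, 1000)
status_tolerated_drop = classify_flow(990, 1000, down_tol=0.02)
```

Widening `down_tol` while keeping `up_tol` small makes the check stricter for surges than for drops.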
  • This embodiment performs flow data processing based on wavelet analysis to highlight the localized information of the flow.
  • the hidden laws and characteristics of the original signal can be found, that is, the predicted values of the wavelet coefficients of each layer are obtained.
  • normal sequences and abnormal sequences can be distinguished, and abnormal flow can be identified and alarmed.
  • this embodiment can not only remove the prediction misjudgments caused by noise in the flow time series data, but also retain both the time-domain and frequency-domain information in the signal.
  • an ARMA time series model is established to predict the wavelet coefficients.
  • the predicted series of traffic data can be obtained, and the time sequence characteristics of the period can be retained, and then different threshold ranges can be set according to the predicted traffic value and different business scenarios to identify and alert abnormal traffic.
  • FIG. 3 is a detailed flowchart of an embodiment of step S20 in FIG. 2. Based on the foregoing embodiment, in this embodiment, the foregoing step S20 further includes:
  • Step S201 detecting whether there are missing values in the original data in the user access record
  • the user access log uses multiple fields to record a variety of information, such as user ID, user and server IP addresses, user access time, user stay time, user end access time, access exceptions, access status, Anomaly type code and anomaly type description, etc. If there is a missing value in the corresponding field of a record, it is determined that there is a missing value in the record.
  • Step S202 If there are missing values, calculate the missing value ratio corresponding to each field, and perform missing value cleaning according to the missing value ratio and the importance of the field.
  • the missing value cleaning includes: deleting missing value fields and using interpolation to complete missing values
  • the proportion of missing values is calculated for each field; for example, if there are 100 user access records and 10 of them have a missing value in a given field, the proportion of missing values for that field is 10%.
  • different fields have different degrees of importance in actual application scenarios.
  • the user's IP address is more important than the server's IP address
  • the user's access time is more important than the user's stay time.
  • Different levels of importance of the fields use different cleaning strategies. For example, if the proportion of missing values is high and the importance of the field is low, the missing value field is directly deleted, and if the proportion of missing values is low and the importance of the field is high, then interpolation is used to complete the missing value.
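A minimal sketch of this strategy follows. The record layout (dictionaries with `None` for missing values), the 50% cut-off, the boolean importance flag, and the use of linear interpolation are all assumptions. The text only fixes two cases (high ratio with low importance, low ratio with high importance); this sketch interpolates in every other case.

```python
def missing_ratio(records, field):
    """Share of records whose value for `field` is missing (None)."""
    return sum(1 for r in records if r.get(field) is None) / len(records)

def clean_missing(records, field, important, max_ratio=0.5):
    """Drop an unimportant, mostly missing field; otherwise fill gaps by interpolation."""
    if missing_ratio(records, field) > max_ratio and not important:
        return [{k: v for k, v in r.items() if k != field} for r in records]
    cleaned = [dict(r) for r in records]
    known = [i for i, r in enumerate(records) if r.get(field) is not None]
    for i, r in enumerate(cleaned):
        if records[i].get(field) is not None or not known:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is not None and right is not None:
            lv, rv = records[left][field], records[right][field]
            r[field] = lv + (rv - lv) * (i - left) / (right - left)  # linear interpolation
        else:
            # At the edges, copy the nearest known value.
            r[field] = records[left if left is not None else right][field]
    return cleaned

stay_records = [{"stay": 10}, {"stay": None}, {"stay": 30}]
filled = clean_missing(stay_records, "stay", important=True)

note_records = [{"id": 1, "note": None}, {"id": 2, "note": None}, {"id": 3, "note": "x"}]
dropped = clean_missing(note_records, "note", important=False)
```

Interpolation only makes sense for numeric fields; categorical fields would need a different fill rule.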
  • Step S203 Sort the original data in the user access records, and calculate the similarity between each sorted record and adjacent records;
  • Step S204 if the similarity between different records exceeds a preset threshold, it is determined as a duplicate record and redundant data is deleted;
  • the data is further deduplicated. Specifically, all the original data in the user access records are first sorted, for example by the numerical value of a certain field such as the access time, and then the similarity between each sorted record and its adjacent records is calculated, for example with a field matching algorithm or the standardized Euclidean distance. If the similarity between different records exceeds a preset threshold (for example, 90%), they are determined to be duplicate records and the redundant data is deleted.
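The sort-then-compare idea can be sketched as follows. The similarity measure (share of fields with identical values), the field names, and the comparison against the most recently kept record rather than every neighbour are simplifying assumptions.

```python
def field_similarity(a, b):
    """Fraction of fields whose values match exactly (a simple field-matching score)."""
    keys = set(a) | set(b)
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)

def deduplicate(records, sort_key, threshold=0.9):
    """Sort by one field, then drop records too similar to the last record kept."""
    ordered = sorted(records, key=lambda r: r[sort_key])
    kept = []
    for rec in ordered:
        if kept and field_similarity(kept[-1], rec) > threshold:
            continue  # treated as a duplicate of the previously kept record
        kept.append(rec)
    return kept

# Hypothetical access records with an exact duplicate.
logs = [
    {"user": "u1", "ip": "10.0.0.1", "time": 100},
    {"user": "u2", "ip": "10.0.0.2", "time": 105},
    {"user": "u1", "ip": "10.0.0.1", "time": 100},
]
unique_logs = deduplicate(logs, sort_key="time")
```

Sorting first means that likely duplicates sit next to each other, so only neighbouring records need to be compared.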
  • Step S205 Perform statistics on the amount of access to the cleaned data in chronological order, and generate the first traffic time series data corresponding to the amount of access.
  • FIG. 4 is a detailed flowchart of an embodiment of step S40 in FIG. 2. Based on the foregoing embodiment, in this embodiment, the foregoing step S40 further includes:
  • Step S401 Perform stationarity detection on the wavelet coefficients of each layer to determine whether the wavelet coefficients of each layer are a stationary time series;
  • A time series is stationary if it meets the following requirements: (1) for any time t, its mean is constant; (2) for any times t and s, the correlation coefficient of the series depends only on the length of the interval between the two time points, not on where the interval starts.
  • Stationarity test methods include data graph, reverse order test, run test, unit root test, DF test, ADF test, etc.
  • Step S402 if the wavelet coefficients of one or more layers are non-stationary time series, the difference operation is performed on those wavelet coefficients until the wavelet coefficients of every layer are stationary time series;
  • after differencing, the resulting time series can be fitted with a corresponding stationary random process or model.
  • a stationary ARMA(p,q) model can be used as its corresponding model.
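The difference operation itself is simple. The sketch below shows first- and higher-order differencing; the stationarity test that decides when to stop (for example an ADF test) is left out, and the function name is an assumption.

```python
def difference(series, order=1):
    """Apply the difference operation `order` times: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

first_diff = difference([1, 4, 9, 16])
second_diff = difference([1, 4, 9, 16], order=2)
```

Each pass shortens the series by one value, and forecasts made on a differenced series have to be cumulatively summed back before wavelet reconstruction.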
  • step S403 if the wavelet coefficients of any layer are in a stationary time series, white noise detection is performed on the wavelet coefficients of each layer;
  • white noise detection verifies whether the sequence is white noise; if it is, the sequence consists entirely of random disturbances and cannot be predicted or used.
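The application does not name a specific white noise test. One widely used option, assumed here, is the Ljung-Box statistic, which is compared with a chi-square critical value. The function names are illustrative, and the critical value is passed in by the caller rather than computed.

```python
def autocorr(series, lag):
    """Sample autocorrelation at a given lag (the series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

def ljung_box_q(series, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k)."""
    n = len(series)
    return n * (n + 2) * sum(autocorr(series, k) ** 2 / (n - k) for k in range(1, max_lag + 1))

def looks_like_white_noise(series, max_lag, critical_value):
    """True if Q does not exceed the chi-square critical value chosen by the caller."""
    return ljung_box_q(series, max_lag) <= critical_value

alternating = [1.0, -1.0] * 20
q_value = ljung_box_q(alternating, max_lag=1)
is_noise = looks_like_white_noise(alternating, max_lag=1, critical_value=3.841)
```

The value 3.841 is the 5% chi-square critical value for one degree of freedom; a strongly alternating series exceeds it by a wide margin and is therefore not treated as white noise.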
  • step S404 if the wavelet coefficients of any layer are stationary and non-white noise time series, then the autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer are respectively calculated;
  • the correlation coefficient measures the linear correlation between two vectors; in a stationary time series $\{R_t\}$, the linear correlation between $R_t$ and $R_{t-i}$ is called the autocorrelation coefficient.
  • the partial autocorrelation coefficient evaluates the degree to which $R_{t-i}$ influences $R_t$ once the effect of the intermediate values is removed.
  • the specific calculation method is the same as that in the prior art, so it will not be described in detail.
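For readers who want to see the calculation, this sketch computes sample autocorrelation coefficients and derives the partial autocorrelation coefficients from them with the Durbin-Levinson recursion. The estimator choice and the function names are assumptions; library routines may use slightly different normalisations.

```python
def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / var
        for k in range(max_lag + 1)
    ]

def pacf(series, max_lag):
    """Partial autocorrelation coefficients for lags 1..max_lag (Durbin-Levinson)."""
    rho = acf(series, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new last coefficient.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

sample = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 12.0]
r = acf(sample, 2)
p = pacf(sample, 2)
```

At lag 1 the partial autocorrelation equals the autocorrelation, because there are no intermediate values to remove.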
  • Step S405 according to the respective autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer, respectively determine the suitable stationary time series model of the wavelet coefficients of each layer;
  • the autocorrelation coefficients and partial autocorrelation coefficients corresponding to the time series data are different, and the corresponding stationary time series models are also different.
  • if the partial autocorrelation coefficient of the stationary series is truncated and the autocorrelation coefficient is tailed, the sequence is suitable for the AR model; if the partial autocorrelation coefficient is tailed and the autocorrelation coefficient is truncated, the sequence is suitable for the MA model; if both the partial autocorrelation coefficient and the autocorrelation coefficient are tailed, the sequence is suitable for the ARMA model.
  • truncation refers to the property that the autocorrelation coefficient (ACF) or partial autocorrelation coefficient (PACF) of the time series is 0 after a certain order (such as the PACF of an AR model); tailing means that the ACF or PACF is not all 0 after a certain order (such as the ACF of an AR model).
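The truncation and tailing rule can be turned into a rough heuristic. In this sketch a coefficient counts as zero when it lies inside the band ±2/√n, and truncation is assumed when every coefficient beyond lag `max_order` is inside that band. The band, the `max_order` default, and the fallback to ARMA for unclear patterns are assumptions rather than rules from the application.

```python
import math

def cuts_off(coeffs, n, max_order=3):
    """True if every coefficient after lag `max_order` lies within the ±2/sqrt(n) band."""
    bound = 2.0 / math.sqrt(n)
    return all(abs(c) <= bound for c in coeffs[max_order:])

def choose_model(acf_vals, pacf_vals, n):
    """acf_vals and pacf_vals start at lag 1; n is the series length."""
    acf_cut, pacf_cut = cuts_off(acf_vals, n), cuts_off(pacf_vals, n)
    if pacf_cut and not acf_cut:
        return "AR"    # PACF truncated, ACF tailed
    if acf_cut and not pacf_cut:
        return "MA"    # ACF truncated, PACF tailed
    return "ARMA"      # both tailed, or the pattern is unclear

# Illustrative coefficient patterns for a series of length 100.
ar_like = choose_model([0.8, 0.64, 0.51, 0.41, 0.33, 0.26], [0.8, 0.05, 0.02, 0.01, 0.0, 0.01], 100)
ma_like = choose_model([0.5, 0.1, 0.05, 0.0, 0.02, 0.01], [0.5, -0.4, 0.35, -0.3, 0.25, -0.22], 100)
arma_like = choose_model([0.7, 0.5, 0.4, 0.3, 0.25, 0.22], [0.7, -0.4, 0.35, -0.3, 0.25, -0.22], 100)
```

In practice this judgment is usually made by inspecting plots, so a rule like this is only a starting point.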
  • step S406 if the wavelet coefficients of each layer are suitable for the autoregressive moving average model, the order of the autoregressive moving average model to be constructed is determined based on the preset order criterion;
  • the autoregressive moving average process has the characteristics of randomness, and it includes two different parts, namely, autoregressive and moving average. If p represents the upper limit of the order value of the former part (autoregressive order), and q represents the upper limit of the order value of the latter part (moving average order), then the autoregressive moving average process can be expressed as ARMA (p,q). In an embodiment, the following expression is preferably used to determine the autoregressive moving average model to be constructed, which is specifically as follows:
  • $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ are white noise
  • $\theta_1, \theta_2, \ldots, \theta_q$ are the parameters of the moving average model
  • $x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-p}$ represent the time series, and t is a positive integer.
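The expression these symbols belong to is missing from this extract. For orientation, the textbook form of an ARMA(p, q) model is given below. The autoregressive parameters $\varphi_1, \ldots, \varphi_p$ and the plus signs in front of the moving-average terms are assumptions, since some texts write those terms with minus signs.

```latex
x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_p x_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}
```

Setting q = 0 gives the autoregressive model AR(p), and setting p = 0 gives the moving average model MA(q).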
  • the order of the autoregressive moving average model to be constructed is determined based on the preset order criterion.
  • the autocorrelation and partial autocorrelation coefficients alone cannot determine the orders p and q precisely.
  • To determine p and q more accurately, they must be applied jointly with a commonly used order criterion, for example the AIC (Akaike Information Criterion).
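To show how an order criterion works, the sketch below ranks pure autoregressive orders by AIC. Restricting the search to AR models, estimating the innovation variance with the Levinson-Durbin recursion, and using the form AIC = n·ln(σ²) + 2p are all simplifying assumptions; selecting both p and q for an ARMA model would normally rely on a statistics library.

```python
import math

def ar_aic_table(series, max_p):
    """Return {p: AIC} for AR(p), p = 0..max_p."""
    n = len(series)
    mean = sum(series) / n
    gamma = [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / n
        for k in range(max_p + 1)
    ]
    sigma2, phi = gamma[0], []
    table = {0: n * math.log(sigma2)}
    for k in range(1, max_p + 1):
        kappa = (gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))) / sigma2
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= 1.0 - kappa ** 2           # innovation variance shrinks with each extra lag
        table[k] = n * math.log(sigma2) + 2 * k
    return table

aic = ar_aic_table([1.0, -1.0] * 20, max_p=2)
best_order = min(aic, key=aic.get)
```

The penalty term 2p stops the criterion from always preferring the largest order, which is why the alternating series here settles on order 1 rather than 2.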
  • Step S407 Perform parameter estimation on the autoregressive moving average model to be constructed to obtain model parameter values
  • the parameter values of the model need to be further calculated.
  • the moment estimation method is used for parameter estimation, or the approximate maximum likelihood estimation method is used for parameter estimation.
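As a small worked case of moment estimation, the Yule-Walker equations for an AR(2) model can be solved in closed form from the first two autocorrelation coefficients. The restriction to AR(2) and the function name are assumptions; parameter estimation for a full ARMA model involves more steps.

```python
def yule_walker_ar2(rho1, rho2):
    """Solve rho1 = phi1 + phi2*rho1 and rho2 = phi1*rho1 + phi2 for (phi1, phi2)."""
    denom = 1.0 - rho1 ** 2
    phi1 = rho1 * (1.0 - rho2) / denom
    phi2 = (rho2 - rho1 ** 2) / denom
    return phi1, phi2

# Autocorrelations implied by an AR(2) process with coefficients 0.5 and 0.3.
true_rho1 = 0.5 / (1.0 - 0.3)
true_rho2 = 0.5 * true_rho1 + 0.3
phi1_hat, phi2_hat = yule_walker_ar2(true_rho1, true_rho2)
```

Feeding in the theoretical autocorrelations recovers the original coefficients, which is a quick sanity check on the formulas.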
  • Step S408 based on the determined order and the model parameter values, respectively construct an autoregressive moving average model corresponding to each layer of wavelet coefficients;
  • Step S409, based on each constructed autoregressive moving average model, predict the wavelet coefficients of each layer to obtain the predicted wavelet coefficients corresponding to each layer.
  • once the order and the parameter values of the autoregressive moving average model have been determined, the autoregressive moving average model corresponding to the wavelet coefficients of each layer can be constructed.
  • the predicted values of the wavelet coefficients of each layer can then be calculated from the constructed autoregressive moving average model.
  • the autoregressive model and the moving average model are constructed in the same way as the autoregressive moving average model, so they are not described further.
  • FIG. 5 is a schematic diagram of functional modules of an embodiment of an abnormal flow monitoring device according to the present application.
  • the abnormal flow monitoring device includes:
  • the collection module 10 is used to collect user access records within a preset time period based on preset tracking points;
  • the preprocessing module 20 is configured to perform cleaning and statistical processing on the original data in the user access records to generate first traffic time series data corresponding to the visit count, where the first traffic time series data reflects the correspondence between visit count and time;
  • the decomposition module 30 is configured to perform multi-scale wavelet decomposition on the first traffic time series data through a preset low-pass filter and high-pass filter using a multi-resolution analysis algorithm to obtain the wavelet coefficients corresponding to each layer of wavelet decomposition;
  • the prediction module 40 is configured to take the wavelet coefficients of each layer as the analysis object, establish corresponding stationary time series models, and predict the wavelet coefficients of each layer through the stationary time series models to obtain the predicted wavelet coefficients corresponding to each layer;
  • the reconstruction module 50 is configured to use the inverse wavelet transform to perform wavelet reconstruction on the predicted wavelet coefficients corresponding to each layer to obtain second traffic time series data;
  • the comparison module 60 is configured to take the second traffic time series data as the predicted traffic value and compare the actual traffic value at the same time with the predicted traffic value;
  • the determining module 70 is configured to determine that the current network traffic is normal if the actual traffic value is within the confidence interval of the predicted traffic value, and to determine that the current network traffic is abnormal if the actual traffic value exceeds the confidence interval of the predicted traffic value.
  • This embodiment processes traffic data based on wavelet analysis to highlight the localized information of the traffic.
  • Multi-scale refinement of the signal reveals the patterns and characteristics hidden in the original signal, that is, it yields the predicted values of the wavelet coefficients of each layer.
  • Wavelet reconstruction of these predicted coefficients gives predicted traffic time series data, from which normal and abnormal sequences can be distinguished and abnormal traffic can be identified and alerted on.
  • This embodiment can both remove prediction misjudgments caused by noise in the traffic time series data and retain the time-domain and frequency-domain information in the signal.
  • After wavelet decomposition, an ARMA time series model is established to predict the wavelet coefficients.
  • Wavelet reconstruction then yields the predicted traffic series while retaining its time-series characteristics, and different threshold ranges can be set according to the predicted traffic value and the business scenario to identify and alert on abnormal traffic.
  • FIG. 6 is a schematic diagram of detailed functional modules of an embodiment of the preprocessing module 20 in FIG. 5. Based on the foregoing embodiment, in this embodiment, the foregoing preprocessing module 20 further includes:
  • the detection unit 201 is configured to detect whether there are missing values in the original data in the user access record
  • the cleaning unit 202 is configured to calculate the missing-value ratio of each field if there are missing values, and to clean the missing values according to the missing-value ratio and the importance of the field.
  • the missing-value cleaning includes deleting fields with missing values and filling in missing values by interpolation;
  • the sorting unit 203 is configured to sort the original data in the user access records, and calculate the similarity between each sorted record and adjacent records;
  • the judging unit 204 is configured to, if the similarity between different records exceeds a preset threshold, judge it as a duplicate record and delete redundant data;
  • the generating unit 205 is configured to count visits over the cleaned data in chronological order and generate the first traffic time series data corresponding to the visit count.
  • FIG. 7 is a schematic diagram of detailed functional modules of an embodiment of the prediction module 40 in FIG. 5.
  • the aforementioned prediction module 40 further includes:
  • the stationarity detection unit 401 is configured to perform stationarity detection on the wavelet coefficients of each layer to determine whether the wavelet coefficients of each layer are a stationary time series;
  • the difference operation unit 402 is configured to, if the wavelet coefficients of one or more layers are non-stationary time series, perform a difference operation on those layers until the wavelet coefficients of every layer are stationary time series;
  • the white noise detection unit 403 is configured to perform white noise detection on the wavelet coefficients of each layer if the wavelet coefficients of every layer are stationary time series;
  • the coefficient determining unit 404 is configured to calculate the autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer if the wavelet coefficients of every layer are stationary non-white-noise time series;
  • the model determining unit 405 is configured to determine the suitable stationary time series model of the wavelet coefficients of each layer according to the respective autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer;
  • the model construction unit 406 is used to determine the order of the autoregressive moving average model to be constructed based on the preset order criterion if the wavelet coefficients of each layer are suitable for the autoregressive moving average model; to perform parameter estimation on the autoregressive moving average model to be constructed to obtain model parameter values; and, based on the determined order and the model parameter values, to construct an autoregressive moving average model corresponding to each layer of wavelet coefficients;
  • the model prediction unit 407 is configured to predict the wavelet coefficients of each layer based on each constructed autoregressive moving average model and obtain the predicted wavelet coefficients corresponding to each layer.
  • This application also provides a computer-readable storage medium, where the computer-readable storage medium may be volatile or non-volatile, which is not specifically limited by this application.
  • computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by the processor, the steps of the abnormal flow monitoring method described in any of the above embodiments are implemented.
  • the method implemented when the computer-readable instruction is executed by the processor can refer to the various embodiments of the abnormal traffic monitoring method of the present application, so it will not be described in detail.

Abstract

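An abnormal traffic monitoring method, apparatus, device, and computer-readable storage medium. The method includes: collecting user access records; performing cleaning and statistical processing on the raw data to generate first traffic time series data; performing multi-scale wavelet decomposition on the first traffic time series data to obtain the wavelet coefficients of each layer; taking the wavelet coefficients of each layer as the analysis object, establishing corresponding stationary time series models and making predictions to obtain the predicted wavelet coefficients of each layer; performing wavelet reconstruction on the predicted wavelet coefficients of each layer to obtain second traffic time series data; taking the second traffic time series data as the predicted traffic value and comparing the actual traffic value at the same time with the predicted traffic value; and, if the actual traffic value is within the confidence interval of the predicted traffic value, determining that the current network traffic is normal, otherwise determining that the current network traffic is abnormal. The method improves the accuracy of abnormal traffic monitoring and reduces deployment cost.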

Description

异常流量监测方法、装置、设备及存储介质
本申请要求于2019年10月18日提交中国专利局、申请号为201910991177.5、发明名称为“异常流量监测方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在申请中。
技术领域
本申请涉及网络安全技术领域,尤其涉及一种异常流量监测方法、装置、设备及存储介质。
背景技术
随着信息时代的到来,网络异常流量监测一直是信息安全领域的重要一环。网络异常流量指网络中流量不规则地显著变化。针对网络流量在短时间内可能发生的突变异常,其背后可能存在高频操作、异常时段访问、文件异常或者访问对象异常等问题。无论是哪类问题都可能面临服务质量下降影响正常用户访问以及网络安全问题。
传统的异常流量监测是根据请求量、流量数据包的长度、网络流量大小等基本特征以及组合特征进行流量的异常监测,该方法适于小规模、简单网络的监测,而目前很多企业所使用的网络结构不仅规模庞大,比如分支机构非常多、业务类型也多,同时应用场景也非常的多且复杂,比如,网络购物的场景,不仅涉及到客服沟通、第三方支付,同时还涉及到第三方物流以及后续的售后服务、供应商管理等,应用场景非常复杂,因而传统的异常流量监测并不适合目前复杂的网络结构以及复杂的应用场景。
发明内容
本申请的主要目的在于提供一种异常流量监测方法、装置、设备及存储介质,旨在解决现有网络异常流量监测方式无法适应目前复杂的网络结构与应用场景的技术问题。
为实现上述目的,本申请提供一种异常流量监测方法,所述异常流量监测方法包括以下步骤:
基于预置埋点,收集预设时间段内的用户访问记录;
对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据,所述第一流量时序数据反映访问量和时间的对应关系;
通过预置的低通滤波器和高通滤波器,采用多分辨分析算法对所述第一流量时序数据进行多尺度小波分解,得到各层小波分解分别对应的小波系数;
以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数;
采用逆小波变换对各层对应的所述预测小波系数进行小波重构,得到第二流量时序数据;
以所述第二流量时序数据为流量预测值,将同一时间对应的实际流量值与流量预测值进行比对;
若所述实际流量值在所述流量预测值的置信区间内,则判定当前网络流量正常;若所述实际流量值超过所述流量预测值的置信区间,则判定当前网络流量异常。
可选地,所述对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据包括:
检测所述用户访问记录中的原始数据是否存在缺失值;
若存在缺失值,则计算每个字段对应的缺失值比例,并根据缺失值比例与字段重要程度进行缺失值清洗,所述缺失值清洗包括:删除缺失值字段、使用插值法补全缺失值;
对所述用户访问记录中的原始数据进行排序,并计算排序后的每条记录与相邻记录之间的相似度;
若不同记录之间的相似度超过预置阈值,则判定为重复记录并删除多余的数据;
对清洗后的数据进行按照时间顺序进行访问量统计,生成访问量对应的所述第一流量时序数据。
可选地,所述以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数包括:
分别对各层小波系数进行平稳性检测,以判断各层小波系数是否为平稳时间序列;
若存在一层或多层小波系数为非平稳时间序列,则对所述一层或多层小波系数进行差分运算,直到任一层小波系数均为平稳时间序列;
若任一层小波系数均为平稳时间序列,则分别对各层小波系数进行白噪声检测;
若任一层小波系数均为平稳非白噪声时间序列,则分别计算各层小波系数的自相关系数及偏自相关系数;
根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型;
若各层小波系数均适合自回归移动平均模型,则基于预置定阶准则,确定待构建的自回归移动平均模型的阶数;
对待构建的自回归移动平均模型进行参数估计,得到模型参数值;
基于确定的阶数及所述模型参数值,分别构建各层小波系数对应的自回归移动平均模型;
基于构建的各自回归移动平均模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
可选地,所述根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型包括:
判断各层小波系数各自对应的偏自相关系数是否拖尾以及判断各层小波系数各自对应的自相关系数是否截尾;
若各层小波系数各自对应的偏自相关系数均截尾、自相关系数均拖尾,则确定各层小波系数均适合自回归模型;
若各层小波系数各自对应的偏自相关系数均拖尾、自相关系数均截尾,则确定各层小波系数均适合移动平均模型;
若各层小波系数各自对应的偏自相关系数均拖尾、自相关系数均拖尾,则确定各层小波系数均适合自回归移动平均模型。
可选地,在所述根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型的步骤之后,还包括:
若各层小波系数均适合自回归模型,则基于所述预置定阶准则,确定待构建的自回归模型的阶数;
对待构建的自回归模型进行参数估计,得到模型参数值;
基于确定的阶数及通过参数估计得到的模型参数值,分别构建各层小波系数对应的自回归模型;
基于构建的各自回归模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
可选地,在所述根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型的步骤之后,还包括:
若各层小波系数均适合移动平均模型,则基于所述预置定阶准则,确定待构建的移动平均模型的阶数;
对待构建的移动平均模型进行参数估计,得到模型参数值;
基于确定的阶数及通过参数估计得到的模型参数值,分别构建各层小波系数对应的移动平均模型;
基于构建的各移动平均模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
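Optionally, the formula for multi-scale wavelet decomposition using the multi-resolution analysis algorithm is as follows:
$cA_{j+1} = H * cA_j,\quad cD_{j+1} = G * cD_j,\quad j = 1, 2, \ldots, J;$
The formula for wavelet reconstruction using the inverse wavelet transform is as follows:
$cA_{j-1} = H^{*} * cA_j + G^{*} * cD_j,\quad j = 1, 2, \ldots, J;$
where H and G are decomposition operators, H denotes the low-pass filter, G denotes the high-pass filter, $H^{*}$ and $G^{*}$ are the dual operators of the decomposition operators H and G respectively, $cA_0$ denotes the original signal data, $cA_j$ and $cD_j$ denote the low-frequency part and the high-frequency part of the original signal data at resolution $2^{-j}$, and J denotes the maximum number of decomposition layers.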
进一步地,为实现上述目的,本申请还提供一种异常流量监测装置,所述异常流量监测装置包括:
收集模块,用于基于预置埋点,收集预设时间段内的用户访问记录;
预处理模块,用于对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据,所述第一流量时序数据反映访问量和时间的对应关系;
分解模块,用于通过预置的低通滤波器和高通滤波器,采用多分辨分析算法对所述第一流量时序数据进行多尺度小波分解,得到各层小波分解分别对应的小波系数;
预测模块,用于以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数;
重构模块,用于采用逆小波变换对各层对应的所述预测小波系数进行小波重构,得到第二流量时序数据;
比对模块,用于以所述第二流量时序数据为流量预测值,将同一时间对应的实际流量值与流量预测值进行比对;
判定模块,用于若实际流量值在流量预测值的置信区间内,则判定当前网络流量正常;若实际流量值超过流量预测值的置信区间,则判定当前网络流量异常。
进一步地,为实现上述目的,本申请还提供一种异常流量监测设备,所述异常流量监测设备包括存储器、处理器以及存储在所述存储器上并可在所述处理器上运行的计算机可读指令,所述计算机可读指令被所述处理器执行时实现如上述任一项所述的异常流量监测方法的步骤。
进一步地,为实现上述目的,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如上述任一项所述的异常流量监测方法的步骤。
本申请基于小波分析进行流量数据处理以突出流量的局部化信息,对信号信息进行多尺度细化后可发现原信号所隐藏的规律和特点,也即得到各层小波系数的预测值,而后对得到的各个小波系数预测值进行小波重构,得到预测的流量时序数据,基于此预测的流量时序数据能够对正常序列和异常序列进行区分,进而可进行异常流量的识别和告警。本申请基于小波分析的优势,既可以去除流量时序数据中由于噪声引起的预测误判,又同时可以包含信号中的时域和频域信息,先通过小波分解,建立ARMA时序模型预测小波系数,而后通过小波重构,可得到预测的系列流量数据,并保留期时序特征,进而可根据流量预测值以及不同的业务场景设定不同的阈值范围,进行异常流量的识别和告警。
附图说明
图1为本申请实施例方案涉及的异常流量监测设备运行环境的结构示意图;
图2为本申请异常流量监测方法一实施例的流程示意图;
图3为图2中步骤S20一实施例的细化流程示意图;
图4为图2中步骤S40一实施例的细化流程示意图;
图5为本申请异常流量监测装置一实施例的功能模块示意图;
图6为图5中预处理模块20一实施例的细化功能模块示意图;
图7为图5中预测模块40一实施例的细化功能模块示意图。
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
具体实施方式
应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。
本申请提供一种异常流量监测设备。
参照图1,图1为本申请实施例方案涉及的异常流量监测设备运行环境的结构示意图。
如图1所示,该异常流量监测设备包括:处理器1001,例如CPU,通信总线1002、用户接口1003,网络接口1004,存储器1005。其中,通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display)、输入单元比如键盘(Keyboard),网络接口1004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1005可以是高速RAM存储器,也可以是稳定的存储器(non-volatile memory),例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储装置。
本领域技术人员可以理解,图1中示出的异常流量监测设备的硬件结构并不构成对异常流量监测设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
如图1所示,作为一种计算机可读存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及计算机可读指令。其中,操作系统是管理和控制异常流量监测设备和软件资源的程序,支持计算机可读指令以及其它软件和/或程序的运行。
在图1所示的异常流量监测设备的硬件结构中,网络接口1004主要用于接入网络;用户接口1003主要用于侦测确认指令和编辑指令等,而处理器1001可以用于调用存储器1005中存储的计算机可读指令,并执行以下异常流量监测方法的各实施例的操作。
基于上述异常流量监测设备硬件结构,提出本申请异常流量监测方法的各个实施例。
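Referring to FIG. 2, FIG. 2 is a schematic flowchart of an embodiment of the abnormal traffic monitoring method of the present application. In this embodiment, the abnormal traffic monitoring method includes the following steps:
Step S10: based on preset tracking points, collect user access records within a preset time period;
Network traffic is usually reflected in the visit count, so this embodiment needs user access records that carry visit-count information. In this embodiment, preset tracking points, for example tracking points placed in the log database, are used to collect user access record data within the preset time period. To fit the visit-count characteristics of network traffic more faithfully, user access records covering a period of at least one month are preferably collected.
Optionally, in an embodiment, a user access record contains information such as the user ID, the IP addresses of the user and of the server, the user's access time, the user's dwell time, and the time at which the user ended the visit.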
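Step S20: perform cleaning and statistical processing on the raw data in the user access records to generate first traffic time series data corresponding to the visit count, where the first traffic time series data reflects the correspondence between visit count and time;
In this embodiment, to facilitate subsequent processing, the raw data in the collected user access records is cleaned and statistically processed in advance to generate the traffic time series data corresponding to the visit count.
Data cleaning means filtering out data that does not meet requirements, which falls into three main categories: incomplete data, erroneous data, and duplicate data. Incomplete data is data in which some required information is missing; such data must be discarded or completed by interpolation. Erroneous data is data whose format is incorrect, for example an incorrect field format or an incorrect business meaning. Duplicate data must be discarded. Data statistics means counting the number of system visits in different time periods, which yields the time series data corresponding to the visit count, namely the first traffic time series data.
In this embodiment, cleaning and statistically processing the raw data in the user access records produces the traffic time series data corresponding to the visit count, that is, the time series formed by the visit counts at different time points.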
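Step S30: using a preset low-pass filter and high-pass filter, perform multi-scale wavelet decomposition on the first traffic time series data with a multi-resolution analysis algorithm to obtain the wavelet coefficients corresponding to each layer of the wavelet decomposition;
Wavelet decomposition expands the original signal over a family of wavelet functions, that is, it represents the original signal as a linear combination of wavelet functions at different scales and time shifts. The coefficient of each term is called a wavelet coefficient, and the linear combination of all wavelet functions with different time shifts at the same scale is called the wavelet component of the signal at that scale.
A wavelet coefficient measures the similarity between a wavelet basis function and the original signal. Because network traffic data is discrete, only the discrete wavelet transform is applicable. In addition, because many wavelet functions are not orthogonal, the wavelet transform requires a scaling function; that is, the original signal function can be decomposed into a linear combination of scaling functions and wavelet functions. In this combination the scaling function produces the low-frequency part and the wavelet function produces the high-frequency part, so the wavelet coefficients comprise the detail coefficients corresponding to the high-frequency part of the traffic time series data and the approximation coefficients corresponding to the low-frequency part. The scaling function can be implemented by a low-pass filter and the wavelet function by a high-pass filter. Such a filter bank forms the framework for decomposing the original signal. The scaling function of the low-pass filter can serve as the mother function of the wavelet function and scaling function at the next level.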
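This embodiment uses a multi-resolution analysis algorithm to decompose the original signal function layer by layer into the corresponding spaces. In an embodiment, the multi-resolution analysis algorithm uses the two constructed filters to divide the original signal into frequency bands. The algorithm is expressed as follows:
$cA_{j+1} = H * cA_j,\quad cD_{j+1} = G * cD_j,\quad j = 1, 2, \ldots, J;$
where H and G are decomposition operators, H denotes the low-pass filter, G denotes the high-pass filter, $cA_0$ denotes the original signal data, $cA_j$ and $cD_j$ denote the low-frequency part and the high-frequency part of the original signal data at resolution $2^{-j}$, J denotes the maximum number of decomposition layers, and $cD_1 = G * cA_0$.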
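The layer-by-layer decomposition can be sketched as follows. This is a minimal illustration, not the implementation of the application: the Haar filter pair, the function name `decompose`, and the sample data are assumptions, since the text does not name a specific wavelet. The sketch follows the standard Mallat recursion, in which the detail coefficients of each layer are computed from the approximation coefficients of the previous layer, consistent with $cD_1 = G * cA_0$.

```python
import numpy as np

# Haar analysis filters (an assumption; the source does not name a wavelet).
H = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass filter
G = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass filter


def decompose(signal, levels):
    """Return (cA_J, [cD_1, ..., cD_J]); len(signal) must be divisible by 2**levels."""
    cA = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        pairs = cA.reshape(-1, 2)   # non-overlapping pairs: filtering plus downsampling by 2
        details.append(pairs @ G)   # high-frequency (detail) part of the current approximation
        cA = pairs @ H              # low-frequency (approximation) part, input to the next layer
    return cA, details


# Hypothetical visit counts at eight consecutive time points.
cA2, (cD1, cD2) = decompose([4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0], levels=2)
```

Each layer halves the series length, so a two-layer decomposition of eight samples yields two approximation coefficients plus four and two detail coefficients.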
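Step S40: taking the wavelet coefficients of each layer as the analysis object, establish corresponding stationary time series models, and predict the wavelet coefficients of each layer with the stationary time series models to obtain the predicted wavelet coefficients corresponding to each layer;
In this embodiment, the wavelet coefficients obtained by wavelet decomposition of the actual traffic (including the detail coefficients of the high-frequency part and the approximation coefficients of the low-frequency part) are used as actual data to initialize the stationary time series model (preferably an ARMA model). Before the stationary time series model is used, its parameters must first be calculated; the wavelet coefficients of each layer are then predicted.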
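Step S50: use the inverse wavelet transform to perform wavelet reconstruction on the predicted wavelet coefficients corresponding to each layer to obtain second traffic time series data;
This embodiment applies the inverse wavelet transform to the predicted wavelet coefficients of each layer, that is, it superimposes the prediction results of the detail coefficients and approximation coefficients of each layer (the original signal equals the superposition of the high-frequency parts of all decomposition layers and the low-frequency part of the last layer), and finally obtains the reconstructed traffic time series data.
It should be noted that if $cA_j$ and $cD_j$ are known, the wavelet decomposition process can be inverted to reconstruct the approximation part and the detail part of the original signal, yielding new traffic time series data of the same length. In an embodiment, the inverse wavelet transform is preferably used for wavelet reconstruction, with the following formula:
$cA_{j-1} = H^{*} * cA_j + G^{*} * cD_j,\quad j = 1, 2, \ldots, J;$
where H and G are decomposition operators, H denotes the low-pass filter, G denotes the high-pass filter, $H^{*}$ and $G^{*}$ are the dual operators of the decomposition operators H and G respectively, $cA_0$ denotes the original signal data, $cA_j$ and $cD_j$ denote the low-frequency part and the high-frequency part of the original signal data at resolution $2^{-j}$, J denotes the maximum number of decomposition layers, and $cD_1 = G * cA_0$.
The above reconstruction algorithm reconstructs the approximation part and the detail part of the original signal.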
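The reconstruction recursion can be illustrated with a short sketch. It assumes the Haar wavelet (the text does not specify one), for which the dual operators reduce to a sum and a difference of the two coefficient sequences; the function name `reconstruct` and the sample coefficients are hypothetical.

```python
import numpy as np


def reconstruct(cA, details):
    """Invert a Haar decomposition; details = [cD_1, ..., cD_J] and cA = cA_J."""
    cA = np.asarray(cA, dtype=float)
    for cD in reversed(details):               # j = J, J-1, ..., 1
        cD = np.asarray(cD, dtype=float)
        out = np.empty(2 * cA.size)
        out[0::2] = (cA + cD) / np.sqrt(2.0)   # even-indexed samples of cA_{j-1}
        out[1::2] = (cA - cD) / np.sqrt(2.0)   # odd-indexed samples of cA_{j-1}
        cA = out
    return cA


s = np.sqrt(2.0)
# Hypothetical coefficients of a two-layer decomposition of an eight-point series.
signal = reconstruct([16.0, 9.0], [[-s, -s, 0.0, s], [-6.0, 7.0]])
```

In the method described here, the inputs would be the predicted coefficients of each layer, so the output is the predicted traffic series of the same length as the original.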
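Step S60: take the second traffic time series data as the predicted traffic value, and compare the actual traffic value at the same time with the predicted traffic value;
In this embodiment, after the reconstructed predicted traffic time series data is obtained, it is compared with the actual traffic data, and a threshold on the difference between predicted and actual values is set for judging and alerting on abnormal traffic.
Step S70: if the actual traffic value is within the confidence interval of the predicted traffic value, determine that the current network traffic is normal; if the actual traffic value exceeds the confidence interval of the predicted traffic value, determine that the current network traffic is abnormal.
In this embodiment, a range above and below the predicted value can be set as the confidence interval, for example five per thousand (0.5%). In actual business, however, the tolerance for upward and downward fluctuations may differ; for example, the threshold may be one per thousand for upward fluctuation (a surge in visits may stem from malicious access or an attack and carries higher risk) and five per thousand for downward fluctuation (a drop in user visits is tolerated more readily).
In this embodiment, the traffic time series data corresponding to the same time are compared; if the actual traffic value is within the confidence interval of the predicted traffic value, the current network traffic is determined to be normal; if the actual traffic value exceeds the confidence interval of the predicted traffic value, the current network traffic is determined to be abnormal.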
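The asymmetric tolerance band from the example (0.1% upward, 0.5% downward) can be sketched as a simple check. The function name `is_abnormal` and its parameter names are hypothetical, and the sketch assumes a positive predicted value.

```python
def is_abnormal(actual, predicted, up_tol=0.001, down_tol=0.005):
    """Return True when `actual` falls outside the band around `predicted`.

    up_tol and down_tol are relative tolerances; the defaults mirror the
    one-per-thousand / five-per-thousand example. Assumes predicted > 0.
    """
    upper = predicted * (1.0 + up_tol)
    lower = predicted * (1.0 - down_tol)
    return not (lower <= actual <= upper)
```

Keeping the two tolerances as separate parameters lets each business scenario tune its sensitivity to surges and drops independently.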
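This embodiment processes traffic data based on wavelet analysis to highlight the localized information of the traffic. Multi-scale refinement of the signal reveals the patterns and characteristics hidden in the original signal, that is, it yields the predicted values of the wavelet coefficients of each layer. Wavelet reconstruction is then performed on the predicted wavelet coefficients to obtain the predicted traffic time series data, which makes it possible to distinguish normal sequences from abnormal ones and thus to identify and alert on abnormal traffic. Drawing on the advantages of wavelet analysis, this embodiment both removes prediction misjudgments caused by noise in the traffic time series data and retains the time-domain and frequency-domain information in the signal: wavelet decomposition is performed first, an ARMA time series model is established to predict the wavelet coefficients, and wavelet reconstruction then yields the predicted traffic series while retaining its time-series characteristics. Different threshold ranges can then be set according to the predicted traffic value and the business scenario to identify and alert on abnormal traffic.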
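Referring to FIG. 3, FIG. 3 is a detailed schematic flowchart of an embodiment of step S20 in FIG. 2. Based on the foregoing embodiment, in this embodiment step S20 further includes:
Step S201: detect whether the raw data in the user access records contains missing values;
In this embodiment, the user access log uses multiple fields to record various kinds of information, such as the user ID, the IP addresses of the user and of the server, the user's access time, dwell time, and end-of-visit time, access exceptions, access status, exception type code, and exception type description. If a value is missing from a field of a record, the record is determined to contain a missing value.
Step S202: if missing values exist, calculate the missing-value ratio of each field, and clean the missing values according to the missing-value ratio and the importance of the field, where the missing-value cleaning includes deleting fields with missing values and filling in missing values by interpolation;
In this embodiment, if one or more fields in the user access records have missing values, the missing-value ratio of each field is calculated. For example, given 100 user access records, if 10 records lack a value for a certain field, the missing-value ratio of that field is 10%.
In this embodiment, different fields have different importance in practical scenarios. For example, the user IP address is more important than the server IP address, and the user's access time is more important than the dwell time. Fields of different importance use different cleaning strategies. For example, if the missing-value ratio is high and the field importance is low, the field with missing values is deleted directly, whereas if the ratio is low and the importance is high, the missing values are filled in by interpolation.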
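A minimal sketch of this cleaning policy follows. The function names, the `drop_ratio` threshold, and the rule for the two combinations the text leaves open (keep and interpolate any field that is either important or below the threshold) are assumptions; the sketch also assumes that kept fields are numeric so that linear interpolation is meaningful.

```python
def missing_ratios(records, fields):
    """Share of records in which each field is missing (None)."""
    n = len(records)
    return {f: sum(r.get(f) is None for r in records) / n for f in fields}


def interpolate(values):
    """Fill None gaps linearly between the nearest known neighbours; edges copy the nearest value."""
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    if not known:
        return out
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


def clean(records, fields, important, drop_ratio=0.5):
    """Drop unimportant fields with a high missing ratio; interpolate the rest."""
    ratios = missing_ratios(records, fields)
    kept = [f for f in fields if f in important or ratios[f] < drop_ratio]
    columns = {f: interpolate([r.get(f) for r in records]) for f in kept}
    return [{f: columns[f][i] for f in kept} for i in range(len(records))]


# Hypothetical records: "dwell" is important, "note" is mostly missing.
rows = [{"dwell": 10.0, "note": None}, {"dwell": None, "note": None}, {"dwell": 30.0, "note": "x"}]
cleaned = clean(rows, ["dwell", "note"], important={"dwell"})
```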
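Step S203: sort the raw data in the user access records, and calculate the similarity between each sorted record and its adjacent records;
Step S204: if the similarity between different records exceeds a preset threshold, determine that they are duplicate records and delete the redundant data;
In this embodiment, duplicate records are further removed. Specifically, all raw data in the user access records is first sorted, for example by the value of a certain field such as the access time; the similarity between each sorted record and its adjacent records is then calculated, for example with a field-matching algorithm or the standardized Euclidean distance. If the similarity between different records exceeds a preset threshold (for example 90%), they are determined to be duplicate records and the redundant data is deleted.
Step S205: count visits over the cleaned data in chronological order to generate the first traffic time series data corresponding to the visit count.
In this embodiment, to obtain the time series data of the visit count, the cleaned data is further processed statistically. For example, if time point 1 has three IP addresses (IP1, IP2, and IP3), then corresponding to that time point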
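The sort-then-compare deduplication can be sketched as below. The similarity measure used here (share of exactly matching fields) is only one simple form of field matching, chosen as an assumption; the function and field names are hypothetical.

```python
def field_similarity(a, b, fields):
    """Share of fields whose values match exactly (a simple field-matching score)."""
    return sum(a[f] == b[f] for f in fields) / len(fields)


def deduplicate(records, sort_key, fields, threshold=0.9):
    """Sort by one field, then drop records too similar to the previously kept neighbour."""
    ordered = sorted(records, key=lambda r: r[sort_key])
    kept = []
    for rec in ordered:
        if kept and field_similarity(kept[-1], rec, fields) > threshold:
            continue  # near-identical to the previous kept record: treat as a duplicate
        kept.append(rec)
    return kept


# Hypothetical access records; the first and third are identical.
logs = [
    {"user": "u1", "ip": "10.0.0.1", "time": 2},
    {"user": "u2", "ip": "10.0.0.2", "time": 1},
    {"user": "u1", "ip": "10.0.0.1", "time": 2},
]
unique = deduplicate(logs, "time", ["user", "ip", "time"])
```

Sorting first means that only adjacent records need to be compared, which avoids comparing every pair of records.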
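One way to turn cleaned records into a visit-count time series is to bucket the access timestamps. This is a sketch under assumptions not stated in the text: each record counts as one visit, timestamps are in seconds, and the bucket width and function name are hypothetical.

```python
from collections import Counter


def visits_per_bucket(timestamps, bucket_seconds=60):
    """Count records per fixed-width time bucket, filling empty buckets with 0."""
    counts = Counter(int(ts) // bucket_seconds for ts in timestamps)
    first, last = min(counts), max(counts)
    return [counts.get(b, 0) for b in range(first, last + 1)]


# Hypothetical access times in seconds.
series = visits_per_bucket([0, 10, 59, 60, 185])
```

Filling empty buckets with zeros keeps the series evenly spaced, which the later decomposition and time series modelling steps rely on.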
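Referring to FIG. 4, FIG. 4 is a detailed schematic flowchart of an embodiment of step S40 in FIG. 2. Based on the foregoing embodiment, in this embodiment step S40 further includes:
Step S401: perform stationarity detection on the wavelet coefficients of each layer to determine whether the wavelet coefficients of each layer are a stationary time series;
A time series is a stationary time series if it meets the following requirements: (1) for any time t, its mean is a constant; (2) for any times t and s, the correlation coefficient of the series depends only on the interval between the two time points and is unaffected by where they start.
In this embodiment, stationarity detection is performed on the data to determine whether the original data series contains a stochastic trend or a deterministic trend. Methods for testing stationarity include data plots, the reverse-order test, the runs test, the unit root test, the DF test, and the ADF test.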
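Step S402: if the wavelet coefficients of one or more layers are non-stationary time series, perform a difference operation on those layers until the wavelet coefficients of every layer are stationary time series;
In practice, an input time series often turns out to be non-stationary when tested, so that a stationary time series model cannot be applied; the usual remedy is to transform it into a stationary series by differencing.
After differencing, if the time series tests as stationary, the differenced series is processed and the corresponding stationary random process or model can be established. When a non-stationary time series becomes stationary after d rounds of differencing, a stationary ARMA(p,q) model can be used as its model.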
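The d-fold differencing can be written in a few lines. The wrapper name `difference` is hypothetical; it relies on NumPy's `np.diff`, whose `n` argument applies the first difference repeatedly.

```python
import numpy as np


def difference(series, d=1):
    """Apply first-order differencing d times; the result is d samples shorter."""
    return np.diff(np.asarray(series, dtype=float), n=d)


# A quadratic trend becomes constant after two rounds of differencing.
flattened = difference([1.0, 4.0, 9.0, 16.0, 25.0], d=2)
```

Forecasts made on a differenced series have to be integrated back (cumulatively summed) d times before they are comparable with the original coefficients.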
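Step S403: if the wavelet coefficients of every layer are stationary time series, perform white noise detection on the wavelet coefficients of each layer;
In this embodiment, the white noise test verifies whether the series is white noise. If it is, the series consists entirely of random disturbances and cannot be predicted or used.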
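The text does not name a specific white noise test. As one common choice, and purely as an assumption, the Ljung-Box statistic can be sketched as follows; the function name is hypothetical. Under the white noise hypothesis the statistic is approximately chi-square distributed with `max_lag` degrees of freedom, so a value above the chosen critical value (3.841 at the 5% level for one lag) suggests the series is not white noise.

```python
import numpy as np


def ljung_box_q(series, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_k rho_k^2 / (n - k) over lags 1..max_lag."""
    x = np.asarray(series, dtype=float)
    n = x.size
    x = x - x.mean()
    denom = np.dot(x, x)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.dot(x[k:], x[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# A strictly alternating series is strongly autocorrelated, so Q is large.
q_stat = ljung_box_q([1.0, -1.0] * 10, max_lag=1)
```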
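Step S404: if the wavelet coefficients of every layer are stationary non-white-noise time series, calculate the autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer;
The correlation coefficient measures the linear correlation between two vectors; in a stationary time series $\{R_t\}$, the linear correlation between $R_t$ and $R_{t-i}$ is called the autocorrelation coefficient. The partial autocorrelation coefficient evaluates the degree to which $R_{t-i}$ influences $R_t$. The calculation is the same as in the prior art and is not described further.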
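Since the calculation is left to the prior art, the following sketch shows one standard way to obtain both quantities: the sample autocorrelation, and the partial autocorrelation derived from it with the Durbin-Levinson recursion. The function names are hypothetical.

```python
import numpy as np


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[k:], x[:-k]) / denom for k in range(1, max_lag + 1)])


def pacf_from_acf(rho):
    """Partial autocorrelations for lags 0..len(rho)-1 via the Durbin-Levinson recursion."""
    m = len(rho) - 1
    phi = np.zeros((m + 1, m + 1))
    out = [1.0]
    for k in range(1, m + 1):
        num = rho[k] - sum(phi[k - 1, j] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1, j] * rho[j] for j in range(1, k))
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        out.append(phi[k, k])
    return np.array(out)


# Theoretical ACF of an AR(1) process with coefficient 0.5: the PACF vanishes after lag 1.
partial = pacf_from_acf([1.0, 0.5, 0.25, 0.125])
```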
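Step S405: according to the autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer, determine the stationary time series model suitable for the wavelet coefficients of each layer;
In this embodiment, time series data with different autocorrelation and partial autocorrelation coefficients correspond to different stationary time series models.
Optionally, in a specific embodiment, the stationary time series model suitable for the wavelet coefficients of each layer is determined by judging whether the partial autocorrelation coefficients of each layer tail off and whether the autocorrelation coefficients of each layer cut off.
If the partial autocorrelation coefficients of a stationary series cut off while the autocorrelation coefficients tail off, the series can be judged suitable for the AR model; if the partial autocorrelation coefficients tail off while the autocorrelation coefficients cut off, the series can be judged suitable for the MA model; if both the partial autocorrelation coefficients and the autocorrelation coefficients tail off, the series is suitable for the ARMA model. Here, cutting off means that the autocorrelation coefficients (ACF) or partial autocorrelation coefficients (PACF) of the series are all zero beyond a certain lag (for example, the PACF of an AR process); tailing off means that the ACF or PACF is not identically zero beyond any lag (for example, the ACF of an AR process).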
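The identification rule can be turned into a rough heuristic. The sketch below is an assumption-laden illustration rather than the procedure of the application: it treats a coefficient as zero when it lies inside the approximate 95% band of ±2/√n, and it says a sequence "cuts off" when no significant coefficient appears beyond `max_order`. Both function names and the `max_order` parameter are hypothetical.

```python
import math


def cuts_off(coeffs, n_obs, max_order):
    """True if no coefficient beyond lag max_order leaves the +/- 2/sqrt(n) band."""
    band = 2.0 / math.sqrt(n_obs)
    significant = [k for k, c in enumerate(coeffs) if k >= 1 and abs(c) > band]
    return (max(significant) if significant else 0) <= max_order


def choose_model(acf_vals, pacf_vals, n_obs, max_order=2):
    """Map the cut-off / tail-off pattern to AR, MA or ARMA."""
    acf_cut = cuts_off(acf_vals, n_obs, max_order)
    pacf_cut = cuts_off(pacf_vals, n_obs, max_order)
    if pacf_cut and not acf_cut:
        return "AR"
    if acf_cut and not pacf_cut:
        return "MA"
    if not acf_cut and not pacf_cut:
        return "ARMA"
    return "undetermined"  # both cut off: a case the text does not cover


# Hypothetical coefficients for lags 0..7 of a series with 100 observations.
slow_decay = [1.0, 0.8, 0.64, 0.51, 0.41, 0.33, 0.26, 0.21]
one_spike = [1.0, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
model = choose_model(slow_decay, one_spike, n_obs=100)
```

In practice this visual rule is ambiguous for short series, which is why the description goes on to combine it with an order criterion.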
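Step S406: if the wavelet coefficients of each layer are all suitable for the autoregressive moving average model, determine the order of the autoregressive moving average model to be constructed based on the preset order criterion;
In this embodiment, the autoregressive moving average process is random in nature and consists of two different parts, namely the autoregressive part and the moving average part. If p denotes the upper limit of the order of the former part (the autoregressive order) and q denotes the upper limit of the order of the latter part (the moving average order), the autoregressive moving average process can be written as ARMA(p,q). In an embodiment, the following expression is preferably used to determine the autoregressive moving average model to be constructed:
Figure PCTCN2019119204-appb-000001
where $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ are white noise,
Figure PCTCN2019119204-appb-000002
are the parameters of the autoregressive model, $\theta_1, \theta_2, \ldots, \theta_q$ are the parameters of the moving average model, $x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-p}$ denote the time series, and t is a positive integer.
If the wavelet coefficients of each layer all fit the autoregressive moving average model, the order of the autoregressive moving average model to be constructed is determined based on the preset order criterion. When the truncation of the autocorrelation function and partial autocorrelation function is used to identify the model as an ARMA model, the orders p and q cannot be determined from it; to determine p and q more precisely, the identification must be applied jointly with a commonly used order-determination criterion, such as the minimum information criterion AIC (Akaike Information Criterion).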
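For orientation, a commonly used textbook form of the ARMA(p,q) model and of the AIC is given below. The equation images referenced above are not reproduced here, so the symbol $\phi_i$ for the autoregressive parameters and the plus-sign convention for the moving average terms are assumptions, as is the symbol $\hat{L}$ for the maximized likelihood.

```latex
% ARMA(p, q): autoregressive part on x, moving average part on the white noise
x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p}
      + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}

% AIC for a fitted model with k estimated parameters and maximized likelihood \hat{L}
\mathrm{AIC} = 2k - 2 \ln \hat{L}
```

Under this criterion, candidate pairs (p, q) are fitted in turn and the pair with the smallest AIC is selected, which penalizes higher orders that do not improve the likelihood enough.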
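Step S407: perform parameter estimation on the autoregressive moving average model to be constructed to obtain model parameter values;
In this embodiment, after the order of the autoregressive moving average model to be constructed is determined, the parameter values of the model must be calculated, for example with the method of moments or with the approximate maximum likelihood method.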
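As an illustration of moment estimation, the sketch below solves the Yule-Walker equations for a pure autoregressive model of order p. It is a simplified stand-in (the moving average part of an ARMA model needs additional steps that are not shown), and the function name is hypothetical.

```python
import numpy as np


def yule_walker(series, p):
    """Method-of-moments estimate of AR(p) coefficients and noise variance."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = x.size
    # Biased sample autocovariances r_0 .. r_p.
    r = np.array([np.dot(x[k:], x[:n - k]) / n for k in range(p + 1)])
    # Toeplitz system R * phi = (r_1, ..., r_p).
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(phi, r[1:])
    return phi, sigma2


# A strictly alternating series gives a strongly negative AR(1) coefficient.
phi_hat, sigma2_hat = yule_walker([1.0, -1.0] * 10, p=1)
```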
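Step S408: based on the determined order and the model parameter values, construct the autoregressive moving average model corresponding to the wavelet coefficients of each layer;
Step S409: based on each constructed autoregressive moving average model, predict the wavelet coefficients of each layer to obtain the predicted wavelet coefficients corresponding to each layer.
In this embodiment, once the order and the parameter values of the autoregressive moving average model have been determined, the autoregressive moving average model corresponding to the wavelet coefficients of each layer can be constructed, and the predicted values of the wavelet coefficients of each layer can then be calculated from the constructed model. In addition, the autoregressive model and the moving average model are constructed in the same way as the autoregressive moving average model and are therefore not described further.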
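Once the coefficients are known, prediction is a recursion over the most recent values. The sketch below covers only the autoregressive case, with future noise terms set to their expected value of zero; the function name and the sample numbers are hypothetical.

```python
def forecast_ar(history, phi, mean, steps):
    """Iterate x_t = mean + sum_i phi[i] * (x_{t-1-i} - mean) for `steps` future points."""
    values = [v - mean for v in history]
    out = []
    for _ in range(steps):
        nxt = sum(c * values[-i - 1] for i, c in enumerate(phi))
        values.append(nxt)          # feed the forecast back in for multi-step prediction
        out.append(nxt + mean)
    return out


# Two-step forecast of one layer's coefficients with assumed AR(2) parameters.
predicted = forecast_ar([1.0, 2.0, 4.0], phi=[0.5, 0.25], mean=0.0, steps=2)
```

Running this per layer produces the predicted wavelet coefficients that are passed to the reconstruction step.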
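Referring to FIG. 5, FIG. 5 is a schematic diagram of the functional modules of an embodiment of the abnormal traffic monitoring device of the present application. In this embodiment, the abnormal traffic monitoring device includes:
the collection module 10, configured to collect user access records within a preset time period based on preset tracking points;
the preprocessing module 20, configured to perform cleaning and statistical processing on the raw data in the user access records to generate first traffic time series data corresponding to the visit count, where the first traffic time series data reflects the correspondence between visit count and time;
the decomposition module 30, configured to perform multi-scale wavelet decomposition on the first traffic time series data through a preset low-pass filter and high-pass filter using a multi-resolution analysis algorithm to obtain the wavelet coefficients corresponding to each layer of wavelet decomposition;
the prediction module 40, configured to take the wavelet coefficients of each layer as the analysis object, establish corresponding stationary time series models, and predict the wavelet coefficients of each layer with the stationary time series models to obtain the predicted wavelet coefficients corresponding to each layer;
the reconstruction module 50, configured to use the inverse wavelet transform to perform wavelet reconstruction on the predicted wavelet coefficients corresponding to each layer to obtain second traffic time series data;
the comparison module 60, configured to take the second traffic time series data as the predicted traffic value and compare the actual traffic value at the same time with the predicted traffic value;
the determining module 70, configured to determine that the current network traffic is normal if the actual traffic value is within the confidence interval of the predicted traffic value, and to determine that the current network traffic is abnormal if the actual traffic value exceeds the confidence interval of the predicted traffic value.
Because the description of this embodiment is the same as that of the abnormal traffic monitoring method above, the embodiment of the abnormal traffic monitoring device is not described in further detail here.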
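This embodiment processes traffic data based on wavelet analysis to highlight the localized information of the traffic. Multi-scale refinement of the signal reveals the patterns and characteristics hidden in the original signal, that is, it yields the predicted values of the wavelet coefficients of each layer. Wavelet reconstruction is then performed on the predicted wavelet coefficients to obtain the predicted traffic time series data, which makes it possible to distinguish normal sequences from abnormal ones and thus to identify and alert on abnormal traffic. Drawing on the advantages of wavelet analysis, this embodiment both removes prediction misjudgments caused by noise in the traffic time series data and retains the time-domain and frequency-domain information in the signal: wavelet decomposition is performed first, an ARMA time series model is established to predict the wavelet coefficients, and wavelet reconstruction then yields the predicted traffic series while retaining its time-series characteristics. Different threshold ranges can then be set according to the predicted traffic value and the business scenario to identify and alert on abnormal traffic.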
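Referring to FIG. 6, FIG. 6 is a detailed schematic diagram of the functional modules of an embodiment of the preprocessing module 20 in FIG. 5. Based on the foregoing embodiment, in this embodiment the preprocessing module 20 further includes:
the detection unit 201, configured to detect whether the raw data in the user access records contains missing values;
the cleaning unit 202, configured to, if missing values exist, calculate the missing-value ratio of each field and clean the missing values according to the missing-value ratio and the importance of the field, where the missing-value cleaning includes deleting fields with missing values and filling in missing values by interpolation;
the sorting unit 203, configured to sort the raw data in the user access records and calculate the similarity between each sorted record and its adjacent records;
the judging unit 204, configured to, if the similarity between different records exceeds a preset threshold, determine that they are duplicate records and delete the redundant data;
the generating unit 205, configured to count visits over the cleaned data in chronological order and generate the first traffic time series data corresponding to the visit count.
Because the description of this embodiment is the same as that of the abnormal traffic monitoring method above, the embodiment of the abnormal traffic monitoring device is not described in further detail here.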
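Referring to FIG. 7, FIG. 7 is a detailed schematic diagram of the functional modules of an embodiment of the prediction module 40 in FIG. 5. In this embodiment, the prediction module 40 further includes:
the stationarity detection unit 401, configured to perform stationarity detection on the wavelet coefficients of each layer to determine whether the wavelet coefficients of each layer are a stationary time series;
the difference operation unit 402, configured to, if the wavelet coefficients of one or more layers are non-stationary time series, perform a difference operation on those layers until the wavelet coefficients of every layer are stationary time series;
the white noise detection unit 403, configured to perform white noise detection on the wavelet coefficients of each layer if the wavelet coefficients of every layer are stationary time series;
the coefficient determining unit 404, configured to calculate the autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer if the wavelet coefficients of every layer are stationary non-white-noise time series;
the model determining unit 405, configured to determine the stationary time series model suitable for the wavelet coefficients of each layer according to the autocorrelation coefficients and partial autocorrelation coefficients of the wavelet coefficients of each layer;
the model construction unit 406, configured to, if the wavelet coefficients of each layer are all suitable for the autoregressive moving average model, determine the order of the autoregressive moving average model to be constructed based on the preset order criterion; perform parameter estimation on the autoregressive moving average model to be constructed to obtain model parameter values; and, based on the determined order and the model parameter values, construct the autoregressive moving average model corresponding to the wavelet coefficients of each layer;
the model prediction unit 407, configured to predict the wavelet coefficients of each layer based on each constructed autoregressive moving average model to obtain the predicted wavelet coefficients corresponding to each layer.
Because the description of this embodiment is the same as that of the abnormal traffic monitoring method above, the embodiment of the abnormal traffic monitoring device is not described in further detail here.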
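The present application also provides a computer-readable storage medium, which may be volatile or non-volatile; the present application does not specifically limit this.
In this embodiment, computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the abnormal traffic monitoring method described in any of the above embodiments are implemented. For the method implemented when the computer-readable instructions are executed by the processor, reference may be made to the embodiments of the abnormal traffic monitoring method of the present application, so it is not described in further detail.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described, which are merely illustrative rather than restrictive. Enlightened by the present application, a person of ordinary skill in the art may devise many further forms without departing from the purpose of the present application and the scope protected by the claims. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, falls within the protection of the present application.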

Claims (20)

  1. 一种异常流量监测方法,所述异常流量监测方法包括以下步骤:
    基于预置埋点,收集预设时间段内的用户访问记录;
    对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据,所述第一流量时序数据反映访问量和时间的对应关系;
    通过预置的低通滤波器和高通滤波器,采用多分辨分析算法对所述第一流量时序数据进行多尺度小波分解,得到各层小波分解分别对应的小波系数;
    以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数;
    采用逆小波变换对各层对应的所述预测小波系数进行小波重构,得到第二流量时序数据;
    以所述第二流量时序数据为流量预测值,将同一时间对应的实际流量值与所述流量预测值进行比对;
    若所述实际流量值在所述流量预测值的置信区间内,则判定当前网络流量正常;若所述实际流量值超过所述流量预测值的置信区间,则判定当前网络流量异常。
  2. 如权利要求1所述的异常流量监测方法,所述对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据包括:
    检测所述用户访问记录中的原始数据是否存在缺失值;
    若存在缺失值,则计算每个字段对应的缺失值比例,并根据所述缺失值比例与字段重要程度进行缺失值清洗,所述缺失值清洗包括:删除缺失值字段、使用插值法补全缺失值;
    对所述用户访问记录中的原始数据进行排序,并计算排序后的每条记录与相邻记录之间的相似度;
    若不同记录之间的相似度超过预置阈值,则判定为重复记录并删除多余的数据;
    对清洗后的数据进行按照时间顺序进行访问量统计,生成访问量对应的所述第一流量时序数据。
  3. 如权利要求1所述的异常流量监测方法,所述以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数包括:
    分别对各层小波系数进行平稳性检测,以判断各层小波系数是否为平稳时间序列;
    若存在一层或多层小波系数为非平稳时间序列,则对所述一层或多层小波系数进行差分运算,直到任一层小波系数均为平稳时间序列;
    若任一层小波系数均为平稳时间序列,则分别对各层小波系数进行白噪声检测;
    若任一层小波系数均为平稳非白噪声时间序列,则分别计算各层小波系数的自相关系数及偏自相关系数;
    根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型;
    若各层小波系数均适合自回归移动平均模型,则基于预置定阶准则,确定待构建的自回归移动平均模型的阶数;
    对待构建的自回归移动平均模型进行参数估计,得到模型参数值;
    基于确定的阶数及所述模型参数值,分别构建各层小波系数对应的自回归移动平均模型;
    基于构建的各自回归移动平均模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
  4. 如权利要求3所述的异常流量监测方法,所述根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型包括:
    判断各层小波系数各自对应的偏自相关系数是否拖尾以及判断各层小波系数各自对应的自相关系数是否截尾;
    若各层小波系数各自对应的偏自相关系数均截尾、自相关系数均拖尾,则确定各层小波系数均适合自回归模型;
    若各层小波系数各自对应的偏自相关系数均拖尾、自相关系数均截尾,则确定各层小波系数均适合移动平均模型;
    若各层小波系数各自对应的偏自相关系数均拖尾、自相关系数均拖尾,则确定各层小波系数均适合自回归移动平均模型。
  5. 如权利要求4所述的异常流量监测方法,在所述根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型的步骤之后,还包括:
    若各层小波系数均适合自回归模型,则基于所述预置定阶准则,确定待构建的自回归模型的阶数;
    对所述待构建的自回归模型进行参数估计,得到模型参数值;
    基于确定的所述阶数及通过参数估计得到的所述模型参数值,分别构建各层小波系数对应的自回归模型;
    基于构建的各所述自回归模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
  6. 如权利要求4所述的异常流量监测方法,在所述根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型的步骤之后,还包括:
    若各层小波系数均适合移动平均模型,则基于所述预置定阶准则,确定待构建的移动平均模型的阶数;
    对待构建的移动平均模型进行参数估计,得到模型参数值;
    基于确定的阶数及通过参数估计得到的模型参数值,分别构建各层小波系数对应的移动平均模型;
    基于构建的各移动平均模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
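  7. The abnormal traffic monitoring method according to claim 1, wherein the formula for multi-scale wavelet decomposition using the multi-resolution analysis algorithm is as follows:
    $cA_{j+1} = H * cA_j,\quad cD_{j+1} = G * cD_j,\quad j = 1, 2, \ldots, J;$
    the formula for wavelet reconstruction using the inverse wavelet transform is as follows:
    $cA_{j-1} = H^{*} * cA_j + G^{*} * cD_j,\quad j = 1, 2, \ldots, J;$
    where H and G are decomposition operators, H denotes the low-pass filter, G denotes the high-pass filter, $H^{*}$ and $G^{*}$ are the dual operators of the decomposition operators H and G respectively, $cA_0$ denotes the original signal data, $cA_j$ and $cD_j$ denote the low-frequency part and the high-frequency part of the original signal data at resolution $2^{-j}$, and J denotes the maximum number of decomposition layers.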
  8. 一种异常流量监测装置,所述异常流量监测装置包括:
    收集模块,用于基于预置埋点,收集预设时间段内的用户访问记录;
    预处理模块,用于对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据,所述第一流量时序数据反映访问量和时间的对应关系;
    分解模块,用于通过预置的低通滤波器和高通滤波器,采用多分辨分析算法对所述第一流量时序数据进行多尺度小波分解,得到各层小波分解分别对应的小波系数;
    预测模块,用于以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数;
    重构模块,用于采用逆小波变换对各层对应的所述预测小波系数进行小波重构,得到第二流量时序数据;
    比对模块,用于以所述第二流量时序数据为流量预测值,将同一时间对应的实际流量值与流量预测值进行比对;
    判定模块,用于若实际流量值在流量预测值的置信区间内,则判定当前网络流量正常;若实际流量值超过流量预测值的置信区间,则判定当前网络流量异常。
  9. 如权利要求8所述的异常流量监测装置,所述预处理模块包括:
    检测单元,用于检测所述用户访问记录中的原始数据是否存在缺失值;
    清洗单元,用于若存在缺失值,则计算每个字段对应的缺失值比例,并根据缺失值比例与字段重要程度进行缺失值清洗,所述缺失值清洗包括:删除缺失值字段、使用插值法补全缺失值;
    排序单元,用于对所述用户访问记录中的原始数据进行排序,并计算排序后的每条记录与相邻记录之间的相似度;
    判断单元,用于若不同记录之间的相似度超过预置阈值,则判定为重复记录并删除多余的数据;
    生成单元,用于对清洗后的数据进行按照时间顺序进行访问量统计,生成访问量对应的所述第一流量时序数据。
  10. 如权利要求8所述的异常流量监测装置,所述预测模块包括:
    平稳性检测单元,用于分别对各层小波系数进行平稳性检测,以判断各层小波系数是否为平稳时间序列;
    差分运算单元,用于若存在一层或多层小波系数为非平稳时间序列,则对所述一层或多层小波系数进行差分运算,直到任一层小波系数均为平稳时间序列;
    白噪声检测单元,用于若任一层小波系数均为平稳时间序列,则分别对各层小波系数进行白噪声检测;
    系数确定单元,用于若任一层小波系数均为平稳非白噪声时间序列,则分别计算各层小波系数的自相关系数及偏自相关系数;
    模型确定单元,用于根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型;
    模型构建单元,用于若各层小波系数均适合自回归移动平均模型,则基于预置定阶准则,确定待构建的自回归移动平均模型的阶数;对待构建的自回归移动平均模型进行参数估计,得到模型参数值;基于确定的阶数及所述模型参数值,分别构建各层小波系数对应的自回归移动平均模型;
    模型预测单元,用于基于构建的各自回归移动平均模型,分别对各层小波系数进行预测,得到各层对应的预测小波系数。
  11. 如权利要求10所述的异常流量监测装置,所述模型确定单元具体还用于:
    判断各层小波系数各自对应的偏自相关系数是否拖尾以及判断各层小波系数各自对应的自相关系数是否截尾;
    若各层小波系数各自对应的偏自相关系数均截尾、自相关系数均拖尾,则确定各层小波系数均适合自回归模型;
    若各层小波系数各自对应的偏自相关系数均拖尾、自相关系数均截尾,则确定各层小波系数均适合移动平均模型;
    若各层小波系数各自对应的偏自相关系数均拖尾、自相关系数均拖尾,则确定各层小波系数均适合自回归移动平均模型。
  12. 如权利要求11所述的异常流量监测装置,所述模型构建单元具体还用于:
    若各层小波系数均适合自回归模型,则基于所述预置定阶准则,确定待构建的自回归模型的阶数;对待构建的自回归模型进行参数估计,得到模型参数值;基于确定的阶数及通过参数估计得到的模型参数值,分别构建各层小波系数对应的自回归模型;基于构建的各自回归模型,分别对各层小波系数进行预测,得到各层对应的预测小波系数。
  13. 如权利要求11所述的异常流量监测装置,所述模型构建单元具体还用于:若各层小波系数均适合移动平均模型,则基于所述预置定阶准则,确定待构建的移动平均模型的阶数;对待构建的移动平均模型进行参数估计,得到模型参数值;基于确定的阶数及通过参数估计得到的模型参数值,分别构建各层小波系数对应的移动平均模型;基于构建的各移动平均模型,分别对各层小波系数进行预测,得到各层对应的预测小波系数。
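  14. The abnormal traffic monitoring device according to claim 8, wherein the formula used in the decomposition module for multi-scale wavelet decomposition using the multi-resolution analysis algorithm is as follows:
    $cA_{j+1} = H * cA_j,\quad cD_{j+1} = G * cD_j,\quad j = 1, 2, \ldots, J;$
    the formula for wavelet reconstruction using the inverse wavelet transform is as follows:
    $cA_{j-1} = H^{*} * cA_j + G^{*} * cD_j,\quad j = 1, 2, \ldots, J;$
    where H and G are decomposition operators, H denotes the low-pass filter, G denotes the high-pass filter, $H^{*}$ and $G^{*}$ are the dual operators of the decomposition operators H and G respectively, $cA_0$ denotes the original signal data, $cA_j$ and $cD_j$ denote the low-frequency part and the high-frequency part of the original signal data at resolution $2^{-j}$, and J denotes the maximum number of decomposition layers.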
  15. 一种异常流量监测设备,所述异常流量监测设备包括存储器、处理器以及存储在所述存储器上并可在所述处理器上运行的计算机可读指令,所述计算机可读指令被所述处理器执行时实现以下所述的异常流量监测方法的步骤:
    基于预置埋点,收集预设时间段内的用户访问记录;
    对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据,所述第一流量时序数据反映访问量和时间的对应关系;
    通过预置的低通滤波器和高通滤波器,采用多分辨分析算法对所述第一流量时序数据进行多尺度小波分解,得到各层小波分解分别对应的小波系数;
    以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数;
    采用逆小波变换对各层对应的所述预测小波系数进行小波重构,得到第二流量时序数据;
    以所述第二流量时序数据为流量预测值,将同一时间对应的实际流量值与所述流量预测值进行比对;
    若所述实际流量值在所述流量预测值的置信区间内,则判定当前网络流量正常;若所述实际流量值超过所述流量预测值的置信区间,则判定当前网络流量异常。
  16. 如权利要求15所述的异常流量监测设备,所述所述计算机可读指令被所述处理器执行实现所述对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据的步骤时,还包括以下步骤:
    检测所述用户访问记录中的原始数据是否存在缺失值;
    若存在缺失值,则计算每个字段对应的缺失值比例,并根据所述缺失值比例与字段重要程度进行缺失值清洗,所述缺失值清洗包括:删除缺失值字段、使用插值法补全缺失值;
    对所述用户访问记录中的原始数据进行排序,并计算排序后的每条记录与相邻记录之间的相似度;
    若不同记录之间的相似度超过预置阈值,则判定为重复记录并删除多余的数据;
    对清洗后的数据进行按照时间顺序进行访问量统计,生成访问量对应的所述第一流量时序数据。
  17. 如权利要求15所述的异常流量监测设备,所述所述计算机可读指令被所述处理器执行实现所述以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数的步骤时,还包括以下步骤:
    分别对各层小波系数进行平稳性检测,以判断各层小波系数是否为平稳时间序列;
    若存在一层或多层小波系数为非平稳时间序列,则对所述一层或多层小波系数进行差分运算,直到任一层小波系数均为平稳时间序列;
    若任一层小波系数均为平稳时间序列,则分别对各层小波系数进行白噪声检测;
    若任一层小波系数均为平稳非白噪声时间序列,则分别计算各层小波系数的自相关系数及偏自相关系数;
    根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型;
    若各层小波系数均适合自回归移动平均模型,则基于预置定阶准则,确定待构建的自回归移动平均模型的阶数;
    对待构建的自回归移动平均模型进行参数估计,得到模型参数值;
    基于确定的阶数及所述模型参数值,分别构建各层小波系数对应的自回归移动平均模型;
    基于构建的各自回归移动平均模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
  18. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现以下所述的异常流量监测方法的步骤:
    基于预置埋点,收集预设时间段内的用户访问记录;
    对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据,所述第一流量时序数据反映访问量和时间的对应关系;
    通过预置的低通滤波器和高通滤波器,采用多分辨分析算法对所述第一流量时序数据进行多尺度小波分解,得到各层小波分解分别对应的小波系数;
    以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数;
    采用逆小波变换对各层对应的所述预测小波系数进行小波重构,得到第二流量时序数据;
    以所述第二流量时序数据为流量预测值,将同一时间对应的实际流量值与所述流量预测值进行比对;
    若所述实际流量值在所述流量预测值的置信区间内,则判定当前网络流量正常;若所述实际流量值超过所述流量预测值的置信区间,则判定当前网络流量异常。
  19. 如权利要求18所述的计算机可读存储介质,所述所述计算机可读指令被处理器执行实现所述对所述用户访问记录中的原始数据进行清洗与统计处理,生成访问量对应的第一流量时序数据的步骤时,还包括以下步骤:
    检测所述用户访问记录中的原始数据是否存在缺失值;
    若存在缺失值,则计算每个字段对应的缺失值比例,并根据所述缺失值比例与字段重要程度进行缺失值清洗,所述缺失值清洗包括:删除缺失值字段、使用插值法补全缺失值;
    对所述用户访问记录中的原始数据进行排序,并计算排序后的每条记录与相邻记录之间的相似度;
    若不同记录之间的相似度超过预置阈值,则判定为重复记录并删除多余的数据;
    对清洗后的数据进行按照时间顺序进行访问量统计,生成访问量对应的所述第一流量时序数据。
  20. 如权利要求18所述的计算机可读存储介质,所述所述计算机可读指令被处理器执行实现所述以各层小波系数为分析对象,分别建立对应的平稳时间序列模型,并通过所述平稳时间序列模型对各层小波系数进行预测,得到各层对应的预测小波系数的步骤时,还包括以下步骤:
    分别对各层小波系数进行平稳性检测,以判断各层小波系数是否为平稳时间序列;
    若存在一层或多层小波系数为非平稳时间序列,则对所述一层或多层小波系数进行差分运算,直到任一层小波系数均为平稳时间序列;
    若任一层小波系数均为平稳时间序列,则分别对各层小波系数进行白噪声检测;
    若任一层小波系数均为平稳非白噪声时间序列,则分别计算各层小波系数的自相关系数及偏自相关系数;
    根据各层小波系数各自对应的自相关系数及偏自相关系数,分别确定各层小波系数适合的平稳时间序列模型;
    若各层小波系数均适合自回归移动平均模型,则基于预置定阶准则,确定待构建的自回归移动平均模型的阶数;
    对待构建的自回归移动平均模型进行参数估计,得到模型参数值;
    基于确定的阶数及所述模型参数值,分别构建各层小波系数对应的自回归移动平均模型;
    基于构建的各自回归移动平均模型,分别对各层小波系数进行预测,得到各层对应的所述预测小波系数。
PCT/CN2019/119204 2019-10-18 2019-11-18 异常流量监测方法、装置、设备及存储介质 WO2021072887A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910991177.5A CN110839016B (zh) 2019-10-18 2019-10-18 异常流量监测方法、装置、设备及存储介质
CN201910991177.5 2019-10-18

Publications (1)

Publication Number Publication Date
WO2021072887A1 true WO2021072887A1 (zh) 2021-04-22

Family

ID=69575425

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119204 WO2021072887A1 (zh) 2019-10-18 2019-11-18 异常流量监测方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN110839016B (zh)
WO (1) WO2021072887A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113391982A (zh) * 2021-08-17 2021-09-14 云智慧(北京)科技有限公司 一种监控数据的异常检测方法、装置及设备
CN113761022A (zh) * 2021-08-18 2021-12-07 浪潮电子信息产业股份有限公司 一种时序数据趋势预测方法、系统及相关装置
CN113938306A (zh) * 2021-10-18 2022-01-14 北京八分量信息科技有限公司 一种基于数据清洗规则的可信认证方法及系统
CN114593375A (zh) * 2022-03-30 2022-06-07 常州通用自来水有限公司 基于泵房能耗的二次供水小区管道漏损监测和定位方法
CN114637263A (zh) * 2022-03-15 2022-06-17 中国石油大学(北京) 一种异常工况实时监测方法、装置、设备及存储介质
CN115204061A (zh) * 2022-09-09 2022-10-18 深圳市信润富联数字科技有限公司 自动确定冲压建模规模方法、装置、设备及存储介质
CN115412923A (zh) * 2022-10-28 2022-11-29 河北省科学院应用数学研究所 多源传感器数据可信融合方法、系统、设备及存储介质
CN116629843A (zh) * 2023-07-25 2023-08-22 山东比沃斯机电工程有限公司 智能化柴油发电机组的远程预警与维护决策支持系统
CN117240614A (zh) * 2023-11-13 2023-12-15 中通服网盈科技有限公司 一种基于互联网的网络信息安全监测预警系统
CN117421723A (zh) * 2023-10-07 2024-01-19 武汉卓讯互动信息科技有限公司 基于Server Mesh的微服务系统

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626322B (zh) * 2020-04-08 2024-01-05 中南大学 一种基于小波变换的加密流量的应用活动识别方法
CN111614690B (zh) * 2020-05-28 2022-10-11 上海观安信息技术股份有限公司 一种异常行为检测方法及装置
CN112637021A (zh) * 2020-12-31 2021-04-09 中国建设银行股份有限公司 一种基于线性回归算法的动态流量监控方法和装置
CN113037728B (zh) * 2021-02-26 2023-08-15 上海派拉软件股份有限公司 一种实现零信任的风险判定方法、装置、设备及介质
CN113487316A (zh) * 2021-07-22 2021-10-08 银清科技有限公司 分布式支付系统安全处理方法及装置
CN113849374B (zh) * 2021-09-28 2023-06-20 平安科技(深圳)有限公司 中央处理器占用率预测方法、系统、电子设备及存储介质
CN114048771B (zh) * 2021-11-09 2023-05-30 西安电子科技大学 基于自适应门限平稳小波变换的时序数据异常值处理方法
CN114615051A (zh) * 2022-03-09 2022-06-10 黄河水利职业技术学院 一种网络安全检测方法和系统
CN115442246B (zh) * 2022-08-31 2023-09-26 武汉烽火技术服务有限公司 数据平面网络的流量预测方法、装置、设备及存储介质
CN116821836B (zh) * 2023-08-31 2023-10-27 深圳特力自动化工程有限公司 基于多传感器的轮轴瓦异常状态监测方法及系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6772185B1 (en) * 1999-06-02 2004-08-03 Japan Science And Technology Corporation Time-series predicting method using wavelet number series and device thereof
CN102083087A (zh) * 2011-01-25 2011-06-01 南京金思科技有限公司 一种主客观模型结合的话务量异常检测方法
US20160219067A1 (en) * 2015-01-28 2016-07-28 Korea Internet & Security Agency Method of detecting anomalies suspected of attack, based on time series statistics
CN106357456A (zh) * 2016-10-11 2017-01-25 广东工业大学 一种网络流量的预测方法及装置
CN107026763A (zh) * 2017-06-02 2017-08-08 广东电网有限责任公司中山供电局 一种基于流量分解的数据通信网流量预测方法
CN110210658A (zh) * 2019-05-22 2019-09-06 东南大学 基于小波变换的Prophet与高斯过程用户网络流量预测方法

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9680693B2 (en) * 2006-11-29 2017-06-13 Wisconsin Alumni Research Foundation Method and apparatus for network anomaly detection
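CN100486179C (zh) * 2006-12-15 2009-05-06 华为技术有限公司 Method and device for detecting network traffic anomalies
CN102355381B (zh) * 2011-08-18 2014-03-12 网宿科技股份有限公司 Traffic prediction method and system using an adaptive autoregressive integrated moving average (ARIMA) model
CN104268408A (zh) * 2014-09-28 2015-01-07 江南大学 Macro-level energy consumption data prediction method based on a wavelet-coefficient ARMA model
CN104506378B (zh) * 2014-12-03 2019-01-18 上海华为技术有限公司 Device and method for predicting data traffic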
US11128648B2 (en) * 2018-01-02 2021-09-21 Maryam AMIRMAZLAGHANI Generalized likelihood ratio test (GLRT) based network intrusion detection system in wavelet domain

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
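CN113391982B (zh) * 2021-08-17 2021-11-23 云智慧(北京)科技有限公司 Anomaly detection method, apparatus, and device for monitoring data
CN113391982A (zh) * 2021-08-17 2021-09-14 云智慧(北京)科技有限公司 Anomaly detection method, apparatus, and device for monitoring data
CN113761022A (zh) * 2021-08-18 2021-12-07 浪潮电子信息产业股份有限公司 Time-series data trend prediction method, system, and related apparatus
CN113938306A (zh) * 2021-10-18 2022-01-14 北京八分量信息科技有限公司 Trusted authentication method and system based on data cleaning rules
CN113938306B (zh) * 2021-10-18 2024-01-30 北京八分量信息科技有限公司 Trusted authentication method and system based on data cleaning rules
CN114637263B (zh) * 2022-03-15 2024-01-12 中国石油大学(北京) Real-time monitoring method, apparatus, device, and storage medium for abnormal operating conditions
CN114637263A (zh) * 2022-03-15 2022-06-17 中国石油大学(北京) Real-time monitoring method, apparatus, device, and storage medium for abnormal operating conditions
CN114593375A (zh) * 2022-03-30 2022-06-07 常州通用自来水有限公司 Pipeline leakage monitoring and locating method for secondary water supply communities based on pump room energy consumption
CN115204061B (zh) * 2022-09-09 2023-01-06 深圳市信润富联数字科技有限公司 Method, apparatus, device, and storage medium for automatically determining stamping modeling scale
CN115204061A (zh) * 2022-09-09 2022-10-18 深圳市信润富联数字科技有限公司 Method, apparatus, device, and storage medium for automatically determining stamping modeling scale
CN115412923A (zh) * 2022-10-28 2022-11-29 河北省科学院应用数学研究所 Trusted fusion method, system, device, and storage medium for multi-source sensor data
CN116629843A (zh) * 2023-07-25 2023-08-22 山东比沃斯机电工程有限公司 Remote early-warning and maintenance decision support system for intelligent diesel generator sets
CN116629843B (zh) * 2023-07-25 2023-10-20 山东比沃斯机电工程有限公司 Remote early-warning and maintenance decision support system for intelligent diesel generator sets
CN117421723A (zh) * 2023-10-07 2024-01-19 武汉卓讯互动信息科技有限公司 Microservice system based on Server Mesh
CN117240614A (zh) * 2023-11-13 2023-12-15 中通服网盈科技有限公司 Internet-based network information security monitoring and early-warning system
CN117240614B (zh) * 2023-11-13 2024-01-23 中通服网盈科技有限公司 Internet-based network information security monitoring and early-warning system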

Also Published As

Publication number Publication date
CN110839016A (zh) 2020-02-25
CN110839016B (zh) 2022-07-15

Similar Documents

Publication Publication Date Title
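WO2021072887A1 (zh) Abnormal traffic monitoring method, apparatus, device, and storage medium
WO2019184557A1 (zh) Method, apparatus, and computer-readable storage medium for locating root-cause alarms
US11561954B2 (en) Method and system to estimate the cardinality of sets and set operation results from single and multiple HyperLogLog sketches
CN110830450A (zh) Statistics-based abnormal traffic monitoring method, apparatus, device, and storage medium
CN111309539A (zh) Anomaly monitoring method, apparatus, and electronic device
CN111049858B (zh) Cross-validation-based method, apparatus, and device for deduplicating baseline-scan vulnerabilities
JP6196196B2 (ja) Inter-log causal estimation device, system anomaly detection device, log analysis system, and log analysis method
US8661113B2 (en) Cross-cutting detection of event patterns
US10404524B2 (en) Resource and metric ranking by differential analysis
CN111147300B (zh) Method and device for evaluating network security alarm confidence
CN108804914A (zh) Method and device for abnormal data detection
CN110598959A (zh) Asset risk assessment method, apparatus, electronic device, and storage medium
CN111897851A (zh) Method, apparatus, electronic device, and readable storage medium for determining abnormal data
CN116108376A (zh) Anti-electricity-theft monitoring system, method, electronic device, and medium
CN114039765A (zh) Security management and control method, apparatus, and electronic device for the power distribution Internet of Things
CN110990810B (zh) User operation data processing method, apparatus, device, and storage medium
US11593245B2 (en) System, device and method for frozen period detection in sensor datasets
CN114118208A (zh) Transformer fault determination method, apparatus, and electronic device based on multivariate information
CN113746862A (zh) Machine-learning-based abnormal traffic detection method, apparatus, and device
CN112988536A (zh) Data anomaly detection method, apparatus, device, and storage medium
CN113407428B (zh) Reliability evaluation method, apparatus, and computer device for artificial intelligence systems
US20230409421A1 (en) Anomaly detection in computer systems
Li et al. Power data cleaning in micro grid
CN116915463B (zh) Call-chain data security analysis method, apparatus, device, and storage medium
CN110113228B (zh) Network connection detection method and device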

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19948904; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 19948904; Country of ref document: EP; Kind code of ref document: A1