CN113269327A - Flow anomaly prediction method based on machine learning - Google Patents

Publication number
CN113269327A
Authority
CN
China
Prior art keywords
characteristic
attribute
attributes
flow
correlation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110467791.9A
Other languages
Chinese (zh)
Inventor
刘泳锐
邢燕祯
秦志鹏
刘中金
陈解元
范广
杨朝晖
吕志梅
李华
安黎东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN202110467791.9A
Publication of CN113269327A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Abstract

The invention relates to a flow anomaly prediction method based on machine learning. The method comprehensively considers the multidimensional feature attributes of network flows and, using the correlations among those feature attributes, applies an iterative feature attribute screening strategy to determine the target feature attributes of a flow. A specified classification network is then trained on the target feature attributes and the known anomaly labels of the network flows to obtain an abnormal flow prediction model, which is used to detect whether a target flow is abnormal. This can effectively improve the efficiency of network flow anomaly prediction.

Description

Flow anomaly prediction method based on machine learning
Technical Field
The invention relates to a flow anomaly prediction method based on machine learning, and belongs to the technical field of network flow anomaly detection.
Background
With the rapid development of network technology, network applications have become highly varied, and enterprises face more and more malicious network attacks and hacker intrusions. At present, enterprise network security combines security products such as firewalls, intrusion monitoring, vulnerability scanning and patch distribution, with the aim of building a security management platform that integrates access control, flow monitoring, bandwidth management, terminal management and similar functions. By monitoring network flow, devices with abnormal flow can be found in time, or an early warning can be raised according to a threshold set by the system, so that security threats to the intranet are detected effectively. Network flow monitoring is therefore an effective means of managing the operating condition of an enterprise network. As the internet continues to grow in scale and traffic increases rapidly, the security problems faced by networks are increasingly prominent. Network failures and malicious attacks can both cause network flow anomalies. Monitoring internet flow effectively in real time, discovering flow anomalies promptly and issuing alarm notifications, so that network administrators can take timely measures to keep the network running normally, is of great significance for improving network controllability and manageability.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a flow anomaly prediction method based on machine learning that adopts a new control strategy and can improve the efficiency of network flow anomaly detection.
To solve this technical problem, the invention adopts the following technical scheme: a flow anomaly prediction method based on machine learning, in which the following steps i to v are executed to obtain an abnormal flow prediction model, and the following step A is executed to apply the abnormal flow prediction model and detect whether a target flow is abnormal;
step i, based on a preset number of sample network flows and the classification label result indicating whether each sample network flow is abnormal, extracting, for each sample network flow, the feature value of each designated candidate feature attribute, and then entering step ii;
step ii, based on the feature values of the candidate feature attributes of the sample network flows, deleting candidate feature attributes according to their missing values and updating the remaining ones to be the initially selected feature attributes, performing data preprocessing on the feature values of the initially selected feature attributes of the sample network flows, and then entering step iii;
step iii, based on the feature values of the initially selected feature attributes of the sample network flows, screening the initially selected feature attributes according to the correlations among them to obtain the intermediate feature attributes, that is, obtaining the feature values of the intermediate feature attributes of each sample network flow, and then entering step iv;
step iv, according to whether the number of intermediate feature attributes falls within a preset feature attribute number range, updating the intermediate feature attributes to the target feature attributes through a loop of feature attribute derivation and deletion applied to the intermediate feature attributes, and then entering step v;
step v, training a specified classification network, with the feature values of the target feature attributes of each sample network flow as input and the classification label result indicating whether each sample network flow is abnormal as output, to obtain the abnormal flow prediction model;
step A, obtaining the feature values of the target feature attributes of the target flow, and applying the abnormal flow prediction model to obtain the classification label result indicating whether the target flow is abnormal, thereby detecting whether the target flow is abnormal.
As a preferred technical scheme of the invention, step ii comprises the following steps ii-1 to ii-2;
step ii-1, based on the feature values of the candidate feature attributes of the sample network flows, obtaining the missing-value rate of each candidate feature attribute, deleting the candidate feature attributes whose missing-value rate is greater than a preset missing-rate threshold, updating the remaining candidate feature attributes to be the initially selected feature attributes, and then entering step ii-2;
step ii-2, for the feature values of the initially selected feature attributes of each sample network flow, filling the missing values with 0, thereby updating the feature values of the initially selected feature attributes of each sample network flow, and then entering step iii.
As a preferred technical scheme of the invention, step iii comprises the following steps iii-1 to iii-3;
step iii-1, based on the feature values of the initially selected feature attributes of the sample network flows, and using a conversion from feature values to vectors, obtaining the feature vector sequence of the feature value sequence of each initially selected feature attribute, and then entering step iii-2;
step iii-2, according to the feature vector sequences of the feature value sequences of the initially selected feature attributes, obtaining the correlation between every two initially selected feature attributes by means of Mahalanobis distance calculation, and then entering step iii-3;
step iii-3, sorting the correlations in descending order, selecting the top $\lceil a \cdot A \rceil$ correlations, and taking the initially selected feature attributes corresponding to these correlations as the intermediate feature attributes, that is, obtaining the feature values of the intermediate feature attributes of each sample network flow, and then entering step iv; where A denotes the number of pairwise combinations of the initially selected feature attributes, a denotes a preset ratio smaller than 1, and $\lceil \cdot \rceil$ denotes rounding up.
As a preferred technical scheme of the invention, step iv comprises the following steps iv-1 to iv-6;
step iv-1, judging whether the number of intermediate feature attributes is below the preset feature attribute number range; if so, entering step iv-2, otherwise entering step iv-4;
step iv-2, for each intermediate feature attribute, obtaining the correlation between that intermediate feature attribute and each initially selected feature attribute that is not an intermediate feature attribute, and selecting the initially selected feature attribute with the maximum correlation as the initially selected feature attribute associated with that intermediate feature attribute; thereby obtaining the initially selected feature attribute associated with each intermediate feature attribute, and then entering step iv-3;
step iv-3, sorting the correlations between the intermediate feature attributes and their associated initially selected feature attributes in descending order, selecting the top $\lceil b \cdot B \rceil$ correlations, updating the initially selected feature attributes involved in these correlations to be intermediate feature attributes, and returning to step iv-1; where B denotes the number of initially selected feature attributes before the update operation of step iv-3, b denotes a preset ratio smaller than 1, and $\lceil \cdot \rceil$ denotes rounding up;
step iv-4, judging whether the number of intermediate feature attributes is above the preset feature attribute number range; if so, entering step iv-5, otherwise entering step iv-6;
step iv-5, obtaining the correlation between every two intermediate feature attributes, sorting the correlations in ascending order, selecting the first $\lceil c \cdot C \rceil$ correlations, deleting the intermediate feature attributes involved in these correlations, updating the remaining ones to be the intermediate feature attributes, and returning to step iv-1; where C denotes the number of pairwise combinations of intermediate feature attributes before the deletion operation of step iv-5, c denotes a preset ratio smaller than 1, and $\lceil \cdot \rceil$ denotes rounding up;
step iv-6, updating the intermediate feature attributes to be the target feature attributes, thereby obtaining the feature values of the target feature attributes of each sample network flow, and then entering step v.
As a preferred technical scheme of the invention, the target flow is the network flow between designated terminals obtained through a flow probe.
Compared with the prior art, the flow anomaly prediction method based on machine learning that adopts the above technical scheme has the following technical effects:
the method comprehensively considers the multidimensional feature attributes of network flows and, using the correlations among those feature attributes, applies an iterative feature attribute screening strategy to determine the target feature attributes of a flow; a specified classification network is then trained on the target feature attributes and the known anomaly labels of the network flows to obtain an abnormal flow prediction model, which is used to detect whether a target flow is abnormal, and this can effectively improve the efficiency of network flow anomaly prediction.
Drawings
Fig. 1 is a schematic flow chart of a flow anomaly prediction method based on machine learning according to the present invention.
Detailed Description
The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention designs a flow anomaly prediction method based on machine learning. As shown in Fig. 1, the following steps i to v are executed to obtain an abnormal flow prediction model.
Step i, based on a preset number of sample network flows and the classification label result indicating whether each sample network flow is abnormal, extracting, for each sample network flow, the feature value of each designated candidate feature attribute, and then entering step ii.
Step ii, based on the feature values of the candidate feature attributes of the sample network flows, deleting candidate feature attributes according to their missing values and updating the remaining ones to be the initially selected feature attributes, performing data preprocessing on the feature values of the initially selected feature attributes of the sample network flows, and then entering step iii.
In practical applications, step ii comprises the following steps ii-1 to ii-2.
Step ii-1, based on the feature values of the candidate feature attributes of the sample network flows, obtaining the missing-value rate of each candidate feature attribute, deleting the candidate feature attributes whose missing-value rate is greater than a preset missing-rate threshold, updating the remaining candidate feature attributes to be the initially selected feature attributes, and then entering step ii-2.
Step ii-2, for the feature values of the initially selected feature attributes of each sample network flow, filling the missing values with 0, thereby updating the feature values of the initially selected feature attributes of each sample network flow, and then entering step iii.
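Steps ii-1 and ii-2 can be illustrated with a minimal Python sketch. The attribute names (`bytes`, `pkts`, `ttl`), the sample values and the 0.5 threshold are hypothetical and are not taken from the patent; missing values are assumed to be represented as `None`.

```python
from typing import Dict, List, Optional, Tuple

def filter_and_fill(
    samples: List[Dict[str, Optional[float]]],
    candidates: List[str],
    max_missing_rate: float,
) -> Tuple[List[str], List[Dict[str, float]]]:
    """Step ii-1: drop candidate attributes whose missing rate exceeds the
    threshold. Step ii-2: fill the remaining missing values with 0."""
    n = len(samples)
    kept = []
    for attr in candidates:
        missing = sum(1 for s in samples if s.get(attr) is None)
        if missing / n <= max_missing_rate:  # keep unless the rate is above the threshold
            kept.append(attr)
    filled = [
        {attr: (s[attr] if s.get(attr) is not None else 0.0) for attr in kept}
        for s in samples
    ]
    return kept, filled

# Hypothetical sample flows; None marks a missing feature value.
flows = [
    {"bytes": 1200.0, "pkts": 10.0, "ttl": None},
    {"bytes": None, "pkts": 12.0, "ttl": None},
    {"bytes": 900.0, "pkts": 9.0, "ttl": 64.0},
    {"bytes": 1500.0, "pkts": None, "ttl": None},
]
kept, filled = filter_and_fill(flows, ["bytes", "pkts", "ttl"], max_missing_rate=0.5)
print(kept)       # ttl is missing in 3 of 4 flows and is dropped
print(filled[1])  # the missing bytes value is replaced by 0.0
```

In this sketch an attribute whose missing rate equals the threshold is kept, which matches the "greater than" wording of step ii-1.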
Step iii, based on the feature values of the initially selected feature attributes of the sample network flows, screening the initially selected feature attributes according to the correlations among them to obtain the intermediate feature attributes, that is, obtaining the feature values of the intermediate feature attributes of each sample network flow, and then entering step iv.
In practical applications, step iii comprises the following steps iii-1 to iii-3.
Step iii-1, based on the feature values of the initially selected feature attributes of the sample network flows, and using a conversion from feature values to vectors, obtaining the feature vector sequence of the feature value sequence of each initially selected feature attribute, and then entering step iii-2.
Step iii-2, according to the feature vector sequences of the feature value sequences of the initially selected feature attributes, obtaining the correlation between every two initially selected feature attributes by means of Mahalanobis distance calculation, and then entering step iii-3.
Step iii-3, sorting the correlations in descending order, selecting the top $\lceil a \cdot A \rceil$ correlations, and taking the initially selected feature attributes corresponding to these correlations as the intermediate feature attributes, that is, obtaining the feature values of the intermediate feature attributes of each sample network flow, and then entering step iv; where A denotes the number of pairwise combinations of the initially selected feature attributes, a denotes a preset ratio smaller than 1, and $\lceil \cdot \rceil$ denotes rounding up.
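The selection in step iii-3 can be sketched as follows, assuming the pairwise correlations have already been computed and that the number of correlations kept is the product of the ratio a and the pair count A, rounded up. The attribute names and correlation values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def select_intermediate(corr: Dict[Tuple[str, str], float], a: float) -> List[str]:
    """Keep the attributes that appear in the top ceil(a * A) pairwise
    correlations, where A is the number of attribute pairs."""
    A = len(corr)
    k = math.ceil(a * A)
    ranked = sorted(corr.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    for (x, y), _ in ranked[:k]:
        for attr in (x, y):
            if attr not in selected:  # an attribute may appear in several pairs
                selected.append(attr)
    return selected

# Hypothetical pairwise correlations between four initially selected attributes.
corr = {
    ("bytes", "pkts"): 0.92,
    ("bytes", "duration"): 0.40,
    ("bytes", "ttl"): 0.05,
    ("pkts", "duration"): 0.55,
    ("pkts", "ttl"): 0.10,
    ("duration", "ttl"): 0.02,
}
intermediate = select_intermediate(corr, a=0.3)
print(intermediate)  # ceil(0.3 * 6) = 2 pairs are kept
```

With six pairs and a = 0.3, the two strongest pairs are kept, so three distinct attributes become intermediate feature attributes.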
Step iv, according to whether the number of intermediate feature attributes falls within a preset feature attribute number range, updating the intermediate feature attributes to the target feature attributes through a loop of feature attribute derivation and deletion applied to the intermediate feature attributes, and then entering step v.
In practical applications, step iv comprises the following steps iv-1 to iv-6.
Step iv-1, judging whether the number of intermediate feature attributes is below the preset feature attribute number range; if so, entering step iv-2, otherwise entering step iv-4.
Step iv-2, for each intermediate feature attribute, obtaining the correlation between that intermediate feature attribute and each initially selected feature attribute that is not an intermediate feature attribute, and selecting the initially selected feature attribute with the maximum correlation as the initially selected feature attribute associated with that intermediate feature attribute; thereby obtaining the initially selected feature attribute associated with each intermediate feature attribute, and then entering step iv-3.
Step iv-3, sorting the correlations between the intermediate feature attributes and their associated initially selected feature attributes in descending order, selecting the top $\lceil b \cdot B \rceil$ correlations, updating the initially selected feature attributes involved in these correlations to be intermediate feature attributes, and returning to step iv-1; where B denotes the number of initially selected feature attributes before the update operation of step iv-3, b denotes a preset ratio smaller than 1, and $\lceil \cdot \rceil$ denotes rounding up.
Step iv-4, judging whether the number of intermediate feature attributes is above the preset feature attribute number range; if so, entering step iv-5, otherwise entering step iv-6.
Step iv-5, obtaining the correlation between every two intermediate feature attributes, sorting the correlations in ascending order, selecting the first $\lceil c \cdot C \rceil$ correlations, deleting the intermediate feature attributes involved in these correlations, updating the remaining ones to be the intermediate feature attributes, and returning to step iv-1; where C denotes the number of pairwise combinations of intermediate feature attributes before the deletion operation of step iv-5, c denotes a preset ratio smaller than 1, and $\lceil \cdot \rceil$ denotes rounding up.
Step iv-6, updating the intermediate feature attributes to be the target feature attributes, thereby obtaining the feature values of the target feature attributes of each sample network flow, and then entering step v.
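The loop of steps iv-1 to iv-6 can be sketched in Python as below. This is one possible reading rather than the patented implementation: B is taken to be the total number of initially selected attributes, both attributes of each low-correlation pair are deleted in step iv-5, and a `max_rounds` guard (not part of the patent) is added so the sketch always terminates. The attribute names and the correlation table are hypothetical.

```python
import math
from itertools import combinations

def refine(initial, intermediate, corr, lo, hi, b, c, max_rounds=20):
    """Grow or shrink the intermediate attribute set until its size lies in [lo, hi]."""
    mid = list(intermediate)
    for _ in range(max_rounds):  # guard against endless oscillation (assumption)
        if len(mid) < lo:  # iv-1: too few attributes, go to iv-2 and iv-3
            outside = [f for f in initial if f not in mid]
            if not outside:
                break
            # iv-2: most correlated outside attribute for each intermediate attribute
            partners = [max((corr(m, f), f) for f in outside) for m in mid]
            partners.sort(reverse=True)  # iv-3: descending correlation
            k = math.ceil(b * len(initial))
            for _, f in partners[:k]:
                if f not in mid:
                    mid.append(f)
        elif len(mid) > hi:  # iv-4: too many attributes, go to iv-5
            pairs = sorted(combinations(mid, 2), key=lambda p: corr(*p))
            k = math.ceil(c * len(pairs))
            drop = {f for pair in pairs[:k] for f in pair}
            mid = [f for f in mid if f not in drop]
        else:  # iv-6: the count is within range, these are the target attributes
            break
    return mid

# Hypothetical symmetric correlation table over five attributes.
table = {
    frozenset("ab"): 0.9, frozenset("ac"): 0.2, frozenset("ad"): 0.6,
    frozenset("ae"): 0.1, frozenset("bc"): 0.3, frozenset("bd"): 0.5,
    frozenset("be"): 0.4, frozenset("cd"): 0.7, frozenset("ce"): 0.8,
    frozenset("de"): 0.15,
}

def corr(x, y):
    return table[frozenset((x, y))]

initial = ["a", "b", "c", "d", "e"]
grown = refine(initial, ["a", "b"], corr, lo=3, hi=4, b=0.2, c=0.2)
shrunk = refine(initial, initial, corr, lo=3, hi=4, b=0.2, c=0.2)
print(grown)   # "d" is added because it is the strongest outside partner
print(shrunk)  # deletion overshoots to two attributes, then one is added back
```

The second call shows why the patent describes the procedure as a loop: a deletion round can drop the count below the range, after which an expansion round brings it back.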
In step iv, the feature attributes are derived and deleted for each intermediate-level feature attribute, and in practical applications, the derivation and deletion may be performed as follows.
Feature selection: the Pearson correlation coefficient between features is computed, the features with high correlation are found, and these features are used as the model-input features (X and Y denote two variables):
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$$
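As an illustration of the Pearson correlation coefficient used for feature selection, the following is a small pure-Python sketch; the input values are made up for the example.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

r_up = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])  # perfectly increasing together
r_down = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])          # perfectly opposed
print(r_up, r_down)
```

The common factor 1/n in the covariance and in the two standard deviations cancels, so the sketch works with plain sums. A constant sequence has zero standard deviation and would cause a division by zero, so such features need to be removed beforehand.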
Feature derivation: when few features remain after feature selection, secondary features are derived from the existing time feature and other features, which serve as primary features; feature selection and feature deletion are then repeated until a suitable set of features is available for model training.
Feature deletion: features are judged by the Variance Inflation Factor (VIF), where $R_k^2$ is the coefficient of determination between the k-th independent variable and the remaining independent variables:
$$\mathrm{VIF}_k = \frac{1}{1 - R_k^2}$$
For example, the VIF of a derived feature is computed; if 0 < VIF < 10, the feature is considered normal, and if VIF > 10, the feature is severely collinear and needs to be deleted.
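The VIF check can be illustrated for the special case of exactly two features, where the coefficient of determination of one feature regressed on the other equals their squared Pearson correlation. The general case with more features requires a multiple regression and is not shown here; the feature values are hypothetical.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def vif_two_features(x: Sequence[float], y: Sequence[float]) -> float:
    """VIF = 1 / (1 - R^2); with only two features, R^2 is the squared correlation."""
    r_squared = pearson(x, y) ** 2
    return 1.0 / (1.0 - r_squared)

base = [1.0, 2.0, 3.0, 4.0, 5.0]
loosely_related = [2.0, 1.0, 4.0, 3.0, 5.0]
nearly_duplicate = [1.0, 2.1, 2.9, 4.0, 5.1]  # hypothetical derived feature

vif_ok = vif_two_features(base, loosely_related)
vif_bad = vif_two_features(base, nearly_duplicate)
print(vif_ok)        # below 10, the feature is kept
print(vif_bad > 10)  # above 10, the feature would be deleted
```

Two perfectly collinear features give a squared correlation of 1 and a division by zero, which corresponds to an infinite VIF.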
Step v, training a specified classification network, with the feature values of the target feature attributes of each sample network flow as input and the classification label result indicating whether each sample network flow is abnormal as output, to obtain the abnormal flow prediction model.
In this application, the specified classification network and its training are designed as follows:
Model training: the autoregressive integrated moving average model (ARIMA) consists of an autoregressive model, a moving average model and differencing. Modelling generally comprises three stages: model identification and order determination, parameter estimation, and model checking.
1. Model identification and order determination. This mainly consists of determining the three parameters p, d and q, where d is the number of times the time series is differenced to make it stationary, p is the number of autoregressive terms, and q is the number of moving average terms. The differencing order d is typically 1 or 2. The values of p and q are typically determined with the help of the partial autocorrelation function (PACF). The PACF describes the linear correlation between a time series observation and a past observation, given the intermediate observations.
(1) Use the ADF unit-root test, together with the scatter plot, autocorrelation function plot and partial autocorrelation function plot of the time series, to examine its variance, trend and seasonal variation and to identify whether the series is stationary.
(2) Make a non-stationary series stationary. If the data series is non-stationary and shows an increasing or decreasing trend, it must be differenced; if the data are heteroscedastic, they must be transformed, until the autocorrelation and partial autocorrelation function values of the processed data are not significantly different from zero.
Taking the logarithm can stabilize the variance of a time series, and differencing can remove changes in its level and so stabilize its mean, eliminating trend and periodicity.
(3) Build the corresponding model according to the identification rules for time series models. If the partial autocorrelation function of the stationary series cuts off and the autocorrelation function tails off, the series fits an AR model; if the partial autocorrelation function tails off and the autocorrelation function cuts off, the series fits an MA model; if both the partial autocorrelation function and the autocorrelation function tail off, the series fits an ARMA model.
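Two of the tools used in the identification stage, differencing and the sample autocorrelation function, can be sketched in pure Python. The series below are hypothetical; the ADF test and the partial autocorrelation function are not implemented here.

```python
from typing import List, Sequence

def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

def sample_acf(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

linear_trend = [3.0 * t + 5.0 for t in range(8)]   # hypothetical growing flow volume
quadratic_trend = [float(t * t) for t in range(6)]

print(difference(linear_trend))          # one difference removes a linear trend
print(difference(quadratic_trend, d=2))  # two differences remove a quadratic trend
print(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], lag=1))
```

This matches the remark that d is typically 1 or 2: a linear trend becomes constant after one difference and a quadratic trend after two.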
The p-order AR model can be written as:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$
We restrict the AR model to stationary data, which constrains the parameters:
for the AR(1) model:
$$-1 < \phi_1 < 1$$
for the AR(2) model:
$$-1 < \phi_2 < 1, \quad \phi_1 + \phi_2 < 1, \quad \phi_2 - \phi_1 < 1$$
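Unlike the AR model, which uses past values of the predicted variable, the MA model uses past prediction errors:
$$y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$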
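Combining differencing with the AR and MA models gives the ARIMA model, which can be written as:
$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
where $y'_t$ denotes the series after differencing d times.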
The constant c has an important effect on long-term prediction:
If c = 0 and d = 0, the long-term predictions tend to 0
If c = 0 and d = 1, the long-term predictions tend to a non-zero constant
If c = 0 and d = 2, the long-term predictions follow a straight line
If c ≠ 0 and d = 0, the long-term predictions tend to the mean of the data
If c ≠ 0 and d = 1, the long-term predictions follow a straight line
If c ≠ 0 and d = 2, the long-term predictions follow a quadratic curve
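The effect of c and d on long-term predictions can be checked numerically with a small sketch. It assumes the simplest case, an AR(1) recursion on the (possibly differenced) series with future error terms set to their expected value of zero; the coefficient 0.5 and the starting values are hypothetical.

```python
from typing import List, Sequence

def ar1_forecasts(last_value: float, c: float, phi: float, steps: int) -> List[float]:
    """Iterate y = c + phi * y, treating future errors as zero."""
    forecasts, y = [], last_value
    for _ in range(steps):
        y = c + phi * y
        forecasts.append(y)
    return forecasts

def integrate(last_level: float, diff_forecasts: Sequence[float]) -> List[float]:
    """Undo one round of differencing by accumulating forecast differences."""
    levels, level = [], last_level
    for dv in diff_forecasts:
        level += dv
        levels.append(level)
    return levels

# d = 0, c = 0: forecasts decay to 0.
to_zero = ar1_forecasts(10.0, c=0.0, phi=0.5, steps=60)
# d = 0, c != 0: forecasts approach the mean c / (1 - phi) = 4.
to_mean = ar1_forecasts(10.0, c=2.0, phi=0.5, steps=60)
# d = 1, c = 0: differences decay to 0, so the level settles at a non-zero constant.
to_const = integrate(100.0, ar1_forecasts(1.0, c=0.0, phi=0.5, steps=60))
# d = 1, c != 0: differences stay at 4, so the level follows a straight line.
line = integrate(100.0, ar1_forecasts(4.0, c=2.0, phi=0.5, steps=3))

print(to_zero[-1], to_mean[-1], to_const[-1], line)
```

The d = 2 cases follow by applying `integrate` once more to the d = 1 results.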
2. Parameter estimation: estimate the model parameters and check whether they are statistically significant.
3. Model checking: diagnose whether the residual series is white noise.
4. Predictive analysis is performed using a model that has passed the checks.
Model optimization: for the same prediction error, by Occam's razor the smaller model is better. To balance prediction error against the number of parameters, the order of the model can be determined by the information criterion method. The prediction error is usually expressed as the squared error, that is, the residual sum of squares. The information criterion function is
AIC = 2 × (number of model parameters) - 2 ln(maximum likelihood of the model)
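The information criterion can be applied to choose between candidate orders as in the sketch below. The candidate names, parameter counts and log-likelihood values are hypothetical and only illustrate the comparison.

```python
def aic(num_params: int, log_likelihood: float) -> float:
    """AIC = 2 * (number of parameters) - 2 * ln(maximum likelihood)."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical fitted candidates: (number of parameters, maximized log-likelihood).
candidates = {
    "ARIMA(1,1,1)": (3, -120.0),
    "ARIMA(2,1,2)": (5, -119.5),
}
scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print(best)  # the smaller model wins despite a slightly lower likelihood
```

Here the larger model improves the log-likelihood by only 0.5 while adding two parameters, so its AIC is higher and the smaller model is preferred.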
Model evaluation: the real-time network flow of the switch is collected and analyzed, and the model established and trained by the model building module is verified on it.
The basic idea of the ARIMA model is to convert a non-stationary time series into a stationary one, and then to regress the dependent variable only on its own lagged values and on the current and lagged values of the random error term. The ARIMA model has four forms: the moving average model MA(q), the autoregressive model AR(p), the autoregressive moving average model ARMA(p, q) and the autoregressive integrated moving average model ARIMA(p, d, q). Where conditions permit, a model with fewer parameters can be chosen.
Once the abnormal flow prediction model has been obtained in this way, the following step A is executed in practical applications to detect whether the target flow is abnormal.
Step A, acquiring the network flow between designated terminals through a flow probe as the target flow, obtaining the feature values of the target feature attributes of the target flow, and applying the abnormal flow prediction model to obtain the classification label result indicating whether the target flow is abnormal, thereby detecting whether the target flow is abnormal.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (5)

1. A flow anomaly prediction method based on machine learning, characterized in that: the following steps i to v are executed to obtain an abnormal flow prediction model, and the following step A is executed to detect, by applying the abnormal flow prediction model, whether a target flow is abnormal;
step i, based on a preset number of sample network flows and a classification label result indicating whether each sample network flow is abnormal, extracting the feature value of each sample network flow for each designated candidate feature attribute, and then entering step ii;
step ii, based on the feature values of the sample network flows for the candidate feature attributes, performing missing-value deletion on the candidate feature attributes to update them into initially selected feature attributes, performing data preprocessing on the feature values of the sample network flows for the initially selected feature attributes, and then entering step iii;
step iii, based on the feature values of the sample network flows for the initially selected feature attributes, screening out intermediate feature attributes according to the correlation among the initially selected feature attributes, thereby obtaining the feature values of the sample network flows for the intermediate feature attributes, and then entering step iv;
step iv, according to whether the number of intermediate feature attributes falls within a preset feature attribute number range, updating the intermediate feature attributes into target feature attributes through a loop of feature attribute derivation and deletion applied to the intermediate feature attributes, and then entering step v;
step v, based on the feature values of the sample network flows for the target feature attributes and the classification label results of the sample network flows, training a specified classification network with the feature values of the sample network flows for the target feature attributes as input and the corresponding classification label results as output, to obtain the abnormal flow prediction model;
step A, obtaining the feature values of the target flow for the target feature attributes, and applying the abnormal flow prediction model to obtain a classification label result indicating whether the target flow is abnormal, thereby detecting whether the target flow is abnormal.
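Steps v and A can be illustrated with a minimal sketch. The claim does not name the classification network, so the nearest-centroid classifier below is only a stand-in chosen for brevity. The function names `train` and `predict`, the two feature columns, and the sample values are all hypothetical.

```python
def train(features, labels):
    """Step v (stand-in): compute one centroid per class label.

    features: list of equal-length numeric lists (target feature values).
    labels:   list of class labels, e.g. 0 = normal, 1 = abnormal.
    """
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(model, x):
    """Step A (stand-in): return the label of the closest centroid."""
    return min(
        model,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)),
    )


# Hypothetical samples: [packets per second, mean packet length]
sample_features = [[10.0, 500.0], [12.0, 520.0], [900.0, 60.0], [950.0, 64.0]]
sample_labels = [0, 0, 1, 1]

model = train(sample_features, sample_labels)
label_burst = predict(model, [880.0, 70.0])   # close to the abnormal centroid
label_quiet = predict(model, [11.0, 505.0])   # close to the normal centroid
```

Any classifier with a fit/predict interface could take the place of the stand-in; only the input (target feature values) and output (abnormal or not) are fixed by the claim.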
2. The flow anomaly prediction method based on machine learning according to claim 1, characterized in that: step ii comprises the following steps ii-1 to ii-2;
step ii-1, based on the feature values of the sample network flows for the candidate feature attributes, obtaining the missing-value rate of each candidate feature attribute, deleting the candidate feature attributes whose missing-value rate is greater than a preset missing-rate threshold, updating the remaining candidate feature attributes into the initially selected feature attributes, and then entering step ii-2;
step ii-2, for the feature values of the sample network flows for the initially selected feature attributes, filling the missing values with 0, thereby updating the feature values of the sample network flows for the initially selected feature attributes, and then entering step iii.
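The two sub-steps of claim 2 can be sketched as follows. This is an illustrative reading, not the applicant's implementation: the function name `drop_and_fill`, the use of `None` to mark a missing value, the attribute names, and the 0.5 threshold are assumptions.

```python
def drop_and_fill(rows, max_missing_rate):
    """Step ii-1: drop attributes whose missing rate exceeds the threshold.
    Step ii-2: fill the remaining missing values with 0.

    rows: list of dicts mapping attribute name -> value, None meaning missing.
    Returns (kept attribute names, cleaned rows).
    """
    names = list(rows[0].keys())
    n = len(rows)
    kept = []
    for name in names:
        missing = sum(1 for r in rows if r.get(name) is None)
        if missing / n <= max_missing_rate:   # delete only if rate > threshold
            kept.append(name)
    filled = [
        {name: (0 if r.get(name) is None else r[name]) for name in kept}
        for r in rows
    ]
    return kept, filled


# Hypothetical sample flows; "ttl" is missing in 3 of 4 samples.
rows = [
    {"pkt_count": 10, "mean_len": 512.0, "ttl": None},
    {"pkt_count": None, "mean_len": 498.0, "ttl": None},
    {"pkt_count": 7, "mean_len": 530.0, "ttl": 64},
    {"pkt_count": 12, "mean_len": 505.0, "ttl": None},
]
kept, filled = drop_and_fill(rows, max_missing_rate=0.5)
```

With these values, `ttl` (missing rate 0.75) is deleted, while `pkt_count` (missing rate 0.25) is kept and its missing entry is filled with 0.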
3. The flow anomaly prediction method based on machine learning according to claim 1, characterized in that: step iii comprises the following steps iii-1 to iii-3;
step iii-1, based on the feature values of the sample network flows for the initially selected feature attributes, and by converting feature values into vectors, obtaining, for each initially selected feature attribute, the feature vector sequence of its feature value sequence, and then entering step iii-2;
step iii-2, according to the feature vector sequences of the feature value sequences of the initially selected feature attributes, obtaining the correlation between every two initially selected feature attributes by Mahalanobis distance calculation, and then entering step iii-3;
step iii-3, sorting the correlations in descending order, selecting the top ⌈a·A⌉ correlations, and taking the initially selected feature attributes involved in the selected correlations as the intermediate feature attributes, thereby obtaining the feature values of the sample network flows for the intermediate feature attributes, and then entering step iv; wherein A denotes the number of pairwise combinations of the initially selected feature attributes, a denotes a preset ratio less than 1, and ⌈·⌉ denotes rounding up.
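The selection rule of step iii-3 can be sketched as below. The sketch assumes the pairwise correlations from step iii-2 have already been computed and are passed in as a dictionary; the Mahalanobis-based computation itself is not reproduced here. The function name `select_intermediate`, the attribute names, and the correlation values are hypothetical.

```python
import math


def select_intermediate(pair_corr, a):
    """Keep the attributes involved in the top ceil(a * A) correlations.

    pair_corr: dict mapping (attr_i, attr_j) -> correlation, one entry per
               pairwise combination of initially selected attributes.
    a:         preset ratio, 0 < a < 1.
    """
    A = len(pair_corr)                       # number of pairwise combinations
    k = math.ceil(a * A)                     # top ceil(a * A) correlations
    ranked = sorted(pair_corr.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    for (x, y), _ in ranked[:k]:
        for name in (x, y):
            if name not in selected:         # an attribute may appear in several pairs
                selected.append(name)
    return selected


# Hypothetical correlations for 4 attributes (A = 6 pairs).
pair_corr = {
    ("pkt_count", "byte_count"): 0.9,
    ("pkt_count", "duration"): 0.2,
    ("pkt_count", "mean_len"): 0.1,
    ("byte_count", "duration"): 0.7,
    ("byte_count", "mean_len"): 0.3,
    ("duration", "mean_len"): 0.4,
}
intermediate = select_intermediate(pair_corr, a=0.3)   # ceil(0.3 * 6) = 2 pairs
```

Here the two strongest pairs are (`pkt_count`, `byte_count`) and (`byte_count`, `duration`), so three attributes become intermediate feature attributes.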
4. The flow anomaly prediction method based on machine learning according to claim 1, characterized in that: step iv comprises the following steps iv-1 to iv-6;
step iv-1, judging whether the number of intermediate feature attributes is lower than the preset feature attribute number range; if so, entering step iv-2, otherwise entering step iv-4;
step iv-2, for each intermediate feature attribute, obtaining the correlation between the intermediate feature attribute and each of the initially selected feature attributes other than the intermediate feature attributes, and selecting the initially selected feature attribute with the maximum correlation as the initially selected feature attribute associated with that intermediate feature attribute; thereby obtaining the initially selected feature attribute associated with each intermediate feature attribute, and then entering step iv-3;
step iv-3, sorting the correlations between the intermediate feature attributes and their respectively associated initially selected feature attributes in descending order, selecting the top ⌈b·B⌉ correlations, updating the initially selected feature attributes involved in the selected correlations into intermediate feature attributes, and returning to step iv-1; wherein B denotes the number of intermediate feature attributes before the update operation in step iv-3, b denotes a preset ratio less than 1, and ⌈·⌉ denotes rounding up;
step iv-4, judging whether the number of intermediate feature attributes is higher than the preset feature attribute number range; if so, entering step iv-5, otherwise entering step iv-6;
step iv-5, obtaining the correlation between every two intermediate feature attributes, sorting the correlations in ascending order, selecting the first ⌈c·C⌉ correlations, deleting the intermediate feature attributes involved in the selected correlations, taking the remaining intermediate feature attributes as the updated intermediate feature attributes, and returning to step iv-1; wherein C denotes the number of pairwise combinations of the intermediate feature attributes before the deletion operation in step iv-5, c denotes a preset ratio less than 1, and ⌈·⌉ denotes rounding up;
step iv-6, updating the intermediate feature attributes into the target feature attributes, thereby obtaining the feature values of the sample network flows for the target feature attributes, and then entering step v.
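The derivation and deletion loop of claim 4 can be sketched as follows, under stated assumptions. The correlation function `corr` is a hypothetical placeholder for whatever measure is used, the candidates in step iv-2 are read as the initially selected attributes that are not yet intermediate, and the `max_rounds` guard and the empty-pool check are safeguards added here, not part of the claim.

```python
import math
from itertools import combinations


def adjust_attributes(intermediate, initial, corr, lo, hi, b, c, max_rounds=100):
    """Grow or shrink the intermediate set until its size lies in [lo, hi]."""
    mid = list(intermediate)
    for _ in range(max_rounds):                  # added safeguard against cycling
        if len(mid) < lo:                        # steps iv-1 to iv-3: derivation
            pool = [f for f in initial if f not in mid]
            if not pool:                         # nothing left to promote
                break
            # Step iv-2: best-correlated non-intermediate attribute per intermediate one.
            best = [max(((corr(m, p), p) for p in pool), key=lambda t: t[0])
                    for m in mid]
            best.sort(key=lambda t: t[0], reverse=True)
            k = math.ceil(b * len(mid))          # B = count before the update
            for _, p in best[:k]:
                if p not in mid:
                    mid.append(p)
        elif len(mid) > hi:                      # steps iv-4 to iv-5: deletion
            pairs = sorted(combinations(mid, 2), key=lambda pr: corr(*pr))
            k = math.ceil(c * len(pairs))        # C = number of pairwise combinations
            doomed = {f for pr in pairs[:k] for f in pr}
            mid = [f for f in mid if f not in doomed]
        else:                                    # step iv-6: size is within range
            break
    return mid


# Hypothetical symmetric correlation table; unlisted pairs default to 0.0.
table = {
    frozenset(("f1", "f3")): 0.8, frozenset(("f1", "f4")): 0.3,
    frozenset(("f1", "f5")): 0.1, frozenset(("f2", "f3")): 0.4,
    frozenset(("f2", "f4")): 0.6, frozenset(("f2", "f5")): 0.2,
}


def corr(x, y):
    return table.get(frozenset((x, y)), 0.0)


initial = ["f1", "f2", "f3", "f4", "f5"]
targets = adjust_attributes(["f1", "f2"], initial, corr, lo=3, hi=4, b=0.5, c=0.2)
```

In this run the set starts with two attributes, below the range [3, 4], so one derivation round promotes `f3` (the strongest association, 0.8 with `f1`) and the loop then stops.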
5. The flow anomaly prediction method based on machine learning according to claim 1, characterized in that: the target flow is the network flow between designated terminals acquired through a flow probe.
CN202110467791.9A 2021-04-28 2021-04-28 Flow anomaly prediction method based on machine learning Pending CN113269327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110467791.9A CN113269327A (en) 2021-04-28 2021-04-28 Flow anomaly prediction method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110467791.9A CN113269327A (en) 2021-04-28 2021-04-28 Flow anomaly prediction method based on machine learning

Publications (1)

Publication Number Publication Date
CN113269327A true CN113269327A (en) 2021-08-17

Family

ID=77229664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110467791.9A Pending CN113269327A (en) 2021-04-28 2021-04-28 Flow anomaly prediction method based on machine learning

Country Status (1)

Country Link
CN (1) CN113269327A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242431A (en) * 2022-06-10 2022-10-25 国家计算机网络与信息安全管理中心 Industrial Internet of things data anomaly detection method based on random forest and long-short term memory network
CN116708313A (en) * 2023-08-08 2023-09-05 中国电信股份有限公司 Flow detection method, flow detection device, storage medium and electronic equipment
CN116708313B (en) * 2023-08-08 2023-11-14 中国电信股份有限公司 Flow detection method, flow detection device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110647539B (en) Prediction method and system for vehicle faults
CA2931624A1 (en) Systems and methods for event detection and diagnosis
CN112187528B (en) Industrial control system communication flow online monitoring method based on SARIMA
CN113269327A (en) Flow anomaly prediction method based on machine learning
CN109917777B (en) Fault detection method based on mixed multi-sampling rate probability principal component analysis model
CN108829878B (en) Method and device for detecting abnormal points of industrial experimental data
CN110083507B (en) Key performance index classification method and device
CN110570544A (en) method, device, equipment and storage medium for identifying faults of aircraft fuel system
CN111782484B (en) Anomaly detection method and device
CN115858794B (en) Abnormal log data identification method for network operation safety monitoring
CN113687972A (en) Method, device and equipment for processing abnormal data of business system and storage medium
CN114004331A (en) Fault analysis method based on key indexes and deep learning
US20210287525A1 (en) Method for learning latest data considering external influences in early warning system and system for same
CN114218998A (en) Power system abnormal behavior analysis method based on hidden Markov model
CN114674511B (en) Bridge modal anomaly early warning method for eliminating time-varying environmental factor influence
CN108761250B (en) Industrial control equipment voltage and current-based intrusion detection method
CN116668083A (en) Network traffic anomaly detection method and system
Rizvi et al. Real-time ZIP load parameter tracking using adaptive window and variable elimination with realistic synthetic synchrophasor data
CN116346405A (en) Network security operation and maintenance capability evaluation system and method based on data statistics
Moraes et al. Comparing the inertial effect of MEWMA and multivariate sliding window schemes with confidence control charts
Kang et al. Real-time process quality control for business activity monitoring
CN114637793B (en) Equipment fault frequent region positioning method based on big data analysis
Zhang et al. A brief survey of different statistics for detecting multiplicative faults in multivariate statistical process monitoring
CN117114454B (en) DC sleeve state evaluation method and system based on Apriori algorithm
Carta et al. Bad Data Detection and Handling in ICT Platforms for Energy Systems

Legal Events

Date Code Title Description
PB01 Publication