CN115801604A - Method for predicting network flow characteristic value - Google Patents

Info

Publication number
CN115801604A
Authority
CN
China
Prior art keywords
network flow
time sequence
characteristic value
model
predicting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310101070.5A
Other languages
Chinese (zh)
Other versions
CN115801604B (en)
Inventor
姜达成
谭帅帅
刘文印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202310101070.5A
Publication of CN115801604A
Application granted
Publication of CN115801604B
Status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for predicting a network flow characteristic value. The method comprises: obtaining a network flow in an Internet of Things network; extracting characteristic values from the network flow and obtaining a time series based on the characteristic values; testing the time series for randomness; and, based on the test result, predicting the time series with an ARIMA model or a normal distribution to obtain a network flow characteristic value prediction result.

Description

Method for predicting network flow characteristic value
Technical Field
The invention relates to the technical field of Internet of Things device identification, and in particular to a method for predicting a network flow characteristic value.
Background
With the development of the Internet of Things ecosystem, there is a growing trend toward characterizing and fingerprinting Internet of Things devices. A fingerprint identification model needs a network flow feature vector to make its judgment, but the feature vector can only be extracted from a complete network flow. The feature vector therefore cannot be obtained in real time while the flow is still in transit, and the Internet of Things device cannot be identified correctly.
To ensure accurate identification of Internet of Things devices, how to accurately obtain the feature vector of a network flow, i.e. the network flow characteristic value, is an urgent problem to be solved.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a method for predicting a network flow characteristic value, which can accurately obtain the feature vector of a network flow.
To achieve this technical purpose, the invention provides the following technical solution: a method for predicting a network flow characteristic value, comprising the following steps:
acquiring a network flow in an Internet of Things network; extracting characteristic values of the network flow, and obtaining a time series based on the characteristic values;
testing the time series to judge whether it is sufficiently random; and, based on the test result, predicting the time series with an ARIMA model or a normal distribution to obtain a network flow characteristic value prediction result.
Optionally, the network flow in the Internet of Things network contains fewer data packets than the actual number of data packets of the network flow, that is, the network flow is an incomplete network flow.
Optionally, after the randomness of the time series is tested, the time series is predicted with an ARIMA model if it is not sufficiently random, and with a normal distribution if it is sufficiently random.
Optionally, the specific process of testing the time series includes:
constructing and initializing a lag value, and iteratively updating the lag value until it meets a preset condition;
performing a significance analysis on the time series, based on the lag value at which updating stopped, to obtain a statistic;
and comparing the statistic with a preset threshold, wherein when the statistic is smaller than the preset threshold the time series is sufficiently random, and otherwise it is not sufficiently random.
Optionally, the preset condition is:
Figure SMS_1
wherein
Figure SMS_2
is the lag value,
Figure SMS_3
is the floor (round-down) function, and
Figure SMS_4
is the number of network flow packets.
Optionally, the process of predicting the time series with the ARIMA model includes:
establishing an ARIMA model; inputting the characteristic value time series into the ARIMA model for parameter fitting to obtain model parameters; substituting the model parameters into the ARIMA model to obtain an updated ARIMA model; and inputting the characteristic value sequence number t into the updated ARIMA model to obtain the predicted characteristic value
Figure SMS_5
that is, the network flow characteristic value prediction result:
Figure SMS_6
wherein the ARIMA model
Figure SMS_7
is:
Figure SMS_8
wherein
Figure SMS_10
is the order of the autoregressive model, D is the degree of differencing,
Figure SMS_12
is the order of the moving average model,
Figure SMS_14
is the fixed lag operator,
Figure SMS_11
is the i-th lag operator,
Figure SMS_13
are the parameters of the autoregressive model,
Figure SMS_15
are the parameters of the moving average model,
Figure SMS_16
is the error term, and
Figure SMS_9
is a constant term.
Optionally, the process of predicting the time series with a normal distribution includes: calculating the mean of the characteristic values in the time series
Figure SMS_17
; calculating the variance of the characteristic values in the time series
Figure SMS_18
; and obtaining the predicted characteristic value based on the mean and variance
Figure SMS_19
that is, the network flow characteristic value prediction result:
Figure SMS_20
wherein
Figure SMS_21
returns a random sample from the normal distribution defined by the given mean and variance.
The invention has the following technical effects:
The invention analyzes the time series of a network flow in transmission to judge whether it is sufficiently random. If it is not sufficiently random, an ARIMA model is used for prediction; if it is sufficiently random, a normal distribution is used for prediction. The feature vector of the network flow can thus be predicted accurately and effectively, which supports subsequent work.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the invention provides a method for predicting a network flow characteristic value. The method captures a network flow being transmitted in an Internet of Things network, extracts the network flow characteristic values and their time series, and selects a prediction method according to the randomness of the characteristic value time series: an ARIMA model if the time series is not sufficiently random, and a normal distribution if it is sufficiently random. The predicted network flow characteristic value is then available for subsequent work.
The overall process comprises the following steps: S1, capturing a network flow to obtain an incomplete network flow; S2, extracting characteristic values and their time series; S3, judging whether the time series is sufficiently random; S4, if it is not sufficiently random, predicting with an ARIMA model; and S5, if it is sufficiently random, predicting with a normal distribution.
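The branching between steps S4 and S5 can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the name `predict_feature` and its three callable parameters are hypothetical and do not appear in the patent, and the randomness test and both predictors are supplied by the caller.

```python
from typing import Callable, Sequence

# Hypothetical type aliases for illustration only.
RandomnessTest = Callable[[Sequence[float]], bool]
Predictor = Callable[[Sequence[float]], float]


def predict_feature(
    series: Sequence[float],
    is_sufficiently_random: RandomnessTest,
    arima_predict: Predictor,
    normal_predict: Predictor,
) -> float:
    """Choose the predictor from the randomness test result (S3 to S5)."""
    if is_sufficiently_random(series):
        # S5: sufficiently random, so predict with a normal distribution.
        return normal_predict(series)
    # S4: not sufficiently random, so predict with an ARIMA model.
    return arima_predict(series)
```

Passing the test and the predictors as arguments keeps the selection logic separate from how each step is implemented.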
The method comprises the following specific steps:
S1: Capture a network flow being transmitted in the Internet of Things network; the captured network flow is recorded as
Figure SMS_22
for the
Figure SMS_23
-th data packet, wherein
Figure SMS_24
is the actual number of packets in the flow; in total,
Figure SMS_25
packets are captured, and
Figure SMS_26
is the difference between the actual number of packets and the number of packets captured, so that a partial network flow is obtained.
S2: Extract the characteristic values and their time series.
S21: Extract, from the captured network flow
Figure SMS_27
the feature vector, i.e. the characteristic values, of each packet, and store them in
Figure SMS_28
expressed as follows:
Figure SMS_29
wherein
Figure SMS_30
is the number of characteristic values, which is determined by the characteristic values selected;
Figure SMS_31
are the different characteristic value extraction functions, and extraction is performed
Figure SMS_32
times in total.
S22: From the extracted characteristic values, calculate the time series of each corresponding characteristic value, recorded as
Figure SMS_33
Note: taking the average packet arrival time as an example,
Figure SMS_34
and so on, so that the time series of each characteristic value can be obtained.
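As a concrete illustration of step S22, the sketch below builds a time series for one characteristic value, the mean packet inter-arrival time. The exact definition used in the patent is contained in a figure that is not reproduced here, so the running-mean construction and the function names `inter_arrival_times` and `running_mean_series` are assumptions made for this example.

```python
from typing import List, Sequence


def inter_arrival_times(timestamps: Sequence[float]) -> List[float]:
    """Gaps between consecutive packet arrival timestamps (in seconds)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def running_mean_series(values: Sequence[float]) -> List[float]:
    """Assumed time series: the mean of the values seen so far, per packet."""
    series: List[float] = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        series.append(total / count)
    return series
```

For arrival timestamps `[0.0, 1.0, 3.0, 6.0]`, the gaps are `[1.0, 2.0, 3.0]` and the resulting series is `[1.0, 1.5, 2.0]`. Other characteristic values (maximum, minimum, standard deviation) could be handled the same way by swapping the aggregation.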
S3: Test whether the characteristic value time series is sufficiently random, using the Ljung-Box test.
S31: Initialize the randomness flag variable
Figure SMS_35
wherein when
Figure SMS_36
the time series
Figure SMS_37
is not sufficiently random and prediction is performed with an ARIMA model; and when
Figure SMS_38
the time series
Figure SMS_39
is sufficiently random and prediction is performed with a normal distribution.
S32: Iterate from
Figure SMS_40
with a step size of 1, stopping at
Figure SMS_41
wherein
Figure SMS_42
returns the minimum value of the parameters
Figure SMS_43
and
Figure SMS_44
is the floor (round-down) function;
S321: Calculate the hypothesis value
Figure SMS_45
which is used to judge whether the time series is sufficiently random; the calculation formula is:
Figure SMS_46
wherein
Figure SMS_47
is the size of the time series of flow characteristic values,
Figure SMS_48
is the autocorrelation at lag k, and
Figure SMS_49
is the lag value used in the test.
S33: Compare the hypothesis value
Figure SMS_50
with the empirical value 0.05.
Note: the empirical value can be chosen freely; in the invention it is set to 0.05.
S331: If
Figure SMS_51
then set
Figure SMS_52
S332: If
Figure SMS_53
no operation is performed.
S34: Compare
Figure SMS_54
with
Figure SMS_55
:
when
Figure SMS_56
the time series
Figure SMS_57
is not sufficiently random, and future characteristic values are predicted with an ARIMA model; when
Figure SMS_58
the time series
Figure SMS_59
is sufficiently random, and future characteristic values are predicted with a normal distribution.
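For reference, the textbook Ljung-Box statistic for a series of size n and lag value h is Q = n(n+2) Σ_{k=1..h} ρ_k² / (n−k), where ρ_k is the autocorrelation at lag k; under the hypothesis of no autocorrelation, Q approximately follows a chi-square distribution with h degrees of freedom. The sketch below computes Q and the corresponding p-value with the standard library only. It assumes this textbook form matches the patent's formula, which is contained in a figure not reproduced here; the function names are hypothetical. The sketch returns the raw values and leaves the comparison with the 0.05 threshold to the rule described in the text.

```python
import math
from typing import Sequence, Tuple


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0.0:
        return 0.0  # constant series: treat autocorrelation as zero
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    # Start from the closed form for df = 2 (even) or df = 1 (odd).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for _ in range((df - 1) // 2):
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def ljung_box(series: Sequence[float], h: int) -> Tuple[float, float]:
    """Return (Q statistic, p-value) of the Ljung-Box test with lag value h."""
    n = len(series)
    if not 1 <= h < n:
        raise ValueError("h must satisfy 1 <= h < len(series)")
    q_stat = n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k) for k in range(1, h + 1)
    )
    return q_stat, chi2_sf(q_stat, h)
```

In the loop of step S32, `ljung_box(series, h)` would be called once per lag value h, and each result compared with the empirical value.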
S4: If
Figure SMS_60
the time series
Figure SMS_61
is not sufficiently random, and future characteristic values are predicted with an ARIMA model.
S41: Establish the ARIMA model; the formula for the predicted characteristic value
Figure SMS_62
is as follows:
Figure SMS_63
wherein
Figure SMS_65
is the order of the autoregressive model, D is the degree of differencing,
Figure SMS_68
is the order of the moving average model,
Figure SMS_70
is the i-th lag operator,
Figure SMS_66
is the fixed lag operator,
Figure SMS_69
is the i-th parameter of the autoregressive model,
Figure SMS_71
is the i-th parameter of the moving average model,
Figure SMS_72
is the error term,
Figure SMS_64
is a constant term, and
Figure SMS_67
where N is the number of parameters of the autoregressive/moving average model.
Description of the drawings: an autoregressive model: a method of processing time series using the same variable, e.g.
Figure SMS_73
The previous stages of, i.e.
Figure SMS_74
To
Figure SMS_75
To predict the current period
Figure SMS_76
And assuming that they are in a linear relationship;
moving average model: the current value of the time series is a model formed by a linear function of a random error term and a lag error term
S42: Input the characteristic value time series
Figure SMS_77
into the ARIMA model and perform parameter fitting to obtain the order of the autoregressive model
Figure SMS_78
the degree of differencing
Figure SMS_79
and the order of the moving average model
Figure SMS_80
S43: Input the sequence number of the characteristic value to be predicted
Figure SMS_81
into the ARIMA model to obtain the predicted characteristic value
Figure SMS_82
using the following formula:
Figure SMS_83
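The fit-then-forecast flow of steps S41 to S43 can be illustrated with a deliberately reduced model. The sketch below is not the patent's general ARIMA procedure: it fixes the orders at (1, 1, 0), i.e. one autoregressive term, one round of differencing and no moving average term, and fits the two remaining parameters by ordinary least squares. These choices and the function names are assumptions made to keep the example dependency-free; a general fit with selectable orders would normally be delegated to a statistics library.

```python
from typing import List, Sequence, Tuple


def fit_ar1_with_constant(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of values[t] = c + phi * values[t-1]; returns (c, phi)."""
    xs, ys = list(values[:-1]), list(values[1:])
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        phi = 0.0  # no variation in the regressor: fall back to the mean
    else:
        phi = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - phi * mean_x, phi


def forecast_arima_110(series: Sequence[float]) -> float:
    """One-step-ahead forecast from a simplified ARIMA(1, 1, 0) model."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    # Differencing (D = 1): model the changes between consecutive values.
    diffs: List[float] = [b - a for a, b in zip(series, series[1:])]
    c, phi = fit_ar1_with_constant(diffs)
    next_diff = c + phi * diffs[-1]
    # Undo the differencing to return to the original scale.
    return series[-1] + next_diff
```

For the linear series `[1.0, 3.0, 5.0, 7.0, 9.0]` every difference equals 2, so the fit degenerates to a constant step and the forecast is 11.0.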
S5: If
Figure SMS_84
is not equal to
Figure SMS_85
the time series
Figure SMS_86
is sufficiently random, and future characteristic values are predicted with a normal distribution.
S51: Calculate the mean of the characteristic values
Figure SMS_87
using the following formula:
Figure SMS_88
S52: Calculate the variance of the characteristic values
Figure SMS_89
using the following formula:
Figure SMS_90
S53: Generate the Gaussian distribution
Figure SMS_91
and draw a random sample from it as the future characteristic value
Figure SMS_92
Figure SMS_93
wherein
Figure SMS_94
returns a random sample from the normal distribution defined by the given mean and variance.
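Steps S51 to S53 amount to fitting a normal distribution to the observed characteristic values and drawing one sample from it. A minimal sketch follows, with two labeled assumptions: the population variance (division by n) is used because the patent's variance formula is contained in a figure not reproduced here, and the function name `predict_from_normal` and its optional `rng` parameter are hypothetical.

```python
import math
import random
import statistics
from typing import Optional, Sequence


def predict_from_normal(
    series: Sequence[float], rng: Optional[random.Random] = None
) -> float:
    """Draw one predicted value from N(mean, variance) of the series."""
    rng = rng or random.Random()
    mean = statistics.fmean(series)                   # S51: mean
    variance = statistics.pvariance(series, mu=mean)  # S52: variance (assumed population form)
    # S53: random.gauss expects the standard deviation, not the variance.
    return rng.gauss(mean, math.sqrt(variance))
```

Passing a seeded generator, for example `predict_from_normal(series, random.Random(0))`, makes the draw reproducible, which is convenient when comparing predictions across runs.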
The invention extracts the characteristic values and the characteristic value time series of a network flow being transmitted, and selects a prediction method according to the degree of randomness of the time series: an ARIMA model if the time series is not sufficiently random, and a normal distribution if it is sufficiently random. The predicted characteristic value obtained in this way supports subsequent work such as network flow length prediction and network flow morphing.
Current network flow analysis methods focus mainly on non-payload characteristics of the network flow, that is, characteristics derived from data other than the data being transmitted. Such derived characteristic values may be called original characteristics, for example the mean, maximum, minimum and standard deviation of the PDU inter-arrival time, derived from the sequence of inter-arrival times between packets.
By extracting the characteristic values and the characteristic value time series of a network flow being transmitted, the invention judges whether the time series is sufficiently random. If it is not sufficiently random, an ARIMA model is used to predict it; if it is sufficiently random, a normal distribution is used to predict it. Subsequent work, such as network flow length prediction and network flow morphing, can then be carried out more effectively.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention. Various changes and modifications may be made without departing from the spirit and scope of the present invention, and all such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. A method for predicting a network flow characteristic value, characterized by comprising the following steps:
acquiring a network flow in an Internet of Things network; extracting characteristic values of the network flow, and obtaining a time series based on the characteristic values;
testing the time series to judge whether it is sufficiently random; and, based on the test result, predicting the time series with an ARIMA model or a normal distribution to obtain a network flow characteristic value prediction result.
2. The prediction method according to claim 1, characterized in that:
the network flow in the Internet of Things network contains fewer data packets than the actual number of data packets of the network flow, that is, the network flow is an incomplete network flow.
3. The prediction method according to claim 1, characterized in that:
after the randomness of the time series is tested, the time series is predicted with an ARIMA model if it is not sufficiently random, and with a normal distribution if it is sufficiently random.
4. The prediction method according to claim 1, characterized in that:
the specific process of testing the time series comprises:
constructing and initializing a lag value, and iteratively updating the lag value until it meets a preset condition;
performing a significance analysis on the time series, based on the lag value at which updating stopped, to obtain a statistic;
and comparing the statistic with a preset threshold, wherein when the statistic is smaller than the preset threshold the time series is sufficiently random, and otherwise it is not sufficiently random.
5. The prediction method according to claim 4, characterized in that:
the preset condition is:
Figure QLYQS_1
wherein
Figure QLYQS_2
is the lag value,
Figure QLYQS_3
is the floor (round-down) function, and
Figure QLYQS_4
is the number of network flow packets.
6. The prediction method according to claim 3, characterized in that:
the process of predicting the time series with the ARIMA model comprises:
establishing an ARIMA model; inputting the characteristic value time series into the ARIMA model for parameter fitting to obtain model parameters; substituting the model parameters into the ARIMA model to obtain an updated ARIMA model; and inputting the characteristic value sequence number t into the updated ARIMA model to obtain the predicted characteristic value
Figure QLYQS_5
that is, the network flow characteristic value prediction result:
Figure QLYQS_6
wherein the ARIMA model
Figure QLYQS_7
is:
Figure QLYQS_9
wherein
Figure QLYQS_13
is the order of the autoregressive model, D is the degree of differencing,
Figure QLYQS_14
is the order of the moving average model,
Figure QLYQS_10
is the fixed lag operator,
Figure QLYQS_12
is the i-th lag operator,
Figure QLYQS_15
are the parameters of the autoregressive model,
Figure QLYQS_16
are the parameters of the moving average model,
Figure QLYQS_8
is the error term, and
Figure QLYQS_11
is a constant term.
7. The prediction method according to claim 3, characterized in that:
the process of predicting the time series with the normal distribution comprises:
calculating the mean of the characteristic values in the time series
Figure QLYQS_17
calculating the variance of the characteristic values in the time series
Figure QLYQS_18
and obtaining the predicted characteristic value based on the mean and variance
Figure QLYQS_19
that is, the network flow characteristic value prediction result:
Figure QLYQS_20
wherein
Figure QLYQS_21
returns a random sample from the normal distribution defined by the given mean and variance.
CN202310101070.5A 2023-02-13 2023-02-13 Prediction method of network flow characteristic value Active CN115801604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310101070.5A CN115801604B (en) 2023-02-13 2023-02-13 Prediction method of network flow characteristic value

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310101070.5A CN115801604B (en) 2023-02-13 2023-02-13 Prediction method of network flow characteristic value

Publications (2)

Publication Number Publication Date
CN115801604A true CN115801604A (en) 2023-03-14
CN115801604B CN115801604B (en) 2023-05-02

Family

ID=85430836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310101070.5A Active CN115801604B (en) 2023-02-13 2023-02-13 Prediction method of network flow characteristic value

Country Status (1)

Country Link
CN (1) CN115801604B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002039254A1 (en) * 2000-11-09 2002-05-16 Spss Inc. System and method for building a time series model
CN106990763A (en) * 2017-04-20 2017-07-28 浙江大学 A kind of Vertical Mill operation regulator control system and method based on data mining
CN109951358A (en) * 2019-03-21 2019-06-28 北京交通大学 Data network method for predicting
CN111882135A (en) * 2020-08-05 2020-11-03 杭州安恒信息技术股份有限公司 Internet of things equipment intrusion detection method and related device
CN112929214A (en) * 2021-02-02 2021-06-08 北京明朝万达科技股份有限公司 Model construction method, device, equipment and storage medium
CN115695046A (en) * 2022-12-28 2023-02-03 广东工业大学 Network intrusion detection method based on reinforcement ensemble learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
田中大; 李树江; 王艳红; 高宪文: "Application of empirical mode decomposition and time series analysis in network traffic prediction" *

Also Published As

Publication number Publication date
CN115801604B (en) 2023-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant