CN111046323A - Network traffic data preprocessing method based on EMD - Google Patents

Network traffic data preprocessing method based on EMD

Info

Publication number
CN111046323A
Authority
CN
China
Prior art keywords
sequence
data
emd
imf
time sequence
Legal status
Pending
Application number
CN201911343753.1A
Other languages
Chinese (zh)
Inventor
尚立
赵炜
杨会峰
李井泉
徐珊
刘芳
董正坤
李英敏
郭少勇
徐思雅
杨杨
Current Assignee
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
Information and Telecommunication Branch of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing University of Posts and Telecommunications
Information and Telecommunication Branch of State Grid Hebei Electric Power Co Ltd
Application filed by State Grid Corp of China SGCC, Beijing University of Posts and Telecommunications, Information and Telecommunication Branch of State Grid Hebei Electric Power Co Ltd
Priority to CN201911343753.1A
Publication of CN111046323A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations

Abstract

The invention discloses an EMD-based network traffic data preprocessing method, relating to the technical field of information communication. The network traffic sequence is decomposed by EMD (empirical mode decomposition) to obtain EMD decomposition subsequences, which reduces the complexity of the time series data. By decomposing the network traffic sequence with EMD and obtaining the decomposition subsequences, the method improves the applicability of network traffic data preprocessing, preserves data integrity and enriches the data feature information.

Description

Network traffic data preprocessing method based on EMD
Technical Field
The invention relates to the technical field of information communication, in particular to a network traffic data preprocessing method based on EMD.
Background
Network traffic data in a power data communication network reflects the current data flow in that network and can also serve as a basis for judging its operating condition. Preprocessing a single network traffic data series can provide richer and more reliable information for later analysis, prediction, fault diagnosis and similar tasks, so network traffic data preprocessing has significant research value and application prospects.
Network traffic data is essentially time series data. In recent years many researchers have studied time series preprocessing and proposed many methods, including multivariate time series data completion and clustering algorithms. However, network traffic is affected by many uncertain factors that are difficult to express, and the traffic sequence is highly nonlinear and non-stationary; as a result, traditional preprocessing methods easily produce processed data with poor applicability and lose data information.
Therefore, how to improve the applicability of network traffic data preprocessing, maintain data integrity and enrich data feature information is a problem to be solved by those skilled in the art.
To establish the state of the prior art, existing patents and literature were searched, compared and analyzed, and the following technical information most relevant to the invention was identified:
Patent scheme 1: 201610818702.X, Preprocessing method and device for multi-source time series data
The invention provides a preprocessing method and device for multi-source time series data. The method comprises: a multi-source time series acquisition and parsing step, in which raw data with different structures is acquired from different data sources and converted into multiple time series with a unified structure; a data cleaning step, in which the unified time series are cleaned; and a preprocessing step targeting the characteristics of the time series, in which multiple time series describing the same object verify and supplement one another according to their specific attributes. The method solves the problem that multi-source time series data cannot be thoroughly preprocessed in the prior art, so that more complete and more reliable structured time series data is obtained for subsequent analysis and prediction.
Patent scheme 2: 201710158447.5, Network traffic time series prediction method based on distributed clustering
The invention provides a network traffic time series prediction method based on distributed clustering, which includes network traffic data preprocessing with a distributed clustering algorithm. The method comprises: slicing the time series data into time-slice tuples; performing distributed clustering preprocessing on the tuples with a distributed K-means algorithm; fitting a normal distribution to the clustering results; and thereby preparing data for distributed time series prediction, which improves the accuracy of network traffic time series prediction.
Patent scheme 3: 201810174986.2, Time series data prediction method, device and equipment
The invention provides a time series data prediction method, device and equipment. The prediction method includes a time series preprocessing method comprising: acquiring historical time series data and performing data cleaning and data slicing on it to obtain a corresponding time series; stabilizing the time series; and reconstructing the features of the stabilized series with an immune genetic feature reconstruction algorithm to obtain a corresponding feature sequence. Unlike prior art that acquires data set features by sampling, the method ensures the effectiveness of the acquired time series features through preprocessing, stabilization and feature reconstruction, so that a deep learning model can learn the temporal characteristics of the data and its prediction accuracy is ensured.
The defects of the prior art are as follows:
Defects of patent scheme 1: the scheme acquires multiple time series from different data sources, converts the raw data into time series with a unified structure, cleans them, and finally preprocesses them according to their specific attributes so that the series complement one another's features. The scheme relies mainly on collecting data from different sources and using multi-source time series to complement one another; in practice, however, it is difficult to collect data repeatedly and from multiple sources, so the scheme has limited generality.
Defects of patent scheme 2: the scheme provides a network traffic time series preprocessing method based on distributed clustering. The time series is divided into fixed-length time slices stored as tuples; each time-slice tuple is combined with the value at the next time point to form a pair; the pairs are clustered in a distributed manner, with the k-means algorithm clustering the time-slice tuples, which completes the preprocessing and prepares data for subsequent prediction. Because this preprocessing serves mainly to supply data for the subsequent normal fitting and prediction correction, the preprocessed data is essentially specific to that scheme and cannot be generalized to network traffic data preprocessing in general.
Defects of patent scheme 3: the scheme provides a time series preprocessing method for subsequent prediction: historical time series data is first acquired, cleaned and sliced; the series is then stabilized; and finally an immune genetic feature reconstruction algorithm reconstructs the features of the stabilized series to obtain a feature sequence. The preprocessing relies mainly on stabilization and immune genetic feature reconstruction, which is complex; moreover, the stabilization and reconstruction operations discard part of the data information during preprocessing, so the data features do not retain the complete information.
Problems with the prior art and considerations:
How to solve the technical problems addressed by this application, namely improving the applicability of network traffic data preprocessing, maintaining data integrity and enriching data feature information.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an EMD-based network traffic data preprocessing method that decomposes the network traffic sequence by EMD to obtain EMD decomposition subsequences, thereby improving the applicability of network traffic data preprocessing, maintaining data integrity and enriching data feature information.
To solve the above technical problem, the invention adopts the following technical scheme: an EMD-based network traffic data preprocessing method decomposes the network traffic sequence by EMD to obtain EMD decomposition subsequences, thereby reducing the complexity of the time series data.
The further technical scheme is as follows: the method specifically comprises the following steps:
S1, acquire historical network traffic data;
S2, apply mirror extension to the network traffic data sequence and use the extended sequence as the original time sequence for EMD;
S3, initialize the original time sequence and set i = 1;
S4, obtain the i-th IMF;
S5, subtract the newly obtained IMF component from the current residual sequence;
S6, if the remaining sequence still has more than 2 extreme points, set i = i + 1 and go to step S4; otherwise go to step S7;
S7, the decomposition ends, and the remaining sequence is the residual component.
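The steps S1 to S7 above can be sketched in Python as follows. This is a minimal, dependency-free sketch under stated assumptions, not the patent's own implementation: the envelopes use a natural cubic spline; instead of the S2 mirror extension, both envelopes are simply pinned to the first and last samples; and the IMF test of S46 is approximated by comparing extrema and zero-crossing counts and requiring the mean envelope to be small relative to the sequence. The function names and the parameters `tol`, `max_iter` and `max_imfs` are hypothetical choices for illustration.

```python
import bisect
import math
from typing import List, Tuple


def spline_eval(xs: List[float], ys: List[float], query: List[float]) -> List[float]:
    """Natural cubic spline through the knots (xs, ys), evaluated at each query point."""
    n = len(xs)
    if n == 1:
        return [float(ys[0])] * len(query)
    h = [xs[k + 1] - xs[k] for k in range(n - 1)]
    # Solve the tridiagonal system for the second derivatives M (Thomas algorithm);
    # natural boundary conditions fix M[0] = M[n - 1] = 0.
    M = [0.0] * n
    cp, dp = [0.0] * n, [0.0] * n
    for k in range(1, n - 1):
        rhs = 6.0 * ((ys[k + 1] - ys[k]) / h[k] - (ys[k] - ys[k - 1]) / h[k - 1])
        denom = 2.0 * (h[k - 1] + h[k]) - h[k - 1] * cp[k - 1]
        cp[k] = h[k] / denom
        dp[k] = (rhs - h[k - 1] * dp[k - 1]) / denom
    for k in range(n - 2, 0, -1):
        M[k] = dp[k] - cp[k] * M[k + 1]
    out = []
    for t in query:
        k = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 2)  # segment containing t
        a, b, hk = xs[k + 1] - t, t - xs[k], h[k]
        out.append(M[k] * a ** 3 / (6 * hk) + M[k + 1] * b ** 3 / (6 * hk)
                   + (ys[k] / hk - M[k] * hk / 6) * a
                   + (ys[k + 1] / hk - M[k + 1] * hk / 6) * b)
    return out


def extrema(y: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of interior local maxima and minima (S42)."""
    mx = [k for k in range(1, len(y) - 1) if y[k] > y[k - 1] and y[k] >= y[k + 1]]
    mn = [k for k in range(1, len(y) - 1) if y[k] < y[k - 1] and y[k] <= y[k + 1]]
    return mx, mn


def mean_envelope(h: List[float]) -> List[float]:
    """S43-S44: mean of the upper and lower spline envelopes.
    Assumption: both envelopes are pinned to the first and last samples,
    a simple stand-in for the S2 mirror extension."""
    n = len(h)
    mx, mn = extrema(h)
    grid = list(range(n))
    up_k, lo_k = [0] + mx + [n - 1], [0] + mn + [n - 1]
    upper = spline_eval(up_k, [h[k] for k in up_k], grid)
    lower = spline_eval(lo_k, [h[k] for k in lo_k], grid)
    return [(u + l) / 2 for u, l in zip(upper, lower)]


def is_imf(h: List[float], m: List[float], tol: float) -> bool:
    """Approximate S46 test: extrema vs zero crossings, and a near-zero mean envelope."""
    mx, mn = extrema(h)
    crossings = sum(1 for a, b in zip(h, h[1:]) if a * b < 0)
    peak = max(abs(v) for v in h) or 1.0
    return abs(len(mx) + len(mn) - crossings) <= 1 and max(abs(v) for v in m) <= tol * peak


def sift(r: List[float], tol: float = 0.05, max_iter: int = 50) -> List[float]:
    """S41-S46: sift r_{i-1}(t) into one IMF (capped at max_iter passes)."""
    h = list(r)                                    # S41: h_0 = r_{i-1}(t)
    m = mean_envelope(h)                           # S42-S44 on h_{j-1}
    for _ in range(max_iter):
        h = [a - b for a, b in zip(h, m)]          # S45: h_j = h_{j-1} - m_{j-1}
        m = mean_envelope(h)
        if is_imf(h, m, tol):                      # S46: stop once h_j qualifies
            break
    return h


def emd(x: List[float], max_imfs: int = 12) -> Tuple[List[List[float]], List[float]]:
    """S3-S7: returns (list of IMFs, residual)."""
    r, imfs = list(x), []                          # S3: r_0 = x(t), i = 1
    while len(imfs) < max_imfs:
        imf = sift(r)                              # S4: i-th IMF
        imfs.append(imf)
        r = [a - b for a, b in zip(r, imf)]        # S5: r_i = r_{i-1} - imf_i
        mx, mn = extrema(r)
        if len(mx) + len(mn) <= 2:                 # S6: continue only while > 2 extrema
            break
    return imfs, r                                 # S7: r is the residual component


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * k / 8) + 0.5 * math.sin(2 * math.pi * k / 50) + 0.01 * k
         for k in range(200)]
    imfs, res = emd(x)
    print(len(imfs), "IMFs")
```

Because each residual is formed by subtraction in S5, the IMFs and the final residual add back to the input up to floating-point rounding, however well the sifting converges.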
The further technical scheme is as follows: the step S2 specifically includes:
S21, find all maximum and minimum points of the network traffic data sequence $x(t) = \{x(t_1), x(t_2), \dots, x(t_n)\}$; denote the maxima by $x_M(i)$, $i \in \{1, 2, \dots, M\}$, at times $T_M(i)$, and the minima by $x_N(i)$, $i \in \{1, 2, \dots, N\}$, at times $T_N(i)$;
S22, extend the left end of the sequence $x(t)$; there are two cases:
(1) $T_M(1) < T_N(1)$, i.e. the first extremum is a maximum; the axis of symmetry is the vertical line through $T_M(1)$:
$T_M(-i+2) = 2T_M(1) - T_M(i)$, $x_M(-i+2) = x_M(i)$, where $i > 1$;
$T_N(-i+1) = 2T_M(1) - T_N(i)$, $x_N(-i+1) = x_N(i)$;
(2) $T_N(1) < T_M(1)$, i.e. the first extremum is a minimum; the axis of symmetry is the vertical line through $T_N(1)$:
$T_M(-i+1) = 2T_N(1) - T_M(i)$, $x_M(-i+1) = x_M(i)$;
$T_N(-i+2) = 2T_N(1) - T_N(i)$, $x_N(-i+2) = x_N(i)$, where $i > 1$;
S23, extend the right end of the sequence $x(t)$; there are two cases:
(1) $T_M(M) > T_N(N)$, i.e. the last extremum is a maximum; the axis of symmetry is the vertical line through $T_M(M)$:
$T_M(M+i) = 2T_M(M) - T_M(M-i)$, $x_M(M+i) = x_M(M-i)$;
$T_N(N+i) = 2T_M(M) - T_N(N-i+1)$, $x_N(N+i) = x_N(N-i+1)$;
(2) $T_N(N) > T_M(M)$, i.e. the last extremum is a minimum; the axis of symmetry is the vertical line through $T_N(N)$:
$T_M(M+i) = 2T_N(N) - T_M(M-i+1)$, $x_M(M+i) = x_M(M-i+1)$;
$T_N(N+i) = 2T_N(N) - T_N(N-i)$, $x_N(N+i) = x_N(N-i)$.
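The mirror extension of S22 and S23 can be sketched as follows, assuming the maxima and minima have already been located as in S21. The helper name `mirror_extend` and the parameter `k` (how many mirrored extrema to add at each end) are hypothetical. Lists are 0-based, so $T_M(i)$ in the text corresponds to `TM[i - 1]`. At each end the axis is placed at the extremum closest to that end, and a point at time $T$ reflected about an axis at time $a$ lands at $2a - T$ with its value unchanged.

```python
from typing import List, Tuple

Knots = List[Tuple[float, float]]  # (time, value) pairs


def mirror_extend(TM: List[float], xM: List[float], TN: List[float], xN: List[float],
                  k: int = 2) -> Tuple[Knots, Knots]:
    """Add up to k mirrored maxima and minima at each end (sketch of S22-S23).

    Assumes at least one maximum and one minimum, each list sorted by time.
    """
    M, N = len(TM), len(TN)
    # Left end (S22): the axis passes through the first extremum.
    if TM[0] < TN[0]:                       # first extremum is a maximum
        a = TM[0]
        left_max = [(2 * a - TM[i], xM[i]) for i in range(1, min(k + 1, M))]
        left_min = [(2 * a - TN[i], xN[i]) for i in range(0, min(k, N))]
    else:                                   # first extremum is a minimum
        a = TN[0]
        left_max = [(2 * a - TM[i], xM[i]) for i in range(0, min(k, M))]
        left_min = [(2 * a - TN[i], xN[i]) for i in range(1, min(k + 1, N))]
    # Right end (S23): the axis passes through the last extremum.
    if TM[-1] > TN[-1]:                     # last extremum is a maximum
        b = TM[-1]
        right_max = [(2 * b - TM[M - 1 - i], xM[M - 1 - i]) for i in range(1, min(k + 1, M))]
        right_min = [(2 * b - TN[N - 1 - i], xN[N - 1 - i]) for i in range(0, min(k, N))]
    else:                                   # last extremum is a minimum
        b = TN[-1]
        right_max = [(2 * b - TM[M - 1 - i], xM[M - 1 - i]) for i in range(0, min(k, M))]
        right_min = [(2 * b - TN[N - 1 - i], xN[N - 1 - i]) for i in range(1, min(k + 1, N))]
    maxima = sorted(left_max + list(zip(TM, xM)) + right_max)
    minima = sorted(left_min + list(zip(TN, xN)) + right_min)
    return maxima, minima
```

For example, `mirror_extend([2, 6, 10], [1.0, 0.8, 0.9], [4, 8], [-1.0, -0.7])` mirrors about $t = 2$ on the left and $t = 10$ on the right, so maxima and minima still alternate across both boundaries. With knots beyond both ends, the spline envelopes of S43 are interpolated rather than extrapolated near the boundaries.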
The further technical scheme is as follows: step S3 specifically includes initializing the time sequence: $r_0(t) = x(t)$, $i = 1$.
The further technical scheme is as follows: the step S4 specifically includes:
S41, initialization: $h_0(t) = r_{i-1}(t)$, $j = 1$;
S42, find all local maximum and local minimum points of $h_{j-1}(t)$;
S43, perform cubic spline interpolation through all maximum points of $h_{j-1}(t)$ to form the upper envelope, and through all minimum points to form the lower envelope;
S44, compute the mean of the upper and lower envelopes to form the mean envelope $m_{j-1}(t)$;
S45, subtract the mean envelope from $h_{j-1}(t)$ to obtain a new sequence:
$h_j(t) = h_{j-1}(t) - m_{j-1}(t)$
S46, determine whether $h_j(t)$ satisfies the IMF conditions; if so, $h_j(t)$ is the $i$-th IMF, $\mathrm{imf}_i(t) = h_j(t)$; otherwise set $j = j + 1$ and go to step S42.
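Steps S43 to S45 can be illustrated with a natural cubic spline written from scratch, so that the example needs only the standard library. The natural boundary condition (zero second derivative at the outermost knots) and the function names `natural_spline` and `sifting_step` are assumptions for illustration. The knot lists may be the mirror-extended extrema from step S2; if the knots do not cover the whole sequence, this sketch extrapolates the outermost cubic pieces.

```python
import bisect
from typing import List


def natural_spline(xs: List[float], ys: List[float], query: List[float]) -> List[float]:
    """Natural cubic spline through the knots (xs, ys), evaluated at each query point."""
    n = len(xs)
    if n == 1:
        return [float(ys[0])] * len(query)
    h = [xs[k + 1] - xs[k] for k in range(n - 1)]
    # Second derivatives M from a tridiagonal system (Thomas algorithm), M[0] = M[n-1] = 0.
    M = [0.0] * n
    cp, dp = [0.0] * n, [0.0] * n
    for k in range(1, n - 1):
        rhs = 6.0 * ((ys[k + 1] - ys[k]) / h[k] - (ys[k] - ys[k - 1]) / h[k - 1])
        denom = 2.0 * (h[k - 1] + h[k]) - h[k - 1] * cp[k - 1]
        cp[k] = h[k] / denom
        dp[k] = (rhs - h[k - 1] * dp[k - 1]) / denom
    for k in range(n - 2, 0, -1):
        M[k] = dp[k] - cp[k] * M[k + 1]
    out = []
    for t in query:
        k = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 2)  # segment containing t
        a, b, hk = xs[k + 1] - t, t - xs[k], h[k]
        out.append(M[k] * a ** 3 / (6 * hk) + M[k + 1] * b ** 3 / (6 * hk)
                   + (ys[k] / hk - M[k] * hk / 6) * a
                   + (ys[k + 1] / hk - M[k + 1] * hk / 6) * b)
    return out


def sifting_step(h: List[float], max_t: List[float], max_v: List[float],
                 min_t: List[float], min_v: List[float]) -> List[float]:
    """One pass of S43-S45 on a sequence sampled at t = 0, 1, ..., len(h) - 1."""
    grid = list(range(len(h)))
    upper = natural_spline(max_t, max_v, grid)              # S43: upper envelope
    lower = natural_spline(min_t, min_v, grid)              # S43: lower envelope
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]      # S44: mean envelope
    return [a - m for a, m in zip(h, mean)]                 # S45: h_j = h_{j-1} - m
```

For a symmetric oscillation shifted upwards by a constant, the upper and lower envelopes average to that constant, so one step removes the offset and returns the oscillation itself.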
The further technical scheme is as follows: step S5 specifically includes: $r_i(t) = r_{i-1}(t) - \mathrm{imf}_i(t)$.
The further technical scheme is as follows: at the end of the algorithm in step S7, it can finally be verified that
$x(t) = \sum_{i=1}^{K} \mathrm{imf}_i(t) + r_K(t)$,
where $K$ is the number of IMFs obtained; i.e. the sum of all IMF sequences and the residual component is the original sequence.
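The identity checked at the end of step S7 follows from the residual update of step S5 alone. Writing $K$ for the number of IMFs extracted (a symbol introduced here for the derivation) and summing S5 over $i = 1, \dots, K$, the sum telescopes:

$$
\sum_{i=1}^{K} \mathrm{imf}_i(t) = \sum_{i=1}^{K} \bigl( r_{i-1}(t) - r_i(t) \bigr) = r_0(t) - r_K(t),
$$

and since $r_0(t) = x(t)$ by step S3,

$$
x(t) = \sum_{i=1}^{K} \mathrm{imf}_i(t) + r_K(t).
$$

The reconstruction is therefore exact by construction, regardless of how closely each $\mathrm{imf}_i(t)$ meets the IMF conditions.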
The further technical scheme is as follows: the method runs on a server.
The further technical scheme is as follows: the server displays the EMD decomposition subsequences on a display connected to it.
The further technical scheme is as follows: the server prints the EMD decomposition subsequences on a printer connected to it.
The beneficial effects of the above technical scheme are as follows:
The EMD-based network traffic data preprocessing method decomposes the network traffic sequence by EMD to obtain EMD decomposition subsequences, which reduces the complexity of the time series data, improves the applicability of network traffic data preprocessing, maintains data integrity and enriches the data feature information.
See detailed description of the preferred embodiments.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an EMD decomposition subsequence diagram in the invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the application, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways than those described herein, and it will be apparent to those of ordinary skill in the art that the present application is not limited to the specific embodiments disclosed below.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In the description of the present application, it is to be understood that the orientation or positional relationship indicated by the directional terms such as "front, rear, upper, lower, left, right", "lateral, vertical, horizontal" and "top, bottom", etc., are generally based on the orientation or positional relationship shown in the drawings, and are used for convenience of description and simplicity of description only, and in the case of not making a reverse description, these directional terms do not indicate and imply that the device or element being referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore, should not be considered as limiting the scope of the present application; the terms "inner and outer" refer to the inner and outer relative to the profile of the respective component itself.
Spatially relative terms, such as "on", "above", "on the upper surface of", "upper" and the like, may be used herein for ease of description to describe the spatial relationship of one device or feature to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. For example, if a device in the figures is turned over, devices described as "above" or "on" other devices or configurations would then be oriented "below" or "under" those devices or configurations. Thus, the exemplary term "above" can encompass both an orientation of "above" and one of "below". The device may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein should be interpreted accordingly.
It should be noted that the terms "first", "second", and the like are used to define the components, and are only used for convenience of distinguishing the corresponding components, and the terms have no special meanings unless otherwise stated, and therefore, the scope of protection of the present application is not to be construed as being limited.
As shown in fig. 1, the present invention discloses a network traffic data preprocessing method based on EMD, which decomposes a network traffic sequence by using EMD and obtains an EMD decomposition subsequence, thereby reducing the complexity of time sequence data.
The method specifically comprises the following steps:
S1, acquire historical network traffic data.
S2, apply mirror extension to the network traffic data sequence and use the extended time sequence as the original time sequence for EMD. This step specifically comprises:
S21, find all maximum and minimum points of the network traffic data sequence $x(t) = \{x(t_1), x(t_2), \dots, x(t_n)\}$; denote the maxima by $x_M(i)$, $i \in \{1, 2, \dots, M\}$, at times $T_M(i)$, and the minima by $x_N(i)$, $i \in \{1, 2, \dots, N\}$, at times $T_N(i)$.
S22, extend the left end of the sequence $x(t)$; there are two cases:
(1) $T_M(1) < T_N(1)$, i.e. the first extremum is a maximum; the axis of symmetry is the vertical line through $T_M(1)$:
$T_M(-i+2) = 2T_M(1) - T_M(i)$, $x_M(-i+2) = x_M(i)$, where $i > 1$.
$T_N(-i+1) = 2T_M(1) - T_N(i)$, $x_N(-i+1) = x_N(i)$.
(2) $T_N(1) < T_M(1)$, i.e. the first extremum is a minimum; the axis of symmetry is the vertical line through $T_N(1)$:
$T_M(-i+1) = 2T_N(1) - T_M(i)$, $x_M(-i+1) = x_M(i)$.
$T_N(-i+2) = 2T_N(1) - T_N(i)$, $x_N(-i+2) = x_N(i)$, where $i > 1$.
S23, extend the right end of the sequence $x(t)$; there are two cases:
(1) $T_M(M) > T_N(N)$, i.e. the last extremum is a maximum; the axis of symmetry is the vertical line through $T_M(M)$:
$T_M(M+i) = 2T_M(M) - T_M(M-i)$, $x_M(M+i) = x_M(M-i)$.
$T_N(N+i) = 2T_M(M) - T_N(N-i+1)$, $x_N(N+i) = x_N(N-i+1)$.
(2) $T_N(N) > T_M(M)$, i.e. the last extremum is a minimum; the axis of symmetry is the vertical line through $T_N(N)$:
$T_M(M+i) = 2T_N(N) - T_M(M-i+1)$, $x_M(M+i) = x_M(M-i+1)$.
$T_N(N+i) = 2T_N(N) - T_N(N-i)$, $x_N(N+i) = x_N(N-i)$.
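Step S21 and the stopping test of step S6 both rest on locating local extrema. A minimal sketch, assuming only interior samples are examined and that simple neighbour comparisons suffice (flat runs are handled only approximately); the function names are hypothetical:

```python
from typing import List, Tuple


def find_extrema(t: List[float], x: List[float]) -> Tuple[List[float], List[float], List[float], List[float]]:
    """S21: times and values of interior local maxima (T_M, x_M) and minima (T_N, x_N)."""
    TM, xM, TN, xN = [], [], [], []
    for k in range(1, len(x) - 1):
        if x[k] > x[k - 1] and x[k] >= x[k + 1]:      # local maximum
            TM.append(t[k])
            xM.append(x[k])
        elif x[k] < x[k - 1] and x[k] <= x[k + 1]:    # local minimum
            TN.append(t[k])
            xN.append(x[k])
    return TM, xM, TN, xN


def needs_another_imf(t: List[float], r: List[float]) -> bool:
    """S6: continue decomposing while the residual has more than 2 extreme points."""
    TM, _, TN, _ = find_extrema(t, r)
    return len(TM) + len(TN) > 2
```

For the sequence `[0, 2, 1, 3, 0, 1, -1]` sampled at `t = 0, ..., 6`, the maxima lie at times 1, 3 and 5 and the minima at times 2 and 4, so S6 would continue; a monotonic residual has no interior extrema and ends the decomposition.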
S3, initialize the original time sequence and set $i = 1$. Specifically: $r_0(t) = x(t)$, $i = 1$.
S4, obtain the $i$-th IMF. This step specifically comprises:
S41, initialization: $h_0(t) = r_{i-1}(t)$, $j = 1$.
S42, find all local maximum and local minimum points of $h_{j-1}(t)$.
S43, perform cubic spline interpolation through all maximum points of $h_{j-1}(t)$ to form the upper envelope, and through all minimum points to form the lower envelope.
S44, compute the mean of the upper and lower envelopes to form the mean envelope $m_{j-1}(t)$.
S45, subtract the mean envelope from $h_{j-1}(t)$ to obtain a new sequence:
$h_j(t) = h_{j-1}(t) - m_{j-1}(t)$
S46, determine whether $h_j(t)$ satisfies the IMF conditions; if so, $h_j(t)$ is the $i$-th IMF, $\mathrm{imf}_i(t) = h_j(t)$; otherwise set $j = j + 1$ and go to step S42.
S5, subtract the newly obtained IMF component from the current residual sequence. Specifically: $r_i(t) = r_{i-1}(t) - \mathrm{imf}_i(t)$.
S6, if the remaining sequence still has more than 2 extreme points, set $i = i + 1$ and go to step S4; otherwise go to step S7.
S7, the decomposition ends, and the remaining sequence is the residual component.
Finally, it can be verified that:
$x(t) = \sum_{i=1}^{K} \mathrm{imf}_i(t) + r_K(t)$,
where $K$ is the number of IMFs obtained; i.e. the sum of all IMF sequences and the residual component is the original sequence.
The EMD algorithm and the formulas and parameters in the specific steps belong to the prior art and are not described further here.
The purpose of the invention is as follows:
Network traffic data preprocessing aims to provide reliable data for network planning and maintenance and to supply more feature information for data analysis and data prediction. Because network traffic data is essentially time series data, most existing preprocessing methods rely on data completion, clustering algorithms, data feature reconstruction and similar techniques, but they suffer from low applicability, complex steps and loss of data information. To solve these problems, this patent proposes an EMD-based network traffic data preprocessing method. Performing EMD on a single network traffic data sequence reduces data complexity and enriches data information, and the method has high applicability. In general, network traffic is influenced by many factors that are difficult to express, and the sequence is highly nonlinear and non-stationary. To improve the usefulness of network traffic data in applications such as data analysis and data prediction, the invention brings the EMD method from the field of signal analysis into network traffic data preprocessing. EMD decomposes complex, variable, nonlinear network traffic data into smoother sequences, which effectively reduces the complexity of the data sequence without losing data information, enriches its feature information and makes any subsequent analysis or prediction easier.
The invention aims to overcome the defects of the prior art, improving the applicability of data preprocessing, reducing data complexity and enriching the feature information of the data.
The technical contribution of the invention is as follows:
First, the variables used in the EMD-based network traffic data preprocessing method are defined as follows:
$r_0(t)$: the original time sequence;
$h_j(t)$: the $j$-th subsequence;
$\mathrm{imf}_i(t)$: the $i$-th IMF sequence;
$r_i(t)$: the residual component of the original sequence after the first $i$ IMF sequences have been removed.
The network traffic data preprocessing method based on the EMD decomposes a network traffic sequence by using the EMD, reduces the complexity of time sequence data and enriches characteristic information. The solution according to the invention is explained in detail below with reference to fig. 1, with the above-defined variables.
As shown in fig. 1, the steps are described as follows:
S1, acquire historical network traffic data;
S2, apply mirror extension to the network traffic data sequence and use the extended sequence as the original time sequence for EMD;
S3, initialize the original time sequence and set i = 1;
S4, obtain the i-th IMF;
S5, subtract the newly obtained IMF component from the current residual sequence;
S6, if the remaining sequence still has more than 2 extreme points, set i = i + 1 and go to step S4; otherwise go to step S7;
S7, the decomposition ends, and the remaining sequence is the residual component.
Definition 1: Intrinsic Mode Function (IMF). An IMF is a function that satisfies the following two conditions:
(1) The number of extreme points and the number of zero crossings of an intrinsic mode function must be equal or differ by at most one.
(2) At every point in time, the mean of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is zero.
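Definition 1 can be turned into a check such as the following. Condition (2) can only hold approximately on sampled data, so this sketch accepts a mean envelope whose largest magnitude is at most `tol` times the peak magnitude of the sequence. The tolerance value and the function names are assumptions, and the mean envelope is taken as an input (for example, computed as in steps S43 and S44).

```python
from typing import List


def count_extrema(h: List[float]) -> int:
    """Number of interior local maxima plus local minima."""
    return sum(1 for k in range(1, len(h) - 1)
               if (h[k] > h[k - 1] and h[k] >= h[k + 1])
               or (h[k] < h[k - 1] and h[k] <= h[k + 1]))


def count_zero_crossings(h: List[float]) -> int:
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(h, h[1:]) if a * b < 0)


def satisfies_imf_conditions(h: List[float], mean_envelope: List[float],
                             tol: float = 0.05) -> bool:
    """Check Definition 1; condition (2) is relaxed to a tolerance (assumption)."""
    cond1 = abs(count_extrema(h) - count_zero_crossings(h)) <= 1   # condition (1)
    peak = max(abs(v) for v in h) or 1.0
    cond2 = max(abs(v) for v in mean_envelope) <= tol * peak       # condition (2), approx.
    return cond1 and cond2
```

An oscillation centred on zero, such as `[1, 3, 1, -1, -3, -1, 1, 3, 1, -1, -3, -1]`, has 4 extrema and 3 zero crossings and passes; the same oscillation shifted up by 5 has no zero crossings and a mean envelope of 5, so it fails both conditions.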
Step S2 specifically includes:
S21, find all maximum and minimum points of the network traffic data sequence $x(t) = \{x(t_1), x(t_2), \dots, x(t_n)\}$; denote the maxima by $x_M(i)$, $i \in \{1, 2, \dots, M\}$, at times $T_M(i)$, and the minima by $x_N(i)$, $i \in \{1, 2, \dots, N\}$, at times $T_N(i)$.
S22, extend the left end of the sequence $x(t)$; there are two cases:
(1) $T_M(1) < T_N(1)$, i.e. the first extremum is a maximum; the axis of symmetry is the vertical line through $T_M(1)$:
$T_M(-i+2) = 2T_M(1) - T_M(i)$, $x_M(-i+2) = x_M(i)$, where $i > 1$;
$T_N(-i+1) = 2T_M(1) - T_N(i)$, $x_N(-i+1) = x_N(i)$.
(2) $T_N(1) < T_M(1)$, i.e. the first extremum is a minimum; the axis of symmetry is the vertical line through $T_N(1)$:
$T_M(-i+1) = 2T_N(1) - T_M(i)$, $x_M(-i+1) = x_M(i)$;
$T_N(-i+2) = 2T_N(1) - T_N(i)$, $x_N(-i+2) = x_N(i)$, where $i > 1$.
S23, extend the right end of the sequence $x(t)$; there are two cases:
(1) $T_M(M) > T_N(N)$, i.e. the last extremum is a maximum; the axis of symmetry is the vertical line through $T_M(M)$:
$T_M(M+i) = 2T_M(M) - T_M(M-i)$, $x_M(M+i) = x_M(M-i)$;
$T_N(N+i) = 2T_M(M) - T_N(N-i+1)$, $x_N(N+i) = x_N(N-i+1)$.
(2) $T_N(N) > T_M(M)$, i.e. the last extremum is a minimum; the axis of symmetry is the vertical line through $T_N(N)$:
$T_M(M+i) = 2T_N(N) - T_M(M-i+1)$, $x_M(M+i) = x_M(M-i+1)$;
$T_N(N+i) = 2T_N(N) - T_N(N-i)$, $x_N(N+i) = x_N(N-i)$.
Step S3 specifically includes initializing the time sequence: $r_0(t) = x(t)$, $i = 1$;
Step S4 specifically includes:
S41, initialization: $h_0(t) = r_{i-1}(t)$, $j = 1$;
S42, find all local maximum and local minimum points of $h_{j-1}(t)$;
S43, perform cubic spline interpolation through all maximum points of $h_{j-1}(t)$ to form the upper envelope, and through all minimum points to form the lower envelope;
S44, compute the mean of the upper and lower envelopes to form the mean envelope $m_{j-1}(t)$;
S45, subtract the mean envelope from $h_{j-1}(t)$ to obtain a new sequence:
$h_j(t) = h_{j-1}(t) - m_{j-1}(t)$
S46, determine whether $h_j(t)$ satisfies the IMF conditions; if so, $h_j(t)$ is the $i$-th IMF, $\mathrm{imf}_i(t) = h_j(t)$; otherwise set $j = j + 1$ and go to step S42;
Step S5 specifically includes:
$r_i(t) = r_{i-1}(t) - \mathrm{imf}_i(t)$
At the end of the algorithm in step S7, it can finally be verified that:
$x(t) = \sum_{i=1}^{K} \mathrm{imf}_i(t) + r_K(t)$,
where $K$ is the number of IMFs obtained; i.e. the sum of all IMF sequences and the residual component is the original sequence.
The key points of the invention are as follows:
Network traffic data preprocessing is widely applied across networking. A network traffic sequence is essentially a nonlinear time series, but it is affected by many uncertain factors and is highly non-stationary; these characteristics make network traffic data difficult to express and use, which in turn makes planning and maintaining future networks difficult. Network traffic data preprocessing is therefore very important. The invention provides an EMD-based network traffic data preprocessing method. Compared with previous work, its main contributions are as follows:
(1) Unlike previous methods, the network traffic data preprocessing method provided by the invention incorporates EMD from the field of signal analysis to decompose the highly nonlinear, non-stationary network traffic sequence into several relatively stationary subsequences, which reduces the difficulty of network traffic prediction and simplifies the expression of subsequent models.
(2) The invention preserves the complete network traffic data information during preprocessing and enriches the data features.
After a period of confidential trial operation of the application, field technicians reported the following advantages:
The invention uses EMD to decompose a highly non-stationary network traffic sequence into more stationary subsequences while ensuring that no data information is lost. Through EMD, complex and highly variable network traffic data can be decomposed into subsequences that are more stationary and easier to model, and the processed data provide complete, rich and reliable information for downstream applications such as data analysis and data prediction.
Embodiment of the invention:
As shown in Fig. 2, in an embodiment of the invention, 14,776 samples of network traffic sequence data are collected as the data set, and EMD is applied to the network traffic signal signal(t) to obtain 13 subsequences: imf1(t), imf2(t), …, imf12(t) and res(t).

Claims (10)

1. An EMD-based network traffic data preprocessing method, characterized in that the network traffic sequence is decomposed using EMD to obtain EMD decomposition subsequences, thereby reducing the complexity of the time series data.
2. The EMD-based network traffic data preprocessing method according to claim 1, characterized in that the method specifically comprises the following steps:
S1, acquire historical network traffic data;
S2, mirror-extend the network traffic data sequence, and use the extended time series as the original time series for EMD;
S3, initialize the original time series, with i = 1;
S4, obtain the i-th IMF;
S5, subtract the newly obtained IMF component from the current sequence;
S6, if the remaining sequence still has more than 2 extreme points, set i = i + 1 and go to step S4; otherwise, go to step S7;
S7, the decomposition ends, and the remaining sequence is the residual component.
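The outer loop of steps S3 to S7 can be sketched independently of how step S4 extracts each IMF. In this sketch `extract_imf` is a hypothetical hook standing in for the sifting procedure, and `max_imfs` is an assumed safety cap that the patent does not mention:

```python
from typing import Callable, List, Tuple

Series = List[float]


def count_extrema(r: Series) -> int:
    """Strict interior local maxima plus minima."""
    return sum(1 for k in range(1, len(r) - 1)
               if (r[k] - r[k - 1]) * (r[k] - r[k + 1]) > 0)


def emd_outer_loop(x: Series,
                   extract_imf: Callable[[Series], Series],
                   max_imfs: int = 20) -> Tuple[List[Series], Series]:
    """Steps S3-S7; extract_imf (hypothetical) plays the role of step S4."""
    r = list(x)                                # S3: r_0(t) = x(t), i = 1
    imfs: List[Series] = []
    while len(imfs) < max_imfs:
        imf = extract_imf(r)                   # S4: i-th IMF
        r = [a - b for a, b in zip(r, imf)]    # S5: r_i = r_{i-1} - imf_i
        imfs.append(imf)
        if count_extrema(r) <= 2:              # S6: stop test
            break
    return imfs, r                             # S7: r is the residual
```

Because each IMF is subtracted from the residual it was extracted from, the returned components add back up to `x` (up to round-off) whatever `extract_imf` returns.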
3. The method according to claim 2, characterized in that step S2 specifically includes:
S21, find all maximum and minimum points of the network traffic data sequence $x(t) = \{x(t_1), x(t_2), \ldots, x(t_n)\}$; denote the maximum points by $x_M(i)$, $i \in \{1, 2, \ldots, M\}$, with corresponding time points $T_M(i)$, $i \in \{1, 2, \ldots, M\}$, and the minimum points by $x_N(i)$, $i \in \{1, 2, \ldots, N\}$, with corresponding time points $T_N(i)$, $i \in \{1, 2, \ldots, N\}$;
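Step S21 can be sketched as a single scan over the samples. This sketch assumes strict comparisons, so flat plateaus are not reported as extrema, and uses 0-based Python lists where the claim counts from 1; the function name is hypothetical:

```python
from typing import List, Sequence, Tuple


def find_extrema(t: Sequence[float], x: Sequence[float]
                 ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """S21: return (T_M, x_M, T_N, x_N) for strict interior extrema."""
    TM, xM, TN, xN = [], [], [], []
    for k in range(1, len(x) - 1):
        if x[k] > x[k - 1] and x[k] > x[k + 1]:      # local maximum
            TM.append(t[k])
            xM.append(x[k])
        elif x[k] < x[k - 1] and x[k] < x[k + 1]:    # local minimum
            TN.append(t[k])
            xN.append(x[k])
    return TM, xM, TN, xN
```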
S22, extend the left end of the sequence $x(t)$; there are two cases:
(1) $T_M(1) < T_N(1)$: the axis of symmetry is the vertical line through $T_M(1)$:
$T_M(-i+2) = 2T_M(1) - T_M(i)$, $x_M(-i+2) = x_M(i)$, where $i > 1$;
$T_N(-i+1) = 2T_M(1) - T_N(i)$, $x_N(-i+1) = x_N(i)$;
(2) $T_N(1) < T_M(1)$: the axis of symmetry is the vertical line through $T_N(1)$:
$T_M(-i+1) = 2T_N(1) - T_M(i)$, $x_M(-i+1) = x_M(i)$;
$T_N(-i+2) = 2T_N(1) - T_N(i)$, $x_N(-i+2) = x_N(i)$, where $i > 1$;
S23, extend the right end of the sequence $x(t)$; there are two cases:
(1) $T_M(M) > T_N(N)$: the axis of symmetry is the vertical line through $T_M(M)$:
$T_M(M+i) = 2T_M(M) - T_M(M-i)$, $x_M(M+i) = x_M(M-i)$;
$T_N(N+i) = 2T_M(M) - T_N(N-i+1)$, $x_N(N+i) = x_N(N-i+1)$;
(2) $T_N(N) > T_M(M)$: the axis of symmetry is the vertical line through $T_N(N)$:
$T_M(M+i) = 2T_N(N) - T_M(M-i+1)$, $x_M(M+i) = x_M(M-i+1)$;
$T_N(N+i) = 2T_N(N) - T_N(N-i)$, $x_N(N+i) = x_N(N-i)$.
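The reflections of steps S22 and S23 can be sketched on 0-based Python lists, taking the output of S21 as input. This sketch places each axis of symmetry at the extremum nearest the respective end, which is what the reflection formulas require, and adds at most `k` mirrored points per list and side; `k` and the function names are hypothetical, since the claim does not state how many points are mirrored:

```python
def _mirror(axis, times, values, idxs):
    """Reflect the points (times[j], values[j]), j in idxs, about t = axis."""
    return [2 * axis - times[j] for j in idxs], [values[j] for j in idxs]


def mirror_extend(TM, xM, TN, xN, k=2):
    """Mirror-extend maxima (TM, xM) and minima (TN, xN) at both ends.

    Lists are 0-based and sorted by time; k (hypothetical) caps the number
    of mirrored points added per list and per side.
    """
    if not TM or not TN:
        raise ValueError("need at least one maximum and one minimum")
    # S22, left end: the axis passes through the first extremum, which is
    # itself not duplicated.
    if TM[0] < TN[0]:
        a, im, jn = TM[0], range(1, k + 1), range(0, k)
    else:
        a, im, jn = TN[0], range(0, k), range(1, k + 1)
    lTM, lxM = _mirror(a, TM, xM, [j for j in im if j < len(TM)])
    lTN, lxN = _mirror(a, TN, xN, [j for j in jn if j < len(TN)])
    # S23, right end: the axis passes through the last extremum.
    if TM[-1] > TN[-1]:
        b, im, jn = TM[-1], range(2, k + 2), range(1, k + 1)
    else:
        b, im, jn = TN[-1], range(1, k + 1), range(2, k + 2)
    rTM, rxM = _mirror(b, TM, xM, [len(TM) - j for j in im if j <= len(TM)])
    rTN, rxN = _mirror(b, TN, xN, [len(TN) - j for j in jn if j <= len(TN)])
    # Left-side reflections come out in decreasing time, so reverse them.
    return (lTM[::-1] + list(TM) + rTM, lxM[::-1] + list(xM) + rxM,
            lTN[::-1] + list(TN) + rTN, lxN[::-1] + list(xN) + rxN)
```

For example, maxima at t = 1, 3 and minima at t = 2, 4 with `k=1` gain a mirrored maximum at t = -1 and minimum at t = 0 on the left, and a maximum at t = 5 and minimum at t = 6 on the right, so maxima and minima still alternate.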
4. The method according to claim 2, characterized in that step S3 specifically includes: initializing the time series, $r_0(t) = x(t)$, $i = 1$.
5. The method according to claim 2, characterized in that step S4 specifically includes:
S41, initialization: $h_0(t) = r_{i-1}(t)$, $j = 1$;
S42, find all local maxima and local minima of $h_{j-1}(t)$;
S43, perform cubic spline interpolation through all maximum points and through all minimum points of $h_{j-1}(t)$ to form the upper and lower envelopes;
S44, compute the mean of the upper and lower envelopes to form the mean envelope $m_{j-1}(t)$;
S45, subtract the mean envelope from the current sequence to obtain a new sequence:
$h_j(t) = h_{j-1}(t) - m_{j-1}(t)$
S46, check whether $h_j(t)$ satisfies the IMF conditions; if so, $h_j(t)$ is the $i$-th IMF, $imf_i(t) = h_j(t)$; otherwise, set $j = j + 1$ and go to step S42.
6. The method according to claim 2, characterized in that step S5 specifically includes: $r_i(t) = r_{i-1}(t) - imf_i(t)$.
7. The method according to claim 2, characterized in that, when the algorithm ends at step S7, it can be verified that:
$x(t) = \sum_{i=1}^{n} imf_i(t) + r_n(t)$
i.e. the sum of all IMF sequences and the residual component equals the original sequence.
8. The EMD-based network traffic data preprocessing method according to any one of claims 1-7, characterized in that the method runs on a server.
9. The method according to claim 8, characterized in that the server displays the EMD decomposition subsequences on a display connected to it.
10. The method according to claim 8, characterized in that the server prints the EMD decomposition subsequences on a printer connected to it.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911343753.1A CN111046323A (en) 2019-12-24 2019-12-24 Network traffic data preprocessing method based on EMD

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911343753.1A CN111046323A (en) 2019-12-24 2019-12-24 Network traffic data preprocessing method based on EMD

Publications (1)

Publication Number Publication Date
CN111046323A true CN111046323A (en) 2020-04-21

Family

ID=70238654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911343753.1A Pending CN111046323A (en) 2019-12-24 2019-12-24 Network traffic data preprocessing method based on EMD

Country Status (1)

Country Link
CN (1) CN111046323A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103941091A (en) * 2014-04-25 2014-07-23 福州大学 Power system HHT harmonic detection method based on improved EMD end-point effect
WO2017144007A1 (en) * 2016-02-25 2017-08-31 深圳创维数字技术有限公司 Method and system for audio recognition based on empirical mode decomposition
CN107908863A (en) * 2017-11-14 2018-04-13 哈尔滨理工大学 Hydraulic turbine operating condition determination method based on EMD theory and HHT transform
CN109802862A (en) * 2019-03-26 2019-05-24 重庆邮电大学 Combined network traffic prediction method based on ensemble empirical mode decomposition

Non-Patent Citations (1)

Title
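Xie Ping: "Research on the EMD End-Point Problem and Application of HHT Theory", China Master's Theses Full-text Database, Information Science and Technology, no. 05, pages 2 - 3 *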

Similar Documents

Publication Publication Date Title
Kruiger et al. Graph Layouts by t‐SNE
Adel et al. Discovering interpretable representations for both deep generative and discriminative models
Wang et al. ProgFed: effective, communication, and computation efficient federated learning by progressive training
CN108052665A (en) A kind of data cleaning method and device based on distributed platform
CN109567793A (en) A kind of ECG signal processing method towards cardiac arrhythmia classification
DE102022201746A1 (en) MANAGE DATA CENTERS WITH MACHINE LEARNING
CN104965821B (en) A kind of data mask method and device
Berkolaiko et al. No quantum ergodicity for star graphs
Wang et al. An encrypted traffic classification framework based on convolutional neural networks and stacked autoencoders
CN114418129A (en) Deep learning model training method and related device
Lee et al. Tensor denoising and completion based on ordinal observations
Wehenkel et al. Diffusion priors in variational autoencoders
DE112021004559T5 (en) SYSTEM FOR ROBUST PREDICTION OF ERGONOMIC TIME SERIES IN DIALYSIS PATIENT RECORDS
CN108121962A (en) Face identification method, device and equipment based on non-negative self-adaptive feature extraction
CN108519994A (en) Distributed origin based on Pregel ensures canonical path query algorithm
Lin et al. Fedcluster: A federated learning framework for cross-device private ecg classification
CN111914166B (en) Correction strategy personalized recommendation system applied to community correction personnel
US20170031891A1 (en) Determining incident codes using a decision tree
CN111046323A (en) Network traffic data preprocessing method based on EMD
CN112286996A (en) Node embedding method based on network link and node attribute information
CN111062511B (en) Aquaculture disease prediction method and system based on decision tree and neural network
CN116542956B (en) Automatic detection method and system for fabric components and readable storage medium
Elidan Bagged structure learning of bayesian network
CN110569897A (en) Community detection method in scale-free attribute network based on generative model
CN112905845B (en) Multi-source unstructured data cleaning method for discrete intelligent manufacturing application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200421