CN113222145A - MODWT-EMD-based time sequence hybrid prediction method - Google Patents

MODWT-EMD-based time sequence hybrid prediction method

Info

Publication number
CN113222145A
Authority
CN
China
Prior art keywords
emd
prediction
time sequence
modwt
gru
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110624151.4A
Other languages
Chinese (zh)
Other versions
CN113222145B (en)
Inventor
高聪
贾靖文
王忠民
陈彦萍
陈煜喆
钟威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Posts and Telecommunications
Original Assignee
Xian University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Posts and Telecommunications
Priority to CN202110624151.4A
Publication of CN113222145A
Application granted
Publication of CN113222145B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a hybrid time series prediction method based on MODWT-EMD (maximal overlap discrete wavelet transform and empirical mode decomposition) for the problem of time series prediction. First, the maximal overlap discrete wavelet transform is applied to the original time series to obtain N frequency components. Second, empirical mode decomposition is applied to the N frequency components to obtain a plurality of IMF components and residuals. The IMF components and residuals are then input into a random forest, which scores and ranks the importance of each IMF component and residual so that the feature information with the larger influence can be selected. Finally, the selected feature information is input into a Bi-GRU for training, the trained model is used to predict the original time series, and the prediction capability of the proposed method is evaluated by mean absolute error, root mean square error, mean absolute percentage error and goodness of fit. The method can obtain the prediction result in less training time while maintaining prediction accuracy.

Description

MODWT-EMD-based time sequence hybrid prediction method
Technical Field
The invention belongs to the field of wireless sensor networks, and particularly relates to a time sequence hybrid prediction method based on MODWT-EMD.
Background
In recent years, with the continuous development of the industrial internet and the increasing demand of industrial applications for data analysis, the number of sensor nodes in wireless sensor networks has gradually increased. Meanwhile, the data acquired by the sensors has grown explosively, and massive amounts of data need to be acquired, transmitted, stored and analyzed through the sensor nodes. Problems of data transmission quality, sensor energy consumption and network congestion inevitably occur while the sensor nodes collect data. As the network scale continues to expand, the deployment of nodes in the sensor network becomes increasingly dense, and the data communication between sensor nodes grows explosively. As a result, a large amount of redundant data is transmitted between the nodes, which wastes node energy and increases the energy consumption of the sensor nodes. The sensor nodes also need to process and store a large amount of redundant data, which slows their processing speed. In addition, because the energy of a sensor node is limited, the bandwidth of the sensor network may be limited, which may cause network congestion, packet loss, and retransmission of a large number of data packets; these problems further increase the energy consumption of the sensor nodes.
In the industrial production process, the production condition, the equipment state and the environmental condition in a workshop need to be known in time to ensure that the production operation is normally carried out. Therefore, it is very important to monitor the operation condition of the equipment in the production plant and the production environment through the sensor nodes. The sensors need to continuously collect data of the engine and transmit the collected data to other nodes. In the process, frequent communication is maintained among the sensor nodes in the wireless sensor network. The wireless communication module is used as the most energy-consuming module in the sensor, and the high sampling frequency and frequent communication undoubtedly increase the energy consumption of the sensor node. In such a case, reducing the amount of transmitted redundant data and the number of times the data is collected by the sensor through data prediction becomes one of the main solutions to reduce the amount of data transmission.
The time series, which is the main representation of sensor data, is composed of successive real-valued data points collected over a unit of time. The time series prediction is to collect data of equipment or production environment according to a certain frequency in equal time intervals, then dig out rules from the collected data, and predict future data by using the rules. Therefore, the problems of sensor data transmission quality, sensor energy consumption, network congestion and the like can be solved by predicting the time series.
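As a minimal sketch of how equally spaced sensor readings can be framed as a prediction problem, the series may be cut into fixed-length input windows, each paired with the value that follows it. The function name `make_windows`, the window length and the sample readings are illustrative assumptions, not part of the patent.

```python
def make_windows(series, window, horizon=1):
    """Pair each run of `window` past values with the value `horizon` steps ahead."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Hypothetical temperature readings sampled at equal time intervals.
readings = [20.1, 20.3, 20.2, 20.6, 20.8, 21.0]
pairs = make_windows(readings, window=3)
for inputs, target in pairs:
    print(inputs, "->", target)
```

A model trained on such pairs can stand in for some transmissions: a node only needs to send a reading when it deviates noticeably from the predicted value.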
Conventional time series prediction methods mainly include statistical prediction models, machine learning prediction methods, and hybrid prediction methods. Statistical prediction models place many restrictions on the data, such as stationarity and data scale, and are suited to univariate time series of small scale. Most machine-learning-based prediction methods combine two neural networks; training several neural networks is difficult and time-consuming, and the parameters of each network must be adjusted repeatedly for the model to reach its best prediction performance. A hybrid prediction method can combine the advantages of two or more models, performs clearly better than any single model it contains, and can improve the prediction accuracy and generality of the prediction model.
The time series hybrid prediction method provided here can, compared with statistical prediction models, predict multivariate time series of large data scale. Compared with machine-learning-based prediction methods, it reduces the difficulty and time of model training. Feature extraction by MODWT-EMD improves the prediction accuracy of the model.
Disclosure of Invention
The invention mainly addresses the long training time and complex parameter tuning of existing machine-learning-based time series prediction methods, and provides a time series hybrid prediction method based on MODWT-EMD. The invention aims to use MODWT-EMD to improve the feature extraction capability of the method and to reduce the training time and difficulty of the neural network, thereby improving the prediction accuracy of the Bi-GRU.
The technical scheme of the invention is as follows: a time sequence hybrid prediction method based on MODWT-EMD is characterized by mainly comprising the following steps:
(1) decomposing an original time sequence by Maximum Overlap Discrete Wavelet Transform (MODWT), and decomposing the original time sequence into N frequency components from different time scales according to the size of a data sample;
(2) respectively performing Empirical Mode Decomposition (EMD) on the N frequency components obtained by the maximum overlapping discrete wavelet Decomposition in the step (1), extracting characteristic information of each frequency component from high frequency to low frequency to obtain a plurality of Intrinsic Mode Functions (IMFs) and residual errors, wherein each IMF contains characteristic information of different time scales of a time sequence.
(3) The plurality of IMFs and residuals obtained by the empirical mode decomposition in step (2) are input into a random forest, which scores their importance. The random forest is a classifier that trains on and predicts data by integrating a plurality of decision trees. Each decision tree in the random forest votes on the importance of the input IMFs and residuals, and the random forest obtains the importance score of each feature from the final voting result. The features are sorted by the importance scores obtained from the random forest, and the features with higher scores are screened out for training the prediction model.
(4) Inputting the features screened out in the step (3) into a Bi-GRU for training, and predicting the time sequence by using the trained model.
(5) The prediction results obtained in step (4) are further evaluated by a plurality of evaluation indexes, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and goodness of fit (R-squared, R²).
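Step (1) can be illustrated with a minimal sketch of the MODWT pyramid algorithm. The patent does not name the wavelet, so the Haar filter used here is an assumption, as are the function name `haar_modwt` and the sample series. Each level yields a detail component of the same length as the input, and the last level also yields a smooth component, giving `levels + 1` frequency components in total.

```python
def haar_modwt(x, levels):
    """Haar MODWT with circular boundary handling (assumed filter choice).

    Returns (details, smooth): one detail series per level plus the final
    smooth series, all the same length as the input.
    """
    n = len(x)
    smooth = list(x)
    details = []
    for level in range(1, levels + 1):
        shift = 2 ** (level - 1)  # filter taps are spread further apart at each level
        detail = [0.5 * (smooth[t] - smooth[(t - shift) % n]) for t in range(n)]
        smooth = [0.5 * (smooth[t] + smooth[(t - shift) % n]) for t in range(n)]
        details.append(detail)
    return details, smooth


series = [2.0, 4.0, 6.0, 8.0, 6.0, 4.0, 2.0, 0.0]
details, smooth = haar_modwt(series, levels=3)

# With the Haar filter, detail + smooth at each level equals the previous smooth,
# so all components add back up to the original series.
reconstructed = [sum(d[t] for d in details) + smooth[t] for t in range(len(series))]
```

Unlike the ordinary discrete wavelet transform, no downsampling takes place, so every component stays aligned in time with the original samples. The exact additive reconstruction shown in the last line is a property of the Haar filter; other wavelets would normally go through a multiresolution analysis step to obtain additive components.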
In step (2) of the above method, EMD is performed on the frequency components obtained by MODWT to obtain a plurality of IMFs and residuals. Because the MODWT decomposition of a time series cannot achieve very high precision in time and frequency simultaneously, EMD is used to improve the decomposition precision of the MODWT; at the same time, the modal aliasing that occurs in EMD is reduced. The decomposition precision during feature extraction of the time series is thereby improved, that is, the feature extraction capability of the method is improved.
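The sifting idea behind EMD can be sketched as follows. This is a deliberately simplified illustration, not the patent's implementation: it uses linear envelopes through the local extrema (standard EMD uses cubic splines) and a fixed number of sifting passes instead of a stopping criterion. The names `local_extrema`, `sift_once` and `emd`, and the test signal, are assumptions.

```python
import numpy as np


def local_extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def sift_once(x):
    """Subtract the mean of the upper and lower envelopes; None if too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    upper = np.interp(t, maxima, x[maxima])  # linear envelope (simplification)
    lower = np.interp(t, minima, x[minima])
    return x - (upper + lower) / 2.0


def emd(x, max_imfs=5, sift_iters=10):
    """Extract IMFs from high to low frequency; returns (imfs, residual)."""
    residual = np.asarray(x, dtype=float)
    imfs = []
    for _ in range(max_imfs):
        candidate = sift_once(residual)
        if candidate is None:  # residual is too smooth to decompose further
            break
        for _ in range(sift_iters - 1):
            refined = sift_once(candidate)
            if refined is None:
                break
            candidate = refined
        imfs.append(candidate)
        residual = residual - candidate
    return imfs, residual


time = np.linspace(0.0, 1.0, 200)
signal = np.sin(2 * np.pi * 5 * time) + 0.5 * np.sin(2 * np.pi * 40 * time) + time
imfs, residual = emd(signal)
```

In the method described here, such a routine would be applied to each MODWT component in turn, and the resulting IMFs and residuals would become the candidate features. By construction, the IMFs and the residual add back up to the decomposed series.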
In step (3) of the above method, the importance scores of each IMF and residual are obtained through a random forest. Each decision tree in the random forest scores the importance of the IMFs and residuals. The scores are ranked from high to low, and the features with higher scores are screened out as the features used to train the prediction model.
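The ranking and screening part of this step can be sketched in a few lines. The importance values below are made-up numbers for illustration only; in practice they would come from a trained random forest (for example, the `feature_importances_` attribute of a scikit-learn forest, which is an assumption about tooling and not something the patent specifies). The function name `select_top_features` and the cut-off `keep` are likewise hypothetical.

```python
def select_top_features(scores, keep):
    """Rank features by importance score (high to low) and keep the top `keep`."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


# Hypothetical importance scores for the components of one decomposition.
scores = {"IMF1": 0.08, "IMF2": 0.31, "IMF3": 0.42, "residual": 0.19}
selected = select_top_features(scores, keep=2)
print(selected)  # ['IMF3', 'IMF2']
```

How many features to keep is a tuning decision; the patent only states that the features with higher scores are retained.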
In step (4) of the above method, a bidirectional gated recurrent unit (Bi-GRU) prediction model is established. The Gated Recurrent Unit (GRU), a variant of the Recurrent Neural Network (RNN), has fewer gates and parameters than another variant, Long Short-Term Memory (LSTM); its internal structure is simpler and it is easier to train and implement. The Bi-GRU builds on the GRU by combining information from the input sequence in both the forward and backward directions. For the output at time t, the forward GRU layer holds the information of time t and earlier times in the input sequence, and the backward GRU layer holds the information of time t and later times in the input sequence. This structure captures the influence of both past and future input information on the current state, thereby improving the accuracy of the prediction result.
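The forward and backward passes described above can be made concrete with a small NumPy sketch of an untrained Bi-GRU forward computation. It assumes the common GRU formulation with an update gate and a reset gate; the parameter layout, the helper names (`gru_step`, `run_gru`, `bi_gru`, `init_params`) and the random weights are illustrative assumptions, and no training is shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x, h, p):
    """One GRU update: two gates (update z, reset r) and a candidate state."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_candidate = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_candidate


def run_gru(xs, p, hidden):
    """Run a GRU over a sequence and return the hidden state at every step."""
    h = np.zeros(hidden)
    states = []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return np.stack(states)


def init_params(rng, n_in, hidden):
    """Small random weights for illustration; a real model would learn these."""
    p = {}
    for gate in "zrh":
        p["W" + gate] = rng.normal(scale=0.1, size=(hidden, n_in))
        p["U" + gate] = rng.normal(scale=0.1, size=(hidden, hidden))
        p["b" + gate] = np.zeros(hidden)
    return p


def bi_gru(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward states with backward states re-aligned to time order."""
    forward = run_gru(xs, p_fwd, hidden)
    backward = run_gru(xs[::-1], p_bwd, hidden)[::-1]
    return np.concatenate([forward, backward], axis=1)


rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 3))  # 6 time steps, 3 selected features per step
p_fwd = init_params(rng, 3, 4)
p_bwd = init_params(rng, 3, 4)
out = bi_gru(xs, p_fwd, p_bwd, hidden=4)
print(out.shape)  # (6, 8)
```

At each time step the first four output values depend only on inputs up to that step, while the last four depend on that step and everything after it, which is the two-sided context the paragraph describes.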
In step (5) of the above method, the prediction result of the time series is evaluated by four evaluation indexes: MAE, RMSE, MAPE and R². The calculation formulas are as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $n$ is the number of observations, $\hat{y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\}$ are the predicted values, $y = \{y_1, y_2, \ldots, y_n\}$ are the real values, $i = 1, 2, \ldots, n$, and $\bar{y}$ is the mean of the real values.
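The four indexes can be computed directly from paired real and predicted values. The sketch below uses the standard definitions of MAE, RMSE, MAPE and R², which are assumed here to match the patent's formulas; the function name `evaluate` and the sample numbers are illustrative.

```python
import math


def evaluate(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent) and R^2 for paired value lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a real value is zero.
    mape = 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true))
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


metrics = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(metrics)
```

For this toy input the errors are -2, 2, -3 and 3, so MAE is 2.5, RMSE is the square root of 6.5, MAPE is 11.875 percent and R² is 0.948. Lower MAE, RMSE and MAPE and an R² closer to 1 indicate better predictions.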
The invention is characterized in that: (1) features are extracted from the time series by combining MODWT and EMD, which improves the feature extraction capability of the method; (2) the importance of the extracted features is scored by a random forest, the scores are sorted, and the features with higher scores are screened out; (3) the screened features are input into a Bi-GRU for training, and the time series is predicted. Experiments show that the method can predict the time series accurately and stably.
According to the invention, by extracting the characteristics of the time sequence by MODWT-EMD, the extraction capability of the MODWT on detailed characteristics is improved during characteristic extraction; scoring the importance of the features through a random forest, and screening out the features with higher importance to the original time sequence; and training and predicting the screened features through the Bi-GRU, so that the accuracy of Bi-GRU prediction is improved. The advantages of MODWT-EMD, random forest and Bi-GRU are combined, so that the characteristics of a time sequence can be effectively extracted, and the difficulty of adjusting parameters of a neural network and the training time are reduced.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a diagram of time series MODWT decomposition in the embodiment.
Fig. 3 is a graph showing the result of EMD decomposition in the embodiment.
FIG. 4 is a diagram of the scoring and ranking results of a random forest in an embodiment.
FIG. 5 is a comparison of Bi-GRU predictions in the present invention: when Batch Size is set to 16, the Units Number is set to 50 and 100, respectively.
FIG. 6 is a comparison of Bi-GRU predictions in the present invention: when Batch Size is set to 32, the Units Number is set to 50 and 100, respectively.
FIG. 7 is a comparison of Bi-GRU predictions in the present invention: when Batch Size is set to 64, the Units Number is set to 50 and 100, respectively.
Detailed Description
The following further describes embodiments of the present invention with reference to examples, but the practice of the present invention is not limited thereto. In the following description, those not specifically described are all parts that can be understood by those skilled in the art to be implemented, such as general implementation steps of random forests.
Referring to fig. 1, a MODWT-EMD based time series hybrid prediction method includes the following steps:
(1) a plurality of frequency components of a time series is obtained.
The time series is decomposed by MODWT to obtain a plurality of frequency components as shown in fig. 2, and each frequency component contains characteristic information of the time series on different time scales.
(2) The time series characteristics are obtained by decomposing a plurality of frequency components.
The frequency components obtained in step (1) are decomposed by EMD to obtain a plurality of IMFs and residuals. The IMFs are sorted from high frequency to low frequency, and each contains characteristic information on a different time scale of the time series.
(3) The features are screened.
The importance of the IMFs and residuals is scored using a random forest, and the features are screened according to the scoring result to obtain the feature information that has a large influence on the prediction result.
(4) The time series is predicted.
The screened features are input into a Bi-GRU to train the model, and the time series is predicted.
(5) The prediction result is evaluated by a plurality of evaluation indexes.
The experimental prediction results are evaluated with different evaluation indexes to comprehensively assess the prediction capability of the invention. The results of the different indexes under different parameters are as follows:
Table 1. Performance evaluation of the prediction results (provided as image BDA0003100379400000041 in the original publication)
The reported results are evaluations of the predictions after the model has been trained multiple times; the evaluation results may differ between prediction runs on the test cases. Nevertheless, experiments show that the method can predict the time series stably and accurately.

Claims (5)

1. A time series hybrid prediction method based on MODWT-EMD is characterized by comprising the following steps:
(1) decomposing an original time sequence by Maximum Overlap Discrete Wavelet Transform (MODWT), and decomposing the original time sequence into N frequency components from different time scales according to the size of a data sample;
(2) respectively performing Empirical Mode Decomposition (EMD) on the N frequency components obtained by the maximum overlapping discrete wavelet Decomposition in the step (1), extracting characteristic information of each frequency component from high frequency to low frequency to obtain a plurality of Intrinsic Mode Functions (IMFs) and residual errors, wherein each IMF contains characteristic information of different time scales of a time sequence;
(3) inputting a plurality of IMFs and residual errors obtained in the step (2) through empirical mode decomposition into a random forest to score the importance of the IMFs and the residual errors; the random forest is a classifier which trains and predicts data after a plurality of decision trees are integrated, wherein each decision tree votes and scores the importance of a plurality of input IMFs and residuals, and the importance score of each feature is obtained according to the final voting result; finally, sorting the features according to importance scores obtained by the random forest and screening the features with higher scoring results for training a prediction model;
(4) inputting the features screened out in the step (3) into a Bi-GRU for training, and predicting the time sequence by using the trained model;
(5) the prediction results obtained in step (4) are further evaluated by a plurality of evaluation indexes, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and goodness of fit (R-squared, R²).
2. A MODWT-EMD based time series hybrid prediction method as claimed in claim 1, characterized in that: in step (2), EMD is performed on the frequency components obtained by MODWT to obtain a plurality of IMFs and residuals; because the MODWT decomposition of a time series cannot achieve very high precision in time and frequency simultaneously, EMD is used to improve the decomposition precision of the MODWT, and at the same time the modal aliasing that occurs in EMD is reduced, so that the decomposition precision during feature extraction of the time series is improved, that is, the feature extraction capability of the method is improved.
3. A MODWT-EMD based time series hybrid prediction method as claimed in claim 1, characterized in that: in step (3), the importance scores of each IMF and residual are obtained through a random forest; each decision tree scores the importance of the IMFs and residuals, the scores are ranked from high to low, and the features with higher scores are screened out as the features used to train the prediction model.
4. A MODWT-EMD based time series hybrid prediction method as claimed in claim 1, characterized in that: in step (4), a bidirectional gated recurrent unit (Bi-GRU) prediction model is established; the Gated Recurrent Unit (GRU), a variant of the Recurrent Neural Network (RNN), has fewer gates and parameters than another variant, Long Short-Term Memory (LSTM), so that its internal structure is simpler and it is easier to train and implement; the Bi-GRU builds on the GRU by combining information from the input sequence in both the forward and backward directions; for the output at time t, the forward GRU layer holds the information of time t and earlier times in the input sequence, and the backward GRU layer holds the information of time t and later times in the input sequence; this structure captures the influence of both past and future input information on the current state, thereby improving the accuracy of the prediction result.
5. A MODWT-EMD based time series hybrid prediction method as claimed in claim 1, characterized in that: in step (5), the prediction result of the time series is evaluated by four evaluation indexes, MAE, RMSE, MAPE and R², calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $n$ is the number of observations, $\hat{y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\}$ are the predicted values, $y = \{y_1, y_2, \ldots, y_n\}$ are the real values, $i = 1, 2, \ldots, n$, and $\bar{y}$ is the mean of the real values.
CN202110624151.4A 2021-06-04 2021-06-04 MODWT-EMD-based time sequence hybrid prediction method Active CN113222145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110624151.4A CN113222145B (en) 2021-06-04 2021-06-04 MODWT-EMD-based time sequence hybrid prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110624151.4A CN113222145B (en) 2021-06-04 2021-06-04 MODWT-EMD-based time sequence hybrid prediction method

Publications (2)

Publication Number Publication Date
CN113222145A (en) 2021-08-06
CN113222145B (en) 2023-12-22

Family

ID=77082754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110624151.4A Active CN113222145B (en) 2021-06-04 2021-06-04 MODWT-EMD-based time sequence hybrid prediction method

Country Status (1)

Country Link
CN (1) CN113222145B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064203A (en) * 2021-10-28 2022-02-18 西安理工大学 Cloud virtual machine load prediction method based on multi-scale analysis and deep network model
CN117851920A (en) * 2024-03-07 2024-04-09 国网山东省电力公司信息通信公司 Power Internet of things data anomaly detection method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014196892A1 (en) * 2013-06-04 2014-12-11 Siemens Aktiengesellschaft System for leakage and collapse detection of levees and method using the system
CN107992447A (en) * 2017-12-13 2018-05-04 电子科技大学 A kind of feature selecting decomposition method applied to river level prediction data
CN111507221A (en) * 2020-04-09 2020-08-07 北华大学 Gear signal denoising method based on VMD and maximum overlapping discrete wavelet packet transformation
AU2020101854A4 (en) * 2020-08-17 2020-09-24 China Communications Construction Co., Ltd. A method for predicting concrete durability based on data mining and artificial intelligence algorithm
US20210165770A1 (en) * 2019-12-02 2021-06-03 Alibaba Group Holding Limited Periodicity detection and period length estimation in time series

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHONGMIN WANG; JINGWEN JIA; CONG GAO; YANPING CHEN; HONG XIA: "A Hybrid Prediction Method Based on MODWT-EMD for Time Series in Wireless Sensor Networks", 《2021 16TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE)》 *

Also Published As

Publication number Publication date
CN113222145B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN112633317A (en) CNN-LSTM fan fault prediction method and system based on attention mechanism
CN113222145B (en) MODWT-EMD-based time sequence hybrid prediction method
CN110968069B (en) Fault prediction method of wind generating set, corresponding device and electronic equipment
CN106649479A (en) Probability graph-based transformer state association rule mining method
CN111340282A (en) DA-TCN-based method and system for estimating residual service life of equipment
CN110987436B (en) Bearing fault diagnosis method based on excitation mechanism
CN110188654A (en) A kind of video behavior recognition methods not cutting network based on movement
CN112668775A (en) Air quality prediction method based on time sequence convolution network algorithm
CN115689008A (en) CNN-BilSTM short-term photovoltaic power prediction method and system based on ensemble empirical mode decomposition
DE112021003629T5 (en) COMPACT REPRESENTATION AND TIME SERIES SEGMENT RETRIEVAL THROUGH DEEP LEARNING
CN113468796A (en) Voltage missing data identification method based on improved random forest algorithm
CN110673568A (en) Method and system for determining fault sequence of industrial equipment in glass fiber manufacturing industry
CN114062812A (en) Fault diagnosis method and system for metering cabinet
CN113780640A (en) TCN-Attention-based solar radiation prediction method
Song et al. A new ensemble method for multi-label data stream classification in non-stationary environment
CN116502155A (en) Safety supervision system for numerical control electric screw press
Wang Research on the fault diagnosis of mechanical equipment vibration system based on expert system
Rivero et al. Short time series prediction: Bayesian Enhanced modified Approach with application to cumulative rainfall series
CN117475191A (en) Bearing fault diagnosis method for feature alignment domain antagonistic neural network
CN114372640A (en) Wind power prediction method based on fluctuation sequence classification correction
CN113049249A (en) Motor bearing fault diagnosis method and system
CN113326882A (en) Model integration method and device based on classification and regression algorithm
Wang et al. A Hybrid Prediction Method Based on MODWT-EMD for Time Series in Wireless Sensor Networks
Zeyang Research on intelligent acceleration algorithm for big data mining in communication network based on support vector machine
Li et al. Efficient Time Series Predicting with Feature Selection and Temporal Convolutional Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant