WO2023103130A1 - A quantum walk-based time series multi-scale analysis method - Google Patents

A quantum walk-based time series multi-scale analysis method

Info

Publication number
WO2023103130A1
Authority
WO
WIPO (PCT)
Prior art keywords
time series
quantum
time
regression
analysis method
Prior art date
Application number
PCT/CN2021/143601
Other languages
English (en)
French (fr)
Inventor
俞肇元
孙玲玲
潘炳煌
罗文
袁林旺
Original Assignee
南京师范大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京师范大学
Priority to US18/036,654 (published as US20240346210A1)
Publication of WO2023103130A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N10/00 Quantum computing, i.e. information processing based on quantum-mechanical phenomena
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/12 Timing analysis or timing optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The invention belongs to the fields of data analysis and quantum computing, and in particular relates to a quantum walk-based multi-scale analysis method for time series.
  • Time series analysis is a family of methods that use statistics to extract the changing characteristics of an original data sequence and then model and predict it.
  • Time series are ubiquitous: any indicator that changes over time can be expressed as a time series.
  • The temporal change characteristics contained in a time series can be used to reveal development laws, trends of change, and so on.
  • Multiple time series associated with geographical locations also carry characteristics of spatial interaction.
  • A large number of time series decomposition and modeling models currently exist, mainly divided into parametric and non-parametric methods.
  • Common time series analysis methods include the autoregressive (AR) model, the moving average (MA) model, nonlinear time series models, etc.
  • First, most time series analysis methods need to make certain assumptions when performing inferential statistics, such as the assumption that the data are stationary, i.e. that the statistical laws of the process characteristics do not change over time. Second, some methods decompose the time series to find the factors that affect its changes, which is a form of inverse deduction. Third, some methods model a time series by superimposing and fitting random data, but traditional random data are themselves generated under specific rules and cannot be regarded as truly random, and the spatial correlation between series cannot be taken into account when modeling multiple time series.
  • Quantum walk is one of the most typical and simplest quantum computing methods. It constitutes a general model of quantum computing and is one of the few quantum computing methods that can be efficiently simulated and solved by numerical methods.
  • The present invention proposes a multi-scale analysis method for time series based on quantum walks. Based on the multi-feature sequences generated by quantum walks, specific feature combinations are screened out for different time series, and the time series are modeled and analyzed from multiple perspectives (linear, nonlinear and temporal), so that multi-scale structural features of the time series can be extracted. In addition, the correlation between the modeled and predicted result series and the original time series can be evaluated from multiple angles, including the frequency domain and the time domain.
  • The technical scheme adopted by the present invention is a quantum walk-based multi-scale analysis method for time series, which specifically comprises the following steps:
  • Step 1: for the original observed time series, generate several feature sequences at different time scales based on quantum walks;
  • Step 2: perform feature screening on the feature sequences generated in step 1 at different time scales to obtain the optimal feature sequence combination;
  • Step 3: based on a regression analysis method, establish the correlation model between the original observed time series and the optimal feature sequence combination;
  • Step 4: use the correlation model of step 3 to predict the actually observed time series, and evaluate the prediction results in the time and frequency domains.
  • The method also includes:
  • Step 5: performing experimental verification of the multi-scale analysis method, with the following experimental configuration:
  • Experimental data configuration: select satellites over several Pacific Ocean locations, periodically collect the absolute sea level data obtained by altimetry from those satellites, and process them to obtain the experimental data;
  • Evaluation index configuration: the coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE are selected as the evaluation indexes of the model prediction results, expressed as follows:
  • $R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$, $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$
  • where $y_i$ is the i-th element of the actual observed time series, $\hat{y}_i$ is the i-th element of the predicted fitted sequence, $\bar{y}$ is the average value of the elements of the actual observed time series, and N is the length of the time series.
  • Step 1 is specifically as follows:
  • The spectrum of the Hamiltonian H is decomposed to obtain the eigenvalues and eigenvectors of H, where the decomposed Hamiltonian is: $H = \Phi\Lambda\Phi^{T}$
  • The Hamiltonian H is represented by the adjacency matrix of the graph G, whose elements are expressed as: $A_{uv} = 1$ if $(u,v) \in E$, and $A_{uv} = 0$ otherwise
  • where (u, v) represents the edge connecting vertex u and vertex v,
  • $A_{uv}$ represents the edge between vertex u and vertex v, with $u \in V$, $v \in V$,
  • and $A_{uv} = A_{vu}$, $A_{vv} = A_{uu} = 0$.
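As a minimal sketch of the adjacency-matrix definition above, the following Python snippet builds a symmetric 0/1 matrix with a zero diagonal from an edge list. The function name `adjacency_matrix`, the zero-based vertex labels and the 4-vertex path graph are illustrative assumptions and do not come from the original disclosure.

```python
import numpy as np

def adjacency_matrix(num_vertices, edges):
    """Build the symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((num_vertices, num_vertices))
    for u, v in edges:
        if u == v:
            continue          # no self-loops, so A_uu stays 0
        A[u, v] = 1.0
        A[v, u] = 1.0         # undirected graph: A_uv = A_vu
    return A

# Hypothetical example: a 4-vertex path graph 0-1-2-3.
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
```

Because the matrix is real and symmetric, it is also Hermitian, which is the property required of a Hamiltonian.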
  • In step 2, stepwise regression is used to screen the generated feature sequences at different time scales as follows: the feature sequences are combined, the combination is adjusted repeatedly, the Akaike information criterion is used to evaluate how accurately each combination fits the original observed time series, and the combination with the best evaluation result is selected as the optimal feature sequence combination.
  • Alternatively, the RReliefF algorithm is used to screen the feature sequences: based on the original observed time series, the weights of the feature sequences at different time scales from step 1 are calculated and sorted from largest to smallest, and the first Q feature sequences are combined as the optimal feature sequence combination.
  • The regression analysis method described in step 3 includes linear regression, nonlinear regression or vector autoregression based on time correlation; the linear regression includes but is not limited to stepwise regression, principal component regression and partial least squares regression; the nonlinear regression includes but is not limited to projection pursuit regression.
  • In step 3, the correlation model between the original observed time series and the optimal feature sequence combination is established based on linear regression, specifically as follows: $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_q X_q + \varepsilon$
  • where Y is the fitted time series,
  • $X_1, X_2, \dots, X_q$ are the sequences in the optimal feature sequence combination,
  • $\beta_1, \beta_2, \dots, \beta_q$ are the coefficients of the respective sequences,
  • and $\varepsilon$ is a constant term.
  • In step 3, the correlation model between the original observed time series and the optimal feature sequence combination is established based on projection pursuit regression, specifically as follows: $F(x) = \sum_{m=1}^{M} \beta_m G_m(Z_m)$
  • where F(x) represents the fitted time series,
  • $G_m(Z_m)$ represents the m-th ridge function,
  • $\beta_m$ is the weight value, representing the contribution of the m-th ridge function to the output value,
  • M represents the total number of ridge functions, and $Z_m = a_m^{T} X = \sum_{p=1}^{P} a_{mp} x_p$ is the independent variable of the m-th ridge function, representing the projection of the P-dimensional vector X in the direction $a_m$,
  • X represents the high-dimensional data input to the model,
  • $a_{mp}$ is the p-th component of the projection direction $a_m$,
  • the superscript T represents the transpose,
  • P is the dimension of the input space,
  • and each projection direction is required to have unit length, $\sum_{p=1}^{P} a_p^2 = 1$, where $a_p$ represents the p-th component of a projection direction.
  • In step 3, the correlation model between the original observed time series and the optimal feature sequence combination is established based on time-correlated vector autoregression. The sequences in the optimal feature sequence combination are expressed in matrix form as $Y = [X_1, X_2, \dots, X_L]$ with $X_w = (X_{1w}, X_{2w}, \dots, X_{Nw})^{T}$, $w \in [1, L]$, and the model is: $X_w = \sum_{z=1}^{d} C_z X_{w-z} + \varepsilon_w$
  • where N represents the length of the time series,
  • L represents the number of sequences in the optimal feature sequence combination,
  • $X_w$ represents the w-th column vector of matrix Y,
  • $X_{w-z}$ represents the (w-z)-th column vector of matrix Y,
  • $X_{Nw}$ represents the element value in row N and column w of matrix Y,
  • $C_z$ is the coefficient matrix of the time-correlated vector autoregression,
  • z is the lag order,
  • d is the total lag order,
  • and $\varepsilon_w$ is the noise.
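The vector autoregression described above can be sketched with ordinary least squares. This is a hedged illustration rather than the patented implementation: it uses the common textbook convention in which the rows of the data matrix are time steps (the text indexes lags over column vectors), and the names `fit_var`, `forecast_var`, `A_true` and the simulated two-variable series are assumptions introduced only for the example.

```python
import numpy as np

def fit_var(Y, d):
    """Least-squares fit of Y[t] = c + sum_z A_z @ Y[t - z] for z = 1..d.

    Y has shape (N, L): N time steps (rows) of L variables (assumed layout).
    Returns the intercept c (L,) and a list of d coefficient matrices (L, L).
    """
    Y = np.asarray(Y, dtype=float)
    N, L = Y.shape
    # Regressor row for time t: [1, Y[t-1], ..., Y[t-d]]
    Z = np.array([np.concatenate([[1.0]] + [Y[t - z] for z in range(1, d + 1)])
                  for t in range(d, N)])
    B, *_ = np.linalg.lstsq(Z, Y[d:], rcond=None)      # shape (1 + d*L, L)
    intercept = B[0]
    coefs = [B[1 + (z - 1) * L: 1 + z * L].T for z in range(1, d + 1)]
    return intercept, coefs

def forecast_var(history, intercept, coefs, steps):
    """Iterate the fitted model forward, feeding forecasts back in as lags."""
    hist = [row for row in np.asarray(history, dtype=float)]
    out = []
    for _ in range(steps):
        nxt = intercept + sum(A @ hist[-z] for z, A in enumerate(coefs, start=1))
        hist.append(nxt)
        out.append(nxt)
    return np.array(out)

# Hypothetical data: simulate a two-variable VAR(1) process with small noise.
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.1], [0.0, 0.3]])
Y = np.zeros((2000, 2))
for t in range(1, 2000):
    Y[t] = A_true @ Y[t - 1] + 0.1 * rng.normal(size=2)

intercept, coefs = fit_var(Y[:1600], d=1)      # fit on the first 80 %
forecast = forecast_var(Y[:1600], intercept, coefs, steps=5)
```

Multi-step forecasts reuse earlier forecasts as lagged inputs, so errors can accumulate with the forecast horizon.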
  • In step 4, the prediction result is evaluated in the time and frequency domains, specifically as follows:
  • The coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE are selected as the evaluation indicators of the prediction results, expressed as follows: $R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$, $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$
  • where $y_i$ is the i-th element of the actual observed time series, $\hat{y}_i$ is the i-th element of the predicted fitted sequence, $\bar{y}$ is the average value of the elements of the actual observed time series, and N is the length of the time series.
  • The present invention proposes a general quantum walk-based multi-scale analysis method for time series, comprising multi-feature sequence generation based on quantum walks, feature sequence extraction, data modeling and prediction, and model evaluation. It generates sequence combinations with spatio-temporal characteristics without any prior assumptions, extracts feature sequence combinations according to the analysis requirements of different time series, establishes time series models from different perspectives based on the relationship between the actual time series and the feature sequence combinations, and then makes predictions with those models.
  • The method proposed by the present invention is not a form of inverse deduction.
  • The feature sequences proposed by the present invention are generated from the universal rules of quantum walks, and a specific time series is expressed by a subset of the features generated by the quantum walk.
  • The method proposed by the present invention expresses the change characteristics of quantum walks in time and space in the form of feature sequences and uses these features in data analysis, which is a major breakthrough in the application of quantum walks in the field of data analysis.
  • Fig. 1 is a flow chart of the quantum walk-based time series multi-scale analysis method according to an embodiment of the present invention;
  • Fig. 2 is a data processing flow chart of the quantum walk-based time series multi-scale analysis method according to an embodiment of the present invention;
  • Fig. 3 shows the sea level height variation at the research points according to an embodiment;
  • Fig. 4 shows the first four sets of quantum walk feature sequences according to an embodiment;
  • Fig. 5 shows the linear regression and prediction results based on the stepwise regression screening results according to an embodiment;
  • Fig. 6 shows the linear regression and prediction results based on the RReliefF screening results according to an embodiment;
  • Fig. 7 shows the PPR regression and prediction results based on the stepwise regression and RReliefF screening results according to an embodiment;
  • Fig. 8 shows the VAR regression and prediction results based on the stepwise regression and RReliefF screening results according to an embodiment;
  • Fig. 9 is a power spectral density diagram of the modeling and prediction results based on the stepwise regression screening results according to an embodiment;
  • Fig. 10 is a power spectral density diagram of the modeling and prediction results based on the RReliefF screening results according to an embodiment;
  • Fig. 11 shows the statistical comparison of the different regression methods according to an embodiment;
  • Fig. 12 shows the statistical comparison of the different regression and prediction methods according to an embodiment.
  • A quantum walk-based time series multi-scale analysis method specifically includes the following steps:
  • Step 1: generate multi-scale, multi-feature sequences based on quantum walks
  • Quantum walks are generally regarded as a general computing tool, and all quantum computations can be performed in the form of quantum walks on a graph.
  • The graph for a quantum walk consists of vertices and edges and can be expressed in the form of an adjacency matrix.
  • The vertices of the graph represent the quantum states that the quantum walker can occupy during the walk, and the edges connecting the vertices carry the transitions of the quantum state between vertices.
  • The probability of finding the walker on each vertex, as it changes over time, is collected to form a feature sequence.
  • The time-varying probability of finding the quantum walker on each vertex reflects the changing characteristics of the wave function.
  • The quantum walk process is simulated by computation on the graph adjacency matrix.
  • The Hamiltonian H is an N×N Hermitian matrix, which can be taken to be an adjacency matrix or a Laplacian matrix.
  • The present invention uses the adjacency matrix A of the graph G as the Hamiltonian H. $|\psi(t)\rangle$ is a state vector with complex elements.
  • Given the initial state $|\psi(0)\rangle$, the evolution equation in formula (2) can be solved, and the state vector at time t can be expressed as: $|\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle$
  • where $e^{-iHt}$ is the time evolution operator, which is used to construct the dynamically evolving quantum walk,
  • i is the imaginary unit,
  • and H is the Hamiltonian.
  • The state vector of the quantum walk at time t is a linear combination of the basis states.
  • The probability that the quantum walker is found at each vertex is the squared modulus of the corresponding probability amplitude in the state vector.
  • Spectral decomposition of the Hamiltonian gives $H = \Phi\Lambda\Phi^{T}$, where $\Phi$ is an N×N matrix representing the set of eigenvectors, $\Lambda$ is the diagonal matrix of eigenvalues, and the superscript T represents matrix transposition.
  • The time evolution operator can then be expressed as: $e^{-iHt} = \Phi e^{-i\Lambda t}\Phi^{T}$
  • Formula (3) can be expressed as: $|\psi(t)\rangle = \Phi e^{-i\Lambda t}\Phi^{T}|\psi(0)\rangle$
  • The probability that the quantum walker is found on each vertex is obtained by calculating the squared modulus of the corresponding probability amplitude on each vertex in the state vector.
  • A scale factor is set, and the quantum walk is sampled at equal time intervals determined by the scale factor; the resulting probability sequences for all vertices represent the change characteristics of the quantum walk at one time scale.
  • The quantum walk is sampled multiple times using multiple different scale factors.
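The generation of multi-scale feature sequences described in step 1 can be sketched as follows. This is an illustrative sketch, not the original implementation: the function name `quantum_walk_features`, the 7-vertex path graph, the walker starting on vertex 0 and the scale factors 0.1 and 0.5 are all assumptions chosen for the example.

```python
import numpy as np

def quantum_walk_features(A, psi0, scale_factors, num_steps):
    """Sample a continuous-time quantum walk at times t = k * n.

    Returns a dict mapping each scale factor k to an array of shape
    (num_steps, N); row n holds the vertex probabilities at t = k * n.
    """
    A = np.asarray(A, dtype=float)
    psi0 = np.asarray(psi0, dtype=complex)
    psi0 = psi0 / np.linalg.norm(psi0)            # normalise the initial state
    eigvals, eigvecs = np.linalg.eigh(A)          # H = Phi diag(lambda) Phi^T
    coeffs = eigvecs.T @ psi0                     # initial state in the eigenbasis
    n = np.arange(num_steps)
    features = {}
    for k in scale_factors:
        t = k * n                                 # equally spaced sampling times
        phases = np.exp(-1j * np.outer(t, eigvals))   # e^{-i lambda t}, (num_steps, N)
        psi_t = (phases * coeffs) @ eigvecs.T     # each row is psi(t) on the vertices
        features[k] = np.abs(psi_t) ** 2          # squared modulus = probability
    return features

# Hypothetical setup: 7 vertices connected in a chain, walker starts on vertex 0.
A = np.diag(np.ones(6), k=1) + np.diag(np.ones(6), k=-1)
psi0 = np.zeros(7)
psi0[0] = 1.0
features = quantum_walk_features(A, psi0, scale_factors=[0.1, 0.5], num_steps=100)
```

Each column of `features[k]` is one candidate feature sequence at scale `k`; since the evolution is unitary, every row sums to 1.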
  • Step 2: feature screening
  • Suitable feature sequences can be generated by quantum walks, and a regression method can be used to establish the relationship between the original observed time series and the generated feature sequences, so as to model the original time series.
  • The present invention proposes two feature screening methods: model-driven stepwise regression and data-driven RReliefF.
  • Stepwise regression can also be used for modeling and prediction.
  • Stepwise regression is a linear modeling regression method.
  • The Akaike information criterion (AIC) is used to evaluate the fitting accuracy of each candidate combination of feature sequences.
  • The RReliefF algorithm computes the k nearest neighbors of each mode sample with respect to the original time series and calculates the weight of every mode relative to the original time series samples.
  • All modes are then sorted by weight and the modes with the highest weights are selected in turn. For each mode, all possible k nearest instances are tested and the highest value is returned.
  • The RReliefF algorithm can thus calculate the weights of all quantum walk feature sequences based on the observed time series and select the required number of feature sequences.
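The two screening ideas can be illustrated with a short sketch. The first function is a simple forward variant of stepwise selection scored by AIC (the source does not specify the exact search strategy, so forward selection is an assumption); the second merely ranks features by precomputed weights and keeps the top Q, without implementing RReliefF itself. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def aic_of_subset(y, features, columns):
    """AIC (up to an additive constant) of an OLS fit on the chosen columns."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [features[:, j] for j in columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    rss = max(rss, 1e-12)                       # avoid log(0) on an exact fit
    return n * np.log(rss / n) + 2 * design.shape[1]

def forward_stepwise(y, features, max_features=None):
    """Greedily add the feature that lowers AIC most; stop when none does."""
    y = np.asarray(y, dtype=float)
    features = np.asarray(features, dtype=float)
    selected, remaining = [], list(range(features.shape[1]))
    best_aic = aic_of_subset(y, features, selected)
    limit = max_features or features.shape[1]
    while remaining and len(selected) < limit:
        cand_aic, cand = min((aic_of_subset(y, features, selected + [j]), j)
                             for j in remaining)
        if cand_aic >= best_aic:
            break
        best_aic = cand_aic
        selected.append(cand)
        remaining.remove(cand)
    return selected, best_aic

def top_q_by_weight(weights, q):
    """Indices of the q largest weights, in descending order of weight."""
    return np.argsort(weights)[::-1][:q].tolist()

# Hypothetical data: the target depends only on feature columns 1 and 4.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 6))
y = 2.0 * features[:, 1] - 3.0 * features[:, 4] + 0.01 * rng.normal(size=200)
selected, aic = forward_stepwise(y, features)
```

The stepwise route returns a data-dependent number of features, whereas the weight-based route returns exactly Q, which matches the behaviour described for the two screening methods.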
  • Step 3: time series modeling and forecasting based on regression analysis
  • The present invention seeks the relationship between the actual time series and the screened feature sequences from multiple perspectives, using three types of modeling methods (linear regression, nonlinear regression, and regression based on time correlation), and establishes the relationship between the time series and the quantum walk feature sequences.
  • Linear regression includes stepwise regression, principal component regression (PCR), partial least squares regression (PLSR), etc.
  • Nonlinear regression includes projection pursuit regression (PPR); regression based on time correlation uses vector autoregression (VAR).
  • In linear regression, the original time series is represented by a linear combination of the feature sequences generated by quantum walks, following different linear regression rules. The focus of linear regression is to determine the parameter of each feature sequence, so that these feature sequences represent the changing characteristics of the original time series as fully as possible: $Y = \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_q X_q + \varepsilon$
  • where Y is the fitted time series,
  • $X_1, X_2, \dots, X_q$ are the multi-scale feature sequences generated by quantum walks,
  • $\beta_1, \beta_2, \dots, \beta_q$ are the coefficients of the sequences,
  • and $\varepsilon$ is a constant term.
  • All three linear regression methods express the original time series as a linear combination of modes, but each method has its own algorithm for determining the coefficients.
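To make the point about different coefficient-determination algorithms concrete, the sketch below fits the same linear model by ordinary least squares and by principal component regression using only NumPy. It is an illustration under assumptions: `fit_ols`, `fit_pcr` and the synthetic data are hypothetical, and partial least squares is not shown.

```python
import numpy as np

def fit_ols(y, X):
    """Ordinary least squares: returns (constant term, coefficients)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def fit_pcr(y, X, n_components):
    """Principal component regression on the leading n_components directions."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # principal directions, (q, r)
    scores = Xc @ V                          # projected features, (n, r)
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                         # map back to the original features
    return y_mean - x_mean @ beta, beta

# Hypothetical data with known coefficients and constant term 1.5.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5])

const_ols, beta_ols = fit_ols(y, X)
const_pcr, beta_pcr = fit_pcr(y, X, n_components=3)
```

When all components are kept, PCR reproduces the OLS solution; keeping fewer components trades some fitting accuracy for robustness to correlated feature sequences.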
  • Projection pursuit regression (PPR) is a nonlinear regression analysis method for high-dimensional data that is widely used in forecasting.
  • The basic idea of PPR is to project high-dimensional data into a low-dimensional space (1 to 3 dimensions), find the projection that reflects the structure or characteristics of the high-dimensional data, and perform regression analysis on it.
  • The key to PPR is determining the projection direction.
  • The projection pursuit regression model can be expressed as: $F(x) = \sum_{m=1}^{M} \beta_m G_m(Z_m)$, with $Z_m = a_m^{T} X = \sum_{p=1}^{P} a_{mp} x_p$
  • where $G_m(Z_m)$ represents the m-th ridge function,
  • $\beta_m$ is the weight value, which represents the contribution of the m-th ridge function to the output value,
  • $a_{mp}$ is the p-th component of the m-th projection direction,
  • P is the dimension of the input space,
  • and the superscript T represents the transpose.
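The structure of the projection pursuit model (a weighted sum of ridge functions of one-dimensional projections) can be illustrated by evaluating it for given parameters. The sketch below only evaluates the model and does not search for the projection directions or fit the ridge functions; `ppr_predict` and the example directions, ridge functions and weights are assumptions for illustration.

```python
import numpy as np

def ppr_predict(X, directions, ridge_functions, weights):
    """Evaluate F(x) = sum_m beta_m * G_m(a_m^T x) for each row x of X."""
    X = np.asarray(X, dtype=float)
    total = np.zeros(X.shape[0])
    for a, g, beta in zip(directions, ridge_functions, weights):
        a = np.asarray(a, dtype=float)
        a = a / np.linalg.norm(a)        # projection directions have unit length
        z = X @ a                        # one-dimensional projection Z_m
        total += beta * g(z)             # weighted ridge-function contribution
    return total

# Hypothetical model with two ridge functions on two-dimensional inputs.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
prediction = ppr_predict(
    X,
    directions=[[1.0, 0.0], [0.0, 2.0]],
    ridge_functions=[np.tanh, np.square],
    weights=[2.0, 0.5],
)
```

In a full PPR fit, the directions, ridge functions and weights would be estimated iteratively from the training data.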
  • Time-dependent vector autoregression (VAR)
  • The VAR method builds a model in which each internal variable of the system is a function of the lagged values of all internal variables of the system; it is often used in serial correlation analysis.
  • The multiple time series are arranged as a matrix, representing L groups of time series of length N.
  • The VAR model with total lag order d can be expressed as formula (12): $X_w = \sum_{z=1}^{d} C_z X_{w-z} + \varepsilon_w$, where $C_z$ is the coefficient matrix for lag order z and $\varepsilon_w$ is the noise.
  • Step 4: evaluation of results in the frequency domain and the time domain
  • A time series has structural features in the frequency domain and data features in the time domain.
  • The present invention uses power spectrum analysis to evaluate the characteristics of the time series in the frequency domain. By calculating the power spectral density, a time-dependent sequence is converted into a distribution of signal intensity over frequency, which reflects the degree of fit between sequences in the frequency domain. To evaluate the correlation between the modeling and prediction results and the original time series in the time domain, the present invention uses the coefficient of determination ($R^2$), the root mean square error (RMSE) and the mean absolute error (MAE) between two time series to represent the relationship between their data.
  • Coefficient of determination: $R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$
  • Root mean square error: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$
  • Mean absolute error: $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$
  • where $y_i$ is the i-th element of the original time series, $\hat{y}_i$ is the i-th element of the fitted sequence, $\bar{y}$ is the sample mean and N is the length of the time series.
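Both evaluation views can be sketched in a few lines of NumPy. The metric functions follow the standard definitions of $R^2$, RMSE and MAE; the spectral estimate is a plain periodogram, which is one common way to estimate power spectral density and is an assumption here, since the source does not name the estimator. Function names and the toy inputs are hypothetical.

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination between observed y and fitted y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat)))

def periodogram(x, sample_spacing=1.0):
    """Simple periodogram estimate of the power spectral density."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the constant offset
    n = len(x)
    spectrum = np.fft.rfft(x)
    psd = np.abs(spectrum) ** 2 * sample_spacing / n
    freqs = np.fft.rfftfreq(n, d=sample_spacing)
    return freqs, psd

# Toy time-domain example.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.0, 2.0, 3.0, 5.0]
scores = (r2_score(y, y_hat), rmse(y, y_hat), mae(y, y_hat))

# Toy frequency-domain example: a sine wave at 0.1 cycles per sample.
signal = np.sin(2 * np.pi * 0.1 * np.arange(200))
freqs, psd = periodogram(signal)
```

Comparing the periodograms of the observed, fitted and predicted series on the same frequency axis gives the frequency-domain comparison, while the three scores summarise the time-domain agreement.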
  • The experimental configuration of the present invention mainly comprises the following parts: (1) experimental data configuration: the absolute sea level data obtained by satellite altimetry at seven Pacific Ocean locations are selected as experimental data (collected weekly); (2) evaluation index configuration: MAE, RMSE and $R^2$ are selected as model evaluation indexes.
  • The results of the present invention are divided into the following parts: (1) the results of multiple modeling and prediction methods applied to the satellite altimetry data based on the quantum walk feature sequences; (2) the accuracy of the modeling and prediction results evaluated from two perspectives.
  • Absolute sea level data for the 7 locations since November 1, 2000 are used, recorded on a weekly basis.
  • The coordinates of these seven locations are P1 (160.125°E, 0.125°N), P2 (170.125°E, 0.125°N), P3 (180.125°E, 0.125°N), P4 (190.125°E, 0.125°N), P5 (200.125°E, 0.125°N), P6 (210.125°E, 0.125°N) and P7 (220.125°E, 0.125°N); the data are shown in Figure 3. A total of 1,000 records are used, of which the first 800 are training samples and the last 200 are testing samples.
  • Quantum walks are used to generate multi-scale, multi-feature distribution data for these seven locations; two feature screening methods are used to obtain feature combinations similar to the features of the satellite altimetry data, and these are then combined with multiple regression methods to model the satellite altimetry data.
  • The data processing procedure includes:
  • A quantum walk can simulate time-varying feature sequences with structural features.
  • An adjacency matrix must be supplied as input.
  • The 7 locations selected in this embodiment are at the same latitude, and the adjacency matrix used to generate the quantum walk feature sequences is set as:
  • [adjacency matrix not preserved in this text]
  • Stepwise regression and RReliefF are used to screen the feature sequence combinations generated by quantum walks and obtain mode combinations similar to the features of the original time series. Since stepwise regression is a model-driven screening method, it yields an optimal mode combination; RReliefF is a data-driven weight calculation method that computes the weight of each mode relative to the original time series, and modes are chosen according to the size of their weights. In this step, the number of feature sequences selected by stepwise regression is not fixed in advance, whereas 100 feature sequences are selected for each research point with RReliefF.
  • The present invention uses five regression algorithms (stepwise regression, principal component regression, partial least squares regression, projection pursuit regression and vector autoregression) to model and predict the original time series, dividing the 1,000 records into 800 training samples and 200 testing samples. The three types of modeling and forecasting (linear, nonlinear and time-based) are carried out on the stepwise regression and RReliefF screening results respectively.
  • Figure 5 and Figure 6 show the fitting results of models built on the stepwise regression and RReliefF screening results, together with the predictions made by those models.
  • Figure 7 shows the modeling and prediction results of projection pursuit regression.
  • Figure 8 shows the modeling and prediction results of vector autoregression.
  • The present invention analyzes the correlation between sequences in terms of both frequency-domain and time-domain characteristics: in the frequency domain it analyzes the power spectrum structure of the sea level data, the fitted data and the predicted data, and in the time domain it obtains the coefficient of determination and the errors between two sequences as indicators of the correlation of their time-domain characteristics.
  • Figure 9 and Figure 10 compare the power spectrum structure of the modeling and prediction results based on the stepwise regression and RReliefF screening results respectively. The figures show that the spectrum structure of all experimental results is very similar to that of the original time series, especially for the nonlinear projection pursuit regression and the time-based vector autoregression.
  • The time-domain evaluation starts from the data of the experimental results and obtains various accuracy indicators between the experimental results and the original time series.
  • The present invention calculates the coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE; the results are shown in Fig. 11 and Fig. 12.
  • Figure 11 shows the fitting results of the first 800 records.
  • Figure 12 shows the accuracy statistics of the fitting results of the first 800 records and the prediction results of the last 200 records.
  • The first 3 subplots of each figure are experiments performed using the stepwise regression screening results.
  • The last 3 subplots are experiments using the RReliefF screening results.
  • Figure 5, Figure 6, Figure 7 and Figure 8 show the regression and prediction results based on the two feature screening results. In terms of fitting, projection pursuit regression (nonlinear) and vector autoregression (time-based) are more consistent with the original time series, but in terms of forecasting, the results based on linear relationships are more stable.
  • Figures 9 and 10 show the power spectral density plots of the modeling and forecasting results and the original time series; projection pursuit regression and vector autoregression fit better.
  • Figure 11 and Figure 12 show the time-domain evaluation metrics.
  • The root mean square error and the mean absolute error depend on the average level of the data itself, so they cannot be used to compare fitting accuracy between sites, but they can be used to compare the fitting accuracy of different modeling methods at the same site. Figure 11 shows that both projection pursuit regression (nonlinear) and vector autoregression (time-based) achieve good fitting results, while the fitting accuracy of the three linear regression methods is relatively low. The fitting accuracy obtained with the features screened by stepwise regression is higher than that obtained with the features screened by RReliefF.
  • The quantum walk-based time series multi-scale analysis method proposed by the present invention analyzes time series through data generation, data screening, data modeling and prediction, and result evaluation, and can achieve high modeling or prediction accuracy.
  • The different approaches used in the present invention each have advantages. Nonlinear regression and time-based vector autoregression on the quantum walk feature sequences achieve high accuracy in time series fitting but are not stable enough in time series prediction; linear regression on the quantum walk feature sequences loses some details of the time series changes in fitting but is stable in prediction.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Mathematics (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

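The present invention discloses a quantum walk-based multi-scale analysis method for time series, comprising: 1. generating multi-scale, multi-feature sequences based on quantum walks; 2. feature sequence screening; 3. time series modeling and prediction based on regression analysis; 4. result evaluation in the frequency domain and the time domain; 5. experimental verification. The advantage of the invention is that it applies the multi-scale features of quantum walks to time series analysis, combines feature extraction methods under two kinds of rules, and uses the extracted features to build models of the original time series with linear, nonlinear and time-based regression methods. This analysis method requires no prior assumptions such as stationarity and is a general time series analysis method. The invention expresses the spatio-temporal change characteristics of quantum walks as feature sequences and uses these features in data analysis, which is a major breakthrough in the application of quantum walks in the field of data analysis.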

Description

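A quantum walk-based time series multi-scale analysis method
Technical Field
The present invention belongs to the fields of data analysis and quantum computing, and in particular relates to a quantum walk-based multi-scale analysis method for time series.
Background Art
Time series analysis is a family of analysis methods that use statistics to extract the change characteristics of an original data sequence and then model and predict it. Time series are ubiquitous: any indicator that changes over time can be expressed as a time series. The temporal change characteristics contained in a time series can be used to reveal development laws, trends of change and so on, and multiple time series associated with geographical locations also carry characteristics of spatial interaction. A large number of models for time series decomposition and modeling currently exist, mainly divided into parametric and non-parametric methods. Common methods include the autoregressive (AR) model, the moving average (MA) model and nonlinear time series models; some analyze time series in the time domain and others in the frequency domain, and time series analysis methods are gradually maturing. However, most current methods need to make some assumption when performing inferential statistics, such as the stationarity of the data, which requires that the statistical laws of the process characteristics do not change over time. Second, some methods find the factors that affect the changes of a series by decomposing it, which is a form of inverse deduction. Others model a time series by superimposing and fitting random data, but traditional random data are also generated under specific rules and cannot be regarded as truly random, and the spatial correlation between series cannot be taken into account when modeling multiple time series.
The development of quantum walks has brought random data simulation based on quantum rules; feature sequences generated under quantum rules have not only temporal correlation but also spatial coherence. Data analysis, computation and simulation based on quantum laws are a frontier of modern science. Quantum walk is one of the most typical and simplest quantum computing methods; it constitutes a general model of quantum computing and is one of the few quantum computing methods that can be efficiently simulated and solved by numerical methods.
Summary of the Invention
Purpose of the invention: in view of the above problems, the present invention proposes a quantum walk-based multi-scale analysis method for time series. Based on the multi-feature sequences generated by quantum walks, specific feature combinations are screened out for different time series, and the time series are modeled and analyzed from multiple perspectives (linear, nonlinear and temporal), so that multi-scale structural features of the time series can be extracted. In addition, the correlation between the modeled and predicted result series and the original time series can likewise be evaluated from multiple angles, including the frequency domain and the time domain.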
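Technical solution: to achieve the purpose of the present invention, the technical solution adopted is a quantum walk-based multi-scale analysis method for time series, which specifically includes the following steps:
Step 1: for the original observed time series, generate several feature sequences at different time scales based on quantum walks;
Step 2: perform feature screening on the feature sequences generated in step 1 at different time scales to obtain the optimal feature sequence combination;
Step 3: based on a regression analysis method, establish the correlation model between the original observed time series and the optimal feature sequence combination;
Step 4: use the correlation model of step 3 to predict the actually observed time series, and evaluate the prediction results in the time and frequency domains.
Further, the method also includes:
Step 5: performing experimental verification of the multi-scale analysis method, with the following experimental configuration:
Experimental data configuration: select satellites over several Pacific Ocean locations, periodically collect the absolute sea level data obtained by altimetry from those satellites, and process them to obtain the experimental data;
Evaluation index configuration: the coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE are selected as the evaluation indexes of the model prediction results, expressed as follows:
$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$
$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$
$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$
where $y_i$ is the i-th element of the actual observed time series, $\hat{y}_i$ is the i-th element of the predicted fitted sequence, $\bar{y}$ is the average value of the elements of the actual observed time series, and N is the length of the time series.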
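Further, Step 1 is carried out as follows:
An arbitrary undirected graph G=(V,E) is used to represent the quantum walk, where V is the set of vertices and E is the set of edges; the vertices represent the quantum states of the walk, and the edges represent transitions of the quantum state between vertices.
Let $|\psi(0)\rangle$ denote the quantum state vector at the initial time of the walk. Through the time-evolution operator $e^{-iHt}$, the quantum state vector $|\psi(t)\rangle$ at time t is expressed as:
$$|\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle$$
where $|\cdot\rangle$ is the ket notation for a state vector, $e^{-iHt}$ is the time-evolution operator, i is the imaginary unit, and H is the Hamiltonian, represented by the adjacency matrix or the Laplacian matrix.
A spectral decomposition algorithm is used to decompose the Hamiltonian H, giving its eigenvalues and eigenvectors; the decomposed Hamiltonian is:
$$H = \Phi\Lambda\Phi^{T}$$
where $\Phi$ is an N×N matrix whose columns are the eigenvectors, T denotes the transpose, and $\Lambda$ is the N×N diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n, \ldots, \lambda_N)$, in which $\lambda_1, \lambda_2, \ldots, \lambda_N$ are the ordered eigenvalues of H and N is the length of the time series.
The time-evolution operator is then:
$$e^{-iHt} = \Phi e^{-i\Lambda t}\Phi^{T}$$
and the quantum state vector $|\psi(t)\rangle$ at time t becomes:
$$|\psi(t)\rangle = \Phi e^{-i\Lambda t}\Phi^{T}|\psi(0)\rangle$$
A set of scale factors $K = \{k_1, k_2, \ldots, k_J\}$ is constructed, where J is the total number of scale factors and $k_j$ is the j-th scale factor. Replacing the time t with $k_j n$, the quantum state vector of the walk is expressed as:
$$|\psi(k_j n)\rangle = \Phi e^{-i\Lambda k_j n}\Phi^{T}|\psi(0)\rangle$$
where $k_j \in \mathbb{R}^{+}$, $\mathbb{R}^{+}$ denotes the positive real numbers, and n is a natural number, n=0,1,2,...;
Based on the scale factor $k_j$, the quantum walk is sampled at equal time intervals to obtain, for every vertex, the sequence of squared moduli of the probability amplitudes, which yields the feature sequences of the quantum walk at different time scales.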
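Further, the Hamiltonian H is represented by the adjacency matrix of the graph G, whose elements are:
$$A_{uv} = \begin{cases} 1, & (u,v) \in E \\ 0, & \text{otherwise} \end{cases}$$
where (u,v) denotes the edge connecting vertex u and vertex v, $A_{uv}$ is the adjacency-matrix element for the edge between vertex u and vertex v, u∈V, v∈V, and $A_{uv} = A_{vu}$, $A_{vv} = A_{uu} = 0$.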
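Further, in Step 2, stepwise regression is used to screen the generated feature sequences at different time scales, as follows:
The feature sequences at different time scales are combined and the combination is adjusted repeatedly; the Akaike information criterion is used to evaluate how accurately each combination fits the originally observed time series, and the combination with the best evaluation is selected as the optimal combination of feature sequences;
or,
the RReliefF algorithm is used to screen the generated feature sequences at different time scales, as follows:
Weights are computed for the feature sequences of Step 1 with respect to the originally observed time series, the sequences are sorted by weight in descending order, and the first Q feature sequences are combined as the optimal combination of feature sequences.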
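Further, the regression analysis of Step 3 comprises linear regression, nonlinear regression, or time-dependent vector autoregression; the linear regression includes but is not limited to stepwise regression, principal component regression and partial least squares regression; the nonlinear regression includes but is not limited to projection pursuit regression.
Further, in Step 3, the correlation model between the originally observed time series and the optimal combination of feature sequences is established by linear regression, as follows:
$$Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_q X_q + \varepsilon$$
where Y is the fitted time series, $X_1, X_2, \ldots, X_q$ are the sequences in the optimal combination of feature sequences, $\beta_1, \beta_2, \ldots, \beta_q$ are the coefficients of the sequences, and $\varepsilon$ is the constant term.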
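Further, in Step 3, the correlation model between the originally observed time series and the optimal combination of feature sequences is established by projection pursuit regression, as follows:
$$F(x) = \sum_{m=1}^{M}\beta_m G_m(Z_m)$$
where F(x) is the fitted time series, $G_m(Z_m)$ is the m-th ridge function, $\beta_m$ is a weight expressing the contribution of the m-th ridge function to the output, M is the total number of ridge functions, and $Z_m = a_m^{T}X = \sum_{p=1}^{P} a_{mp}X_p$ is the argument of the m-th ridge function, i.e. the projection of the P-dimensional vector X onto the direction $a_m$. X is the high-dimensional input of the model, $a_{mp}$ is the p-th component of the projection direction $a_m$, the superscript T denotes the transpose, and P is the dimension of the input space. Each projection direction is required to satisfy
$$\sum_{p=1}^{P} a_p^2 = 1$$
where $a_p$ is the p-th component of a projection direction.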
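Further, in Step 3, the correlation model between the originally observed time series and the optimal combination of feature sequences is established by time-dependent vector autoregression. The sequences in the optimal combination are written in matrix form as $Y = \{X_w\}$, w∈[1,L], and the model is:
$$X_w = \sum_{z=1}^{d} A_z X_{w-z} + \varepsilon_w$$
$$X_w = (X_{1w}, X_{2w}, \ldots, X_{Nw})^{T}$$
where N is the length of the time series, L is the number of sequences in the optimal combination, $X_w$ is the w-th column vector of the matrix Y, $X_{w-z}$ is the (w-z)-th column vector of Y, $X_{Nw}$ is the element in row N and column w of Y, $A_z$ is the coefficient matrix of the time-dependent vector autoregression, z is the lag order, d is the total lag order, and $\varepsilon_w$ is noise.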
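Further, the time-frequency evaluation of the prediction results in Step 4 is carried out as follows:
The coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE are selected as the metrics for evaluating the prediction results:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
where $y_i$ is the i-th element of the actually observed time series, $\hat{y}_i$ is the i-th element of the fitted (predicted) sequence, $\bar{y}$ is the mean of the elements of the actually observed time series, and N is the length of the time series.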
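Beneficial effects: compared with the prior art, the technical solution of the invention has the following beneficial technical effects:
Based on quantum walks, the invention proposes a general multi-scale analysis method for time series, covering quantum-walk-based generation of multi-feature sequences, feature sequence extraction, data modeling and prediction, and model evaluation. Without any prior assumption, it generates combinations of sequences with spatio-temporal characteristics, extracts feature sequence combinations according to the analysis needs of each time series, builds time series models from different perspectives using the relationships between the actual time series and the feature sequence combination, and then predicts from these models. The proposed method is not a form of backward inference: the feature sequences are generated under the general rules of the quantum walk, and a particular time series is expressed by a subset of the features the walk produces. The method represents the spatio-temporal variation of a quantum walk as feature sequences and uses these features in data analysis, which is a significant advance in the application of quantum walks to data analysis.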
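Brief Description of the Drawings
Figure 1 is a flowchart of the multi-scale analysis method for time series based on quantum walks according to one embodiment;
Figure 2 is a data processing flowchart of the method according to one embodiment;
Figure 3 shows the sea level height variation at the study points in one embodiment;
Figure 4 shows the first four groups of quantum walk feature sequences in one embodiment;
Figure 5 shows the linear regression and prediction results using the stepwise regression screening results in one embodiment;
Figure 6 shows the linear regression and prediction results using the RReliefF screening results in one embodiment;
Figure 7 shows the PPR regression and prediction results using the stepwise regression and RReliefF screening results in one embodiment;
Figure 8 shows the VAR regression and prediction results using the stepwise regression and RReliefF screening results in one embodiment;
Figure 9 shows the power spectral density of the modeling and prediction results based on the stepwise regression screening results in one embodiment;
Figure 10 shows the power spectral density of the modeling and prediction results based on the RReliefF screening results in one embodiment;
Figure 11 shows the statistical comparison of the different regression methods in one embodiment;
Figure 12 shows the statistical comparison of the different regression and prediction methods in one embodiment.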
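Detailed Description
The technical solution of the invention is further described below with reference to the drawings and embodiments.
Referring to Figure 1, the multi-scale analysis method for time series based on quantum walks comprises the following steps:
Step 1: generate multi-scale, multi-feature sequences based on a quantum walk;
Real time series usually have spatial locations, and their evolutions influence one another. A quantum walk can generate feature sequences that match a given spatial relationship. Before generating feature sequences with a quantum walk, the spatial relationship between the time series must be determined and abstracted as a graph.
The quantum walk is usually regarded as a universal computational tool: any quantum computation can be carried out as a quantum walk on a graph. The graph of a quantum walk consists of vertices and edges and can be expressed as an adjacency matrix. The vertices of the graph represent the quantum states of the walker at those vertices, and the edges connecting the vertices carry the transitions of the quantum state between vertices. To turn the characteristics of the walk into data, the probability of finding the walker at each vertex is collected over time to form feature sequences. During the walk, these time-varying probabilities reflect how the wave function changes. The walk is simulated numerically from the adjacency matrix of the graph by a spectral decomposition algorithm.
An arbitrary undirected graph is used to describe the quantum walk. Let G=(V,E) be an undirected, unweighted graph, where V is a set of N vertices and E is the set of edges. For any vertex v, (u,v) denotes the edge connecting vertex u and vertex v. The adjacency matrix A of the graph G can be defined as:
$$A_{uv} = \begin{cases} 1, & (u,v) \in E \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
where $A_{uv}$ is the adjacency-matrix element for the edge between vertex u and vertex v, u∈V, v∈V, and $A_{uv} = A_{vu}$, $A_{vv} = A_{uu} = 0$.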
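As an illustration of definition (1), the following minimal sketch builds the adjacency matrix of a path graph, in which each vertex is connected only to its immediate neighbours. The path topology and the helper name `path_graph_adjacency` are assumptions made for this example; the text does not prescribe a particular graph.

```python
import numpy as np

def path_graph_adjacency(n_vertices: int) -> np.ndarray:
    """Adjacency matrix of an undirected, unweighted path graph.

    Hypothetical helper: the path topology is an assumption for illustration.
    """
    A = np.zeros((n_vertices, n_vertices), dtype=float)
    for u in range(n_vertices - 1):
        A[u, u + 1] = 1.0  # edge (u, u+1) belongs to E
        A[u + 1, u] = 1.0  # symmetry: A_uv = A_vu
    return A

A = path_graph_adjacency(4)
print(A)
```

The diagonal stays zero, matching the condition that no vertex is connected to itself.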
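Unlike a classical random walk, a quantum walk is not a Markov chain. In general, the evolution of the state vector $|\psi(t)\rangle$ over time t can be described in the form of the Schrödinger equation:
$$i\frac{d}{dt}|\psi(t)\rangle = H|\psi(t)\rangle \tag{2}$$
where $|\psi(t)\rangle$ is the quantum state vector over all vertices at time t of the walk and $|\cdot\rangle$ is the ket notation for a state vector. The Hamiltonian H is an N×N Hermitian matrix, for which the adjacency matrix or the Laplacian matrix can be used. For simplicity, the invention uses the adjacency matrix A of the graph G as the Hamiltonian H. $|\psi(t)\rangle$ is a state vector with complex elements.
Given the initial state $|\psi(0)\rangle$, equation (2) can be solved, and the state vector $|\psi(t)\rangle$ at time t can be expressed as:
$$|\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle \tag{3}$$
where $e^{-iHt}$ is the time-evolution operator that drives the dynamic evolution of the walk, i is the imaginary unit, and H is the Hamiltonian. The state vector $|\psi(t)\rangle$ at time t is a linear combination of basis states. The probability of finding the walker at a vertex is the squared modulus of the corresponding probability amplitude in the state vector.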
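To obtain the state vector $|\psi(t)\rangle$, the time-evolution operator $e^{-iHt}$, a complex matrix exponential, must be computed. The Hamiltonian is decomposed spectrally as:
$$H = \Phi\Lambda\Phi^{T} \tag{4}$$
where $\Phi$ is an N×N matrix whose columns are the eigenvectors and T denotes the matrix transpose. $\Lambda$ can be written as:
$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n, \ldots, \lambda_N) \tag{5}$$
which is an N×N diagonal matrix, where $\lambda_1, \lambda_2, \ldots, \lambda_N$ are the ordered eigenvalues of H. Using the spectral decomposition of H, the time-evolution operator can be expressed as:
$$e^{-iHt} = \Phi e^{-i\Lambda t}\Phi^{T} \tag{6}$$
Equation (3) can then be written as:
$$|\psi(t)\rangle = \Phi e^{-i\Lambda t}\Phi^{T}|\psi(0)\rangle \tag{7}$$
QR decomposition is used to compute the eigenvalues and eigenvectors of the Hamiltonian H. The evolution of the state vector is then simulated from the eigenvalues, the eigenvectors and the time t according to equation (7).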
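A small worked example may help to make equations (4) to (7) concrete. It assumes the simplest possible graph, two vertices joined by one edge, with the walker starting on the first vertex; this graph is chosen only for illustration and does not appear in the text.

```latex
H = A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\lambda_1 = -1,\ \lambda_2 = 1, \qquad
\Phi = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}

e^{-iHt} = \Phi \begin{pmatrix} e^{it} & 0 \\ 0 & e^{-it} \end{pmatrix} \Phi^{T}
         = \begin{pmatrix} \cos t & -i\sin t \\ -i\sin t & \cos t \end{pmatrix}

|\psi(0)\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\;\Longrightarrow\;
|\psi(t)\rangle = \begin{pmatrix} \cos t \\ -i\sin t \end{pmatrix},
\qquad
P_1(t) = \cos^2 t,\quad P_2(t) = \sin^2 t
```

The two probabilities sum to one at every time t, and each oscillates periodically; sampling them at different intervals gives sequences with different apparent periods, which is the idea behind the scale factors.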
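The probability of finding the walker at each vertex is obtained as the squared modulus of the corresponding probability amplitude in the state vector. To capture how the quantum walk varies at different time scales, a scale factor is set and the walk is sampled at equal time intervals determined by that factor, giving a probability sequence for every vertex that represents the variation of the walk at one time scale. To obtain the set of feature sequences used for data modeling and prediction, the walk is sampled repeatedly with several different scale factors. For ease of understanding, define the set of scale factors $K = \{k_1, k_2, \ldots, k_J\}$, where J is the number of scale factors. The time t can then be replaced by $k_j n$, where n runs over the natural numbers, n=0,1,2,..., and $k_j \in \mathbb{R}^{+}$, with $\mathbb{R}^{+}$ denoting the positive real numbers. Equation (7) can therefore be written as:
$$|\psi(k_j n)\rangle = \Phi e^{-i\Lambda k_j n}\Phi^{T}|\psi(0)\rangle \tag{8}$$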
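The sampling procedure of Step 1 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `numpy.linalg.eigh` is swapped in for the QR-based eigendecomposition mentioned in the text (both yield the eigenvalues and eigenvectors of a symmetric matrix), and the function name, the three-vertex demo graph and the two demo scale factors are hypothetical.

```python
import numpy as np

def quantum_walk_features(A, scale_factors, n_samples, start=0):
    """Occupation probabilities |<v|psi(k_j n)>|^2 for every scale factor.

    Returns an array of shape (J, n_samples, N): J scale factors,
    n_samples time steps n = 0..n_samples-1, N vertices.
    """
    A = np.asarray(A, dtype=float)
    lam, Phi = np.linalg.eigh(A)           # H = Phi diag(lam) Phi^T
    psi0 = np.zeros(A.shape[0])
    psi0[start] = 1.0                      # walker starts on one vertex
    c = Phi.T @ psi0                       # initial state in the eigenbasis
    n = np.arange(n_samples)
    features = []
    for k in scale_factors:
        t = k * n                                   # t = k_j * n
        phases = np.exp(-1j * np.outer(t, lam))     # e^{-i lam t}, shape (n_samples, N)
        psi_t = (phases * c) @ Phi.T                # each row is |psi(t)>
        features.append(np.abs(psi_t) ** 2)         # squared modulus = probability
    return np.stack(features)

# Demo on an assumed 3-vertex path graph with two assumed scale factors.
A_demo = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
features = quantum_walk_features(A_demo, scale_factors=[0.01, 0.5], n_samples=5)
print(features.shape)
```

Each slice `features[j, :, v]` is one candidate feature sequence: the probability at vertex v sampled with scale factor $k_j$.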
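Step 2: feature screening:
Following Step 1, suitable feature sequences can be generated by adjusting the parameter $k_j$, and regression is used to establish the relationship between the originally observed time series and the generated feature sequences, thereby modeling the original time series. To obtain as many features as possible, the number of scale factors is increased so that as many sequences as possible are simulated. However, not all generated features are related to the original series, and using too many modes to model the original time series leads to overfitting. Therefore, from all the generated modes, those that can represent the characteristics of the original time series are selected.
The invention proposes two feature screening methods: model-driven stepwise regression and data-driven RReliefF. Stepwise regression can also be used for modeling and prediction. Stepwise regression is a linear modeling method: it repeatedly changes the combination of feature sequences, evaluates the accuracy with which each combination models the original time series using a criterion such as the Akaike information criterion (AIC), and decides whether to keep the latest change. If the fit improves, the change is kept; otherwise the previous combination is retained. The RReliefF algorithm computes the k nearest neighbors of each mode sample with respect to the original time series and derives the relative weight of every mode with respect to the original time series samples; the modes are then sorted by weight, and those with high weights can be selected in order. For each mode, all possible k nearest instances are tested and the highest value is returned. RReliefF can thus compute weights for all quantum walk feature sequences based on the observed time series and select the required number of feature sequences according to the weights.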
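The AIC-driven screening can be sketched as a simple forward selection. This is a simplified illustration under assumptions: it only adds features (the procedure described above may also undo changes), it uses the Gaussian least-squares form of AIC, $N\ln(\mathrm{RSS}/N) + 2p$, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def aic_ols(y, X):
    """AIC of an ordinary least-squares fit with intercept (Gaussian errors)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])          # intercept + chosen features
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = max(float(np.sum((y - design @ beta) ** 2)), 1e-300)  # guard log(0)
    return n * np.log(rss / n) + 2 * design.shape[1]

def forward_stepwise(y, features):
    """Greedily add the feature column that lowers AIC most; stop when none does."""
    selected = []
    best_aic = aic_ols(y, features[:, np.array(selected, dtype=int)])
    while True:
        candidates = [j for j in range(features.shape[1]) if j not in selected]
        if not candidates:
            break
        scores = [aic_ols(y, features[:, np.array(selected + [j], dtype=int)])
                  for j in candidates]
        best = int(np.argmin(scores))
        if scores[best] >= best_aic:                   # no improvement: keep old set
            break
        best_aic = scores[best]
        selected.append(candidates[best])
    return selected, best_aic

# Synthetic demo: only columns 1 and 3 actually drive y.
rng = np.random.default_rng(0)
F = rng.normal(size=(200, 5))
y = 2.0 * F[:, 1] - 3.0 * F[:, 3] + 0.1 * rng.normal(size=200)
selected, best_aic = forward_stepwise(y, F)
print(selected)
```

Because AIC penalizes each added coefficient, the number of selected sequences is not fixed in advance, which matches the remark that the size of the stepwise result is not known beforehand.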
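Step 3: time series modeling and prediction based on regression analysis:
The invention seeks the relationship between the actual time series and the screened feature sequences from several perspectives, using three classes of modeling methods: linear regression, nonlinear regression, and regression based on temporal correlation. A correlation model between the time series and the quantum walk feature sequences is established, and on the basis of the model the original time series is predicted from a combination of quantum walk feature sequences. Linear regression includes stepwise regression, principal component regression (PCR) and partial least squares regression (PLSR), among others; nonlinear regression includes projection pursuit regression (PPR), among others; regression based on temporal relationships is vector autoregression (VAR).
In the regression analysis of the feature sequences generated by the quantum walk, linear regression expresses the original time series as a linear combination of those feature sequences under a given linear regression rule. The key to linear regression is determining the coefficient of each feature sequence so that the feature sequences together represent the variation of the original time series as fully as possible.
$$Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_q X_q + \varepsilon \tag{9}$$
where Y is the fitted time series, $X_1, X_2, \ldots, X_q$ are the multi-scale feature sequences generated by the quantum walk, $\beta_1, \beta_2, \ldots, \beta_q$ are the sequence coefficients, and $\varepsilon$ is the constant term. All three linear regression methods express the original time series as a linear combination of modes, but each has its own algorithm for determining the coefficients.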
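Equation (9) can be fitted by ordinary least squares, as in the following minimal sketch. Plain least squares is used here as the simplest instance of a linear fit; it is an assumption for illustration and is not one of the three named methods (stepwise regression, PCR, PLSR), which determine the coefficients differently. The function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares estimate of beta_1..beta_q and the constant term in eq. (9)."""
    design = np.column_stack([X, np.ones(len(y))])     # last column carries the constant
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]

def predict_linear(X, beta, const):
    """Evaluate Y = sum_q beta_q X_q + const for new feature values."""
    return X @ beta + const

# Synthetic demo with known coefficients and no noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 4.0
beta, const = fit_linear(X, y)
print(beta, const)
```

In the setting of the text, the rows of `X` would be the time steps of the training period and the columns the screened feature sequences; prediction evaluates the same coefficients on the feature values of later time steps.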
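Projection pursuit regression is a nonlinear regression method for high-dimensional data that is widely used for prediction. The basic idea of PPR is to project high-dimensional data onto a low-dimensional space (1 to 3 dimensions), find projections that reflect the structure or features of the high-dimensional data, and carry out regression analysis on them. The key to PPR is determining the projection directions.
The projection pursuit regression model can be expressed as:
$$F(x) = \sum_{m=1}^{M}\beta_m G_m(Z_m) \tag{10}$$
where $G_m(Z_m)$ is the m-th ridge function, $\beta_m$ is a weight expressing the contribution of the m-th ridge function to the output, and $Z_m = a_m^{T}X = \sum_{p=1}^{P} a_{mp}X_p$ is the argument of the ridge function, i.e. the projection of the P-dimensional vector X onto the direction $a_m$. $a_{mp}$ is the p-th component of the m-th projection direction, P is the dimension of the input space, T denotes the transpose, and each projection direction is required to satisfy
$$\sum_{p=1}^{P} a_p^2 = 1 \tag{11}$$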
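To show what determining a projection direction means, the following toy sketch fits a single-term model (M = 1) for two-dimensional inputs. It is a deliberately simplified stand-in for projection pursuit regression: the unit direction is found by grid search over angles and the ridge function is a cubic polynomial, whereas practical PPR implementations use smoothers and iterative optimisation. All names and data are hypothetical.

```python
import numpy as np

def fit_single_ridge(X, y, n_angles=180, degree=3):
    """One-term projection pursuit fit for 2-D inputs.

    Searches unit directions a = (cos th, sin th) and fits a polynomial
    ridge function G to y as a function of the projection z = X a.
    """
    best = None
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        a = np.array([np.cos(theta), np.sin(theta)])   # satisfies sum(a_p^2) = 1
        z = X @ a                                      # projection Z = a^T X
        coeffs = np.polyfit(z, y, degree)              # polynomial ridge function
        rss = float(np.sum((y - np.polyval(coeffs, z)) ** 2))
        if best is None or rss < best[0]:
            best = (rss, a, coeffs)
    return best[1], best[2]

# Synthetic demo: y depends on X only through one direction.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
a_true = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])
y = (X @ a_true) ** 3
a_hat, ridge_coeffs = fit_single_ridge(X, y)
print(a_hat)
```

With more ridge terms, the same search would be repeated on the residuals of the previous terms.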
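Time-dependent vector autoregression (VAR) is commonly used to forecast systems of time series with internally related factors and to analyze the dynamic effect of random disturbances on the system of variables. VAR builds the model by treating each endogenous variable of the system as a function of the lagged values of all endogenous variables, and is often used for serial correlation analysis. The multiple time series $Y = \{X_w\}$ are treated as a matrix representing L groups of time series of length N. At any time w, the VAR(d) model can be expressed as equation (12):
$$X_w = \sum_{z=1}^{d} A_z X_{w-z} + \varepsilon_w \tag{12}$$
$$X_w = (X_{1w}, X_{2w}, \ldots, X_{Nw})^{T} \tag{13}$$
where $A_z$ is the coefficient matrix of the VAR, $\varepsilon_w$ is noise, z is the lag order and d is the total lag order.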
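A vector autoregression of this form can be estimated by least squares on lagged copies of the data. The sketch below uses the conventional layout in which rows are time steps and columns are the individual series; this layout, the function names and the noise-free demo system are assumptions for illustration.

```python
import numpy as np

def fit_var(series, d):
    """Least-squares VAR(d): x_t = sum_{z=1..d} A_z x_{t-z} + noise.

    series has shape (T, L): T time steps, L series.
    Returns a list [A_1, ..., A_d] of (L, L) coefficient matrices.
    """
    T, L = series.shape
    # Row i of the regressor matrix holds [x_{t-1}, ..., x_{t-d}] for t = d + i.
    Z = np.hstack([series[d - z:T - z] for z in range(1, d + 1)])
    Y = series[d:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)          # Y ~ Z B, B stacks A_z^T
    return [B[(z - 1) * L:z * L].T for z in range(1, d + 1)]

def predict_next(history, coeffs):
    """One-step-ahead forecast from the last d rows of history."""
    return sum(coeffs[z - 1] @ history[-z] for z in range(1, len(coeffs) + 1))

# Demo: noise-free VAR(1) with a known coefficient matrix.
A_true = np.array([[0.5, 0.1], [0.0, 0.8]])
x = np.zeros((30, 2))
x[0] = [1.0, 1.0]
for t in range(1, 30):
    x[t] = A_true @ x[t - 1]
coeffs = fit_var(x, d=1)
print(coeffs[0])
```

Multi-step forecasts can be produced by appending each prediction to the history and calling `predict_next` again, which is also where forecast errors can accumulate.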
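Step 4: evaluation of the results in the frequency domain and the time domain
A time series has structural features in the frequency domain and data features in the time domain. To evaluate the frequency-domain features, the invention uses power spectrum analysis: computing the power spectral density converts a time-dependent sequence into a distribution of signal strength over frequency, which shows how well two sequences agree in the frequency domain. To evaluate how the modeling and prediction results relate to the original time series in the time domain, the invention uses the coefficient of determination ($R^2$), the root mean square error (RMSE) and the mean absolute error (MAE) between the two time series.
$$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2} \tag{14}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2} \tag{15}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \tag{16}$$
where $y_i$ is the i-th element of the original time series, $\hat{y}_i$ is the i-th element of the fitted sequence, $\bar{y}$ is the sample mean, and N is the length of the time series.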
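The time-domain metrics and a basic power spectrum estimate can be computed as in the sketch below. The metric formulas are the standard definitions of $R^2$, RMSE and MAE; the raw FFT periodogram is an assumed, simple choice of spectral estimator, since the text does not say which estimator was used. Function names are hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def periodogram(x, dt=1.0):
    """Raw periodogram of a mean-removed series sampled every dt."""
    x = np.asarray(x, float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 * dt / len(x)
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs, power

y_obs = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.0, 2.0, 3.0, 5.0]
print(r_squared(y_obs, y_fit), rmse(y_obs, y_fit), mae(y_obs, y_fit))

# A sinusoid at 0.1 cycles per sample should peak at that frequency.
freqs, power = periodogram(np.sin(2 * np.pi * 0.1 * np.arange(100)))
peak_freq = freqs[np.argmax(power)]
```

Comparing the periodogram of a fitted sequence with that of the observations shows whether the dominant periods are reproduced, independently of point-by-point errors.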
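Step 5: experimental verification
The experimental configuration of the invention comprises: (1) experimental data: absolute sea level data obtained by satellite altimetry at seven locations in the Pacific Ocean are used as the experimental data (the sampling period is one week); (2) evaluation metrics: MAE, RMSE and $R^2$ are used as the model evaluation metrics.
Based on this configuration, the results are presented in two parts: (1) the results of the various modeling and prediction methods applied to satellite altimetry data using quantum walk feature sequences; (2) the accuracy assessment of the modeling and prediction results from the two perspectives.
Taking satellite altimetry data as an example, absolute sea level data recorded weekly from 1 November 2000 were obtained for 7 locations. The coordinates of the 7 locations are P1 (160.125°E, 0.125°N), P2 (170.125°E, 0.125°N), P3 (180.125°E, 0.125°N), P4 (190.125°E, 0.125°N), P5 (200.125°E, 0.125°N), P6 (210.125°E, 0.125°N) and P7 (220.125°E, 0.125°N); the data are shown in Figure 3. A total of 1000 records are used, the first 800 as training samples and the last 200 as test samples. A quantum walk is used to generate multi-scale, multi-feature distribution data associated with the 7 locations, two feature screening methods are used to obtain feature combinations similar to the characteristics of the altimetry data, and several regression methods are then used to establish the relationship between the altimetry data and the features, build the models, and predict the 200 records that follow the training samples. The fitting accuracy and the prediction accuracy of the models are evaluated separately.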
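Referring to Figure 2, the data processing comprises:
1. Generating multi-scale, multi-feature sequences based on a quantum walk:
A quantum walk can simulate time-varying feature sequences with structural characteristics. Simulating the walk requires an adjacency matrix as input. The 7 locations selected in this embodiment lie on the same latitude, and the adjacency matrix used to generate the quantum walk feature sequences is set to: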
Figure PCTCN2021143601-appb-000054
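P1 is set as the initial position of the quantum walker. Because the original data contain 1000 records, the length of the sequence obtained at each time scale is set to 1000. To cover as many of the possible distributions of the quantum walk as practical, this embodiment samples with 2000 scale factors, the smallest being 0.01 and each subsequent one larger by 0.01. The quantum walk feature sequences produced by the first four scale factors are plotted in Figure 4.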
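The sampling configuration of this embodiment can be written down as a short sketch. The variable names are hypothetical; the numbers (2000 scale factors from 0.01 in steps of 0.01, sequences of length 1000, an 800/200 split, walker starting at P1) are those stated in the text.

```python
import numpy as np

n_locations = 7                              # P1..P7
start_vertex = 0                             # index of P1, the walker's initial position
n_samples = 1000                             # one sample per weekly observation
scale_factors = 0.01 * np.arange(1, 2001)    # 0.01, 0.02, ..., 20.00

train_idx = np.arange(0, 800)                # first 800 records: training
test_idx = np.arange(800, n_samples)         # last 200 records: testing

# Number of candidate feature sequences: one per (scale factor, vertex) pair.
n_candidates = len(scale_factors) * n_locations
print(n_candidates)
```

With one sequence per scale factor and vertex, the candidate pool holds 14000 sequences of length 1000, which is why a screening step is needed before regression.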
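2. Feature screening:
Stepwise regression and RReliefF are each used to screen the combinations of feature sequences generated by the quantum walk, giving mode combinations whose characteristics are similar to those of the original time series. Because stepwise regression is a model-driven screening method, it yields a single optimal mode combination; RReliefF is a data-based weighting method that computes the weight of each mode relative to the original time series, and modes are selected according to their weights. In this step, the number of feature sequences selected by stepwise regression is not fixed in advance, while RReliefF is used to select 100 feature sequences for each study point.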
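3. Time series modeling and prediction based on regression analysis:
Based on the feature screening results, the invention uses five regression algorithms (stepwise regression, principal component regression, partial least squares regression, projection pursuit regression and vector autoregression) to model and predict the original time series, dividing the 1000 records into 800 training samples and 200 test samples. The three classes of modeling and prediction (linear, nonlinear and time-based) are carried out separately on the stepwise regression and the RReliefF screening results. Figures 5 and 6 show the fitted results and the model-based predictions obtained with the stepwise regression and the RReliefF screening results, respectively. Figure 7 shows the modeling and prediction results of projection pursuit regression. Figure 8 shows the modeling and prediction results of vector autoregression.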
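4. Evaluation of the results in the frequency domain and the time domain:
Following Step 3, the invention analyzes the correlation between sequences in terms of both frequency-domain and time-domain features. In the frequency domain, the power spectra of the sea level data, the fitted data and the predicted data are analyzed; in the time domain, the coefficient of determination and the errors between two sequences are computed as correlation metrics. Figures 9 and 10 compare the power spectra of the modeling and prediction results based on the stepwise regression and the RReliefF screening results, respectively. The figures show that the spectral structure of all experimental results is very similar to that of the original time series, especially for the nonlinear projection pursuit regression and the time-based vector autoregression.
The time-domain evaluation starts from the data of the experimental results and computes the accuracy metrics between the results and the original time series. The invention computes the coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE; the results are shown in Figures 11 and 12. Figure 11 shows the accuracy statistics of the fit to the first 800 records, and Figure 12 shows the accuracy statistics of the fit to the first 800 records together with the prediction of the last 200 records. In each figure, the first three subplots are experiments using the stepwise regression screening results and the last three are experiments using the RReliefF screening results.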
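5. Experimental verification:
Figures 5, 6, 7 and 8 show the regression and prediction results based on the two feature screening results. In terms of fitting, the projection pursuit regression (nonlinear) and the vector autoregression (time-based) agree better with the original time series, but in terms of prediction, the results of the linear methods are more stable. Figures 9 and 10 show the power spectral densities of the modeling and prediction results and of the original time series; projection pursuit regression and vector autoregression fit better.
Figures 11 and 12 show the time-domain evaluation metrics. When evaluating simulated and predicted results, a larger coefficient of determination $R^2$ and smaller root mean square error and mean absolute error indicate a stronger correlation between the two sequences. However, RMSE and MAE depend on the average level of the data themselves, so they cannot be used to compare fitting accuracy between stations, although they can be used to compare different modeling methods at the same station. Figure 11 shows that both the projection pursuit regression and the vector autoregression achieve very good fits, while the three linear regression methods have relatively lower fitting accuracy; among the linear fits, those using features screened by stepwise regression are more accurate than those using features screened by RReliefF. Because RReliefF selects more features than stepwise regression, this indicates that the stepwise regression screening results are better suited to linear regression. Vector autoregression on the RReliefF screening results achieves high accuracy in fitting the data but performs poorly in prediction, with large errors. In terms of RMSE and MAE, projection pursuit regression and vector autoregression have clearly lower errors than linear regression on the first 800 fitted records, but vector autoregression on the RReliefF screening results shows prediction deviations at some stations.
The multi-scale analysis method for time series based on quantum walks proposed by the invention analyzes time series through data generation, data screening, data modeling and prediction, and result evaluation, and can achieve high modeling or prediction accuracy. The different methods used each have their own strengths. Nonlinear regression and time-based vector autoregression on the quantum walk feature sequences both fit the time series with high accuracy but are not stable enough in prediction; linear regression on the quantum walk feature sequences loses some detail of the variation when fitting the time series but is stable in prediction.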

Claims (10)

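  1. A multi-scale analysis method for time series based on quantum walks, characterized in that it comprises the following steps:
    Step 1: for the originally observed time series, generating several feature sequences at different time scales based on a quantum walk;
    Step 2: performing feature screening on the feature sequences generated in Step 1 to obtain the optimal combination of feature sequences;
    Step 3: establishing a correlation model between the originally observed time series and the optimal combination of feature sequences by regression analysis;
    Step 4: using the correlation model of Step 3 to predict the actually observed time series, and evaluating the prediction results in the time and frequency domains.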
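  2. The multi-scale analysis method for time series based on quantum walks according to claim 1, characterized in that the method further comprises:
    Step 5: experimentally verifying the multi-scale analysis method, the experimental configuration being as follows:
    Experimental data configuration: selecting several satellite altimetry locations in the Pacific Ocean, periodically collecting the absolute sea level data obtained by satellite altimetry at those locations, and processing them to obtain the experimental data;
    Evaluation metric configuration: selecting the coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE as the metrics for evaluating the model predictions, expressed as:
    $$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$
    $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$$
    $$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
    where $y_i$ is the i-th element of the actually observed time series, $\hat{y}_i$ is the i-th element of the sequence obtained by prediction and fitting, $\bar{y}$ is the mean of the elements of the actually observed time series, and N is the length of the time series.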
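  3. The multi-scale analysis method for time series based on quantum walks according to claim 1, characterized in that Step 1 is carried out as follows:
    an arbitrary undirected graph G=(V,E) is used to represent the quantum walk, where V is the set of vertices and E is the set of edges; the vertices represent the quantum states of the walk, and the edges represent transitions of the quantum state between vertices;
    $|\psi(0)\rangle$ denotes the quantum state vector at the initial time of the walk; through the time-evolution operator $e^{-iHt}$, the quantum state vector $|\psi(t)\rangle$ at time t is expressed as:
    $$|\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle$$
    where $|\cdot\rangle$ is the ket notation for a state vector, $e^{-iHt}$ is the time-evolution operator, i is the imaginary unit, and H is the Hamiltonian, represented by the adjacency matrix or the Laplacian matrix;
    a spectral decomposition algorithm is used to decompose the Hamiltonian H, giving its eigenvalues and eigenvectors, the decomposed Hamiltonian being:
    $$H = \Phi\Lambda\Phi^{T}$$
    where $\Phi$ is an N×N matrix whose columns are the eigenvectors, T denotes the transpose, and $\Lambda$ is the N×N diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n, \ldots, \lambda_N)$, in which $\lambda_1, \lambda_2, \ldots, \lambda_N$ are the ordered eigenvalues of H and N is the length of the time series;
    the time-evolution operator is expressed as:
    $$e^{-iHt} = \Phi e^{-i\Lambda t}\Phi^{T}$$
    and the quantum state vector $|\psi(t)\rangle$ at time t is then expressed as:
    $$|\psi(t)\rangle = \Phi e^{-i\Lambda t}\Phi^{T}|\psi(0)\rangle$$
    a set of scale factors $K = \{k_1, k_2, \ldots, k_J\}$ is constructed, where J is the total number of scale factors and $k_j$ is the j-th scale factor; replacing the time t with $k_j n$, the quantum state vector of the walk is expressed as:
    $$|\psi(k_j n)\rangle = \Phi e^{-i\Lambda k_j n}\Phi^{T}|\psi(0)\rangle$$
    where $k_j \in \mathbb{R}^{+}$, $\mathbb{R}^{+}$ denotes the positive real numbers, and n is a natural number, n=0,1,2,...;
    based on the scale factor $k_j$, the quantum walk is sampled at equal time intervals to obtain, for every vertex, the sequence of squared moduli of the probability amplitudes, thereby generating the feature sequences of the quantum walk at different time scales.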
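  4. The multi-scale analysis method for time series based on quantum walks according to claim 3, characterized in that the Hamiltonian H is represented by the adjacency matrix of the graph G, whose elements are:
    $$A_{uv} = \begin{cases} 1, & (u,v) \in E \\ 0, & \text{otherwise} \end{cases}$$
    where (u,v) denotes the edge connecting vertex u and vertex v, $A_{uv}$ is the adjacency-matrix element for the edge between vertex u and vertex v, u∈V, v∈V, and $A_{uv} = A_{vu}$, $A_{vv} = A_{uu} = 0$.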
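  5. The multi-scale analysis method for time series based on quantum walks according to claim 1, characterized in that, in Step 2, stepwise regression is used to screen the generated feature sequences at different time scales, as follows:
    the feature sequences at different time scales are combined and the combination is adjusted repeatedly; the Akaike information criterion is used to evaluate how accurately each combination fits the originally observed time series, and the combination with the best evaluation is selected as the optimal combination of feature sequences;
    or,
    the RReliefF algorithm is used to screen the generated feature sequences at different time scales, as follows:
    weights are computed for the feature sequences of Step 1 with respect to the originally observed time series, the sequences are sorted by weight in descending order, and the first Q feature sequences are combined as the optimal combination of feature sequences.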
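  6. The multi-scale analysis method for time series based on quantum walks according to claim 1, characterized in that the regression analysis of Step 3 comprises linear regression, nonlinear regression, or time-dependent vector autoregression; the linear regression includes but is not limited to stepwise regression, principal component regression and partial least squares regression; the nonlinear regression includes but is not limited to projection pursuit regression.
  7. The multi-scale analysis method for time series based on quantum walks according to claim 6, characterized in that, in Step 3, the correlation model between the originally observed time series and the optimal combination of feature sequences is established by linear regression, as follows:
    $$Y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_q X_q + \varepsilon$$
    where Y is the fitted time series, $X_1, X_2, \ldots, X_q$ are the sequences in the optimal combination of feature sequences, $\beta_1, \beta_2, \ldots, \beta_q$ are the coefficients of the sequences, and $\varepsilon$ is the constant term.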
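  8. The multi-scale analysis method for time series based on quantum walks according to claim 6, characterized in that, in Step 3, the correlation model between the originally observed time series and the optimal combination of feature sequences is established by projection pursuit regression, as follows:
    $$F(x) = \sum_{m=1}^{M}\beta_m G_m(Z_m)$$
    where F(x) is the fitted time series, $G_m(Z_m)$ is the m-th ridge function, $\beta_m$ is a weight expressing the contribution of the m-th ridge function to the output, M is the total number of ridge functions, and $Z_m = a_m^{T}X = \sum_{p=1}^{P} a_{mp}X_p$ is the argument of the m-th ridge function, i.e. the projection of the P-dimensional vector X onto the direction $a_m$; X is the high-dimensional input of the model, $a_{mp}$ is the p-th component of the projection direction $a_m$, the superscript T denotes the transpose, and P is the dimension of the input space; each projection direction is required to satisfy
    $$\sum_{p=1}^{P} a_p^2 = 1$$
    where $a_p$ is the p-th component of a projection direction.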
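  9. The multi-scale analysis method for time series based on quantum walks according to claim 6, characterized in that, in Step 3, the correlation model between the originally observed time series and the optimal combination of feature sequences is established by time-dependent vector autoregression, the sequences in the optimal combination being written in matrix form as $Y = \{X_w\}$, w∈[1,L], as follows:
    $$X_w = \sum_{z=1}^{d} A_z X_{w-z} + \varepsilon_w$$
    $$X_w = (X_{1w}, X_{2w}, \ldots, X_{Nw})^{T}$$
    where N is the length of the time series, L is the number of sequences in the optimal combination, $X_w$ is the w-th column vector of the matrix Y, $X_{w-z}$ is the (w-z)-th column vector of Y, $X_{Nw}$ is the element in row N and column w of Y, $A_z$ is the coefficient matrix of the time-dependent vector autoregression, z is the lag order, d is the total lag order, and $\varepsilon_w$ is noise.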
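  10. The multi-scale analysis method for time series based on quantum walks according to claim 1, characterized in that the time-frequency evaluation of the prediction results in Step 4 is carried out as follows:
    the coefficient of determination $R^2$, the root mean square error RMSE and the mean absolute error MAE are selected as the metrics for evaluating the prediction results:
    $$R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$
    $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$$
    $$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
    where $y_i$ is the i-th element of the actually observed time series, $\hat{y}_i$ is the i-th element of the fitted (predicted) sequence, $\bar{y}$ is the mean of the elements of the actually observed time series, and N is the length of the time series.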
PCT/CN2021/143601 2021-12-09 2021-12-31 Multi-scale analysis method for time series based on quantum walk WO2023103130A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/036,654 US20240346210A1 (en) 2021-12-09 2021-12-31 Multi-scale analysis method for time series based on quantum walk

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111499360.7A CN114429077B (zh) 2021-12-09 2021-12-09 Multi-scale analysis method for time series based on quantum walk
CN202111499360.7 2021-12-09

Publications (1)

Publication Number Publication Date
WO2023103130A1 true WO2023103130A1 (zh) 2023-06-15

Family

ID=81310951

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143601 WO2023103130A1 (zh) 2021-12-09 2021-12-31 Multi-scale analysis method for time series based on quantum walk

Country Status (3)

Country Link
US (1) US20240346210A1 (zh)
CN (1) CN114429077B (zh)
WO (1) WO2023103130A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
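CN116822253A (zh) * 2023-08-29 2023-09-29 山东省计算中心(国家超级计算济南中心) Mixed-precision implementation method and system for the MASNUM wave model
CN116881693A (zh) * 2023-07-13 2023-10-13 江苏省地质矿产局第一地质大队 Method for extracting morphological evolution features of the tunnel face under sequential rock breaking by cutter groups
CN117370714A (zh) * 2023-12-07 2024-01-09 南京气象科技创新研究院 Method for quantitatively determining representative stations
CN118504289A (zh) * 2024-07-17 2024-08-16 泰安市瑞亨建材有限公司 Three-dimensional stress and deformation analysis method for geomembranes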

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
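CN112614335A (zh) * 2020-11-17 2021-04-06 南京师范大学 Traffic flow characteristic mode decomposition method based on a generation-filtering mechanism
CN113393488A (zh) * 2021-06-08 2021-09-14 南京师范大学 Multi-feature simulation method for behavior trajectory sequences based on quantum walks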
US20210365606A1 (en) * 2019-12-09 2021-11-25 Nanjing Normal University Quantum computing method for expressway traffic flow distribution simulation considering destination selection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
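CN111651941B (zh) * 2020-04-30 2022-05-17 北京航空航天大学 Algorithm for predicting global ionospheric total electron content
CN113392583A (zh) * 2021-06-08 2021-09-14 南京师范大学 Sea surface height simulation method based on quantum walks
CN113657014B (zh) * 2021-08-10 2024-08-23 南京师范大学 Method and device for simulating the near-space atmospheric state based on quantum walks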

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HU YONG, LUO WEN; YU ZHAOYUAN; FENG LINYAO: "Multi-mode Tensor Expression Model of Multidimensional Spatio-temporal Field Data", GEOMATICS AND INFORMATION SCIENCE OF WUHAN UNIVERSITY., vol. 40, no. 7, 1 July 2015 (2015-07-01), pages 977 - 982, XP093070487, DOI: 10.13203/j.whugis20130491 *
SUN, YI: "Multi-Objective Differential Evolution Algorithm and Its Application on Data Clustering", MASTER'S THESIS, 1 June 2018 (2018-06-01), CN, pages 1 - 84, XP009546079 *
YU, ZHAOYUAN: "Multi-dimensional Unified GIS Data Model Based on Geometric Algebra", DOCTORAL DISSERTATION, 10 May 2011 (2011-05-10), CN, pages 1 - 136, XP009546085 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
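CN116881693A (zh) * 2023-07-13 2023-10-13 江苏省地质矿产局第一地质大队 Method for extracting morphological evolution features of the tunnel face under sequential rock breaking by cutter groups
CN116881693B (zh) * 2023-07-13 2024-03-29 江苏省地质矿产局第一地质大队 Method for extracting morphological evolution features of the tunnel face under sequential rock breaking by cutter groups
CN116822253A (zh) * 2023-08-29 2023-09-29 山东省计算中心(国家超级计算济南中心) Mixed-precision implementation method and system for the MASNUM wave model
CN116822253B (zh) * 2023-08-29 2023-12-08 山东省计算中心(国家超级计算济南中心) Mixed-precision implementation method and system for the MASNUM wave model
CN117370714A (zh) * 2023-12-07 2024-01-09 南京气象科技创新研究院 Method for quantitatively determining representative stations
CN117370714B (zh) * 2023-12-07 2024-03-19 南京气象科技创新研究院 Method for quantitatively determining representative stations
CN118504289A (zh) * 2024-07-17 2024-08-16 泰安市瑞亨建材有限公司 Three-dimensional stress and deformation analysis method for geomembranes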

Also Published As

Publication number Publication date
US20240346210A1 (en) 2024-10-17
CN114429077A (zh) 2022-05-03
CN114429077B (zh) 2024-09-13

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 18036654

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21967042

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21967042

Country of ref document: EP

Kind code of ref document: A1