CN115309754B - Frequency mixing prediction method, device and equipment based on information sparse situation - Google Patents

Frequency mixing prediction method, device and equipment based on information sparse situation

Info

Publication number
CN115309754B
Authority
CN
China
Prior art keywords
data
prediction
information
index
matrix
Prior art date
Legal status
Active
Application number
CN202211238794.6A
Other languages
Chinese (zh)
Other versions
CN115309754A (en)
Inventor
张崇辉
陈思博
王永恒
苏为华
周家敏
苏田恬
Current Assignee
Zhejiang Gongshang University
Zhejiang Lab
Original Assignee
Zhejiang Gongshang University
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University, Zhejiang Lab
Priority to CN202211238794.6A
Publication of CN115309754A
Application granted
Publication of CN115309754B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2455 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems
    • G06F16/252 - Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a frequency mixing prediction method, device and equipment for the information-sparse situation. Different types of data information are integrated by constructing a multi-input frequency mixing data access toolkit; an information-sparsity judgment rule and an index frequency-conversion target are formulated; if the sparsity level of the index information cannot meet the frequency-conversion requirement, the index set is interpolated until the requirement is met, forming an analysis index set; different models are then mixed to carry out prediction, and the prediction result is formed using the Akaike information criterion (AIC); finally, the prediction result is stored, converted into graphical data, and displayed through visualization technology. The invention solves the problem that accurate results cannot be obtained in frequency mixing prediction analysis when the low-frequency indices have too few observations.

Description

Frequency mixing prediction method, device and equipment based on information sparse situation
Technical Field
The invention relates to the technical field of predictive analysis of sparse data information, and in particular to a frequency mixing prediction method, device and equipment for information-sparse situations.
Background
With the continuous development of the digital economy and its technologies, the information input of modern predictive analysis systems is becoming increasingly diverse and complex. Under complex information input, the problem of multi-frequency data input often arises because the update frequencies and update times of different data acquisition systems (such as equipment sensors, information sensors, and statistical surveys) are inconsistent. In the cold-start situation, a data acquisition system with a low update frequency (such as a statistical survey) can hardly provide data whose information density meets the analysis requirement; that is, the information sparsity problem occurs.
The information sparsity problem is common in various cold-start predictive analysis systems, such as a load prediction system for a newly built communication base station, a visitor-flow prediction system for a newly built hospital, or a high-frequency economic prediction analysis system during a census period.
Traditional predictive analysis systems face several hard-to-resolve difficulties when handling information-sparse prediction tasks: for example, too few observations fail to satisfy key statistical assumptions, and the information content of the indices is too low for a traditional data analysis system to make accurate predictions.
Disclosure of Invention
In order to overcome the defects of the prior art, to realize long-panel data processing, and at the same time to improve the precision of prediction under sparse information in high-dimensional data analysis, the invention adopts the following technical scheme:
a mixing prediction method based on information sparsity situation comprises the following steps:
s1, analyzing the frequency mixing data and integrating different types of data;
s2, constructing an information sparsity criterion and determining a frequency conversion target, and comprising the following steps of:
s21, obtaining an analysis data matrix according to user retrieval data;
step S22, setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column;
step S23, setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
S3, processing the time-series frequency by a cubic spline interpolation method, adjusting the sequences that do not meet the sparsity criterion S, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form an analysis data set D;
S4, mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion;
and S5, displaying the prediction result.
Further, the step S1 includes the steps of:
s11, according to data access characteristics, constructing a structured data storage specification, and analyzing different files to obtain different data access information;
s12, designing a memory pointer, and dividing storage areas for different data access information;
s13, confirming the data index position, and establishing data association according to the data index;
and S14, setting an index retrieval rule, and enabling the un-retrieved observation value to be NA to form a structured database.
Further, the step S3 includes the steps of:
S31, extracting the original data, acquiring a low-frequency time-series data index on a specific interval, and dividing the specific interval into subintervals, where each subinterval satisfies a cubic spline equation;
S32, calculating the step length of each data node in the subintervals;
S33, under three boundary conditions, filling the data nodes and the specified end-point conditions into a matrix equation; the three boundary conditions are the natural boundary, the fixed boundary and the not-a-knot boundary;
S34, solving the matrix equation to obtain the second-order differential values;
S35, obtaining the coefficients of the spline interpolation function from the second-order differential values;
S36, creating a cubic equation on each subinterval according to the coefficients;
S37, adjusting the sequences that do not satisfy the sparsity criterion S using the cubic equations, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D.
Further, in step S31, the original data are extracted to obtain a low-frequency time-series data index x on an interval [a, b]; the interval [a, b] is divided into k subintervals [(x_0, x_1), (x_1, x_2), ..., (x_{k-1}, x_k)], giving k+1 nodes in total, with end points x_0 = a and x_k = b; on each subinterval (x_i, x_{i+1}), S(x) = S_i(x) is a cubic spline equation; all points satisfy the interpolation condition S(x_i) = y_i (i = 0, 1, ..., k), and apart from the two end points, each of the k-1 interior points satisfies S_i(x_i) = y_i, S_i(x_{i+1}) = y_{i+1} (i = 0, 1, ..., k-1);
in step S32, the step length is h_i = x_{i+1} - x_i;
in step S33, the first boundary condition is the natural boundary: the second derivative at the end points is specified as 0, i.e., with M_i = S''_i(x_i), S''(x_0) = 0 = S''(x_n); the corresponding matrix equation is given as an image in the original publication and is not reproduced here;
the second boundary condition is the fixed (clamped) boundary: the first derivatives at the end points are specified, the differential values at the two end nodes of the data being known and set to A and B, i.e., S'_0(x_0) = A, S'_{n-1}(x_n) = B; the corresponding matrix equation is given as an image in the original publication;
the third boundary condition is the not-a-knot boundary: the third derivative at the first interpolation point is forced to equal the third derivative at the second point, and the third derivative at the last point to equal the third derivative at the second-to-last point, i.e., S'''_0(x_0) = S'''_1(x_1), S'''_{n-2}(x_{n-1}) = S'''_{n-1}(x_{n-1}); the corresponding matrix equation is given as an image in the original publication;
in step S34, the matrix equation is solved to obtain the second-order differential values M_i (i = 0, 1, ..., n);
in step S35, the coefficients a_i, b_i, c_i, d_i of the spline interpolation function are obtained from M_i; the four coefficient formulas are given as images in the original publication and are not reproduced here;
in step S36, according to the coefficients, a cubic equation is created on each subinterval x_i ≤ x ≤ x_{i+1}:
g_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3;
in step S37, the analysis data set D is formed; its matrix form is given as an image in the original publication and is not reproduced here.
Further, in step S34, the coefficient matrix is a tridiagonal matrix, and LU decomposition is performed on it to factor it into a lower triangular matrix and an upper triangular matrix, i.e.
B = Ax = (LU)x = L(Ux) = Ly
where L denotes the lower triangular matrix and U denotes the upper triangular matrix.
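As a purely illustrative sketch (not part of the patent), the tridiagonal system arising in step S34 can be handled with a banded solver such as scipy.linalg.solve_banded, which factors the banded matrix internally and so plays the role of the LU step described above; the band layout and the example numbers below are assumptions.

    import numpy as np
    from scipy.linalg import solve_banded

    def solve_tridiagonal(lower, diag, upper, rhs):
        """Solve A x = rhs for a tridiagonal A (e.g. the spline moment equations).
        lower/upper have length n-1, diag and rhs have length n."""
        n = len(diag)
        ab = np.zeros((3, n))
        ab[0, 1:] = upper    # superdiagonal
        ab[1, :] = diag      # main diagonal
        ab[2, :-1] = lower   # subdiagonal
        return solve_banded((1, 1), ab, rhs)

    # Illustrative 4x4 system (placeholder numbers, not taken from the patent):
    # M = solve_tridiagonal(np.ones(3), 4.0 * np.ones(4), np.ones(3), 6.0 * np.ones(4))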
Further, in step S4, a model importance index R_i is constructed from each model's Akaike information criterion (AIC) value; the defining formula is given as an image in the original publication and is not reproduced here;
where R_i denotes the importance index of the i-th model;
importance weights are then constructed from the model importance indices; the weight formula is likewise given as an image in the original publication;
the prediction results are weighted and integrated according to the model importance indices, and the combined prediction result is output; it satisfies a weighted-sum condition, given as an image in the original publication, in which the individual terms represent the prediction results of the different models.
Further, in step S4, regression prediction is performed on the analysis data set D using a Lasso model, comprising the following steps:
step S411, constructing a loss function with a penalty term, the specific formula being:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
where ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the penalty strength, and X is the transpose of the x columns in data set D, satisfying a condition given as an image in the original publication;
step S412, solving for W by the coordinate descent method;
step S413, from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
in step S4, regression prediction is also performed on the analysis data set D using the auto.arima model, comprising:
step S421, calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
step S422, from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
in step S4, regression prediction is also performed on the analysis data set D using a time-series multivariate regression model, comprising:
step S431, calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
step S432, estimating the regression parameters by the least squares method;
step S433, from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M.
Further, in step S5, the prediction result is stored, converted into graphical data, and displayed through visualization technology, including:
s51, establishing a conclusion data table, wherein the conclusion data table comprises three fields, namely a date index, a prediction index actual value and a prediction value;
and S52, storing the prediction result into the data table in the S51, and providing an API data interface.
A frequency mixing prediction device based on the information sparse situation is used for executing the frequency mixing prediction method based on the information sparse situation, and comprises an integration module, a frequency conversion target determination module, an analysis data set generation module, a mixing prediction module and a display module;
the integration module analyzes the mixing data and integrates different types of data;
the frequency conversion target determining module is used for constructing an information sparsity criterion and determining a frequency conversion target, and the executing process is as follows:
obtaining an analysis data matrix according to user retrieval data;
setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column;
setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
the analysis data set generation module processes the time-series frequency by a cubic spline interpolation method, adjusts the sequences that do not meet the sparsity criterion S, and replaces the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
the mixed prediction module is used for mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion;
and the display module is used for displaying the prediction result.
A frequency mixing prediction equipment based on the information sparse situation comprises a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the above frequency mixing prediction method based on the information sparse situation.
The invention has the advantages and beneficial effects that:
the invention solves the problem that under the condition of complex information input, the traditional prediction analysis system is difficult to give out accurate prediction results due to the problem of sparse part of index information in the mixing prediction data set. On the basis of comprehensively integrating various types of data input, the invention provides a relatively objective sparse judgment standard by using the consistency of communication theoretical information; meanwhile, an automatic information filling means is provided by using a cubic spline function method; then, from three perspectives of section, panel and time sequence analysis, a prediction result under the cold start condition is given through statistical modeling; and finally, constructing an importance measure according to the model residual information quantity to integrate the prediction result, thereby ensuring the accuracy, the robustness and the fairness of the prediction result.
Drawings
FIG. 1 is a flow chart of a mixing prediction method based on sparse information.
Fig. 2 is a flowchart of a mixing prediction method based on information sparsity according to an embodiment of the present invention.
FIG. 3 is a graph of the result of cubic spline interpolation in the mixing prediction method based on time series data according to an embodiment of the present invention.
FIG. 4 is a diagram of the Lasso regression prediction result of the mixing prediction method based on time series data according to the embodiment of the present invention.
FIG. 5 is a graph of autoregressive prediction results of a mixing prediction method based on time series data according to an embodiment of the present invention.
FIG. 6 is a time series multiple regression prediction result chart of the mixing prediction method based on time series data according to the embodiment of the present invention.
FIG. 7 is a diagram of a prediction output of the mixing prediction method based on time series data according to the embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a mixing prediction device based on an information sparseness situation according to the present invention.
Fig. 9 is a schematic structural diagram of a mixing prediction device based on an information sparsity situation according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1, a mixing prediction method based on information sparsity includes the following steps:
s1, analyzing the frequency mixing data and integrating different types of data;
As shown in fig. 2, in the embodiment of the present invention, structured-data storage normalization code is written, and text files, CSV files, XML files, JSON files, HTML files and database files are parsed and summarized into a structured database; the method specifically comprises the following steps:
step S11, writing structured-data storage normalization feature code according to the data access features, and parsing different files to obtain different data access information, where the parsed files include but are not limited to: text files, CSV files, XML files, JSON files, HTML files and database files;
s12, designing a memory pointer, and dividing storage areas for different data access information;
s13, confirming the data index position, and establishing data association according to the data index;
and S14, setting an index retrieval rule to enable the unsearched observed value to be NA, and forming a structured database.
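As a purely illustrative sketch of steps S11 to S14 (not part of the patent; the file names, field names and the pandas-based approach are assumptions), heterogeneous source files can be parsed and aligned on a shared index, with un-retrieved observations left as NA:

    import pandas as pd

    def build_structured_db(csv_path: str, json_path: str) -> pd.DataFrame:
        """Parse two example source types and align them on a shared date index (step S13);
        positions not covered by a retrieval rule remain NA (step S14)."""
        sensor = pd.read_csv(csv_path, parse_dates=["date"]).set_index("date")       # e.g. sensor data
        survey = pd.read_json(json_path, convert_dates=["date"]).set_index("date")   # e.g. survey data
        # Outer join keeps every retrieved index position; cells missing in either source become NA.
        return sensor.join(survey, how="outer")

    # usage (paths are placeholders):
    # db = build_structured_db("sensor.csv", "survey.json")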
S2, constructing an information sparsity criterion and determining a frequency conversion target, and comprising the following steps of:
s21, obtaining an analysis data matrix according to the user retrieval data:
The analysis data matrix, labeled equation (1), is given as an image in the original publication and is not reproduced here;
where x denotes a user-specified input-data index, y denotes the index to be predicted specified by the user, m denotes the number of fields in the data set, and n denotes the number of observations in the data set;
step S22, setting the information sparsity s and calculating the information sparsity of all input-data indices in the analysis data matrix column by column; the specific formula is given as an image in the original publication and is not reproduced here;
in the formula, the quantity retrieved according to the data index is the i-th data observation of the j-th index, i.e. row i, column j of matrix (1), and n is the index of the last observation;
step S23, setting the information sparsity criterion S (default 1); if s < S, step S3 is entered, otherwise step S4 is entered directly.
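The patent gives the sparsity formula only as an image, so the following fragment is an assumed stand-in rather than the patented formula: it measures each column's sparsity as the share of non-missing observations and flags the columns that fall below the criterion S.

    import pandas as pd

    def column_sparsity(df: pd.DataFrame) -> pd.Series:
        # Assumed measure: fraction of non-NA observations per input-data index (column).
        return df.notna().mean()

    def columns_needing_interpolation(df: pd.DataFrame, S: float = 1.0) -> list:
        # Columns whose sparsity s falls below the criterion S (default 1, as in step S23).
        s = column_sparsity(df)
        return list(s[s < S].index)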
Step S3, the time-series frequency is processed by cubic spline interpolation; the result is shown in fig. 3, where the crossed line represents the cubic-spline interpolation estimates and the dotted line represents the actual observations. The processing comprises the following steps:
step S31, extracting the original data to obtain a low-frequency time-series data index x on an interval [a, b]; the interval [a, b] is divided into k subintervals [(x_0, x_1), (x_1, x_2), ..., (x_{k-1}, x_k)], giving k+1 nodes in total, with end points x_0 = a and x_k = b; on each subinterval (x_i, x_{i+1}), S(x) = S_i(x) is a cubic spline equation; all points satisfy the interpolation condition S(x_i) = y_i (i = 0, 1, ..., k), and apart from the two end points, each of the k-1 interior points satisfies S_i(x_i) = y_i, S_i(x_{i+1}) = y_{i+1} (i = 0, 1, ..., k-1);
step S32, calculating the step length of each data node: h_i = x_{i+1} - x_i;
step S33, under the three boundary conditions, filling the data nodes and the specified end-point conditions into a matrix equation.
The first boundary condition is the natural boundary: the second derivative at the end points is specified as 0, i.e., with M_i = S''_i(x_i), S''(x_0) = 0 = S''(x_n); the corresponding matrix equation is given as an image in the original publication and is not reproduced here.
The second boundary condition is the fixed (clamped) boundary: the first derivatives at the end points are specified, the differential values at the two end nodes of the data being known and set to A and B, i.e., S'_0(x_0) = A, S'_{n-1}(x_n) = B; the corresponding matrix equation is given as an image in the original publication.
The third boundary condition is the not-a-knot boundary: the third derivative at the first interpolation point is forced to equal the third derivative at the second point, and the third derivative at the last point to equal the third derivative at the second-to-last point, i.e., S'''_0(x_0) = S'''_1(x_1), S'''_{n-2}(x_{n-1}) = S'''_{n-1}(x_{n-1}); the corresponding matrix equation is given as an image in the original publication.
Step S34, the matrix equation is solved to obtain the second-order differential values M_i (i = 0, 1, ..., n). The coefficient matrix is tridiagonal, and LU decomposition can be performed on it to obtain a lower triangular matrix and an upper triangular matrix, i.e.
B = Ax = (LU)x = L(Ux) = Ly
where L denotes the lower triangular matrix and U denotes the upper triangular matrix;
step S35, the coefficients a_i, b_i, c_i, d_i of the spline interpolation function are obtained from M_i; the four coefficient formulas are given as images in the original publication and are not reproduced here;
step S36, in each subinterval x_i ≤ x ≤ x_{i+1}, a cubic equation is created:
g_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3;
step S37, the sequences that do not satisfy the sparsity criterion S are adjusted using the cubic equations, and the corresponding index data in matrix (1) are replaced by the interpolated sequences to form the analysis data set D, whose matrix form is given as an image in the original publication and is not reproduced here.
s4, mixing different models to carry out prediction, and forming a prediction result by using AIC (Akaike information criterion) information;
step S41, performing regression prediction on the analysis data set D by using a Lasso model, as shown in fig. 4, where a cross line represents a predicted value, and a solid line represents an actual value, and the method includes:
step S411, constructing a loss function with a penalty term, wherein the specific formula is as follows:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
where ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the user-specified penalty strength (default 1), and X is the transpose of the x columns of data set D, satisfying a condition given as an image in the original publication;
step S412, solving for W by the coordinate descent method;
step S413, from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
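A minimal sketch of step S41 under stated assumptions (scikit-learn's Lasso, which is fitted by coordinate descent, and a common Gaussian-likelihood AIC of the form n·ln(RSS/n) + 2k; the patent does not spell out its exact AIC variant):

    import numpy as np
    from sklearn.linear_model import Lasso  # fitted by coordinate descent internally

    def lasso_forecast(X, y, lam=1.0):
        """Fit the penalised regression of step S41 and return (Y_L, AIC_L)."""
        model = Lasso(alpha=lam).fit(X, y)
        y_hat = model.predict(X)
        rss = float(np.sum((y - y_hat) ** 2))
        k = int(np.sum(model.coef_ != 0)) + 1      # non-zero weights plus intercept
        n = len(y)
        aic = n * np.log(rss / n) + 2 * k          # assumed AIC form
        return y_hat, aic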
Step S42, performing regression prediction on the analysis data set D by using the auto.arima model, as shown in fig. 5, comprising:
step S421, calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
step S422, from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
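The text references an auto.ARIMA algorithm package (auto.arima originates in R's forecast package); as a purely illustrative Python stand-in, pmdarima's auto_arima plays the same role in the sketch below, which is an assumption rather than the patented implementation:

    import pmdarima as pm

    def arima_forecast(y, horizon=4):
        """Automatically select an ARIMA order (step S421), then return
        the out-of-sample forecast Y_A and the fitted model's AIC_A (step S422)."""
        model = pm.auto_arima(y, seasonal=False, suppress_warnings=True)
        y_pred = model.predict(n_periods=horizon)
        return y_pred, model.aic()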
Step S43, performing regression prediction on the analysis data set D by using the time-series multivariate regression model, as shown in fig. 6, includes:
step S431, calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
step S432, estimating the regression parameters by the least squares method;
step S433, from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M;
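A minimal sketch of step S43 assuming statsmodels' ordinary least squares (the specific regression specification and any lag structure are not detailed in the text, so none is imposed here):

    import statsmodels.api as sm

    def ts_multiple_regression(X, y):
        """Least-squares estimation (step S432); returns fitted values Y_M and AIC_M (step S433)."""
        X_const = sm.add_constant(X)        # include an intercept term
        result = sm.OLS(y, X_const).fit()
        return result.fittedvalues, result.aic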
Step S44, constructing the model importance index R_i from each model's Akaike information criterion (AIC) value; the defining formula is given as an image in the original publication and is not reproduced here;
where R_i denotes the importance index of the i-th model, and the subscripts L, M and A denote the Lasso model, the multivariate time-series regression model and the auto.arima model, respectively;
step S45, constructing the importance weights (w_L, w_M, w_A) from the model importance indices; the weight formula is given as an image in the original publication;
Table 1 (summary of the parameters) is given as an image in the original publication and is not reproduced here;
step S46, carrying out weighted integration of the prediction results according to the model importance indices and outputting the combined prediction result, which satisfies a weighted-sum condition given as an image in the original publication.
as shown in table 2, the predicted values of the three prediction models are shown, and the total predicted value of the total prediction is shown.
TABLE 2 prediction results of actual utilization of foreigner sequences
Figure DEST_PATH_IMAGE023
Step S5, the combined prediction result is stored, converted into graphical data, and displayed through visualization technology, as shown in fig. 7, including:
step S51, establishing a conclusion data table comprising three fields, namely date, Y_act and Y_pre, which hold the date index, the actual value of the predicted index and the predicted value, respectively;
step S52, storing the prediction result into the data table of step S51 and providing an API data interface.
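A minimal sketch of steps S51 and S52 under stated assumptions (a pandas table; the actual storage backend and API framework are not specified in the text):

    import pandas as pd

    def build_result_table(dates, y_actual, y_pred) -> pd.DataFrame:
        """Conclusion data table of step S51: date index, actual value, predicted value."""
        return pd.DataFrame({"date": dates, "Y_act": y_actual, "Y_pre": y_pred})

    # Step S52: one simple way to expose the table through an API data interface is to
    # serialise it, e.g. build_result_table(...).to_json(orient="records"), behind an HTTP endpoint.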
As shown in fig. 8, a mixing prediction apparatus based on information sparsity situation, for performing the mixing prediction method based on information sparsity situation, includes an integration module, a frequency translation target determination module, an analysis data set generation module, a mixing prediction module, and a presentation module;
the integration module is used for analyzing the frequency mixing data and integrating different types of data;
the frequency conversion target determining module is used for constructing an information sparsity criterion and determining a frequency conversion target, and the execution process is as follows:
obtaining an analysis data matrix according to user retrieval data;
setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column;
setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
the analysis data set generation module processes the time-series frequency by a cubic spline interpolation method, adjusts the sequences that do not meet the sparsity criterion S, and replaces the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
the mixed prediction module is used for mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion;
and the display module is used for displaying the prediction result.
The implementation of this part is similar to that of the above method embodiment, and is not described here again.
Corresponding to the embodiment of the mixing prediction method based on the information sparsity situation, the invention also provides an embodiment of the mixing prediction device based on the information sparsity situation.
Referring to fig. 9, the mixing prediction apparatus based on the information sparsity situation according to the embodiment of the present invention includes a memory and one or more processors, where the memory stores executable codes, and the one or more processors execute the executable codes to implement a mixing prediction method based on the information sparsity situation in the foregoing embodiments.
The embodiment of the mixing prediction device based on the information sparsity situation of the invention can be applied to any device with data processing capability, such as a computer or other devices or apparatuses. The apparatus embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. The software implementation is taken as an example, and as a logical device, the device is formed by reading corresponding computer program instructions in the nonvolatile memory into the memory for running through the processor of any device with data processing capability. In terms of hardware, as shown in fig. 9, a hardware structure diagram of an arbitrary device with data processing capability where a frequency mixing prediction device based on an information sparsity situation is located according to the present invention is shown, where in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 9, the arbitrary device with data processing capability where the apparatus is located in the embodiment may generally include other hardware according to an actual function of the arbitrary device with data processing capability, and details thereof are not repeated.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements a frequency mixing prediction method based on information sparsity in the foregoing embodiments.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the embodiments of the present invention in nature.

Claims (8)

1. A frequency mixing prediction method based on information sparsity is characterized by comprising the following steps:
s1, analyzing the mixing data and integrating different types of data;
s2, constructing an information sparsity criterion and determining a frequency conversion target, and comprising the following steps of:
s21, obtaining an analysis data matrix according to user retrieval data;
the analysis data matrix, labeled equation (1), is given as an image in the original publication and is not reproduced here;
wherein x denotes a user-specified input-data index, y denotes the index to be predicted specified by the user, m denotes the number of fields in the data set, and n denotes the number of observations in the data set;
step S22, setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column; the specific formula is given as an image in the original publication and is not reproduced here;
wherein the quantity retrieved according to the data index is the i-th data observation of the j-th index, i.e. row i, column j of matrix (1), and n is the index of the last observation;
step S23, setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
S3, processing the time-series frequency by a cubic spline interpolation method, adjusting the sequences that do not meet the sparsity criterion S, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form an analysis data set D;
s31, extracting original data, acquiring low-frequency time sequence data indexes of a specific interval, dividing the specific interval into subintervals, and enabling each subinterval to meet a cubic spline equation;
s32, calculating the step length of each data node in the subinterval;
s33, under three boundary conditions, filling the data nodes and the specified head end point condition into a matrix equation; the three boundary conditions include a natural boundary, a fixed boundary and a non-kinking boundary;
s34, solving a matrix equation to obtain a second differential value;
step S35, obtaining coefficients of a spline interpolation function through the secondary differential value;
s36, creating a cubic equation in each subinterval according to the coefficient;
step S37, adjusting the sequences that do not satisfy the sparsity criterion S using the cubic equations, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
s4, mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using a Chichi information criterion; performing regression prediction on the analysis data set D by using a Lasso model, wherein the method comprises the following steps of:
step S411, constructing a loss function with a penalty term, wherein the specific formula is as follows:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
wherein ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the penalty strength, and X is the transpose of the x columns in data set D, satisfying a condition given as an image in the original publication;
step S412, solving for W by the coordinate descent method;
step S413, from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
in step S4, regression prediction is also performed on the analysis data set D using the auto.arima model, comprising:
step S421, calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
step S422, from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
in step S4, regression prediction is also performed on the analysis data set D using a time-series multivariate regression model, comprising:
step S431, calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
step S432, estimating the regression parameters by the least squares method;
step S433, from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M;
And S5, displaying the prediction result.
2. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein the step S1 comprises the steps of:
s11, according to data access characteristics, constructing a structured data storage specification, and analyzing different files to obtain different data access information;
s12, designing a memory pointer, and dividing storage areas for different data access information;
s13, confirming the data index position, and establishing data association according to the data index;
and S14, setting an index retrieval rule, and enabling the un-retrieved observation value to be NA to form a structured database.
3. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein:
in step S31, the original data are extracted to obtain a low-frequency time-series data index x on an interval [a, b]; the interval [a, b] is divided into k subintervals [(x_0, x_1), (x_1, x_2), ..., (x_{k-1}, x_k)], giving k+1 nodes in total, with end points x_0 = a and x_k = b; on each subinterval (x_i, x_{i+1}), S(x) = S_i(x) is a cubic spline equation; all points satisfy the interpolation condition S(x_i) = y_i (i = 0, 1, ..., k), and apart from the two end points, each of the k-1 interior points satisfies S_i(x_i) = y_i, S_i(x_{i+1}) = y_{i+1} (i = 0, 1, ..., k-1);
in step S32, the step length is h_i = x_{i+1} - x_i;
in step S33, the first boundary condition is the natural boundary: the second derivative at the end points is specified as 0, i.e., with M_i = S''_i(x_i), S''(x_0) = 0 = S''(x_n); the corresponding matrix equation is given as an image in the original publication and is not reproduced here;
the second boundary condition is the fixed (clamped) boundary: the first derivatives at the end points are specified, the differential values at the two end nodes of the data being known and set to A and B, i.e., S'_0(x_0) = A, S'_{n-1}(x_n) = B; the corresponding matrix equation is given as an image in the original publication;
the third boundary condition is the not-a-knot boundary: the third derivative at the first interpolation point is forced to equal the third derivative at the second point, and the third derivative at the last point to equal the third derivative at the second-to-last point, i.e., S'''_0(x_0) = S'''_1(x_1), S'''_{n-2}(x_{n-1}) = S'''_{n-1}(x_{n-1}); the corresponding matrix equation is given as an image in the original publication;
in step S34, the matrix equation is solved to obtain the second-order differential values M_i (i = 0, 1, ..., n);
in step S35, the coefficients a_i, b_i, c_i, d_i of the spline interpolation function are obtained from M_i; the four coefficient formulas are given as images in the original publication and are not reproduced here;
in step S36, according to the coefficients, a cubic equation is created on each subinterval x_i ≤ x ≤ x_{i+1}:
g_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3;
in step S37, the analysis data set D is formed; its matrix form is given as an image in the original publication and is not reproduced here.
4. the method of claim 3, wherein the mixing prediction based on the sparse information scenario comprises: in step S34, the coefficient matrix is a tri-diagonal matrix, and the LU decomposition is performed on the coefficient matrix to obtain a unit lower triangular matrix and a unit upper triangular matrix, that is, the unit lower triangular matrix and the unit upper triangular matrix
B=Ax=(LU)x=L(Ux)=Ly
Wherein the content of the first and second substances,La lower triangular matrix is represented by a lower triangular matrix,Urepresenting the upper triangular matrix.
5. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein: in step S4, a model importance index R_i is constructed from each model's Akaike information criterion (AIC) value; the defining formula is given as an image in the original publication and is not reproduced here;
wherein R_i denotes the importance index of the i-th model;
importance weights are constructed from the model importance indices; the weight formula is given as an image in the original publication;
the prediction results are weighted and integrated according to the model importance indices, and the combined prediction result is output; it satisfies a weighted-sum condition, given as an image in the original publication, in which the individual terms represent the prediction results of the different models.
6. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein in step S5, the prediction result is stored, converted into graphical data, and displayed through visualization technology, including:
step S51, establishing a conclusion data table, wherein the conclusion data table comprises three fields, namely a date index, a prediction index actual value and a prediction value;
and S52, storing the prediction result into the data table in the S51, and providing an API data interface.
7. A frequency mixing prediction device based on information sparse situation comprises an integration module, a frequency conversion target determination module, an analysis data set generation module, a mixing prediction module and a display module, and is characterized in that:
the integration module analyzes the mixing data and integrates different types of data;
the frequency conversion target determining module is used for constructing an information sparsity criterion and determining a frequency conversion target, and the execution process is as follows:
obtaining an analysis data matrix according to user retrieval data;
the analysis data matrix, labeled equation (1), is given as an image in the original publication and is not reproduced here;
wherein x denotes a user-specified input-data index, y denotes the index to be predicted specified by the user, m denotes the number of fields in the data set, and n denotes the number of observations in the data set;
setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column; the specific formula is given as an image in the original publication and is not reproduced here;
wherein the quantity retrieved according to the data index is the i-th data observation of the j-th index, i.e. row i, column j of matrix (1), and n is the index of the last observation;
setting the information sparsity criterion S; if s < S, the analysis data set generation module is entered, otherwise the hybrid prediction module is entered directly;
the analysis data set generation module processes the data by using a cubic spline interpolation method for time series frequency and does not meet the sparsity criterionSThe sequence of (D) is adjusted, and the interpolated sequence is used for replacing corresponding index data in the original data matrix to form an analysis data set D;
extracting the original data, acquiring a low-frequency time-series data index on a specific interval, and dividing the specific interval into subintervals, wherein each subinterval satisfies a cubic spline equation;
calculating the step length of each data node in the subintervals;
under three boundary conditions, filling the data nodes and the specified end-point conditions into a matrix equation, the three boundary conditions being the natural boundary, the fixed boundary and the not-a-knot boundary;
solving the matrix equation to obtain the second-order differential values;
obtaining the coefficients of the spline interpolation function from the second-order differential values;
creating a cubic equation on each subinterval according to the coefficients;
adjusting the sequences that do not satisfy the sparsity criterion S using the cubic equations, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
the hybrid prediction module is used for mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion; regression prediction is performed on the analysis data set D using the Lasso model, the process being as follows:
constructing a loss function with a penalty term, wherein a specific formula is as follows:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
wherein ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the penalty strength, and X is the transpose of the x columns in data set D, satisfying a condition given as an image in the original publication;
solving for W by the coordinate descent method;
from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
regression prediction is also performed on the analysis data set D using the auto.arima model, comprising:
calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
regression prediction is also performed on the analysis data set D using a time-series multivariate regression model, comprising:
calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
estimating the regression parameters by the least squares method;
from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M;
And the display module is used for displaying the prediction result.
8. A mixed prediction device based on information sparse situations, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors execute the executable code to implement a mixed prediction method based on information sparse situations according to any one of claims 1 to 6.
CN202211238794.6A 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation Active CN115309754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211238794.6A CN115309754B (en) 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211238794.6A CN115309754B (en) 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation

Publications (2)

Publication Number Publication Date
CN115309754A CN115309754A (en) 2022-11-08
CN115309754B (en) 2023-03-10

Family

ID=83867667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211238794.6A Active CN115309754B (en) 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation

Country Status (1)

Country Link
CN (1) CN115309754B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008244512A (en) * 2007-03-23 2008-10-09 Institute Of Physical & Chemical Research Multimedia information providing system, server, terminal, multimedia information providing method and program
CN103178853A (en) * 2013-03-21 2013-06-26 哈尔滨工业大学 Compressive-sensing-based sparse signal under-sampling method and implementation device
JP2018194904A (en) * 2017-05-12 2018-12-06 株式会社情報医療 Prediction system, prediction method, and prediction program
CN109933621A (en) * 2019-03-20 2019-06-25 合肥黎曼信息科技有限公司 A kind of macroeconomy multi-source mixing big data modeling method
CN114756605A (en) * 2022-06-14 2022-07-15 之江实验室 Frequency mixing prediction method and system based on time series data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11556789B2 (en) * 2019-06-24 2023-01-17 Tata Consultancy Services Limited Time series prediction with confidence estimates using sparse recurrent mixture density networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008244512A (en) * 2007-03-23 2008-10-09 Institute Of Physical & Chemical Research Multimedia information providing system, server, terminal, multimedia information providing method and program
CN103178853A (en) * 2013-03-21 2013-06-26 哈尔滨工业大学 Compressive-sensing-based sparse signal under-sampling method and implementation device
JP2018194904A (en) * 2017-05-12 2018-12-06 株式会社情報医療 Prediction system, prediction method, and prediction program
CN109933621A (en) * 2019-03-20 2019-06-25 合肥黎曼信息科技有限公司 A kind of macroeconomy multi-source mixing big data modeling method
CN114756605A (en) * 2022-06-14 2022-07-15 之江实验室 Frequency mixing prediction method and system based on time series data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Expected stock return and mixed frequency variance risk premium data; Ruobing Liu et al.; Journal of Ambient Intelligence & Humanized Computing; 2020-12-31; full text *
Statistical estimation and forecasting of mixed-frequency high-dimensional dynamic volatility matrices; Liu Liping; Wanfang Platform; 2021-09-06; full text *

Also Published As

Publication number Publication date
CN115309754A (en) 2022-11-08

Similar Documents

Publication Publication Date Title
Harte PtProcess: An R package for modelling marked point processes indexed by time
Davis et al. Measures of serial extremal dependence and their estimation
KR100264633B1 (en) Quasi-random number generator apparatus and method, and multiple integration apparatus and method of function
US8595155B2 (en) Kernel regression system, method, and program
CN113268403B (en) Time series analysis and prediction method, device, equipment and storage medium
Borgnat et al. Scale invariances and lamperti transformations for stochastic processes
US20160282821A1 (en) Management of complex physical systems using time series segmentation to determine behavior switching
EP2051175A1 (en) Method and device for generating a model of a multiparameter system
Bai et al. How the instability of ranks under long memory affects large-sample inference
CN115309754B (en) Frequency mixing prediction method, device and equipment based on information sparse situation
EP3048566A1 (en) Information processing device and information processing method
CN110781622A (en) Unified probability interval mixed uncertainty propagation analysis method
Ankargren et al. Simulation smoothing for nowcasting with large mixed-frequency VARs
CN111008740B (en) Data propagation trend prediction method, device, storage medium and device
Godolphin et al. Decomposition of time series models in state-space form
Stringer Implementing approximate Bayesian inference using adaptive quadrature: the aghq package
McElroy et al. Computation of the autocovariances for time series with multiple long-range persistencies
Schoonees et al. Flexible graphical assessment of experimental designs in R: The vdg package
Wang et al. Quantification and propagation of Aleatoric uncertainties in topological structures
Rong et al. Multicriteria 0-1 knapsack problems with k-min objectives
JP2014211827A (en) Derivation device, derivation method and derivation program
JP7349811B2 (en) Training device, generation device, and graph generation method
Rysz et al. A scenario decomposition algorithm for stochastic programming problems with a class of downside risk measures
Waldmann et al. Variational approximations in geoadditive latent Gaussian regression: mean and quantile regression
Serre et al. Computational investigations of Bayesian maximum entropy spatiotemporal mapping

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant