CN115309754B - Frequency mixing prediction method, device and equipment based on information sparse situation - Google Patents

Frequency mixing prediction method, device and equipment based on information sparse situation

Info

Publication number
CN115309754B
Authority
CN
China
Prior art keywords
data
prediction
information
index
matrix
Prior art date
Legal status
Active
Application number
CN202211238794.6A
Other languages
Chinese (zh)
Other versions
CN115309754A (en)
Inventor
张崇辉
陈思博
王永恒
苏为华
周家敏
苏田恬
Current Assignee
Zhejiang Gongshang University
Zhejiang Lab
Original Assignee
Zhejiang Gongshang University
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University, Zhejiang Lab
Priority to CN202211238794.6A
Publication of CN115309754A
Application granted
Publication of CN115309754B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2455 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems
    • G06F16/252 - Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a frequency mixing prediction method, device and equipment for the information-sparse situation. Different types of data information are integrated by constructing a multi-input frequency mixing data access toolkit; an information-sparsity judgment rule and an index frequency-conversion target are formulated; if the sparsity level of the index information cannot meet the frequency-conversion requirement, the index set is interpolated until the requirement is met, forming an analysis index set; different models are then mixed to carry out prediction, and the prediction result is formed using the Akaike information criterion (AIC); finally, the prediction result is stored, converted into graphical data, and displayed through visualization technology. The invention solves the problem that accurate results cannot be obtained in frequency mixing prediction analysis when the low-frequency indices have too few observations.

Description

Frequency mixing prediction method, device and equipment based on information sparse situation
Technical Field
The invention relates to the technical field of predictive analysis of sparse data information, and in particular to a frequency mixing prediction method, device and equipment for information-sparse situations.
Background
With the continuous development of the digital economy and its technologies, the information input of modern predictive analysis systems is becoming increasingly diverse and complex. Under complex information input, the problem of multi-frequency data input often arises because the update frequencies and update times of different data acquisition systems (such as equipment sensors, information sensors, and statistical surveys) are inconsistent. In the cold-start situation, a data acquisition system with a low update frequency (such as a statistical survey) can hardly provide data whose information density meets the analysis requirement; that is, the information sparsity problem occurs.
The information sparsity problem is common in various cold-start predictive analysis systems, such as a load prediction system for a newly built communication base station, a visitor-flow prediction system for a newly built hospital, or a high-frequency economic prediction analysis system during a census period.
Traditional predictive analysis systems face several hard-to-resolve difficulties when handling information-sparse prediction tasks: for example, too few observations fail to satisfy key statistical assumptions, and the information content of the indices is too low for a traditional data analysis system to make accurate predictions.
Disclosure of Invention
In order to overcome the defects of the prior art, to realize long-panel data processing, and at the same time to improve the precision of prediction under sparse information in high-dimensional data analysis, the invention adopts the following technical scheme:
a mixing prediction method based on information sparsity situation comprises the following steps:
s1, analyzing the frequency mixing data and integrating different types of data;
s2, constructing an information sparsity criterion and determining a frequency conversion target, and comprising the following steps of:
s21, obtaining an analysis data matrix according to user retrieval data;
step S22, setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column;
step S23, setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
S3, processing the time-series frequency by a cubic spline interpolation method, adjusting the sequences that do not meet the sparsity criterion S, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form an analysis data set D;
S4, mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion;
and S5, displaying the prediction result.
Further, the step S1 includes the steps of:
s11, according to data access characteristics, constructing a structured data storage specification, and analyzing different files to obtain different data access information;
s12, designing a memory pointer, and dividing storage areas for different data access information;
s13, confirming the data index position, and establishing data association according to the data index;
and S14, setting an index retrieval rule, and enabling the un-retrieved observation value to be NA to form a structured database.
Further, the step S3 includes the steps of:
S31, extracting the original data, acquiring a low-frequency time-series data index on a specific interval, and dividing the specific interval into subintervals, where each subinterval satisfies a cubic spline equation;
S32, calculating the step length of each data node in the subintervals;
S33, under three boundary conditions, filling the data nodes and the specified end-point conditions into a matrix equation; the three boundary conditions are the natural boundary, the fixed boundary and the not-a-knot boundary;
S34, solving the matrix equation to obtain the second-order differential values;
S35, obtaining the coefficients of the spline interpolation function from the second-order differential values;
S36, creating a cubic equation on each subinterval according to the coefficients;
S37, adjusting the sequences that do not satisfy the sparsity criterion S using the cubic equations, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D.
Further, in step S31, the original data are extracted to obtain a low-frequency time-series data index x on an interval [a, b]; the interval [a, b] is divided into k subintervals [(x_0, x_1), (x_1, x_2), ..., (x_{k-1}, x_k)], giving k+1 nodes in total, with end points x_0 = a and x_k = b; on each subinterval (x_i, x_{i+1}), S(x) = S_i(x) is a cubic spline equation; all points satisfy the interpolation condition S(x_i) = y_i (i = 0, 1, ..., k), and apart from the two end points, each of the k-1 interior points satisfies S_i(x_i) = y_i, S_i(x_{i+1}) = y_{i+1} (i = 0, 1, ..., k-1);
in step S32, the step length is h_i = x_{i+1} - x_i;
in step S33, the first boundary condition is the natural boundary: the second derivative at the end points is specified as 0, i.e., with M_i = S''_i(x_i), S''(x_0) = 0 = S''(x_n); the corresponding matrix equation is given as an image in the original publication and is not reproduced here;
the second boundary condition is the fixed (clamped) boundary: the first derivatives at the end points are specified, the differential values at the two end nodes of the data being known and set to A and B, i.e., S'_0(x_0) = A, S'_{n-1}(x_n) = B; the corresponding matrix equation is given as an image in the original publication;
the third boundary condition is the not-a-knot boundary: the third derivative at the first interpolation point is forced to equal the third derivative at the second point, and the third derivative at the last point to equal the third derivative at the second-to-last point, i.e., S'''_0(x_0) = S'''_1(x_1), S'''_{n-2}(x_{n-1}) = S'''_{n-1}(x_{n-1}); the corresponding matrix equation is given as an image in the original publication;
in step S34, the matrix equation is solved to obtain the second-order differential values M_i (i = 0, 1, ..., n);
in step S35, the coefficients a_i, b_i, c_i, d_i of the spline interpolation function are obtained from M_i; the four coefficient formulas are given as images in the original publication and are not reproduced here;
in step S36, according to the coefficients, a cubic equation is created on each subinterval x_i ≤ x ≤ x_{i+1}:
g_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3;
in step S37, the analysis data set D is formed; its matrix form is given as an image in the original publication and is not reproduced here.
Further, in step S34, the coefficient matrix is a tridiagonal matrix, and LU decomposition is performed on it to factor it into a lower triangular matrix and an upper triangular matrix, i.e.
B = Ax = (LU)x = L(Ux) = Ly
where L denotes the lower triangular matrix and U denotes the upper triangular matrix.
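As a purely illustrative sketch (not part of the patent), the tridiagonal system arising in step S34 can be handled with a banded solver such as scipy.linalg.solve_banded, which factors the banded matrix internally and so plays the role of the LU step described above; the band layout and the example numbers below are assumptions.

    import numpy as np
    from scipy.linalg import solve_banded

    def solve_tridiagonal(lower, diag, upper, rhs):
        """Solve A x = rhs for a tridiagonal A (e.g. the spline moment equations).
        lower/upper have length n-1, diag and rhs have length n."""
        n = len(diag)
        ab = np.zeros((3, n))
        ab[0, 1:] = upper    # superdiagonal
        ab[1, :] = diag      # main diagonal
        ab[2, :-1] = lower   # subdiagonal
        return solve_banded((1, 1), ab, rhs)

    # Illustrative 4x4 system (placeholder numbers, not taken from the patent):
    # M = solve_tridiagonal(np.ones(3), 4.0 * np.ones(4), np.ones(3), 6.0 * np.ones(4))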
Further, in step S4, a model importance index R_i is constructed from each model's Akaike information criterion (AIC) value; the defining formula is given as an image in the original publication and is not reproduced here;
where R_i denotes the importance index of the i-th model;
importance weights are then constructed from the model importance indices; the weight formula is likewise given as an image in the original publication;
the prediction results are weighted and integrated according to the model importance indices, and the combined prediction result is output; it satisfies a weighted-sum condition, given as an image in the original publication, in which the individual terms represent the prediction results of the different models.
Further, in step S4, regression prediction is performed on the analysis data set D using a Lasso model, comprising the following steps:
step S411, constructing a loss function with a penalty term, the specific formula being:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
where ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the penalty strength, and X is the transpose of the x columns in data set D, satisfying a condition given as an image in the original publication;
step S412, solving for W by the coordinate descent method;
step S413, from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
in step S4, regression prediction is also performed on the analysis data set D using the auto.arima model, comprising:
step S421, calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
step S422, from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
in step S4, regression prediction is also performed on the analysis data set D using a time-series multivariate regression model, comprising:
step S431, calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
step S432, estimating the regression parameters by the least squares method;
step S433, from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M.
Further, in step S5, the prediction result is stored, converted into graphical data, and displayed through visualization technology, including:
s51, establishing a conclusion data table, wherein the conclusion data table comprises three fields, namely a date index, a prediction index actual value and a prediction value;
and S52, storing the prediction result into the data table in the S51, and providing an API data interface.
A frequency mixing prediction device based on the information sparse situation is used for executing the frequency mixing prediction method based on the information sparse situation, and comprises an integration module, a frequency conversion target determination module, an analysis data set generation module, a mixing prediction module and a display module;
the integration module analyzes the mixing data and integrates different types of data;
the frequency conversion target determining module is used for constructing an information sparsity criterion and determining a frequency conversion target, and the executing process is as follows:
obtaining an analysis data matrix according to user retrieval data;
setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column;
setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
the analysis data set generation module processes the time-series frequency by a cubic spline interpolation method, adjusts the sequences that do not meet the sparsity criterion S, and replaces the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
the mixed prediction module is used for mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion;
and the display module is used for displaying the prediction result.
A frequency mixing prediction equipment based on the information sparse situation comprises a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the above frequency mixing prediction method based on the information sparse situation.
The invention has the advantages and beneficial effects that:
the invention solves the problem that under the condition of complex information input, the traditional prediction analysis system is difficult to give out accurate prediction results due to the problem of sparse part of index information in the mixing prediction data set. On the basis of comprehensively integrating various types of data input, the invention provides a relatively objective sparse judgment standard by using the consistency of communication theoretical information; meanwhile, an automatic information filling means is provided by using a cubic spline function method; then, from three perspectives of section, panel and time sequence analysis, a prediction result under the cold start condition is given through statistical modeling; and finally, constructing an importance measure according to the model residual information quantity to integrate the prediction result, thereby ensuring the accuracy, the robustness and the fairness of the prediction result.
Drawings
FIG. 1 is a flow chart of a mixing prediction method based on sparse information.
Fig. 2 is a flowchart of a mixing prediction method based on information sparsity according to an embodiment of the present invention.
FIG. 3 is a graph of the result of cubic spline interpolation in the mixing prediction method based on time series data according to an embodiment of the present invention.
FIG. 4 is a diagram of the Lasso regression prediction result of the mixing prediction method based on time series data according to the embodiment of the present invention.
FIG. 5 is a graph of autoregressive prediction results of a mixing prediction method based on time series data according to an embodiment of the present invention.
FIG. 6 is a time series multiple regression prediction result chart of the mixing prediction method based on time series data according to the embodiment of the present invention.
FIG. 7 is a diagram of a prediction output of the mixing prediction method based on time series data according to the embodiment of the present invention.
Fig. 8 is a schematic structural diagram of a mixing prediction device based on an information sparseness situation according to the present invention.
Fig. 9 is a schematic structural diagram of a mixing prediction device based on an information sparsity situation according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1, a mixing prediction method based on information sparsity includes the following steps:
s1, analyzing the frequency mixing data and integrating different types of data;
As shown in fig. 2, in the embodiment of the present invention, structured-data storage normalization code is written, and text files, CSV files, XML files, JSON files, HTML files and database files are parsed and summarized into a structured database; the method specifically comprises the following steps:
step S11, writing structured-data storage normalization feature code according to the data access features, and parsing different files to obtain different data access information, where the parsed files include but are not limited to: text files, CSV files, XML files, JSON files, HTML files and database files;
s12, designing a memory pointer, and dividing storage areas for different data access information;
s13, confirming the data index position, and establishing data association according to the data index;
and S14, setting an index retrieval rule to enable the unsearched observed value to be NA, and forming a structured database.
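As a purely illustrative sketch of steps S11 to S14 (not part of the patent; the file names, field names and the pandas-based approach are assumptions), heterogeneous source files can be parsed and aligned on a shared index, with un-retrieved observations left as NA:

    import pandas as pd

    def build_structured_db(csv_path: str, json_path: str) -> pd.DataFrame:
        """Parse two example source types and align them on a shared date index (step S13);
        positions not covered by a retrieval rule remain NA (step S14)."""
        sensor = pd.read_csv(csv_path, parse_dates=["date"]).set_index("date")       # e.g. sensor data
        survey = pd.read_json(json_path, convert_dates=["date"]).set_index("date")   # e.g. survey data
        # Outer join keeps every retrieved index position; cells missing in either source become NA.
        return sensor.join(survey, how="outer")

    # usage (paths are placeholders):
    # db = build_structured_db("sensor.csv", "survey.json")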
S2, constructing an information sparsity criterion and determining a frequency conversion target, and comprising the following steps of:
s21, obtaining an analysis data matrix according to the user retrieval data:
The analysis data matrix, labeled equation (1), is given as an image in the original publication and is not reproduced here;
where x denotes a user-specified input-data index, y denotes the index to be predicted specified by the user, m denotes the number of fields in the data set, and n denotes the number of observations in the data set;
step S22, setting the information sparsity s and calculating the information sparsity of all input-data indices in the analysis data matrix column by column; the specific formula is given as an image in the original publication and is not reproduced here;
in the formula, the quantity retrieved according to the data index is the i-th data observation of the j-th index, i.e. row i, column j of matrix (1), and n is the index of the last observation;
step S23, setting the information sparsity criterion S (default 1); if s < S, step S3 is entered, otherwise step S4 is entered directly.
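The patent gives the sparsity formula only as an image, so the following fragment is an assumed stand-in rather than the patented formula: it measures each column's sparsity as the share of non-missing observations and flags the columns that fall below the criterion S.

    import pandas as pd

    def column_sparsity(df: pd.DataFrame) -> pd.Series:
        # Assumed measure: fraction of non-NA observations per input-data index (column).
        return df.notna().mean()

    def columns_needing_interpolation(df: pd.DataFrame, S: float = 1.0) -> list:
        # Columns whose sparsity s falls below the criterion S (default 1, as in step S23).
        s = column_sparsity(df)
        return list(s[s < S].index)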
Step S3, the time-series frequency is processed by cubic spline interpolation; the result is shown in fig. 3, where the crossed line represents the cubic-spline interpolation estimates and the dotted line represents the actual observations. The processing comprises the following steps:
step S31, extracting the original data to obtain a low-frequency time-series data index x on an interval [a, b]; the interval [a, b] is divided into k subintervals [(x_0, x_1), (x_1, x_2), ..., (x_{k-1}, x_k)], giving k+1 nodes in total, with end points x_0 = a and x_k = b; on each subinterval (x_i, x_{i+1}), S(x) = S_i(x) is a cubic spline equation; all points satisfy the interpolation condition S(x_i) = y_i (i = 0, 1, ..., k), and apart from the two end points, each of the k-1 interior points satisfies S_i(x_i) = y_i, S_i(x_{i+1}) = y_{i+1} (i = 0, 1, ..., k-1);
step S32, calculating the step length of each data node: h_i = x_{i+1} - x_i;
step S33, under the three boundary conditions, filling the data nodes and the specified end-point conditions into a matrix equation.
The first boundary condition is the natural boundary: the second derivative at the end points is specified as 0, i.e., with M_i = S''_i(x_i), S''(x_0) = 0 = S''(x_n); the corresponding matrix equation is given as an image in the original publication and is not reproduced here.
The second boundary condition is the fixed (clamped) boundary: the first derivatives at the end points are specified, the differential values at the two end nodes of the data being known and set to A and B, i.e., S'_0(x_0) = A, S'_{n-1}(x_n) = B; the corresponding matrix equation is given as an image in the original publication.
The third boundary condition is the not-a-knot boundary: the third derivative at the first interpolation point is forced to equal the third derivative at the second point, and the third derivative at the last point to equal the third derivative at the second-to-last point, i.e., S'''_0(x_0) = S'''_1(x_1), S'''_{n-2}(x_{n-1}) = S'''_{n-1}(x_{n-1}); the corresponding matrix equation is given as an image in the original publication.
Step S34, the matrix equation is solved to obtain the second-order differential values M_i (i = 0, 1, ..., n). The coefficient matrix is tridiagonal, and LU decomposition can be performed on it to obtain a lower triangular matrix and an upper triangular matrix, i.e.
B = Ax = (LU)x = L(Ux) = Ly
where L denotes the lower triangular matrix and U denotes the upper triangular matrix;
step S35, the coefficients a_i, b_i, c_i, d_i of the spline interpolation function are obtained from M_i; the four coefficient formulas are given as images in the original publication and are not reproduced here;
step S36, in each subinterval x_i ≤ x ≤ x_{i+1}, a cubic equation is created:
g_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3;
step S37, the sequences that do not satisfy the sparsity criterion S are adjusted using the cubic equations, and the corresponding index data in matrix (1) are replaced by the interpolated sequences to form the analysis data set D, whose matrix form is given as an image in the original publication and is not reproduced here.
s4, mixing different models to carry out prediction, and forming a prediction result by using AIC (Akaike information criterion) information;
step S41, performing regression prediction on the analysis data set D by using a Lasso model, as shown in fig. 4, where a cross line represents a predicted value, and a solid line represents an actual value, and the method includes:
step S411, constructing a loss function with a penalty term, wherein the specific formula is as follows:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
where ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the user-specified penalty strength (default 1), and X is the transpose of the x columns of data set D, satisfying a condition given as an image in the original publication;
step S412, solving for W by the coordinate descent method;
step S413, from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
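A minimal sketch of step S41 under stated assumptions (scikit-learn's Lasso, which is fitted by coordinate descent, and a common Gaussian-likelihood AIC of the form n·ln(RSS/n) + 2k; the patent does not spell out its exact AIC variant):

    import numpy as np
    from sklearn.linear_model import Lasso  # fitted by coordinate descent internally

    def lasso_forecast(X, y, lam=1.0):
        """Fit the penalised regression of step S41 and return (Y_L, AIC_L)."""
        model = Lasso(alpha=lam).fit(X, y)
        y_hat = model.predict(X)
        rss = float(np.sum((y - y_hat) ** 2))
        k = int(np.sum(model.coef_ != 0)) + 1      # non-zero weights plus intercept
        n = len(y)
        aic = n * np.log(rss / n) + 2 * k          # assumed AIC form
        return y_hat, aic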
Step S42, performing regression prediction on the analysis data set D by using the auto.arima model, as shown in fig. 5, comprising:
step S421, calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
step S422, from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
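The text references an auto.ARIMA algorithm package (auto.arima originates in R's forecast package); as a purely illustrative Python stand-in, pmdarima's auto_arima plays the same role in the sketch below, which is an assumption rather than the patented implementation:

    import pmdarima as pm

    def arima_forecast(y, horizon=4):
        """Automatically select an ARIMA order (step S421), then return
        the out-of-sample forecast Y_A and the fitted model's AIC_A (step S422)."""
        model = pm.auto_arima(y, seasonal=False, suppress_warnings=True)
        y_pred = model.predict(n_periods=horizon)
        return y_pred, model.aic()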
Step S43, performing regression prediction on the analysis data set D by using the time-series multivariate regression model, as shown in fig. 6, includes:
step S431, calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
step S432, estimating the regression parameters by the least squares method;
step S433, from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M;
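A minimal sketch of step S43 assuming statsmodels' ordinary least squares (the specific regression specification and any lag structure are not detailed in the text, so none is imposed here):

    import statsmodels.api as sm

    def ts_multiple_regression(X, y):
        """Least-squares estimation (step S432); returns fitted values Y_M and AIC_M (step S433)."""
        X_const = sm.add_constant(X)        # include an intercept term
        result = sm.OLS(y, X_const).fit()
        return result.fittedvalues, result.aic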
Step S44, constructing the model importance index R_i from each model's Akaike information criterion (AIC) value; the defining formula is given as an image in the original publication and is not reproduced here;
where R_i denotes the importance index of the i-th model, and the subscripts L, M and A denote the Lasso model, the multivariate time-series regression model and the auto.arima model, respectively;
step S45, constructing the importance weights (w_L, w_M, w_A) from the model importance indices; the weight formula is given as an image in the original publication;
Table 1 (summary of the parameters) is given as an image in the original publication and is not reproduced here;
step S46, carrying out weighted integration of the prediction results according to the model importance indices and outputting the combined prediction result, which satisfies a weighted-sum condition given as an image in the original publication.
as shown in table 2, the predicted values of the three prediction models are shown, and the total predicted value of the total prediction is shown.
TABLE 2 prediction results of actual utilization of foreigner sequences
Figure DEST_PATH_IMAGE023
Step S5, the combined prediction result is stored, converted into graphical data, and displayed through visualization technology, as shown in fig. 7, including:
step S51, establishing a conclusion data table comprising three fields, namely date, Y_act and Y_pre, which hold the date index, the actual value of the predicted index and the predicted value, respectively;
step S52, storing the prediction result into the data table of step S51 and providing an API data interface.
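A minimal sketch of steps S51 and S52 under stated assumptions (a pandas table; the actual storage backend and API framework are not specified in the text):

    import pandas as pd

    def build_result_table(dates, y_actual, y_pred) -> pd.DataFrame:
        """Conclusion data table of step S51: date index, actual value, predicted value."""
        return pd.DataFrame({"date": dates, "Y_act": y_actual, "Y_pre": y_pred})

    # Step S52: one simple way to expose the table through an API data interface is to
    # serialise it, e.g. build_result_table(...).to_json(orient="records"), behind an HTTP endpoint.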
As shown in fig. 8, a mixing prediction apparatus based on information sparsity situation, for performing the mixing prediction method based on information sparsity situation, includes an integration module, a frequency translation target determination module, an analysis data set generation module, a mixing prediction module, and a presentation module;
the integration module is used for analyzing the frequency mixing data and integrating different types of data;
the frequency conversion target determining module is used for constructing an information sparsity criterion and determining a frequency conversion target, and the execution process is as follows:
obtaining an analysis data matrix according to user retrieval data;
setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column;
setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
the analysis data set generation module processes the time-series frequency by a cubic spline interpolation method, adjusts the sequences that do not meet the sparsity criterion S, and replaces the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
the mixed prediction module is used for mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion;
and the display module is used for displaying the prediction result.
The implementation of this part is similar to that of the above method embodiment, and is not described here again.
Corresponding to the embodiment of the mixing prediction method based on the information sparsity situation, the invention also provides an embodiment of the mixing prediction device based on the information sparsity situation.
Referring to fig. 9, the mixing prediction apparatus based on the information sparsity situation according to the embodiment of the present invention includes a memory and one or more processors, where the memory stores executable codes, and the one or more processors execute the executable codes to implement a mixing prediction method based on the information sparsity situation in the foregoing embodiments.
The embodiment of the mixing prediction device based on the information sparsity situation of the invention can be applied to any device with data processing capability, such as a computer or other devices or apparatuses. The apparatus embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. The software implementation is taken as an example, and as a logical device, the device is formed by reading corresponding computer program instructions in the nonvolatile memory into the memory for running through the processor of any device with data processing capability. In terms of hardware, as shown in fig. 9, a hardware structure diagram of an arbitrary device with data processing capability where a frequency mixing prediction device based on an information sparsity situation is located according to the present invention is shown, where in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 9, the arbitrary device with data processing capability where the apparatus is located in the embodiment may generally include other hardware according to an actual function of the arbitrary device with data processing capability, and details thereof are not repeated.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
An embodiment of the present invention further provides a computer-readable storage medium, on which a program is stored, where the program, when executed by a processor, implements a frequency mixing prediction method based on information sparsity in the foregoing embodiments.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing capability device described in any of the foregoing embodiments. The computer readable storage medium may also be any external storage device of a device with data processing capabilities, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the scope of the embodiments of the present invention in nature.

Claims (8)

1. A frequency mixing prediction method based on information sparsity is characterized by comprising the following steps:
s1, analyzing the mixing data and integrating different types of data;
s2, constructing an information sparsity criterion and determining a frequency conversion target, and comprising the following steps of:
s21, obtaining an analysis data matrix according to user retrieval data;
the analysis data matrix, labeled equation (1), is given as an image in the original publication and is not reproduced here;
wherein x denotes a user-specified input-data index, y denotes the index to be predicted specified by the user, m denotes the number of fields in the data set, and n denotes the number of observations in the data set;
step S22, setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column; the specific formula is given as an image in the original publication and is not reproduced here;
wherein the quantity retrieved according to the data index is the i-th data observation of the j-th index, i.e. row i, column j of matrix (1), and n is the index of the last observation;
step S23, setting the information sparsity criterion S; if s < S, step S3 is entered, otherwise step S4 is entered directly;
S3, processing the time-series frequency by a cubic spline interpolation method, adjusting the sequences that do not meet the sparsity criterion S, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form an analysis data set D;
s31, extracting original data, acquiring low-frequency time sequence data indexes of a specific interval, dividing the specific interval into subintervals, and enabling each subinterval to meet a cubic spline equation;
s32, calculating the step length of each data node in the subinterval;
s33, under three boundary conditions, filling the data nodes and the specified head end point condition into a matrix equation; the three boundary conditions include a natural boundary, a fixed boundary and a non-kinking boundary;
s34, solving a matrix equation to obtain a second differential value;
step S35, obtaining coefficients of a spline interpolation function through the secondary differential value;
s36, creating a cubic equation in each subinterval according to the coefficient;
step S37, adjusting the sequences that do not satisfy the sparsity criterion S using the cubic equations, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
s4, mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using a Chichi information criterion; performing regression prediction on the analysis data set D by using a Lasso model, wherein the method comprises the following steps of:
step S411, constructing a loss function with a penalty term, wherein the specific formula is as follows:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
wherein ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the penalty strength, and X is the transpose of the x columns in data set D, satisfying a condition given as an image in the original publication;
step S412, solving for W by the coordinate descent method;
step S413, from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
in step S4, regression prediction is also performed on the analysis data set D using the auto.arima model, comprising:
step S421, calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
step S422, from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
in step S4, regression prediction is also performed on the analysis data set D using a time-series multivariate regression model, comprising:
step S431, calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
step S432, estimating the regression parameters by the least squares method;
step S433, from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M;
And S5, displaying the prediction result.
2. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein the step S1 comprises the steps of:
s11, according to data access characteristics, constructing a structured data storage specification, and analyzing different files to obtain different data access information;
s12, designing a memory pointer, and dividing storage areas for different data access information;
s13, confirming the data index position, and establishing data association according to the data index;
and S14, setting an index retrieval rule, and enabling the un-retrieved observation value to be NA to form a structured database.
3. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein:
in step S31, the original data are extracted to obtain a low-frequency time-series data index x on an interval [a, b]; the interval [a, b] is divided into k subintervals [(x_0, x_1), (x_1, x_2), ..., (x_{k-1}, x_k)], giving k+1 nodes in total, with end points x_0 = a and x_k = b; on each subinterval (x_i, x_{i+1}), S(x) = S_i(x) is a cubic spline equation; all points satisfy the interpolation condition S(x_i) = y_i (i = 0, 1, ..., k), and apart from the two end points, each of the k-1 interior points satisfies S_i(x_i) = y_i, S_i(x_{i+1}) = y_{i+1} (i = 0, 1, ..., k-1);
in step S32, the step length is h_i = x_{i+1} - x_i;
in step S33, the first boundary condition is the natural boundary: the second derivative at the end points is specified as 0, i.e., with M_i = S''_i(x_i), S''(x_0) = 0 = S''(x_n); the corresponding matrix equation is given as an image in the original publication and is not reproduced here;
the second boundary condition is the fixed (clamped) boundary: the first derivatives at the end points are specified, the differential values at the two end nodes of the data being known and set to A and B, i.e., S'_0(x_0) = A, S'_{n-1}(x_n) = B; the corresponding matrix equation is given as an image in the original publication;
the third boundary condition is the not-a-knot boundary: the third derivative at the first interpolation point is forced to equal the third derivative at the second point, and the third derivative at the last point to equal the third derivative at the second-to-last point, i.e., S'''_0(x_0) = S'''_1(x_1), S'''_{n-2}(x_{n-1}) = S'''_{n-1}(x_{n-1}); the corresponding matrix equation is given as an image in the original publication;
in step S34, the matrix equation is solved to obtain the second-order differential values M_i (i = 0, 1, ..., n);
in step S35, the coefficients a_i, b_i, c_i, d_i of the spline interpolation function are obtained from M_i; the four coefficient formulas are given as images in the original publication and are not reproduced here;
in step S36, according to the coefficients, a cubic equation is created on each subinterval x_i ≤ x ≤ x_{i+1}:
g_i(x) = a_i + b_i(x - x_i) + c_i(x - x_i)^2 + d_i(x - x_i)^3;
in step S37, the analysis data set D is formed; its matrix form is given as an image in the original publication and is not reproduced here.
4. the method of claim 3, wherein the mixing prediction based on the sparse information scenario comprises: in step S34, the coefficient matrix is a tri-diagonal matrix, and the LU decomposition is performed on the coefficient matrix to obtain a unit lower triangular matrix and a unit upper triangular matrix, that is, the unit lower triangular matrix and the unit upper triangular matrix
B=Ax=(LU)x=L(Ux)=Ly
Wherein the content of the first and second substances,La lower triangular matrix is represented by a lower triangular matrix,Urepresenting the upper triangular matrix.
5. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein: in step S4, a model importance index R_i is constructed from each model's Akaike information criterion (AIC) value; the defining formula is given as an image in the original publication and is not reproduced here;
wherein R_i denotes the importance index of the i-th model;
importance weights are constructed from the model importance indices; the weight formula is given as an image in the original publication;
the prediction results are weighted and integrated according to the model importance indices, and the combined prediction result is output; it satisfies a weighted-sum condition, given as an image in the original publication, in which the individual terms represent the prediction results of the different models.
6. The frequency mixing prediction method based on the information sparsity situation according to claim 1, wherein in step S5, the prediction result is stored, converted into graphical data, and displayed through visualization technology, including:
step S51, establishing a conclusion data table, wherein the conclusion data table comprises three fields, namely a date index, a prediction index actual value and a prediction value;
and S52, storing the prediction result into the data table in the S51, and providing an API data interface.
7. A frequency mixing prediction device based on information sparse situation comprises an integration module, a frequency conversion target determination module, an analysis data set generation module, a mixing prediction module and a display module, and is characterized in that:
the integration module analyzes the mixing data and integrates different types of data;
the frequency conversion target determining module is used for constructing an information sparsity criterion and determining a frequency conversion target, and the execution process is as follows:
obtaining an analysis data matrix according to user retrieval data;
the analysis data matrix, labeled equation (1), is given as an image in the original publication and is not reproduced here;
wherein x denotes a user-specified input-data index, y denotes the index to be predicted specified by the user, m denotes the number of fields in the data set, and n denotes the number of observations in the data set;
setting the information sparsity s, and calculating the information sparsity of all input-data indices in the analysis data matrix column by column; the specific formula is given as an image in the original publication and is not reproduced here;
wherein the quantity retrieved according to the data index is the i-th data observation of the j-th index, i.e. row i, column j of matrix (1), and n is the index of the last observation;
setting the information sparsity criterion S; if s < S, the analysis data set generation module is entered, otherwise the hybrid prediction module is entered directly;
the analysis data set generation module processes the data by using a cubic spline interpolation method for time series frequency and does not meet the sparsity criterionSThe sequence of (D) is adjusted, and the interpolated sequence is used for replacing corresponding index data in the original data matrix to form an analysis data set D;
extracting the original data, acquiring a low-frequency time-series data index on a specific interval, and dividing the specific interval into subintervals, wherein each subinterval satisfies a cubic spline equation;
calculating the step length of each data node in the subintervals;
under three boundary conditions, filling the data nodes and the specified end-point conditions into a matrix equation, the three boundary conditions being the natural boundary, the fixed boundary and the not-a-knot boundary;
solving the matrix equation to obtain the second-order differential values;
obtaining the coefficients of the spline interpolation function from the second-order differential values;
creating a cubic equation on each subinterval according to the coefficients;
adjusting the sequences that do not satisfy the sparsity criterion S using the cubic equations, and replacing the corresponding index data in the original data matrix with the interpolated sequences to form the analysis data set D;
the hybrid prediction module is used for mixing different models, carrying out prediction on the analysis data set D, and forming a prediction result by using the Akaike information criterion; regression prediction is performed on the analysis data set D using the Lasso model, the process being as follows:
constructing a loss function with a penalty term, wherein a specific formula is as follows:
L = (Y - W^T X)^T (Y - W^T X) + λ||W||_1
wherein ||W||_1 is the L1-norm of W, W = (w_1, w_2, ..., w_m) denotes the weight vector computed by the model, Y = (y_1, y_2, ..., y_n) denotes the observations of the index to be predicted, λ denotes the penalty strength, and X is the transpose of the x columns in data set D, satisfying a condition given as an image in the original publication;
solving for W by the coordinate descent method;
from the solution for W, calculating the Lasso model predicted value Y_L and its Akaike information criterion value AIC_L;
regression prediction is also performed on the analysis data set D using the auto.arima model, comprising:
calling the auto.arima algorithm package, constructing an ARIMA model, and obtaining the parameter estimation results;
from the parameter estimation results, calculating the auto.arima model predicted value Y_A and its Akaike information criterion value AIC_A;
regression prediction is also performed on the analysis data set D using a time-series multivariate regression model, comprising:
calculating in turn the time-series association between the explanatory variables X and the explained variable Y;
estimating the regression parameters by the least squares method;
from the parameter estimation results, calculating the time-series multiple regression model predicted value Y_M and its Akaike information criterion value AIC_M;
And the display module is used for displaying the prediction result.
8. A mixed prediction device based on information sparse situations, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors execute the executable code to implement a mixed prediction method based on information sparse situations according to any one of claims 1 to 6.
CN202211238794.6A 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation Active CN115309754B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211238794.6A CN115309754B (en) 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211238794.6A CN115309754B (en) 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation

Publications (2)

Publication Number Publication Date
CN115309754A CN115309754A (en) 2022-11-08
CN115309754B (en) 2023-03-10

Family

ID=83867667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211238794.6A Active CN115309754B (en) 2022-10-11 2022-10-11 Frequency mixing prediction method, device and equipment based on information sparse situation

Country Status (1)

Country Link
CN (1) CN115309754B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008244512A (en) * 2007-03-23 2008-10-09 Institute Of Physical & Chemical Research Multimedia information providing system, server, terminal, multimedia information providing method and program
CN103178853A (en) * 2013-03-21 2013-06-26 哈尔滨工业大学 Compressive-sensing-based sparse signal under-sampling method and implementation device
JP2018194904A (en) * 2017-05-12 2018-12-06 株式会社情報医療 Prediction system, prediction method, and prediction program
CN109933621A (en) * 2019-03-20 2019-06-25 合肥黎曼信息科技有限公司 A kind of macroeconomy multi-source mixing big data modeling method
CN114756605A (en) * 2022-06-14 2022-07-15 之江实验室 Frequency mixing prediction method and system based on time series data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11556789B2 (en) * 2019-06-24 2023-01-17 Tata Consultancy Services Limited Time series prediction with confidence estimates using sparse recurrent mixture density networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008244512A (en) * 2007-03-23 2008-10-09 Institute Of Physical & Chemical Research Multimedia information providing system, server, terminal, multimedia information providing method and program
CN103178853A (en) * 2013-03-21 2013-06-26 哈尔滨工业大学 Compressive-sensing-based sparse signal under-sampling method and implementation device
JP2018194904A (en) * 2017-05-12 2018-12-06 株式会社情報医療 Prediction system, prediction method, and prediction program
CN109933621A (en) * 2019-03-20 2019-06-25 合肥黎曼信息科技有限公司 A kind of macroeconomy multi-source mixing big data modeling method
CN114756605A (en) * 2022-06-14 2022-07-15 之江实验室 Frequency mixing prediction method and system based on time series data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Expected stock return and mixed frequency variance risk premium data; Ruobing Liu et al.; Journal of Ambient Intelligence & Humanized Computing; 2020-12-31; full text *
Statistical estimation and forecasting of mixed-frequency high-dimensional dynamic volatility matrices; Liu Liping; Wanfang Platform; 2021-09-06; full text *

Also Published As

Publication number Publication date
CN115309754A (en) 2022-11-08

Similar Documents

Publication Publication Date Title
Harte PtProcess: An R package for modelling marked point processes indexed by time
Davis et al. Measures of serial extremal dependence and their estimation
KR100264633B1 (en) Quasi-random number generator apparatus and method, and multiple integration apparatus and method of function
US8595155B2 (en) Kernel regression system, method, and program
CN113268403B (en) Time series analysis and prediction method, device, equipment and storage medium
Borgnat et al. Scale invariances and lamperti transformations for stochastic processes
US20160282821A1 (en) Management of complex physical systems using time series segmentation to determine behavior switching
EP2051175A1 (en) Method and device for generating a model of a multiparameter system
Bai et al. How the instability of ranks under long memory affects large-sample inference
CN115309754B (en) Frequency mixing prediction method, device and equipment based on information sparse situation
EP3048566A1 (en) Information processing device and information processing method
CN110781622A (en) Unified probability interval mixed uncertainty propagation analysis method
Ankargren et al. Simulation smoothing for nowcasting with large mixed-frequency VARs
CN111008740B (en) Data propagation trend prediction method, device, storage medium and device
Godolphin et al. Decomposition of time series models in state-space form
Stringer Implementing approximate Bayesian inference using adaptive quadrature: the aghq package
McElroy et al. Computation of the autocovariances for time series with multiple long-range persistencies
Schoonees et al. Flexible graphical assessment of experimental designs in R: The vdg package
Wang et al. Quantification and propagation of Aleatoric uncertainties in topological structures
Rong et al. Multicriteria 0-1 knapsack problems with k-min objectives
JP2014211827A (en) Derivation device, derivation method and derivation program
JP7349811B2 (en) Training device, generation device, and graph generation method
Rysz et al. A scenario decomposition algorithm for stochastic programming problems with a class of downside risk measures
Waldmann et al. Variational approximations in geoadditive latent Gaussian regression: mean and quantile regression
Serre et al. Computational investigations of Bayesian maximum entropy spatiotemporal mapping

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant