CN115705411A - Data prediction algorithm selection method, data prediction method and data prediction device - Google Patents


Info

Publication number
CN115705411A
Authority
CN
China
Prior art keywords
data
trend
prediction
sequence
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111213110.2A
Other languages
Chinese (zh)
Inventor
安藤
王俊
郭建徽
徐成全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Betasoft Co ltd
Shanghai Yugen Information Technology Co ltd
Original Assignee
Shanghai Betasoft Co ltd
Shanghai Yugen Information Technology Co ltd
Application filed by Shanghai Betasoft Co ltd and Shanghai Yugen Information Technology Co ltd

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data prediction algorithm selection method, a data prediction method and a data prediction device. Historical time series data are acquired, the data characteristics of the historical time series data are determined, a trend index of the historical time series data is obtained from the data characteristics, a prediction algorithm is selected according to the trend index and the data characteristics, and prediction is performed with the selected algorithm. In this way, non-parametric trend determination and prediction can be carried out on data sequences of various forms. The reliability of a prediction result is measured by a confidence level, and short-term prediction results with low confidence are corrected by regression toward long-term prediction results with high confidence.

Description

Data prediction algorithm selection method, data prediction method and data prediction device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data prediction algorithm selection method, a data prediction method and a data prediction device.
Background
In IT operation and maintenance management, a large amount of index data is collected, such as host CPU and memory utilization. To grasp the future state of a managed object in time, predictions must be made from the collected index data to obtain the predicted values of an index for a number of days in the future.
This process involves data prediction. The difficulty lies in selecting a reasonable prediction algorithm for time series data of many different forms, so as to obtain a prediction result that meets expectations. In the prior art, the error values of the prediction results of different prediction algorithms are usually compared, and the algorithm with the smallest error is selected as the target prediction algorithm. In practice, however, the form of the index data changes constantly and most index data exhibits the trend characteristics of a random walk, so a prediction approach based on an algorithm model has difficulty meeting actual demand.
Disclosure of Invention
In view of the above-described shortcomings, the present invention provides a data prediction algorithm selection method, a data prediction method, and a data prediction apparatus, which can perform non-parametric trend determination and prediction for data sequences of various forms.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
a data prediction algorithm selection method, the data prediction algorithm selection method comprising the steps of:
acquiring historical time sequence data;
judging the data characteristics of historical time sequence data;
acquiring a trend index of historical time series data according to the data characteristics;
and selecting a prediction algorithm according to the trend index and the data characteristics.
According to one aspect of the invention, determining the data characteristics of the historical time series data comprises: performing curve feature extraction on the time series data and generating characteristic indexes of the curve from multiple dimensions.
According to one aspect of the invention, performing curve feature extraction on the time series data and generating characteristic indexes of the curve from multiple dimensions comprises: extracting trend characteristics of the data sequence, where the trend characteristics may be of several types.
According to one aspect of the invention, obtaining the trend index of the historical time series data according to the data characteristics comprises: after the trend characteristics are extracted, converting the qualitative trend characteristics into quantitative trend indexes through a decision tree, so as to quantitatively measure the degree of trend of the data sequence.
According to one aspect of the invention, obtaining the trend index of the historical time series data according to the data characteristics comprises: after the various types of trend characteristics are obtained, determining the weight of each trend characteristic in the trend judgment in combination with the statistical characteristics of the data sequence, and finally calculating the overall trend index of the data sequence.
According to one aspect of the invention, the statistical characteristics include mean bias, median bias and fluctuation ratio; each trend characteristic has a corresponding weight function: weight = F(mean bias) + F(median bias) + F(fluctuation ratio); the trend index of the data sequence is a weighted average of the individual trend characteristic indexes.
According to one aspect of the invention, selecting a prediction algorithm according to the trend index and the data characteristics comprises: a prediction algorithm judger evaluates the trend index and the data characteristics and selects a suitable prediction algorithm.
According to one aspect of the invention, selecting a prediction algorithm according to the trend index and the data characteristics comprises: judging stationarity from the data characteristics and, if the sequence is stationary, selecting a preset prediction algorithm; if the sequence is non-stationary, performing an autocorrelation judgment; if the sequence is autocorrelated, selecting a preset prediction algorithm, and if it is not, selecting a preset prediction algorithm according to the trend index.
A data prediction method, the data prediction method comprising the steps of:
acquiring historical time sequence data;
judging the data characteristics of historical time sequence data;
acquiring a trend index of historical time series data according to the data characteristics;
selecting a prediction algorithm according to the trend index and the data characteristics;
and performing prediction according to the selected prediction algorithm.
According to one aspect of the invention, determining the data characteristics of the historical time series data comprises: performing curve feature extraction on the time series data and generating characteristic indexes of the curve from multiple dimensions.
According to one aspect of the invention, performing curve feature extraction on the time series data and generating characteristic indexes of the curve from multiple dimensions comprises: extracting trend characteristics of the data sequence, where the trend characteristics may be of several types.
According to one aspect of the invention, obtaining the trend index of the historical time series data according to the data characteristics comprises: after the trend characteristics are extracted, converting the qualitative trend characteristics into quantitative trend indexes through a decision tree, so as to quantitatively measure the degree of trend of the data sequence.
According to one aspect of the invention, obtaining the trend index of the historical time series data according to the data characteristics comprises: after the various types of trend characteristics are obtained, determining the weight of each trend characteristic in the trend judgment in combination with the statistical characteristics of the data sequence, and finally calculating the overall trend index of the data sequence.
According to one aspect of the invention, the statistical characteristics include mean bias, median bias and fluctuation ratio; each trend characteristic has a corresponding weight function: weight = F(mean bias) + F(median bias) + F(fluctuation ratio); the trend index of the data sequence is a weighted average of the individual trend characteristic indexes.
According to one aspect of the invention, selecting a prediction algorithm according to the trend index and the data characteristics comprises: a prediction algorithm judger evaluates the trend index and the data characteristics and selects a suitable prediction algorithm.
According to one aspect of the invention, performing prediction according to the selected prediction algorithm comprises: predicting the same data sequence separately at short, medium and long time scales, and correcting the short-term prediction result by regression toward the more reliable medium- and long-term predicted values, so as to obtain a more reasonable prediction result.
An apparatus for data prediction, the apparatus comprising:
an acquisition unit configured to acquire historical time series data;
the data characteristic acquisition unit is used for judging the data characteristics of the historical time sequence data;
the trend index judging unit is used for obtaining a trend index of historical time sequence data according to the data characteristics;
the prediction algorithm judging unit is used for selecting a prediction algorithm according to the trend index and the data characteristics;
and the prediction unit is used for predicting according to the selected prediction algorithm.
The implementation of the invention has the following advantages. The data prediction algorithm selection method, data prediction method and data prediction device of the invention acquire historical time series data, determine its data characteristics, obtain a trend index from the data characteristics, select a prediction algorithm according to the trend index and the data characteristics, and perform prediction with the selected algorithm, thereby carrying out non-parametric trend determination and prediction on data sequences of various forms. The reliability of a prediction result is measured by a confidence level, and short-term prediction results with low confidence are corrected by regression toward long-term prediction results with high confidence. The approach is applicable to time series data of any shape and any amount of data (length >= 1).
A prediction degree of X means that the next X predicted values are obtained from a given data sequence. For example, if the original data is the host CPU utilization over the last 30 days and the prediction degree is 7, the prediction algorithm returns the CPU utilization for the next seven days as the prediction result.
After a suitable prediction algorithm is selected and used for prediction, the reliability of the prediction result decreases as the prediction degree increases. The same data sequence is therefore predicted separately at short (day), medium (week) and long (month) time scales, and the short-term prediction result is then corrected by regression toward the more reliable medium- and long-term predicted values, so as to obtain a more reasonable prediction result.
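The text does not state how the correction regression is computed or how confidence is quantified. The following minimal sketch shows only one possible reading, a confidence-weighted blend that pulls each short-term value toward a longer-term level. The function name, its parameters and the blending rule are all assumptions made for illustration, not the method of the invention.

```python
def correct_short_term(short_forecast, long_level, short_conf, long_conf):
    """Pull each short-term value toward a longer-term level.

    Hypothetical helper: the source does not define the correction rule
    or the confidence measure, so a simple weighted blend is assumed.
    """
    total = short_conf + long_conf
    if total <= 0:
        raise ValueError("confidence values must sum to a positive number")
    w_short = short_conf / total
    return [w_short * value + (1.0 - w_short) * long_level
            for value in short_forecast]


daily = [40.0, 80.0, 60.0]   # short-term (daily) forecast, assumed low confidence
monthly_level = 50.0          # long-term (monthly) level, assumed high confidence
corrected = correct_short_term(daily, monthly_level,
                               short_conf=0.25, long_conf=0.75)
```

With these assumed confidences the daily values keep a quarter of their own weight, so large short-term swings are damped toward the monthly level.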
Index data in the IT operation and maintenance field, such as CPU utilization rate and memory utilization rate, often have large fluctuation in transient state or short period, present random curve characteristics, and have no great guiding significance and reference value in management decision; the data prediction method of the invention predicts the trend of the data sequence, and explores the trend characteristics of curve form from a plurality of characteristic dimensions of time domain, frequency domain, space domain and the like of the time sequence data, and then carries out prediction, thereby not only effectively avoiding the interference of short-term fluctuation, but also fully considering various curve trend characteristics of short, medium and long periods.
A trend is the force that drives the movement of the data curve and therefore has inertia. Under this inertia the curve shape exhibits a certain degree of stability, and this stability in turn supports the feasibility and value of trend prediction. On this basis, the data prediction method provided by the invention fully mines the trend characteristics of the data sequence and performs trend judgment, classification, prediction and correction on time series data of different forms, with the advantages of a wide application range, strong adaptivity and high prediction reliability.
The data prediction method provided by the invention judges and classifies the curve forms based on the curve characteristics of the time sequence data, and selects an applicable algorithm according to the classification result, thereby reducing the blindness of algorithm selection. Because the set of curve features is open and expandable, in the application practice of the scheme, feature items can be continuously accumulated and newly added, and a decision and classification model is optimized, so that the effective performance of the prediction scheme is continuously evolved and improved in a specific application scene.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a data prediction method according to the present invention;
FIG. 2 is a schematic flow chart of a prediction algorithm selection decision device according to the present invention;
FIG. 3 is a graph illustrating the relationship between the prediction power and the confidence level according to the present invention;
FIG. 4 is a schematic diagram illustrating the point location characteristics of a data sequence according to the present invention;
FIG. 5 is a schematic diagram of data sequence density features according to the present invention;
FIG. 6 is a schematic diagram of extreme characteristic of a data sequence according to the present invention;
FIG. 7 is a schematic diagram of a data prediction implementation according to an embodiment of the present invention;
FIG. 8 is a graph of a piecewise optimization function according to an embodiment of the present invention;
FIG. 9 is a graph of a logarithmic optimization function according to an embodiment of the present invention;
FIG. 10 is a graph of an exponential optimization function according to an embodiment of the present invention;
FIG. 11 is a graph of a hyperbolic tangent-type optimization function according to an embodiment of the present invention;
FIG. 12 is a graph illustrating an exemplary hyperbolic tangent optimization function according to an embodiment of the present invention;
FIG. 13 is a flowchart of a data prediction method according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning by themselves. Thus, "module", "component" or "unit" may be used mixedly.
It should be noted that the terms "first \ second \ third" referred to in the embodiments of the present application are only used for distinguishing similar objects and do not represent a specific ordering for the objects, and it should be understood that "first \ second \ third" may be interchanged under the permission of a specific order or sequence, so that the embodiments of the present application described herein can be implemented in an order other than that shown or described herein.
The embodiment of the application provides a data prediction algorithm selection method and a data prediction method, the method is applied to electronic equipment, the functions realized by the method can be realized by calling program codes through a processor in the electronic equipment, and the program codes can be saved in a storage medium of the electronic equipment.
Example one
As shown in fig. 1, 2, 3, 4, 5, 6 and 7, a data prediction algorithm selection method includes the steps of:
Step S1: acquiring historical time sequence data;
Here, the electronic device may be any of various types of devices having information processing capability, such as a mobile phone, a PDA (personal digital assistant), a navigator, a digital phone, a video phone, a smart watch, a smart band, a wearable device, a tablet computer, a kiosk, a PLC, a PC, a server, and the like.
Time series data is a data sequence recorded in chronological order for the same uniform index. The historical time series data in the embodiment of the present application may therefore be time series data acquired before the current time, which is also called a data sequence.
In the embodiment of the application, the data in the actual environment can be continuously collected in real time to form a historical data pool. The historical time series data may be various index data generated along with time in an actual environment, such as index data of a CPU, a memory, a flow rate, and the like.
Step S2: judging the data characteristics of historical time sequence data;
Determining the data characteristics of the historical time series data in step S2 includes: performing curve feature extraction on the time series data and generating characteristic indexes of the curve from multiple dimensions. A curve characteristic is a piece of latent, minable information that the time series data carries in the form of its curve; each characteristic is called a feature item of the curve, and the set of feature items is called the feature pool. Feature extraction on time series data can use three dimensions: the time dimension, the frequency dimension and the space dimension. The data characteristics may include trend features, statistical features, autocorrelation features, and the like. The statistical features are obtained by statistical analysis of the given time series data and include the maximum, minimum, mean, median, variance and similar statistics. The autocorrelation features are obtained by analyzing the autocorrelation of the time series data and, if the autocorrelation is significant, finding the data point distance (lag) at which the autocorrelation coefficient is largest. For example, in the time dimension, a group of time series data can be aggregated by different time units to generate several new groups of time series data, or smoothed by moving averages over sliding windows of different sizes, after which features are extracted from the newly generated time series data; autocorrelation features belong to the frequency dimension; extreme value features, point location features and the like belong to the space dimension.
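As an illustration of the statistical and autocorrelation features described in step S2, the following minimal sketch uses only the Python standard library. All function names are hypothetical, and reading the autocorrelation feature as "the lag with the largest sample autocorrelation coefficient" is an assumption about the intended meaning.

```python
import statistics


def statistical_features(series):
    """Basic statistics named in the text: max, min, mean, median, variance."""
    return {
        "max": max(series),
        "min": min(series),
        "mean": statistics.fmean(series),
        "median": statistics.median(series),
        "variance": statistics.pvariance(series),
    }


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of the series at the given lag."""
    n = len(series)
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0 or lag >= n:
        return 0.0
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def strongest_lag(series, max_lag):
    """Return (lag, coefficient) with the largest autocorrelation, lag >= 1."""
    candidates = [(autocorrelation(series, k), k) for k in range(1, max_lag + 1)]
    best_coef, best_lag = max(candidates)
    return best_lag, best_coef


def aggregate(series, size):
    """Time-dimension regrouping: average consecutive blocks of `size` points."""
    return [statistics.fmean(series[i:i + size])
            for i in range(0, len(series), size)]


cpu = [1, 5, 1, 5, 1, 5, 1, 5]          # toy series with a period of 2
stats = statistical_features(cpu)
lag, coef = strongest_lag(cpu, max_lag=4)
coarse = aggregate(cpu, 4)
```

For the toy series the strongest lag is 2, which matches its period, and aggregating in blocks of four flattens the oscillation entirely.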
Step S3: acquiring a trend index of historical time series data according to the data characteristics;
Obtaining the trend index of the historical time series data according to the data characteristics in step S3 may specifically be: extracting trend characteristics of the data sequence, where the trend characteristics may be of several types:
1. Point location characteristics: as shown in fig. 4, the data points of a set of time series data are ordered. When the value of a point is greater than the values of all points before it, the point is a "forward maximum"; when it is less than the values of all points before it, the point is a "forward minimum". All forward maxima and forward minima of a given data sequence are counted, and trend features of the data sequence are mined from them.
2. Density characteristics: for data with an upward trend, the overall density of the data sequence should be tilted toward the back; for data with a downward trend, it should be tilted toward the front. Let the overall mean of the data sequence be mean and the mean of the last n% of the data be npercentan, and define the ratio r = npercentan/mean; the approximate relationship between the trend index of the data and r is shown in fig. 5.
3. Section characteristics: the trend is inferred from the relationship between the order of the values and the trend. For example, the higher the proportion of later data points that exceed the median, the stronger the upward trend; when the amount of data is large, the larger the values in the last section, the more pronounced the upward trend.
4. Extreme value characteristics: as shown in fig. 6, the positions at which the extreme values (maximum and minimum) of the data sequence occur.
5. Inflection point characteristics: an inflection point is a point at which a sustained upward or downward trend of the data sequence changes direction. All inflection point information of the data sequence is counted, including the length and amplitude of each inflection section.
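The trend characteristics listed above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, relative positions are scaled so that 0 is the start and 1 is the end of the sequence, and only the indices of inflection points are computed (not the lengths and amplitudes of inflection sections that the text also mentions).

```python
def forward_extrema(series):
    """Count points that exceed (or fall below) every earlier point."""
    maxima = minima = 0
    running_max = running_min = series[0]
    for value in series[1:]:
        if value > running_max:
            maxima += 1
            running_max = value
        elif value < running_min:
            minima += 1
            running_min = value
    return maxima, minima


def density_ratio(series, tail_percent):
    """r = mean of the last n% of points divided by the overall mean.

    Assumes the overall mean is non-zero.
    """
    count = max(1, round(len(series) * tail_percent / 100))
    tail = series[-count:]
    overall = sum(series) / len(series)
    return (sum(tail) / len(tail)) / overall


def extreme_positions(series):
    """Relative positions (0 = start, 1 = end) of the maximum and minimum."""
    last = max(1, len(series) - 1)
    return series.index(max(series)) / last, series.index(min(series)) / last


def inflection_points(series):
    """Indices where the direction of movement flips between up and down."""
    points, previous = [], 0
    for i in range(1, len(series)):
        step = series[i] - series[i - 1]
        direction = (step > 0) - (step < 0)
        if direction and previous and direction != previous:
            points.append(i - 1)
        if direction:
            previous = direction
    return points


cpu = [10, 12, 11, 15, 14, 18, 20, 19]   # toy upward-trending series
fwd_max, fwd_min = forward_extrema(cpu)
r = density_ratio(cpu, tail_percent=25)
max_pos, min_pos = extreme_positions(cpu)
turns = inflection_points(cpu)
```

For this upward-trending toy series there are several forward maxima and no forward minima, the ratio r is above 1, and the maximum sits near the end of the sequence, all of which point the same way.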
After the trend characteristics are extracted, they are input into a trend characteristic judger to generate a trend index, whose value lies in the range [-1, 1].
The core logic for calculating the trend index is to convert qualitative trend characteristics into quantitative trend indexes through a decision tree, and finally integrate the trend indexes of all the characteristics to obtain the trend index of the time series data.
In this embodiment, the solving formula of the trend index T is as follows:
[Formula image BDA0003309570170000071 is not reproduced in this text.]
where T is the trend index;
i is the serial number of the feature item;
W_i is the weight of the feature item with serial number i.
According to the decision formula, the key to solve the trend index is to obtain the weight sequence of all the feature items, which is called as a decision model.
In one embodiment, the method of solving the decision model may employ an unsupervised simple decision tree approach.
In another embodiment, the method of solving the decision model may employ a deep classification method by supervised neural networks.
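Because the formula image is not available in this text, the sketch below relies only on the verbal description that the trend index is a weighted average of the per-feature trend indexes and lies in [-1, 1]. The normalization by the sum of the weights and the clamping step are assumptions, and the function name is hypothetical.

```python
def trend_index(feature_trends, weights):
    """Weighted average of per-feature trend indexes, clamped to [-1, 1].

    Assumed form: sum(W_i * t_i) / sum(W_i); the source formula image
    is not reproduced, so this is one reading of "weighted average".
    """
    if len(feature_trends) != len(weights) or not weights:
        raise ValueError("need one weight per feature item")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    value = sum(w * t for w, t in zip(weights, feature_trends)) / total
    return max(-1.0, min(1.0, value))


# Three hypothetical feature items with their trend indexes and weights.
T = trend_index([1.0, 0.5, -0.5], [2.0, 1.0, 1.0])
```

In this reading the decision model is just the weight sequence, so either of the two embodiments above could supply the `weights` argument.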
In this process, a characteristic index is defined: a numerical value in the range [0, 1] that quantifies how strongly a curve exhibits a given characteristic. Characteristic indexes make the curve characteristics of time series data analyzable and deducible, which lays a foundation for automatic and intelligent trend prediction on massive data. For example, the position at which the data takes its maximum value expresses how far the maximum of the data curve is biased toward the back, with characteristic index I = position of the maximum / total amount of data; the characteristic index is 0 when the maximum occurs at the start of the sequence and 1 when it occurs at the end. The actual characteristic index function is more complex: to better fit the actual effect, a specific optimization function is selected to optimize and correct the index, that is, I = f(i), where i is the original characteristic index and f is the optimization function.
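The maximum-position example can be sketched as below. To make the index exactly 0 at the start and 1 at the end of the sequence, the sketch divides a zero-based position by (length - 1); this scaling and the function name are assumptions, since the text only gives "position / total amount of data".

```python
def max_position_index(series):
    """Raw characteristic index: relative position of the maximum value.

    Returns 0.0 when the maximum is the first point and 1.0 when it is
    the last point (assumed scaling: zero-based index / (length - 1)).
    """
    if len(series) < 2:
        return 0.0
    return series.index(max(series)) / (len(series) - 1)


early_peak = max_position_index([3, 9, 4, 5, 1])   # maximum near the start
late_peak = max_position_index([1, 2, 3])          # maximum at the end
```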
As shown in fig. 8 to 12, commonly used optimization functions include the exponential type, the hyperbolic tangent type, the piecewise function type, and the like. The figures show only the curve forms; the mathematical formula of a specific optimization function needs to be fine-tuned to actual conditions. Taking the hyperbolic tangent optimization function as an example, the reference formula is as follows:
[Formula images BDA0003309570170000081 and BDA0003309570170000082 are not reproduced in this text.]
where i is the initial index and I(i) is the optimized index; the actual curve of the optimization function in this example is shown in fig. 12.
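The reference formula itself is not available in this text, so the following sketch is not the formula of the invention. It only shows one hyperbolic-tangent-shaped function that maps [0, 1] onto [0, 1] and could play the role of an optimization function; the function name, the `steepness` parameter and the exact expression are assumptions.

```python
import math


def tanh_optimize(raw_index, steepness=3.0):
    """Map a raw characteristic index in [0, 1] to an S-shaped index in [0, 1].

    Assumed form: 0.5 * (1 + tanh(k * (2i - 1)) / tanh(k)), which keeps
    0 -> 0, 0.5 -> 0.5 and 1 -> 1 while compressing values near the ends.
    """
    if not 0.0 <= raw_index <= 1.0:
        raise ValueError("raw_index must lie in [0, 1]")
    scaled = math.tanh(steepness * (2.0 * raw_index - 1.0))
    return 0.5 * (1.0 + scaled / math.tanh(steepness))


middle = tanh_optimize(0.5)
low = tanh_optimize(0.25)
high = tanh_optimize(0.75)
```

A larger `steepness` pushes mid-range indexes more decisively toward 0 or 1, which is the kind of fine-tuning the text says must be done case by case.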
After various types of trend characteristics are obtained, the weight of each trend characteristic in trend judgment is determined by combining the statistical characteristics of the data sequence, and finally the overall trend index of the data sequence is obtained. Any feature with quantitative analysis can be regarded as a feature item, and weight information of each feature item can be generated in the process of constructing a prediction scheme judgment and classification model.
Basic statistical characteristics:
1. mean bias amount = (mean-min)/(max-min);
2. median bias amount = (median-min)/(max-min);
3. fluctuation ratio = (maximum-minimum)/maximum;
Each trend feature has a corresponding weight function:
weight = F(mean bias) + F(median bias) + F(fluctuation ratio);
The trend index of the data sequence is a weighted average of the individual trend characteristic indexes.
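The three basic statistical characteristics and the weight function can be sketched as follows. The text does not define F, so `f` is passed in as a parameter and the identity function is used purely as a placeholder; the function names are hypothetical.

```python
import statistics


def basic_statistical_features(series):
    """Mean bias, median bias and fluctuation ratio as defined in the text."""
    low, high = min(series), max(series)
    spread = high - low
    if spread == 0 or high == 0:
        raise ValueError("features are undefined for a flat or zero-max series")
    return {
        "mean_bias": (statistics.fmean(series) - low) / spread,
        "median_bias": (statistics.median(series) - low) / spread,
        "fluctuation_ratio": spread / high,
    }


def feature_weight(features, f):
    """weight = F(mean bias) + F(median bias) + F(fluctuation ratio).

    `f` stands in for the unspecified function F of one trend feature.
    """
    return (f(features["mean_bias"])
            + f(features["median_bias"])
            + f(features["fluctuation_ratio"]))


usage = [20, 30, 40, 50, 60, 100]
features = basic_statistical_features(usage)
weight = feature_weight(features, f=lambda x: x)   # identity F as placeholder
```

Since each trend feature is said to have its own weight function, a fuller version would keep one `f` per feature item.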
Step S4: selecting a prediction algorithm according to the trend index and the data characteristics.
Selecting a prediction algorithm according to the trend index and the data characteristics includes: a prediction algorithm judger evaluates the trend index and the data characteristics and selects a suitable prediction algorithm. Specifically, stationarity is judged from the data characteristics and, if the sequence is stationary, a preset prediction algorithm is selected; if the sequence is non-stationary, an autocorrelation judgment is performed; if the sequence is autocorrelated, a preset prediction algorithm is selected, and if it is not, a preset prediction algorithm is selected according to the trend index.
In the present embodiment, as shown in fig. 13, the selection of the prediction algorithm can be achieved by the following process:
Step 1: acquire the time series data sequence to be predicted;
Step 2: if the sequence is empty, return 0 as the prediction result; if the sequence length is less than 5, directly select a linear regression algorithm;
Step 3: when the sequence length is greater than 5, perform a stationarity judgment; if the sequence is stationary, select ARIMA, or use a moving average algorithm if the accuracy requirement for predicting the stationary sequence is not high;
Step 4: for a non-stationary sequence, perform an autocorrelation judgment; if the maximum value of the correlation coefficient between the time series {X_t} and {X_{t+k}} is greater than 0.8 and k is greater than 2, first decompose the sequence, then predict the trend component and the cyclic component separately, and finally sum the prediction results;
Step 5: for a sequence that does not pass the autocorrelation judgment, select a linear regression algorithm when the trend index is greater than 0.8; for a sequence whose trend index is greater than or equal to 0.6 and less than or equal to 0.8, apply the curve regression, growth rate prediction, exponential smoothing and polynomial regression algorithms separately, using the first 80% of the sequence as training data and the last 20% as test data, and select the algorithm with the smallest MSE as the prediction algorithm;
Step 6: for a sequence with a trend index less than 0.6, use a moving average algorithm.
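The six steps above can be sketched as a small decision function plus a hold-out comparison for step 5. This is a minimal sketch under stated assumptions: the stationarity result and the autocorrelation values are supplied by the caller, the returned strings are hypothetical labels rather than real algorithm implementations, a sequence of length exactly 5 is treated like a longer one, and `naive` and `drift` are toy stand-ins for the candidate algorithms named in the text.

```python
def select_algorithm(series, trend_index, is_stationary, max_autocorr,
                     best_lag, high_accuracy=True):
    """Return a label for the algorithm the six steps would choose."""
    n = len(series)
    if n == 0:
        return "constant_zero"                 # step 2: empty sequence predicts 0
    if n < 5:
        return "linear_regression"             # step 2: very short sequence
    if is_stationary:                          # step 3
        return "arima" if high_accuracy else "moving_average"
    if max_autocorr > 0.8 and best_lag > 2:    # step 4
        return "decompose_then_predict"
    if trend_index > 0.8:                      # step 5, strong trend
        return "linear_regression"
    if trend_index >= 0.6:                     # step 5, moderate trend
        return "best_of_candidates_by_mse"
    return "moving_average"                    # step 6


def pick_by_holdout_mse(series, candidates):
    """Step 5: fit on the first 80%, score on the last 20%, keep the lowest MSE.

    `candidates` maps a name to a function (train, horizon) -> forecasts.
    """
    split = int(len(series) * 0.8)
    train, test = series[:split], series[split:]

    def mse(name):
        forecast = candidates[name](train, len(test))
        return sum((f - a) ** 2 for f, a in zip(forecast, test)) / len(test)

    return min(candidates, key=mse)


def naive(train, horizon):
    """Toy candidate: repeat the last observed value."""
    return [train[-1]] * horizon


def drift(train, horizon):
    """Toy candidate: extend the last observed step linearly."""
    step = train[-1] - train[-2]
    return [train[-1] + step * (h + 1) for h in range(horizon)]


history = [float(v) for v in range(10)]        # steadily rising toy series
choice = select_algorithm(history, trend_index=0.7, is_stationary=False,
                          max_autocorr=0.3, best_lag=1)
winner = pick_by_holdout_mse(history, {"naive": naive, "drift": drift})
```

The sketch follows the thresholds literally; the text does not say whether a strongly negative trend index should be compared by magnitude, so a downward trend falls through to the moving average here.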
A trend is the force that drives the movement of the data curve and therefore has inertia. Under this inertia the curve shape exhibits a certain degree of stability, and this stability in turn supports the feasibility and value of trend prediction. On this basis, the embodiment fully mines the trend characteristics of the data sequence and performs trend judgment, classification, prediction and correction on time series data of different forms, with the advantages of a wide application range, strong adaptivity and high prediction reliability.
Example two
A data prediction method, the data prediction method comprising the steps of:
Step S1: acquiring historical time sequence data;
Here, the electronic device may be any of various types of devices having information processing capability, such as a mobile phone, a PDA (personal digital assistant), a navigator, a digital phone, a video phone, a smart watch, a smart band, a wearable device, a tablet computer, a kiosk, a PLC, a PC, a server, and the like.
Time series data is a data sequence recorded in chronological order for the same uniform index. The historical time series data in the embodiment of the present application may therefore be time series data acquired before the current time, which is also called a data sequence.
In the embodiment of the application, the data in the actual environment can be continuously collected in real time to form a historical data pool. The historical time series data may be various index data generated along with time in an actual environment, such as index data of a CPU, a memory, a flow rate, and the like.
Step S2: judging the data characteristics of historical time sequence data;
the step S2 of determining the data characteristics of the historical time series data includes: performing curve feature extraction on the time series data and generating characteristic indexes of the curve from multiple dimensions. Curve characteristics are the potential information to be mined that time series data carries in the form of a curve; each characteristic is called a feature item of the curve, and the set of feature items is called a feature pool. Feature extraction on time series data can use three dimensions: the time dimension, the frequency dimension, and the space dimension. The data characteristics may include trend features, statistical features, autocorrelation features, and the like. The statistical features are obtained by statistical analysis of the given time series data and include the maximum value, minimum value, mean, median, variance and the like. The autocorrelation features are obtained by analyzing the autocorrelation of the time series data and, if the autocorrelation is significant, solving for the data point distance at which the autocorrelation coefficient is largest. For example, in the time dimension, a group of time series data can be grouped by different time units to generate several new groups of time series data, or smoothed with sliding windows of different sizes, after which features of the newly generated time series data are extracted; autocorrelation features belong to the frequency dimension; extreme value features, point location features, and the like belong to the space dimension.
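The statistical, autocorrelation and sliding-window features described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the autocorrelation estimator (the usual sample autocorrelation normalized by the total variance) is an assumption, since the source does not state which estimator is used.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def statistical_features(series: Sequence[float]) -> Dict[str, float]:
    """Maximum, minimum, mean, median and (population) variance."""
    return {
        "max": max(series),
        "min": min(series),
        "mean": statistics.fmean(series),
        "median": statistics.median(series),
        "variance": statistics.pvariance(series),
    }


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (assumed estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0 or lag >= n:
        return 0.0
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def strongest_lag(series: Sequence[float], max_lag: int) -> Tuple[int, float]:
    """Lag in 1..max_lag with the largest autocorrelation, and its value."""
    best = max(range(1, max_lag + 1), key=lambda k: autocorrelation(series, k))
    return best, autocorrelation(series, best)


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Smooth the series with a sliding window of the given size."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


if __name__ == "__main__":
    periodic = [0.0, 1.0, 2.0, 3.0] * 10
    print(statistical_features(periodic))
    print(strongest_lag(periodic, 10))
    print(moving_average([1, 2, 3, 4], 2))
```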
Step S3: acquiring a trend index of historical time series data according to the data characteristics;
the step S3 of obtaining the trend index of the historical time series data according to the data characteristics may specifically be: extracting trend characteristics of the data sequence, wherein the trend characteristics can comprise various types:
1. Point location characteristics: as shown in fig. 4, the data points of a set of time series data are ordered; when the value of a point is greater than (or less than) the values of all the points before it, the point is determined to be a "forward maximum" (or "forward minimum"). All the forward maxima and forward minima of a given data sequence are counted, and the trend features of the data sequence are mined from them.
2. Density characteristics: for data with an upward trend, the overall weight of the data sequence should lean backward; for data with a downward trend, the overall weight of the data sequence should lean forward. Let the overall mean value of the data sequence be mean and the mean value of the last n% of the data be npercentan, with the ratio r = npercentan/mean; the relationship between the trend index of the data and r is approximately as shown in fig. 5.
3. Section characteristics: the trend is inferred from the relation between the order of the values and the trend; for example, if the proportion of values exceeding the median rises toward the end of the sequence, an upward trend is indicated; when the data amount is large, the larger the data values of the last section, the more pronounced the upward trend.
4. Extreme value characteristics: as shown in fig. 6, the positions at which the extreme values (maximum, minimum) of the data sequence occur;
5. Inflection point characteristics: an inflection point is a point at which the data sequence stops moving consistently upward or downward and changes direction; all inflection point information of the data sequence is counted, including the length and the amplitude of each inflection section.
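Three of the trend features listed above (point location, density, inflection points) can be sketched as follows. The function names, the default tail of 20%, and the handling of flat segments are assumptions made for illustration; the source does not fix n or define how ties are treated.

```python
from typing import List, Sequence, Tuple


def forward_extremes(series: Sequence[float]) -> Tuple[int, int]:
    """Count points that exceed (or fall below) every earlier point."""
    n_max = n_min = 0
    highest = lowest = series[0]
    for x in series[1:]:
        if x > highest:
            n_max += 1
            highest = x
        elif x < lowest:
            n_min += 1
            lowest = x
    return n_max, n_min


def density_ratio(series: Sequence[float], tail_percent: float = 20.0) -> float:
    """r = mean of the last n% of the data divided by the overall mean."""
    mean = sum(series) / len(series)
    if mean == 0:
        raise ValueError("overall mean is zero; the ratio is undefined")
    k = max(1, round(len(series) * tail_percent / 100))
    return (sum(series[-k:]) / k) / mean


def inflection_points(series: Sequence[float]) -> List[int]:
    """Indices at which the direction of movement flips (flat steps ignored)."""
    points, previous = [], 0
    for i in range(1, len(series)):
        diff = series[i] - series[i - 1]
        direction = (diff > 0) - (diff < 0)
        if direction != 0:
            if previous != 0 and direction != previous:
                points.append(i - 1)
            previous = direction
    return points


if __name__ == "__main__":
    print(forward_extremes([3, 1, 4, 1, 5, 0]))
    print(density_ratio(list(range(1, 11))))
    print(inflection_points([1, 2, 3, 2, 1, 2]))
```

A ratio r above 1 corresponds to the backward-leaning weight of an upward trend, and a ratio below 1 to a downward trend.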
After the trend characteristics are extracted, they are input into a trend characteristic judger to generate a trend index, whose value lies in [-1, 1].
The core logic for calculating the trend index is to convert qualitative trend characteristics into quantitative trend indexes through a decision tree, and finally integrate the trend indexes of all the characteristics to obtain the trend index of the time series data.
In this embodiment, the solving formula of the trend index T is as follows:
Figure BDA0003309570170000111
T is the trend index;
i is the serial number of the feature item;
W_i is the weight of the feature item with serial number i.
According to the decision formula, the key to solve the trend index is to obtain the weight sequence of all the feature items, which is called as a decision model.
In one embodiment, the method of solving the decision model may employ an unsupervised simple decision tree approach.
In another embodiment, the method of solving the decision model may employ a deep classification method by supervised neural networks.
In this process, a characteristic index is defined: a numerical value in the range [0, 1] that quantifies how strongly a measured curve exhibits a certain characteristic. Through the characteristic index, the curve characteristics of time series data become analyzable and deducible, which lays a foundation for automatic, intelligent trend prediction over massive data. For example, the position at which the data takes its maximum value characterizes how far back the maximum of the data curve lies, with characteristic index I = position of the maximum / total amount of data; the characteristic index is 0 when the maximum occurs at the start of the sequence and 1 when it occurs at the end of the sequence. In practice the characteristic index function is more complex: to better fit the actual effect, a specific optimization function is selected to optimize and correct the index, that is, I = f(i), where i is the original characteristic index and f is the optimization function.
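The maximum-position characteristic index can be sketched as below. Dividing a zero-based position by (length - 1) is an assumption chosen so that both endpoint values stated in the text (0 at the start, 1 at the end) hold exactly; the function name is hypothetical.

```python
from typing import Sequence


def max_position_index(series: Sequence[float]) -> float:
    """Characteristic index in [0, 1] for how far back the maximum lies."""
    n = len(series)
    if n == 0:
        raise ValueError("empty sequence")
    if n == 1:
        return 0.0
    # First occurrence of the maximum, zero-based, scaled to [0, 1].
    return series.index(max(series)) / (n - 1)


if __name__ == "__main__":
    print(max_position_index([5, 1, 2]))
    print(max_position_index([1, 5, 2]))
    print(max_position_index([1, 2, 5]))
```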
As shown in fig. 8 to 12, commonly used optimization functions include the exponential type, the hyperbolic tangent type, the piecewise function type, and the like. The figures show only the curve forms; the mathematical formula of a specific optimization function needs to be fine-tuned according to actual conditions. Taking the hyperbolic tangent optimization function as an example, the reference formula is as follows:
Figure BDA0003309570170000121
Figure BDA0003309570170000122
where i is the initial index and I(i) is the optimized index; the actual curve of the optimization function in this example is shown in fig. 13.
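The reference formula itself is contained in images that are not reproduced in this text. Purely as an illustration of what a hyperbolic tangent type optimization function can look like, the sketch below uses an assumed form that maps [0, 1] onto [0, 1], keeps 0, 0.5 and 1 fixed, and pushes intermediate values toward the ends. The form, the `steepness` parameter and its default are assumptions, not the formula of the source.

```python
import math


def tanh_optimize(i: float, steepness: float = 3.0) -> float:
    """Assumed tanh-shaped correction of a raw characteristic index i in [0, 1]."""
    if not 0.0 <= i <= 1.0:
        raise ValueError("characteristic index must lie in [0, 1]")
    # Center on 0.5, squash with tanh, then rescale so the endpoints stay 0 and 1.
    return 0.5 * (1.0 + math.tanh(steepness * (2.0 * i - 1.0)) / math.tanh(steepness))


if __name__ == "__main__":
    for raw in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(raw, round(tanh_optimize(raw), 4))
```

A larger `steepness` makes the curve closer to a step at 0.5; a smaller one makes it closer to the identity.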
After various types of trend characteristics are obtained, the weight of each trend characteristic in trend judgment is determined by combining the statistical characteristics of the data sequence, and finally the overall trend index of the data sequence is obtained. Any feature with quantitative analysis can be regarded as a feature item, and weight information of each feature item is generated in the process of constructing a prediction scheme judgment and classification model.
Basic statistical characteristics:
1. mean bias amount = (mean-min)/(max-min);
2. median bias amount = (median-min)/(max-min);
3. fluctuation ratio = (maximum-minimum)/maximum;
each trend feature has a corresponding weight function:
weight = F (mean bias) + F (median bias) + F (fluctuation ratio);
the trend index of the data series is a weighted average of the individual trend characteristic indices.
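The basic statistical characteristics and the weighted average can be sketched as follows. The weight function F is not specified in the source, so the weights are passed in directly here; the function names and the fallback values for a constant sequence are assumptions.

```python
import statistics
from typing import Dict, Sequence


def basic_statistics(series: Sequence[float]) -> Dict[str, float]:
    """Mean bias, median bias and fluctuation ratio as defined in the text."""
    lo, hi = min(series), max(series)
    span = hi - lo
    mean = sum(series) / len(series)
    median = statistics.median(series)
    return {
        # For a constant sequence (span == 0) the bias is undefined; 0.5 is assumed.
        "mean_bias": (mean - lo) / span if span else 0.5,
        "median_bias": (median - lo) / span if span else 0.5,
        "fluctuation_ratio": span / hi if hi else 0.0,
    }


def trend_index(feature_indices: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the individual trend characteristic indices."""
    total = sum(weights[name] for name in feature_indices)
    if total == 0:
        return 0.0
    return sum(weights[name] * value for name, value in feature_indices.items()) / total


if __name__ == "__main__":
    print(basic_statistics([0, 5, 10]))
    print(trend_index({"point": 1.0, "density": 0.0}, {"point": 3.0, "density": 1.0}))
```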
Step S4: selecting a prediction algorithm according to the trend index and the data characteristics.
Selecting a prediction algorithm according to the trend index and the data characteristics comprises: a prediction algorithm judger evaluates the trend index and the data characteristics and selects a suitable prediction algorithm. Specifically, stationarity is judged from the data characteristics, and a preset prediction algorithm is selected if the sequence is stationary; if the sequence is not stationary, an autocorrelation judgment is performed: if the sequence exhibits autocorrelation, a preset prediction algorithm is selected, and if it does not, a preset prediction algorithm is selected according to the trend index.
Step S5: performing prediction according to the selected prediction algorithm.
The step S5 of predicting according to the selected prediction algorithm may specifically be: using the selected prediction algorithm to predict the data for the time period to be predicted;
in practical applications, predicting according to the selected prediction algorithm includes: predicting the same data sequence separately over short-, medium- and long-term horizons, and performing correction regression on the short-term prediction result using the more reliable medium- and long-term predicted values to obtain a more reasonable prediction result.
For example, if the user specifies that predicted data for the next 5 months is to be obtained, the predicted data for the next 5 months is determined from the historical time series data using the selected prediction algorithm.
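The source does not spell out how the correction regression of the short-term result is computed. As one possible reading only, the sketch below rescales the short-term forecasts so that their total moves toward the longer-horizon total, in proportion to the relative confidence of the two forecasts. The function name, the confidence-weighted blend and the proportional rescaling are all assumptions.

```python
from typing import List, Sequence


def correct_short_term(short_forecast: Sequence[float],
                       long_total: float,
                       short_confidence: float,
                       long_confidence: float) -> List[float]:
    """Pull the short-term forecasts toward a longer-horizon total (assumed scheme)."""
    short_total = sum(short_forecast)
    confidence_sum = short_confidence + long_confidence
    if short_total == 0 or confidence_sum == 0:
        return list(short_forecast)
    # Weight of the long-horizon forecast grows with its relative confidence.
    w = long_confidence / confidence_sum
    target_total = (1 - w) * short_total + w * long_total
    scale = target_total / short_total
    return [x * scale for x in short_forecast]


if __name__ == "__main__":
    daily = [10.0, 10.0, 10.0, 10.0]
    print(correct_short_term(daily, long_total=60.0,
                             short_confidence=0.0, long_confidence=1.0))
```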
In this embodiment, as shown in fig. 13, the data prediction method is implemented by the following processes:
step 1: acquiring a time sequence data sequence to be predicted;
step 2: if the sequence is empty, the prediction result returns 0; if the sequence length is less than 5, directly selecting a linear regression algorithm, calculating a linear equation coefficient through least square, and then predicting;
step 3: when the sequence length is greater than 5, a stationarity judgment is carried out; if the sequence is stationary, ARIMA is selected, and if the accuracy requirement for predicting the stationary sequence is not high, a moving average algorithm can be used instead;
step 4: for non-stationary sequences, an autocorrelation judgment is made: if the maximum value of the correlation coefficient between the time series {X_t} and {X_{t+k}} is greater than 0.8 and k is greater than 2, sequence decomposition is performed first, the trend component and the cycle component are then predicted separately, and the prediction results are finally summed;
step 5: for sequences that fail the autocorrelation judgment, a linear regression algorithm is selected when the trend index is greater than 0.8; for sequences whose trend index is greater than or equal to 0.6 and less than or equal to 0.8, the curve regression, growth rate prediction, exponential smoothing, and polynomial regression algorithms are each applied, with the first 80% of the sequence as training data and the last 20% as test data, and the algorithm with the minimum MSE is selected as the prediction algorithm to perform sequence prediction;
step 6: for sequences with a trend index less than 0.6, a moving average algorithm is applied for prediction.
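Steps 1 to 6 can be condensed into a small dispatch function. This is a sketch under stated assumptions: the stationarity result, the maximum correlation coefficient and its lag k, and the trend index are taken as precomputed inputs; the function name and the returned labels are hypothetical; a sequence of length exactly 5, which the steps do not cover, is routed to the stationarity branch; and the thresholds are applied to the trend index as written, without taking its absolute value.

```python
def select_algorithm(length: int,
                     is_stationary: bool,
                     max_correlation: float,
                     best_lag: int,
                     trend_index: float,
                     high_accuracy: bool = True) -> str:
    """Return a label for the prediction algorithm chosen by steps 1 to 6."""
    if length == 0:
        return "return_zero"                 # step 2: empty sequence
    if length < 5:
        return "linear_regression"           # step 2: very short sequence
    if is_stationary:                        # step 3
        return "arima" if high_accuracy else "moving_average"
    if max_correlation > 0.8 and best_lag > 2:
        return "decompose_trend_and_cycle"   # step 4
    if trend_index > 0.8:
        return "linear_regression"           # step 5, strong trend
    if 0.6 <= trend_index <= 0.8:
        return "min_mse_candidate"           # step 5, 80/20 MSE comparison
    return "moving_average"                  # step 6


if __name__ == "__main__":
    print(select_algorithm(30, False, 0.9, 7, 0.1))
    print(select_algorithm(30, False, 0.1, 1, 0.7))
```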
An apparatus for data prediction, the apparatus comprising:
an acquisition unit configured to acquire historical time series data;
the data characteristic acquisition unit is used for judging the data characteristics of the historical time sequence data;
the trend index judging unit is used for obtaining a trend index of historical time sequence data according to the data characteristics;
the prediction algorithm judging unit is used for selecting a prediction algorithm according to the trend index and the data characteristics;
and the prediction unit is used for predicting according to the selected prediction algorithm.
The implementation of the invention has the following advantages: the data prediction algorithm selection method, the data prediction method and the data prediction device of the invention perform non-parametric trend judgment and prediction on data sequences of various forms by acquiring historical time series data, judging the data characteristics of the historical time series data, obtaining a trend index of the historical time series data according to the data characteristics, selecting a prediction algorithm according to the trend index and the data characteristics, and predicting according to the selected prediction algorithm; the reliability of a prediction result is measured by a confidence degree, and short-term prediction results with low confidence are corrected by regression against long-term prediction results with high confidence. The method is applicable to time series data of any shape and any length (length >= 1). A prediction degree of X means that the next X predicted values are obtained from a given data sequence; for example, if the original data is the host CPU utilization over the last 30 days and the prediction degree is 7, the prediction algorithm returns the CPU utilization for the next seven days as the prediction result. A suitable prediction algorithm is selected for the prediction, and the reliability of the prediction result decreases as the prediction degree increases. The same data sequence is predicted separately at short (day), medium (week) and long (month) horizons, and the short-term prediction result is then corrected by regression using the more reliable medium- and long-term predicted values to obtain a more reasonable prediction result.
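The notion of a prediction degree X can be illustrated with the least-squares linear regression that the method uses for very short sequences. This is a minimal sketch: the function name is hypothetical, the time axis is assumed to be the integer positions 0, 1, 2, ..., and repeating the single value of a length-1 sequence is an assumption.

```python
from typing import List, Sequence


def linear_forecast(series: Sequence[float], degree: int) -> List[float]:
    """Fit y = intercept + slope * t by least squares and return the next `degree` values."""
    n = len(series)
    if n == 0:
        return [0.0] * degree            # an empty sequence predicts 0
    if n == 1:
        return [float(series[0])] * degree
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return [intercept + slope * (n + h) for h in range(degree)]


if __name__ == "__main__":
    # Prediction degree 3: the next three values after the given sequence.
    print(linear_forecast([1, 2, 3, 4], 3))
```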
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention disclosed herein should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A method for selecting a data prediction algorithm, the method comprising the steps of:
acquiring historical time sequence data;
judging the data characteristics of historical time sequence data;
obtaining a trend index of historical time series data according to the data characteristics;
and selecting a prediction algorithm according to the trend index and the data characteristics.
2. The method of claim 1, wherein said determining data characteristics of historical time series data comprises: performing time series data curve feature extraction, and generating a characteristic index of the curve from multiple dimensions.
3. The method of claim 2, wherein the characteristic index is a numerical value for quantitatively measuring the degree of the curve's saliency in a characteristic, and the range is [0,1].
4. The data prediction algorithm selection method of claim 2, wherein the performing time series data curve feature extraction, generating a feature index of a curve from a plurality of dimensions comprises: and extracting trend characteristics of the data sequence, wherein the trend characteristics can comprise various types.
5. The data prediction algorithm selection method of claim 4, wherein the obtaining a trend index for historical time series data from data features comprises: after the trend characteristics are extracted, the qualitative trend characteristics are converted into quantitative trend indexes through the decision tree so as to quantitatively measure the trend degree of the data sequence.
6. The data prediction algorithm selection method of claim 4, wherein the obtaining a trend index for historical time series data from data features comprises: after various types of trend characteristics are obtained, the weight of each trend characteristic in trend judgment is determined by combining the statistical characteristics of the data sequence, and finally the overall trend index of the data sequence is calculated.
7. The method of claim 6, wherein the statistical features include mean bias, median bias, and fluctuation fraction; each trend feature has a corresponding weight function: weight = F (mean bias) + F (median bias) + F (fluctuation ratio); the trend index of the data series is a weighted average of the individual trend characteristic indices.
8. The data prediction algorithm selection method of any of claims 1 to 7, wherein selecting a prediction algorithm based on the trend index and the data characteristic comprises: and executing the data by a prediction algorithm judger according to the trend index and the data characteristics, and selecting a proper prediction algorithm.
9. The method of claim 8, wherein selecting a prediction algorithm based on the trend index and the data characteristic comprises: judging stationarity through data characteristics, and selecting a preset prediction algorithm if the stationarity is a stationary sequence; and if the sequence is not stable, performing autocorrelation judgment, if the sequence accords with autocorrelation, selecting a preset prediction algorithm, and if the sequence does not accord with autocorrelation, selecting the preset prediction algorithm according to the trend index.
10. A data prediction method, characterized in that the data prediction method comprises the steps of:
acquiring historical time sequence data;
judging the data characteristics of historical time sequence data;
acquiring a trend index of historical time series data according to the data characteristics;
selecting a prediction algorithm according to the trend index and the data characteristics;
and performing prediction according to the selected prediction algorithm.
11. The data prediction method of claim 10, wherein the predicting according to the selected prediction algorithm comprises: and respectively predicting the same data sequence from different dimensions of short, medium and long lengths, and performing correction regression on the short-term prediction result by using the prediction value with higher reliability in the medium and long periods to obtain a more reasonable prediction result.
12. A data prediction apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to acquire historical time series data;
the data characteristic acquisition unit is used for judging the data characteristics of the historical time sequence data;
the trend index judging unit is used for obtaining a trend index of historical time sequence data according to the data characteristics;
the prediction algorithm judging unit is used for selecting a prediction algorithm according to the trend index and the data characteristics;
and the prediction unit is used for predicting according to the selected prediction algorithm.
CN202111213110.2A 2021-08-07 2021-10-19 Data prediction algorithm selection method, data prediction method and data prediction device Pending CN115705411A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110904957 2021-08-07
CN2021109049579 2021-08-07

Publications (1)

Publication Number Publication Date
CN115705411A true CN115705411A (en) 2023-02-17

Family

ID=85180595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111213110.2A Pending CN115705411A (en) 2021-08-07 2021-10-19 Data prediction algorithm selection method, data prediction method and data prediction device

Country Status (1)

Country Link
CN (1) CN115705411A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination