Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Before describing the solution provided in the present specification, the inventive concept of the present solution will be described.
As described in the background, conventional time series prediction methods have two problems. First, they require a large amount of accumulated historical data for the index. Second, they cannot handle scenarios in which the index rises or falls abruptly.
Regarding the first problem, the present scheme uses the history data (hereinafter referred to as history sequences) of related indexes. A related index herein may refer to an index of a service similar to the service to which the current index belongs. Because the indexes of similar services often develop along consistent trends, the scheme can combine the history data of related indexes to predict the time sequence of the business index.
Regarding the second problem, an activity sequence (also referred to as an activity component) may be added to reflect sequence fluctuations caused by business activity related to the index. In particular, the present specification may fit the activity sequence with a preset activity morphology fitting function. Taking a transient rise as an example of such a fluctuation, there are two cases after the rise: first, the series falls back immediately. Second, the rise persists for a period of time before falling back. The first case may be fitted by an exponential decay function, and the second case by a rectangular decay function.
The foregoing is the inventive concept of the present invention, and the embodiments provided in the present specification can be obtained based on the inventive concept, and the embodiments are described in detail below.
Fig. 1 is a schematic diagram of a time series prediction system for a business index provided in the present specification. As shown in fig. 1, the system may include: a history sequence acquisition module 102, a real-time sequence generation module 104, a decomposition sequence calculation module 106, and a sequence prediction module 108. The history sequence acquisition module 102 is configured to obtain history sequences of the business index and of related indexes of the business index from a preset history sequence library. The real-time sequence generation module 104 is configured to generate a real-time sequence of the business index. The real-time sequence includes index values of the business index at a plurality of past times within the current evaluation period. The decomposition sequence calculation module 106 is configured to determine, through a plurality of iterations, a period sequence (also referred to as a period component), a trend sequence (also referred to as a trend component), and an activity sequence corresponding to the current evaluation period. The sequence prediction module 108 is configured to predict the total time sequence of the business index based on the period sequence, the trend sequence, and the activity sequence determined in the last iteration.
The system may further include a preprocessing module 110 for preprocessing the real-time sequence and the history sequences. The preprocessing is described later.
Fig. 2 is a flowchart of a method for predicting a time sequence of a business index according to an embodiment of the present specification. The method may be executed by any device with processing capabilities, such as a server, a system, or an apparatus, for example the time series prediction system of fig. 1. As shown in fig. 2, the method may specifically include:
Step 202, obtaining a real-time sequence of the business index.
For example, the real-time sequence of the business index may be obtained by the real-time sequence generation module 104.
The real-time sequence may include index values of the business index at a plurality of past times within the current evaluation period. The business index herein may refer to data that changes continuously over time, for example a transaction amount or an access count. The evaluation period has a specified duration, which may be, for example, one day, one month, or one year. Taking an evaluation period of one day as an example, the current evaluation period may be the current day, and the plurality of past times in the current evaluation period may refer to each minute from 00:00 to the present time of the current day.
Step 204, a plurality of history sequences is obtained.
For example, a plurality of history sequences may be acquired by the history sequence acquisition module 102.
Each history sequence includes index values of the business index or of a related index at a plurality of history times within one history evaluation period. As described above, a related index herein refers to an index of a service similar to the service to which the business index belongs. Taking the current day as the current evaluation period as an example, one history evaluation period may be one past day, and the plurality of history times in that history evaluation period may be each minute from 00:00 to 24:00 of that past day.
In one implementation, a plurality of history sequences may be obtained from a predefined history sequence library. The predefined history sequence library may store a history sequence of the business index and a plurality of history sequences corresponding to a plurality of related indexes of the business index.
Step 206, determining a historical trend sequence within the evaluation period duration based on the plurality of historical sequences.
In one example, the real-time sequence and the history sequences may be preprocessed by the preprocessing module 110 before the determining step is performed. The preprocessing herein may include any one or more of the following: missing value completion, wavelet transform, normalization, sequence screening, and the like. Missing value completion fills in an index value that is missing at a certain time in the real-time sequence or a history sequence by interpolation. The wavelet transform is used to remove spikes from the real-time sequence or a history sequence. Normalization transforms each index value in the real-time sequence or a history sequence into a value within a specified range (e.g., [0, 1]). Sequence screening selects related sequences from the history sequences by slicing or by calculating Pearson coefficients.
The wavelet transform and normalization processing will be described below with reference to fig. 3. In fig. 3, the 5 curves on the left are the 5 history sequences obtained initially. After wavelet transform and normalization of the 5 history sequences, the 5 curves on the right are obtained. As can be seen from fig. 3, the index values included in each curve on the right are all within the range [0, 1].
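As an illustration of two of the preprocessing operations named above, the following is a minimal sketch of missing value completion by linear interpolation and of normalization into [0, 1]. The function names `fill_missing` and `min_max_normalize`, the use of `None` to mark a missing value, and the choice of min-max scaling are assumptions made for this example; the specification does not fix a particular interpolation or normalization method, and the wavelet transform is not shown.

```python
from typing import List, Optional


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between known neighbours.

    Leading or trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("sequence has no observed values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant sequence carries no shape information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Applying `fill_missing` before `min_max_normalize` keeps the minimum and maximum from being computed over gaps.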
Returning to fig. 2, in step 206, determining the historical trend sequence within the evaluation period duration based on the plurality of history sequences may specifically be as follows. The distance between the real-time sequence and each history sequence is calculated. According to the calculated distances, the plurality of history sequences are clustered with a clustering algorithm (e.g., the DBSCAN algorithm) to obtain a plurality of sequence clusters, and the sequence cluster to which the real-time sequence belongs is determined from the plurality of sequence clusters. The historical trend sequence within the evaluation period duration is then determined based on the history sequences in the determined sequence cluster.
It will be appreciated that when the real-time sequence and the history sequences have been preprocessed, the real-time sequence and the history sequences mentioned in the above steps are the preprocessed ones.
The above process of determining the historical trend sequence is described below in conjunction with fig. 4. In fig. 4, the uppermost curves are the plurality of history sequences stored in the history sequence library. The curves below them are the preprocessed history sequences. Shown next are the plurality of sequence clusters obtained by clustering the history sequences based on their distances from the real-time sequence. Finally, the history sequences corresponding to the different sequence clusters are shown. The history sequences corresponding to a sequence cluster consist of the history sequences of the business index and/or the related indexes in that cluster. The historical trend sequence within the evaluation period duration can be obtained by extracting the seasonal trend of the history sequences of the sequence cluster to which the real-time sequence belongs.
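The distance calculation, clustering, and trend extraction described above can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the method of the specification: the distance is assumed to be Euclidean over the observed prefix, the clustering is a one-dimensional gap-based grouping of the distances (used here in place of DBSCAN), the real-time sequence is assumed to belong to the cluster with the smallest distances, and the seasonal trend is approximated by a point-wise mean. All function names and the `eps` parameter are hypothetical.

```python
import math
from typing import List


def prefix_distance(real_time: List[float], history: List[float]) -> float:
    """Euclidean distance between the real-time sequence and the
    same-length prefix of a history sequence."""
    n = len(real_time)
    return math.sqrt(sum((r - h) ** 2 for r, h in zip(real_time, history[:n])))


def group_by_distance(distances: List[float], eps: float) -> List[List[int]]:
    """Group sequence indices whose sorted distances differ from their
    neighbour by at most eps (a simplified stand-in for DBSCAN).
    Assumes at least one distance is given."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    clusters = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if distances[cur] - distances[prev] <= eps:
            clusters[-1].append(cur)
        else:
            clusters.append([cur])
    return clusters


def historical_trend(real_time: List[float],
                     histories: List[List[float]],
                     eps: float = 0.5) -> List[float]:
    """Point-wise mean of the history sequences in the cluster closest
    to the real-time sequence (assumed trend extraction)."""
    distances = [prefix_distance(real_time, h) for h in histories]
    nearest = group_by_distance(distances, eps)[0]
    length = len(histories[0])
    return [sum(histories[i][t] for i in nearest) / len(nearest)
            for t in range(length)]
```

Because the history sequences cover a whole evaluation period while the real-time sequence covers only its elapsed part, the result has the length of a full period.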
Returning again to fig. 2, the method shown in fig. 2 may further comprise the steps of:
Step 208, obtaining a plurality of decomposition sequences corresponding to the current evaluation period based on the historical trend sequence and the real-time sequence.
The plurality of decomposition sequences here comprise at least an activity sequence, which is determined using a preset activity morphology fitting function. The activity sequence is used to reflect sequence fluctuations caused by business activity associated with the business index.
In one example, the plurality of decomposition sequences are determined by a plurality of iterations.
Taking the i-th iteration as an example, the process of calculating the periodic sequence, the activity sequence and the trend sequence corresponding to the i-th iteration may be:
Step a, acquiring the trend sequence corresponding to the (i-1)-th iteration.
Step b, performing a detrending operation on the real-time sequence based on the trend sequence corresponding to the (i-1)-th iteration to obtain a detrended sequence.
For example, the detrending sequence may be obtained by the following formula.
$DT_t^i = Y_t - T_t^{i-1}$ (Equation 1)
Where $DT_t^i$ is the detrended sequence, $Y_t$ is the real-time sequence, and $T_t^{i-1}$ is the trend sequence corresponding to the (i-1)-th iteration.
Step c, determining the periodic sequence corresponding to the i-th iteration by a nonlinear regression algorithm based on the historical trend sequence and the detrended sequence.
The nonlinear regression algorithm herein may include, but is not limited to, the K-Nearest Neighbor (KNN) algorithm, the XGBoost algorithm, and the like.
Taking the KNN algorithm as an example, the above step of determining the periodic sequence may be expressed as: $S_t^i = \mathrm{KNN}(DT_t^i, Seal)$, where $S_t^i$ is the periodic sequence corresponding to the i-th iteration, $DT_t^i$ is the detrended sequence, and $Seal$ is the historical trend sequence.
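The specification does not spell out how the KNN regression is set up, so the following is only one possible reading, offered as a sketch. It assumes that the value of the historical trend sequence at each time is the feature and the detrended real-time value at the same time is the target, and that the periodic value at every time of the evaluation period (including future times) is the mean target of the k nearest observed neighbours. The function name `knn_period_sequence` and the parameter `k` are hypothetical.

```python
from typing import List


def knn_period_sequence(detrended: List[float],
                        hist_trend: List[float],
                        k: int = 3) -> List[float]:
    """KNN regression sketch: predict a periodic value for every time
    in the evaluation period from the observed (feature, target) pairs.

    detrended  -- detrended real-time values for the elapsed times
    hist_trend -- historical trend values for the whole period
    """
    n = len(detrended)
    if n == 0:
        raise ValueError("no observed values to learn from")
    # Training pairs exist only for the elapsed part of the period.
    pairs = list(zip(hist_trend[:n], detrended))
    k = min(k, n)
    period = []
    for x in hist_trend:
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        period.append(sum(y for _, y in nearest) / k)
    return period
```

Under this reading, times at which the historical trend takes similar values receive similar periodic values, which is how the future part of the period gets filled in.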
Step d, performing a decycling operation on the detrended sequence based on the periodic sequence corresponding to the i-th iteration to obtain a decycled sequence.
For example, the decycled sequence may be obtained by the following formula.
$DS_t^i = DT_t^i - S_t^i$ (Equation 2)
Where $DS_t^i$ is the decycled sequence, $DT_t^i$ is the detrended sequence, and $S_t^i$ is the periodic sequence corresponding to the i-th iteration.
Step e, determining the activity sequence corresponding to the i-th iteration by using a preset activity morphology fitting function based on the decycled sequence.
In one implementation, a plurality of subsequences may be preset, where each subsequence corresponds to one activity morphology fitting function. A subsequence is then selected from the plurality of subsequences in combination with the real-time sequence, and the activity sequence corresponding to the i-th iteration is fitted based on the selected subsequence and its corresponding activity morphology fitting function.
In another implementation, the process of determining the activity sequence may be as follows. Anomaly detection is performed on the decycled sequence with an anomaly detection algorithm, and an abnormal subsequence is extracted. Based on the abnormal subsequence, the decycled sequence is fitted with at least one activity morphology fitting function to obtain at least one fitting result. The at least one fitting result is screened based on the goodness of fit of each fitting result, and the activity sequence corresponding to the i-th iteration is determined based on the screened fitting result.
In one example, the anomaly detection algorithm may be an ESD test (Extreme Studentized Deviate Test) algorithm.
When the anomaly detection algorithm is the ESD test algorithm, the above step of performing anomaly detection on the decycled sequence may be expressed as: $subDS_t^i = \mathrm{ESDtest}(DS_t^i)$, where $subDS_t^i$ is the extracted abnormal subsequence and $DS_t^i$ is the decycled sequence.
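To make the anomaly detection step more concrete, the sketch below removes, one at a time, the point with the largest studentized deviation and then extracts the span between the first and last flagged points. It is a simplified, ESD-style procedure and not the full generalized ESD test: the critical values derived from the t-distribution are replaced by a fixed `threshold`, which is an assumption of this example, as are the function names and the `max_outliers` parameter.

```python
import statistics
from typing import List, Optional, Tuple


def esd_like_anomalies(values: List[float],
                       max_outliers: int = 5,
                       threshold: float = 3.0) -> List[int]:
    """Return the sorted indices of points flagged as anomalous.

    Each round recomputes mean and standard deviation on the remaining
    points and removes the most extreme one if its studentized
    deviation exceeds the (assumed, fixed) threshold.
    """
    remaining = list(range(len(values)))
    anomalies = []
    for _ in range(max_outliers):
        if len(remaining) < 3:
            break
        sample = [values[i] for i in remaining]
        mean = statistics.mean(sample)
        std = statistics.stdev(sample)
        if std == 0:
            break
        worst = max(remaining, key=lambda i: abs(values[i] - mean))
        if abs(values[worst] - mean) / std <= threshold:
            break
        anomalies.append(worst)
        remaining.remove(worst)
    return sorted(anomalies)


def abnormal_subsequence(values: List[float],
                         anomalies: List[int]) -> Tuple[Optional[int], List[float]]:
    """Extract the contiguous span from the first to the last anomaly.

    Returns the start index and the values of the span, or (None, [])
    when no anomaly was flagged.
    """
    if not anomalies:
        return None, []
    start, end = anomalies[0], anomalies[-1]
    return start, values[start:end + 1]
```

The returned start index makes it possible to place a later fit back at the right position in the decycled sequence.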
Furthermore, the at least one activity morphology fitting function may comprise at least one of an exponential decay function and a rectangular decay function. The exponential decay function can be expressed as: $y = a + b e^{-cx}$, where a, b and c are all greater than 0. The rectangular decay function can be expressed as: $y = a - b e^{cx}$, where a, b and c are all greater than 0. The function curves corresponding to the exponential decay function and the rectangular decay function may be as shown in fig. 5a and fig. 5b, respectively.
It should be noted that the step of determining the activity sequence may be expressed as: $A_t^i = \mathrm{actfit}(subDS_t^i)$, where $A_t^i$ is the activity sequence corresponding to the i-th iteration, $\mathrm{actfit}$ is the activity morphology fitting function, and $subDS_t^i$ is the extracted abnormal subsequence.
It should be understood that the activity morphology fitting functions described above in this specification are all set for time series that rise instantaneously. In practical applications, corresponding activity morphology fitting functions may also be set for time series that fall instantaneously, which is not limited in this specification.
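One way to fit the two shapes $y = a + b e^{-cx}$ and $y = a - b e^{cx}$ to an abnormal subsequence and keep the better one by goodness of fit is sketched below. The approach is an assumption of this example: c is searched over a caller-supplied grid, a and b are obtained for each c by ordinary least squares, and the goodness of fit is taken to be the coefficient of determination. The names `linear_fit`, `r_squared`, `fit_activity` and `c_grid` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def linear_fit(basis: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for ys ~ intercept + slope * basis."""
    n = len(ys)
    mean_u = sum(basis) / n
    mean_y = sum(ys) / n
    var_u = sum((u - mean_u) ** 2 for u in basis)
    if var_u == 0:
        return mean_y, 0.0
    slope = sum((u - mean_u) * (y - mean_y) for u, y in zip(basis, ys)) / var_u
    return mean_y - slope * mean_u, slope


def r_squared(ys: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination, used here as the goodness of fit."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def fit_activity(ys: Sequence[float],
                 c_grid: Sequence[float]) -> Optional[Tuple[str, float, float, float, float]]:
    """Try both shapes over a grid of c and return the best fit as
    (shape name, a, b, c, goodness of fit), or None if no candidate
    satisfies a > 0 and b > 0."""
    best = None
    for name, sign in (("exponential", -1.0), ("rectangular", 1.0)):
        for c in c_grid:
            basis = [math.exp(sign * c * x) for x in range(len(ys))]
            a, slope = linear_fit(basis, ys)
            # exponential: y = a + b*exp(-c*x), so b is the slope
            # rectangular: y = a - b*exp(c*x), so b is minus the slope
            b = slope if sign < 0 else -slope
            if a <= 0 or b <= 0:
                continue
            fitted = [a + slope * u for u in basis]
            score = r_squared(ys, fitted)
            if best is None or score > best[4]:
                best = (name, a, b, c, score)
    return best
```

Fixing c on a grid keeps the remaining problem linear in a and b, which avoids an iterative nonlinear solver at the cost of resolution in c.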
Step f, performing a deactivation operation on the decycled sequence based on the activity sequence corresponding to the i-th iteration to obtain a deactivated sequence.
For example, the deactivated sequence may be obtained by the following formula.
$DA_t^i = DS_t^i - A_t^i$ (Equation 3)
Where $DA_t^i$ is the deactivated sequence, $DS_t^i$ is the decycled sequence, and $A_t^i$ is the activity sequence corresponding to the i-th iteration.
Step g, performing trend smoothing on the deactivated sequence with a trend smoothing algorithm to obtain the trend sequence corresponding to the i-th iteration, denoted $T_t^i$.
Of course, in practical applications, the execution order of the detrending operation, the decycling operation and the deactivation operation may also be adjusted, for example to: the detrending operation, the deactivation operation, and then the decycling operation, which is not limited in this specification.
After step g is completed, the total time sequence corresponding to the i-th iteration, $S_t^i + A_t^i + T_t^i$, may be determined based on $S_t^i$, $A_t^i$ and $T_t^i$, and it is judged whether the goodness of fit of the total time sequence to the real-time sequence is greater than a threshold N. If so, the iteration ends. Otherwise, the (i+1)-th iteration is performed.
It can be seen from the above steps that, in the current iteration process, the trend sequence obtained in the previous iteration is referred to, so that each decomposition sequence calculated in the iteration process is more accurate, that is, each decomposition sequence can be more and more accurate by executing the above steps iteratively.
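Steps a to g and the stopping rule can be summarised in a small skeleton. This is a sketch under several assumptions rather than the implementation of the specification: the period and activity estimators are passed in as callables, the initial trend is taken to be all zeros, the trend smoothing algorithm is a centred moving average, the goodness of fit is the coefficient of determination, and all sequences are restricted to the elapsed part of the evaluation period. The names `decompose`, `moving_average`, `threshold_n` and `max_iter` are hypothetical.

```python
from typing import Callable, List, Tuple

Seq = List[float]


def subtract(a: Seq, b: Seq) -> Seq:
    return [x - y for x, y in zip(a, b)]


def moving_average(values: Seq, window: int = 3) -> Seq:
    """Centred moving average with a truncated window at the edges
    (an assumed choice of trend smoothing algorithm)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def r_squared(actual: Seq, fitted: Seq) -> float:
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def decompose(real_time: Seq,
              estimate_period: Callable[[Seq], Seq],
              estimate_activity: Callable[[Seq], Seq],
              threshold_n: float = 0.95,
              max_iter: int = 20) -> Tuple[Seq, Seq, Seq, int]:
    """Iterate steps a to g until the goodness of fit exceeds threshold_n
    or max_iter is reached. Returns (period, activity, trend, iterations)."""
    trend = [0.0] * len(real_time)                 # assumed initial trend
    period, activity, iterations = trend, trend, 0
    for iterations in range(1, max_iter + 1):
        detrended = subtract(real_time, trend)     # Equation 1
        period = estimate_period(detrended)        # step c
        decycled = subtract(detrended, period)     # Equation 2
        activity = estimate_activity(decycled)     # step e
        deactivated = subtract(decycled, activity) # Equation 3
        trend = moving_average(deactivated)        # step g
        total = [s + a + t for s, a, t in zip(period, activity, trend)]
        if r_squared(real_time, total) > threshold_n:
            break
    return period, activity, trend, iterations
```

Passing the estimators as callables keeps the loop independent of whether KNN, XGBoost, or another method is used for the periodic sequence.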
In addition, steps 206 and 208 described above may be performed by the decomposition sequence calculation module 106.
Step 210, determining a total time sequence corresponding to the current evaluation period based on the plurality of decomposition sequences.
The total time series is used to predict an index value for a future time instant within the current evaluation period.
For example, the total time sequence corresponding to the current evaluation period may be determined by the sequence prediction module 108 based on the period sequence, the activity sequence, and the trend sequence corresponding to the last iteration.
Again taking the current day as the current evaluation period, the future times within the current evaluation period may refer to each minute from the present time to 24:00 of the current day.
The above-described step of determining the total time series may be expressed specifically as the following formula.
$predict = S_t^k + A_t^k + T_t^k$ (Equation 4)
Where $predict$ is the total time sequence, and $S_t^k$, $A_t^k$ and $T_t^k$ are respectively the periodic sequence, the activity sequence and the trend sequence corresponding to the last iteration.
It will be appreciated that the total time sequence determined in this step includes both the index values of the business index at the past times and the index values at the future times within the current evaluation period. In one example, the total time sequence and the decomposition sequences may be as shown in fig. 6. In fig. 6, the total time sequence and the periodic sequence, the activity sequence, and the trend sequence constituting the total time sequence are shown from top to bottom, respectively.
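Equation 4 and the split into past and future index values amount to a point-wise sum followed by a slice, as in the minimal sketch below. It assumes the three component sequences cover the whole evaluation period and have equal length; the function names and the `n_observed` parameter are hypothetical.

```python
from typing import List


def total_sequence(period: List[float],
                   activity: List[float],
                   trend: List[float]) -> List[float]:
    """Point-wise sum of the three components (Equation 4)."""
    if not (len(period) == len(activity) == len(trend)):
        raise ValueError("component sequences must have equal length")
    return [s + a + t for s, a, t in zip(period, activity, trend)]


def future_values(total: List[float], n_observed: int) -> List[float]:
    """Predicted index values for the times after the last observed one."""
    return total[n_observed:]
```

With a one-day evaluation period sampled every minute, `n_observed` would be the number of minutes elapsed since 00:00.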
It should be noted that, when determining the total time sequence, by adding the active sequence, the scheme can be compatible with the situation that the time sequence is raised or lowered instantaneously, so that the robustness of the time sequence prediction method provided by the embodiment can be greatly improved.
Finally, it should also be noted that the present scheme may also set detection rules for checking the determined total time sequence. The detection rules here may be, for example: detecting whether the predicted card issuance exceeds the inventory, detecting fluctuations of the index value, and the like.
In summary, the time series prediction method provided in the embodiments of the present specification can predict the time sequence of a business index based on the history sequences of related indexes of the business index, which solves the problem in the conventional art that a large amount of history data of the business index must be accumulated. In addition, the scheme also determines the activity sequence, so it can handle scenarios in which the time sequence rises or falls instantaneously.
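The specification names the detection rules without defining them, so the two checks below are purely illustrative. They assume that the inventory rule compares the sum of the predicted issuance with the available stock, and that the fluctuation rule flags a relative change between consecutive values above a limit. The function names and the `max_relative_change` parameter are hypothetical.

```python
from typing import List


def exceeds_inventory(predicted_issuance: List[float], inventory: float) -> bool:
    """True when the total predicted issuance is larger than the inventory."""
    return sum(predicted_issuance) > inventory


def fluctuation_alerts(values: List[float], max_relative_change: float) -> List[int]:
    """Indices at which the value changes by more than the allowed
    fraction relative to the previous value."""
    alerts = []
    for i in range(1, len(values)):
        prev = values[i - 1]
        if prev == 0:
            continue  # relative change is undefined against zero
        if abs(values[i] - prev) / abs(prev) > max_relative_change:
            alerts.append(i)
    return alerts
```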
The above is an overall description of the time series prediction method. The reason why nonlinear regression is adopted when determining the periodic sequence is explained below.
In fact, in addition to the nonlinear regression algorithm, the inventors also tried linear regression algorithms such as RANSAC. The fitting results based on the linear regression algorithm and the nonlinear regression algorithm may be as shown in fig. 7. In fig. 7, curve 1 is the real sequence, curve 2 is the fitting result of the nonlinear regression algorithm, and curve 3 is the fitting result of the linear regression algorithm. As can be seen from fig. 7, the nonlinear regression algorithm fits the real sequence better. Thus, the present scheme selects a nonlinear regression algorithm.
The applicant considers that the fitting effects of the two algorithms differ greatly for the following reason: the daytime trend and the early-morning trend of different businesses tend to differ, so it is difficult to fit the data of the whole day by linear regression based on the early-morning data alone.
Corresponding to the above method for predicting the time sequence of a business index, an embodiment of the present specification further provides an apparatus for predicting the time sequence of a business index. As shown in fig. 8, the apparatus may include:
An acquisition unit 802, configured to acquire a real-time sequence of the business index, the real-time sequence comprising index values of the business index at a plurality of past times in the current evaluation period.
The acquisition unit 802 is further configured to acquire a plurality of history sequences, where each history sequence includes index values of the business index or of a related index at a plurality of history times in one history evaluation period.
A determining unit 804, configured to determine a historical trend sequence within the evaluation period duration based on the plurality of historical sequences acquired by the acquiring unit 802.
The determining unit 804 may specifically be configured to:
distances between the real-time sequence and each of the historical sequences are calculated separately.
And clustering the plurality of historical sequences by adopting a clustering algorithm according to the calculated distance to obtain a plurality of sequence clusters, and determining the sequence cluster to which the real-time sequence belongs from the plurality of sequence clusters.
And determining a historical trend sequence in the evaluation period duration based on the historical sequences in the determined sequence clusters.
The determining unit 804 may be further specifically configured to:
The real-time sequence and the history sequences are preprocessed.
The distance between the preprocessed real-time sequence and each preprocessed history sequence is calculated.
The preprocessing may include one or more of the following: missing value completion, wavelet transform, normalization and sequence screening.
The acquisition unit 802 is further configured to obtain a plurality of decomposition sequences corresponding to the current evaluation period based on the historical trend sequence determined by the determining unit 804 and the real-time sequence. The plurality of decomposition sequences include at least an activity sequence determined using a preset activity morphology fitting function, the activity sequence being used to reflect sequence fluctuations caused by business activity associated with the business index.
The determining unit 804 is further configured to determine a total time sequence corresponding to the current evaluation period based on the plurality of decomposition sequences acquired by the acquiring unit 802, where the total time sequence is used to predict the index value of the future time within the current evaluation period.
Optionally, the plurality of decomposition sequences may further include: a trend sequence and a periodic sequence.
The acquisition unit 802 may specifically be configured to:
The following steps are performed in multiple iterations:
Acquiring the trend sequence corresponding to the previous iteration.
Performing a detrending operation on the real-time sequence based on the trend sequence corresponding to the previous iteration to obtain a detrended sequence.
Determining the periodic sequence corresponding to the current iteration by a nonlinear regression algorithm based on the historical trend sequence and the detrended sequence.
Here, the nonlinear regression algorithm includes any one of the following: the K-nearest neighbor (KNN) algorithm and the XGBoost algorithm.
Performing a decycling operation on the detrended sequence based on the periodic sequence corresponding to the current iteration to obtain a decycled sequence.
Determining the activity sequence corresponding to the current iteration by using a preset activity morphology fitting function based on the decycled sequence.
Performing a deactivation operation on the decycled sequence based on the activity sequence corresponding to the current iteration to obtain a deactivated sequence.
Performing trend smoothing on the deactivated sequence with a trend smoothing algorithm to obtain the trend sequence corresponding to the current iteration.
The acquisition unit 802 may also be specifically configured to:
Performing anomaly detection on the decycled sequence with an anomaly detection algorithm, and extracting an abnormal subsequence.
Fitting the decycled sequence with at least one activity morphology fitting function based on the abnormal subsequence to obtain at least one fitting result.
Screening the at least one fitting result based on the goodness of fit of each fitting result, and determining the activity sequence corresponding to the current iteration based on the screened fitting result.
The at least one activity morphology fitting function may comprise at least one of an exponential decay function and a rectangular decay function.
The functions of the functional modules of the apparatus in the foregoing embodiments of the present disclosure may be implemented by the steps of the foregoing method embodiments, so that the specific working process of the apparatus provided in one embodiment of the present disclosure is not repeated herein.
In the time series prediction apparatus for a business index provided in one embodiment of the present specification, the acquisition unit 802 acquires a real-time sequence of the business index, which includes index values of the business index at a plurality of past times in the current evaluation period. The acquisition unit 802 acquires a plurality of history sequences, each including index values of the business index or of a related index at a plurality of history times within one history evaluation period. The determining unit 804 determines a historical trend sequence within the evaluation period duration based on the plurality of history sequences. The acquisition unit 802 obtains a plurality of decomposition sequences corresponding to the current evaluation period based on the historical trend sequence and the real-time sequence. The plurality of decomposition sequences include at least an activity sequence determined using a preset activity morphology fitting function, the activity sequence being used to reflect sequence fluctuations caused by business activity associated with the business index. The determining unit 804 determines a total time sequence corresponding to the current evaluation period based on the plurality of decomposition sequences, the total time sequence being used to predict index values at future times within the current evaluation period. Thus, the time sequence of the index can be accurately predicted.
Corresponding to the above method for predicting the time sequence of a business index, an embodiment of the present specification further provides a device for predicting the time sequence of a business index. As shown in fig. 9, the device may include: a memory 902, one or more processors 904, and one or more programs. The one or more programs are stored in the memory 902 and configured to be executed by the one or more processors 904, and the programs, when executed by the processors 904, perform the following steps:
A real-time sequence of the business index is obtained, the real-time sequence comprising index values of the business index at a plurality of past times in the current evaluation period.
A plurality of history sequences are acquired, each history sequence comprising index values of business indexes or related indexes at a plurality of history moments in a history evaluation period.
A historical trend sequence is determined over an evaluation period duration based on the plurality of historical sequences.
Based on the historical trend sequence and the real-time sequence, a plurality of decomposition sequences corresponding to the current evaluation period are obtained. The plurality of decomposition sequences at least comprises an activity sequence, wherein the activity sequence is determined by a preset activity form fitting function and is used for reflecting sequence fluctuation caused by the business activity related to the business index.
Based on the plurality of decomposition sequences, a total time sequence corresponding to the current evaluation period is determined. The total time series is used to predict an index value for a future time instant within the current evaluation period.
The time sequence prediction device for the business index provided by the embodiment of the specification can accurately predict the time sequence of the index.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware, or may be embodied in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a server. The processor and the storage medium may also reside as discrete components in a server.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on, or transmitted as one or more instructions or code on, a computer-readable medium. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing detailed description of the embodiments has further described the objects, technical solutions and advantages of the present specification, and it should be understood that the foregoing description is only a detailed description of the embodiments of the present specification, and is not intended to limit the scope of the present specification, but any modifications, equivalents, improvements, etc. made on the basis of the technical solutions of the present specification should be included in the scope of the present specification.