CN109754115B - Data prediction method and device, storage medium and electronic equipment

Info

Publication number: CN109754115B
Application number: CN201811475791.8A
Authority: CN
Inventors: 孙木鑫
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2018-12-04
Filing date: 2018-12-04
Publication date: 2021-03-26
Anticipated expiration: 2038-12-04
Also published as: CN109754115A

Abstract

The present disclosure relates to a data prediction method and apparatus, a storage medium, and an electronic device. The method comprises: acquiring a plurality of historical time-series data; converting the historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the historical time-series data; obtaining an identification vector set according to the first time-series data vector set; determining a target identification vector set according to the identification vector set and the first time-series data vector set; determining an activation vector set according to the first time-series data vector set and the target identification vector set; acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of data to be predicted according to the density function; and predicting, according to the prediction density function, the probability that the data to be predicted meets the preset condition.

Description

Data prediction method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of data prediction, and in particular, to a method, an apparatus, a storage medium, and an electronic device for data prediction.

Background

Time series prediction technology predicts the development trend of data from ordered data associated with a time sequence, in order to guide the solution of practical problems. Today, the prediction of time-series data plays an extremely important role in many industries. For example, the banking industry uses it to predict changes in daily transaction amounts; exchanges use it to predict how stock prices change in the stock market; and it is used to detect the future trend of key indicators of an application system, such as CPU, memory, and HTTP response time.

However, with the rapid development of computer software technology, the scale of data keeps growing and the complexity of time-series data keeps increasing, so that the regularity of data changes is difficult to mine.

Disclosure of Invention

The invention aims to provide a data prediction method, a data prediction device, a storage medium and an electronic device.

In a first aspect, a method for data prediction is provided, the method comprising: acquiring a plurality of historical time series data; converting the historical time series data into a first time series data vector set according to the acquisition time corresponding to the historical time series data; obtaining an identification vector set according to the first time series data vector set; determining a target identification vector set according to the identification vector set and the first time series data vector set; determining an activation vector set according to the first time series data vector set and the target identification vector set; acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of data to be predicted according to the density function; and predicting the probability that the data to be predicted meets the preset conditions according to the prediction density function.

Optionally, the converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition time corresponding to the plurality of historical time-series data includes: converting the historical time-series data into a first time-series data vector set according to the acquisition time corresponding to the plurality of historical time-series data through the following formula, wherein the formula comprises:

y_j = [x_{j-m}, x_{j-m+1}, ..., x_j]^T

where x_j represents the historical time-series data collected at time j among the plurality of historical time-series data, x_{j-m} represents the historical time-series data collected at time j-m among the plurality of historical time-series data, y_j is a data vector in the first time-series data vector set [y_{m+1}, y_{m+2}, ..., y_t], and the value range of j includes m+1 to t.

Optionally, the determining a set of target identification vectors from the set of identification vectors and the set of first time series data vectors comprises: circularly executing the step of updating the identification vector set until a cycle termination condition is met, and determining the identification vector set when the cycle termination condition is met as the target identification vector set; the step of updating the identification vector set comprises the following steps: calculating a first distance corresponding to each data vector in the first set of time series data vectors, the first distance comprising a distance of a first data vector from each identification vector in the set of identification vectors, the first data vector comprising any data vector in the first set of time series data vectors; determining a target distance corresponding to the first data vector from the first distances, and determining that an identification vector corresponding to the target distance is a target identification vector corresponding to the first data vector, wherein the target distance comprises a distance with the minimum first distance; calculating a mean vector of data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; determining the target identification vector set according to the updated target identification vector; the cycle termination condition includes that the target distance remains unchanged after a first preset number of consecutive cycles.

Optionally, the determining an activation vector set from the first time series data vector set and the target identification vector set comprises: determining a target vector corresponding to the data to be predicted from the first time series data vector set; calculating a second distance between the target vector and each target identification vector in the set of target identification vectors; and in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distance to obtain the activated vector set.

Optionally, the determining a prediction density function of the data to be predicted according to the density function includes: calculating the information entropy of each data vector in the activated vector set according to the density function; and determining the prediction density function according to the information entropy.

In a second aspect, a data prediction apparatus is provided, the apparatus comprising: a first acquisition module, configured to acquire a plurality of historical time-series data; a data conversion module, configured to convert the historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the historical time-series data; a second acquisition module, configured to obtain an identification vector set according to the first time-series data vector set; a third obtaining module, configured to determine a target identification vector set according to the identification vector set and the first time-series data vector set; a first determining module, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set; a second determining module, configured to acquire a density function of each data vector in the activation vector set under a preset condition and determine a prediction density function of data to be predicted according to the density function; and a prediction module, configured to predict, according to the prediction density function, the probability that the data to be predicted meets the preset condition.

Optionally, the data conversion module is configured to convert the historical time-series data into a first time-series data vector set according to a collection time corresponding to a plurality of the historical time-series data by using the following formula, where the formula includes:

y_j = [x_{j-m}, x_{j-m+1}, ..., x_j]^T

Optionally, the third obtaining module is configured to perform the step of updating the identification vector set in a loop until a loop termination condition is met, and determine the identification vector set when the loop termination condition is met as the target identification vector set; the step of updating the identification vector set comprises the following steps: calculating a first distance corresponding to each data vector in the first set of time series data vectors, the first distance comprising a distance of a first data vector from each identification vector in the set of identification vectors, the first data vector comprising any data vector in the first set of time series data vectors; determining a target distance corresponding to the first data vector from the first distances, and determining that an identification vector corresponding to the target distance is a target identification vector corresponding to the first data vector, wherein the target distance comprises a distance with the minimum first distance; calculating a mean vector of data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; determining the target identification vector set according to the updated target identification vector; the cycle termination condition includes that the target distance remains unchanged after a first preset number of consecutive cycles.

Optionally, the first determining module is configured to determine, from the first time-series data vector set, a target vector corresponding to the data to be predicted; calculating a second distance between the target vector and each target identification vector in the set of target identification vectors; and in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distance to obtain the activated vector set.

Optionally, the second determining module is configured to calculate an information entropy of each data vector in the active vector set according to the density function; and determining the prediction density function according to the information entropy.

In a third aspect, a computer readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method according to the first aspect of the disclosure.

In a fourth aspect, an electronic device is provided, comprising: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.

By the technical scheme, a plurality of historical time sequence data can be acquired; converting the historical time series data into a first time series data vector set according to the acquisition time corresponding to the historical time series data; obtaining an identification vector set according to the first time series data vector set; determining a target identification vector set according to the identification vector set and the first time series data vector set; determining an activation vector set according to the first time series data vector set and the target identification vector set; acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of data to be predicted according to the density function; and predicting the probability that the data to be predicted meets the preset conditions according to the prediction density function, so that the prediction result can be given in the form of the density function of the data to be predicted, the probability that the data to be predicted meets different preset conditions can be displayed to a user according to the density function, and a higher reference value is provided for the actual service demand.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

FIG. 1 is a flow diagram illustrating a method of data prediction in accordance with an exemplary embodiment;

FIG. 2 is a flow diagram illustrating yet another method of data prediction in accordance with an exemplary embodiment;

FIG. 3 is a block diagram illustrating an apparatus for data prediction in accordance with an exemplary embodiment;

FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

The present disclosure provides a data prediction method and apparatus, a storage medium, and an electronic device. A plurality of historical time-series data are acquired; the historical time-series data are converted into a first time-series data vector set according to the acquisition times corresponding to the historical time-series data; an identification vector set is obtained according to the first time-series data vector set; a target identification vector set is determined according to the identification vector set and the first time-series data vector set; an activation vector set is determined according to the first time-series data vector set and the target identification vector set; a density function of each data vector in the activation vector set under a preset condition is acquired, and a prediction density function of data to be predicted is determined according to the density function; and the probability that the data to be predicted meets the preset condition is predicted according to the prediction density function. The prediction result can thus be given in the form of the density function of the data to be predicted, so that the probability that the data to be predicted meets different preset conditions can be presented to a user according to the density function, which provides a higher reference value for actual service requirements.

Specific embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

FIG. 1 is a flow chart illustrating a data prediction method according to an exemplary embodiment. As shown in FIG. 1, the method comprises the following steps:

S101, acquiring a plurality of historical time-series data.

Time-series data are data collected at different points in time in chronological order, and can be used to describe how data changes over time. For example, the time-series data may include the daily transaction amount of the banking industry, stock prices of the stock market, and the response time of an application system. The plurality of historical time-series data may include a first preset number of time-series data collected within a preset historical time period.

S102, converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the plurality of historical time-series data.

In this step, the historical time-series data may be converted into a first time-series data vector set according to a plurality of acquisition times corresponding to the historical time-series data by the following formula, where the formula includes:

y_j = [x_{j-m}, x_{j-m+1}, ..., x_j]^T

where the plurality of historical time-series data can be represented as [x₁, x₂, ..., x_t], x_j represents the historical time-series data collected at time j among the plurality of historical time-series data, x_{j-m} represents the historical time-series data collected at time j-m among the plurality of historical time-series data, the first time-series data vector set can be represented as [y_{m+1}, y_{m+2}, ..., y_t], y_j is a data vector in the first time-series data vector set [y_{m+1}, y_{m+2}, ..., y_t], and the value range of j includes m+1 to t. In addition, m may be a preset value, and its size may be set according to the service requirements of the actual application scenario.

After S102 is executed, each of the plurality of historical time-series data acquired after the m-th acquisition time may be transformed into a column vector composed of target time-series data and m data before the target time-series data, where the target time-series data is any one of the plurality of historical time-series data acquired after the m-th acquisition time.
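The conversion in S102 can be sketched as a sliding window over the series. The following is a minimal illustration only, not the disclosed implementation: the function name `to_window_vectors` and the sample values are hypothetical, and the series is assumed to be a plain list ordered by acquisition time.

```python
from typing import List, Sequence


def to_window_vectors(x: Sequence[float], m: int) -> List[List[float]]:
    """Return [y_{m+1}, ..., y_t], where y_j = [x_{j-m}, ..., x_j] (j is 1-based)."""
    t = len(x)
    if m < 0 or m >= t:
        raise ValueError("m must satisfy 0 <= m < len(x)")
    # With 0-based Python indexing, x_j is x[j - 1], so y_j is the slice x[j-m-1 : j].
    return [list(x[j - m - 1 : j]) for j in range(m + 1, t + 1)]


# Ten daily values x_1..x_10 (made-up numbers); m = 5 yields y_6..y_10.
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
vectors = to_window_vectors(x, m=5)
```

Each resulting vector has m + 1 entries: the target value and the m values before it.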

S103, obtaining an identification vector set according to the first time series data vector set.

In this step, a third predetermined number of data vectors may be randomly selected from the first time series data vector set, and the randomly selected third predetermined number of data vectors may be combined into the identification vector set. In addition, to avoid the over-fitting phenomenon, the third predetermined number may be smaller than the number of data vectors in the first time-series data vector set.
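As a hedged sketch of S103, the random selection could look as follows. The helper name `pick_identification_vectors` and the `seed` parameter are assumptions introduced for illustration; the strict check `k < len(vectors)` reflects the over-fitting remark above and is a choice of this sketch.

```python
import random
from typing import List, Optional, Sequence


def pick_identification_vectors(
    vectors: Sequence[Sequence[float]], k: int, seed: Optional[int] = None
) -> List[List[float]]:
    """Randomly choose k distinct data vectors as the initial identification vectors."""
    if not 0 < k < len(vectors):
        raise ValueError("k must be positive and smaller than the number of data vectors")
    rng = random.Random(seed)  # a seed is used only to make the illustration repeatable
    return [list(v) for v in rng.sample(list(vectors), k)]


# Five made-up data vectors standing in for [y_6, ..., y_10]; K = 3.
data_vectors = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
identification = pick_identification_vectors(data_vectors, k=3, seed=0)
```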

S104, determining a target identification vector set according to the identification vector set and the first time series data vector set.

In this step, the step of updating the identification vector set may be executed in a loop until a loop termination condition is satisfied, and the identification vector set at the time the loop termination condition is satisfied is determined as the target identification vector set. The step of updating the identification vector set comprises: calculating first distances corresponding to each data vector in the first time-series data vector set, the first distances comprising the distance between a first data vector and each identification vector in the identification vector set, where the first data vector is any data vector in the first time-series data vector set; determining, from the first distances, a target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, the target distance being the smallest of the first distances; calculating a mean vector of the data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors. The loop termination condition includes that the target distance remains unchanged over a first preset number of consecutive loops.

In a possible implementation, when it is determined that the target distance remains unchanged over the first preset number of consecutive loops, it may be determined that the identification vectors have converged. The converged identification vectors may then be determined as the target identification vectors, from which the target identification vector set is determined, so that the time-series data can subsequently be predicted according to the target identification vector set.
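The update loop of S104 resembles a k-means style iteration: assign each data vector to its nearest identification vector, then replace each identification vector by the mean of its assigned data vectors. The sketch below is one possible reading under stated assumptions: Euclidean distance is used, `stable_rounds` stands in for the first preset number, and `max_iter` is a safety cap that is not part of the text. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def fit_target_identification_vectors(
    data: Sequence[Sequence[float]],
    init_ids: Sequence[Sequence[float]],
    stable_rounds: int = 3,   # assumed stand-in for the "first preset number"
    max_iter: int = 100,      # safety cap added for this sketch
) -> List[Vector]:
    ids: List[Vector] = [[float(c) for c in v] for v in init_ids]
    previous: Optional[List[float]] = None
    unchanged = 0
    for _ in range(max_iter):
        # Assignment: nearest identification vector (and target distance) per data vector.
        nearest: List[int] = []
        target: List[float] = []
        for y in data:
            dists = [euclidean(y, c) for c in ids]
            best = min(range(len(ids)), key=dists.__getitem__)
            nearest.append(best)
            target.append(dists[best])
        # Termination: target distances unchanged for stable_rounds consecutive loops.
        if previous is not None and all(math.isclose(a, b) for a, b in zip(previous, target)):
            unchanged += 1
            if unchanged >= stable_rounds:
                break
        else:
            unchanged = 0
        previous = target
        # Update: each identification vector becomes the mean of its assigned data vectors.
        for i in range(len(ids)):
            members = [y for y, n in zip(data, nearest) if n == i]
            if members:
                ids[i] = [sum(col) / len(members) for col in zip(*members)]
    return ids


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centers = fit_target_identification_vectors(data, init_ids=data[::2])
```

In this toy input the two groups are well separated, so the identification vectors settle on the two group means.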

S105, determining an activation vector set according to the first time-series data vector set and the target identification vector set.

In this step, a target vector corresponding to the data to be predicted may be determined from the first time series data vector set; calculating a second distance between the target vector and each target identification vector in the set of target identification vectors; and in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distance to obtain the activation vector set.

S106, obtaining a density function of each data vector in the activated vector set under a preset condition, and determining a prediction density function of the data to be predicted according to the density function.

Information entropy can be used as an uncertainty measure of a density function. If the information entropy of a data vector in the activation vector set is small, this indicates that most of the predicted values based on that data vector are invalid predictions, and the weight of those predicted values needs to be reduced in order to improve the accuracy of the prediction. Therefore, in a possible implementation, the information entropy of each data vector in the activation vector set may be calculated according to the density function, and the prediction density function may be determined according to the information entropy.
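For orientation, the information entropy of a density can be computed as below. This is a generic sketch that assumes the density is available as discrete probabilities over a finite set of bins; the form of the density function used by the disclosure, and how the entropy enters the prediction density function, are not reproduced here. The name `shannon_entropy` is hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a discrete density; input is normalised first."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("density must have positive total mass")
    h = 0.0
    for p in probs:
        q = p / total
        if q > 0:  # terms with zero probability contribute nothing
            h -= q * math.log(q)
    return h


peaked = shannon_entropy([1.0, 0.0, 0.0, 0.0])       # all mass in one bin
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # mass spread evenly
```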

S107, predicting, according to the prediction density function, the probability that the data to be predicted meets the preset condition.

It should be noted that, after the actual value of the data to be predicted is obtained, that value can be used as new historical time-series data, and the density function of each data vector in the activation vector set can be updated according to the new historical time-series data. The data prediction method of the present disclosure can therefore adapt automatically to new regularities of the time series, which makes the prediction of time-series data more accurate; in addition, a large amount of historical data does not need to be learned in advance, which improves the applicability of the prediction method.

By adopting the method, the prediction result can be given in the form of the density function of the data to be predicted, so that the probability that the data to be predicted meets different preset conditions can be displayed to a user according to the density function, and a higher reference value is provided for actual service requirements.
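To illustrate how a density-form result can answer different preset conditions, the following sketch assumes the prediction density is available as a discrete mapping from candidate values to probabilities and that a preset condition is a predicate on the value. Both representations, the function name `probability_of`, and the numbers are assumptions made for illustration.

```python
from typing import Callable, Dict


def probability_of(condition: Callable[[float], bool], density: Dict[float, float]) -> float:
    """Sum the (normalised) probability mass of all values that satisfy the condition."""
    total = sum(density.values())
    if total <= 0:
        raise ValueError("density must have positive total mass")
    return sum(p for value, p in density.items() if condition(value)) / total


# Made-up prediction density over four candidate values.
density = {100.0: 0.1, 110.0: 0.2, 120.0: 0.4, 130.0: 0.3}
p_high = probability_of(lambda v: v >= 120.0, density)  # one preset condition
p_low = probability_of(lambda v: v < 110.0, density)    # a different preset condition
```

The same density serves both queries, which is the practical benefit of returning a density rather than a single predicted value.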

FIG. 2 is a flow chart illustrating yet another data prediction method according to an exemplary embodiment. As shown in FIG. 2, the method comprises the following steps:

S201, acquiring a plurality of historical time-series data.

S202, converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition time corresponding to the plurality of historical time-series data.

y_j = [x_{j-m}, x_{j-m+1}, ..., x_j]^T

Specifically, if the plurality of historical time-series data is [x₁, x₂, ..., x_t], the historical time-series data can be converted according to the formula into the first time-series data vector set [y_{m+1}, y_{m+2}, ..., y_t], where

y_{m+1} = [x₁, x₂, ..., x_{m+1}]^T

y_{m+2} = [x₂, x₃, ..., x_{m+2}]^T

......

y_t = [x_{t-m}, x_{t-m+1}, ..., x_t]^T

For example, take the time-series data to be the daily transaction amount of the banking industry. For convenience of explanation, the acquired transaction amounts of the banking industry over the last 10 days are represented as [x₁, x₂, ..., x₁₀] (t = 10), where x_i represents the transaction amount of the banking industry on the i-th day (i from 1 to 10). When m is set to 5, the first time-series data vector set is [y₆, y₇, y₈, y₉, y₁₀], where

y₆ = [x₁, x₂, ..., x₆]^T

y₇ = [x₂, x₃, ..., x₇]^T

y₈ = [x₃, x₄, ..., x₈]^T

y₉ = [x₄, x₅, ..., x₉]^T

y₁₀ = [x₅, x₆, ..., x₁₀]^T

The foregoing example is illustrative only, and the disclosure is not limited thereto.

That is, after S202 is executed, each of the plurality of the historical time-series data acquired after the m-th acquisition time may be transformed into a column vector composed of target time-series data and m data before the target time-series data, where the target time-series data is any one of the plurality of the historical time-series data acquired after the m-th acquisition time.

S203, obtaining an identification vector set according to the first time series data vector set.

In this step, a third predetermined number of data vectors may be randomly selected from the first time series data vector set, and the randomly selected third predetermined number of data vectors may constitute the identification vector set.

Illustratively, if the first time-series data vector set is [y_{m+1}, y_{m+2}, ..., y_t], K (i.e., the third preset number) vectors y_j may be randomly selected as identification vectors for different data patterns, and the K randomly selected y_j then constitute an identification vector set of dimension K (that is, containing K identification vectors); for example, the identification vector set may be [y_{m+1}, y_{m+2}, ..., y_{m+K}]. For example, if the first time-series data vector set is [y₆, y₇, y₈, y₉, y₁₀], 3 (K = 3) data vectors (i.e., y_j) may be randomly selected from the first time-series data vector set to form a 3-dimensional identification vector set, which may consist of any three data vectors in [y₆, y₇, y₈, y₉, y₁₀] (e.g., [y₆, y₇, y₈], [y₇, y₈, y₉], [y₇, y₉, y₁₀], etc.). The above examples are illustrative only, and the disclosure is not limited thereto.

S204, calculating a first distance corresponding to each data vector in the first time series data vector set.

In one possible implementation, the first distance may be obtained by calculating a euclidean distance between each data vector in the first time-series data vector set and each identification vector in the identification vector set.

Illustratively, suppose the first time-series data vector set is [y_{m+1}, y_{m+2}, ..., y_t] and the K-dimensional identification vector set is [y_{m+1}, y_{m+2}, ..., y_{m+K}]. When the first data vector is y_{m+1}, the distance between the first data vector y_{m+1} and each of the K identification vectors in the identification vector set [y_{m+1}, y_{m+2}, ..., y_{m+K}] is calculated to obtain the first distances corresponding to y_{m+1}. When the first data vector is y_{m+K+1}, the distance between the first data vector y_{m+K+1} and each of the K identification vectors in the identification vector set [y_{m+1}, y_{m+2}, ..., y_{m+K}] is calculated to obtain the first distances corresponding to y_{m+K+1}. In the same way, the distance between each data vector in the first time-series data vector set [y_{m+1}, y_{m+2}, ..., y_t] and each identification vector in the identification vector set [y_{m+1}, y_{m+2}, ..., y_{m+K}] can be calculated to obtain the first distances. The above example is illustrative only, and the disclosure is not limited thereto.

S205, determining a target distance corresponding to the first data vector from the first distances, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, where the target distance may be the smallest of the first distances.

After S204 is executed, K first distances corresponding to each data vector in the first time-series data vector set may be determined, and at this time, a minimum distance of the K first distances corresponding to the first data vector may be determined as a target distance corresponding to the first data vector, and an identification vector corresponding to the target distance may be determined as a target identification vector corresponding to the first data vector.

Illustratively, continue with the first time-series data vector set [y₆, y₇, y₈, y₉, y₁₀] and the 3-dimensional identification vector set [y₆, y₇, y₈]. The first data vector is any one of the data vectors y₆, y₇, y₈, y₉, y₁₀. When the first data vector is y₆, the distance between y₆ and each identification vector in the identification vector set [y₆, y₇, y₈] is calculated; since the distance from y₆ to itself is zero, the target identification vector corresponding to the first data vector y₆ can be determined to be y₆. Similarly, the target identification vector corresponding to the first data vector y₇ is y₇, and the target identification vector corresponding to the first data vector y₈ is y₈. When the first data vector is y₉ or y₁₀, its distance to each identification vector in the identification vector set [y₆, y₇, y₈] is calculated. Supposing that y₉ is closest to the identification vector y₆ and y₁₀ is closest to the identification vector y₇, the target identification vector corresponding to the first data vector y₉ is y₆, and the target identification vector corresponding to the first data vector y₁₀ is y₇. The above example is illustrative only, and the present disclosure is not limited thereto.

S206, calculating a mean vector of data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vector.

Illustratively, continue with the first time-series data vector set [y₆, y₇, y₈, y₉, y₁₀] and the 3-dimensional identification vector set [y₆, y₇, y₈]. After S205 is performed, it can be determined that the data vectors in the first time-series data vector set corresponding to the target identification vector y₆ are y₆ and y₉, the data vectors corresponding to the target identification vector y₇ are y₇ and y₁₀, and the data vector corresponding to the target identification vector y₈ is y₈. Thus, the mean vector of the two data vectors y₆ and y₉ can be used as the updated target identification vector y₆', the mean vector of the two data vectors y₇ and y₁₀ can be used as the updated target identification vector y₇', and the data vector y₈ can be used as the updated target identification vector y₈' (in this case, the target identification vector y₈ before the update is the data vector y₈ in the first time-series data vector set itself, so no mean calculation is required). The updated target identification vector set can then be determined from the updated target identification vectors as [y₆', y₇', y₈']. The above example is illustrative only, and the present disclosure is not limited thereto.
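The mean-vector update in this example can be reproduced numerically. The sketch below takes the assignment from the example as given (y₆ and y₉ to y₆, y₇ and y₁₀ to y₇, y₈ to y₈); the values of x₁ to x₁₀ are made up, and the helper name `mean_vector` is hypothetical.

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Made-up daily values x_1..x_10 and window parameter m = 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
m = 5
# y[j] = [x_{j-m}, ..., x_j] for j = 6..10 (1-based j, 0-based slicing).
y: Dict[int, List[float]] = {j: x[j - m - 1 : j] for j in range(m + 1, len(x) + 1)}

# Assignment taken from the example: identification vector index -> member indices.
groups = {6: [6, 9], 7: [7, 10], 8: [8]}
updated = {c: mean_vector([y[j] for j in members]) for c, members in groups.items()}
```

A group with a single member, such as y₈ here, is returned unchanged by the mean.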

S207, determining whether the target distance remains unchanged after a first preset number of consecutive cycles.

S208 is executed when it is determined that the target distance remains unchanged over a first preset number of consecutive loops; when it is determined that the number of loops has not reached the first preset number and/or the target distance has changed, S204 to S207 are executed again.

S208, determining a target vector corresponding to the data to be predicted from the first time-series data vector set.

Illustratively, if the plurality of historical time-series data is [x₁, x₂, ..., x_t], the data to be predicted is x_{t+1} (note that, in the present disclosure, what may be predicted is the probability that the data to be predicted x_{t+1} meets the preset condition). That is, in one possible implementation, the plurality of historical time-series data [x₁, x₂, ..., x_t] may be used to predict the data x_{t+1} at time t+1. In this case, the target vector corresponding to the data to be predicted x_{t+1} can be obtained from the first time-series data vector set [y_{m+1}, y_{m+2}, ..., y_t] as y_t = [x_{t-m}, x_{t-m+1}, ..., x_t]^T. The above example is illustrative only, and the present disclosure is not limited thereto.

S209, calculating a second distance between the target vector and each target identification vector in the set of target identification vectors.

In one possible implementation, the second distance may be obtained by calculating a euclidean distance between the target vector and each target identification vector.

S210, determining, in the target identification vector set and according to the second distance, a second preset number of data vectors closest to the target vector, to obtain the activation vector set.

Illustratively, take as an example the target vector y_predict = [2,2,3,4]^T and the target identification vector set [y₁, y₂, y₃, y₄], where y₁ = [1,2,3,4]^T, y₂ = [2,2,4,4]^T, y₃ = [4,4,2,2]^T and y₄ = [4,3,2,1]^T. In this case, the second distances between the target vector y_predict and the four target identification vectors in the target identification vector set [y₁, y₂, y₃, y₄] are respectively:

dist(y_predict, y₁) = 1

dist(y_predict, y₂) = 1

dist(y_predict, y₃) = 3.61

dist(y_predict, y₄) = 3.87

In this case, when the second preset number is 2, it may be determined that the target identification vectors y₁ and y₂ in the target identification vector set form the activation vector set [y₁, y₂]. The above example is merely illustrative, and the present disclosure is not limited thereto.
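The selection in S209 and S210 can be sketched in Python as follows, assuming the Euclidean second distance mentioned above. The function name `select_activation_set` and the parameter `k` (standing in for the second preset number) are illustrative assumptions; the vectors are those of the example.

```python
import numpy as np


def select_activation_set(target, ident_set, k):
    """Return indices and distances of the k identification vectors nearest to target."""
    target = np.asarray(target, dtype=float)
    ident_set = np.asarray(ident_set, dtype=float)
    second = np.linalg.norm(ident_set - target, axis=1)  # Euclidean second distances
    order = np.argsort(second, kind="stable")[:k]        # ties keep their input order
    return order, second


y_predict = [2, 2, 3, 4]
ident_set = [[1, 2, 3, 4], [2, 2, 4, 4], [4, 4, 2, 2], [4, 3, 2, 1]]
chosen, second = select_activation_set(y_predict, ident_set, k=2)
print(chosen)            # 0-based indices of the selected vectors
print(second.round(2))   # one distance per target identification vector
```

A stable sort is used so that equally distant vectors are selected in their original order, which is a design choice of this sketch rather than a requirement stated in the disclosure.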

S211, obtaining a density function of each data vector in the activation vector set under a preset condition, and calculating the information entropy of each data vector in the activation vector set according to the density function.

Considering that the information entropy can be used as an uncertainty measure of the density function, if the information entropy of a certain data vector in the activation vector set is small, it means that most of the predicted values based on that data vector are invalid predictions; in this case, the weight of those predicted values needs to be reduced so as to improve the accuracy of the prediction.

In one possible implementation, for convenience of description, the density function of the ith data vector in the set of activation vectors may be represented by the following formula:

wherein f_i(x) represents the density function of the i-th data vector in the activation vector set; p_n represents the probability corresponding to the n-th preset condition; a + delta ≤ x ≤ a + 2·delta, a + 2·delta ≤ x ≤ a + 3·delta, ..., a + n·delta ≤ x ≤ a + (n+1)·delta respectively represent the different preset conditions of the data to be predicted; a is a preset boundary threshold of the preset conditions; delta is a preset data variation; and n is the number of preset conditions. The information entropy of each data vector in the activation vector set can then be calculated from the density function by the following formula:

wherein I(f_i(x)) represents the information entropy of the density function f_i(x) of the i-th data vector in the activation vector set, and p_j represents the probability that the data to be predicted falls within the j-th preset condition.
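As a minimal sketch of the entropy calculation, the following Python function assumes that the information entropy takes the usual Shannon form, that is, the negative sum of p_j · ln p_j over the probabilities p_1, ..., p_n of the preset conditions. Both the Shannon form and the natural logarithm are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def information_entropy(probs):
    """Shannon entropy -sum(p_j * ln p_j) of the interval probabilities p_1..p_n."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must be non-negative and sum to 1")
    # Terms with p_j == 0 contribute nothing (limit of p * ln p as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Under this assumption, a distribution spread evenly over the preset conditions gives the largest value, and a distribution concentrated on a single preset condition gives 0.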

S212, determining a prediction density function of the data to be predicted according to the information entropy.

In a possible implementation manner, the prediction density function of the data to be predicted can be calculated according to the information entropy by the following formula:

wherein f(x) represents the prediction density function of the data to be predicted, f_i(x) represents the density function of the i-th data vector in the activation vector set, and I(f_i(x)) represents the information entropy of the density function f_i(x) of the i-th data vector in the activation vector set.

S213, predicting the probability that the data to be predicted meets the preset condition according to the prediction density function.

For example, taking the prediction of the daily transaction amount in the banking industry as an illustration, the data to be predicted x_{t+1} is the transaction amount to be predicted. In one possible implementation, the following three preset conditions may be set: the transaction amount to be predicted is below 80,000 (i.e., x_{t+1} < 80000), the transaction amount to be predicted is between 80,000 and 100,000 (i.e., 80000 ≤ x_{t+1} ≤ 100000), and the transaction amount to be predicted is above 100,000 (i.e., x_{t+1} > 100000). In this case, the probabilities, obtained from the prediction density function f(x), that the transaction amount to be predicted meets the three preset conditions are respectively: 10% for below 80,000, 75% for between 80,000 and 100,000, and 15% for above 100,000.

FIG. 3 is a block diagram illustrating an apparatus for data prediction according to an exemplary embodiment. As shown in FIG. 3, the apparatus comprises:

a first obtaining module 301, configured to obtain a plurality of historical time-series data;

a data conversion module 302, configured to convert, according to a plurality of acquisition times corresponding to the historical time-series data, the plurality of historical time-series data into a first time-series data vector set;

a second obtaining module 303, configured to obtain an identification vector set according to the first time series data vector set;

a third obtaining module 304, configured to determine a target identification vector set according to the identification vector set and the first time series data vector set;

a first determining module 305, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set;

a second determining module 306, configured to obtain a density function of each data vector in the activation vector set under a preset condition, and determine a prediction density function of data to be predicted according to the density function;

and the predicting module 307 is configured to predict the probability that the data to be predicted meets the preset condition according to the prediction density function.

Optionally, the data conversion module 302 is configured to convert the plurality of historical time-series data into the first time-series data vector set, according to the acquisition times corresponding to the historical time-series data, by the following formula:

y_j = [x_{j-m}, x_{j-m+1}, ..., x_j]^T

Optionally, the third obtaining module 304 is configured to cyclically perform the step of updating the identification vector set until a loop termination condition is met, and to determine the identification vector set at the time the loop termination condition is met as the target identification vector set. The step of updating the identification vector set comprises: calculating a first distance corresponding to each data vector in the first time series data vector set, the first distance comprising the distance between a first data vector and each identification vector in the identification vector set, the first data vector being any data vector in the first time series data vector set; determining a target distance corresponding to the first data vector from the first distances, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector; calculating a mean vector of the data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors. The loop termination condition includes that the target distance remains unchanged after a first preset number of consecutive cycles.

Optionally, the first determining module 305 is configured to: determine a target vector corresponding to the data to be predicted from the first time series data vector set; calculate a second distance between the target vector and each target identification vector in the target identification vector set; and, in the target identification vector set, determine a second preset number of data vectors closest to the target vector according to the second distance to obtain the activation vector set.

Optionally, the second determining module 306 is configured to calculate the information entropy of each data vector in the activation vector set according to the density function, and to determine the prediction density function according to the information entropy.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

By adopting the device, the prediction result can be given in the form of the density function of the data to be predicted, so that the probability that the data to be predicted meets different preset conditions can be displayed to a user according to the density function, and a higher reference value is provided for actual service requirements.

FIG. 4 is a block diagram illustrating an electronic device 400 according to an exemplary embodiment. As shown in FIG. 4, the electronic device 400 may include a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.

The processor 401 is configured to control the overall operation of the electronic device 400 so as to complete all or part of the steps of the data prediction method described above. The memory 402 is configured to store various types of data to support operation of the electronic device 400; such data may include, for example, instructions for any application or method operating on the electronic device 400, as well as application-related data such as contact data, transmitted and received messages, pictures, audio, and video. The memory 402 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may further be stored in the memory 402 or transmitted through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons. The buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the electronic device 400 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the data prediction method described above.

In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data prediction method described above is also provided. For example, the computer readable storage medium may be the memory 402 described above including program instructions that are executable by the processor 401 of the electronic device 400 to perform the data prediction method described above.

The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.

It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.

In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as part of the present disclosure as long as they do not depart from its spirit.

Claims

1. A method of data prediction, the method comprising:

acquiring a plurality of historical time series data;

converting the historical time series data into a first time series data vector set according to the acquisition time corresponding to the historical time series data;

obtaining an identification vector set according to the first time series data vector set;

determining a target identification vector set according to the identification vector set and the first time series data vector set;

determining an activation vector set according to the first time series data vector set and the target identification vector set;

acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of data to be predicted according to the density function;

predicting the probability that the data to be predicted meets the preset condition according to the prediction density function; the data to be predicted comprises the daily transaction amount of the banking industry, a stock price, or performance index data of an application system, and the preset condition comprises a data interval in which the data to be predicted is located;

wherein the determining a set of target identification vectors from the set of identification vectors and the set of first time series data vectors comprises:

circularly executing the step of updating the identification vector set until a cycle termination condition is met, and determining the identification vector set when the cycle termination condition is met as the target identification vector set;

the step of updating the identification vector set comprises the following steps: calculating a first distance corresponding to each data vector in the first time series data vector set, the first distance comprising the distance between a first data vector and each identification vector in the identification vector set, the first data vector being any data vector in the first time series data vector set; determining a target distance corresponding to the first data vector from the first distances, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, wherein the target distance is the smallest of the first distances; calculating a mean vector of the data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors;

the cycle termination condition comprises that the target distance is kept unchanged after a first preset number of continuous cycles;

the determining an activation vector set from the first time series data vector set and the target identification vector set comprises:

determining a target vector corresponding to the data to be predicted from the first time series data vector set; calculating a second distance between the target vector and each target identification vector in the target identification vector set; and, in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distance to obtain the activation vector set.

2. The method of claim 1, wherein determining a predicted density function for the data to be predicted from the density function comprises:

calculating the information entropy of each data vector in the activation vector set according to the density function;

and determining the prediction density function according to the information entropy.

3. A data prediction apparatus, characterized in that the apparatus comprises:

a first obtaining module, configured to obtain a plurality of historical time series data;

a data conversion module, configured to convert the historical time series data into a first time series data vector set according to the acquisition time corresponding to the historical time series data;

a second obtaining module, configured to obtain an identification vector set according to the first time series data vector set;

a third obtaining module, configured to determine a target identification vector set according to the identification vector set and the first time series data vector set;

a first determining module, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set;

a second determining module, configured to obtain a density function of each data vector in the activation vector set under a preset condition, and determine a prediction density function of data to be predicted according to the density function;

a prediction module, configured to predict the probability that the data to be predicted meets the preset condition according to the prediction density function; the data to be predicted comprises the daily transaction amount of the banking industry, a stock price, or performance index data of an application system, and the preset condition comprises a data interval in which the data to be predicted is located;

the third obtaining module is configured to cyclically perform the step of updating the identification vector set until a cycle termination condition is met, and to determine the identification vector set at the time the cycle termination condition is met as the target identification vector set; the step of updating the identification vector set comprises the following steps: calculating a first distance corresponding to each data vector in the first time series data vector set, the first distance comprising the distance between a first data vector and each identification vector in the identification vector set, the first data vector being any data vector in the first time series data vector set; determining a target distance corresponding to the first data vector from the first distances, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, wherein the target distance is the smallest of the first distances; calculating a mean vector of the data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors; the cycle termination condition comprises that the target distance remains unchanged after a first preset number of consecutive cycles;

the first determining module is configured to: determine, from the first time series data vector set, a target vector corresponding to the data to be predicted; calculate a second distance between the target vector and each target identification vector in the target identification vector set; and, in the target identification vector set, determine a second preset number of data vectors closest to the target vector according to the second distance to obtain the activation vector set.

4. The apparatus of claim 3, wherein the second determining module is configured to calculate the information entropy of each data vector in the activation vector set according to the density function, and to determine the prediction density function according to the information entropy.

5. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as claimed in claim 1 or 2.

6. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of claim 1 or 2.