CN109754115B - Data prediction method and device, storage medium and electronic equipment - Google Patents

Publication number
CN109754115B
Authority
CN
China
Prior art keywords
vector
data
vector set
target
series data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811475791.8A
Other languages
Chinese (zh)
Other versions
CN109754115A (en
Inventor
孙木鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201811475791.8A priority Critical patent/CN109754115B/en
Publication of CN109754115A publication Critical patent/CN109754115A/en
Application granted granted Critical
Publication of CN109754115B publication Critical patent/CN109754115B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The present disclosure relates to a data prediction method and apparatus, a storage medium, and an electronic device. The method includes: acquiring a plurality of historical time-series data; converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the historical time-series data; obtaining an identification vector set according to the first time-series data vector set; determining a target identification vector set according to the identification vector set and the first time-series data vector set; determining an activation vector set according to the first time-series data vector set and the target identification vector set; acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of data to be predicted according to the density functions; and predicting, according to the prediction density function, the probability that the data to be predicted meets the preset condition.

Description

Data prediction method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of data prediction, and in particular, to a method, an apparatus, a storage medium, and an electronic device for data prediction.
Background
Time-series prediction uses time-ordered data to predict how the data will develop, and the prediction then guides the solution of practical problems. Time-series prediction now plays an extremely important role in many industries. For example, the banking industry uses it to predict changes in daily transaction amounts; exchanges use it to predict how stock prices change in the stock market; and it is used to detect the future trend of key indicators of an application system, such as CPU, memory, and HTTP response time.
However, with the rapid development of computer software technology, the scale of the data keeps growing and time-series data are becoming more and more complex, so the regularities in how the data change are difficult to mine.
Disclosure of Invention
The invention aims to provide a data prediction method, a data prediction device, a storage medium and an electronic device.
In a first aspect, a method for data prediction is provided, the method comprising: acquiring a plurality of historical time series data; converting the historical time series data into a first time series data vector set according to the acquisition time corresponding to the historical time series data; obtaining an identification vector set according to the first time series data vector set; determining a target identification vector set according to the identification vector set and the first time series data vector set; determining an activation vector set according to the first time series data vector set and the target identification vector set; acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of data to be predicted according to the density function; and predicting the probability that the data to be predicted meets the preset conditions according to the prediction density function.
Optionally, the converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition time corresponding to the plurality of historical time-series data includes: converting the historical time-series data into a first time-series data vector set according to the acquisition time corresponding to the plurality of historical time-series data through the following formula, wherein the formula comprises:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where $x_j$ denotes the historical time-series data collected at time $j$, $x_{j-m}$ denotes the historical time-series data collected at time $j-m$, $y_j$ is a data vector in the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, and $j$ ranges from $m+1$ to $t$.
Optionally, the determining a target identification vector set according to the identification vector set and the first time-series data vector set comprises: cyclically executing a step of updating the identification vector set until a loop termination condition is met, and determining the identification vector set at the time the loop termination condition is met as the target identification vector set. The step of updating the identification vector set comprises: calculating the first distances corresponding to each data vector in the first time-series data vector set, the first distances being the distances between a first data vector and each identification vector in the identification vector set, where the first data vector is any data vector in the first time-series data vector set; determining, from the first distances, a target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, where the target distance is the smallest of the first distances; calculating the mean vector of the data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors. The loop termination condition includes the target distance remaining unchanged for a first preset number of consecutive cycles.
Optionally, the determining an activation vector set according to the first time-series data vector set and the target identification vector set comprises: determining, from the first time-series data vector set, a target vector corresponding to the data to be predicted; calculating a second distance between the target vector and each target identification vector in the target identification vector set; and, in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distances to obtain the activation vector set.
Optionally, the determining a prediction density function of the data to be predicted according to the density function comprises: calculating the information entropy of each data vector in the activation vector set according to the density function; and determining the prediction density function according to the information entropy.
In a second aspect, a data prediction apparatus is provided, the apparatus comprising: a first acquisition module, configured to acquire a plurality of historical time-series data; a data conversion module, configured to convert the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the historical time-series data; a second acquisition module, configured to obtain an identification vector set according to the first time-series data vector set; a third obtaining module, configured to determine a target identification vector set according to the identification vector set and the first time-series data vector set; a first determining module, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set; a second determining module, configured to acquire a density function of each data vector in the activation vector set under a preset condition and to determine a prediction density function of data to be predicted according to the density functions; and a prediction module, configured to predict, according to the prediction density function, the probability that the data to be predicted meets the preset condition.
Optionally, the data conversion module is configured to convert the historical time-series data into a first time-series data vector set according to a collection time corresponding to a plurality of the historical time-series data by using the following formula, where the formula includes:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where $x_j$ denotes the historical time-series data collected at time $j$, $x_{j-m}$ denotes the historical time-series data collected at time $j-m$, $y_j$ is a data vector in the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, and $j$ ranges from $m+1$ to $t$.
Optionally, the third obtaining module is configured to cyclically execute a step of updating the identification vector set until a loop termination condition is met, and to determine the identification vector set at the time the loop termination condition is met as the target identification vector set. The step of updating the identification vector set comprises: calculating the first distances corresponding to each data vector in the first time-series data vector set, the first distances being the distances between a first data vector and each identification vector in the identification vector set, where the first data vector is any data vector in the first time-series data vector set; determining, from the first distances, a target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, where the target distance is the smallest of the first distances; calculating the mean vector of the data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors. The loop termination condition includes the target distance remaining unchanged for a first preset number of consecutive cycles.
Optionally, the first determining module is configured to: determine, from the first time-series data vector set, a target vector corresponding to the data to be predicted; calculate a second distance between the target vector and each target identification vector in the target identification vector set; and, in the target identification vector set, determine a second preset number of data vectors closest to the target vector according to the second distances to obtain the activation vector set.
Optionally, the second determining module is configured to calculate the information entropy of each data vector in the activation vector set according to the density function, and to determine the prediction density function according to the information entropy.
In a third aspect, a computer readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method according to the first aspect of the disclosure.
In a fourth aspect, an electronic device is provided, comprising: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.
With the above technical solution, a plurality of historical time-series data are acquired; the historical time-series data are converted into a first time-series data vector set according to their corresponding acquisition times; an identification vector set is obtained according to the first time-series data vector set; a target identification vector set is determined according to the identification vector set and the first time-series data vector set; an activation vector set is determined according to the first time-series data vector set and the target identification vector set; a density function of each data vector in the activation vector set under a preset condition is acquired, and a prediction density function of data to be predicted is determined according to the density functions; and the probability that the data to be predicted meets the preset condition is predicted according to the prediction density function. Because the prediction result is given in the form of a density function of the data to be predicted, the probabilities that the data to be predicted meets different preset conditions can be presented to the user according to the density function, which provides a higher reference value for actual service requirements.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow diagram illustrating a method of data prediction in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating yet another method of data prediction in accordance with an exemplary embodiment;
FIG. 3 is a block diagram illustrating an apparatus for data prediction in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
The present disclosure provides a data prediction method and apparatus, a storage medium, and an electronic device. A plurality of historical time-series data are acquired; the historical time-series data are converted into a first time-series data vector set according to their corresponding acquisition times; an identification vector set is obtained according to the first time-series data vector set; a target identification vector set is determined according to the identification vector set and the first time-series data vector set; an activation vector set is determined according to the first time-series data vector set and the target identification vector set; a density function of each data vector in the activation vector set under a preset condition is acquired, and a prediction density function of data to be predicted is determined according to the density functions; and the probability that the data to be predicted meets the preset condition is predicted according to the prediction density function. Because the prediction result is given in the form of a density function of the data to be predicted, the probabilities that the data to be predicted meets different preset conditions can be presented to the user according to the density function, which provides a higher reference value for actual service requirements.
Specific embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a flow chart illustrating a data prediction method according to an exemplary embodiment. As shown in FIG. 1, the method comprises the following steps:
S101, acquiring a plurality of historical time-series data.
Time-series data are data collected at different points in time, in chronological order, and can be used to describe how the data change over time. For example, the time-series data may include the daily transaction amount of the banking industry, stock prices in the stock market, and the response time of an application system. The plurality of historical time-series data may include a first preset amount of time-series data collected within a preset historical time period.
S102, converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the plurality of historical time-series data.
In this step, the historical time-series data may be converted into the first time-series data vector set, according to the acquisition times corresponding to the plurality of historical time-series data, by the following formula:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where the plurality of historical time-series data can be represented as $[x_1, x_2, \ldots, x_t]$; $x_j$ denotes the historical time-series data collected at time $j$; $x_{j-m}$ denotes the historical time-series data collected at time $j-m$; the first time-series data vector set can be represented as $[y_{m+1}, y_{m+2}, \ldots, y_t]$; $y_j$ is a data vector in that set; and $j$ ranges from $m+1$ to $t$. In addition, $m$ may be a preset value, and its size can be set according to the service requirements of the actual application scenario.
After S102 is executed, each of the plurality of historical time-series data acquired after the m-th acquisition time may be transformed into a column vector composed of target time-series data and m data before the target time-series data, where the target time-series data is any one of the plurality of historical time-series data acquired after the m-th acquisition time.
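The conversion described above amounts to a sliding window of length m + 1 over the series. The following is a minimal Python sketch of that idea, not the patent's own implementation; the function name `to_vector_set` and the sample values are hypothetical and chosen only for illustration.

```python
def to_vector_set(series, m):
    """Turn [x_1, ..., x_t] into [y_{m+1}, ..., y_t].

    Each y_j holds the m + 1 values x_{j-m}, ..., x_j.
    The function name is an assumption made for this sketch.
    """
    if m < 0 or m >= len(series):
        raise ValueError("m must satisfy 0 <= m < len(series)")
    # Python lists are 0-indexed, so x_j is series[j - 1].
    return [list(series[j - m - 1:j]) for j in range(m + 1, len(series) + 1)]


# Made-up sample values standing in for x_1 .. x_10, with m = 5.
vectors = to_vector_set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], m=5)
print(len(vectors))  # number of vectors y_6 .. y_10
print(vectors[0])    # y_6 = [x_1, ..., x_6]
```

With t = 10 and m = 5 the sketch yields five vectors, matching the range of j from m + 1 to t.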
S103, obtaining an identification vector set according to the first time series data vector set.
In this step, a third predetermined number of data vectors may be randomly selected from the first time series data vector set, and the randomly selected third predetermined number of data vectors may be combined into the identification vector set. In addition, to avoid the over-fitting phenomenon, the third predetermined number may be smaller than the number of data vectors in the first time-series data vector set.
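As a rough illustration of this selection step, the sketch below draws K data vectors at random without replacement. The function name `pick_identification_vectors` and the `seed` parameter are assumptions added here for reproducibility; the text only requires a random choice of fewer vectors than the set contains.

```python
import random


def pick_identification_vectors(vectors, k, seed=None):
    """Randomly choose k data vectors as the initial identification vectors.

    Enforcing k < len(vectors) follows the remark about avoiding
    over-fitting; the seed argument is an assumption of this sketch.
    """
    if not 0 < k < len(vectors):
        raise ValueError("k must satisfy 0 < k < len(vectors)")
    rng = random.Random(seed)
    # sample() draws without replacement, so no vector is picked twice.
    return [list(v) for v in rng.sample(vectors, k)]


data_vectors = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
print(pick_identification_vectors(data_vectors, k=3, seed=0))
```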
S104, determining a target identification vector set according to the identification vector set and the first time series data vector set.
In this step, a step of updating the identification vector set may be executed cyclically until a loop termination condition is met, and the identification vector set at the time the loop termination condition is met is determined as the target identification vector set. The step of updating the identification vector set comprises: calculating the first distances corresponding to each data vector in the first time-series data vector set, the first distances being the distances between a first data vector and each identification vector in the identification vector set, where the first data vector is any data vector in the first time-series data vector set; determining, from the first distances, a target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, where the target distance is the smallest of the first distances; calculating the mean vector of the data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors. The loop termination condition includes the target distance remaining unchanged for a first preset number of consecutive cycles.
In a possible implementation, when it is determined that the target distance remains unchanged for the first preset number of consecutive cycles, the identification vectors can be considered to have converged. The converged identification vectors are then determined as the target identification vectors, which gives the target identification vector set, so that the time-series data can subsequently be predicted according to the target identification vector set.
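The update loop of S104 (assign each data vector to its nearest identification vector, replace each identification vector with the mean of its assigned vectors, stop once the target distances stay the same) can be sketched as follows. This is one possible reading under stated assumptions: the Euclidean distance, the exact-equality stability test, the `max_rounds` safety cap, and all function and parameter names are choices made for this sketch.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def fit_identification_vectors(vectors, init, stable_rounds=2, max_rounds=100):
    """Return the identification vectors after the update loop has settled.

    stable_rounds plays the role of the "first preset number" of cycles;
    max_rounds is a safety cap added for this sketch only.
    """
    centers = [list(c) for c in init]
    previous, stable = None, 0
    for _ in range(max_rounds):
        nearest, target = [], []
        for v in vectors:
            # First distances: from this data vector to every identification vector.
            first = [euclidean(v, c) for c in centers]
            best = min(range(len(first)), key=first.__getitem__)
            nearest.append(best)
            target.append(first[best])  # target distance = smallest first distance
        for i in range(len(centers)):
            members = [v for v, n in zip(vectors, nearest) if n == i]
            if members:
                # Mean vector of the data vectors assigned to identification vector i.
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        # Exact comparison keeps the sketch short; a tolerance may be preferable.
        stable = stable + 1 if target == previous else 0
        if stable >= stable_rounds:
            break
        previous = target
    return centers


points = [[0, 0], [0, 2], [10, 10], [10, 12]]
print(fit_identification_vectors(points, init=[[0, 0], [10, 10]]))
```

An identification vector with no assigned data vectors is left unchanged in this sketch; the text does not say how that case should be handled.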
S105, determining an activation vector set according to the first time-series data vector set and the target identification vector set.
In this step, a target vector corresponding to the data to be predicted may be determined from the first time-series data vector set; a second distance between the target vector and each target identification vector in the target identification vector set is calculated; and, in the target identification vector set, a second preset number of data vectors closest to the target vector are determined according to the second distances to obtain the activation vector set.
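A minimal sketch of this selection follows, assuming the second distance is Euclidean and that the caller supplies the target vector (the text does not state here which data vector corresponds to the data to be predicted). The name `activation_set` and the parameter `count`, which stands for the second preset number, are hypothetical.

```python
import math


def activation_set(target_vector, target_id_vectors, count):
    """Return the `count` target identification vectors nearest to target_vector.

    Euclidean distance is an assumption of this sketch.
    """
    ranked = sorted(target_id_vectors, key=lambda c: math.dist(target_vector, c))
    return ranked[:count]


id_vectors = [[5.0, 5.0], [1.0, 1.0], [3.0, 3.0]]
print(activation_set([0.0, 0.0], id_vectors, count=2))
```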
S106, acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of the data to be predicted according to the density functions.
Information entropy can be used as an uncertainty measure of a density function. If the information entropy of a data vector in the activation vector set is small, this indicates that most of the predicted values based on that data vector are invalid predictions, and the weight of those predicted values needs to be reduced to improve prediction accuracy. Therefore, in a possible implementation, the information entropy of each data vector in the activation vector set can be calculated according to its density function, and the prediction density function is determined according to the information entropy.
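To make the entropy calculation concrete, the sketch below computes the Shannon entropy of a density that has been discretized into bins. The discretized form, the natural logarithm, and the name `shannon_entropy` are assumptions of this sketch; how the entropies are then turned into weights for the prediction density function is not specified in this section and is not shown.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy (in nats) of a discretized density.

    The input is normalized first, so unnormalized bin masses are accepted.
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("the density must have positive total mass")
    # Bins with zero mass contribute nothing to the sum.
    return -sum((p / total) * math.log(p / total) for p in probabilities if p > 0)


print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # spread-out density
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # sharply peaked density
```

In this sketch a spread-out density gives a larger value than a sharply peaked one.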
S107, predicting, according to the prediction density function, the probability that the data to be predicted meets the preset condition.
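One way to picture this step: once a prediction density is available, the probability of a condition is the share of the density's mass that satisfies it. The sketch below assumes the density is discretized into (value, mass) pairs and that the preset condition is a predicate on the value. Both representations, the name `probability_of`, and the sample numbers are illustrative assumptions.

```python
def probability_of(condition, density):
    """Probability mass of the values that satisfy `condition`.

    density: list of (value, mass) pairs, a discretized stand-in for the
    prediction density function (an assumption of this sketch).
    """
    total = sum(mass for _, mass in density)
    if total <= 0:
        raise ValueError("the density must have positive total mass")
    return sum(mass for value, mass in density if condition(value)) / total


# Made-up density over tomorrow's transaction amount.
density = [(100, 0.2), (200, 0.5), (300, 0.3)]
print(probability_of(lambda amount: amount > 150, density))
```

Different preset conditions (for example, a different threshold) can be evaluated against the same density without refitting anything.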
It should be noted that, after the actual value of the data to be predicted is obtained, that value can be used as new historical time-series data, and the density function of each data vector in the activation vector set can be updated according to it. In this way, the data prediction method of the present disclosure adapts automatically to new patterns in the time series, predicts time-series data more accurately, and does not need to learn a large amount of historical data in advance, which improves the applicability of the prediction method.
By adopting the method, the prediction result can be given in the form of the density function of the data to be predicted, so that the probability that the data to be predicted meets different preset conditions can be displayed to a user according to the density function, and a higher reference value is provided for actual service requirements.
FIG. 2 is a flow chart illustrating a data prediction method according to an exemplary embodiment. As shown in FIG. 2, the method comprises the following steps:
S201, acquiring a plurality of historical time-series data.
Time-series data are data collected at different points in time, in chronological order, and can be used to describe how the data change over time. For example, the time-series data may include the daily transaction amount of the banking industry, stock prices in the stock market, and the response time of an application system. The plurality of historical time-series data may include a first preset amount of time-series data collected within a preset historical time period.
S202, converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition time corresponding to the plurality of historical time-series data.
In this step, the historical time-series data may be converted into a first time-series data vector set according to a plurality of acquisition times corresponding to the historical time-series data by the following formula, where the formula includes:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where the plurality of historical time-series data can be represented as $[x_1, x_2, \ldots, x_t]$; $x_j$ denotes the historical time-series data collected at time $j$; $x_{j-m}$ denotes the historical time-series data collected at time $j-m$; the first time-series data vector set can be represented as $[y_{m+1}, y_{m+2}, \ldots, y_t]$; $y_j$ is a data vector in that set; and $j$ ranges from $m+1$ to $t$. In addition, $m$ may be a preset value, and its size can be set according to the service requirements of the actual application scenario.
Specifically, if the plurality of historical time-series data is $[x_1, x_2, \ldots, x_t]$, the formula converts it into the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, where
$$y_{m+1} = [x_1, x_2, \ldots, x_{m+1}]^T$$
$$y_{m+2} = [x_2, x_3, \ldots, x_{m+2}]^T$$
$$\vdots$$
$$y_t = [x_{t-m}, x_{t-m+1}, \ldots, x_t]^T$$
For example, take the daily transaction amount of the banking industry as the time-series data. For convenience of explanation, the transaction amounts of the banking industry acquired over the last 10 days are represented as $[x_1, x_2, \ldots, x_{10}]$ (that is, $t = 10$), where $x_i$ denotes the transaction amount of the banking industry on day $i$ ($i$ from 1 to 10). When $m$ is set to 5, the first time-series data vector set is $[y_6, y_7, y_8, y_9, y_{10}]$, where
$$y_6 = [x_1, x_2, \ldots, x_6]^T$$
$$y_7 = [x_2, x_3, \ldots, x_7]^T$$
$$y_8 = [x_3, x_4, \ldots, x_8]^T$$
$$y_9 = [x_4, x_5, \ldots, x_9]^T$$
$$y_{10} = [x_5, x_6, \ldots, x_{10}]^T$$
the foregoing examples are illustrative only, and the disclosure is not limited thereto.
That is, after S202 is executed, each of the plurality of the historical time-series data acquired after the m-th acquisition time may be transformed into a column vector composed of target time-series data and m data before the target time-series data, where the target time-series data is any one of the plurality of the historical time-series data acquired after the m-th acquisition time.
S203, obtaining an identification vector set according to the first time series data vector set.
In this step, a third predetermined number of data vectors may be randomly selected from the first time series data vector set, and the randomly selected third predetermined number of data vectors may constitute the identification vector set.
Illustratively, when the first time-series data vector set is $[y_{m+1}, y_{m+2}, \ldots, y_t]$, $K$ (that is, the third preset number) vectors $y_j$ may be randomly selected as identification vectors for different data patterns, and the $K$ randomly selected $y_j$ then constitute a $K$-dimensional identification vector set (that is, a set of $K$ identification vectors); for example, the identification vector set may be $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$. For instance, if the first time-series data vector set is $[y_6, y_7, y_8, y_9, y_{10}]$, then 3 ($K = 3$) data vectors $y_j$ may be randomly selected from it to form a 3-dimensional identification vector set, which may consist of any three data vectors in $[y_6, y_7, y_8, y_9, y_{10}]$ (for example, $[y_6, y_7, y_8]$, $[y_7, y_8, y_9]$, or $[y_7, y_9, y_{10}]$). The above examples are illustrative only, and the disclosure is not limited thereto.
S204, calculating a first distance corresponding to each data vector in the first time series data vector set.
In one possible implementation, the first distance may be obtained by calculating a euclidean distance between each data vector in the first time-series data vector set and each identification vector in the identification vector set.
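For reference, the Euclidean distance named above can be written out as follows. The symbols $d$, $c_k$ (the $k$-th identification vector), and $c_{k,i}$ (its $i$-th component) are notation introduced here for illustration only; each vector has $m+1$ components because $y_j = [x_{j-m}, \ldots, x_j]^T$.

$$d(y_j, c_k) = \sqrt{\sum_{i=0}^{m} \left(x_{j-m+i} - c_{k,i}\right)^2}, \qquad k = 1, \ldots, K$$

Each data vector $y_j$ therefore has $K$ first distances, one per identification vector.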
Illustratively, take the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$ and the $K$-dimensional identification vector set $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$ as an example. When the first data vector is $y_{m+1}$, the distance between $y_{m+1}$ and each identification vector in $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$ is calculated, which gives the $K$ first distances corresponding to $y_{m+1}$. When the first data vector is $y_{m+K+1}$, the distance between $y_{m+K+1}$ and each identification vector in $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$ is calculated, which gives the $K$ first distances corresponding to $y_{m+K+1}$. In the same way, the distance between every data vector in the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$ and every identification vector in the identification vector set can be calculated to obtain the first distances. The above example is illustrative only, and the disclosure is not limited thereto.
S205, determining, from the first distances, a target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, where the target distance may be the smallest of the first distances.
After S204 is executed, K first distances corresponding to each data vector in the first time-series data vector set may be determined, and at this time, a minimum distance of the K first distances corresponding to the first data vector may be determined as a target distance corresponding to the first data vector, and an identification vector corresponding to the target distance may be determined as a target identification vector corresponding to the first data vector.
Illustratively, continue with the first time-series data vector set $[y_6, y_7, y_8, y_9, y_{10}]$ and the 3-dimensional identification vector set $[y_6, y_7, y_8]$. The first data vector may be any of $y_6$, $y_7$, $y_8$, $y_9$, and $y_{10}$. When the first data vector is $y_6$, the distance between $y_6$ and each identification vector in $[y_6, y_7, y_8]$ is calculated; because the distance from $y_6$ to itself is zero, the target identification vector corresponding to $y_6$ is $y_6$. Similarly, the target identification vector corresponding to $y_7$ is $y_7$, and the target identification vector corresponding to $y_8$ is $y_8$. When the first data vector is $y_9$ or $y_{10}$, its distance to each identification vector in $[y_6, y_7, y_8]$ is calculated. Suppose that $y_9$ is closest to the identification vector $y_6$ and that $y_{10}$ is closest to the identification vector $y_7$; then the target identification vector corresponding to $y_9$ is $y_6$, and the target identification vector corresponding to $y_{10}$ is $y_7$. The above examples are illustrative only, and the disclosure is not limited thereto.
S206, calculating a mean vector of data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vector.
Illustratively, continuing with the example in which the first time-series data vector set is [y_6, y_7, y_8, y_9, y_{10}] and the identification vector set is the 3-dimensional set [y_6, y_7, y_8]: after S205 is performed, it can be determined that the data vectors in the first time-series data vector set [y_6, y_7, y_8, y_9, y_{10}] corresponding to the target identification vector y_6 are y_6 and y_9, that the data vectors corresponding to the target identification vector y_7 are y_7 and y_{10}, and that the data vector corresponding to the target identification vector y_8 is y_8. Thus, the mean vector of the two data vectors y_6 and y_9 can be used as the updated target identification vector y_6', the mean vector of the two data vectors y_7 and y_{10} can be used as the updated target identification vector y_7', and the data vector y_8 can be used as the updated target identification vector y_8' (in this case, the target identification vector y_8 before the update is the data vector y_8 in the first time-series data vector set itself, so no mean calculation is required). The updated target identification vector set can then be determined from the updated target identification vectors as [y_6', y_7', y_8']. The above example is merely illustrative, and the present disclosure is not limited thereto.
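The mean-vector update of S206 can be sketched as below. The names `mean_vector` and `update_id_vectors` are hypothetical, the vectors are illustrative stand-ins for y_6 to y_{10}, and keeping an identification vector unchanged when no data vector is assigned to it is an assumption that the text does not address.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def update_id_vectors(data_vectors, labels, id_vectors):
    """Replace each identification vector by the mean of the data vectors
    assigned to it; labels[i] is the index of the target identification
    vector of data_vectors[i]."""
    updated = []
    for k, old in enumerate(id_vectors):
        members = [y for y, label in zip(data_vectors, labels) if label == k]
        # Assumption: an identification vector with no members is kept as-is.
        updated.append(mean_vector(members) if members else list(old))
    return updated

# Hypothetical stand-ins for y_6 .. y_10 and their assignments from S205.
data = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0], [0.5, 0.0], [5.0, 6.0]]
labels = [0, 1, 2, 0, 1]
new_ids = update_id_vectors(data, labels, data[:3])
print(new_ids)  # [[0.25, 0.0], [5.0, 5.5], [10.0, 0.0]]
```

The third identification vector has a single member, so its mean is that member itself, which corresponds to the remark about y_8' in the example.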
S207, determining whether the target distance remains unchanged after a first preset number of consecutive cycles.
In a possible implementation manner, when it is determined that the target distance remains unchanged after the first preset number of consecutive cycles, it may be determined that the identification vectors have converged. The converged identification vectors may then be determined as the target identification vectors, from which the target identification vector set is determined, so that the time-series data can subsequently be predicted according to the target identification vector set.
S208 is executed when it is determined that the target distance remains unchanged after a first preset number of consecutive cycles; when it is determined that the number of cycles has not reached the first preset number and/or the target distance has changed, S204 to S207 are performed again.
S208, determining a target vector corresponding to the data to be predicted from the first time series data vector set.
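The termination test of S207 can be sketched as a check over the target distances recorded in each cycle. This is one possible reading, stated as an assumption: the target distances count as unchanged when they are identical across a first preset number of consecutive cycle-to-cycle comparisons. The function name `unchanged_for` and the tolerance `tol` are hypothetical.

```python
import math

def unchanged_for(history, first_preset_number, tol=1e-12):
    """Return True if the target distances stayed the same over the last
    `first_preset_number` consecutive cycle-to-cycle comparisons.

    `history[c]` is the list of target distances recorded in cycle c."""
    if len(history) < first_preset_number + 1:
        return False  # not enough cycles have run yet
    recent = history[-(first_preset_number + 1):]
    return all(
        len(prev) == len(cur)
        and all(math.isclose(p, c, abs_tol=tol) for p, c in zip(prev, cur))
        for prev, cur in zip(recent, recent[1:])
    )

# Hypothetical target distances recorded over four cycles.
history = [
    [0.0, 0.5, 1.0],
    [0.25, 0.25, 0.5],
    [0.25, 0.25, 0.5],
    [0.25, 0.25, 0.5],
]
print(unchanged_for(history, 2))  # True
print(unchanged_for(history, 3))  # False
```

When the check returns False, S204 to S207 would be repeated; when it returns True, processing would continue with S208.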
Illustratively, if the plurality of historical time-series data is [x_1, x_2, ..., x_t], the data to be predicted is x_{t+1} (note that the present disclosure may predict the probability that the data to be predicted x_{t+1} meets a preset condition). That is, in one possible implementation, the plurality of historical time-series data [x_1, x_2, ..., x_t] may be used to predict the data x_{t+1} at time t+1. In this case, the target vector corresponding to the data to be predicted x_{t+1} can be obtained from the first time-series data vector set [y_{m+1}, y_{m+2}, ..., y_t] as y_t = [x_{t-m}, x_{t-m+1}, ..., x_t]^T. The above example is merely illustrative, and the present disclosure is not limited thereto.
S209, calculating a second distance between the target vector and each target identification vector in the set of target identification vectors.
In one possible implementation, the second distance may be obtained by calculating a euclidean distance between the target vector and each target identification vector.
S210, determining, in the target identification vector set, a second preset number of data vectors closest to the target vector according to the second distance, to obtain the activation vector set.
Illustratively, take the target vector y_predict = [2, 2, 3, 4]^T and the target identification vector set [y_1, y_2, y_3, y_4], where y_1 = [1, 2, 3, 4]^T, y_2 = [2, 2, 4, 4]^T, y_3 = [4, 4, 2, 2]^T and y_4 = [4, 3, 2, 1]^T, as an example. The second distances between the target vector y_predict and the four target identification vectors in the target identification vector set [y_1, y_2, y_3, y_4] are, respectively:
dist(y_predict, y_1) = 1
dist(y_predict, y_2) = 1
dist(y_predict, y_3) = 3.61
dist(y_predict, y_4) = 3.87
In this case, when the second preset number is 2, the target identification vectors y_1 and y_2 in the target identification vector set may be determined to form the activation vector set [y_1, y_2]. The above example is merely illustrative, and the present disclosure is not limited thereto.
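Steps S209 and S210 can be sketched as a nearest-neighbour selection over the target identification vectors, using the Euclidean distance mentioned for the second distance. The function name `activation_set` is hypothetical, and breaking ties by original order is an assumption.

```python
import math

def activation_set(target, id_vectors, second_preset_number):
    """Return the indices of the `second_preset_number` target
    identification vectors closest to `target` (Euclidean distance).
    Ties keep their original order because sorted() is stable."""
    ranked = sorted(
        range(len(id_vectors)),
        key=lambda i: math.dist(target, id_vectors[i]),
    )
    return ranked[:second_preset_number]

y_predict = [2, 2, 3, 4]
targets = [[1, 2, 3, 4], [2, 2, 4, 4], [4, 4, 2, 2], [4, 3, 2, 1]]
selected = activation_set(y_predict, targets, 2)
print(selected)  # [0, 1]
```

Indices 0 and 1 correspond to y_1 and y_2, the two vectors at distance 1 from the target vector in the example. `math.dist` requires Python 3.8 or later.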
S211, obtaining a density function of each data vector in the activation vector set under a preset condition, and calculating the information entropy of each data vector in the activation vector set according to the density function.
Considering that the information entropy can be used as an uncertainty measure of the density function, if the information entropy of a certain data vector in the activation vector set is small, it means that most of the predicted values based on that data vector are invalid predictions. In this case, the weight of those predicted values needs to be reduced, so as to improve the accuracy of the prediction.
In one possible implementation, for convenience of description, the density function of the ith data vector in the set of activation vectors may be represented by the following formula:
Figure GDA0002919431320000161
where f_i(x) denotes the density function of the i-th data vector in the activation vector set; p_n denotes the probability that the data to be predicted satisfies the n-th preset condition; a + Δ ≤ x ≤ a + 2Δ, a + 2Δ ≤ x ≤ a + 3Δ, ..., a + nΔ ≤ x ≤ a + (n+1)Δ respectively denote the different preset conditions of the data to be predicted; a is the preset boundary threshold of the plurality of preset conditions; Δ is the preset data variation; and n is the number of preset conditions. The information entropy of each data vector in the activation vector set can then be calculated according to the density function by the following formula:
Figure GDA0002919431320000162
where I(f_i(x)) denotes the information entropy of the density function f_i(x) of the i-th data vector in the activation vector set, and p_j denotes the probability that the data to be predicted falls within the j-th preset condition.
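Because the entropy formula itself is only available as an image, the following sketch assumes the standard Shannon entropy with the natural logarithm, computed over the probabilities p_j of the preset conditions. The function name `information_entropy`, the logarithm base and the treatment of zero probabilities are all assumptions.

```python
import math

def information_entropy(probs):
    """Shannon entropy -sum(p_j * ln(p_j)) of a discrete distribution.
    Terms with p_j == 0 are skipped, following the usual convention
    that 0 * ln(0) is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A uniform distribution over four preset conditions is maximally uncertain.
uniform = information_entropy([0.25, 0.25, 0.25, 0.25])
# A distribution concentrated on one condition has zero entropy.
certain = information_entropy([1.0, 0.0, 0.0, 0.0])
print(uniform, certain)
```

Under this assumption, the uniform case evaluates to ln 4 and the concentrated case to zero.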
S212, determining a prediction density function of the data to be predicted according to the information entropy.
In a possible implementation manner, the prediction density function of the data to be predicted can be calculated according to the information entropy by the following formula:
Figure GDA0002919431320000171
where f(x) denotes the prediction density function of the data to be predicted, f_i(x) denotes the density function of the i-th data vector in the activation vector set, and I(f_i(x)) denotes the information entropy of the density function f_i(x) of the i-th data vector in the activation vector set.
S213, predicting the probability that the data to be predicted meets the preset condition according to the prediction density function.
For example, taking the prediction of the daily transaction volume of a bank as an illustration, the data to be predicted x_{t+1} is the transaction volume to be predicted. In a possible implementation manner, the following three preset conditions may be set: the transaction volume to be predicted is below 80,000 (i.e., x_{t+1} < 80000), the transaction volume to be predicted is between 80,000 and 100,000 (i.e., 80000 ≤ x_{t+1} ≤ 100000), and the transaction volume to be predicted is above 100,000 (i.e., x_{t+1} > 100000). In this case, the probabilities that the transaction volume to be predicted meets the three preset conditions, obtained according to the prediction density function f(x), are respectively: 10% that the transaction volume to be predicted is below 80,000, 75% that it is between 80,000 and 100,000, and 15% that it is above 100,000.
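Reading interval probabilities off a step-shaped prediction density can be sketched as follows. The representation of the density as (lower, upper, probability) bins, the uniform spread of mass within each bin, the function name `interval_probability` and the bin values are all hypothetical; they are chosen only so that the three preset conditions of the example can be evaluated.

```python
import math

def interval_probability(bins, lower, upper):
    """Probability mass of a piecewise-constant density on [lower, upper].

    `bins` is a list of (bin_lower, bin_upper, probability) tuples; the
    mass of each bin is assumed to be spread uniformly over the bin."""
    total = 0.0
    for bin_lower, bin_upper, prob in bins:
        overlap = min(upper, bin_upper) - max(lower, bin_lower)
        if overlap > 0:
            total += prob * overlap / (bin_upper - bin_lower)
    return total

# Hypothetical prediction density for the transaction volume x_{t+1}.
bins = [
    (60000, 80000, 0.10),
    (80000, 90000, 0.40),
    (90000, 100000, 0.35),
    (100000, 120000, 0.15),
]
conditions = {
    "below 80,000": (-math.inf, 80000),
    "80,000 to 100,000": (80000, 100000),
    "above 100,000": (100000, math.inf),
}
probabilities = {
    name: interval_probability(bins, lo, hi)
    for name, (lo, hi) in conditions.items()
}
for name, p in probabilities.items():
    print(f"{name}: {p:.0%}")
```

With these illustrative bins, the three conditions evaluate to approximately 10%, 75% and 15%, matching the figures used in the example.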
It should be noted that, after the actual value of the data to be predicted is obtained, the actual value can be used as new historical time-series data, and the density function of each data vector in the activation vector set can be updated according to the new historical time-series data. In this way, the data prediction method of the present disclosure automatically adapts to new patterns in the time series, so that the time-series data is predicted more accurately without a large amount of historical data having to be learned in advance, which improves the applicability of the prediction method.
By adopting the method, the prediction result can be given in the form of the density function of the data to be predicted, so that the probability that the data to be predicted meets different preset conditions can be displayed to a user according to the density function, and a higher reference value is provided for actual service requirements.
FIG. 3 is a block diagram illustrating an apparatus for data prediction according to an example embodiment. As shown in FIG. 3, the apparatus comprises:
a first obtaining module 301, configured to obtain a plurality of historical time-series data;
a data conversion module 302, configured to convert, according to a plurality of acquisition times corresponding to the historical time-series data, the plurality of historical time-series data into a first time-series data vector set;
a second obtaining module 303, configured to obtain an identification vector set according to the first time series data vector set;
a third obtaining module 304, configured to determine a target identification vector set according to the identification vector set and the first time series data vector set;
a first determining module 305, configured to determine an active vector set according to the first time-series data vector set and the target identification vector set;
a second determining module 306, configured to obtain a density function of each data vector in the active vector set under a preset condition, and determine a predicted density function of data to be predicted according to the density function;
a prediction module 307, configured to predict the probability that the data to be predicted meets the preset condition according to the prediction density function.
Optionally, the data conversion module 302 is configured to convert the historical time-series data into a first time-series data vector set according to a plurality of acquisition time instants corresponding to the historical time-series data by using the following formula, where the formula includes:
y_j = [x_{j-m}, x_{j-m+1}, ..., x_j]^T
where x_j denotes the historical time-series data collected at time j among the plurality of historical time-series data, x_{j-m} denotes the historical time-series data collected at time j-m among the plurality of historical time-series data, y_j is a data vector in the first time-series data vector set [y_{m+1}, y_{m+2}, ..., y_t], and j ranges from m+1 to t.
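The conversion formula can be sketched as a sliding window over the series. The function name `embed` is hypothetical; the series is stored 0-based, so x_1 is `series[0]`, and the sample values are purely illustrative.

```python
def embed(series, m):
    """Build the first time-series data vector set.

    `series[0]` holds x_1 and `series[-1]` holds x_t. Returns a dict
    mapping j to y_j = [x_{j-m}, ..., x_j] for j = m+1 .. t, so each
    vector has m + 1 components."""
    t = len(series)
    return {j: series[j - m - 1:j] for j in range(m + 1, t + 1)}

# Illustrative series x_1 .. x_5 with m = 2.
series = [10, 11, 12, 13, 14]
vectors = embed(series, 2)
print(vectors)  # {3: [10, 11, 12], 4: [11, 12, 13], 5: [12, 13, 14]}
```

The last entry, y_t, is the vector that would serve as the target vector for the data to be predicted x_{t+1}.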
Optionally, the third obtaining module 304 is configured to perform the step of updating the identification vector set in a loop until a loop termination condition is met, and determine the identification vector set when the loop termination condition is met as the target identification vector set; the step of updating the identification vector set comprises the following steps: calculating a first distance corresponding to each data vector in the first time series data vector set, the first distance comprising a distance between a first data vector and each identification vector in the identification vector set, the first data vector comprising any data vector in the first time series data vector set; determining a target distance corresponding to the first data vector from the first distance, and determining an identification vector corresponding to the target distance as a target identification vector corresponding to the first data vector; calculating a mean vector of data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; determining the target identification vector set according to the updated target identification vector; the cycle end condition includes that the target distance remains unchanged after a first preset number of consecutive cycles.
Optionally, the first determining module 305 is configured to determine a target vector corresponding to the data to be predicted from the first time series data vector set; calculating a second distance between the target vector and each target identification vector in the set of target identification vectors; and in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distance to obtain the activation vector set.
Optionally, the second determining module 306 is configured to calculate an information entropy of each data vector in the active vector set according to the density function; the predicted density function is determined from the information entropy.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
By adopting the device, the prediction result can be given in the form of the density function of the data to be predicted, so that the probability that the data to be predicted meets different preset conditions can be displayed to a user according to the density function, and a higher reference value is provided for actual service requirements.
Fig. 4 is a block diagram illustrating an electronic device 400 according to an example embodiment. As shown in fig. 4, the electronic device 400 may include: a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communications component 405.
The processor 401 is configured to control the overall operation of the electronic device 400, so as to complete all or part of the steps in the data prediction method. The memory 402 is used to store various types of data to support operation of the electronic device 400, such as instructions for any application or method operating on the electronic device 400 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and so forth. The memory 402 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 402 or transmitted through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the data prediction method described above.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the data prediction method described above is also provided. For example, the computer readable storage medium may be the memory 402 described above including program instructions that are executable by the processor 401 of the electronic device 400 to perform the data prediction method described above.
The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments; various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as content disclosed by the present disclosure, as long as they do not depart from the spirit of the present disclosure.

Claims (6)

1. A method of data prediction, the method comprising:
acquiring a plurality of historical time series data;
converting the historical time series data into a first time series data vector set according to the acquisition time corresponding to the historical time series data;
obtaining an identification vector set according to the first time series data vector set;
determining a target identification vector set according to the identification vector set and the first time series data vector set;
determining an activation vector set according to the first time series data vector set and the target identification vector set;
acquiring a density function of each data vector in the activation vector set under a preset condition, and determining a prediction density function of data to be predicted according to the density function;
predicting the probability that the data to be predicted meets the preset condition according to the prediction density function; the data to be predicted comprises the daily business transaction volume of a bank, a stock price, or performance index data of an application system, and the preset condition comprises a data interval in which the data to be predicted is located;
wherein the determining a set of target identification vectors from the set of identification vectors and the set of first time series data vectors comprises:
circularly executing the step of updating the identification vector set until a cycle termination condition is met, and determining the identification vector set when the cycle termination condition is met as the target identification vector set;
the step of updating the identification vector set comprises the following steps: calculating a first distance corresponding to each data vector in the first set of time series data vectors, the first distance comprising a distance of a first data vector from each identification vector in the set of identification vectors, the first data vector comprising any data vector in the first set of time series data vectors; determining a target distance corresponding to the first data vector from the first distances, and determining that an identification vector corresponding to the target distance is a target identification vector corresponding to the first data vector, wherein the target distance comprises a distance with the minimum first distance; calculating a mean vector of data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; determining the target identification vector set according to the updated target identification vector;
the cycle termination condition comprises that the target distance is kept unchanged after a first preset number of continuous cycles;
the determining an activation vector set from the first time series data vector set and the target identification vector set comprises:
determining a target vector corresponding to the data to be predicted from the first time series data vector set; calculating a second distance between the target vector and each target identification vector in the set of target identification vectors; and in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distance to obtain the activated vector set.
2. The method of claim 1, wherein determining a predicted density function for the data to be predicted from the density function comprises:
calculating the information entropy of each data vector in the activated vector set according to the density function;
and determining the prediction density function according to the information entropy.
3. A data prediction apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain a plurality of historical time series data;
a data conversion module, configured to convert the historical time series data into a first time series data vector set according to the acquisition time corresponding to the historical time series data;
a second obtaining module, configured to obtain an identification vector set according to the first time series data vector set;
a third obtaining module, configured to determine a target identification vector set according to the identification vector set and the first time series data vector set;
a first determining module, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set;
a second determining module, configured to obtain a density function of each data vector in the activation vector set under a preset condition, and determine a prediction density function of data to be predicted according to the density function;
a prediction module, configured to predict the probability that the data to be predicted meets the preset condition according to the prediction density function; the data to be predicted comprises the daily business transaction volume of a bank, a stock price, or performance index data of an application system, and the preset condition comprises a data interval in which the data to be predicted is located;
the third obtaining module is configured to perform the step of updating the identification vector set in a loop until a loop termination condition is met, and determine the identification vector set when the loop termination condition is met as the target identification vector set; the step of updating the identification vector set comprises the following steps: calculating a first distance corresponding to each data vector in the first set of time series data vectors, the first distance comprising a distance of a first data vector from each identification vector in the set of identification vectors, the first data vector comprising any data vector in the first set of time series data vectors; determining a target distance corresponding to the first data vector from the first distances, and determining that an identification vector corresponding to the target distance is a target identification vector corresponding to the first data vector, wherein the target distance comprises a distance with the minimum first distance; calculating a mean vector of data vectors corresponding to the target identification vector, and taking the mean vector as the updated target identification vector; determining the target identification vector set according to the updated target identification vector; the cycle termination condition comprises that the target distance is kept unchanged after a first preset number of continuous cycles;
the first determining module is configured to determine, from the first time-series data vector set, a target vector corresponding to the data to be predicted; calculating a second distance between the target vector and each target identification vector in the set of target identification vectors; and in the target identification vector set, determining a second preset number of data vectors closest to the target vector according to the second distance to obtain the activated vector set.
4. The apparatus of claim 3, wherein the second determining module is configured to calculate an information entropy of each data vector in the set of active vectors according to the density function; and determining the prediction density function according to the information entropy.
5. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as claimed in claim 1 or 2.
6. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of claim 1 or 2.
CN201811475791.8A 2018-12-04 2018-12-04 Data prediction method and device, storage medium and electronic equipment Active CN109754115B (en)

Publications (2)

Publication Number Publication Date
CN109754115A CN109754115A (en) 2019-05-14
CN109754115B true CN109754115B (en) 2021-03-26

Family

ID=66403636

Country Status (1)

Country Link
CN (1) CN109754115B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291824B (en) * 2020-02-24 2024-03-22 网易(杭州)网络有限公司 Time series processing method, device, electronic equipment and computer readable medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092582A (en) * 2017-03-31 2017-08-25 江苏方天电力技术有限公司 One kind is based on the posterior exceptional value on-line checking of residual error and method for evaluating confidence
CN107180278A (en) * 2017-05-27 2017-09-19 重庆大学 A kind of real-time passenger flow forecasting of track traffic
CN108549647A (en) * 2018-01-17 2018-09-18 中移在线服务有限公司 The method without accident in mark language material active predicting movement customer service field is realized based on SinglePass algorithms

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9824067B2 (en) * 2014-08-01 2017-11-21 Tata Consultancy Services Limited System and method for forecasting a time series data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Pair_Copula自回归模型及其在股票指数中的应用";王婷;《中国优秀硕士学位论文全文数据库 经济与管理科学辑》;20180115(第01期);第J145-128页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant