CN109754115A

CN109754115A - Data prediction method, apparatus, storage medium and electronic device

Info

Publication number: CN109754115A
Application number: CN201811475791.8A
Authority: CN
Inventors: 孙木鑫
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2018-12-04
Filing date: 2018-12-04
Publication date: 2019-05-14
Anticipated expiration: 2038-12-04
Also published as: CN109754115B

Abstract

This disclosure relates to a data prediction method, apparatus, storage medium and electronic device. The method acquires a plurality of historical time-series data; converts them into a first time-series data vector set according to the acquisition time of each datum; obtains an identification vector set from the first time-series data vector set; determines a target identification vector set from the identification vector set and the first time-series data vector set; determines an activation vector set from the first time-series data vector set and the target identification vector set; obtains the density function, under a preset condition, of each data vector in the activation vector set and determines a predicted density function of the data to be predicted from those density functions; and predicts, from the predicted density function, the probability that the data to be predicted satisfies the preset condition.

Description

Data prediction method, apparatus, storage medium and electronic device

Technical field

This disclosure relates to the field of data prediction, and in particular to a data prediction method, apparatus, storage medium and electronic device.

Background

Time-series forecasting uses ordered data associated with a time sequence to infer how the data will develop, and uses that inference to guide the solution of practical problems. Prediction of time-series data now plays an important role in many industries: banks use it to predict changes in daily trading volume; exchanges use it to predict how stock prices move; and application monitoring uses it to predict future trends of key indicators such as CPU usage, memory usage and HTTP response time.

However, with the rapid development of computer software, data volumes keep growing and time-series data are becoming increasingly complex, so the regularity of data changes is ever harder to mine. Traditional time-series prediction methods find the change pattern of the data by mathematically fitting it, but for discrete time-series data the prediction accuracy of these classical algorithms is low. Moreover, conventional algorithms present their result only as a single predicted value, which does not meet actual business needs.

Summary of the invention

The purpose of this disclosure is to provide a data prediction method, apparatus, storage medium and electronic device.

In a first aspect, a data prediction method is provided, the method comprising: acquiring a plurality of historical time-series data; converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the historical time-series data; obtaining an identification vector set according to the first time-series data vector set; determining a target identification vector set according to the identification vector set and the first time-series data vector set; determining an activation vector set according to the first time-series data vector set and the target identification vector set; obtaining the density function, under a preset condition, of each data vector in the activation vector set, and determining a predicted density function of the data to be predicted according to the density functions; and predicting, according to the predicted density function, the probability that the data to be predicted satisfies the preset condition.

Optionally, converting the plurality of historical time-series data into the first time-series data vector set according to their corresponding acquisition times comprises converting the historical time-series data into the first time-series data vector set by the following formula:

$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$

where $x_j$ denotes the historical time-series datum acquired at time j, $x_{j-m}$ denotes the historical time-series datum acquired at time j-m, $y_j$ is any data vector in the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, and j ranges from m+1 to t.

Optionally, determining the target identification vector set according to the identification vector set and the first time-series data vector set comprises: repeatedly executing an identification-vector-set update step until a loop termination condition is met, and taking the identification vector set at that point as the target identification vector set. The update step comprises: calculating the first distances of each data vector in the first time-series data vector set, i.e. the distances between a first data vector (any data vector in the first time-series data vector set) and each identification vector in the identification vector set; determining from these first distances the target distance of the first data vector, namely the smallest first distance, and taking the identification vector at that distance as the target identification vector of the first data vector; calculating the mean vector of the data vectors assigned to each target identification vector and using it as the updated target identification vector; and determining the target identification vector set from the updated target identification vectors. The loop termination condition is that the target distances remain unchanged over a first preset number of consecutive iterations.

Optionally, determining the activation vector set according to the first time-series data vector set and the target identification vector set comprises: determining, from the first time-series data vector set, the object vector corresponding to the data to be predicted; calculating the second distance between the object vector and each target identification vector in the target identification vector set; and selecting, in the target identification vector set and according to the second distances, the second preset number of vectors nearest to the object vector, thereby obtaining the activation vector set.

Optionally, determining the predicted density function of the data to be predicted according to the density functions comprises: calculating, from the density functions, the information entropy of each data vector in the activation vector set; and determining the predicted density function according to the information entropy.

In a second aspect, a data prediction apparatus is provided, the apparatus comprising: a first acquisition module, configured to acquire a plurality of historical time-series data; a data conversion module, configured to convert the plurality of historical time-series data into a first time-series data vector set according to their corresponding acquisition times; a second acquisition module, configured to obtain an identification vector set according to the first time-series data vector set; a third acquisition module, configured to determine a target identification vector set according to the identification vector set and the first time-series data vector set; a first determination module, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set; a second determination module, configured to obtain the density function, under a preset condition, of each data vector in the activation vector set and to determine the predicted density function of the data to be predicted according to the density functions; and a prediction module, configured to predict, according to the predicted density function, the probability that the data to be predicted satisfies the preset condition.

Optionally, the data conversion module is configured to convert the historical time-series data into the first time-series data vector set, according to their corresponding acquisition times, by the following formula:

$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$

Optionally, the third acquisition module is configured to repeatedly execute the identification-vector-set update step until the loop termination condition is met, and to take the identification vector set at that point as the target identification vector set. The update step comprises: calculating the first distances of each data vector in the first time-series data vector set, i.e. the distances between a first data vector (any data vector in the first time-series data vector set) and each identification vector in the identification vector set; determining from these first distances the target distance of the first data vector, namely the smallest first distance, and taking the identification vector at that distance as the target identification vector of the first data vector; calculating the mean vector of the data vectors assigned to each target identification vector and using it as the updated target identification vector; and determining the target identification vector set from the updated target identification vectors. The loop termination condition is that the target distances remain unchanged over a first preset number of consecutive iterations.

Optionally, the first determination module is configured to: determine, from the first time-series data vector set, the object vector corresponding to the data to be predicted; calculate the second distance between the object vector and each target identification vector in the target identification vector set; and select, in the target identification vector set and according to the second distances, the second preset number of vectors nearest to the object vector, thereby obtaining the activation vector set.

Optionally, the second determination module is configured to calculate, according to the density functions, the information entropy of each data vector in the activation vector set, and to determine the predicted density function according to the information entropy.

In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the method of the first aspect of this disclosure.

In a fourth aspect, an electronic device is provided, comprising: a memory storing a computer program; and a processor configured to execute the computer program in the memory to implement the steps of the method of the first aspect of this disclosure.

With the above technical solution, a plurality of historical time-series data are acquired and converted, according to their acquisition times, into a first time-series data vector set; an identification vector set is obtained from the first time-series data vector set; a target identification vector set is determined from the identification vector set and the first time-series data vector set; an activation vector set is determined from the first time-series data vector set and the target identification vector set; the density function, under a preset condition, of each data vector in the activation vector set is obtained and used to determine the predicted density function of the data to be predicted; and the probability that the data to be predicted satisfies the preset condition is predicted from the predicted density function. Because the prediction result is presented as the density function of the data to be predicted, the user can be shown the probability that the data satisfies different preset conditions, which gives more valuable reference for actual business needs.

Other features and advantages of this disclosure are described in detail in the detailed description that follows.

Brief description of the drawings

The accompanying drawings provide a further understanding of this disclosure and form part of the specification. Together with the following embodiments they serve to explain the disclosure, but do not limit it. In the drawings:

Fig. 1 is a flowchart of a data prediction method according to an exemplary embodiment;

Fig. 2 is a flowchart of another data prediction method according to an exemplary embodiment;

Fig. 3 is a block diagram of a data prediction apparatus according to an exemplary embodiment;

Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment.

Detailed description of the embodiments

Specific embodiments of this disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the embodiments described here serve only to describe and explain the disclosure and are not intended to limit it.

This disclosure provides a data prediction method, apparatus, storage medium and electronic device. A plurality of historical time-series data are acquired and converted, according to their acquisition times, into a first time-series data vector set; an identification vector set is obtained from the first time-series data vector set; a target identification vector set is determined from the identification vector set and the first time-series data vector set; an activation vector set is determined from the first time-series data vector set and the target identification vector set; the density function, under a preset condition, of each data vector in the activation vector set is obtained, and the predicted density function of the data to be predicted is determined from these density functions; finally, the probability that the data to be predicted satisfies the preset condition is predicted from the predicted density function. Because the prediction result is presented as the density function of the data to be predicted, the user can be shown the probability that the data satisfies different preset conditions, which gives more valuable reference for actual business needs.

Specific embodiments of this disclosure are described in detail below with reference to the drawings.

Fig. 1 is a flowchart of a data prediction method according to an exemplary embodiment. As shown in Fig. 1, the method comprises the following steps:

S101: acquire a plurality of historical time-series data.

Time-series data are data collected at different moments and arranged in chronological order; they describe how the data change over time. Examples include a bank's daily trading volume, stock prices on a stock market, and the response time of an application system. The plurality of historical time-series data may be a preset number of time-series data acquired within a preset historical time period.

S102: convert the plurality of historical time-series data into a first time-series data vector set according to their corresponding acquisition times.

In this step, the historical time-series data can be converted into the first time-series data vector set, according to their acquisition times, by the following formula:

$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$

where the plurality of historical time-series data can be written as $[x_1, x_2, \ldots, x_t]$; $x_j$ denotes the datum acquired at time j and $x_{j-m}$ the datum acquired at time j-m; the first time-series data vector set can be written as $[y_{m+1}, y_{m+2}, \ldots, y_t]$, with $y_j$ any data vector in that set and j ranging from m+1 to t. The value of m may be preset; in practice its size can be chosen according to business requirements.

After S102, every historical time-series datum acquired after the m-th acquisition time has been turned into a column vector made up of that datum and the m data immediately preceding it.
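As a rough illustration of S102, the following Python sketch builds the vectors y_{m+1}, ..., y_t from a series stored in chronological order. The function name `embed_series` is hypothetical, and each vector is returned as a plain list standing in for the column vector.

```python
from typing import List, Sequence


def embed_series(series: Sequence[float], m: int) -> List[List[float]]:
    """Turn x_1..x_t into y_{m+1}..y_t, where y_j = [x_{j-m}, ..., x_j].

    Each y_j holds m + 1 consecutive values: x_j and the m values before it.
    """
    t = len(series)
    if m < 0 or t <= m:
        raise ValueError("need 0 <= m < len(series)")
    # Python index i holds the datum acquired at time i + 1,
    # so y_j corresponds to the slice series[j - m - 1 : j].
    return [list(series[j - m - 1:j]) for j in range(m + 1, t + 1)]
```

For the bank example in the Fig. 2 embodiment (t = 10, m = 5), `embed_series(x, 5)` returns five vectors, the first holding x₁ to x₆ and the last x₅ to x₁₀.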

S103: obtain an identification vector set according to the first time-series data vector set.

In this step, a third preset number of data vectors can be randomly selected from the first time-series data vector set to form the identification vector set. To avoid overfitting, the third preset number may be smaller than the number of data vectors in the first time-series data vector set.
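A minimal sketch of the random initialization in S103, assuming the data vectors are Python lists and that `k` plays the role of the third preset number. The function `init_identification_vectors` and its optional `seed` argument are illustrative additions, not part of the disclosure.

```python
import random
from typing import List, Optional, Sequence


def init_identification_vectors(vectors: Sequence[Sequence[float]], k: int,
                                seed: Optional[int] = None) -> List[List[float]]:
    """Randomly pick k distinct data vectors as the initial identification set."""
    if not 0 < k < len(vectors):
        # The text suggests k be smaller than the number of data vectors.
        raise ValueError("k must satisfy 0 < k < len(vectors)")
    rng = random.Random(seed)  # seeded generator for reproducible runs
    # sample() draws k distinct positions without replacement.
    return [list(v) for v in rng.sample(list(vectors), k)]
```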

S104: determine a target identification vector set according to the identification vector set and the first time-series data vector set.

In this step, the identification-vector-set update step can be executed repeatedly until the loop termination condition is met, and the identification vector set at that point is taken as the target identification vector set. The update step comprises: calculating the first distances of each data vector in the first time-series data vector set, i.e. the distances between a first data vector (any data vector in the first time-series data vector set) and each identification vector in the identification vector set; determining from these first distances the target distance of the first data vector, namely the smallest first distance, and taking the identification vector at that distance as the target identification vector of the first data vector; calculating the mean vector of the data vectors assigned to each target identification vector and using it as the updated target identification vector; and determining the target identification vector set from the updated target identification vectors. The loop termination condition is that the target distances remain unchanged over a first preset number of consecutive iterations.

In one possible implementation, once the target distances have remained unchanged over the first preset number of consecutive iterations, the identification vectors can be deemed to have converged. The converged identification vectors are then taken as the target identification vectors, which gives the target identification vector set used in the subsequent prediction of the time-series data.
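The update loop of S104 behaves much like a k-means iteration. The sketch below is one possible reading of it, not the patented implementation: it assumes Euclidean distances (as suggested for S204), uses a hypothetical `patience` parameter for the first preset number, leaves an identification vector unchanged if no data vector is assigned to it, and adds a `max_iter` cap as a safeguard that the text does not mention.

```python
import math
from typing import List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def refine_identification_vectors(data: Sequence[Sequence[float]],
                                  ids: Sequence[Sequence[float]],
                                  patience: int = 3,
                                  max_iter: int = 100,
                                  tol: float = 1e-12) -> List[List[float]]:
    """Repeat the update step until every data vector's target distance
    (its smallest first distance) is unchanged for `patience` iterations."""
    ids = [list(v) for v in ids]
    prev: Optional[List[float]] = None
    stable = 0
    for _ in range(max_iter):
        # First distances: each data vector against every identification vector;
        # the nearest one becomes that data vector's target identification vector.
        members: List[List[Sequence[float]]] = [[] for _ in ids]
        target: List[float] = []
        for y in data:
            dists = [euclidean(y, c) for c in ids]
            best = min(range(len(ids)), key=dists.__getitem__)
            members[best].append(y)
            target.append(dists[best])
        # Replace each identification vector by the mean of its assigned vectors;
        # one with no members keeps its old value (an assumption, not in the text).
        for k, group in enumerate(members):
            if group:
                ids[k] = [sum(col) / len(group) for col in zip(*group)]
        # Termination: target distances unchanged for `patience` consecutive rounds.
        if prev is not None and all(abs(a - b) <= tol for a, b in zip(target, prev)):
            stable += 1
            if stable >= patience:
                break
        else:
            stable = 0
        prev = target
    return ids
```

On the toy data `[[0.0], [0.2], [10.0], [10.2]]` started from `[[0.0], [0.2]]`, the two vectors settle near 0.1 and 10.1.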

S105: determine an activation vector set according to the first time-series data vector set and the target identification vector set.

In this step, the object vector corresponding to the data to be predicted can be determined from the first time-series data vector set; the second distance between the object vector and each target identification vector in the target identification vector set is calculated; and the second preset number of vectors in the target identification vector set nearest to the object vector are selected according to the second distances, giving the activation vector set.

S106: obtain the density function, under a preset condition, of each data vector in the activation vector set, and determine the predicted density function of the data to be predicted according to the density functions.

Information entropy can serve as a measure of the uncertainty of a density function. If the information entropy of a data vector in the activation vector set is large, predictions based on that data vector are mostly invalid, and its weight should be reduced to improve prediction accuracy. Therefore, in one possible implementation, the information entropy of each data vector in the activation vector set can be calculated from its density function, and the predicted density function can be determined according to the information entropy.
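The weighting formula itself lies beyond this section, so the sketch below only illustrates the idea of S106 under stated assumptions: each density function is represented as a discrete probability vector over shared bins, entropy is the Shannon entropy in nats, and the weights are taken as exp(-H), so that higher-entropy (more uncertain) densities contribute less to the mixture. These choices and the function names are hypothetical and may differ from the patent's own formula.

```python
import math
from typing import List, Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Entropy (in nats) of a discrete density; zero-probability bins add nothing."""
    return -sum(q * math.log(q) for q in p if q > 0)


def predicted_density(densities: Sequence[Sequence[float]]) -> List[float]:
    """Mix the activation vectors' densities, down-weighting high-entropy ones.

    Assumes every density uses the same bins and sums to 1, so the
    normalized weighted mixture also sums to 1.
    """
    weights = [math.exp(-shannon_entropy(p)) for p in densities]  # assumed weighting
    total = sum(weights)
    n_bins = len(densities[0])
    return [sum(w * p[b] for w, p in zip(weights, densities)) / total
            for b in range(n_bins)]
```

Mixing a sharp density `[0, 1, 0]` (entropy 0) with a flat one `[1/3, 1/3, 1/3]` (entropy ln 3) gives the middle bin a probability of 5/6, so the more certain density dominates.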

S107: predict, according to the predicted density function, the probability that the data to be predicted satisfies the preset condition.
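Once a predicted density function is available, S107 amounts to summing (or integrating) probability mass over the region defined by the preset condition. A minimal sketch, assuming the predicted density is a discrete probability vector over bins whose centres are known, and that the preset condition is expressed as a Python predicate on a value; `probability_of_condition` is a hypothetical name.

```python
from typing import Callable, Sequence


def probability_of_condition(bin_centers: Sequence[float],
                             density: Sequence[float],
                             condition: Callable[[float], bool]) -> float:
    """Sum the predicted probability mass over the bins whose centre
    satisfies the preset condition (e.g. lambda v: v > 1000)."""
    return sum(p for c, p in zip(bin_centers, density) if condition(c))
```

For a predicted density over trading-volume bins, `condition=lambda v: v > threshold` (with `threshold` chosen by the user) gives the estimated probability that the next volume exceeds that threshold; different predicates give the probabilities of different preset conditions from the same density.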

Note that once the true value of the data to be predicted is obtained, it can be added as a new historical time-series datum, and the density function of each data vector in the activation vector set can be updated with it. The data prediction method of this disclosure can therefore adapt automatically to new patterns in the time series, making prediction more accurate without first having to learn from a large amount of historical data, which improves the applicability of the prediction algorithm.

With the above method, the prediction result is presented in the form of the density function of the data to be predicted, so the user can be shown, according to that density function, the probability that the data to be predicted satisfies different preset conditions, which gives more valuable reference for actual business needs.

Fig. 2 is a flowchart of another data prediction method according to an exemplary embodiment. As shown in Fig. 2, the method comprises the following steps:

S201: acquire a plurality of historical time-series data.

S202: convert the plurality of historical time-series data into a first time-series data vector set according to their corresponding acquisition times, using the following formula:

$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$

Specifically, when the plurality of historical time-series data are $[x_1, x_2, \ldots, x_t]$, the formula above converts them into the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, where

$$
\begin{aligned}
y_{m+1} &= [x_1, x_2, \ldots, x_{m+1}]^T \\
y_{m+2} &= [x_2, x_3, \ldots, x_{m+2}]^T \\
&\;\;\vdots \\
y_t &= [x_{t-m}, x_{t-m+1}, \ldots, x_t]^T
\end{aligned}
$$

For example, take a bank's daily trading volume as the time-series data (for illustration only). The trading volumes of the most recent 10 days are written $[x_1, x_2, \ldots, x_{10}]$ (t = 10), where $x_i$ is the trading volume on day i (i = 1, ..., 10). With m set to 5, the first time-series data vector set is $[y_6, y_7, y_8, y_9, y_{10}]$, where

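$$
\begin{aligned}
y_6 &= [x_1, x_2, \ldots, x_6]^T \\
y_7 &= [x_2, x_3, \ldots, x_7]^T \\
y_8 &= [x_3, x_4, \ldots, x_8]^T \\
y_9 &= [x_4, x_5, \ldots, x_9]^T \\
y_{10} &= [x_5, x_6, \ldots, x_{10}]^T
\end{aligned}
$$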

The example above is illustrative only and does not limit this disclosure.

That is, after S202, every historical time-series datum acquired after the m-th acquisition time has been turned into a column vector made up of that datum and the m data immediately preceding it.

S203: obtain an identification vector set according to the first time-series data vector set.

In this step, a third preset number of data vectors can be randomly selected from the first time-series data vector set to form the identification vector set. To avoid overfitting, the third preset number may be smaller than the number of data vectors in the first time-series data vector set.

For example, when the first time-series data vector set is $[y_{m+1}, y_{m+2}, \ldots, y_t]$, K vectors $y_j$ (K being the third preset number) can be randomly selected as identification vectors representing different data patterns, forming an identification vector set of K vectors such as $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$. When the first time-series data vector set is $[y_6, y_7, y_8, y_9, y_{10}]$, three data vectors (K = 3) can be randomly selected to form an identification vector set of three vectors, for example $[y_6, y_7, y_8]$, $[y_7, y_8, y_9]$ or $[y_7, y_9, y_{10}]$. The example is illustrative only and does not limit this disclosure.

S204: calculate the first distances of each data vector in the first time-series data vector set.

The first distances of a first data vector are its distances to each identification vector in the identification vector set, where the first data vector may be any data vector in the first time-series data vector set. In one possible implementation, the first distances are obtained by calculating the Euclidean distance between each data vector in the first time-series data vector set and each identification vector in the identification vector set.

For example, let the first time-series data vector set be $[y_{m+1}, y_{m+2}, \ldots, y_t]$ and the identification vector set be the K vectors $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$. When the first data vector is $y_{m+1}$, its distance to each identification vector in $[y_{m+1}, \ldots, y_{m+K}]$ is calculated, giving K first distances for $y_{m+1}$; when the first data vector is $y_{m+K+1}$, its distance to each identification vector is calculated in the same way, giving K first distances for $y_{m+K+1}$. Proceeding likewise for every data vector in $[y_{m+1}, \ldots, y_t]$ yields all the first distances. The example is illustrative only and does not limit this disclosure.

S205: determine the target distance of the first data vector from its first distances, and take the identification vector at the target distance as the target identification vector of the first data vector; the target distance may be the smallest of the first distances.

After S204, each data vector in the first time-series data vector set has K first distances. The smallest of the K first distances of a first data vector is taken as its target distance, and the identification vector at that distance is taken as its target identification vector.

For example, continue with the first time-series data vector set $[y_6, y_7, y_8, y_9, y_{10}]$ and the identification vector set $[y_6, y_7, y_8]$ of three vectors; the first data vector may be any of $y_6, \ldots, y_{10}$. When the first data vector is $y_6$, its distance to each identification vector in $[y_6, y_7, y_8]$ is calculated; since its distance to itself is zero, its target identification vector is $y_6$. Likewise, the target identification vector of $y_7$ is $y_7$ and that of $y_8$ is $y_8$. When the first data vector is $y_9$, its distance to each identification vector in $[y_6, y_7, y_8]$ is calculated; suppose $y_9$ is nearest to $y_6$ and $y_{10}$ is nearest to $y_7$. Then the target identification vector of $y_9$ is $y_6$ and that of $y_{10}$ is $y_7$. The example is illustrative only and does not limit this disclosure.

S206: calculate the mean vector of the data vectors assigned to each target identification vector, use it as the updated target identification vector, and determine the target identification vector set from the updated target identification vectors.

For example, continue with the first time-series data vector set $[y_6, y_7, y_8, y_9, y_{10}]$ and the identification vector set $[y_6, y_7, y_8]$. After S205, the data vectors assigned to target identification vector $y_6$ are $y_6$ and $y_9$, those assigned to $y_7$ are $y_7$ and $y_{10}$, and the only one assigned to $y_8$ is $y_8$. The mean of $y_6$ and $y_9$ is therefore used as the updated target identification vector $y_6'$, the mean of $y_7$ and $y_{10}$ as $y_7'$, and $y_8$ itself as $y_8'$ (since $y_8$ is its only assigned data vector, no mean needs to be calculated). The updated target identification vector set is thus $[y_6', y_7', y_8']$. The example is illustrative only and does not limit this disclosure.

S207: determine whether the target distances have remained unchanged over the first preset number of consecutive iterations.

If the target distances have remained unchanged over the first preset number of consecutive iterations, execute S208; if the number of iterations has not yet reached the first preset number and/or the target distances have changed, execute S204 to S207 again.

S208: determine, from the first time-series data vector set, the object vector corresponding to the data to be predicted.

For example, when the plurality of historical time-series data are $[x_1, x_2, \ldots, x_t]$, the data to be predicted is $x_{t+1}$ (this disclosure predicts the probability that $x_{t+1}$ satisfies a preset condition). That is, in one possible implementation, $[x_1, x_2, \ldots, x_t]$ are used to predict the datum $x_{t+1}$ at time t+1, and the object vector corresponding to $x_{t+1}$, taken from the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, is $y_t = [x_{t-m}, x_{t-m+1}, \ldots, x_t]^T$. The example is illustrative only and does not limit this disclosure.

S209: Calculate the second distance between the target vector and each target identification vector in the target identification vector set.

In one possible implementation, the second distance is obtained by calculating the Euclidean distance between the target vector and each target identification vector.

S210: In the target identification vector set, determine, according to the second distances, the second preset number of target identification vectors nearest to the target vector, to obtain the activation vector set.

Illustratively, take the target vector y_predict = [2, 2, 3, 4]^T and the target identification vector set [y₁, y₂, y₃, y₄], where y₁ = [1, 2, 3, 4]^T, y₂ = [2, 2, 4, 4]^T, y₃ = [4, 4, 2, 2]^T and y₄ = [4, 3, 2, 1]^T. The second distances between y_predict and the four target identification vectors are:

dist(y_predict,y₁)=1

dist(y_predict,y₂)=1

dist(y_predict,y₃)=3.61

dist(y_predict,y₄)=3.87

In this case, when the second preset number is 2, the target identification vectors y₁ and y₂ are selected from the target identification vector set and form the activation vector set [y₁, y₂]. The above example is merely illustrative and does not limit the disclosure.
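Steps S209 and S210 reduce to computing Euclidean distances and keeping the nearest vectors. The sketch below reproduces the worked example above (the distance from y_predict to y₃ is √13 ≈ 3.61). The name `select_activation_vectors` is hypothetical, `k` plays the role of the second preset number, and ties are broken in favor of the earlier vector because Python's `sorted` is stable.

```python
import math
from typing import List, Sequence, Tuple


def select_activation_vectors(
    target: Sequence[float], centers: Sequence[Sequence[float]], k: int
) -> Tuple[List[Sequence[float]], List[float]]:
    """Return the k target identification vectors nearest to `target`,
    together with all second distances (Euclidean)."""
    second_distances = [math.dist(target, c) for c in centers]
    order = sorted(range(len(centers)), key=lambda i: second_distances[i])
    return [centers[i] for i in order[:k]], second_distances


y_predict = [2, 2, 3, 4]
centers = [[1, 2, 3, 4], [2, 2, 4, 4], [4, 4, 2, 2], [4, 3, 2, 1]]
activation, dists = select_activation_vectors(y_predict, centers, k=2)
print(activation)                     # [[1, 2, 3, 4], [2, 2, 4, 4]]
print([round(d, 2) for d in dists])   # [1.0, 1.0, 3.61, 3.87]
```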

S211: Obtain the density function, over the preset conditions, of each data vector in the activation vector set, and calculate the information entropy of each data vector in the activation vector set from its density function.

Information entropy can serve as a measure of the uncertainty of a density function. Therefore, if the information entropy of a data vector in the activation vector set is small, predictions based on that data vector are mostly invalid; its weight then needs to be reduced, which improves the accuracy of the prediction.

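In one possible implementation, for ease of description, the density function of the i-th data vector in the activation vector set can be expressed by the following formula:

$$f_i(x)=\begin{cases}p_1, & a+\Delta\le x\le a+2\Delta\\ p_2, & a+2\Delta\le x\le a+3\Delta\\ \vdots & \vdots\\ p_n, & a+n\Delta\le x\le a+(n+1)\Delta\end{cases}$$
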
Here, f_i(x) denotes the density function of the i-th data vector in the activation vector set; p_1, ..., p_n are the normalized statistical counts of the density function; a+Δ ≤ x ≤ a+2Δ, a+2Δ ≤ x ≤ a+3Δ, ..., a+nΔ ≤ x ≤ a+(n+1)Δ denote the different preset conditions in which the data to be predicted may lie; a is the preset boundary threshold of the preset conditions; Δ is the preset data change amount; and n is the number of preset conditions. The information entropy of each data vector in the activation vector set can then be calculated from its density function by the following formula:

$$I(f_i(x))=-\sum_{j=1}^{n}p_j\log p_j$$

Here, I(f_i(x)) denotes the information entropy of the density function f_i(x) of the i-th data vector in the activation vector set, and p_j denotes the probability that the data to be predicted lies in the j-th preset condition.
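The density function f_i(x) is a piecewise-constant (histogram) function over the n preset intervals, and its information entropy depends only on the interval probabilities p_j. This Python sketch assumes the samples behind each f_i are available as plain numbers, uses half-open intervals [a+jΔ, a+(j+1)Δ) except that the last interval also includes its right end, ignores samples outside all intervals, and uses the Shannon entropy with the natural logarithm and the convention 0·ln 0 = 0. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def _interval_index(x: float, a: float, delta: float, n: int) -> int:
    """1-based interval j with a + j*delta <= x < a + (j+1)*delta (last interval closed)."""
    j = math.floor((x - a) / delta)
    if j == n + 1 and x == a + (n + 1) * delta:
        j = n
    return j


def interval_probabilities(samples: Sequence[float], a: float, delta: float, n: int) -> List[float]:
    """Normalized counts p_1..p_n of the samples falling in each preset interval."""
    counts = [0] * n
    for x in samples:
        j = _interval_index(x, a, delta, n)
        if 1 <= j <= n:
            counts[j - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the preset intervals")
    return [c / total for c in counts]


def density_value(p: Sequence[float], a: float, delta: float, x: float) -> float:
    """Evaluate the piecewise-constant f_i(x): p_j on interval j, 0 outside."""
    j = _interval_index(x, a, delta, len(p))
    return p[j - 1] if 1 <= j <= len(p) else 0.0


def information_entropy(p: Sequence[float]) -> float:
    """Shannon entropy -sum(p_j * ln p_j), treating 0 * ln 0 as 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)
```

A uniform density over the intervals gives the largest entropy; a density concentrated in one interval gives zero.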

S212: Determine the predicted density function of the data to be predicted according to the information entropy.

In one possible implementation, the predicted density function of the data to be predicted can be calculated from the information entropy by the following formula:

Here, f(x) denotes the predicted density function of the data to be predicted, f_i(x) denotes the density function of the i-th data vector in the activation vector set, and I(f_i(x)) denotes the information entropy of f_i(x).
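The combination formula is not legible in this text, so the following is a sketch under an explicit assumption: f(x) is taken to be the entropy-weighted average Σᵢ I(f_i) f_i(x) / Σᵢ I(f_i), which matches the stated intent that a data vector with smaller information entropy receives a smaller weight. Because every f_i is piecewise constant on the same n intervals, the average can be formed interval by interval, and the logarithm base of the entropy cancels out of the normalized weights. The function name and the fallback to an unweighted average when every entropy is zero are hypothetical choices.

```python
import math
from typing import List, Sequence


def information_entropy(p: Sequence[float]) -> float:
    """Shannon entropy -sum(p_j * ln p_j), treating 0 * ln 0 as 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)


def combine_densities(densities: Sequence[Sequence[float]]) -> List[float]:
    """Assumed form: entropy-weighted average of the activation vectors'
    interval probabilities, giving the predicted interval probabilities."""
    weights = [information_entropy(p) for p in densities]
    total = sum(weights)
    if total == 0:
        # Hypothetical fallback: every density is degenerate, so weight them equally.
        weights, total = [1.0] * len(densities), float(len(densities))
    n = len(densities[0])
    return [sum(w * p[j] for w, p in zip(weights, densities)) / total for j in range(n)]
```

Under this assumption a zero-entropy density contributes nothing, and the result remains a valid probability vector because the weights are normalized.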

S213: Predict, according to the predicted density function, the probability that the data to be predicted satisfies each preset condition.

Illustratively, take the prediction of a bank's daily trading volume, where the data to be predicted x_{t+1} is the trading volume to be predicted. In one possible implementation, the following three preset conditions can be set: the trading volume is below 80,000 (x_{t+1} < 80000); the trading volume is between 80,000 and 100,000 (80000 ≤ x_{t+1} ≤ 100000); and the trading volume is above 100,000 (x_{t+1} > 100000). According to the predicted density function f(x), the probabilities that the trading volume satisfies these three conditions are predicted to be, for example, 10% below 80,000, 75% between 80,000 and 100,000, and 15% above 100,000. The above example is merely illustrative and does not limit the disclosure.
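Once the predicted density is available as interval probabilities, the probability of each preset condition is the total mass of the intervals that fall inside it. The sketch below assumes the interval edges line up with the thresholds 80,000 and 100,000 and classifies each interval by its midpoint. The grid (a = 50,000, Δ = 10,000, n = 6), the interval probabilities and the function name `condition_probabilities` are hypothetical, chosen only so that the totals match the 10% / 75% / 15% figures of the example.

```python
from typing import Sequence, Tuple


def condition_probabilities(
    p: Sequence[float], a: float, delta: float, low: float, high: float
) -> Tuple[float, float, float]:
    """Return P(x < low), P(low <= x <= high), P(x > high) from interval probabilities p_1..p_n."""
    below = between = above = 0.0
    for j, pj in enumerate(p, start=1):
        midpoint = a + (j + 0.5) * delta   # interval j spans [a + j*delta, a + (j+1)*delta]
        if midpoint < low:
            below += pj
        elif midpoint <= high:
            between += pj
        else:
            above += pj
    return below, between, above


p = [0.05, 0.05, 0.40, 0.35, 0.10, 0.05]   # hypothetical predicted interval probabilities
below, between, above = condition_probabilities(p, a=50_000, delta=10_000, low=80_000, high=100_000)
print(round(below, 2), round(between, 2), round(above, 2))   # 0.1 0.75 0.15
```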

Fig. 3 is a block diagram of a data prediction apparatus according to an exemplary embodiment. As shown in Fig. 3, the apparatus includes:

a first obtaining module 301, configured to obtain a plurality of historical time-series data;

a data conversion module 302, configured to convert the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the historical time-series data;

a second obtaining module 303, configured to obtain an identification vector set according to the first time-series data vector set;

a third obtaining module 304, configured to determine a target identification vector set according to the identification vector set and the first time-series data vector set;

a first determining module 305, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set;

a second determining module 306, configured to obtain the density function, over the preset conditions, of each data vector in the activation vector set, and to determine the predicted density function of the data to be predicted according to the density functions; and

a prediction module 307, configured to predict, according to the predicted density function, the probability that the data to be predicted satisfies the preset conditions.

Optionally, the data conversion module 302 is configured to convert the historical time-series data into the first time-series data vector set, according to the acquisition times corresponding to the historical time-series data, by the following formula, where y_j denotes a data vector in the first time-series data vector set:

$$y_j=[x_{j-m},x_{j-m+1},\ldots,x_j]^T$$

Optionally, the third obtaining module 304 is configured to execute an identification vector set update step in a loop until a loop termination condition is met, and to determine the identification vector set at the time the loop termination condition is met as the target identification vector set. The identification vector set update step includes: calculating the first distances corresponding to each data vector in the first time-series data vector set, where the first distances are the distances between a first data vector and each identification vector in the identification vector set, the first data vector being any data vector in the first time-series data vector set; determining the target distance of the first data vector from the first distances (the target distance being the smallest of the first distances), and determining the identification vector corresponding to the target distance as the target identification vector of the first data vector; calculating the mean vector of the data vectors corresponding to each target identification vector, and using the mean vector as the updated target identification vector; and determining the target identification vector set from the updated target identification vectors. The loop termination condition includes the target distance remaining unchanged over a first preset number of consecutive iterations.

Optionally, the first determining module 305 is configured to: determine, from the first time-series data vector set, the target vector corresponding to the data to be predicted; calculate the second distance between the target vector and each target identification vector in the target identification vector set; and determine, in the target identification vector set and according to the second distances, the second preset number of target identification vectors nearest to the target vector, to obtain the activation vector set.

Optionally, the second determining module 306 is configured to calculate the information entropy of each data vector in the activation vector set according to the density functions, and to determine the predicted density function according to the information entropy.

With respect to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be elaborated here.

With the above apparatus, the prediction result is provided in the form of a density function of the data to be predicted. The probability that the data to be predicted satisfies each preset condition can therefore be presented to the user according to that density function, which provides greater reference value for actual business needs.

Fig. 4 is a block diagram of an electronic device 400 according to an exemplary embodiment. As shown in Fig. 4, the electronic device 400 may include a processor 401 and a memory 402. The electronic device 400 may further include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.

The processor 401 controls the overall operation of the electronic device 400 to complete all or some of the steps of the above data prediction method. The memory 402 stores various types of data to support operation on the electronic device 400; such data may include, for example, instructions for any application or method operating on the electronic device 400 and application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signals may be further stored in the memory 402 or sent via the communication component 405. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons; the buttons may be virtual or physical. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near-field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the electronic device 400 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above data prediction method.

In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided; when executed by a processor, the program instructions implement the steps of the above data prediction method. For example, the computer-readable storage medium may be the above memory 402 including program instructions, which can be executed by the processor 401 of the electronic device 400 to complete the above data prediction method.

The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings; however, the disclosure is not limited to the specific details of those embodiments. Within the scope of the technical concept of the disclosure, various simple variations can be made to its technical solutions, and these simple variations all fall within the protection scope of the disclosure.

It should further be noted that the specific technical features described in the above specific embodiments may be combined in any suitable manner provided they do not contradict each other. To avoid unnecessary repetition, the various possible combinations are not described separately in this disclosure.

In addition, the various embodiments of the disclosure may also be combined arbitrarily; as long as such combinations do not depart from the idea of the disclosure, they should likewise be regarded as content disclosed herein.

Claims

1. A data prediction method, characterized in that the method comprises:

obtaining a plurality of historical time-series data;

converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the plurality of historical time-series data;

obtaining an identification vector set according to the first time-series data vector set;

determining a target identification vector set according to the identification vector set and the first time-series data vector set;

determining an activation vector set according to the first time-series data vector set and the target identification vector set;

obtaining the density function, over preset conditions, of each data vector in the activation vector set, and determining a predicted density function of data to be predicted according to the density functions; and

predicting, according to the predicted density function, the probability that the data to be predicted satisfies the preset conditions.

2. The method according to claim 1, characterized in that determining the target identification vector set according to the identification vector set and the first time-series data vector set comprises:

executing an identification vector set update step in a loop until a loop termination condition is met, and determining the identification vector set at the time the loop termination condition is met as the target identification vector set;

wherein the identification vector set update step comprises: calculating the first distances corresponding to each data vector in the first time-series data vector set, the first distances comprising the distances between a first data vector and each identification vector in the identification vector set, and the first data vector being any data vector in the first time-series data vector set;

determining, from the first distances, a target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, the target distance being the smallest of the first distances;

calculating the mean vector of the data vectors corresponding to the target identification vector, and using the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors;

the loop termination condition comprising that the target distance remains unchanged over a first preset number of consecutive iterations.

3. The method according to claim 1, characterized in that determining the activation vector set according to the first time-series data vector set and the target identification vector set comprises:

determining, from the first time-series data vector set, a target vector corresponding to the data to be predicted;

calculating a second distance between the target vector and each target identification vector in the target identification vector set; and

determining, in the target identification vector set and according to the second distances, a second preset number of target identification vectors nearest to the target vector, to obtain the activation vector set.

4. The method according to any one of claims 1 to 3, characterized in that determining the predicted density function of the data to be predicted according to the density functions comprises:

calculating the information entropy of each data vector in the activation vector set according to the density functions; and

determining the predicted density function according to the information entropy.

5. A data prediction apparatus, characterized in that the apparatus comprises:

a first obtaining module, configured to obtain a plurality of historical time-series data;

a data conversion module, configured to convert the plurality of historical time-series data into a first time-series data vector set according to the acquisition times corresponding to the plurality of historical time-series data;

a second obtaining module, configured to obtain an identification vector set according to the first time-series data vector set;

a third obtaining module, configured to determine a target identification vector set according to the identification vector set and the first time-series data vector set;

a first determining module, configured to determine an activation vector set according to the first time-series data vector set and the target identification vector set;

a second determining module, configured to obtain the density function, over preset conditions, of each data vector in the activation vector set, and to determine a predicted density function of data to be predicted according to the density functions; and

a prediction module, configured to predict, according to the predicted density function, the probability that the data to be predicted satisfies the preset conditions.

6. The apparatus according to claim 5, characterized in that the third obtaining module is configured to execute an identification vector set update step in a loop until a loop termination condition is met, and to determine the identification vector set at the time the loop termination condition is met as the target identification vector set; the identification vector set update step comprises: calculating the first distances corresponding to each data vector in the first time-series data vector set, the first distances comprising the distances between a first data vector and each identification vector in the identification vector set, and the first data vector being any data vector in the first time-series data vector set; determining, from the first distances, a target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, the target distance being the smallest of the first distances; calculating the mean vector of the data vectors corresponding to the target identification vector, and using the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vectors; the loop termination condition comprising that the target distance remains unchanged over a first preset number of consecutive iterations.

7. The apparatus according to claim 5, characterized in that the first determining module is configured to: determine, from the first time-series data vector set, a target vector corresponding to the data to be predicted; calculate a second distance between the target vector and each target identification vector in the target identification vector set; and determine, in the target identification vector set and according to the second distances, a second preset number of target identification vectors nearest to the target vector, to obtain the activation vector set.

8. The apparatus according to any one of claims 5 to 7, characterized in that the second determining module is configured to calculate the information entropy of each data vector in the activation vector set according to the density functions, and to determine the predicted density function according to the information entropy.

9. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.

10. An electronic device, characterized by comprising:

a memory having a computer program stored thereon; and

a processor configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 4.