CN109754115A - Data prediction method, apparatus, storage medium, and electronic device

Publication number
CN109754115A
Authority
CN
China
Prior art keywords
vector
data
time sequence
vector set
target identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811475791.8A
Other languages
Chinese (zh)
Other versions
CN109754115B (en)
Inventor
孙木鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Application filed by Neusoft Corp
Priority to CN201811475791.8A
Publication of CN109754115A
Application granted
Publication of CN109754115B
Legal status: Active

Abstract

The present disclosure relates to a data prediction method, apparatus, storage medium, and electronic device. The method includes: obtaining a plurality of historical time-series data; converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition moments corresponding to the data; obtaining an identification vector set from the first time-series data vector set; determining a target identification vector set from the identification vector set and the first time-series data vector set; determining an activation vector set from the first time-series data vector set and the target identification vector set; obtaining the density function, under a preset condition, of each data vector in the activation vector set, and determining a predicted density function of the data to be predicted from those density functions; and predicting, from the predicted density function, the probability that the data to be predicted satisfies the preset condition.

Description

Data prediction method, apparatus, storage medium, and electronic device
Technical field
The present disclosure relates to the field of data prediction, and in particular to a data prediction method, apparatus, storage medium, and electronic device.
Background
Time-series forecasting infers the development trend of data from ordered, time-indexed observations in order to guide the solution of practical problems. Today, time-series prediction plays an important role in many industries. For example, banks use it to predict changes in daily trading volume; exchanges use it to predict how stock prices move; and it is used to anticipate future trends in key indicators of application systems, such as CPU, memory, and HTTP response time.
However, with the rapid development of computer software technology, data volumes keep growing and time-series data is becoming more complex, so the regularities in how the data changes are increasingly hard to mine. Traditional time-series prediction methods find the pattern of change by mathematically fitting the data, but for discrete time-series data the prediction accuracy of traditional algorithms is low. Moreover, conventional time-series prediction algorithms present the prediction result only as a predicted value, which does not meet actual business needs.
Summary of the invention
The purpose of the present disclosure is to provide a data prediction method, apparatus, storage medium, and electronic device.
In a first aspect, a data prediction method is provided. The method includes: obtaining a plurality of historical time-series data; converting the plurality of historical time-series data into a first time-series data vector set according to the acquisition moments corresponding to the plurality of historical time-series data; obtaining an identification vector set from the first time-series data vector set; determining a target identification vector set from the identification vector set and the first time-series data vector set; determining an activation vector set from the first time-series data vector set and the target identification vector set; obtaining the density function, under a preset condition, of each data vector in the activation vector set, and determining a predicted density function of the data to be predicted from the density functions; and predicting, from the predicted density function, the probability that the data to be predicted satisfies the preset condition.
Optionally, converting the plurality of historical time-series data into the first time-series data vector set according to the corresponding acquisition moments includes converting the historical time-series data into the first time-series data vector set by the following formula:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where $x_j$ denotes the historical time-series datum acquired at moment $j$, $x_{j-m}$ denotes the historical time-series datum acquired at moment $j-m$, $y_j$ is any data vector in the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, and $j$ ranges from $m+1$ to $t$.
Optionally, determining the target identification vector set from the identification vector set and the first time-series data vector set includes: repeatedly executing an identification vector set update step until a loop termination condition is met, and taking the identification vector set at the time the loop termination condition is met as the target identification vector set. The update step includes: calculating the first distances corresponding to each data vector in the first time-series data vector set, where the first distances of a first data vector are its distances to each identification vector in the identification vector set, and the first data vector is any data vector in the first time-series data vector set; determining, from the first distances, the target distance corresponding to the first data vector, the target distance being the smallest of the first distances, and taking the identification vector corresponding to the target distance as the target identification vector of the first data vector; calculating the mean vector of the data vectors corresponding to the target identification vector, and using the mean vector as the updated target identification vector; and determining the target identification vector set from the updated target identification vectors. The loop termination condition is that the target distance remains unchanged over a first preset number of consecutive iterations.
Optionally, determining the activation vector set from the first time-series data vector set and the target identification vector set includes: determining, from the first time-series data vector set, the target vector corresponding to the data to be predicted; calculating the second distance between the target vector and each target identification vector in the target identification vector set; and, within the target identification vector set, selecting according to the second distances a second preset number of vectors nearest to the target vector to obtain the activation vector set.
Optionally, determining the predicted density function of the data to be predicted from the density functions includes: calculating, from the density functions, the information entropy of each data vector in the activation vector set; and determining the predicted density function according to the information entropy.
In a second aspect, a data prediction apparatus is provided. The apparatus includes: a first obtaining module, configured to obtain a plurality of historical time-series data; a data conversion module, configured to convert the plurality of historical time-series data into a first time-series data vector set according to the acquisition moments corresponding to the plurality of historical time-series data; a second obtaining module, configured to obtain an identification vector set from the first time-series data vector set; a third obtaining module, configured to determine a target identification vector set from the identification vector set and the first time-series data vector set; a first determining module, configured to determine an activation vector set from the first time-series data vector set and the target identification vector set; a second determining module, configured to obtain the density function, under a preset condition, of each data vector in the activation vector set, and to determine a predicted density function of the data to be predicted from the density functions; and a prediction module, configured to predict, from the predicted density function, the probability that the data to be predicted satisfies the preset condition.
Optionally, the data conversion module is configured to convert the historical time-series data into the first time-series data vector set, according to the corresponding acquisition moments, by the following formula:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where $x_j$ denotes the historical time-series datum acquired at moment $j$, $x_{j-m}$ denotes the historical time-series datum acquired at moment $j-m$, $y_j$ is any data vector in the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, and $j$ ranges from $m+1$ to $t$.
Optionally, the third obtaining module is configured to repeatedly execute an identification vector set update step until a loop termination condition is met, and to take the identification vector set at the time the loop termination condition is met as the target identification vector set. The update step includes: calculating the first distances corresponding to each data vector in the first time-series data vector set, where the first distances of a first data vector are its distances to each identification vector in the identification vector set, and the first data vector is any data vector in the first time-series data vector set; determining, from the first distances, the target distance corresponding to the first data vector, the target distance being the smallest of the first distances, and taking the identification vector corresponding to the target distance as the target identification vector of the first data vector; calculating the mean vector of the data vectors corresponding to the target identification vector, and using the mean vector as the updated target identification vector; and determining the target identification vector set from the updated target identification vectors. The loop termination condition is that the target distance remains unchanged over a first preset number of consecutive iterations.
Optionally, the first determining module is configured to: determine, from the first time-series data vector set, the target vector corresponding to the data to be predicted; calculate the second distance between the target vector and each target identification vector in the target identification vector set; and, within the target identification vector set, select according to the second distances a second preset number of vectors nearest to the target vector to obtain the activation vector set.
Optionally, the second determining module is configured to calculate, from the density functions, the information entropy of each data vector in the activation vector set, and to determine the predicted density function according to the information entropy.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the method of the first aspect of the present disclosure are implemented.
In a fourth aspect, an electronic device is provided, including: a memory on which a computer program is stored; and a processor configured to execute the computer program in the memory so as to implement the steps of the method of the first aspect of the present disclosure.
With the above technical solutions, a plurality of historical time-series data is obtained; the data is converted into a first time-series data vector set according to the corresponding acquisition moments; an identification vector set is obtained from the first time-series data vector set; a target identification vector set is determined from the identification vector set and the first time-series data vector set; an activation vector set is determined from the first time-series data vector set and the target identification vector set; the density function, under a preset condition, of each data vector in the activation vector set is obtained, and the predicted density function of the data to be predicted is determined from those density functions; and the probability that the data to be predicted satisfies the preset condition is predicted from the predicted density function. In this way, the prediction result is presented as a density function of the data to be predicted, so the probability that the data to be predicted satisfies different preset conditions can be shown to the user, which provides greater reference value for actual business needs.
Other features and advantages of the present disclosure are described in detail in the detailed description that follows.
Brief description of the drawings
The drawings are provided for a further understanding of the present disclosure and form part of the specification. Together with the detailed description below, they serve to explain the disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of a data prediction method according to an exemplary embodiment;
Fig. 2 is a flowchart of another data prediction method according to an exemplary embodiment;
Fig. 3 is a block diagram of a data prediction apparatus according to an exemplary embodiment;
Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Specific embodiments of the present disclosure are described in detail below with reference to the drawings. It should be understood that the specific embodiments described here serve only to describe and explain the disclosure and do not limit it.
The present disclosure provides a data prediction method, apparatus, storage medium, and electronic device. A plurality of historical time-series data is obtained; the data is converted into a first time-series data vector set according to the corresponding acquisition moments; an identification vector set is obtained from the first time-series data vector set; a target identification vector set is determined from the identification vector set and the first time-series data vector set; an activation vector set is determined from the first time-series data vector set and the target identification vector set; the density function, under a preset condition, of each data vector in the activation vector set is obtained, and the predicted density function of the data to be predicted is determined from those density functions; and the probability that the data to be predicted satisfies the preset condition is predicted from the predicted density function. In this way, the prediction result is presented as a density function of the data to be predicted, so the probability that the data to be predicted satisfies different preset conditions can be shown to the user, which provides greater reference value for actual business needs.
Specific embodiments of the present disclosure are described in detail below with reference to the drawings.
Fig. 1 is a flowchart of a data prediction method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps:
S101: obtain a plurality of historical time-series data.
Time-series data is data collected at different moments and arranged in chronological order; it can be used to describe how the data changes over time. For example, the time-series data may include a bank's daily trading volume, stock prices on a stock market, or the response time of an application system. The plurality of historical time-series data may include a first preset number of time-series data acquired within a preset historical time period.
S102: convert the plurality of historical time-series data into a first time-series data vector set according to the acquisition moments corresponding to the plurality of historical time-series data.
In this step, the historical time-series data can be converted into the first time-series data vector set, according to the corresponding acquisition moments, by the following formula:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where the plurality of historical time-series data can be written as $[x_1, x_2, \ldots, x_t]$, $x_j$ denotes the historical time-series datum acquired at moment $j$, $x_{j-m}$ denotes the historical time-series datum acquired at moment $j-m$, the first time-series data vector set can be written as $[y_{m+1}, y_{m+2}, \ldots, y_t]$, $y_j$ is any data vector in that set, and $j$ ranges from $m+1$ to $t$. In addition, $m$ can be a preset value; in practical application scenarios its size can be set according to different business needs.
After S102 is executed, each historical time-series datum acquired after the first $m$ acquisition moments has been transformed into a column vector composed of that target time-series datum and the $m$ data preceding it, where the target time-series datum is any one of the historical time-series data acquired after the first $m$ acquisition moments.
S103: obtain an identification vector set from the first time-series data vector set.
In this step, a third preset number of data vectors can be randomly selected from the first time-series data vector set, and the randomly selected data vectors form the identification vector set. In addition, to avoid overfitting, the third preset number can be smaller than the number of data vectors in the first time-series data vector set.
S104: determine a target identification vector set from the identification vector set and the first time-series data vector set.
In this step, an identification vector set update step can be executed repeatedly until a loop termination condition is met, and the identification vector set at the time the loop termination condition is met is taken as the target identification vector set. The update step includes: calculating the first distances corresponding to each data vector in the first time-series data vector set, where the first distances of a first data vector are its distances to each identification vector in the identification vector set, and the first data vector is any data vector in the first time-series data vector set; determining, from the first distances, the target distance corresponding to the first data vector, the target distance being the smallest of the first distances, and taking the identification vector corresponding to the target distance as the target identification vector of the first data vector; calculating the mean vector of the data vectors corresponding to the target identification vector, and using the mean vector as the updated target identification vector; and determining the target identification vector set from the updated target identification vectors. The loop termination condition is that the target distance remains unchanged over a first preset number of consecutive iterations.
In one possible implementation, when it is determined that the target distance has remained unchanged over the first preset number of consecutive iterations, the identification vectors can be considered to have converged. The converged identification vectors are then taken as the target identification vectors, which determines the target identification vector set, so that the time-series data can subsequently be predicted on the basis of the target identification vector set.
S105: determine an activation vector set from the first time-series data vector set and the target identification vector set.
In this step, the target vector corresponding to the data to be predicted can be determined from the first time-series data vector set; the second distance between the target vector and each target identification vector in the target identification vector set is calculated; and, within the target identification vector set, a second preset number of vectors nearest to the target vector are selected according to the second distances to obtain the activation vector set.
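As a non-limiting illustration, the selection in this step can be sketched in Python as follows. The function name `select_activation_set`, the use of the Euclidean distance as the second distance, and the toy vectors are assumptions made for this example; the disclosure does not prescribe them.

```python
import math


def select_activation_set(target_vector, target_ids, n):
    """Return the n target identification vectors nearest to target_vector.

    Assumption: the "second distance" is taken to be the Euclidean distance.
    """
    if not 0 < n <= len(target_ids):
        raise ValueError("n must be between 1 and len(target_ids)")
    # Rank the target identification vectors by their distance to the target vector.
    ranked = sorted(target_ids, key=lambda c: math.dist(target_vector, c))
    return ranked[:n]


# Toy data (hypothetical): four target identification vectors in two dimensions.
target_ids = [[5.0, 5.0], [1.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
activation_set = select_activation_set([0.0, 0.0], target_ids, n=2)
```

Here `n` plays the role of the second preset number.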
S106: obtain the density function, under a preset condition, of each data vector in the activation vector set, and determine the predicted density function of the data to be predicted from those density functions.
Information entropy can serve as an uncertainty measure of a density function. Therefore, if the information entropy of some data vector in the activation vector set is small, the predicted values based on that data vector are mostly invalid predictions, and its weight needs to be reduced in order to improve prediction accuracy. Accordingly, in one possible implementation, the information entropy of each data vector in the activation vector set can be calculated from the density functions, and the predicted density function is determined according to the information entropy.
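As a non-limiting illustration, the information entropy of a density function can be computed as sketched below. The sketch assumes the density has been discretized into a list of probabilities that sum to 1 and uses the Shannon entropy with natural logarithm; the function name `information_entropy` and the sample densities are hypothetical, and how the entropy is turned into a weight is not shown because the disclosure does not specify it.

```python
import math


def information_entropy(probs):
    """Shannon entropy (in nats) of a discretized density.

    Assumption: probs is a list of non-negative probabilities summing to 1.
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    # Terms with p == 0 contribute nothing (the limit of p * log p is 0).
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # maximally uncertain over four bins
peaked = [1.0, 0.0, 0.0, 0.0]        # all mass in a single bin
h_uniform = information_entropy(uniform)
h_peaked = information_entropy(peaked)
```

In this example the uniform density has entropy ln 4 and the fully peaked density has entropy 0.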
S107: predict, from the predicted density function, the probability that the data to be predicted satisfies the preset condition.
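As a non-limiting illustration, once a predicted density function is available, the probability of a preset condition can be read off by accumulating the probability mass of the values that satisfy the condition. The sketch below assumes the predicted density has been discretized into `(value, probability)` pairs; the function name `probability_of`, the pair representation, and the sample numbers are hypothetical and are not taken from the disclosure.

```python
def probability_of(condition, density):
    """Sum the probability mass of all values that satisfy the condition.

    Assumption: density is a list of (value, probability) pairs whose
    probabilities sum to 1.
    """
    return sum(p for value, p in density if condition(value))


# Hypothetical discretized predicted density of tomorrow's trading volume.
predicted_density = [(100, 0.2), (200, 0.5), (300, 0.3)]

# Probability that the predicted value exceeds 150 (one possible preset condition).
p_above_150 = probability_of(lambda v: v > 150, predicted_density)
```

Different preset conditions can be evaluated against the same density by passing a different predicate, which is what allows several probabilities to be shown to the user from a single prediction.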
It should be noted that once the true value of the data to be predicted is obtained, that data can be treated as new historical time-series data, and the density function of each data vector in the activation vector set can be updated according to the new historical time-series data. The data prediction method of the present disclosure can thus adapt automatically to new patterns in the time series, which makes the prediction of time-series data more accurate, and it does not need to learn from a large amount of historical data in advance, which improves the applicability of the prediction algorithm.
With the above method, the prediction result is presented as a density function of the data to be predicted, so the probability that the data to be predicted satisfies different preset conditions can be shown to the user according to the density function, which provides greater reference value for actual business needs.
Fig. 2 is a flowchart of another data prediction method according to an exemplary embodiment. As shown in Fig. 2, the method includes the following steps:
S201: obtain a plurality of historical time-series data.
Time-series data is data collected at different moments and arranged in chronological order; it can be used to describe how the data changes over time. For example, the time-series data may include a bank's daily trading volume, stock prices on a stock market, or the response time of an application system. The plurality of historical time-series data may include a first preset number of time-series data acquired within a preset historical time period.
S202: convert the plurality of historical time-series data into a first time-series data vector set according to the acquisition moments corresponding to the plurality of historical time-series data.
In this step, the historical time-series data can be converted into the first time-series data vector set, according to the corresponding acquisition moments, by the following formula:
$$y_j = [x_{j-m}, x_{j-m+1}, \ldots, x_j]^T$$
where the plurality of historical time-series data can be written as $[x_1, x_2, \ldots, x_t]$, $x_j$ denotes the historical time-series datum acquired at moment $j$, $x_{j-m}$ denotes the historical time-series datum acquired at moment $j-m$, the first time-series data vector set can be written as $[y_{m+1}, y_{m+2}, \ldots, y_t]$, $y_j$ is any data vector in that set, and $j$ ranges from $m+1$ to $t$. In addition, $m$ can be a preset value; in practical application scenarios its size can be set according to different business needs.
Specifically, when the plurality of historical time-series data is $[x_1, x_2, \ldots, x_t]$, the above formula converts it into the first time-series data vector set $[y_{m+1}, y_{m+2}, \ldots, y_t]$, where
$$y_{m+1} = [x_1, x_2, \ldots, x_{m+1}]^T$$
$$y_{m+2} = [x_2, x_3, \ldots, x_{m+2}]^T$$
$$\vdots$$
$$y_t = [x_{t-m}, x_{t-m+1}, \ldots, x_t]^T$$
For example, take the time-series data to be a bank's daily trading volume. For ease of explanation, the trading volumes obtained for the most recent 10 days are written as $[x_1, x_2, \ldots, x_{10}]$ (so $t = 10$), where $x_i$ is the trading volume on day $i$ ($i$ runs from 1 to 10). When $m$ is set to 5, the first time-series data vector set is $[y_6, y_7, y_8, y_9, y_{10}]$, where
$$y_6 = [x_1, x_2, \ldots, x_6]^T$$
$$y_7 = [x_2, x_3, \ldots, x_7]^T$$
$$y_8 = [x_3, x_4, \ldots, x_8]^T$$
$$y_9 = [x_4, x_5, \ldots, x_9]^T$$
$$y_{10} = [x_5, x_6, \ldots, x_{10}]^T$$
The above example is merely illustrative, and the present disclosure is not limited to it.
That is, after S202 is executed, each historical time-series datum acquired after the first $m$ acquisition moments has been transformed into a column vector composed of that target time-series datum and the $m$ data preceding it, where the target time-series datum is any one of the historical time-series data acquired after the first $m$ acquisition moments.
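As a non-limiting illustration, the conversion in S202 can be sketched in Python as a sliding window of length m + 1 over the series. The function name `to_vector_set` and the sample trading volumes are hypothetical and are used only for this example.

```python
def to_vector_set(x, m):
    """Build [y_{m+1}, ..., y_t] with y_j = [x_{j-m}, ..., x_j] (1-based j).

    x is the list [x_1, ..., x_t]; each returned vector has m + 1 entries.
    """
    t = len(x)
    if not 0 <= m < t:
        raise ValueError("m must satisfy 0 <= m < len(x)")
    # For 1-based j, the entries x_{j-m}..x_j are the 0-based slice x[j-m-1:j].
    return [list(x[j - m - 1:j]) for j in range(m + 1, t + 1)]


# Made-up daily trading volumes for 10 days (t = 10), with m = 5.
volumes = [10, 12, 11, 15, 14, 13, 17, 16, 18, 20]
vectors = to_vector_set(volumes, m=5)  # corresponds to [y_6, ..., y_10]
```

With t = 10 and m = 5 this yields five vectors of six entries each, matching the example of $[y_6, y_7, y_8, y_9, y_{10}]$ given above.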
S203: obtain an identification vector set from the first time-series data vector set.
In this step, a third preset number of data vectors can be randomly selected from the first time-series data vector set, and the randomly selected data vectors form the identification vector set. In addition, to avoid overfitting, the third preset number can be smaller than the number of data vectors in the first time-series data vector set.
For example, when the first time-series data vector set is $[y_{m+1}, y_{m+2}, \ldots, y_t]$, $K$ vectors $y_j$ ($K$ being the third preset number) can be randomly selected as identification vectors of different data patterns, and the $K$ selected vectors form an identification vector set of size $K$; for instance, the identification vector set can be $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$. As a concrete case, when the first time-series data vector set is $[y_6, y_7, y_8, y_9, y_{10}]$, 3 data vectors ($K = 3$) can be randomly selected from it to form an identification vector set of size 3, which can consist of any three data vectors in $[y_6, y_7, y_8, y_9, y_{10}]$ (for example, $[y_6, y_7, y_8]$, $[y_7, y_8, y_9]$, or $[y_7, y_9, y_{10}]$). The above example is merely illustrative, and the present disclosure is not limited to it.
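As a non-limiting illustration, the random selection in S203 can be sketched in Python with the standard `random` module. The function name `init_identification_vectors`, the `seed` parameter, and the toy vectors are hypothetical; requiring K to be strictly smaller than the number of data vectors follows the optional overfitting remark above and is an assumption of this sketch.

```python
import random


def init_identification_vectors(vectors, k, seed=None):
    """Randomly pick k distinct data vectors as the initial identification vectors."""
    if not 0 < k < len(vectors):
        raise ValueError("k must satisfy 0 < k < len(vectors)")
    rng = random.Random(seed)  # seeded only to make the example repeatable
    # rng.sample draws k items without replacement; copy each so later
    # updates do not modify the original data vectors.
    return [list(v) for v in rng.sample(vectors, k)]


# Toy stand-ins for [y_6, ..., y_10] (hypothetical two-dimensional vectors).
vectors = [[1.0, 1.0], [5.0, 5.0], [9.0, 9.0], [3.0, 3.0], [7.0, 7.0]]
id_vectors = init_identification_vectors(vectors, k=3, seed=42)
```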
S204: calculate the first distances corresponding to each data vector in the first time-series data vector set.
The first distances may include the distances between a first data vector and each identification vector in the identification vector set, where the first data vector may be any data vector in the first time-series data vector set. In one possible implementation, the first distances can be obtained by calculating the Euclidean distance between each data vector in the first time-series data vector set and each identification vector in the identification vector set.
For example, take the first time-series data vector set to be $[y_{m+1}, y_{m+2}, \ldots, y_t]$ and the identification vector set to be $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$ of size $K$. When the first data vector is $y_{m+1}$, the distance between $y_{m+1}$ and each identification vector in $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$ is calculated, giving $K$ first distances corresponding to $y_{m+1}$. When the first data vector is $y_{m+K+1}$, the distance between $y_{m+K+1}$ and each identification vector is calculated, giving $K$ first distances corresponding to $y_{m+K+1}$. In the same way, the distance between every data vector in $[y_{m+1}, y_{m+2}, \ldots, y_t]$ and every identification vector in $[y_{m+1}, y_{m+2}, \ldots, y_{m+K}]$ can be calculated to obtain the first distances. The above example is merely illustrative, and the present disclosure is not limited to it.
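As a non-limiting illustration, the first distances of S204 can be sketched in Python as a table with one row per data vector and one column per identification vector, using the Euclidean distance mentioned above. The function name `first_distances` and the toy vectors are hypothetical.

```python
import math


def first_distances(vectors, id_vectors):
    """Return a table d where d[i][k] is the Euclidean distance between
    data vector i and identification vector k."""
    return [[math.dist(y, c) for c in id_vectors] for y in vectors]


# Toy data (hypothetical): three data vectors and K = 2 identification vectors.
vectors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
id_vectors = [[0.0, 0.0], [3.0, 4.0]]
distances = first_distances(vectors, id_vectors)
```

Each row of `distances` holds the K first distances of one data vector.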
S205: determine, from the first distances, the target distance corresponding to the first data vector, and take the identification vector corresponding to the target distance as the target identification vector of the first data vector, where the target distance may be the smallest of the first distances.
After S204 is executed, the $K$ first distances corresponding to each data vector in the first time-series data vector set are known. The smallest of the $K$ first distances corresponding to the first data vector can then be taken as the target distance of the first data vector, and the identification vector corresponding to that target distance is taken as the target identification vector of the first data vector.
For example, continue with the first time-series data vector set $[y_6, y_7, y_8, y_9, y_{10}]$ and the identification vector set $[y_6, y_7, y_8]$ of size 3, where the first data vector is any one of $y_6, y_7, y_8, y_9, y_{10}$. When the first data vector is $y_6$, the distance between $y_6$ and each identification vector in $[y_6, y_7, y_8]$ is calculated, and the target identification vector of $y_6$ is found to be $y_6$ itself. Similarly, the target identification vector of $y_7$ is $y_7$, and that of $y_8$ is $y_8$. When the first data vector is $y_9$, the distance between $y_9$ and each identification vector in $[y_6, y_7, y_8]$ is calculated. Suppose $y_9$ is closest to identification vector $y_6$ and $y_{10}$ is closest to identification vector $y_7$; then the target identification vector of $y_9$ is $y_6$, and that of $y_{10}$ is $y_7$. The above example is merely illustrative, and the present disclosure is not limited to it.
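As a non-limiting illustration, the assignment in S205 can be sketched in Python as follows. The function name `assign_to_nearest`, the returned `(index, distance)` pairs, the tie-breaking rule (first minimum wins), and the toy vectors are assumptions of this sketch.

```python
import math


def assign_to_nearest(vectors, id_vectors):
    """For each data vector, return (index of its target identification
    vector, target distance), where the target distance is the smallest
    Euclidean distance to any identification vector."""
    assignments = []
    for y in vectors:
        dists = [math.dist(y, c) for c in id_vectors]
        target = min(dists)
        # list.index returns the first position of the minimum, which
        # resolves ties in favour of the earlier identification vector.
        assignments.append((dists.index(target), target))
    return assignments


# Toy data (hypothetical): three data vectors and two identification vectors.
vectors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
id_vectors = [[0.0, 0.0], [3.0, 4.0]]
assignments = assign_to_nearest(vectors, id_vectors)
```

A data vector that is itself an identification vector is assigned to itself with target distance 0, as with $y_6$, $y_7$, and $y_8$ in the example above.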
S206. Calculate the mean vector of the data vectors corresponding to each target identification vector and use the mean vector as the updated target identification vector; then determine the target identification vector set according to the updated target identification vectors.
For example, continue with the first time series data vector set [y_6, y_7, y_8, y_9, y_10] and the 3-dimensional identification vector set [y_6, y_7, y_8]. After S205 is executed, the data vectors in [y_6, y_7, y_8, y_9, y_10] corresponding to the target identification vector y_6 are y_6 and y_9, those corresponding to the target identification vector y_7 are y_7 and y_10, and the one corresponding to the target identification vector y_8 is y_8. The mean vector of y_6 and y_9 can therefore be used as the updated target identification vector y_6', the mean vector of y_7 and y_10 can be used as the updated target identification vector y_7', and the data vector y_8 can be used as the updated target identification vector y_8' (here the target identification vector y_8 before the update is the data vector y_8 itself, so no mean needs to be calculated). The updated target identification vector set determined from the updated target identification vectors is then [y_6', y_7', y_8']. The above example is merely illustrative, and the present disclosure is not limited thereto.
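The mean update in S206 can be sketched as below. The numeric values and the function name are hypothetical, and keeping an identification vector unchanged when no data vector is assigned to it is an assumption that the text does not address.

```python
def update_id_vectors(data_vectors, assignments, id_vectors):
    """Replace each identification vector by the mean of the data vectors assigned to it."""
    updated = []
    for k, old in enumerate(id_vectors):
        members = [y for y, a in zip(data_vectors, assignments) if a == k]
        if not members:
            # Assumption: an identification vector with no members is left as-is.
            updated.append(list(old))
            continue
        # Component-wise mean over the member vectors.
        updated.append([sum(col) / len(members) for col in zip(*members)])
    return updated


# Hypothetical 2-D values standing in for y_6 ... y_10 and their assignments.
data = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0], [1.0, 1.0], [6.0, 4.0]]
assignments = [0, 1, 2, 0, 1]  # y_9 -> y_6, y_10 -> y_7
new_ids = update_id_vectors(data, assignments, data[:3])
print(new_ids)
```

The third identification vector has a single member, so its mean is that member itself, which matches the remark about y_8' in the example.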
S207. Determine whether the target distance has remained unchanged over a first preset number of consecutive iterations.
In one possible implementation, when the target distance is determined to have remained unchanged over the first preset number of consecutive iterations, the identification vectors can be considered to have converged. The converged identification vectors can then be determined as the target identification vectors, which in turn determine the target identification vector set, so that time series data can subsequently be predicted according to the target identification vector set.
When the target distance is determined to have remained unchanged over the first preset number of consecutive iterations, S208 is executed. When the number of iterations has not reached the first preset number and/or the target distance has changed, S204 to S207 are executed again.
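The loop formed by S204 to S207 can be sketched as follows, assuming Euclidean distance and initial identification vectors taken from the first K data vectors. The names `patience` (standing for the first preset number) and `max_iters` (a safety cap that the text does not mention) are hypothetical.

```python
import math


def fit_id_vectors(data_vectors, k, patience=2, max_iters=100):
    """Iterate S204-S207 until the target distances stay unchanged `patience` times in a row."""
    ids = [list(v) for v in data_vectors[:k]]  # initial identification vectors
    previous = None   # target distances from the previous iteration
    unchanged = 0     # consecutive iterations with unchanged target distances
    for _ in range(max_iters):
        # S204/S205: nearest identification vector and target distance per data vector.
        assignments, target_distances = [], []
        for y in data_vectors:
            dists = [math.dist(y, c) for c in ids]
            best = dists.index(min(dists))
            assignments.append(best)
            target_distances.append(dists[best])
        # S206: replace each identification vector by the mean of its members.
        for j in range(k):
            members = [y for y, a in zip(data_vectors, assignments) if a == j]
            if members:
                ids[j] = [sum(col) / len(members) for col in zip(*members)]
        # S207: count consecutive iterations whose target distances did not change.
        if previous is not None and all(
            math.isclose(d, p, abs_tol=1e-12)
            for d, p in zip(target_distances, previous)
        ):
            unchanged += 1
        else:
            unchanged = 0
        if unchanged >= patience:
            break
        previous = target_distances
    return ids


# Hypothetical 2-D values standing in for the data vectors.
data = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0], [1.0, 1.0], [6.0, 4.0]]
target_ids = fit_id_vectors(data, k=3)
print(target_ids)
```

Requiring several unchanged iterations in a row, rather than stopping at the first one, follows the wording of S207; a smaller `patience` stops earlier at the cost of a weaker convergence check.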
S208. Determine, from the first time series data vector set, the target vector corresponding to the data to be predicted.
For example, when the multiple historical time series data are [x_1, x_2, ..., x_t], the data to be predicted is x_{t+1} (note that in the present disclosure, what is predicted is the probability that the data to be predicted x_{t+1} falls within a preset condition). That is, in one possible implementation, the historical time series data [x_1, x_2, ..., x_t] can be used to predict the data x_{t+1} at time t+1. In this case, the target vector corresponding to x_{t+1} can be obtained from the first time series data vector set [y_{m+1}, y_{m+2}, ..., y_t] as y_t = [x_{t-m}, x_{t-m+1}, ..., x_t]^T. The above example is merely illustrative, and the present disclosure is not limited thereto.
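The construction of the data vectors y_j = [x_{j-m}, ..., x_j]^T and the choice of y_t as the target vector can be sketched as below. The function name `embed` and the sample values are hypothetical; the list `history` holds x_1 at index 0, so the 1-based indices of the text are shifted by one.

```python
def embed(history, m):
    """Return [y_{m+1}, ..., y_t], where y_j = [x_{j-m}, ..., x_j] (1-based j)."""
    t = len(history)
    if t <= m:
        raise ValueError("need more than m observations")
    # history[j - m - 1 : j] holds x_{j-m} ... x_j because history[0] is x_1.
    return [history[j - m - 1 : j] for j in range(m + 1, t + 1)]


history = [3, 5, 4, 6, 8, 7]   # hypothetical x_1 ... x_6
vectors = embed(history, m=2)  # y_3 ... y_6, each of length m + 1
target_vector = vectors[-1]    # y_t, the target vector for x_{t+1}
print(vectors)
print(target_vector)
```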
S209. Calculate the second distance between the target vector and each target identification vector in the target identification vector set.
In one possible implementation, the second distance can be obtained by calculating the Euclidean distance between the target vector and each target identification vector.
S210. In the target identification vector set, determine, according to the second distances, a second preset number of vectors nearest to the target vector, to obtain the activation vector set.
For example, take the target vector to be y_predict = [2, 2, 3, 4]^T and the target identification vector set to be [y_1, y_2, y_3, y_4], where y_1 = [1, 2, 3, 4]^T, y_2 = [2, 2, 4, 4]^T, y_3 = [4, 4, 2, 2]^T and y_4 = [4, 3, 2, 1]^T. The second distances between the target vector y_predict and the four target identification vectors in [y_1, y_2, y_3, y_4] are then:
dist(y_predict, y_1) = 1
dist(y_predict, y_2) = 1
dist(y_predict, y_3) = 3.61
dist(y_predict, y_4) = 3.87
When the second preset number is 2, the target identification vectors y_1 and y_2 in the target identification vector set form the activation vector set [y_1, y_2]. The above example is merely illustrative, and the present disclosure is not limited thereto.
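The selection in S209 and S210 can be checked with a short sketch that uses the vectors of the example and Euclidean distance. The function name `activation_set` and the dictionary layout are hypothetical.

```python
import math

y_predict = [2, 2, 3, 4]
target_id_vectors = {
    "y1": [1, 2, 3, 4],
    "y2": [2, 2, 4, 4],
    "y3": [4, 4, 2, 2],
    "y4": [4, 3, 2, 1],
}

# S209: second distance from the target vector to every target identification vector.
second_distances = {
    name: math.dist(y_predict, v) for name, v in target_id_vectors.items()
}


def activation_set(distances, count):
    """S210: names of the `count` vectors with the smallest second distance."""
    return sorted(distances, key=distances.get)[:count]


active = activation_set(second_distances, 2)
for name, d in second_distances.items():
    print(name, round(d, 2))
print(active)
```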
S211. Obtain the density function of each data vector in the activation vector set over the preset conditions, and calculate the information entropy of each data vector in the activation vector set according to the density function.
Information entropy can serve as a measure of the uncertainty of a density function. Therefore, if the information entropy of a data vector in the activation vector set is small, the predicted values based on that data vector are mostly invalid predictions, and its weight needs to be reduced in order to improve the accuracy of the prediction.
In one possible implementation, for ease of description, the density function of the i-th data vector in the activation vector set can be expressed by the following formula:
Here f_i(x) denotes the density function of the i-th data vector in the activation vector set; p_n is the normalized statistical count of the density function; a+Δ ≤ x ≤ a+2Δ, a+2Δ ≤ x ≤ a+3Δ and a+nΔ ≤ x ≤ a+(n+1)Δ denote the different preset conditions in which the data to be predicted may lie; a is the preset boundary threshold of the preset conditions; Δ is the preset data increment; and n is the number of preset conditions. The information entropy of each data vector in the activation vector set can then be calculated from the density function by the following formula:
Here I(f_i(x)) denotes the information entropy of the density function f_i(x) of the i-th data vector in the activation vector set, and p_j denotes the probability that the data to be predicted lies in the j-th preset condition.
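The two quantities defined above can be sketched in code under stated assumptions. The density is taken to be piecewise constant, with value p_j on the j-th preset condition and p_j obtained by normalizing per-condition counts. The entropy is assumed to have the usual Shannon form I(f_i(x)) = -Σ_j p_j log p_j with the natural logarithm, since the base is not stated in the text. Function names and counts are hypothetical.

```python
import math


def density_from_counts(counts):
    """Normalize per-condition counts into probabilities p_1 ... p_n."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one observation is required")
    return [c / total for c in counts]


def information_entropy(density):
    """Shannon entropy -sum(p_j * log p_j); terms with p_j = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in density if p > 0)


counts = [1, 2, 1]                      # hypothetical counts for n = 3 preset conditions
density = density_from_counts(counts)   # p_1, p_2, p_3
entropy = information_entropy(density)
print(density, entropy)
```

A density concentrated on a single preset condition gives an entropy of zero under this definition, while a uniform density gives the maximum value log n.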
S212. Determine the predicted density function of the data to be predicted according to the information entropy.
In one possible implementation, the predicted density function of the data to be predicted can be calculated from the information entropy by the following formula:
Here f(x) denotes the predicted density function of the data to be predicted, f_i(x) denotes the density function of the i-th data vector in the activation vector set, and I(f_i(x)) denotes the information entropy of the density function f_i(x) of the i-th data vector in the activation vector set.
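One way to read S212 is as an entropy-weighted combination of the activation vectors' density functions. The sketch below is an assumption, not the formula of the disclosure: it weights each f_i(x) in proportion to I(f_i(x)), so that a smaller entropy gives a smaller weight as described for S211, and normalizes the weights to sum to 1. The function names and sample densities are hypothetical.

```python
import math


def information_entropy(density):
    """Shannon entropy with the natural logarithm (the base is an assumption)."""
    return -sum(p * math.log(p) for p in density if p > 0)


def predicted_density(densities):
    """Combine per-vector densities with weights proportional to their entropy."""
    weights = [information_entropy(d) for d in densities]
    total = sum(weights)
    if total == 0:
        # Assumption: fall back to equal weights if every entropy is zero.
        weights = [1.0] * len(densities)
        total = float(len(densities))
    n = len(densities[0])
    return [
        sum(w * d[j] for w, d in zip(weights, densities)) / total
        for j in range(n)
    ]


f_1 = [0.25, 0.5, 0.25]  # hypothetical density of the first activation vector
f_2 = [0.5, 0.5, 0.0]    # hypothetical density of the second activation vector
f = predicted_density([f_1, f_2])
print(f)
```

Because the weights are normalized, the combined values still sum to 1 and can be read directly as probabilities over the preset conditions.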
S213. Predict, according to the predicted density function, the probability that the data to be predicted satisfies the preset condition.
For example, consider predicting the daily transaction volume of a banking service, so that the data to be predicted x_{t+1} is the transaction volume to be predicted. In one possible implementation, the following three preset conditions can be set: the transaction volume to be predicted is below 80,000 (that is, x_{t+1} < 80000); the transaction volume to be predicted is between 80,000 and 100,000 (that is, 80000 ≤ x_{t+1} ≤ 100000); and the transaction volume to be predicted is above 100,000 (that is, x_{t+1} > 100000). The predicted density function f(x) can then be used to predict the probability that the transaction volume to be predicted satisfies each of the three preset conditions, for instance: a probability of 10% that it is below 80,000, a probability of 75% that it is between 80,000 and 100,000, and a probability of 15% that it is above 100,000. The above example is merely illustrative, and the present disclosure is not limited thereto.
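The banking example can be written down as a small lookup from preset conditions to probabilities. The names `CONDITIONS`, `condition_probabilities` and `condition_index` are hypothetical, and the probabilities are the illustrative figures from the example rather than computed results.

```python
import math

# The three preset conditions of the example, as (label, predicate) pairs.
CONDITIONS = [
    ("below 80,000", lambda x: x < 80_000),
    ("80,000 to 100,000", lambda x: 80_000 <= x <= 100_000),
    ("above 100,000", lambda x: x > 100_000),
]


def condition_probabilities(predicted_density):
    """Pair each preset condition with its probability under the predicted density."""
    if len(predicted_density) != len(CONDITIONS):
        raise ValueError("one probability per preset condition is required")
    if not math.isclose(sum(predicted_density), 1.0):
        raise ValueError("probabilities must sum to 1")
    return {label: p for (label, _), p in zip(CONDITIONS, predicted_density)}


def condition_index(x):
    """Index of the preset condition that a transaction volume x falls into."""
    for i, (_, holds) in enumerate(CONDITIONS):
        if holds(x):
            return i
    raise ValueError("x matches no preset condition")


probabilities = condition_probabilities([0.10, 0.75, 0.15])
print(probabilities)
print(condition_index(85_000))
```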
It should be noted that, once the true value of the data to be predicted has been obtained, it can be treated as new historical time series data, and the density function of each data vector in the activation vector set can be updated according to the new historical time series data. The data prediction method of the present disclosure can thereby adapt automatically to new patterns in the time series, which makes the prediction of time series data more accurate. No large volume of historical data needs to be learned in advance, which improves the applicability of the prediction algorithm.
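A minimal sketch of this update, assuming each activation vector keeps a count per preset condition and the density is the normalized count vector. This bookkeeping and the function name are assumptions, since the text does not spell out the update rule.

```python
def update_density(counts, observed_index):
    """Add one observation to the condition it fell into and renormalize."""
    new_counts = list(counts)
    new_counts[observed_index] += 1
    total = sum(new_counts)
    return new_counts, [c / total for c in new_counts]


counts = [1, 2, 1]  # hypothetical counts held by one activation vector
# The true value turned out to satisfy the second preset condition (index 1).
counts, density = update_density(counts, observed_index=1)
print(counts, density)
```

Applying the same update to every vector in the activation vector set lets the densities follow the series as new values arrive.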
With the above method, the prediction result is provided in the form of the density function of the data to be predicted, so that the probability that the data to be predicted satisfies each of the different preset conditions can be shown to the user according to the density function, which provides a more useful reference for actual business needs.
Fig. 3 is a block diagram of a data prediction apparatus according to an exemplary embodiment. As shown in Fig. 3, the apparatus includes:
A first obtaining module 301, configured to obtain multiple historical time series data;
A data conversion module 302, configured to convert the multiple historical time series data into a first time series data vector set according to the acquisition times corresponding to the multiple historical time series data;
A second obtaining module 303, configured to obtain an identification vector set according to the first time series data vector set;
A third obtaining module 304, configured to determine a target identification vector set according to the identification vector set and the first time series data vector set;
A first determining module 305, configured to determine an activation vector set according to the first time series data vector set and the target identification vector set;
A second determining module 306, configured to obtain the density function of each data vector in the activation vector set over the preset conditions, and to determine the predicted density function of the data to be predicted according to the density function;
A prediction module 307, configured to predict, according to the predicted density function, the probability that the data to be predicted satisfies the preset condition.
Optionally, the data conversion module 302 is configured to convert the historical time series data into the first time series data vector set, according to the acquisition times corresponding to the multiple historical time series data, by the following formula:
y_j = [x_{j-m}, x_{j-m+1}, ..., x_j]^T
Here x_j denotes the historical time series data collected at time j among the multiple historical time series data, x_{j-m} denotes the historical time series data collected at time j-m, y_j is any data vector in the first time series data vector set [y_{m+1}, y_{m+2}, ..., y_t], and j ranges from m+1 to t.
Optionally, the third obtaining module 304 is configured to execute an identification vector set update step in a loop until a loop termination condition is satisfied, and to determine the identification vector set at the time the loop termination condition is satisfied as the target identification vector set. The identification vector set update step includes: calculating the first distances corresponding to each data vector in the first time series data vector set, where the first distances include the distance between a first data vector and each identification vector in the identification vector set, and the first data vector is any data vector in the first time series data vector set; determining, from the first distances, the target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector; calculating the mean vector of the data vectors corresponding to the target identification vector and using the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vector. The loop termination condition includes the target distance remaining unchanged over a first preset number of consecutive iterations.
Optionally, the first determining module 305 is configured to determine, from the first time series data vector set, the target vector corresponding to the data to be predicted; to calculate the second distance between the target vector and each target identification vector in the target identification vector set; and, in the target identification vector set, to determine, according to the second distances, a second preset number of vectors nearest to the target vector, to obtain the activation vector set.
Optionally, the second determining module 306 is configured to calculate the information entropy of each data vector in the activation vector set according to the density function, and to determine the predicted density function according to the information entropy.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
With the above apparatus, the prediction result is provided in the form of the density function of the data to be predicted, so that the probability that the data to be predicted satisfies each of the different preset conditions can be shown to the user according to the density function, which provides a more useful reference for actual business needs.
Fig. 4 is a block diagram of an electronic device 400 according to an exemplary embodiment. As shown in Fig. 4, the electronic device 400 may include a processor 401 and a memory 402. The electronic device 400 may further include one or more of a multimedia component 403, an input/output (I/O) interface 404 and a communication component 405.
The processor 401 controls the overall operation of the electronic device 400 so as to complete all or some of the steps of the above data prediction method. The memory 402 stores various types of data to support operation of the electronic device 400. Such data may include, for example, instructions for any application or method operated on the electronic device 400, as well as application-related data such as contact data, sent and received messages, pictures, audio and video. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 402 or sent through the communication component 405. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, which may be a keyboard, a mouse, buttons and the like. The buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. Wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module and an NFC module.
In an exemplary embodiment, the electronic device 400 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for executing the above data prediction method.
In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided, and the program instructions, when executed by a processor, implement the steps of the above data prediction method. For example, the computer-readable storage medium may be the above memory 402 including program instructions, and the program instructions can be executed by the processor 401 of the electronic device 400 to complete the above data prediction method.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple variations can be made to its technical solutions, and these simple variations all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments can be combined in any suitable manner provided that they do not contradict one another. To avoid unnecessary repetition, the present disclosure does not separately describe the various possible combinations.
In addition, the different embodiments of the present disclosure can also be combined in any manner, and such combinations should likewise be regarded as content disclosed by the present disclosure as long as they do not depart from its idea.

Claims (10)

1. A data prediction method, characterized in that the method comprises:
obtaining multiple historical time series data;
converting the multiple historical time series data into a first time series data vector set according to the acquisition times corresponding to the multiple historical time series data;
obtaining an identification vector set according to the first time series data vector set;
determining a target identification vector set according to the identification vector set and the first time series data vector set;
determining an activation vector set according to the first time series data vector set and the target identification vector set;
obtaining the density function of each data vector in the activation vector set over preset conditions, and determining the predicted density function of data to be predicted according to the density function;
predicting, according to the predicted density function, the probability that the data to be predicted satisfies the preset condition.
2. The method according to claim 1, characterized in that determining the target identification vector set according to the identification vector set and the first time series data vector set comprises:
executing an identification vector set update step in a loop until a loop termination condition is satisfied, and determining the identification vector set at the time the loop termination condition is satisfied as the target identification vector set;
wherein the identification vector set update step comprises: calculating the first distances corresponding to each data vector in the first time series data vector set, the first distances including the distance between a first data vector and each identification vector in the identification vector set, and the first data vector being any data vector in the first time series data vector set;
determining, from the first distances, the target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, the target distance being the smallest of the first distances;
calculating the mean vector of the data vectors corresponding to the target identification vector and using the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vector;
wherein the loop termination condition includes the target distance remaining unchanged over a first preset number of consecutive iterations.
3. The method according to claim 1, characterized in that determining the activation vector set according to the first time series data vector set and the target identification vector set comprises:
determining, from the first time series data vector set, the target vector corresponding to the data to be predicted;
calculating the second distance between the target vector and each target identification vector in the target identification vector set;
determining, in the target identification vector set and according to the second distances, a second preset number of vectors nearest to the target vector, to obtain the activation vector set.
4. The method according to any one of claims 1 to 3, characterized in that determining the predicted density function of the data to be predicted according to the density function comprises:
calculating the information entropy of each data vector in the activation vector set according to the density function;
determining the predicted density function according to the information entropy.
5. A data prediction apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain multiple historical time series data;
a data conversion module, configured to convert the multiple historical time series data into a first time series data vector set according to the acquisition times corresponding to the multiple historical time series data;
a second obtaining module, configured to obtain an identification vector set according to the first time series data vector set;
a third obtaining module, configured to determine a target identification vector set according to the identification vector set and the first time series data vector set;
a first determining module, configured to determine an activation vector set according to the first time series data vector set and the target identification vector set;
a second determining module, configured to obtain the density function of each data vector in the activation vector set over preset conditions, and to determine the predicted density function of data to be predicted according to the density function;
a prediction module, configured to predict, according to the predicted density function, the probability that the data to be predicted satisfies the preset condition.
6. The apparatus according to claim 5, characterized in that the third obtaining module is configured to execute an identification vector set update step in a loop until a loop termination condition is satisfied, and to determine the identification vector set at the time the loop termination condition is satisfied as the target identification vector set; the identification vector set update step comprises: calculating the first distances corresponding to each data vector in the first time series data vector set, the first distances including the distance between a first data vector and each identification vector in the identification vector set, and the first data vector being any data vector in the first time series data vector set; determining, from the first distances, the target distance corresponding to the first data vector, and determining the identification vector corresponding to the target distance as the target identification vector corresponding to the first data vector, the target distance being the smallest of the first distances; calculating the mean vector of the data vectors corresponding to the target identification vector and using the mean vector as the updated target identification vector; and determining the target identification vector set according to the updated target identification vector; the loop termination condition includes the target distance remaining unchanged over a first preset number of consecutive iterations.
7. The apparatus according to claim 5, characterized in that the first determining module is configured to determine, from the first time series data vector set, the target vector corresponding to the data to be predicted; to calculate the second distance between the target vector and each target identification vector in the target identification vector set; and, in the target identification vector set, to determine, according to the second distances, a second preset number of vectors nearest to the target vector, to obtain the activation vector set.
8. The apparatus according to any one of claims 5 to 7, characterized in that the second determining module is configured to calculate the information entropy of each data vector in the activation vector set according to the density function, and to determine the predicted density function according to the information entropy.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
10. An electronic device, characterized by comprising:
a memory on which a computer program is stored;
a processor, configured to execute the computer program in the memory so as to implement the steps of the method according to any one of claims 1 to 4.
CN201811475791.8A 2018-12-04 2018-12-04 Data prediction method and device, storage medium and electronic equipment Active CN109754115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811475791.8A CN109754115B (en) 2018-12-04 2018-12-04 Data prediction method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109754115A true CN109754115A (en) 2019-05-14
CN109754115B CN109754115B (en) 2021-03-26

Family

ID=66403636

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811475791.8A Active CN109754115B (en) 2018-12-04 2018-12-04 Data prediction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109754115B (en)

Also Published As

Publication number Publication date
CN109754115B (en) 2021-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant