CN110348608A - A prediction method for improving LSTM based on a fuzzy clustering algorithm - Google Patents
A prediction method for improving LSTM based on a fuzzy clustering algorithm
- Publication number
- CN110348608A (application CN201910527461.7A)
- Authority
- CN
- China
- Prior art keywords
- time series
- fusion model
- lstm
- fuzzy clustering
- collection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Data Mining & Analysis (AREA)
- Finance (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Evolutionary Computation (AREA)
- Game Theory and Decision Science (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Human Resources & Organizations (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Tourism & Hospitality (AREA)
- Operations Research (AREA)
- Technology Law (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
Abstract
A prediction method for improving LSTM based on a fuzzy clustering algorithm. Fuzzy clustering is performed on the standardized stock data set, and the data are then converted into time series to obtain a time series set S and a fuzzy-partition time series set T, from which a fusion model is built; the fusion model outputs a predicted value. After the predicted value is obtained, the overall error of the fusion model is calculated and manually passed to the output layer of each LSTM network; each LSTM network then automatically back-propagates its partial error to the input layer and updates all weights in its cell units, yielding a trained fusion model. The test set of the time series set S is fed to the trained fusion model, which outputs the final predicted value. By fitting the uncertainty with which each real-world sample belongs to the different clusters, the invention can effectively reduce the error between the LSTM networks' predictions and the true values, so that the prediction results are closer to actual conditions and accuracy improves.
Description
Technical field
The present invention relates to stock price prediction, and in particular to a prediction method for improving LSTM based on a fuzzy clustering algorithm.
Background art
Stock price prediction is the forecasting of future rises and falls in stock prices from the development of the stock market, including historical share-price transaction information and stock-related market information. Because of the importance of the stock market in business and finance, stock price prediction has long attracted wide attention.
Because of the volatility and uncertainty of stocks themselves, stock price movement is a highly complex nonlinear system: stock adjustments do not advance along a uniform time course but follow their own progression. Predicting stock prices in order to select stocks and time trades better, and so obtain the maximum return, has therefore become a field that many researchers follow. The more common stock prediction methods at present are conventional machine learning and financial time series analysis, for example support vector machines, decision trees, regression analysis and ARMA models. These algorithms are highly interpretable, but they have too many manually set parameters and persistently suffer from over-fitting and under-fitting. Moreover, because their structure is simple, one model can usually be built for only a single stock; it cannot analyse the correlation between multiple stocks, nor the influence of news-driven factors in the market. Although the deep learning methods that have emerged in recent years are less interpretable than conventional machine learning, their structure, which imitates the neurons of the human brain, lets them handle a variety of complex analysis tasks, including joint analysis of multiple stocks and fused decisions over text and stock information. Predicting stock trends with a deep learning model has two main advantages: first, manually set parameters in the model often do not directly affect the prediction, which reduces the influence of human factors on the result; second, deep learning usually generalizes better than conventional machine learning, as has been confirmed in a number of competitions. However, fruitful applied research on deep learning for financial data is still scarce. How to choose an effective strategy for analysing the large volume of stock data in the market, obtain a prediction for each stock, and ensure accuracy while controlling risk remains to be studied further.
As the stock market has developed, industry categories have become clearer and the related indicators more complete. On this basis, applying a clustering algorithm allows industry trends and future development priorities in the stock market to be analysed more accurately. Cluster analysis is an effective way of grouping data sensibly; it can discover the structure hidden in large amounts of data.
Current clustering algorithms all partition data rigidly. Hard partitioning assigns each sample strictly to a single class, whereas in a real data set a sample belongs to each class with a fuzzy degree of membership. A clustering algorithm that better matches the clustering behaviour of real data is therefore needed in order to predict stock prices.
Summary of the invention
The object of the present invention is to address price prediction in the stock market by providing a prediction method for improving LSTM based on a fuzzy clustering algorithm. The method introduces a fuzzy clustering algorithm: fuzzy clustering of the data yields a membership matrix, and the membership matrix is used to form a weighted sum of the outputs of the fused LSTM networks, giving the final stock price prediction. The method can effectively model the fluctuation characteristics and scenarios of stock trends, so that the prediction results are more accurate and realistic.
To achieve the above object, the invention adopts the following technical solution:
A prediction method for improving LSTM based on a fuzzy clustering algorithm, comprising the following steps:
(1) standardizing the stock data to obtain a standardized stock data set;
(2) performing fuzzy clustering on the standardized stock data set with the FCM algorithm;
(3) converting the fuzzy-clustered stock data set to obtain a time series set S and a fuzzy-partition time series set T;
(4) obtaining a fusion model from the time series set S and the fuzzy-partition time series set T, the fusion model outputting a predicted value;
(5) each time a predicted value is obtained, calculating the overall error of the fusion model from the objective function and manually passing it to the output layer of each LSTM network, after which each LSTM network automatically back-propagates its partial error to the input layer so that all weights in the cell units are updated automatically, giving a trained fusion model;
(6) feeding the test set of the time series set S into the fusion model trained in step (5), which outputs the final predicted value.
In a further improvement of the invention, step (1) proceeds as follows:
1) compute the mathematical expectation $\mu_k$ and the standard deviation $s_k$ of each feature column of the stock data set;
2) standardize: $z_i = (x_i - \mu_k)/s_k$, where $z_i$ is the standardized value of the variable and $x_i$ is its actual value.
In a further improvement of the invention, step (2) proceeds as follows:
1) initialize the membership matrix U with random numbers between 0 and 1 so that it satisfies the constraint of formula (2.1);
$$\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, 2, \ldots, n \tag{2.1}$$
where $u_{ij}$ is the degree of membership of the j-th sample in the i-th class and n is the number of samples;
2) compute the c cluster centres $c_i$, i = 1, 2, ..., c, with formula (2.3);
$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \tag{2.3}$$
where $c_i$ is the cluster centre of the i-th class and $x_j$ is the actual value of the variable;
3) compute the objective function with formula (2.2); if its value is below a fixed threshold $\varepsilon_1$, or its change from the previous value is below a threshold $\varepsilon_2$, stop; otherwise go to step 4);
$$J(U, c_1, \ldots, c_c) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \tag{2.2}$$
where $d_{ij} = \lVert c_i - x_j \rVert$ is the Euclidean distance between the i-th cluster centre and the j-th data sample; U is the membership matrix, $c_1$ is the cluster centre of the first class, $c_c$ is the cluster centre of the c-th class, and m is the membership weighting exponent;
4) compute the new membership matrix U with formula (2.4) and return to step 2);
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \tag{2.4}$$
where $d_{kj}$ is the Euclidean distance between the k-th cluster centre and the j-th data sample.
In a further improvement of the invention, step (3) proceeds as follows: the standardized stock data set is divided into windows with a time step of 5. If the current time is t, then $(x_t, x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4})$ is the time series starting at time t, the time series starting at time t+1 is $(x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}, x_{t+5})$, and so on; in this way the standardized stock data set is processed into a time series set S with a time interval of 5;
The membership matrix U is divided in the same way with a time step of 5. If the current time is t, then $(u_t, u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4})$ is the time series starting at time t, the time series starting at time t+1 is $(u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}, u_{t+5})$, and so on; in this way the membership matrix U is converted into a fuzzy-partition time series set T with a time interval of 5.
In a further improvement of the invention, step (4) proceeds as follows: the training set of the time series set S is input to each LSTM network and mapped through the layers of cell units to the output layer; the fuzzy-partition time series set T supplies the weight of each LSTM network's output, and the weighted sum of these weights and the corresponding output values gives the fusion model.
In a further improvement of the invention, in step (4), suppose the fuzzy clustering has n clusters, the memberships of the i-th sample are $(u_{i1}, u_{i2}, \ldots, u_{in})$, and the outputs of the n LSTM networks are $(a_{i1}, a_{i2}, \ldots, a_{in})$; the output of the fusion model is then:
$$\hat{y}_i = \sum_{k=1}^{n} u_{ik}\, a_{ik} \tag{3.1}$$
In a further improvement of the invention, the objective function is:
$$J = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$$
where m is the number of samples, $y_i$ is the true value of the sample and $\hat{y}_i$ is the output value.
In a further improvement of the invention, in step (5), the overall error of the fusion model is:
$$e_i = y_i - \hat{y}_i \tag{3.2}$$
where $y_i$ is the true value of the sample and $\hat{y}_i$ is the output value, i = 1, 2, ..., m;
The partial error is:
$$e_{ik} = u_{ik} \left( y_i - \hat{y}_i \right) \tag{3.3}$$
where $u_{ik}$ is the degree of membership of sample $x_i$ in class $c_k$, i = 1, 2, ..., m and k = 1, 2, ..., n.
Compared with the prior art, the invention has the following beneficial effects:
The invention introduces a prediction method that improves LSTM on the basis of fuzzy clustering. It uses the FCM algorithm to cluster the data fuzzily. Compared with "hard clustering", which assigns each sample strictly to one class, fuzzy cluster analysis treats each cluster as a fuzzy set and determines cluster relationships by degree of membership. This is a soft partition that yields the degree of uncertainty with which a sample belongs to each class, making the clustering result more accurate and flexible; by drawing on the concept of fuzzy sets, the FCM algorithm improves the robustness of the model's predictions. The invention uses the sample memberships as the weights of the outputs of multiple LSTM network models and thereby fuses them into the final prediction network model; because the fusion combines multiple models, the prediction is more credible and the predictive performance of the model is strengthened. Fuzzy clustering determines, through a membership function, the degree to which each sample belongs to each class, rather than rigidly assigning a data object to one cluster. Improving the LSTM (Long Short-Term Memory) network on the basis of a fuzzy clustering algorithm fits the uncertainty of each real-world sample with respect to the different clusters and can effectively reduce the error between the LSTM networks' predictions and the true values, so that the prediction results are closer to actual conditions and accuracy improves.
Brief description of the drawings
Fig. 1 is a flow chart of the FCM algorithm;
Fig. 2 is a structure diagram of the LSTM;
Fig. 3 is a diagram of the improved LSTM model based on the fuzzy clustering algorithm;
Fig. 4 is a flow chart of the prediction method.
Specific embodiment
To make the content, effects and advantages of the invention clearer, the invention is described in detail below with reference to the drawings and embodiments.
The invention is an improved LSTM (Long Short-Term Memory) prediction method based on fuzzy clustering. The stock data are standardized and the standardized data are converted into time series (the timestep is 5 days, so every five days form one time series). Standardization effectively speeds up model training and removes the model's excessive sensitivity to the magnitude of the data; converting the data into time series introduces temporal correlation into training, which better matches the fact that a stock price is related to its earlier history. By introducing the concept of fuzzy sets, all sample data are clustered fuzzily with the FCM algorithm, which yields the membership matrix as a by-product of clustering; in this invention the memberships play an important role in computing the final predicted value. Fuzzy clustering also removes the drawback of traditional hard clustering, in which a sample can belong to only one class: each sample has a certain degree of membership in every class, so the clustering result better matches real-world clustering. The membership matrix obtained by fuzzy clustering makes it possible to fuse multiple LSTM networks effectively and strengthen the prediction. While the weights are updated, the invention keeps the membership weights fixed and passes the overall error back to the output of each LSTM in proportion to its membership weight, so that each LSTM receives its own partial error, propagates it backwards and automatically updates the weights inside its cells. This ultimately minimizes the objective function and reduces the model's overall error, so that the predictions are closer to the true values and the model generalizes better to unseen samples.
The detailed procedure of the invention is as follows:
(1) Data standardization
Because the data are collected over a long time span, stock prices change in order of magnitude over time. If large numbers of values are not of the same magnitude, the accuracy of the model's predictions is strongly affected. In addition, LSTM is very sensitive to the scale of its input data, especially when sigmoid or tanh activation functions are used, so standardizing the data is good practice. The transformation is $x^* = (x - \mu)/\sigma$, where $\mu$ is the mean of all sample data and $\sigma$ is the standard deviation of all sample data.
The stock data are standardized to obtain the standardized data set as follows:
1) compute the mathematical expectation $\mu_k$ and the standard deviation $s_k$ of each feature column of the stock data set (TAIEX);
2) standardize: $z_i = (x_i - \mu_k)/s_k$, where $z_i$ is the standardized value of the variable and $x_i$ is its actual value;
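The standardization step can be sketched in a few lines of plain Python. This is a minimal illustration, not the patent's implementation: the function name `standardize_columns`, the sample rows, the use of the population standard deviation and the guard for constant columns are all assumptions.

```python
import math

def standardize_columns(rows):
    """Z-score each feature column: z = (x - mean_k) / std_k.

    Assumption: the population standard deviation is used; the source
    does not say whether it means the population or sample form.
    """
    n = len(rows)
    n_features = len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(n_features)]
    stds = [
        math.sqrt(sum((r[k] - means[k]) ** 2 for r in rows) / n)
        for k in range(n_features)
    ]
    # Assumption: a constant column (std == 0) is left centred but unscaled.
    stds = [s if s > 0 else 1.0 for s in stds]
    z = [[(r[k] - means[k]) / stds[k] for k in range(n_features)] for r in rows]
    return z, means, stds

# Hypothetical rows: (closing price, volume) for four trading days.
raw = [[100.0, 2000.0], [102.0, 2200.0], [98.0, 1800.0], [100.0, 2000.0]]
z, means, stds = standardize_columns(raw)
```

Returning `means` and `stds` alongside the scaled rows lets the same statistics be reused to scale later data or to map predictions back to prices.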
(2) Fuzzy clustering of the standardized stock data set with the FCM algorithm
Clustering makes the similarity between samples assigned to the same cluster as large as possible and the similarity between different clusters as small as possible. Fuzzy clustering describes the uncertainty of a sample's class assignment and so reflects the real world more faithfully. Referring to Fig. 1, the FCM algorithm divides n samples $x_j$ (j = 1, 2, ..., n) into c fuzzy groups and finds the cluster centre of each group so that an objective function of a dissimilarity index is minimized. The constraint is that the memberships of a sample in all classes sum to 1:
$$\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, 2, \ldots, n \tag{2.1}$$
where $u_{ij}$ is the degree of membership of the j-th sample in the i-th class and lies between 0 and 1.
The objective function $J(U, c_1, \ldots, c_c)$ of the FCM algorithm is:
$$J(U, c_1, \ldots, c_c) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \tag{2.2}$$
where $d_{ij} = \lVert c_i - x_j \rVert$ is the Euclidean distance between the i-th cluster centre and the j-th data sample; U is the membership matrix, $c_1$ is the cluster centre of the first class, $c_c$ is the cluster centre of the c-th class, and m is a weighting exponent (the membership factor), generally taken as 2;
Formula (2.2) is formed by multiplying each sample's membership in each class by the distance from the sample to that class centre and summing over all samples and classes.
The cluster centres are given by:
$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \tag{2.3}$$
where $c_i$ is the cluster centre of the i-th class, m is the weighting exponent and $x_j$ is the actual value of the variable;
The memberships are given by:
$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \tag{2.4}$$
where $d_{kj}$ is the Euclidean distance between the k-th cluster centre and the j-th data sample.
The steps of fuzzy clustering of the standardized stock data set with the FCM algorithm are:
1) initialize the membership matrix U with random numbers between 0 and 1 so that it satisfies the constraint of formula (2.1);
2) compute the c cluster centres $c_i$, i = 1, 2, ..., c, with formula (2.3);
3) compute the objective function with formula (2.2); if its value is below a fixed threshold $\varepsilon_1$ ($\varepsilon_1$ is generally 0.01), or its change from the previous value is below a threshold $\varepsilon_2$ ($\varepsilon_2$ is generally 0.001), stop; otherwise (that is, if the value is not below $\varepsilon_1$ and its change from the previous value is not below $\varepsilon_2$) go to step 4);
4) compute the new membership matrix U with formula (2.4) and return to step 2).
These steps yield c cluster-centre vectors and a fuzzy partition matrix U. This matrix gives the degree of membership of each sample point in each class and is also called the membership matrix U.
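The four steps above can be sketched as a short, dependency-free routine. It is an illustrative sketch that uses the standard fuzzy c-means update rules, not the patent's own code: the function name `fcm`, the `max_iter` and `seed` parameters, the distance floor and the toy data are assumptions, while `m`, `eps1` and `eps2` take the typical values stated in the text.

```python
import math
import random

def fcm(samples, c, m=2.0, eps1=0.01, eps2=0.001, max_iter=100, seed=0):
    """Fuzzy c-means; U[i][j] is the membership of sample j in cluster i.

    max_iter and seed are assumptions added for a bounded, repeatable run.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    # Step 1: random memberships, normalised so each sample's memberships sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col
    prev_J = None
    for _ in range(max_iter):
        # Step 2: cluster centres as membership-weighted means.
        centers = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[j] * samples[j][d] for j in range(n)) / total for d in range(dim)]
            )
        # Euclidean distances, floored to avoid division by zero.
        D = [[max(math.dist(centers[i], samples[j]), 1e-12) for j in range(n)]
             for i in range(c)]
        # Step 3: objective value and the two stopping tests.
        J = sum(U[i][j] ** m * D[i][j] ** 2 for i in range(c) for j in range(n))
        if J < eps1 or (prev_J is not None and abs(prev_J - J) < eps2):
            break
        prev_J = J
        # Step 4: membership update from the distance ratios.
        p = 2.0 / (m - 1.0)
        for j in range(n):
            for i in range(c):
                U[i][j] = 1.0 / sum((D[i][j] / D[k][j]) ** p for k in range(c))
    return U, centers

# Hypothetical one-dimensional data with two well-separated groups.
points = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
U, centers = fcm(points, c=2)
```

Each column of the returned `U` sums to 1, so a column can be used directly as the set of fusion weights for one sample.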
(3) Converting the standardized stock data set into time series to obtain the time series set S and the fuzzy-partition time series set T
The standardized stock data set is divided into windows with a timestep (time step) of 5. If the current time is t, then $(x_t, x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4})$ is the time series starting at time t ($x_t$ is one sample record in the standardized data set); the time series starting at time t+1 is $(x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}, x_{t+5})$, and so on. In this way the standardized stock data set is processed into a time series set S with a time interval of 5, and each sequence is one input to the LSTM;
The membership matrix U is divided in the same way with a time step of 5. If the current time is t, then $(u_t, u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4})$ is the time series starting at time t ($u_t$ is one sample record in the membership matrix U); the time series starting at time t+1 is $(u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}, u_{t+5})$, and so on. In this way the membership matrix U is converted into a fuzzy-partition time series set T with a time interval of 5.
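The windowing described above amounts to a sliding window that advances one record at a time. A minimal sketch follows; the helper name `sliding_windows` and the placeholder records are assumptions, and the same helper is applied to both the data records and the membership records so that S and T stay aligned.

```python
def sliding_windows(records, timestep=5):
    """Return every run of `timestep` consecutive records, advancing one step at a time."""
    return [records[t:t + timestep] for t in range(len(records) - timestep + 1)]

# Hypothetical placeholders standing in for standardized records x_t
# and membership records u_t over eight trading days.
x = [f"x{t}" for t in range(8)]
u = [f"u{t}" for t in range(8)]

S = sliding_windows(x)  # time series set S
T = sliding_windows(u)  # fuzzy-partition time series set T
```

With eight records and a time step of 5 this produces four windows per set, and window t of `S` covers the same days as window t of `T`.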
The time series set S is then split into a training set and a test set by k-fold cross-validation. k-fold cross-validation divides S into k disjoint subsets: if S contains p time series, each subset has p/k sequences, and the disjoint subsets are $\{S_1, S_2, \ldots, S_k\}$, where k is a natural number. Each time, the union of the k-1 subsets $\{S_1, S_2, \ldots, S_{j-1}, S_{j+1}, \ldots, S_k\}$ is selected as the training set, leaving out a single subset $S_j$ as the test set, so that training and testing can be carried out k times.
Specifically, k-fold cross-validation is performed by calling sklearn.model_selection.KFold, which splits the data into training and test sets by k-fold partitioning.
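The split can be illustrated without any third-party dependency. The sketch below is a stand-in for `sklearn.model_selection.KFold` that is meant to mirror its default unshuffled behaviour (contiguous folds, with the first `n_samples % k` folds one element larger); the function name `k_fold_indices` and the sizes used are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, disjoint folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        # Fold j is held out for testing; the other k-1 folds form the training set.
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

# Hypothetical case: p = 10 time series in S, k = 5 folds.
folds = list(k_fold_indices(n_samples=10, k=5))
```

Each index appears in exactly one test fold, so every time series in S is used for testing once and for training k-1 times.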
(4) Obtaining the fusion model from the time series set S and the fuzzy-partition time series set T; the fusion model outputs the predicted value
Referring to Fig. 2, LSTM (Long Short-Term Memory) is a long short-term memory network, a kind of recurrent neural network over time, suited to processing and predicting important events separated by relatively long intervals and delays in a time series. LSTM differs from a plain RNN in that it adds to the algorithm a "processor" that judges whether information is useful; the structure in which this processor operates is called a cell.
Referring to Fig. 3, the internal structure of LSTM mitigates the exploding-gradient and vanishing-gradient problems of the standard RNN, so it learns faster. The internal structure of LSTM is as follows:
1) Forget gate: the first step is to decide which information is discarded from the cell state when data enter the LSTM. This decision is made by a forget gate layer, which selectively forgets information in the cell state. Its formula is:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
where $W_f$ and $b_f$ are parameters learned in training, $h_{t-1}$ is the output of the previous cell, $x_t$ is the input of the current cell, $\sigma$ is the sigmoid function and $f_t$ is the output of the forget gate;
2) Input gate: this layer has two parts. The first part generates the new information to be updated and comprises two small neural network layers:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
where $W_i$, $b_i$, $W_C$ and $b_C$ are parameters learned in training, $h_{t-1}$ is the output of the previous cell, $x_t$ is the input of the current cell, $\sigma$ is the sigmoid function, tanh is the hyperbolic tangent function, and $i_t$ and $\tilde{C}_t$ are the outputs of the two small neural networks in the input gate;
The second part updates the cell state:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
where $f_t$ is the output of the forget gate, $C_{t-1}$ is the cell state of the previous cell, $i_t$ is the output of the sigmoid layer in the input gate and $\tilde{C}_t$ is the output of the tanh layer in the input gate;
3) Output gate: this layer decides what value to output based on the cell state. First, the cell runs a sigmoid layer to determine which part of the cell state is output:
$$O_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
where $W_o$ and $b_o$ are parameters learned in training, $h_{t-1}$ is the output of the previous cell, $x_t$ is the input of the current cell and $O_t$ is the output of the sigmoid layer of the output gate.
Then the cell state is passed through tanh (giving a value between -1 and 1) and multiplied by the sigmoid output, so that the cell outputs only the part selected for output:
$$h_t = O_t * \tanh(C_t)$$
where $C_t$ is the cell state of the cell, $O_t$ is the output of the sigmoid layer of the output gate and $h_t$ is the output of the output gate.
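The gate equations can be traced with a single forward step of one cell. This is a toy sketch in plain Python for following the arithmetic, not a trainable network: the names `lstm_step` and `affine`, the parameter layout (each weight matrix acting on the concatenation of the previous output and the current input) and the all-zero parameters in the usage lines are assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM cell step; `p` maps a gate name to its (W, b) pair."""
    hx = h_prev + x_t                                  # concatenation [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*p["f"], hx)]      # forget gate
    i = [sigmoid(v) for v in affine(*p["i"], hx)]      # input gate, sigmoid layer
    g = [math.tanh(v) for v in affine(*p["c"], hx)]    # input gate, tanh layer (candidate)
    o = [sigmoid(v) for v in affine(*p["o"], hx)]      # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Cell output: squashed cell state, filtered by the output gate.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

# Hypothetical cell with hidden size 1 and input size 1, all parameters zero,
# so every sigmoid gate equals 0.5 and the candidate equals 0.
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h, c = lstm_step(x_t=[2.0], h_prev=[0.0], c_prev=[1.0], p=params)
```

With zero parameters the forget gate halves the previous cell state and the candidate contributes nothing, which makes the update easy to check by hand.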
The invention creates multiple LSTM networks (as many as there are fuzzy clusters). Referring to Fig. 4, the training set of the time series set S is input to each LSTM network and mapped through the layers of cell units to the output layer. The sample memberships, that is, the fuzzy-partition time series set T, are used as the weights of the LSTM network outputs; the weighted sum of these weights and the corresponding output values gives the fusion model, which outputs the final model prediction. In detail: suppose the fuzzy clustering has n clusters, the memberships of the i-th sample are $(u_{i1}, u_{i2}, \ldots, u_{in})$, and the outputs of the n LSTM networks are $(a_{i1}, a_{i2}, \ldots, a_{in})$; the output of the fusion model is then:
$$\hat{y}_i = \sum_{k=1}^{n} u_{ik}\, a_{ik} \tag{3.1}$$
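The fusion output is a membership-weighted sum of the per-network outputs. A minimal sketch follows, assuming the outputs of the individual LSTM networks are already available as plain numbers; the function name `fuse` and the sample values are hypothetical.

```python
def fuse(memberships, outputs):
    """For each sample i, return sum over k of u_ik * a_ik."""
    return [
        sum(u * a for u, a in zip(u_i, a_i))
        for u_i, a_i in zip(memberships, outputs)
    ]

# Hypothetical values: two samples, three clusters and therefore three networks.
U = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]        # memberships u_ik, each row sums to 1
A = [[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]]  # network outputs a_ik
y_hat = fuse(U, A)
```

Because each row of memberships sums to 1, every fused prediction lies between the smallest and largest network output for that sample.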
(5) Objective function
Because this is a regression problem, the conventional mean squared error is used as the objective function of the fusion model:
$$J = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$$
where m is the number of samples, $y_i$ is the true value of the sample and $\hat{y}_i$ is the output value.
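To see how this objective acts on each individual network, it helps to differentiate it with respect to a single network output. The short derivation below assumes the mean-squared-error form $J = \frac{1}{m}\sum_i (y_i - \hat{y}_i)^2$ and a fusion output $\hat{y}_i = \sum_k u_{ik} a_{ik}$ with the memberships held fixed; the normalization constant is an assumption and only rescales the result.

```latex
\frac{\partial J}{\partial a_{ik}}
  = \frac{\partial J}{\partial \hat{y}_i}\,
    \frac{\partial \hat{y}_i}{\partial a_{ik}}
  = -\frac{2}{m}\,\bigl(y_i - \hat{y}_i\bigr)\,u_{ik}
```

Under these assumptions the gradient that reaches the k-th network for sample i is that sample's overall error scaled by its membership $u_{ik}$, which is consistent with distributing the error in proportion to membership.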
(6) Updating the weights
Referring to Fig. 4, an LSTM network back-propagates errors automatically. It is therefore only necessary, each time a predicted value is obtained, to calculate the overall error of the fusion model from the objective function of step (5) and pass it manually to the output layer of each LSTM network; each LSTM network then automatically back-propagates its partial error to the input layer and so automatically updates all the weights in its cell units, giving the trained fusion model. In detail: by formula (3.1), the fusion model's prediction is the membership-weighted sum of the LSTM networks' outputs, so the overall error is passed back to each LSTM network in proportion to the weight of that network's output.
The overall error of the fusion model is:
$$e_i = y_i - \hat{y}_i \tag{3.2}$$
where $y_i$ is the true value of the sample and $\hat{y}_i$ is the output value; formula (3.2) is the overall error of the fusion model.
The partial error is:
$$e_{ik} = u_{ik} \left( y_i - \hat{y}_i \right) \tag{3.3}$$
where $u_{ik}$ is the degree of membership of sample $x_i$ in class $c_k$; formula (3.3) is the partial error passed back from the overall error in proportion to membership.
The partial error of formula (3.3) is back-propagated to the input layer of each LSTM network, and each LSTM network then automatically updates the weights in its cell units according to the partial error it receives.
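The error hand-off can be sketched as follows. This is only an illustration of splitting the overall error by membership: the function name `partial_errors`, the sign convention (true value minus prediction) and the sample numbers are assumptions, and the back-propagation inside each LSTM network is left to whatever framework trains it.

```python
def partial_errors(y_true, y_hat, memberships):
    """Split each sample's overall error across networks in proportion to membership."""
    errors = []
    for y, y_pred, u_i in zip(y_true, y_hat, memberships):
        e_i = y - y_pred                       # overall error for sample i (assumed sign)
        errors.append([u * e_i for u in u_i])  # share handed to network k
    return errors

# Hypothetical values: one sample, three networks.
parts = partial_errors(y_true=[15.0], y_hat=[14.0], memberships=[[0.5, 0.25, 0.25]])
```

Since the memberships of a sample sum to 1, the shares for that sample add back up to its overall error, so no part of the error is lost or counted twice.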
(7) Prediction
The test set of the time series set S is fed into the fusion model trained in step (6); the output of the fusion model is the final predicted value.
The invention provides an LSTM prediction method improved by fuzzy clustering, in order to raise the credibility and correctness of predictions. By introducing a fuzzy clustering algorithm, the invention determines cluster relationships by degree of membership and obtains the degree of uncertainty with which a sample belongs to each class, making the clustering result more accurate and flexible; by introducing the concept of fuzzy sets, the FCM algorithm improves the robustness of the prediction method. The invention uses the sample memberships as the weights of the outputs of multiple LSTM network models and thereby fuses them into the final prediction network model; because the fusion combines multiple models, the prediction is more credible, fits the fluctuation of stocks in real scenarios, and strengthens the model's predictive performance.
Claims (8)
1. A prediction method for improving LSTM based on a fuzzy clustering algorithm, characterized by comprising the following steps:
(1) standardizing the stock data to obtain a standardized stock data set;
(2) performing fuzzy clustering on the standardized stock data set with the FCM algorithm;
(3) converting the fuzzy-clustered stock data set to obtain a time series set S and a fuzzy-partition time series set T;
(4) obtaining a fusion model from the time series set S and the fuzzy-partition time series set T, the fusion model outputting a predicted value;
(5) each time a predicted value is obtained, calculating the overall error of the fusion model from the objective function and manually passing it to the output layer of each LSTM network, after which each LSTM network automatically back-propagates its partial error to the input layer so that all weights in the cell units are updated automatically, giving a trained fusion model;
(6) feeding the test set of the time series set S into the fusion model trained in step (5), which outputs the final predicted value.
2. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that the detailed process of step (1) is as follows:
1) compute the mathematical expectation $\mu_k$ and the standard deviation $s_k$ of each feature column of the stock data set;
2) standardize: $z_i = (x_i - \mu_k)/s_k$, where $z_i$ is the standardized variable value and $x_i$ is the actual variable value.
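The column-wise standardization of claim 2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `standardize`, the use of the population standard deviation, and the guard for constant columns are assumptions that the claim does not specify.

```python
import numpy as np

def standardize(data):
    """Column-wise z-score: z = (x - mu_k) / s_k for each feature column k."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)            # expectation of each feature column
    s = data.std(axis=0)              # population standard deviation (assumption)
    s = np.where(s == 0, 1.0, s)      # avoid division by zero (not part of the claim)
    return (data - mu) / s, mu, s

# Hypothetical data: three trading days, two feature columns.
prices = np.array([[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]])
z, mu, s = standardize(prices)
```

Returning `mu` and `s` alongside the standardized values lets the same statistics be reused on a test set.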
3. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that the detailed process of step (2) is as follows:
1) initialize the membership matrix U with random numbers between 0 and 1 so that it satisfies the constraint in formula (2.1);
where $u_{ij}$ is the degree of membership of the j-th sample in the i-th class;
2) compute the c cluster centres $c_i$, $i = 1, 2, \ldots, c$, using formula (2.3);
where $c_i$ is the cluster centre of the i-th class and $x_j$ is the actual variable value;
3) compute the objective function according to formula (2.2); if the objective function value is less than a given threshold $\varepsilon_1$, or its change relative to the previous objective function value is less than a given threshold $\varepsilon_2$, stop; otherwise go to step 4);
where $d_{ij} = \|c_i - x_j\|$ is the Euclidean distance between the i-th cluster centre and the j-th data sample, U is the membership matrix, $c_1$ is the cluster centre of the first class, $c_c$ is the cluster centre of the c-th class, and m is the membership factor (the fuzzifier);
4) compute the new membership matrix U using formula (2.4) and return to step 2);
where $d_{kj}$ is the Euclidean distance between the k-th cluster centre and the j-th data sample.
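The four steps of claim 3 follow the usual fuzzy c-means loop. The sketch below assumes that formulas (2.1) to (2.4), which are not reproduced in this text, take their standard FCM forms (memberships of each sample summing to 1, membership-weighted centres, objective $\sum_i \sum_j u_{ij}^m d_{ij}^2$, and the inverse-distance membership update). The function name `fcm`, the default values, and the use of a single threshold `eps` for both stopping tests are illustrative assumptions.

```python
import numpy as np

def fcm(x, c, m=2.0, eps=1e-6, max_iter=200, seed=0):
    """Fuzzy c-means on samples x of shape (n_samples, n_features).

    Returns the membership matrix u of shape (c, n_samples) and the centres.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Step 1: random memberships in (0, 1), normalised so each sample's column sums to 1.
    u = rng.random((c, n))
    u /= u.sum(axis=0, keepdims=True)
    prev_obj = np.inf
    for _ in range(max_iter):
        um = u ** m
        # Step 2: centres c_i = sum_j u_ij^m x_j / sum_j u_ij^m
        centers = um @ x / um.sum(axis=1, keepdims=True)
        # d_ij = ||c_i - x_j||, floored to avoid division by zero
        d = np.linalg.norm(centers[:, None, :] - x[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Step 3: objective J = sum_i sum_j u_ij^m d_ij^2 and the two stopping tests
        obj = float((um * d ** 2).sum())
        if obj < eps or abs(prev_obj - obj) < eps:
            break
        prev_obj = obj
        # Step 4: u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
    return u, centers

# Hypothetical data: two well-separated groups of 20 points each.
rng = np.random.default_rng(1)
samples = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
u, centers = fcm(samples, c=2)
labels = u.argmax(axis=0)
```

Unlike hard clustering, every sample keeps a membership in every class; `labels` is only used here to inspect the result.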
4. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that the detailed process of step (3) is as follows: the stock data set is divided into windows with a time step of 5; if the current time is t, then $(x_t, x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4})$ is the time series starting at time t; the time series starting at time t+1 is $(x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}, x_{t+5})$; and so on, so that the standardized stock data set is processed into the time series set S with a time interval of 5;
the membership matrix U is divided in the same way with a time step of 5; if the current time is t, then $(u_t, u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4})$ is the time series starting at time t; the time series starting at time t+1 is $(u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}, u_{t+5})$; and so on, so that the membership matrix U is converted into the fuzzy partition time series set T with a time interval of 5.
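The windowing in claim 4 is an overlapping sliding window that advances one time step at a time. A minimal sketch, assuming the data are stored with time along the first axis; the helper name `sliding_windows` and the toy arrays are hypothetical.

```python
import numpy as np

def sliding_windows(seq, step=5):
    """Return all overlapping windows (x_t, ..., x_{t+step-1}) along the first axis."""
    seq = np.asarray(seq)
    n_windows = seq.shape[0] - step + 1
    if n_windows <= 0:
        raise ValueError("sequence is shorter than the window length")
    return np.stack([seq[t:t + step] for t in range(n_windows)])

# Hypothetical standardized series with 8 time steps.
z = np.arange(8.0)
S = sliding_windows(z)                 # windows over the data, shape (4, 5)

# Hypothetical memberships, one row per time step and one column per class.
U = np.full((8, 3), 1.0 / 3.0)
T = sliding_windows(U)                 # windows over the memberships, shape (4, 5, 3)
```

If the membership matrix is stored with one row per class, as in the $u_{ij}$ indexing of claim 3, it would need to be transposed before windowing.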
5. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that the detailed process of step (4) is: the training set of the time series set S is input to each LSTM network and mapped through the multiple layers of cell units to the output layer; the fuzzy partition time series set T is used as the weights of the outputs of the LSTM networks, and the weighted sum of the weights and the corresponding output values yields the fusion model.
6. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 5, characterized in that, in step (4), assuming that the fuzzy clustering has n clusters, that the membership of the i-th sample is $(u_{i1}, u_{i2}, \ldots, u_{in})$, and that the outputs of the n LSTM networks are $(a_{i1}, a_{i2}, \ldots, a_{in})$, the output of the fusion model is:
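Read together with claim 5, which describes a weighted sum of the memberships and the corresponding network outputs, the fusion output for sample i would be $\sum_k u_{ik} a_{ik}$. The sketch below assumes that form; the function name `fuse` and the numbers are illustrative, and the LSTM outputs are stand-in values rather than the output of trained networks.

```python
import numpy as np

def fuse(u, a):
    """Membership-weighted sum of network outputs.

    u: memberships, shape (m_samples, n_networks)
    a: outputs of the n networks for each sample, same shape
    """
    u = np.asarray(u, dtype=float)
    a = np.asarray(a, dtype=float)
    return (u * a).sum(axis=1)

# Hypothetical memberships and network outputs for two samples and three clusters.
u = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
a = np.array([[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]])
fused = fuse(u, a)   # 0.7*10 + 0.2*20 + 0.1*30 = 14 and 0.1*10 + 0.1*20 + 0.8*30 = 27
```

Because the memberships of a sample sum to 1, the fused value is a convex combination of the individual network outputs.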
7. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that the formula of the objective function is:
where m is the sample size, $y_i$ is the true value of the sample, and $\hat{y}_i$ is the output value.
8. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 6, characterized in that, in step (5), the overall error of the fusion model is:
where $y_i$ is the true value of the sample and $\hat{y}_i$ is the output value, $i = 1, 2, \ldots, m$;
the partial error is:
where $u_{ik}$ is the degree of membership of sample $x_i$ in class $c_k$, $i = 1, 2, \ldots, m$ and $k = 1, 2, \ldots, n$.
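The error formulas of claims 7 and 8 are not reproduced in this text, so the following is only a sketch under stated assumptions: a halved mean squared error as the objective, the residual $\hat{y}_i - y_i$ as the overall error, and a partial error for network k equal to $u_{ik}$ times that residual. Under these assumptions the partial error is what the chain rule gives for a membership-weighted sum, which the finite-difference check illustrates. All names are hypothetical.

```python
import numpy as np

def half_mse(u, a, y):
    """Assumed objective: E = 1/(2m) * sum_i (y_hat_i - y_i)^2."""
    y_hat = (u * a).sum(axis=1)
    return 0.5 * np.mean((y_hat - y) ** 2)

def split_error(u, a, y):
    """Return the overall residual per sample and its membership-weighted shares."""
    y_hat = (u * a).sum(axis=1)
    overall = y_hat - y                 # overall error of the fused output, shape (m,)
    partial = u * overall[:, None]      # share for network k on sample i, shape (m, n)
    return overall, partial

u = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
a = np.array([[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]])
y = np.array([15.0, 25.0])
overall, partial = split_error(u, a, y)

# Finite-difference check: dE/da[0, 0] should match partial[0, 0] / m.
h = 1e-6
a_shift = a.copy()
a_shift[0, 0] += h
numeric = (half_mse(u, a_shift, y) - half_mse(u, a, y)) / h
```

In a framework with automatic differentiation, passing these shares to each network's output layer would correspond to the manual step in claim 1, after which each network back-propagates on its own.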
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910527461.7A CN110348608A (en) | 2019-06-18 | 2019-06-18 | A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910527461.7A CN110348608A (en) | 2019-06-18 | 2019-06-18 | A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110348608A (en) | 2019-10-18 |
Family
ID=68182172
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910527461.7A Pending CN110348608A (en) | 2019-06-18 | 2019-06-18 | A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348608A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111652444A (en) * | 2020-06-05 | 2020-09-11 | 南京机电职业技术学院 | K-means and LSTM-based daily passenger volume prediction method |
CN111881502A (en) * | 2020-07-27 | 2020-11-03 | 中铁二院工程集团有限责任公司 | Bridge state discrimination method based on fuzzy clustering analysis |
CN112149990A (en) * | 2020-09-18 | 2020-12-29 | 南京邮电大学 | Fuzzy supply and demand matching method based on prediction |
CN112307410A (en) * | 2020-09-18 | 2021-02-02 | 天津大学 | Seawater temperature and salinity information time sequence prediction method based on shipborne CTD measurement data |
CN113159109A (en) * | 2021-03-04 | 2021-07-23 | 北京邮电大学 | Wireless network flow prediction method based on data driving |
CN113223392A (en) * | 2021-05-18 | 2021-08-06 | 信阳农林学院 | Hybrid integration model for PM2.5 hour concentration prediction |
CN113343077A (en) * | 2021-04-30 | 2021-09-03 | 南京大学 | Personalized recommendation method and system integrating user interest time sequence fluctuation |
CN116723136A (en) * | 2023-08-09 | 2023-09-08 | 南京华飞数据技术有限公司 | Network data detection method applying FCM clustering algorithm |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111652444A (en) * | 2020-06-05 | 2020-09-11 | 南京机电职业技术学院 | K-means and LSTM-based daily passenger volume prediction method |
CN111652444B (en) * | 2020-06-05 | 2023-07-21 | 南京机电职业技术学院 | K-means and LSTM-based daily guest volume prediction method |
CN111881502A (en) * | 2020-07-27 | 2020-11-03 | 中铁二院工程集团有限责任公司 | Bridge state discrimination method based on fuzzy clustering analysis |
CN112149990A (en) * | 2020-09-18 | 2020-12-29 | 南京邮电大学 | Fuzzy supply and demand matching method based on prediction |
CN112307410A (en) * | 2020-09-18 | 2021-02-02 | 天津大学 | Seawater temperature and salinity information time sequence prediction method based on shipborne CTD measurement data |
CN112149990B (en) * | 2020-09-18 | 2022-07-26 | 南京邮电大学 | Fuzzy supply and demand matching method based on prediction |
CN113159109A (en) * | 2021-03-04 | 2021-07-23 | 北京邮电大学 | Wireless network flow prediction method based on data driving |
CN113159109B (en) * | 2021-03-04 | 2024-03-08 | 北京邮电大学 | Wireless network flow prediction method based on data driving |
CN113343077A (en) * | 2021-04-30 | 2021-09-03 | 南京大学 | Personalized recommendation method and system integrating user interest time sequence fluctuation |
CN113223392A (en) * | 2021-05-18 | 2021-08-06 | 信阳农林学院 | Hybrid integration model for PM2.5 hour concentration prediction |
CN116723136A (en) * | 2023-08-09 | 2023-09-08 | 南京华飞数据技术有限公司 | Network data detection method applying FCM clustering algorithm |
CN116723136B (en) * | 2023-08-09 | 2023-11-03 | 南京华飞数据技术有限公司 | Network data detection method applying FCM clustering algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110348608A (en) | A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm | |
CN102521656B (en) | Integrated transfer learning method for classification of unbalance samples | |
CN106886846A (en) | A kind of bank outlets' excess reserve Forecasting Methodology that Recognition with Recurrent Neural Network is remembered based on shot and long term | |
CN106022954B (en) | Multiple BP neural network load prediction method based on grey correlation degree | |
CN110059852A (en) | A kind of stock yield prediction technique based on improvement random forests algorithm | |
CN110298663A (en) | Based on the wide fraudulent trading detection method learnt deeply of sequence | |
CN109063911A (en) | A kind of Load aggregation body regrouping prediction method based on gating cycle unit networks | |
CN112884056A (en) | Optimized LSTM neural network-based sewage quality prediction method | |
CN110321361A (en) | Test question recommendation and judgment method based on improved LSTM neural network model | |
CN109143408B (en) | Dynamic region combined short-time rainfall forecasting method based on MLP | |
CN108447057A (en) | SAR image change detection based on conspicuousness and depth convolutional network | |
CN113393057A (en) | Wheat yield integrated prediction method based on deep fusion machine learning model | |
CN110516733A (en) | A kind of Recognition of Weil Logging Lithology method based on the more twin support vector machines of classification of improvement | |
CN116468138A (en) | Air conditioner load prediction method, system, electronic equipment and computer storage medium | |
CN115600729A (en) | Grid load prediction method considering multiple attributes | |
CN109271424A (en) | A kind of parameter adaptive clustering method based on density | |
CN115760380A (en) | Enterprise credit assessment method and system integrating electricity utilization information | |
CN106056167A (en) | Normalization possibilistic fuzzy entropy clustering method based on Gaussian kernel hybrid artificial bee colony algorithm | |
CN109919374A (en) | Prediction of Stock Price method based on APSO-BP neural network | |
Akinwale Adio et al. | Translated Nigeria stock market price using artificial neural network for effective prediction | |
CN113344589A (en) | Intelligent identification method for collusion behavior of power generation enterprise based on VAEGMM model | |
CN112102135A (en) | College poverty and poverty precise subsidy model based on LSTM neural network | |
CN116956160A (en) | Data classification prediction method based on self-adaptive tree species algorithm | |
CN109143355B (en) | Semi-supervised global optimization seismic facies quantitative analysis method based on SOM | |
CN116089801A (en) | Medical data missing value repairing method based on multiple confidence degrees |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191018 |