CN110348608A - A prediction method for improving LSTM based on a fuzzy clustering algorithm

Info

Publication number: CN110348608A
Authority: CN (China)
Application number: CN201910527461.7A
Prior art keywords: time series, fusion model, LSTM, fuzzy clustering, collection
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 曲桦, 赵季红, 李佳琪, 张艳鹏, 边江, 石亚娟
Current assignee: Xi'an Jiaotong University
Original assignee: Xi'an Jiaotong University
Application filed by: Xi'an Jiaotong University

Classifications

    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06N3/045 Combinations of networks
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q30/0206 Price or cost determination based on market factors
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Abstract

A prediction method that improves LSTM on the basis of a fuzzy clustering algorithm. Fuzzy clustering is performed on the standardized stock data set, which is then converted into time series. This produces a time series set S and a fuzzy-partition time series set T, from which a fusion model is built, and the fusion model outputs predicted values. After each predicted value is obtained, the overall error of the fusion model is computed and passed manually to the output layer of each LSTM network. Each LSTM network then automatically propagates its partial error back to the input layer and updates all weights in its cell units, which yields a trained fusion model. The test set in S is fed to the trained fusion model, which outputs the final predicted values. By fitting the uncertainty with which each real-world sample belongs to the different clusters, the invention can effectively reduce the error between the LSTM network's predictions and the true values, so that predictions come closer to actual conditions and accuracy improves.

Description

Technical field
The present invention relates to stock price prediction. In particular, it relates to a prediction method that improves LSTM based on a fuzzy clustering algorithm.
Background art
Stock price prediction forecasts future rises and falls in stock prices. It draws on the development of the stock market, including the historical trading records of share prices and market information related to the stock. The stock market matters greatly in business and finance, so stock price prediction has long attracted wide attention.
Stocks are inherently volatile and uncertain, so stock price movement is a highly complex nonlinear system. Stock adjustments do not follow a uniform timeline but evolve according to their own dynamics. Many researchers therefore focus on predicting stock prices in order to select stocks better and obtain maximum returns. Common stock prediction methods today include conventional machine learning methods and financial time series analysis, such as support vector machines, decision trees, regression analysis and ARMA models. These algorithms are highly interpretable. However, they require many manually set parameters, and overfitting and underfitting remain persistent problems. Their structure is also simple, so one model can usually be built for only a single stock. Such a model cannot analyze correlations among multiple stocks, nor the influence of news-driven factors in the market.

Deep learning methods have risen in recent years. They are less interpretable than conventional machine learning, but their structure imitates the neurons of the human brain, which lets them handle a range of complex analysis tasks well. These tasks include joint analysis of multiple stocks and fused decision-making over text and stock information. Predicting stock trends with deep learning models has two major advantages. First, the manually set parameters of the model usually do not directly affect prediction performance, which reduces the influence of human factors on the results. Second, deep learning generally generalizes better than traditional machine learning, as several competitions have confirmed. Even so, applied research on deep learning for financial data remains scarce. Further study is needed on how to choose an effective strategy that analyzes the large volume of stock data in the market, obtains a prediction for every stock, and guarantees accuracy while controlling risk.
As the stock market has developed, its industry categories have become clearer and its related indices more complete. On this basis, clustering algorithms can support a more accurate analysis of industry trends and future development priorities in the stock market. Clustering is an effective method for grouping data sensibly, and it can uncover the structure hidden in large volumes of data.
Current clustering algorithms partition data rigidly. Hard partitioning assigns each sample strictly to one class, whereas in a real data set a sample's degree of membership in each class is ambiguous. Stock price prediction therefore needs a clustering algorithm that better matches how real data cluster.
Summary of the invention
The object of the invention is to solve the problem of price prediction in the stock market by providing a prediction method that improves LSTM based on a fuzzy clustering algorithm. The method performs fuzzy clustering on the serialized data to obtain a membership matrix. It then uses the membership matrix to fuse the outputs of the LSTM networks through a weighted sum, and finally obtains the predicted stock price. The method can effectively model the fluctuation characteristics and real scenarios of stock trends, which makes the predictions more accurate and realistic.
To achieve the above object, the invention adopts the following technical solution:
A prediction method that improves LSTM based on a fuzzy clustering algorithm, comprising the following steps:
(1) Standardize the stock data to obtain a standardized stock data set;
(2) Perform fuzzy clustering on the standardized stock data set using the FCM algorithm;
(3) Convert the fuzzily clustered stock data set to obtain a time series set S and a fuzzy-partition time series set T;
(4) Build a fusion model from S and T; the fusion model outputs predicted values;
(5) Each time a predicted value is obtained, compute the global error of the fusion model from the objective function and pass it manually to the output layer of each LSTM network. Each LSTM network then automatically propagates its partial error back to the input layer, updating all weights in its cell units. This yields a trained fusion model;
(6) Feed the test set in S into the fusion model trained in step (5) and output the final predicted values.
In a further improvement of the invention, step (1) is carried out as follows:
1) Compute the mathematical expectation μ_k and standard deviation s_k of each feature column k of the stock data set;
2) Standardize: z_i = (x_i - μ_k)/s_k, where z_i is the standardized value and x_i is the raw value.
In a further improvement of the invention, step (2) is carried out as follows:
1) Initialize the membership matrix U with random values between 0 and 1 so that it satisfies the constraint in formula (2.1):

$$\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, 2, \dots, n \qquad (2.1)$$

where u_ij is the membership of the j-th sample in the i-th class;
2) Compute the c cluster centres c_i, i = 1, 2, ..., c, with formula (2.3):

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (2.3)$$

where c_i is the cluster centre of the i-th class and x_j is the j-th data sample;
3) Compute the objective function with formula (2.2). Stop if its value is below a given threshold ε1, or if its change from the previous value is below a given threshold ε2; otherwise go to step 4);

$$J(U, c_1, \dots, c_c) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \qquad (2.2)$$

where d_ij = ||c_i - x_j|| is the Euclidean distance between the i-th cluster centre and the j-th data sample; U is the membership matrix; c_1 is the cluster centre of the first class and c_c that of the c-th class; and m is the membership weighting exponent;
4) Compute a new membership matrix U with formula (2.4), then return to step 2);

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \qquad (2.4)$$

where d_kj is the Euclidean distance between the k-th cluster centre and the j-th data sample.
In a further improvement of the invention, step (3) is carried out as follows. The standardized stock data set is divided with a time step of 5. If the current time is t, then (x_t, x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}) is the time series starting at time t, and (x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}, x_{t+5}) is the time series starting at time t+1. Continuing in this way converts the standardized stock data set into a time series set S with a time step of 5;
The membership matrix U is divided in the same way with a time step of 5. If the current time is t, then (u_t, u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}) is the time series starting at time t, and (u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}, u_{t+5}) is the time series starting at time t+1. Continuing in this way converts U into a fuzzy-partition time series set T with a time step of 5.
In a further improvement of the invention, step (4) is carried out as follows. The training set in S is fed into each LSTM network and mapped through the layers of cell units to the output layer. The fuzzy-partition time series set T supplies the weight of each LSTM network's output, and the weighted sum of these weights and the corresponding outputs gives the fusion model.
In a further improvement of the invention, step (4) assumes that fuzzy clustering yields n clusters, that the membership of the i-th sample is (u_i1, u_i2, ..., u_in), and that the n LSTM networks output (a_i1, a_i2, ..., a_in). The output of the fusion model is then:

$$\hat{y}_i = \sum_{k=1}^{n} u_{ik} a_{ik} \qquad (3.1)$$
In a further improvement of the invention, the objective function is:

$$J = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$$

where m is the number of samples, y_i is the true value of sample i, and ŷ_i is the output value.
In a further improvement of the invention, the global error of the fusion model in step (5) is:
where y_i is the true value of sample i and ŷ_i is the output value, i = 1, 2, ..., m;
The partial errors are as follows:
where u_ik is the membership of sample x_i in class c_k, i = 1, 2, ..., m and k = 1, 2, ..., n.
Compared with the prior art, the invention has the following advantages:
The invention introduces fuzzy clustering to improve the LSTM prediction method, using the FCM algorithm to cluster the data fuzzily. "Hard clustering" assigns each sample strictly to one class. Fuzzy cluster analysis instead treats each cluster as a fuzzy set and determines cluster relationships through membership degrees. This flexible partition yields the degree of uncertainty with which a sample belongs to each class, which makes the clustering result more accurate and flexible. By drawing on the fuzzy-set concept, FCM also improves the robustness of the model's predictions. The invention uses sample memberships as the weights of the outputs of multiple LSTM network models and thereby fuses these models into a final prediction network. Because the fused model combines several models, its predictions are more credible and its predictive performance is stronger. Fuzzy clustering uses a membership function to determine the degree to which each sample belongs to each class, rather than assigning a data object rigidly to a single cluster. Improving the LSTM (Long Short-Term Memory) network with a fuzzy clustering algorithm fits the uncertainty with which each real-world sample belongs to the different clusters. This can effectively reduce the error between the LSTM network's predictions and the true values, so that predictions come closer to actual conditions and accuracy improves.
Brief description of the drawings
Fig. 1 is a flowchart of the FCM algorithm;
Fig. 2 is a structure diagram of LSTM;
Fig. 3 is a diagram of the improved LSTM model based on the fuzzy clustering algorithm;
Fig. 4 is a flowchart of the prediction method.
Detailed description of the embodiments
To make the content, effects and advantages of the invention clearer, the invention is described in detail below with reference to the drawings and embodiments.
The invention is a prediction method that improves LSTM (Long Short-Term Memory) based on fuzzy clustering. The stock data are standardized, and the standardized data are converted into time series (the time step is 5 days, so every five days form one time series). Standardization effectively speeds up model training and removes the model's over-sensitivity to the magnitude of the data. Converting the data into time series adds temporal correlation to training, which matches the fact that stock prices depend on their own history. Following the fuzzy-set concept, all sample data are clustered with the FCM algorithm. Clustering produces a by-product, the membership matrix, which plays a key role in computing the final predicted value. Traditional hard clustering lets a sample belong rigidly to only one class. Fuzzy clustering removes this drawback by giving each sample some degree of membership in every class, so the clustering result better reflects how real-world data cluster. The membership matrix obtained by fuzzy clustering lets multiple LSTM networks be fused effectively, which strengthens prediction performance. During weight updates the membership weights are kept fixed, and the overall error is passed back to the output of each LSTM in proportion to those weights. Each LSTM thus receives its own partial error and propagates it backward to update the weights in its internal cells automatically. This minimizes the objective function and reduces the global error of the model, so that predictions come closer to the true values and generalize better to unseen samples.
The detailed process of the invention is as follows:
(1) Data standardization
The data were collected over a long period, and stock prices change in magnitude over time. If the data values are not of the same magnitude, prediction accuracy suffers greatly. Moreover, LSTM is very sensitive to the scale of its input, especially with sigmoid or tanh activation functions, so standardizing the data is good practice. The transform is x* = (x - μ)/σ, where μ is the mean and σ the standard deviation of all sample data.
The steps for standardizing the stock data to obtain the standardized data set are as follows:
1) Compute the mathematical expectation μ_k and standard deviation s_k of each feature column k of the stock data set (TAIEX);
2) Standardize: z_i = (x_i - μ_k)/s_k, where z_i is the standardized value and x_i is the raw value;
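The column-wise standardization in steps 1) and 2) can be sketched in NumPy as follows. The sketch assumes the data form a 2-D array with one row per trading day and one column per feature. The use of the population standard deviation (`ddof=0`) and the guard for constant columns are assumptions, because the text specifies neither.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: z = (x - mu_k) / s_k for each feature column k."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)      # mathematical expectation of each feature column
    s = X.std(axis=0)        # population standard deviation (ddof=0), an assumption
    s[s == 0] = 1.0          # avoid division by zero on constant columns (assumption)
    return (X - mu) / s, mu, s

# Hypothetical toy data: rows are trading days, columns are features
X = np.array([[100.0, 101.0],
              [102.0, 103.0],
              [104.0, 105.0]])
Z, mu, s = standardize(X)
```

The text computes μ and σ over all sample data. A common variation is to fit them on the training portion only and reuse them for the test portion, which keeps test statistics out of training.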
(2) Fuzzy clustering of the standardized stock data set with the FCM algorithm
Clustering aims to maximize the similarity among samples in the same cluster and to minimize the similarity between different clusters. Fuzzy clustering describes each sample's class membership as uncertain, so it reflects the objective world more faithfully. Referring to Fig. 1, the FCM algorithm divides n samples x_i (i = 1, 2, ..., n) into four fuzzy groups and finds the cluster centre of each group so that the objective function of the dissimilarity index is minimized. The constraint is that the memberships of a sample over all classes sum to 1:

$$\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, 2, \dots, n \qquad (2.1)$$

where u_ij, a value between 0 and 1, is the membership of the j-th sample in the i-th class.
Objective function J (U, the c of FCM algorithm1,...,cc) are as follows:
In formula, dij=| | ci-xi| | the Euclidean distance between ith cluster center and j-th of data sample, and It is a Weighted Index;U is subordinated-degree matrix, c1For the cluster centre of first classification, ccIn cluster for c-th of classification The heart,mFor a degree of membership factor, generally 2;
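Formula (2.2) sums, over all samples and classes, the squared distance from each sample to each class centre, weighted by that sample's membership raised to the power m.
The cluster centres are computed as:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (2.3)$$

where c_i is the cluster centre of the i-th class and x_j is the j-th data sample;
The memberships are updated as:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \qquad (2.4)$$

where d_kj is the Euclidean distance between the k-th cluster centre and the j-th data sample.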
The steps for fuzzy clustering of the standardized stock data set with the FCM algorithm are as follows:
1) Initialize the membership matrix U with random values between 0 and 1 so that it satisfies the constraint in formula (2.1);
2) Compute the c cluster centres c_i, i = 1, 2, ..., c, with formula (2.3);
3) Compute the objective function with formula (2.2). Stop if its value is below a given threshold ε1 (generally 0.01), or if its change from the previous value is below a given threshold ε2 (generally 0.001). Otherwise, that is, if the objective function is not below ε1 and its change is not below ε2, go to step 4);
4) Compute a new membership matrix U with formula (2.4) and return to step 2).
These steps yield the c cluster centre vectors and a fuzzy partition matrix U. U gives the degree to which each sample point belongs to each class and is also called the membership matrix.
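A NumPy sketch of the FCM loop in steps 1) to 4) follows. It uses the standard FCM rules for the centres (2.3), the objective (2.2) and the memberships (2.4). The defaults `m = 2`, `eps1 = 0.01` and `eps2 = 0.001` follow the typical values given in the text. The function name `fcm`, the random seed, `max_iter` and the small floor on distances are illustrative assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, eps1=0.01, eps2=0.001, max_iter=300, seed=0):
    """Fuzzy C-means following steps 1)-4); returns (U, centers, J).

    X: (n, d) standardized samples.
    U: (c, n) with U[i, j] = membership of sample j in class i.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)           # step 1): each column sums to 1, as in (2.1)
    J_prev = np.inf
    for _ in range(max_iter):
        Um = U ** m
        centers = Um @ X / Um.sum(axis=1, keepdims=True)                  # step 2): (2.3)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)   # d[i, j] = ||c_i - x_j||
        d = np.fmax(d, 1e-12)                   # avoid 0/0 when a sample sits on a centre (assumption)
        J = np.sum(Um * d ** 2)                 # step 3): objective (2.2)
        if J < eps1 or abs(J_prev - J) < eps2:
            break
        J_prev = J
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))      # (d_ij / d_kj)^(2/(m-1))
        U = 1.0 / ratio.sum(axis=1)             # step 4): (2.4), then back to step 2)
    return U, centers, J
```

Column `U[:, j]` holds the memberships of sample j, so `U.T` gives one membership row per trading day. That is the layout needed when the membership matrix is later cut into windows.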
(3) Conversion of the standardized stock data set into time series, giving the time series set S and the fuzzy-partition time series set T
The standardized stock data set is divided with a timestep (time step) of 5. If the current time is t, then (x_t, x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}) is the time series starting at time t, where x_t is one sample record in the standardized data set. The time series starting at time t+1 is (x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}, x_{t+5}), and so on. This converts the standardized stock data set into a time series set S with a time step of 5, and each sequence is one input to the LSTM;
The membership matrix U is divided in the same way with a time step of 5. If the current time is t, then (u_t, u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}) is the time series starting at time t, where u_t is one sample record in U. The time series starting at time t+1 is (u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}, u_{t+5}), and so on. This converts U into a fuzzy-partition time series set T with a time step of 5.
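Both conversions are a sliding window of length 5 with stride 1. The sketch below applies the same hypothetical helper `sliding_windows` to the standardized data and to the transposed membership matrix. The toy arrays are made up for illustration.

```python
import numpy as np

def sliding_windows(A, timestep=5):
    """Stack overlapping windows (a_t, ..., a_{t+timestep-1}) for t = 0, 1, ...

    A: (N, f) array with one record per day; returns (N - timestep + 1, timestep, f).
    """
    A = np.asarray(A)
    return np.stack([A[t:t + timestep] for t in range(len(A) - timestep + 1)])

# Hypothetical example: 8 standardized days with 3 features, memberships for c = 2 clusters
Z = np.arange(24, dtype=float).reshape(8, 3)
U = np.full((2, 8), 0.5)        # FCM memberships laid out as (c, n)
S = sliding_windows(Z)          # time series set S
T = sliding_windows(U.T)        # fuzzy-partition time series set T, one membership row per day
```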
The time series set S is then divided into a training set and a test set by k-fold cross-validation, which splits S into k disjoint subsets. If S contains p time series, each subset contains p/k sequences, giving the disjoint subsets {S_1, S_2, ..., S_k}, where k is a natural number. In each round, the union of k-1 subsets {S_1, ..., S_{j-1}, S_{j+1}, ..., S_k} is used for training and the remaining subset S_j is held out for testing, so training and testing are carried out k times.
Specifically, the k-fold split into training and test sets is produced by calling sklearn.model_selection.KFold.
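The split can be reproduced without scikit-learn, as in the hypothetical helper `kfold_indices` below. It should produce the same contiguous folds as `sklearn.model_selection.KFold(n_splits=k)` with its default `shuffle=False`, in which the first `p % k` folds receive one extra sequence. It yields index arrays into S.

```python
import numpy as np

def kfold_indices(p, k):
    """Yield (train_idx, test_idx) for k contiguous, disjoint folds over p sequences."""
    sizes = np.full(k, p // k)
    sizes[: p % k] += 1                          # first p % k folds get one extra sequence
    bounds = np.concatenate([[0], np.cumsum(sizes)])
    idx = np.arange(p)
    for j in range(k):
        test = idx[bounds[j]:bounds[j + 1]]                              # held-out subset S_j
        train = np.concatenate([idx[:bounds[j]], idx[bounds[j + 1]:]])  # union of the other k - 1 subsets
        yield train, test
```

The folds are contiguous blocks of time, so every fold except the last trains on sequences that come after its test block. The text does not address this. If look-ahead is a concern, `sklearn.model_selection.TimeSeriesSplit` is the scikit-learn alternative that always trains on earlier data.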
(4) Building the fusion model from S and T; the fusion model outputs predicted values
Referring to Fig. 2, LSTM (Long Short-Term Memory) is a recurrent neural network over time. It is suited to processing and predicting important events separated by relatively long intervals and delays in a time series. LSTM differs from an RNN in that it adds a "processor" that judges whether information is useful, and the structure in which this processor acts is called a cell.
Referring to Fig. 3, the internal structure of LSTM mitigates the vanishing- and exploding-gradient problems of standard RNNs, so LSTM learns faster. The internal structure of LSTM is as follows:
1) Forget gate: the first step decides what information is discarded from the cell state when data enter the LSTM. The forget gate layer makes this decision by forgetting part of the information in the cell state:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where W_f and b_f are parameters learned during training, h_{t-1} is the output of the previous cell, x_t is the input of the current cell, σ is the sigmoid function, and f_t is the output of the forget gate;
2) Input gate: this layer has two parts. The first part generates the new information to be stored and consists of two small neural network layers:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

where W_i, b_i, W_C and b_C are parameters learned during training, h_{t-1} is the output of the previous cell, x_t is the input of the current cell, σ is the sigmoid function, tanh is the hyperbolic tangent, and i_t and C̃_t are the outputs of the two small networks in the input gate;
The second part updates the cell state:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

where f_t is the output of the forget gate, C_{t-1} is the cell state of the previous cell, i_t is the output of the sigmoid layer in the input gate, and C̃_t is the output of the tanh layer in the input gate;
3) Output gate: this layer decides what to output based on the cell state. First, the cell runs a sigmoid layer that decides which parts of the cell state are output:

$$O_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

where W_o and b_o are parameters learned during training, h_{t-1} is the output of the previous cell, x_t is the input of the current cell, and O_t is the output of the sigmoid layer of the output gate.
Next, the cell state is passed through tanh, which maps it to values between -1 and 1, and multiplied by the sigmoid output. The cell therefore outputs only the parts it has decided to output:

$$h_t = O_t * \tanh(C_t)$$

where C_t is the cell state, O_t is the output of the sigmoid layer of the output gate, and h_t is the output of the output gate.
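The gate equations map directly onto a single cell step. The NumPy forward pass below is illustrative only. The weight layout (one matrix per gate acting on the concatenation [h_{t-1}, x_t]), the dictionary keys, the hidden size and the random initialization are assumptions. The text also does not say how h_t is mapped to a network's scalar output a_ik, so the sketch stops at h_t; a dense output layer would be a typical choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step. W[g]: (hidden, hidden + input) acting on [h_{t-1}, x_t]; b[g]: (hidden,)."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ hx + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ hx + b["i"])          # input gate, sigmoid layer
    C_tilde = np.tanh(W["C"] @ hx + b["C"])      # input gate, tanh layer (candidate state)
    C_t = f_t * C_prev + i_t * C_tilde           # cell-state update
    o_t = sigmoid(W["o"] @ hx + b["o"])          # output gate
    h_t = o_t * np.tanh(C_t)                     # cell output
    return h_t, C_t

def lstm_forward(window, W, b, hidden):
    """Run one window of shape (timestep, input) through the cell and return the last h_t."""
    h, C = np.zeros(hidden), np.zeros(hidden)
    for x_t in window:
        h, C = lstm_step(x_t, h, C, W, b)
    return h

# Hypothetical sizes: 3 input features, 4 hidden units, random weights
rng = np.random.default_rng(0)
hidden, n_in = 4, 3
W = {g: rng.normal(0.0, 0.5, (hidden, hidden + n_in)) for g in "fiCo"}
b = {g: np.zeros(hidden) for g in "fiCo"}
h_last = lstm_forward(rng.normal(size=(5, n_in)), W, b, hidden)
```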
The invention creates multiple LSTM networks, as many as there are fuzzy clusters. Referring to Fig. 4, the training set in S is fed into each LSTM network and mapped through the layers of cell units to the output layer. The sample memberships, that is, the fuzzy-partition time series set T, serve as the weights of each LSTM network's output. These weights are combined with the corresponding outputs in a weighted sum to obtain the fusion model, which outputs the final model prediction. Specifically, assume that fuzzy clustering yields n clusters, that the membership of the i-th sample is (u_i1, u_i2, ..., u_in), and that the n LSTM networks output (a_i1, a_i2, ..., a_in). The output of the fusion model is then:

$$\hat{y}_i = \sum_{k=1}^{n} u_{ik} a_{ik} \qquad (3.1)$$
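The fused prediction is a membership-weighted sum of the n LSTM outputs for each sample. The sketch below assumes the memberships and LSTM outputs are already aligned as (m, n) arrays. The text does not state how a window of five membership rows in T is reduced to a single row per sample, so that step is left out. The helper name `fuse` is hypothetical.

```python
import numpy as np

def fuse(u, a):
    """Fused output: y_hat_i = sum_k u_ik * a_ik for every sample i.

    u: (m, n) memberships of m samples in n clusters.
    a: (m, n) outputs of the n LSTM networks for the same samples.
    """
    u = np.asarray(u, dtype=float)
    a = np.asarray(a, dtype=float)
    return (u * a).sum(axis=1)

# Hypothetical memberships (each row sums to 1) and LSTM outputs
u = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.5, 0.5]])
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 2.0, 6.0]])
y_hat = fuse(u, a)   # 0.7*1 + 0.2*2 + 0.1*3 = 1.4 and 0.5*2 + 0.5*6 = 4.0
```

Each row of u sums to 1, so the fused prediction is a convex combination of the LSTM outputs and always lies between the smallest and largest of them.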
(5) Objective function
The problem is a regression problem, so the conventional mean squared error serves as the objective function of the fusion model:

$$J = \frac{1}{m} \sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2$$

where m is the number of samples, y_i is the true value of sample i, and ŷ_i is the output value.
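A worked example of the mean squared error objective, written as a small NumPy function (the helper name `mse` is hypothetical):

```python
import numpy as np

def mse(y, y_hat):
    """Objective: J = (1/m) * sum_i (y_i - y_hat_i)^2 over m samples."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

# Residuals 1, -2 and 0 give J = (1 + 4 + 0) / 3 = 5/3
J = mse([3.0, 1.0, 2.0], [2.0, 3.0, 2.0])
```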
(6) Weight update
Referring to Fig. 4, the LSTM networks back-propagate errors automatically. Each time a predicted value is obtained, it is therefore only necessary to compute the global error of the fusion model from the objective function of step (5) and pass it manually to the output layer of each LSTM network. Each LSTM network then automatically propagates its partial error back to the input layer and updates all weights in its cell units, which yields a trained fusion model. Specifically, formula (3.1) shows that the fusion prediction is the membership-weighted sum of the LSTM outputs. The global error is therefore passed back to each LSTM network in proportion to the weight of that network's output.
The global error of the fusion model is:
where y_i is the true value of sample i and ŷ_i is the output value; formula (3.2) gives the global error of the fusion model.
The partial errors are as follows:
where u_ik is the membership of sample x_i in class c_k; formula (3.3) gives the partial errors obtained by passing the global error backward in proportion to the memberships.
The partial errors in formula (3.3) are propagated back through each LSTM network to its input layer. Each LSTM network then automatically updates the weights in its cell units according to the partial error it receives.
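Formulas (3.2) and (3.3) are not reproduced in this copy of the text, so the sketch below rests on an assumption. It takes the global error to be the mean squared error of step (5), and the partial error for network k to be the derivative of that error with respect to a_ik, with the memberships u held fixed. Under the weighted-sum fusion, this derivative is -(2/m)(y_i - ŷ_i)u_ik. In other words, each sample's residual is split across the networks in proportion to its memberships, which matches the description above. The patent's exact scaling may differ. An autodiff framework would compute the same quantity if u is treated as a constant. The helper name `partial_errors` is hypothetical.

```python
import numpy as np

def partial_errors(y, u, a):
    """Assumed partial errors: g[i, k] = dJ/da_ik for J = mean((y - sum_k u_ik a_ik)^2), u fixed."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    a = np.asarray(a, dtype=float)
    m = y.shape[0]
    residual = y - np.sum(u * a, axis=1)        # y_i - y_hat_i for each sample
    return -(2.0 / m) * residual[:, None] * u   # residual shared out by membership weights
```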
(7) Prediction
The test set in the time series set S is fed into the fusion model trained in step (6), and the output of the fusion model is the final predicted value.
The invention provides an LSTM prediction method improved by fuzzy clustering that makes predictions more credible and more accurate. The fuzzy clustering algorithm determines cluster relationships through membership degrees and yields the degree of uncertainty with which each sample belongs to each class. This makes the clustering result more accurate and flexible, and the fuzzy-set concept behind FCM makes the prediction method more robust. The invention also uses sample memberships as the weights of the outputs of multiple LSTM network models and thereby fuses these models into a final prediction network. Because the fused model combines several models, its predictions are more credible, fit the fluctuations of real stock scenarios, and show stronger predictive performance.

Claims (8)

1. A prediction method for improving LSTM based on a fuzzy clustering algorithm, comprising the following steps:
(1) standardizing the stock data to obtain a standardized stock data set;
(2) performing fuzzy clustering on the standardized stock data set using the FCM algorithm;
(3) transforming the fuzzy-clustered stock data set to obtain a time series collection S and a fuzzy-partition time series collection T;
(4) obtaining a fusion model from time series collection S and fuzzy-partition time series collection T, the fusion model outputting predicted values;
(5) after each predicted value is obtained, computing the global error of the fusion model according to the objective function and passing it manually to the output layer of each LSTM network, whereupon each LSTM network automatically back-propagates its partial error to the input layer, thereby updating all weights in its cell units and yielding the trained fusion model;
(6) feeding the test set of time series collection S into the fusion model trained in step (5) and outputting the final predicted value.
2. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that step (1) proceeds as follows:
1) Compute the mathematical expectation $\mu_k$ and the standard deviation $s_k$ of each feature column of the stock data set;
2) Standardize: $z_i = (x_i - \mu_k)/s_k$, where $z_i$ is the standardized variable value and $x_i$ is the raw variable value.
3. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that step (2) proceeds as follows:
1) Initialize the membership matrix $U$ with random numbers between 0 and 1 so that it satisfies the constraint in formula (2.1);
where $u_{ij}$ is the degree of membership of the $j$-th sample in the $i$-th class;
2) Compute the $c$ cluster centers $c_i$, $i = 1, 2, \ldots, c$, with formula (2.3);
where $c_i$ is the cluster center of the $i$-th class and $x_j$ is the actual value of the $j$-th sample;
3) Compute the objective function with formula (2.2); if the objective value is below a preset threshold $\varepsilon_1$, or its change relative to the previous objective value is below a preset threshold $\varepsilon_2$, stop; otherwise go to step 4);
where $d_{ij} = \|c_i - x_j\|$ is the Euclidean distance between the $i$-th cluster center and the $j$-th data sample, $U$ is the membership matrix, $c_1$ is the cluster center of the first class, $c_c$ is the cluster center of the $c$-th class, and $m$ is the membership weighting exponent (fuzzifier);
4) Compute a new membership matrix $U$ with formula (2.4) and return to step 2);
where $d_{kj}$ is the Euclidean distance between the $k$-th cluster center and the $j$-th data sample.
4. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that step (3) proceeds as follows: the standardized stock data set is divided into windows with a time step of 5. If the current moment is $t$, then $(x_t, x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4})$ is the time series starting at moment $t$; the time series starting at moment $t+1$ is $(x_{t+1}, x_{t+2}, x_{t+3}, x_{t+4}, x_{t+5})$; and so on, so that the standardized stock data set is processed into a time series collection S with a window length of 5;
the membership matrix $U$ is divided in the same way with a time step of 5. If the current moment is $t$, then $(u_t, u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4})$ is the time series starting at moment $t$; the time series starting at moment $t+1$ is $(u_{t+1}, u_{t+2}, u_{t+3}, u_{t+4}, u_{t+5})$; and so on, so that the membership matrix $U$ is converted into a fuzzy-partition time series collection T with a window length of 5.
5. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that step (4) proceeds as follows: the training set of time series collection S is input to each LSTM network and mapped through the multiple layers of cell units to the output layer; the fuzzy-partition time series collection T supplies the weights for the output of each LSTM network, and the fusion model is obtained as the weighted sum of each output value and its corresponding weight.
6. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 5, characterized in that in step (4), assuming fuzzy clustering yields $n$ clusters, the degrees of membership of the $i$-th sample are $(u_{i1}, u_{i2}, \ldots, u_{in})$, and the outputs of the $n$ LSTM networks are $(a_{i1}, a_{i2}, \ldots, a_{in})$, the output of the fusion model is:
7. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 1, characterized in that the objective function is:
where $m$ is the number of samples, $y_i$ is the true value of the sample, and $\hat{y}_i$ is the output value.
8. The prediction method for improving LSTM based on a fuzzy clustering algorithm according to claim 6, characterized in that in step (5), the global error of the fusion model is:
where $y_i$ is the true value of the sample and $\hat{y}_i$ is the output value, $i = 1, 2, \ldots, m$;
the partial error is:
where $u_{ik}$ is the degree of membership of sample $x_i$ in class $c_k$, $i = 1, 2, \ldots, m$ and $k = 1, 2, \ldots, n$.
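Claims 2 and 4 describe the preprocessing in prose only. A minimal sketch of both steps follows, assuming each row of the stock data set is one time step holding one value per feature column, and that windows advance by one step (as the $x_t$, $x_{t+1}$ examples in claim 4 indicate). Since claim 3 defines $U[i][j]$ as the membership of sample $j$ in class $i$, the sketch forms T by windowing the columns of $U$, i.e. the per-sample membership vectors; that reading of $u_t$ is an assumption. The function names `standardize` and `sliding_windows` are hypothetical.

```python
import statistics
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score every feature column k: z = (x - mu_k) / s_k."""
    columns = list(zip(*rows))
    mus = [statistics.fmean(col) for col in columns]
    # ASSUMPTION: population standard deviation; the claim does not specify which.
    sds = [statistics.pstdev(col) for col in columns]
    if any(s == 0.0 for s in sds):
        raise ValueError("a constant column cannot be standardized")
    return [[(x - mu) / s for x, mu, s in zip(row, mus, sds)] for row in rows]


def sliding_windows(seq: Sequence[Item], step: int = 5) -> List[List[Item]]:
    """All windows (x_t, ..., x_{t+step-1}), with the start t advancing by one."""
    return [list(seq[t:t + step]) for t in range(len(seq) - step + 1)]


# Usage with a hypothetical 7-day, 2-feature data set.
data = [[10.0 + d, 200.0 + 20.0 * d] for d in range(7)]
S = sliding_windows(standardize(data))   # 3 windows of 5 standardized rows each
U = [[0.9] * 7, [0.1] * 7]               # hypothetical 2-class memberships U[i][j]
T = sliding_windows([list(col) for col in zip(*U)])  # windows of per-sample memberships
print(len(S), len(T))                    # prints: 3 3
```

A series of length $L$ yields $L - 4$ windows of length 5, so S and T stay aligned window for window when both are built from the same number of samples.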
CN201910527461.7A 2019-06-18 2019-06-18 A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm Pending CN110348608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910527461.7A CN110348608A (en) 2019-06-18 2019-06-18 A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm

Publications (1)

Publication Number Publication Date
CN110348608A true CN110348608A (en) 2019-10-18

Family

ID=68182172

Country Status (1)

Country Link
CN (1) CN110348608A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652444A (en) * 2020-06-05 2020-09-11 南京机电职业技术学院 K-means and LSTM-based daily passenger volume prediction method
CN111881502A (en) * 2020-07-27 2020-11-03 中铁二院工程集团有限责任公司 Bridge state discrimination method based on fuzzy clustering analysis
CN112149990A (en) * 2020-09-18 2020-12-29 南京邮电大学 Fuzzy supply and demand matching method based on prediction
CN112307410A (en) * 2020-09-18 2021-02-02 天津大学 Seawater temperature and salinity information time sequence prediction method based on shipborne CTD measurement data
CN113159109A (en) * 2021-03-04 2021-07-23 北京邮电大学 Wireless network flow prediction method based on data driving
CN113223392A (en) * 2021-05-18 2021-08-06 信阳农林学院 Hybrid integration model for PM2.5 hour concentration prediction
CN113343077A (en) * 2021-04-30 2021-09-03 南京大学 Personalized recommendation method and system integrating user interest time sequence fluctuation
CN116723136A (en) * 2023-08-09 2023-09-08 南京华飞数据技术有限公司 Network data detection method applying FCM clustering algorithm

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652444A (en) * 2020-06-05 2020-09-11 南京机电职业技术学院 K-means and LSTM-based daily passenger volume prediction method
CN111652444B (en) * 2020-06-05 2023-07-21 南京机电职业技术学院 K-means and LSTM-based daily guest volume prediction method
CN111881502A (en) * 2020-07-27 2020-11-03 中铁二院工程集团有限责任公司 Bridge state discrimination method based on fuzzy clustering analysis
CN112149990A (en) * 2020-09-18 2020-12-29 南京邮电大学 Fuzzy supply and demand matching method based on prediction
CN112307410A (en) * 2020-09-18 2021-02-02 天津大学 Seawater temperature and salinity information time sequence prediction method based on shipborne CTD measurement data
CN112149990B (en) * 2020-09-18 2022-07-26 南京邮电大学 Fuzzy supply and demand matching method based on prediction
CN113159109A (en) * 2021-03-04 2021-07-23 北京邮电大学 Wireless network flow prediction method based on data driving
CN113159109B (en) * 2021-03-04 2024-03-08 北京邮电大学 Wireless network flow prediction method based on data driving
CN113343077A (en) * 2021-04-30 2021-09-03 南京大学 Personalized recommendation method and system integrating user interest time sequence fluctuation
CN113223392A (en) * 2021-05-18 2021-08-06 信阳农林学院 Hybrid integration model for PM2.5 hour concentration prediction
CN116723136A (en) * 2023-08-09 2023-09-08 南京华飞数据技术有限公司 Network data detection method applying FCM clustering algorithm
CN116723136B (en) * 2023-08-09 2023-11-03 南京华飞数据技术有限公司 Network data detection method applying FCM clustering algorithm

Similar Documents

Publication Publication Date Title
CN110348608A (en) A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm
CN102521656B (en) Integrated transfer learning method for classification of unbalance samples
CN106886846A (en) A kind of bank outlets' excess reserve Forecasting Methodology that Recognition with Recurrent Neural Network is remembered based on shot and long term
CN106022954B (en) Multiple BP neural network load prediction method based on grey correlation degree
CN110059852A (en) A kind of stock yield prediction technique based on improvement random forests algorithm
CN110298663A (en) Based on the wide fraudulent trading detection method learnt deeply of sequence
CN109063911A (en) A kind of Load aggregation body regrouping prediction method based on gating cycle unit networks
CN112884056A (en) Optimized LSTM neural network-based sewage quality prediction method
CN110321361A (en) Test question recommendation and judgment method based on improved LSTM neural network model
CN109143408B (en) Dynamic region combined short-time rainfall forecasting method based on MLP
CN108447057A (en) SAR image change detection based on conspicuousness and depth convolutional network
CN113393057A (en) Wheat yield integrated prediction method based on deep fusion machine learning model
CN110516733A (en) A kind of Recognition of Weil Logging Lithology method based on the more twin support vector machines of classification of improvement
CN116468138A (en) Air conditioner load prediction method, system, electronic equipment and computer storage medium
CN115600729A (en) Grid load prediction method considering multiple attributes
CN109271424A (en) A kind of parameter adaptive clustering method based on density
CN115760380A (en) Enterprise credit assessment method and system integrating electricity utilization information
CN106056167A (en) Normalization possibilistic fuzzy entropy clustering method based on Gaussian kernel hybrid artificial bee colony algorithm
CN109919374A (en) Prediction of Stock Price method based on APSO-BP neural network
Akinwale Adio et al. Translated Nigeria stock market price using artificial neural network for effective prediction
CN113344589A (en) Intelligent identification method for collusion behavior of power generation enterprise based on VAEGMM model
CN112102135A (en) College poverty and poverty precise subsidy model based on LSTM neural network
CN116956160A (en) Data classification prediction method based on self-adaptive tree species algorithm
CN109143355B (en) Semi-supervised global optimization seismic facies quantitative analysis method based on SOM
CN116089801A (en) Medical data missing value repairing method based on multiple confidence degrees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191018)