CN110059852A - A stock return prediction method based on an improved random forest algorithm - Google Patents
A stock return prediction method based on an improved random forest algorithm
- Publication number
- CN110059852A CN201910180723.7A
- Authority
- CN
- China
- Prior art keywords
- data
- stock
- prediction
- oob
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Data Mining & Analysis (AREA)
- General Business, Economics & Management (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Finance (AREA)
- Game Theory and Decision Science (AREA)
- Artificial Intelligence (AREA)
- Tourism & Hospitality (AREA)
- Accounting & Taxation (AREA)
- General Engineering & Computer Science (AREA)
- Educational Administration (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Technology Law (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a stock return prediction method based on an improved random forest algorithm. When random forest (RF) is used for classification prediction of stock returns, it suffers from difficult parameter selection and limited classification performance, and the RF algorithm itself cannot identify and select the more effective features. To address this, the invention uses the particle swarm optimization (PSO) algorithm to optimize the feature selection mechanism, so that the optimal features are screened out even when a trend change is not yet obvious in its initial stage, and inputs them to the RF algorithm as attributes, giving a hybrid PSO-GRID-RF method for stock trend prediction. The invention shrinks the feature subset, removes irrelevant or redundant feature attributes, reduces the input dimension, and shortens the time needed for stock trend prediction. For settings with many attribute features it provides an efficient feature selection method, and it introduces a grid search algorithm to optimize the random forest parameters, which improves the classification performance of the random forest and the accuracy of stock trend prediction.
Description
Technical field
The invention belongs to the technical field of financial data mining. To address the difficult parameter selection and the classification performance problems of random forest in classification prediction of stock returns, it proposes a new algorithm that combines feature selection based on the particle swarm algorithm with random forest parameter optimization based on the grid search algorithm. The particle swarm algorithm performs feature selection on the training set and removes redundant indicators from the indicator system to reduce the input dimension; the grid search algorithm is then introduced to optimize the random forest parameters, thereby improving the classification performance of the random forest.
Background
For investors in the stock market, predicting stock price movements has always been a popular problem. Accurately judging and grasping the trend of the whole market not only reduces blind investment and has practical significance for making investors more rational, but also provides a reference for the state when it formulates related economic policy.
Scholars at home and abroad have studied stock price prediction in depth and proposed various prediction methods. Two kinds of methods are mainly applied today: fundamental analysis and technical analysis. The first is based on basic factors such as a company's growth and profitability. The second is mathematical analysis based on past stock data; in its simple form it predicts by observing charts of stock movement, while more sophisticated forms use complex statistical methods and machine learning algorithms.
Time series analysis was the first method applied to stock price prediction, for example by establishing an ARMA model for short-term prediction of the opening price. Because stock prices are affected by many factors, they change nonlinearly; time series analysis based on linear models cannot reflect this nonlinear trend well, so its prediction accuracy is low and its application is limited. With the rise of artificial intelligence, the BP neural network has been widely used in stock price prediction because of its powerful nonlinear mapping ability. In the BP-neural-network-based stock price prediction model SPPM, multiple neural network models are established to predict stock prices. Neural networks have achieved good results in nonlinear stock prediction, but they also suffer from unstable learning and memory, slow convergence, and a tendency to fall into local optima.
The random forest algorithm (Random Forest, RF) has been applied in finance as a classification technique; compared with the support vector machine (Support Vector Machine) and artificial neural networks (Artificial Neural Networks), RF obtains better results in stock trend prediction. Random forest is an ensemble of models and has achieved good results in different fields. Because it trains quickly and generalizes well, applying it to predicting stock rises and falls avoids the shortcomings of the prediction models above. Prediction with random forest mainly proceeds by first screening the established raw indicator system, then feeding the screened indicator data into the random forest as input variables, with the rise or fall of the stock output as the response variable. Existing methods, however, do little to optimize the random forest model itself and therefore cannot further improve prediction accuracy.
Summary of the invention
To address the shortcomings of existing techniques for classification prediction of stock returns, the invention proposes a stock return prediction method based on an improved random forest algorithm.
The stock return prediction method based on an improved random forest algorithm specifically comprises the following steps:
Step 1: data acquisition. Daily stock data are obtained from a website and divided into a training set, a validation set, and a test set.
Step 2: exponential smoothing is applied to the acquired data:
$S_0 = Y_0, \quad t = 0$ (1)
$S_t = \alpha Y_t + (1 - \alpha) S_{t-1}, \quad t > 0$ (2)
where $S_t$ is the smoothed value at time $t$ and $Y_t$ is the actual value at time $t$; $S_0$ is the smoothed value at $t = 0$ and $Y_0$ is the actual value at $t = 0$; $t$ is the day index of the acquired daily stock data; $S_{t-1}$ is the smoothed value at time $t-1$; and $\alpha$ is the exponential smoothing factor, $0 < \alpha < 1$.
Exponential smoothing removes random variation, or noise, from the historical data, which makes it easier for the model to identify long-term price trends.
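As an illustration of equations (1) and (2), the smoothing can be sketched in Python. The function name and the sample prices are hypothetical and serve only to show the recursion.

```python
def exponential_smoothing(values, alpha):
    """Return the smoothed series S for the actual series Y, per equations (1) and (2)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if not values:
        return []
    smoothed = [float(values[0])]  # S_0 = Y_0
    for y in values[1:]:
        # S_t = alpha * Y_t + (1 - alpha) * S_{t-1}
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


closing_prices = [10.0, 12.0, 11.0, 13.0]  # hypothetical closing prices
print(exponential_smoothing(closing_prices, alpha=0.5))  # [10.0, 11.0, 11.0, 12.0]
```

A larger alpha follows the raw prices more closely, while a smaller alpha gives a smoother series that reacts more slowly.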
Step 3: feature extraction. Technical indicators are computed from the exponential smoothing result: the feature matrix is calculated from the smoothed time series data, and the technical indicators that investors use to judge whether a stock trend is rising or falling serve as the features.
Step 4: feature selection with the PSO algorithm:
The necessary influence indicators must be determined as the input of the model and the necessary response variable as its output. Because building the stock indicator system is the basis for subsequent evaluation and comprehensive analysis, feature selection is performed on the technical indicators. The technical indicators are taken as the particles of the particle swarm algorithm. In the PSO algorithm the initial velocity and position of each particle are assigned randomly; the local optimal solution $P_{idbest}$ is the best position of a particle up to the current iteration, and the global optimal solution $P_{gdbest}$ is the best position of the whole swarm. Suppose the search space of the swarm has dimension $D$ and there are $m$ particles; then the position of particle $i$ in the space is $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$ and its velocity is $v_i = [v_{i1}, v_{i2}, \ldots, v_{iD}]$, $i = 1, \ldots, m$. The velocity of each particle is updated, and its spatial position is then adjusted.
In the update formulas: $V^k$ is the velocity of a given particle and $X^k$ its position at the $k$-th iteration; $P^k_{idbest}$ is the local best position of the swarm at the $k$-th iteration and $P^k_{gdbest}$ is the global best position of the swarm at the $k$-th iteration; $S(\cdot)$ is the sigmoid function, taking the velocity as its argument; adjusting the spatial position means mapping the particle velocity into [0,1] and comparing it with a random number to update the position state of the particle; $c_1, c_2$ are the learning factors, both positive; $w$ is the inertia weight; $rand_1, rand_2 \in [0,1]$ are uniformly distributed random numbers;
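The update formulas themselves are not reproduced in this text. The following is a minimal sketch of one particle update, assuming the standard binary PSO form that matches the symbols described above: the velocity combines the inertia term with attraction towards the local and global best positions, and each position bit is set to 1 when a uniform random number falls below the sigmoid of the new velocity. The function names and the default values of w, c1, and c2 are assumptions.

```python
import math
import random


def sigmoid(v):
    """S(v) = 1 / (1 + e^(-v)), mapping a velocity into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def update_particle(position, velocity, local_best, global_best,
                    w=0.7, c1=2.0, c2=2.0, rng=random):
    """One assumed binary-PSO update of a single particle (lists of length D)."""
    new_position, new_velocity = [], []
    for x, v, p_best, g_best in zip(position, velocity, local_best, global_best):
        rand1, rand2 = rng.random(), rng.random()
        # Inertia term plus attraction towards the local and global bests.
        v_next = w * v + c1 * rand1 * (p_best - x) + c2 * rand2 * (g_best - x)
        # Map the velocity into (0, 1) and compare it with a random number.
        x_next = 1 if rng.random() < sigmoid(v_next) else 0
        new_velocity.append(v_next)
        new_position.append(x_next)
    return new_position, new_velocity


rng = random.Random(0)
pos, vel = update_particle([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0],
                           [1, 1, 0, 0], [1, 1, 1, 0], rng=rng)
print(pos, vel)
```

Each bit of the position corresponds to one technical indicator, so the updated position is directly a candidate feature subset.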
Step 5: set the decision condition:
If the number of iterations exceeds the maximum number of iterations, or the fitness is lower than the set value, the loop is exited.
Step 6: feature selection:
The binary code obtained by the particle swarm feature selection in Step 4 determines the input features used for trend prediction, where 1 means a feature is selected and 0 means it is not selected;
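A short sketch of how such a binary code can be applied to a feature matrix follows. The indicator names and values are hypothetical placeholders, since the text does not list the indicators.

```python
def select_features(rows, feature_names, mask):
    """Keep only the columns whose bit in the binary code is 1."""
    if len(mask) != len(feature_names):
        raise ValueError("mask needs one bit per feature")
    kept = [i for i, bit in enumerate(mask) if bit == 1]
    names = [feature_names[i] for i in kept]
    matrix = [[row[i] for i in kept] for row in rows]
    return names, matrix


# Hypothetical indicator names and values, for illustration only.
names = ["ind_a", "ind_b", "ind_c", "ind_d"]
rows = [[0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8]]
selected_names, data_matrix = select_features(rows, names, mask=[1, 0, 0, 1])
print(selected_names)  # ['ind_a', 'ind_d']
print(data_matrix)     # [[0.1, 0.4], [0.5, 0.8]]
```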
Step 7: output the optimal features:
If the condition set in Step 5 is met, output the optimal features; otherwise return to Step 4;
Step 8: construct the data matrix:
The data matrix is constructed from the optimal features selected in Step 7;
Step 9: cross-validation on the training and validation sets:
The training and validation sets are used for parameter tuning by cross-validation, with 90% of the data used to train the model and 10% used to validate it. The grid search algorithm is used to optimize the random forest parameters, including the depth of the trees, the random state, the number of variables considered at each tree node, and the number of trees, together with the OOB misclassification rate and variable importance estimation, so as to improve prediction accuracy and obtain a prediction model that fits the data well.
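One way to realize a 90%/10% split inside cross-validation is 10-fold cross-validation, in which each fold holds out one tenth of the samples for validation. The text does not state the number of folds, so the choice of 10 and the helper name below are assumptions.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_idx, valid_idx); each fold validates on about 1/n_folds of the data."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


folds = list(k_fold_indices(100, n_folds=10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 10 90 10
```

The folds here are contiguous blocks in time order; whether the described method shuffles the samples is not stated.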
Step 10: establishment of the stock trading signal, i.e. the data labels:
The data matrix constructed in Step 8 is input to the random forest algorithm as training data, and the trading signal $Y_j = \{y_1, y_2, \ldots, y_j\}$ is constructed, where $j = 1, 2, \ldots, n$ and $n$ is the number of samples. The trading signal is constructed as follows:
1) Calculate the daily average price $p_j$, where $C_j$ is the closing price, $H_j$ the highest price, and $L_j$ the lowest price of the stock.
2) Calculate the return $V_j$ over the following $k$ days, $k = 1, 2, \ldots, 10$;
3) Construct the trading signal $y_j$.
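The formulas for $p_j$, $V_j$, and $y_j$ are not reproduced in this text. The sketch below fills them with explicit assumptions: the daily average price is taken as the mean of the closing, highest, and lowest prices; the future return is the relative change of that price k days ahead; and the signal is +1, 0, or -1 depending on whether the return exceeds a hypothetical threshold. The three classes match the confusion matrix of Step 12, but the threshold value and the exact formulas are not from the source.

```python
def daily_average_price(close, high, low):
    """Assumed p_j: mean of the closing, highest and lowest prices."""
    return (close + high + low) / 3.0


def build_signals(closes, highs, lows, k=1, threshold=0.01):
    """Return labels in {+1, 0, -1} for every day that has a price k days ahead."""
    prices = [daily_average_price(c, h, l) for c, h, l in zip(closes, highs, lows)]
    signals = []
    for j in range(len(prices) - k):
        future_return = (prices[j + k] - prices[j]) / prices[j]  # assumed V_j
        if future_return > threshold:
            signals.append(1)    # rise
        elif future_return < -threshold:
            signals.append(-1)   # fall
        else:
            signals.append(0)    # flat
    return signals


closes = [10.0, 11.0, 11.0, 10.0]
highs = [10.0, 11.0, 11.0, 10.0]
lows = [10.0, 11.0, 11.0, 10.0]
print(build_signals(closes, highs, lows, k=1, threshold=0.01))  # [1, 0, -1]
```

The last k days receive no label because their future price is not yet known.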
Step 11: in-sample training:
The data matrix constructed from the optimal features is input to the random forest model for training, and the grid search algorithm is used to optimize the parameters on the new optimal-feature data set. The predictions are compared with the actual stock trend to obtain the predicted trend and the prediction accuracy.
Step 12: model evaluation:
In the classification process of the random forest algorithm, the classification prediction results can be represented by a confusion matrix, as shown in Table 1:
Table 1 Confusion matrix
| | Predicted +1 | Predicted 0 | Predicted -1 |
| True +1 | TP | FZ1 | FN1 |
| True 0 | FP1 | TZ | FN2 |
| True -1 | FP2 | FZ2 | TN |
where TP is the correctly classified +1 class, TZ is the correctly classified 0 class, and TN is the correctly classified negative (-1) class; FP1 is class 0 misclassified as +1 and FP2 is class -1 misclassified as +1; FZ1 is class +1 misclassified as 0 and FZ2 is class -1 misclassified as 0; FN1 is class +1 misclassified as -1 and FN2 is class 0 misclassified as -1. FP is FP1+FP2, FN is FN1+FN2, FZ is FZ1+FZ2, and $N = N_{TP} + N_{FN} + N_{FP} + N_{TN} + N_{TZ}$ denotes the total number of samples.
Accuracy is the probability that predictions on the test set are correct; Recall is the probability that samples that are actually positive are predicted correctly; Precision is the probability that samples predicted as positive are actually positive. The formulas for Recall and Precision are:
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (10)
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (11)
F is a composite performance index formed as a weighted average of recall (sensitivity) and precision; the closer F is to 1, the better the classification result.
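The evaluation indices can be sketched from the cells of Table 1. Recall and Precision follow equations (10) and (11) as printed. The Accuracy and F formulas are not reproduced in this text, so the sketch assumes Accuracy is the share of correctly classified samples over all nine cells and F is the harmonic mean of Precision and Recall (the common F1 form); both are assumptions, as are the sample counts.

```python
def evaluate(cm):
    """Compute the indices from a dict keyed by the cell names of Table 1."""
    fp = cm["FP1"] + cm["FP2"]
    fn = cm["FN1"] + cm["FN2"]
    total = sum(cm.values())                                  # assumption: all nine cells
    accuracy = (cm["TP"] + cm["TZ"] + cm["TN"]) / total       # assumed form
    recall = cm["TP"] / (cm["TP"] + fn)                       # equation (10)
    precision = cm["TP"] / (cm["TP"] + fp)                    # equation (11)
    f_value = 2 * precision * recall / (precision + recall)   # assumed F1 form
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f": f_value}


# Hypothetical counts, for illustration only.
cells = {"TP": 40, "FZ1": 3, "FN1": 5,
         "FP1": 5, "TZ": 20, "FN2": 5,
         "FP2": 5, "FZ2": 2, "TN": 15}
print({name: round(value, 3) for name, value in evaluate(cells).items()})
```

The sketch does not guard against empty classes; a zero denominator would need separate handling.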
The parameters above are obtained from the confusion matrix. In addition, during random forest generation the training sets are produced by the bootstrap method. Because this is repeated sampling with replacement, only about 63% of the original data are drawn, and the remaining data do not appear in the sample. These remaining data are the out-of-bag (OOB) data, and using them to estimate the generalization ability of the random forest algorithm is called OOB estimation. For each tree, the accuracy measured on its OOB data is $OOB_{score}$ and the measured error is the out-of-bag error $OOB_{error}$; averaging $OOB_{error}$ over all trees gives $OOB'_{error}$ of the random forest, and the smaller $OOB'_{error}$ is, the stronger the generalization ability of the RF. The fitness value Fitness is composed of F and $OOB'_{error}$; the smaller it is, the better. The formulas are as follows:
$OOB_{error} = 1 - OOB_{score}$ (13)
$\mathrm{Fitness} = OOB'_{error} + (1 - F)$ (14)
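The figure of about 63% and the fitness of equations (13) and (14) can be illustrated with a short sketch. With n draws with replacement from n samples, the expected share of distinct samples drawn approaches 1 - 1/e, roughly 0.632. The function names and the sample scores are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def fitness(oob_scores, f_value):
    """Mean per-tree OOB error plus (1 - F); smaller is better."""
    oob_errors = [1.0 - score for score in oob_scores]   # equation (13), per tree
    mean_oob_error = sum(oob_errors) / len(oob_errors)   # OOB'_error
    return mean_oob_error + (1.0 - f_value)              # equation (14)


rng = random.Random(42)
in_bag, oob = bootstrap_split(10_000, rng)
print(len(set(in_bag)) / 10_000)                 # close to 0.632
print(round(fitness([0.9, 0.8], f_value=0.75), 3))
```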
Step 13: out-of-sample testing:
After the optimal parameters are determined, the trained random forest model is tested with the test data to obtain the classification results: all preprocessed sample features of the test set are used as the model input, the predicted value for day T+k is obtained for each sample to give the classification result, and the result is compared with the actual stock trend to obtain the predicted trend and the prediction accuracy.
The beneficial effects of the invention are as follows:
(1) The invention proposes an efficient feature selection method: the PSO algorithm searches globally for the best features, which are input to the RF algorithm as input variables. This shrinks the feature subset, eliminates irrelevant or redundant feature attributes, shortens stock prediction time, and substantially improves the accuracy of stock trend prediction.
(2) The invention applies cross-validation to the training set, takes the temporal dependence of stock prices into account, and thereby improves the accuracy of the random forest classification model.
(3) The stock return is a stationary series. Using the stock return as the label reflects price trends better than using the closing price as the label, and effectively improves the accuracy of stock trend prediction.
(4) The invention uses the grid search algorithm for parameter optimization when training the random forest, which avoids the difficulty of parameter selection in random forest prediction, selects the optimal parameters, and improves the accuracy of trend prediction.
Brief description of the drawings
Fig. 1 is the framework of the stock return study based on the improved random forest;
Fig. 2 is the undirected graph of random forest classification;
Fig. 3 is a schematic diagram of the binary coding;
Fig. 4 is the flow chart of the PSO algorithm;
Fig. 5 is the flow chart of the random forest voting classification method in stock trend prediction.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and embodiments.
As shown in Figs. 1 to 5, the invention is a stock trend prediction method for studying stock returns based on the particle swarm algorithm, the grid search algorithm, and the random forest algorithm.
The invention proposes a stock return research method with efficient feature selection for settings with many attribute features. Because the attribute features are large in scale and are continuous variables, the decision trees are generated with the CART algorithm.
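The text does not give the CART splitting formula. CART classification trees conventionally choose splits by Gini impurity, so the following minimal sketch assumes that criterion and shows how a threshold on one continuous feature could be chosen. The function names and sample values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting a continuous feature at threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return the lowest-impurity one."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: split_gini(values, labels, t))


values = [0.1, 0.2, 0.8, 0.9]   # one continuous feature
labels = [-1, -1, 1, 1]         # class labels
print(gini(labels))                    # 0.5
print(best_threshold(values, labels))  # 0.5
```

The sketch requires at least two distinct feature values; a full CART implementation repeats this search over every feature at every node.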
The flow of the invention is shown in Fig. 1; the specific steps are as follows:
(1): data acquisition:
Daily stock data are obtained from a website. The stock data source used by the invention is websites such as Yahoo; the data include the opening price, closing price, trading volume, highest price, lowest price, etc. of the traded stock, are downloaded as csv files, and are divided into a training set, a validation set, and a test set.
(2): exponential smoothing is applied to the acquired data:
$S_0 = Y_0, \quad t = 0$ (3)
$S_t = \alpha Y_t + (1 - \alpha) S_{t-1}, \quad t > 0$ (4)
where $S_t$ is the smoothed value at time $t$ and $Y_t$ is the actual value at time $t$; $S_0$ is the smoothed value at $t = 0$ and $Y_0$ is the actual value at $t = 0$; $t$ is the day index of the acquired daily stock data; $S_{t-1}$ is the smoothed value at time $t-1$; and $\alpha$ is the exponential smoothing factor, $0 < \alpha < 1$.
Exponential smoothing removes random variation, or noise, from the historical data, which makes it easier for the model to identify long-term price trends.
(3): feature extraction:
Technical indicators are computed from the exponential smoothing result. The indicators should cover all aspects of market behavior; their specific values and mutual relationships directly reflect the state of the stock market and give direction to trading operations. Most of what the indicators reflect cannot be seen directly from market quotations. The feature matrix is calculated from the smoothed time series data, and the technical indicators that investors use to judge whether a stock trend is rising or falling serve as the features.
(4): feature selection with the PSO algorithm:
Fig. 4 is the flow chart of the PSO algorithm and shows the steps by which the particle swarm performs feature selection. In the PSO algorithm, a candidate solution of the optimization problem is represented as a point in d-dimensional space, called a particle. The quality of a particle's current position is assessed by an objective function, which computes the corresponding fitness from the particle's position. Each particle flies through the search space at a certain velocity, which is adjusted dynamically according to its own flying experience and that of its companions and is then used to compute the particle's new position. The optimization search is carried out iteratively in a swarm of randomly initialized particles until a termination condition is met, such as reaching a specified number of iterations.
In the PSO algorithm the initial velocity and position of each particle are assigned randomly; the local optimal solution $P_{idbest}$ is the best position of a particle up to the current iteration, and the global optimal solution $P_{gdbest}$ is the best position of the whole swarm. Suppose the search space of the swarm has dimension $D$ and there are $m$ particles; then the position of particle $i$ in the space is $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$ and its velocity is $v_i = [v_{i1}, v_{i2}, \ldots, v_{iD}]$, $i = 1, \ldots, m$. The velocity of each particle is updated, and its spatial position is then adjusted.
In the update formulas: $V^k$ is the velocity of a given particle and $X^k$ its position at the $k$-th iteration; $P^k_{idbest}$ is the local best position of the swarm at the $k$-th iteration and $P^k_{gdbest}$ is the global best position of the swarm at the $k$-th iteration; $S(\cdot)$ is the sigmoid function, taking the velocity as its argument; adjusting the spatial position means mapping the particle velocity into [0,1] and comparing it with a random number to update the position state of the particle; $c_1, c_2$ are the learning factors, both positive; $w$ is the inertia weight; $rand_1, rand_2 \in [0,1]$ are uniformly distributed random numbers;
(5): set the decision condition:
As shown in Fig. 4, the condition is judged as follows: if the number of iterations exceeds the maximum number of iterations, or the fitness is smaller than the set value, the loop is exited.
(6): feature selection:
The binary code defines whether each feature is selected as an input feature for trend prediction, as shown in Fig. 3, where 1 means selected and 0 means not selected;
(7): output the optimal features:
If the condition set in (5) is met, output the optimal features; otherwise return to (4);
(8): construct the data matrix:
The data matrix is constructed from the optimal features selected in (7);
(9): training set and cross-validation set:
The training set is used for parameter tuning by cross-validation, with 90% of the data used to train the model and 10% used to validate it. The grid search algorithm is used to optimize the random forest parameters, including the depth of the trees, the random state, the number of variables considered at each tree node, and the number of trees, together with the OOB misclassification rate, variable importance estimation, etc., so as to improve prediction accuracy and obtain a prediction model that fits the data well.
(10): establishment of the stock trading signal:
The data matrix constructed in (8) is input to the random forest algorithm as training data, and the trading signal $Y_j = \{y_1, y_2, \ldots, y_j\}$ is constructed, where $j = 1, 2, \ldots, n$ and $n$ is the number of samples. The trading signal is constructed as follows:
1) Calculate the daily average price $p_j$, where $C_j$ is the closing price, $H_j$ the highest price, and $L_j$ the lowest price of the stock.
2) Calculate the return $V_j$ over the following $k$ days, $k = 1, 2, \ldots, 10$;
3) Construct the trading signal $y_j$.
(11): classification prediction with random forest:
In everyday life, our understanding of things rests on judging and classifying their features; for example, whether an animal is viviparous can be used to judge whether it is a mammal. Random forest uses this idea; Fig. 2 shows the undirected graph of random forest classification. At each node of a tree, the node is split into next-level nodes by a certain rule according to the feature values, and the terminal leaf nodes give the final classification results. The key to random forest learning is selecting the optimal splitting attribute. As splitting proceeds layer by layer, the samples contained in a branch node of the decision tree increasingly belong to the same class; that is, each split should maximize the information gain after the node is split.
Random forest builds each decision tree by the following two-step process. The first step, called "row sampling", samples with replacement from the whole training set to obtain a bootstrap data set. The second step, called "column sampling", randomly selects m features from all M features (m less than M) and trains a decision tree on the bootstrap data set restricted to these m features. The predicted class is the class that receives the most votes from the N decision trees, as shown in the flow chart of the random forest voting classification method in stock trend prediction in Fig. 5. Building a random forest model reduces the probability of overfitting. Although each tree in the forest splits on only m features and may not classify remarkably well on its own, the combination is more stable. One can think of each decision tree as an expert versed in a narrow field (m of the M factors are chosen for each tree to learn); the random forest then contains many experts proficient in different fields, who view a new problem (a new data set) from different angles and reach the final result by voting.
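The row sampling, column sampling, and voting described above can be sketched without a full tree implementation. The helper names are hypothetical, and the per-tree predictions in the example are made up to show the vote; training of the individual trees is omitted.

```python
import random
from collections import Counter


def sample_tree_inputs(n_rows, n_features, m, rng):
    """Row sampling with replacement, then column sampling of m out of M features."""
    row_idx = [rng.randrange(n_rows) for _ in range(n_rows)]   # bootstrap rows
    col_idx = sorted(rng.sample(range(n_features), m))         # m distinct features
    return row_idx, col_idx


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(1)
rows, cols = sample_tree_inputs(n_rows=8, n_features=6, m=2, rng=rng)
print(len(rows), len(cols))             # 8 2
print(majority_vote([1, 1, 0, -1, 1]))  # 1
```

When two classes tie, this sketch returns the one that appears first among the predictions; the text does not say how ties are resolved.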
(12): principle of the grid search algorithm:
Grid search is an exhaustive search over specified parameter values: each combination of values is used to train the random forest, and its performance is assessed by cross-validation. After the fitting function has tried all parameter combinations, it returns a suitable classifier, automatically adjusted to the best parameter combination.
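The exhaustive search can be sketched as a loop over the Cartesian product of the candidate values. The parameter names, the grid, and the stand-in scoring function below are hypothetical; in the described method the score would come from training a random forest with the given parameters and evaluating it by cross-validation.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)  # e.g. mean cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and a stand-in score, for illustration only.
grid = {"n_trees": [50, 100, 200], "max_depth": [3, 5, 7]}


def toy_score(params):
    return -abs(params["n_trees"] - 100) - abs(params["max_depth"] - 5)


print(grid_search(grid, toy_score))  # ({'n_trees': 100, 'max_depth': 5}, 0)
```

The cost grows with the product of the list lengths (nine evaluations here), which is why the grid is usually kept coarse.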
(13): in-sample training:
The data matrix constructed from the optimal features is input to the random forest model for training, and the grid search algorithm is used to optimize the parameters on the new optimal-feature data set. The predictions are compared with the actual stock trend to obtain the predicted trend and the prediction accuracy.
(14): out-of-sample testing:
After the optimal parameters are determined, the trained random forest model is tested with the test data to obtain the classification results: all preprocessed sample features of the test set are used as the model input, the predicted value for day T+k is obtained for each sample to give the classification result, and the result is compared with the actual stock trend to obtain the predicted trend and the prediction accuracy.
Claims (1)
1. A stock return prediction method based on an improved random forest algorithm, characterized in that it specifically comprises the following steps:
Step 1: data acquisition: daily stock data are obtained from a website and divided into a training set, a validation set, and a test set;
Step 2: exponential smoothing is applied to the acquired data:
$S_0 = Y_0, \quad t = 0$ (1)
$S_t = \alpha Y_t + (1 - \alpha) S_{t-1}, \quad t > 0$ (2)
where $S_t$ is the smoothed value at time $t$ and $Y_t$ is the actual value at time $t$; $S_0$ is the smoothed value at $t = 0$ and $Y_0$ is the actual value at $t = 0$; $t$ is the day index of the acquired daily stock data; $S_{t-1}$ is the smoothed value at time $t-1$; and $\alpha$ is the exponential smoothing factor, $0 < \alpha < 1$;
Step 3: feature extraction:
Technical indicators are computed from the exponential smoothing result: the feature matrix is calculated from the smoothed time series data, and the technical indicators that investors use to judge whether a stock trend is rising or falling serve as the features;
Step 4: feature selection with the PSO algorithm:
The technical indicators are taken as the particles of the particle swarm algorithm. In the PSO algorithm the initial velocity and position of each particle are assigned randomly; the local optimal solution $P_{idbest}$ is the best position of a particle up to the current iteration, and the global optimal solution $P_{gdbest}$ is the best position of the whole swarm; suppose the search space of the swarm has dimension $D$ and there are $m$ particles; then the position of particle $i$ in the space is $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$ and its velocity is $v_i = [v_{i1}, v_{i2}, \ldots, v_{iD}]$, $i = 1, \ldots, m$; the velocity of each particle is updated, and its spatial position is then adjusted;
In the update formulas: $V^k$ is the velocity of a given particle and $X^k$ its position at the $k$-th iteration; $P^k_{idbest}$ is the local best position of the swarm at the $k$-th iteration and $P^k_{gdbest}$ is the global best position of the swarm at the $k$-th iteration; $S(\cdot)$ is the sigmoid function, taking the velocity as its argument; adjusting the spatial position means mapping the particle velocity into [0,1] and comparing it with a random number to update the position state of the particle; $c_1, c_2$ are the learning factors, both positive; $w$ is the inertia weight; $rand_1, rand_2 \in [0,1]$ are uniformly distributed random numbers;
Step 5: set the stopping condition:
The stopping condition is set as follows: if the number of iterations exceeds the maximum number of iterations, or the fitness falls below the set value, the loop is exited;
Step 6: feature selection:
The binary code obtained by the particle swarm feature selection of step 4 is used as the input features for trend prediction, where 1 indicates that a feature is selected and 0 indicates that it is not selected;
Step 7: output the optimal features:
If the condition set in step 5 is met, the optimal features are output; otherwise return to step 4;
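The control flow of steps 4 to 7 can be sketched as a loop with two exit conditions. The callable `step`, the defaults `max_iter` and `fitness_target`, and the reading of the two conditions as alternatives ("or") are assumptions made for illustration.

```python
def run_until_converged(step, max_iter=100, fitness_target=0.05):
    """Repeat PSO iterations until one of the stopping conditions holds.

    step(k) is a hypothetical callable that performs iteration k and returns
    (binary_mask, fitness) for the best particle found in that iteration.
    Lower fitness is better.
    """
    best_mask, best_fitness = None, float("inf")
    k = 0
    while True:
        k += 1
        mask, fitness = step(k)
        if fitness < best_fitness:
            best_mask, best_fitness = mask, fitness
        # Stop when the iteration budget is used up or the fitness is low enough.
        if k >= max_iter or best_fitness < fitness_target:
            return best_mask, best_fitness, k
```

The returned mask is the binary code of step 6: a 1 marks a selected feature, a 0 a rejected one.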
Step 8: build the data matrix:
The data matrix that is input to the random forest is built from the optimal features selected in step 7;
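Building the data matrix from the binary code amounts to keeping the columns whose bit is 1. A minimal sketch, with hypothetical function and feature names:

```python
def select_features(matrix, mask, names=None):
    """Keep only the columns of `matrix` whose mask bit is 1.

    matrix: list of rows (one row per sample, one column per indicator).
    mask:   list of 0/1 bits, one per column.
    names:  optional list of column names, filtered the same way.
    """
    keep = [d for d, bit in enumerate(mask) if bit == 1]
    data = [[row[d] for d in keep] for row in matrix]
    kept_names = [names[d] for d in keep] if names is not None else None
    return data, kept_names
```

For example, the mask `[1, 0, 1]` applied to rows of three indicators keeps the first and third columns.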
Step 9: cross-validation on the training set and validation set:
To improve the prediction accuracy of the random forest, the parameters are tuned by cross-validation on the training set and validation set, with 90% of the data used to train the model and 10% used to validate it. A grid search algorithm is used to optimize the random forest parameters, including the tree depth, the random state, the number of variables considered at each tree node, the number of trees, the OOB misclassification rate and the variable importance estimate, so as to raise the prediction accuracy and obtain a prediction model that adapts well to the data and has higher precision;
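The 90%/10% split and the grid search can be sketched with the standard library alone. The callable `evaluate`, which would train a random forest on the training indices and return its validation accuracy, is hypothetical, as are the parameter names used in any grid passed in; ten contiguous, unshuffled folds are an assumption.

```python
from itertools import product


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx); with k=10 each fold holds out about 10%."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def grid_search(param_grid, evaluate, n_samples, k=10):
    """Try every parameter combination and keep the best mean validation score.

    evaluate(params, train_idx, val_idx) is a hypothetical callable that
    returns a score where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Because stock data are ordered in time, the folds here are left contiguous rather than shuffled; whether the original method shuffles is not stated.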
Step 10: establish the stock trading signals, i.e. the data labels:
The data matrix built in step 8 is input to the random forest algorithm as training data for training, and the buy/sell signals $Y_j=\{y_1, y_2, \dots, y_j\}$ are constructed, where $j=1,2,\dots,n$ and n is the number of samples. The buy/sell signals are constructed as follows:
1) calculate the daily average price $p_j$:
$$p_j = \frac{C_j + H_j + L_j}{3}$$
where $C_j$ denotes the closing price of the stock, $H_j$ denotes the highest price and $L_j$ denotes the lowest price;
2) calculate the returns $V_j$ over the following k days, $k=1,2,\dots,10$;
3) construct the buy/sell signal $y_j$;
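The three labeling steps can be sketched as follows. The exact formulas are not legible in this text, so everything here is an assumption chosen for illustration: the average price is taken as the mean of close, high and low; the k-day return is taken relative to the current close; and the signal rule (mean return compared with a hypothetical `threshold`) is only one plausible way to obtain the three classes +1, 0 and -1.

```python
def average_price(close, high, low):
    """Daily average price, assumed to be the mean of close, high and low."""
    return [(c + h + l) / 3.0 for c, h, l in zip(close, high, low)]


def future_returns(close, avg, j, horizon=10):
    """Returns of day j for k = 1..horizon, relative to the close of day j.

    Days beyond the end of the series are skipped.
    """
    return [(avg[j + k] - close[j]) / close[j]
            for k in range(1, horizon + 1) if j + k < len(avg)]


def trading_signal(returns, threshold=0.02):
    """+1 (buy), -1 (sell) or 0 (hold); rule and threshold are hypothetical."""
    if not returns:
        return 0
    mean_return = sum(returns) / len(returns)
    if mean_return > threshold:
        return 1
    if mean_return < -threshold:
        return -1
    return 0
```

Applying `trading_signal` to every day j yields the label vector that the random forest is trained on.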
Step 11: in-sample training:
The data matrix built from the optimal features is input to the random forest model for training, the parameters are optimized on the new optimal-feature data set with the grid search algorithm, and the result is compared with the actual stock trend to obtain the predicted stock trend and the prediction accuracy;
Step 12: model evaluation:
In the classification process of the random forest algorithm, the classification results can be represented by a confusion matrix, as shown in Table 1 below:
Table 1 Confusion matrix
Actual class | Predicted +1 | Predicted 0 | Predicted -1 |
---|---|---|---|
+1 | TP | FZ1 | FN1 |
0 | FP1 | TZ | FN2 |
-1 | FP2 | FZ2 | TN |
where TP is the correctly classified +1 class, TZ is the correctly classified 0 class and TN is the correctly classified negative (-1) class; FP1 is the 0 class misclassified as +1 and FP2 is the -1 class misclassified as +1; FZ1 is the +1 class misclassified as 0 and FZ2 is the -1 class misclassified as 0; FN1 is the +1 class misclassified as -1 and FN2 is the 0 class misclassified as -1; FP = FP1 + FP2, FN = FN1 + FN2 and FZ = FZ1 + FZ2; $N = N_{TP} + N_{FN} + N_{FP} + N_{TN} + N_{TZ}$ denotes the total number of samples;
The accuracy rate Accuracy is the probability that a sample in the test set is predicted correctly; the recall rate Recall is the probability that a sample that is actually of the positive class is predicted correctly; the precision rate Precision is the proportion of correct predictions among all samples predicted to be of the positive class. The formulas are respectively:
$$Accuracy = \frac{TP + TZ + TN}{N} \tag{10}$$
$$Recall = \frac{TP}{TP + FN} \tag{11}$$
$$Precision = \frac{TP}{TP + FP} \tag{12}$$
F is a composite performance index formed from the weighted average of sensitivity (recall) and precision; the closer F is to 1, the better the classification result. The formula is as follows:
$$F = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \tag{13}$$
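The metrics can be computed from a three-class confusion matrix as in the sketch below. It follows the definitions given above (FP = FP1 + FP2, FN = FN1 + FN2), takes accuracy as the share of correctly classified samples and F as the harmonic mean of precision and recall (the usual F1 form, assumed here), and counts N over all nine cells. The nested-dictionary layout and the function name are illustrative assumptions; no guard against empty classes is included.

```python
def evaluate_confusion(confusion):
    """Metrics for the +1 class from confusion[actual][predicted].

    Keys are the class labels 1, 0 and -1.
    """
    tp, tz, tn = confusion[1][1], confusion[0][0], confusion[-1][-1]
    fp = confusion[0][1] + confusion[-1][1]    # FP1 + FP2: predicted +1, actually 0 or -1
    fz = confusion[1][0] + confusion[-1][0]    # FZ1 + FZ2: predicted 0, actually +1 or -1
    fn = confusion[1][-1] + confusion[0][-1]   # FN1 + FN2: predicted -1, actually +1 or 0
    n = tp + tz + tn + fp + fz + fn            # all nine cells of the matrix
    accuracy = (tp + tz + tn) / n
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_value = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "F": f_value}
```

With 40, 20 and 15 correct predictions on the diagonal of a 100-sample matrix, for instance, the accuracy is 0.75.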
The above metrics are obtained from the confusion matrix. In addition, the random forest generates its training sets by the bootstrap method. Because this is sampling with replacement, only about 63% of the original data are drawn (some of them repeatedly), and the remaining data do not appear in the training set. The remaining data are the out-of-bag (OOB) data, and using the out-of-bag data to estimate the generalization ability of the random forest algorithm is called OOB estimation. Taking one tree as the unit, the accuracy measured on the OOB data is $OOB_{score}$ and the measured error is the out-of-bag error $OOB_{error}$; the average of $OOB_{error}$ over all trees is the OOB error of the random forest, $OOB'_{error}$. The smaller $OOB'_{error}$ is, the stronger the generalization ability of the RF. The fitness value Fitness is composed of F and $OOB'_{error}$, and the smaller its value the better. The formulas are as follows:
$$OOB_{error} = 1 - OOB_{score} \tag{14}$$
$$Fitness = OOB'_{error} + (1 - F) \tag{15}$$
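The out-of-bag idea and formulas (14) and (15) can be illustrated with a short sketch. The function names are hypothetical; the per-tree OOB scores would in practice come from the trained forest.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def forest_oob_error(oob_scores):
    """Average over all trees of OOB_error = 1 - OOB_score, formula (14)."""
    errors = [1.0 - score for score in oob_scores]
    return sum(errors) / len(errors)


def fitness(oob_error_forest, f_value):
    """Fitness = OOB'_error + (1 - F), formula (15); smaller is better."""
    return oob_error_forest + (1.0 - f_value)
```

For large samples the share of distinct indices in a bootstrap draw approaches 1 - 1/e, about 63.2%, which is where the figure of roughly 63% comes from.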
Step 13: out-of-sample testing:
After the optimal parameters are determined, the trained random forest model is tested with the test data to obtain the classification results: all preprocessed sample features of the test set are used as the input of the model to obtain the classification result of the predicted value at T+k for each sample, which is compared with the actual stock trend to obtain the predicted stock trend and the prediction accuracy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910180723.7A CN110059852A (en) | 2019-03-11 | 2019-03-11 | A kind of stock yield prediction technique based on improvement random forests algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110059852A true CN110059852A (en) | 2019-07-26 |
Family
ID=67316787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910180723.7A Pending CN110059852A (en) | 2019-03-11 | 2019-03-11 | A kind of stock yield prediction technique based on improvement random forests algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110059852A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659719A (en) * | 2019-09-19 | 2020-01-07 | 江南大学 | Aluminum profile flaw detection method |
CN110766222A (en) * | 2019-10-22 | 2020-02-07 | 太原科技大学 | Particle swarm parameter optimization and random forest based PM2.5 concentration prediction method |
CN110967713A (en) * | 2019-12-10 | 2020-04-07 | 南京邮电大学 | Single-satellite interference source positioning method based on grid search particle swarm algorithm |
CN111199426A (en) * | 2019-12-31 | 2020-05-26 | 上海昌投网络科技有限公司 | WeChat public number ROI estimation method and device based on random forest model |
CN111209960A (en) * | 2020-01-06 | 2020-05-29 | 天津工业大学 | CSI system multipath classification method based on improved random forest algorithm |
CN112182221A (en) * | 2020-10-12 | 2021-01-05 | 哈尔滨工程大学 | Knowledge retrieval optimization method based on improved random forest |
CN112686296A (en) * | 2020-12-29 | 2021-04-20 | 昆明理工大学 | Octane loss value prediction method based on particle swarm optimization random forest parameters |
CN113283472A (en) * | 2021-04-20 | 2021-08-20 | 南京大学 | Data feature selection method based on zero-order optimization |
CN113298107A (en) * | 2020-11-08 | 2021-08-24 | 北京工业大学 | Waste mobile phone identification method based on differential evolution algorithm-deep forest algorithm |
CN113468794A (en) * | 2020-12-29 | 2021-10-01 | 重庆大学 | Temperature and humidity prediction and reverse optimization method for small-sized closed space |
CN113505730A (en) * | 2021-07-26 | 2021-10-15 | 全景智联(武汉)科技有限公司 | Model evaluation method, device, equipment and storage medium based on mass data |
WO2024031332A1 (en) * | 2022-08-09 | 2024-02-15 | 深圳市富途网络科技有限公司 | Stock trend analysis method and apparatus based on machine learning |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659719A (en) * | 2019-09-19 | 2020-01-07 | 江南大学 | Aluminum profile flaw detection method |
CN110659719B (en) * | 2019-09-19 | 2022-02-08 | 江南大学 | Aluminum profile flaw detection method |
CN110766222A (en) * | 2019-10-22 | 2020-02-07 | 太原科技大学 | Particle swarm parameter optimization and random forest based PM2.5 concentration prediction method |
CN110766222B (en) * | 2019-10-22 | 2023-09-19 | 太原科技大学 | PM2.5 concentration prediction method based on particle swarm parameter optimization and random forest |
CN110967713B (en) * | 2019-12-10 | 2021-12-03 | 南京邮电大学 | Single-satellite interference source positioning method based on grid search particle swarm algorithm |
CN110967713A (en) * | 2019-12-10 | 2020-04-07 | 南京邮电大学 | Single-satellite interference source positioning method based on grid search particle swarm algorithm |
CN111199426A (en) * | 2019-12-31 | 2020-05-26 | 上海昌投网络科技有限公司 | WeChat public number ROI estimation method and device based on random forest model |
CN111199426B (en) * | 2019-12-31 | 2023-09-12 | 上海昌投网络科技有限公司 | WeChat public signal ROI estimation method and device based on random forest model |
CN111209960A (en) * | 2020-01-06 | 2020-05-29 | 天津工业大学 | CSI system multipath classification method based on improved random forest algorithm |
CN111209960B (en) * | 2020-01-06 | 2024-01-05 | 天津工业大学 | CSI system multipath classification method based on improved random forest algorithm |
CN112182221A (en) * | 2020-10-12 | 2021-01-05 | 哈尔滨工程大学 | Knowledge retrieval optimization method based on improved random forest |
CN112182221B (en) * | 2020-10-12 | 2022-04-05 | 哈尔滨工程大学 | Knowledge retrieval optimization method based on improved random forest |
CN113298107A (en) * | 2020-11-08 | 2021-08-24 | 北京工业大学 | Waste mobile phone identification method based on differential evolution algorithm-deep forest algorithm |
CN113298107B (en) * | 2020-11-08 | 2024-05-28 | 北京工业大学 | Waste mobile phone identification method based on differential evolution algorithm-depth forest algorithm |
CN113468794A (en) * | 2020-12-29 | 2021-10-01 | 重庆大学 | Temperature and humidity prediction and reverse optimization method for small-sized closed space |
CN112686296B (en) * | 2020-12-29 | 2022-07-01 | 昆明理工大学 | Octane loss value prediction method based on particle swarm optimization random forest parameters |
CN112686296A (en) * | 2020-12-29 | 2021-04-20 | 昆明理工大学 | Octane loss value prediction method based on particle swarm optimization random forest parameters |
CN113283472A (en) * | 2021-04-20 | 2021-08-20 | 南京大学 | Data feature selection method based on zero-order optimization |
CN113505730A (en) * | 2021-07-26 | 2021-10-15 | 全景智联(武汉)科技有限公司 | Model evaluation method, device, equipment and storage medium based on mass data |
WO2024031332A1 (en) * | 2022-08-09 | 2024-02-15 | 深圳市富途网络科技有限公司 | Stock trend analysis method and apparatus based on machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110059852A (en) | A kind of stock yield prediction technique based on improvement random forests algorithm | |
CN103166830B (en) | A kind of Spam Filtering System of intelligent selection training sample and method | |
CN109118013A (en) | A kind of management data prediction technique, readable storage medium storing program for executing and forecasting system neural network based | |
CN111148118A (en) | Flow prediction and carrier turn-off method and system based on time sequence | |
Maknickienė et al. | Application of neural network for forecasting of exchange rates and forex trading | |
CN103105246A (en) | Greenhouse environment forecasting feedback method of back propagation (BP) neural network based on improvement of genetic algorithm | |
CN111723523B (en) | Estuary surplus water level prediction method based on cascade neural network | |
CN109143408B (en) | Dynamic region combined short-time rainfall forecasting method based on MLP | |
CN110348608A (en) | A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm | |
CN107220841A (en) | A kind of clustering system based on business data | |
CN105956798A (en) | Sparse random forest-based method for assessing running state of distribution network device | |
CN110210974A (en) | A kind of insider trading discriminating conduct based on particle group optimizing Incremental support vector machine | |
CN109002839A (en) | Efficient feature selection method under a kind of more attributive character environment | |
Zhang et al. | Grade prediction of student academic performance with multiple classification models | |
CN116702132A (en) | Network intrusion detection method and system | |
Goni et al. | Graduate admission chance prediction using deep neural network | |
CN115018357A (en) | Farmer portrait construction method and system for production performance improvement | |
CN116993548A (en) | Incremental learning-based education training institution credit assessment method and system for LightGBM-SVM | |
Sun | Real estate evaluation model based on genetic algorithm optimized neural network | |
Gökçe et al. | Performance comparison of simple regression, random forest and XGBoost algorithms for forecasting electricity demand | |
Ullah et al. | Adaptive data balancing method using stacking ensemble model and its application to non-technical loss detection in smart grids | |
Sugumar et al. | A technique to stock market prediction using fuzzy clustering and artificial neural networks | |
CN108537663A (en) | One B shareB trend forecasting method | |
Yu et al. | Loan Approval Prediction Improved by XGBoost Model Based on Four-Vector Optimization Algorithm | |
CN115936773A (en) | Internet financial black product identification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190726 |