CN110059852A - A stock return prediction method based on an improved random forest algorithm - Google Patents

A stock return prediction method based on an improved random forest algorithm

Info

Publication number
CN110059852A
Authority
CN
China
Prior art keywords
data
stock
prediction
oob
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910180723.7A
Other languages
Chinese (zh)
Inventor
方昕
陈玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910180723.7A priority Critical patent/CN110059852A/en
Publication of CN110059852A publication Critical patent/CN110059852A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/24323 - Tree-organised classifiers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/004 - Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 - Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 - Administration; Management
    • G06Q10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 - Operations research, analysis or management
    • G06Q10/0639 - Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 - Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 - Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a stock return prediction method based on an improved random forest algorithm. It addresses two problems that arise when random forest (RF) is used for classification prediction of stock returns: parameter selection is difficult and classification performance is limited, and the RF algorithm itself cannot identify and select the more effective features. The feature selection mechanism is optimized with a particle swarm optimization (PSO) algorithm so that, even when a trend change is not yet obvious in its initial stage, the optimal features are filtered out and passed to the RF algorithm as input attributes. The result is a hybrid PSO-GRID-RF method for stock trend prediction. The invention shrinks the feature subset, rejects feature attributes that are irrelevant or redundant, reduces the input dimension, and shortens the time needed for stock trend prediction. For settings with many attribute features it proposes an efficient feature selection method, and it introduces a grid search algorithm to optimize the random forest parameters, which improves the classification performance of the random forest and greatly improves the accuracy of stock trend prediction.

Description

A stock return prediction method based on an improved random forest algorithm
Technical field
The invention belongs to the technical field of financial data mining. To address the difficult parameter selection and limited classification performance of random forest in classification prediction research on stock returns, it proposes a new algorithm that combines feature selection based on the particle swarm algorithm with random forest parameter optimization by the grid search algorithm. Feature selection is performed on the training set by the particle swarm algorithm, which removes redundant indicators from the indicator system and so reduces the input dimension; the grid search algorithm is then introduced to optimize the random forest parameters, thereby improving the classification performance of the random forest.
Background art
In the stock market, predicting the movement of stock prices has always been a popular problem for investors. Accurately judging and grasping the trend of the whole stock market not only reduces blind investment and has practical significance for making investors more rational, but also provides a reference for the state when it formulates economic policy.
Scholars in China and abroad have studied stock price prediction in depth and proposed many prediction methods. Two kinds of method are mainly applied today: fundamental analysis and technical analysis. The first is based on basic factors such as a company's growth and profitability. The second is mathematical analysis based on past stock data; in its simple form, predictions are made by observing charts of stock movement, and in its more complex form, sophisticated statistical methods and machine learning algorithms are used.
Time series analysis was the first method applied to stock price prediction, for example by establishing an ARMA model for short-term forecasting of the opening price. Because stock prices are influenced by many factors, they change nonlinearly; time series analysis based on linear models cannot reflect this nonlinear trend well, so its prediction accuracy is low and its application is limited. With the rise of artificial intelligence, BP neural networks have been widely used in stock price prediction because of their strong nonlinear mapping capability. In the BP-neural-network-based stock price prediction model SPPM, multiple neural network models are established to predict stock prices. Neural networks have achieved good results in nonlinear stock prediction, but they also suffer from unstable learning and memory, slow convergence, and a tendency to fall into local optima.
The random forest algorithm (Random Forest, RF) has been applied in the financial field as a classification technique. Compared with support vector machines (Support Vector Machine) and artificial neural networks (Artificial Neural Networks), RF obtains better results in stock trend prediction. Random forest is a model ensemble and has achieved good results in many different fields. Because it trains quickly and generalizes well, applying it to the prediction of stock rises and falls avoids the shortcomings of the prediction models above. Random forest prediction mainly proceeds by first screening the original indicator system, feeding the screened indicator data into the random forest as independent variables, and outputting the rise or fall as the response variable. Existing methods, however, do little to optimize the random forest model itself and therefore cannot further improve prediction accuracy.
Summary of the invention
In view of the shortcomings of existing techniques for classification prediction of stock returns, the invention proposes a stock return prediction method based on an improved random forest algorithm.
A stock return prediction method based on an improved random forest algorithm specifically comprises the following steps:
Step 1: Data acquisition: daily stock data are obtained from websites;
The data are split into a training set, a validation set, and a test set.
Step 2: Exponential smoothing is applied to the acquired data:
$$S_0 = Y_0, \quad t = 0 \tag{1}$$
$$S_t = \alpha Y_t + (1 - \alpha) S_{t-1}, \quad t > 0 \tag{2}$$
In the formulas, $S_t$ is the smoothed value at time $t$ and $Y_t$ is the actual value at time $t$;
$S_0$ is the smoothed value at $t = 0$, $Y_0$ is the actual value at $t = 0$, and $t$ is the day index of the acquired daily stock data;
$S_{t-1}$ is the smoothed value at time $t - 1$, and $\alpha$ is the exponential smoothing factor, $0 < \alpha < 1$.
Exponential smoothing removes randomness and noise from the historical data, so that the model can more easily identify long-term price trends.
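As a minimal sketch of formulas (1) and (2), the recursion can be written in a few lines of Python. The function name `exponential_smoothing` and the sample prices are illustrative assumptions, not part of the patent text.

```python
def exponential_smoothing(values, alpha):
    """Apply S_0 = Y_0 and S_t = alpha * Y_t + (1 - alpha) * S_{t-1}."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    smoothed = []
    for t, y in enumerate(values):
        if t == 0:
            smoothed.append(float(y))  # formula (1): the first value is kept as is
        else:
            # formula (2): blend the new observation with the previous smoothed value
            smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily prices, for illustration only.
prices = [10.0, 12.0, 11.0, 13.0]
print(exponential_smoothing(prices, alpha=0.5))  # [10.0, 11.0, 11.0, 12.0]
```

A larger `alpha` follows the raw series more closely; a smaller one smooths more strongly.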
Step 3: Feature extraction: technical indicators are computed from the exponential smoothing result. The smoothed time series is used to calculate a feature matrix, and the technical indicators that investors use to judge whether a stock will rise or fall serve as the features.
Step 4: Feature selection by the PSO algorithm:
The necessary influencing indicators are determined as the model input and the necessary response variable as the model output. Because building the stock indicator system is the basis for subsequent evaluation and comprehensive analysis, features are extracted from the technical indicators. The technical indicators are treated as the particles in the particle swarm algorithm. In the PSO algorithm the initial velocity and position of each particle are assigned randomly; the local optimal solution $P_{idbest}$ is the best position of a particle at the current iteration, and the global optimal solution $P_{gdbest}$ is the best position of the whole swarm. Assume the swarm search space has dimension D and contains m particles; then the position of particle i in the space is $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$ and its velocity is $v_i = [v_{i1}, v_{i2}, \ldots, v_{iD}]$, $i = 1, \ldots, m$. The calculation formulas are as follows:
Adjusting the spatial position:
In the formulas: $V_k$ is the velocity, in the k-th dimension, corresponding to a particle's local extremum; $X_k$ is the optimal position, in the k-th dimension, of the particle's local value; $P_{idbest}^{k}$ is the local best position of the swarm at the k-th iteration; $P_{gdbest}^{k}$ is the global best position of the swarm at the k-th iteration; $S(\cdot)$ is the sigmoid function, which takes the velocity as its argument. Adjusting the spatial position means mapping the particle velocity into [0, 1], comparing it with a random number, and updating the position state of the particle accordingly. $c_1$ and $c_2$ are learning factors and are positive; $w$ is the inertia weight; $rand_1, rand_2 \in [0, 1]$ are uniformly distributed random numbers;
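The update formulas themselves are not reproduced in this text. As a hedged reconstruction, the standard binary PSO update that matches the symbols defined above (inertia weight, two learning factors, sigmoid mapping, comparison with a random number) would read as follows. This is the conventional form, assumed here, and may differ in detail from the patent's own formulas; $rand$ denotes a further uniform random number in [0, 1] introduced for the comparison.

```latex
\begin{aligned}
v_{id}^{k+1} &= w\,v_{id}^{k}
  + c_1\,rand_1\,\bigl(P_{idbest}^{k} - x_{id}^{k}\bigr)
  + c_2\,rand_2\,\bigl(P_{gdbest}^{k} - x_{id}^{k}\bigr), \\
S\bigl(v_{id}^{k+1}\bigr) &= \frac{1}{1 + e^{-v_{id}^{k+1}}}, \\
x_{id}^{k+1} &=
  \begin{cases}
    1, & rand < S\bigl(v_{id}^{k+1}\bigr), \\
    0, & \text{otherwise.}
  \end{cases}
\end{aligned}
```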
Step 5: Setting the termination condition:
Set the termination condition: if the number of iterations exceeds the maximum number of iterations, or the fitness is lower than the set value, exit the loop.
Step 6: Feature selection:
The binary code obtained by the particle swarm feature selection of step 4 determines the input features used for trend prediction, where 1 indicates that a feature is selected and 0 indicates that it is not selected;
Step 7: Outputting the optimal features:
If the condition set in step 5 is met, output the optimal features; otherwise return to step 4;
Step 8: Building the data matrix:
The data matrix is constructed from the optimal features selected in step 7;
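Steps 6 to 8 amount to applying a binary mask to the columns of the feature matrix. The sketch below illustrates this under the assumption that the feature matrix is a list of rows with one value per technical indicator; `build_data_matrix` and the sample values are hypothetical.

```python
def build_data_matrix(feature_matrix, mask):
    """Keep only the columns whose mask bit is 1 (1 = selected, 0 = not selected)."""
    if any(len(row) != len(mask) for row in feature_matrix):
        raise ValueError("every row must have one value per mask bit")
    selected = [d for d, bit in enumerate(mask) if bit == 1]
    return [[row[d] for d in selected] for row in feature_matrix]


# Hypothetical feature matrix: 2 samples, 4 technical indicators.
features = [[0.1, 0.2, 0.3, 0.4],
            [0.5, 0.6, 0.7, 0.8]]
mask = [1, 0, 0, 1]  # binary code as it might come out of the PSO step
print(build_data_matrix(features, mask))  # [[0.1, 0.4], [0.5, 0.8]]
```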
Step 9: Cross-validation on the training set and validation set:
The training set and validation set are used to tune the parameters by cross-validation, with 90% used to train the model and 10% used to validate it. The grid search algorithm is used to optimize the random forest parameters, including the tree depth, the random state, the number of variables at each tree node, the number of trees, the OOB misclassification rate, and the variable importance estimate, in order to improve prediction accuracy. This yields the prediction model and makes the model fit the data well.
Step 10: Establishing the stock trading signal, i.e. the data label:
The data matrix constructed in step 8 is input to the random forest algorithm as training data for training, and the trading signal $Y_j = \{y_1, y_2, \ldots, y_j\}$ is constructed, where $j = 1, 2, \ldots, n$ indexes the samples. The trading signal is constructed as follows:
1) Calculate the daily average price $p_j$
where $C_j$ denotes the stock price data, $H_j$ denotes the highest price of the stock, and $L_j$ denotes the lowest price.
2) Calculate the return $V_j$ over the following k days, k = 1, 2, ..., 10;
3) Construct the trading signal $y_j$
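The formulas for $p_j$, $V_j$, and $y_j$ are not reproduced in this text, so the following is only a sketch under stated assumptions: the daily average price is taken as the mean of $C_j$, $H_j$, and $L_j$; the k-day return is the relative change of that average price; and the signal takes the three classes +1, 0, -1 (matching the confusion matrix used for evaluation) with a hypothetical symmetric threshold. None of these specifics is confirmed by the source.

```python
def daily_average_price(close, high, low):
    # Assumption: simple mean of the three prices C_j, H_j, L_j.
    return [(c + h + l) / 3.0 for c, h, l in zip(close, high, low)]


def future_return(avg_price, k):
    # Assumption: relative change of the average price from day j to day j + k.
    # The last k days have no future price and therefore get no value.
    return [(avg_price[j + k] - avg_price[j]) / avg_price[j]
            for j in range(len(avg_price) - k)]


def trading_signal(returns, threshold=0.01):
    # Assumption: hypothetical threshold; +1 = rise, -1 = fall, 0 = flat.
    signals = []
    for v in returns:
        if v > threshold:
            signals.append(1)
        elif v < -threshold:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


# Hypothetical prices for four trading days.
close = [10.0, 11.0, 10.0, 10.0]
high = [11.0, 12.0, 11.0, 11.0]
low = [9.0, 10.0, 9.0, 9.0]
p = daily_average_price(close, high, low)
print(trading_signal(future_return(p, k=1)))  # [1, -1, 0]
```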
Step 11: In-sample training:
The data matrix constructed from the optimal features is input to the random forest model for training, the grid search algorithm is used to optimize the parameters on the new optimal-feature data set, and the predictions are compared with the actual stock trend to obtain the predicted stock trend and the prediction accuracy.
Step 12: Model evaluation:
In the random forest classification process, the classification prediction results can be represented by a confusion matrix, as shown in Table 1 below:
Table 1: Confusion matrix
|         | Predicted +1 | Predicted 0 | Predicted -1 |
|---------|--------------|-------------|--------------|
| True +1 | TP           | FZ1         | FN1          |
| True 0  | FP1          | TZ          | FN2          |
| True -1 | FP2          | FZ2         | TN           |
Here TP is the number of correctly classified +1 samples, TZ is the number of correctly classified 0 samples, and TN is the number of correctly classified negative-class (-1) samples. FP1 counts class 0 samples misclassified as class +1, and FP2 counts class -1 samples misclassified as class +1; FZ1 counts class +1 samples misclassified as class 0, and FZ2 counts class -1 samples misclassified as class 0; FN1 counts class +1 samples misclassified as class -1, and FN2 counts class 0 samples misclassified as class -1. FP = FP1 + FP2, FN = FN1 + FN2, FZ = FZ1 + FZ2, and N = NTP + NFN + NFP + NTN + NTZ denotes the total number of samples.
Accuracy is the probability that a prediction on the test set is correct; Recall is the probability that a sample that is actually positive is predicted correctly; Precision is the probability that a sample predicted as positive is actually positive. The formulas are as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$
F is a comprehensive performance index formed by the weighted average of sensitivity (recall) and precision; the closer F is to 1, the better the classification result. The formula is as follows:
The parameters above are obtained from the confusion matrix. In addition, when the random forest is generated, each training set is produced by the Bootstrap method. Because this is sampling with replacement, only about 63% of the original data is drawn into a given bootstrap sample, and the remaining data does not appear in it. This remaining data is the out-of-bag (OOB) data, and using it to estimate the generalization ability of the random forest algorithm is called OOB estimation. Taking one tree as the unit, the accuracy measured on the OOB data is $OOB_{score}$ and the error measured is the out-of-bag error $OOB_{error}$; averaging $OOB_{error}$ over all trees gives the $OOB'_{error}$ of the random forest. The smaller $OOB'_{error}$ is, the stronger the generalization ability of the RF. The fitness value Fitness is composed of F and $OOB'_{error}$; the smaller it is, the better. The formulas are as follows:
$$OOB_{error} = 1 - OOB_{score} \tag{13}$$
$$\mathrm{Fitness} = OOB'_{error} + (1 - F) \tag{14}$$
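The evaluation quantities above can be computed directly from the confusion matrix and the per-tree OOB scores. The sketch below follows formulas (10), (11), (13), and (14) as stated. The formulas for Accuracy and F are not reproduced in this text, so the code assumes that Accuracy is the share of diagonal entries and that F is the harmonic mean of precision and recall (F1); both are assumptions. The function name `evaluate` and the sample numbers are hypothetical.

```python
def evaluate(cm, oob_scores):
    """cm[i][j] = count with true class i and predicted class j, class order (+1, 0, -1)."""
    (tp, fz1, fn1), (fp1, tz, fn2), (fp2, fz2, tn) = cm
    fp = fp1 + fp2
    fn = fn1 + fn2
    total = sum(sum(row) for row in cm)
    accuracy = (tp + tz + tn) / total                  # assumption: diagonal share
    recall = tp / (tp + fn)                            # formula (10)
    precision = tp / (tp + fp)                         # formula (11)
    f = 2 * precision * recall / (precision + recall)  # assumption: F1
    # formula (13) per tree, then averaged over all trees to give OOB'_error
    oob_error = sum(1 - s for s in oob_scores) / len(oob_scores)
    fitness = oob_error + (1 - f)                      # formula (14)
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f": f, "oob_error": oob_error, "fitness": fitness}


# Hypothetical confusion matrix and per-tree OOB accuracies.
cm = [[40, 5, 5],
      [6, 30, 4],
      [4, 6, 50]]
result = evaluate(cm, oob_scores=[0.8, 0.7, 0.9])
print(result)
```

A lower `fitness` means both a lower averaged OOB error and an F value closer to 1, which is what the PSO step minimizes.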
Step 13: Out-of-sample test:
After the optimal parameters are determined, the trained random forest model is tested on the test data to obtain classification results: all preprocessed sample features of the test set are input to the model, the predicted value at T+k is obtained for each sample, and the predictions are compared with the actual stock trend to obtain the predicted stock trend and the prediction accuracy.
The beneficial effects of the invention are as follows:
(1) The invention proposes an efficient feature selection method: the PSO algorithm searches globally to select the best features, which are fed to the RF algorithm as input variables. This shrinks the feature subset, eliminates feature attributes that are irrelevant or redundant, reduces the time needed for stock prediction, and greatly improves the accuracy of stock trend prediction.
(2) The invention applies cross-validation to the training set, which takes the temporal dependence of stock prices into account and effectively improves the accuracy of the random forest classification model.
(3) The stock return is a stationary series. Using the stock return as the label reflects the price trend better than using the closing price as the label, and effectively improves the accuracy of stock trend prediction.
(4) The invention uses the grid search algorithm for parameter optimization when training the random forest, which avoids the difficulty of parameter selection in random forest prediction, selects the optimal parameters, and improves the accuracy of trend prediction.
Brief description of the drawings
Fig. 1 is the framework diagram of the stock return research based on the improved random forest;
Fig. 2 is the undirected graph of random forest classification;
Fig. 3 is a schematic diagram of the binary coding;
Fig. 4 is the PSO algorithm flow chart;
Fig. 5 is the flow chart of the random forest voting classification method in stock trend prediction.
Detailed description of the embodiments
The invention is further explained below with reference to the drawings and embodiments.
As shown in Figs. 1 to 5, the invention is a stock trend prediction method for stock return research based on the particle swarm algorithm, the grid search algorithm, and the random forest algorithm.
The invention proposes a stock return research method with efficient feature selection for settings with many attribute features. Because the attribute features are large in scale and are continuous variables, the decision trees are generated with the CART algorithm; the specific formula is as follows:
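The CART formula referred to above is not reproduced in this text. As a hedged illustration, CART classification trees conventionally choose the split of a continuous feature that minimizes the weighted Gini index of the two child nodes, where the Gini index of a node is one minus the sum of squared class proportions. The sketch below shows that criterion; the helper names and sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 - sum over classes of p_k squared."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini index after splitting a continuous feature at threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return the best one."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not candidates:
        return None  # a constant feature cannot be split
    return min(candidates, key=lambda t: split_gini(values, labels, t))


# Hypothetical indicator values and rise/fall labels.
values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]
print(gini(labels))                    # 0.5
print(best_threshold(values, labels))  # 2.5
```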
The flow chart of the invention is shown in Fig. 1. The specific steps are as follows:
(1): Data acquisition:
Daily stock data are obtained from websites. The stock data used by the invention come from websites such as Yahoo and include the opening price, closing price, trading volume, highest price, and lowest price of each trading day. The data are downloaded as a csv file and split into a training set, a validation set, and a test set.
(2): Exponential smoothing is applied to the acquired data:
$$S_0 = Y_0, \quad t = 0 \tag{3}$$
$$S_t = \alpha Y_t + (1 - \alpha) S_{t-1}, \quad t > 0 \tag{4}$$
In the formulas, $S_t$ is the smoothed value at time $t$ and $Y_t$ is the actual value at time $t$;
$S_0$ is the smoothed value at $t = 0$, $Y_0$ is the actual value at $t = 0$, and $t$ is the day index of the acquired daily stock data;
$S_{t-1}$ is the smoothed value at time $t - 1$, and $\alpha$ is the exponential smoothing factor, $0 < \alpha < 1$.
Exponential smoothing removes randomness and noise from the historical data, so that the model can more easily identify long-term price trends.
(3): Feature extraction:
Technical indicators are computed from the exponential smoothing result. The indicators should cover all aspects of market behavior; their specific values and their relationships to one another directly reflect the state of the stock market and give direction to trading decisions. Most of what the indicators reflect cannot be seen directly from market quotations. The smoothed time series is used to calculate a feature matrix, and the technical indicators that investors use to judge whether a stock will rise or fall serve as the features.
(4): Feature selection by the PSO algorithm:
Fig. 4 is the PSO algorithm flow chart and shows the steps by which the particle swarm performs feature selection. In the PSO algorithm, a candidate solution of the optimization problem is represented as a point in d-dimensional space, called a particle. The quality of a particle's current position is assessed by an objective function, which computes the corresponding fitness from the position. Particles fly through the search space at a certain velocity, which is adjusted dynamically according to the particle's own flying experience and that of its companions and is then used to compute the particle's new position. The search for the optimum is carried out iteratively within a swarm of randomly initialized particles until a termination condition is met, such as reaching a specified number of iterations.
In the PSO algorithm the initial velocity and position of each particle are assigned randomly; the local optimal solution $P_{idbest}$ is the best position of a particle at the current iteration, and the global optimal solution $P_{gdbest}$ is the best position of the whole swarm. Assume the swarm search space has dimension D and contains m particles; then the position of particle i in the space is $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$ and its velocity is $v_i = [v_{i1}, v_{i2}, \ldots, v_{iD}]$, $i = 1, \ldots, m$. The calculation formulas are as follows:
Adjusting the spatial position:
In the formulas: $V_k$ is the velocity, in the k-th dimension, corresponding to a particle's local extremum; $X_k$ is the optimal position, in the k-th dimension, of the particle's local value; $P_{idbest}^{k}$ is the local best position of the swarm at the k-th iteration; $P_{gdbest}^{k}$ is the global best position of the swarm at the k-th iteration; $S(\cdot)$ is the sigmoid function, which takes the velocity as its argument. Adjusting the spatial position means mapping the particle velocity into [0, 1], comparing it with a random number, and updating the position state of the particle accordingly. $c_1$ and $c_2$ are learning factors and are positive; $w$ is the inertia weight; $rand_1, rand_2 \in [0, 1]$ are uniformly distributed random numbers;
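The procedure described above can be sketched as a small binary PSO in plain Python. This is an illustration under assumptions, not the patent's implementation: it uses the conventional binary PSO update (sigmoid of the velocity compared with a random number), the parameter values and the velocity clamp `v_max` are hypothetical, and `toy_fitness` is a stand-in for the real fitness, which the document defines from the random forest's F value and OOB error.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness, dim, n_particles=10, max_iter=50, threshold=0.5,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Minimize fitness over binary masks of length dim."""
    rng = random.Random(seed)
    # Random initial positions (bits) and velocities.
    x = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [row[:] for row in x]
    p_best_fit = [fitness(row) for row in x]
    g = min(range(n_particles), key=lambda i: p_best_fit[i])
    g_best, g_best_fit = p_best[g][:], p_best_fit[g]
    for _ in range(max_iter):           # stop at the maximum number of iterations
        if g_best_fit < threshold:      # or when the fitness is below the set value
            break
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, v[i][d]))  # keep exp() well behaved
                # Map the velocity into [0, 1] and compare with a random number.
                x[i][d] = 1 if rng.random() < sigmoid(v[i][d]) else 0
            fit = fitness(x[i])
            if fit < p_best_fit[i]:
                p_best[i], p_best_fit[i] = x[i][:], fit
                if fit < g_best_fit:
                    g_best, g_best_fit = x[i][:], fit
    return g_best, g_best_fit


# Hypothetical stand-in fitness: distance to a mask of "useful" indicators.
useful = [1, 0, 1, 0, 0, 1]


def toy_fitness(mask):
    return sum(a != b for a, b in zip(mask, useful))


best_mask, best_fit = binary_pso(toy_fitness, dim=len(useful))
print(best_mask, best_fit)
```

In the method described here, `fitness` would instead train a random forest on the features selected by the mask and return the fitness value built from F and the OOB error.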
(5): Setting the termination condition:
Set the termination condition: as shown in the condition check of Fig. 4, if the number of iterations exceeds the maximum number of iterations, or the fitness is lower than the set value, exit the loop.
(6): Feature selection:
The binary code defines whether a feature is selected as an input feature for trend prediction, as shown in Fig. 3, where 1 indicates that the feature is selected and 0 indicates that it is not selected;
(7): Outputting the optimal features:
If the condition set in (5) is met, output the optimal features; otherwise return to (4);
(8): Building the data matrix:
The data matrix is constructed from the optimal features selected in (7);
(9): Training set and cross-validation set:
The training set is used to tune the parameters by cross-validation, with 90% used to train the model and 10% used to validate it. The grid search algorithm is used to optimize the random forest parameters, including the tree depth, the random state, the number of variables at each tree node, the number of trees, the OOB misclassification rate, and the variable importance estimate, in order to improve prediction accuracy. This yields the prediction model and makes the model fit the data well.
(10): Establishing the stock trading signal:
The data matrix constructed in (8) is input to the random forest algorithm as training data for training, and the trading signal $Y_j = \{y_1, y_2, \ldots, y_j\}$ is constructed, where $j = 1, 2, \ldots, n$ indexes the samples. The trading signal is constructed as follows:
1) Calculate the daily average price $p_j$
where $C_j$ denotes the stock price data, $H_j$ denotes the highest price of the stock, and $L_j$ denotes the lowest price.
2) Calculate the return $V_j$ over the following k days, k = 1, 2, ..., 10;
3) Construct the trading signal $y_j$
(11): Classification prediction by random forest:
In everyday life, our understanding of things rests on judging and classifying them by their features; for example, whether an animal is viviparous helps to judge whether it is a mammal. Random forest uses the same idea; Fig. 2 shows the undirected graph of random forest classification. At each node of a tree, the nodes of the next layer are split off by a certain rule according to the feature values, and the terminal leaf nodes give the final classification result. The key to random forest learning is selecting the optimal splitting attribute. As the splitting proceeds layer by layer, the samples contained in a branch node of a decision tree should increasingly belong to the same class; that is, each split should maximize the information gain after the node is split.
Random forest builds each decision tree in two stages. The first stage is called "row sampling": sampling with replacement from the full training set yields a Bootstrap data set. The second stage is called "column sampling": m features are chosen at random from all M features (m is less than M), and a decision tree is trained on the Bootstrap data set restricted to those m features. For classification, the predicted class is the one that receives the most votes from the N decision trees, as shown in Fig. 5, the flow chart of the random forest voting classification method in stock trend prediction. Building the model in this way reduces the probability of overfitting. Although each tree in the forest splits using only m features and its individual classification performance is unremarkable, the combination is more stable. Each decision tree can be understood as an expert in one narrow field (each tree learns from m of the M factors); the random forest then contains many experts in different fields, who look at a new problem (a new data set) from different angles and decide the result by a final vote.
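The two sampling stages and the final vote can be illustrated without a full tree implementation. The sketch below shows only row sampling (with the resulting out-of-bag set), column sampling, and majority voting; the function names are hypothetical and the trees themselves are left out.

```python
import random
from collections import Counter


def row_sample(n_samples, rng):
    """Bootstrap: draw n_samples indices with replacement; also return the OOB indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def column_sample(n_features, m, rng):
    """Choose m of the M feature indices at random, without replacement."""
    return sorted(rng.sample(range(n_features), m))


def majority_vote(votes):
    """Return the class with the most votes (ties go to the class seen first)."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
in_bag, oob = row_sample(1000, rng)
# The OOB share is about 1/e (roughly 37%) in expectation, so about 63% is in the bag.
print(len(oob) / 1000)
print(column_sample(10, 3, rng))       # three feature indices out of ten
print(majority_vote([1, 0, 1, -1, 1])) # 1
```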
(12): Principle of the grid search algorithm:
Grid search is an exhaustive search over specified parameter values: each combination is used to train the random forest, and its performance is assessed by cross-validation. After the fitting function has tried all parameter combinations, it returns a suitable classifier that is automatically set to the best parameter combination.
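The exhaustive search can be sketched with `itertools.product`. The grid values below and the scoring function `toy_cv_score` are hypothetical; in the method described here the score would be the cross-validated performance of a random forest trained with the given parameters.

```python
from itertools import product


def grid_search(param_grid, score):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        s = score(params)  # e.g. mean cross-validated accuracy
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


# Hypothetical grid over the kinds of parameters named in the text.
param_grid = {
    "n_trees": [50, 100, 200],
    "max_depth": [3, 5, 8],
    "max_features": [2, 4],
}


def toy_cv_score(params):
    # Stand-in for a cross-validated score; it peaks at (100, 5, 4) by construction.
    return -(abs(params["n_trees"] - 100) / 100
             + abs(params["max_depth"] - 5) / 5
             + abs(params["max_features"] - 4) / 4)


best_params, best_score = grid_search(param_grid, toy_cv_score)
print(best_params)  # {'n_trees': 100, 'max_depth': 5, 'max_features': 4}
```

The cost grows with the product of the grid sizes (here 3 x 3 x 2 = 18 evaluations), which is why reducing the input dimension beforehand shortens the overall search.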
(13): In-sample training:
The data matrix constructed from the optimal features is input to the random forest model for training, the grid search algorithm is used to optimize the parameters on the new optimal-feature data set, and the predictions are compared with the actual stock trend to obtain the predicted stock trend and the prediction accuracy.
(14): Out-of-sample test:
After the optimal parameters are determined, the trained random forest model is tested on the test data to obtain classification results: all preprocessed sample features of the test set are input to the model, the predicted value at T+k is obtained for each sample, and the predictions are compared with the actual stock trend to obtain the predicted stock trend and the prediction accuracy.

Claims (1)

1. A stock return prediction method based on an improved random forest algorithm, characterized in that it specifically comprises the following steps:
Step 1: Data acquisition: daily stock data are obtained from websites;
The data are split into a training set, a validation set, and a test set;
Step 2: Exponential smoothing is applied to the acquired data:
$$S_0 = Y_0, \quad t = 0 \tag{1}$$
$$S_t = \alpha Y_t + (1 - \alpha) S_{t-1}, \quad t > 0 \tag{2}$$
In the formulas, $S_t$ is the smoothed value at time $t$ and $Y_t$ is the actual value at time $t$;
$S_0$ is the smoothed value at $t = 0$, $Y_0$ is the actual value at $t = 0$, and $t$ is the day index of the acquired daily stock data; $S_{t-1}$ is the smoothed value at time $t - 1$, and $\alpha$ is the exponential smoothing factor, $0 < \alpha < 1$;
Step 3: Feature extraction:
Technical indicators are computed from the exponential smoothing result; the smoothed time series is used to calculate a feature matrix, and the technical indicators that investors use to judge whether a stock will rise or fall serve as the features;
Step 4: Feature selection by the PSO algorithm:
The technical indicators are treated as the particles in the particle swarm algorithm. In the PSO algorithm the initial velocity and position of each particle are assigned randomly; the local optimal solution $P_{idbest}$ is the best position of a particle at the current iteration, and the global optimal solution $P_{gdbest}$ is the best position of the whole swarm. Assume the swarm search space has dimension D and contains m particles; then the position of particle i in the space is $x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]$ and its velocity is $v_i = [v_{i1}, v_{i2}, \ldots, v_{iD}]$, $i = 1, \ldots, m$. The calculation formulas are as follows:
Adjusting the spatial position:
In the formulas: $V_k$ is the velocity, in the k-th dimension, corresponding to a particle's local extremum; $X_k$ is the optimal position, in the k-th dimension, of the particle's local value; $P_{idbest}^{k}$ is the local best position of the swarm at the k-th iteration; $P_{gdbest}^{k}$ is the global best position of the swarm at the k-th iteration; $S(\cdot)$ is the sigmoid function, which takes the velocity as its argument. Adjusting the spatial position means mapping the particle velocity into [0, 1], comparing it with a random number, and updating the position state of the particle accordingly. $c_1$ and $c_2$ are learning factors and are positive; $w$ is the inertia weight; $rand_1, rand_2 \in [0, 1]$ are uniformly distributed random numbers;
Step 5: Set the decision condition:
If the number of iterations exceeds the maximum number of iterations, or the fitness falls below the set value, exit the loop;
Step 6: Feature selection:
The binary code obtained by the particle swarm feature selection of Step 4 determines the input features used for trend prediction, where 1 indicates that a feature is selected and 0 indicates that it is not selected;
Step 7: Output the optimal features:
If the condition set in Step 5 is met, output the optimal features; otherwise return to Step 4;
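The control flow of Steps 4 to 7 can be sketched as a loop that repeats the swarm update until the decision condition holds. This sketch assumes the condition is read as "maximum iterations reached or fitness below the set value"; the callables `step` and `fitness_of` are hypothetical placeholders for the swarm update and the fitness evaluation.

```python
def run_feature_search(step, fitness_of, max_iterations, fitness_threshold):
    """Repeat the swarm update until the stopping condition is met.

    step()           -> current candidate binary mask (hypothetical callable)
    fitness_of(mask) -> fitness of that mask, smaller is better (hypothetical)
    Returns (best_mask, best_fitness, iterations_run).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    best_mask, best_fitness = None, float("inf")
    iterations_run = 0
    while iterations_run < max_iterations:      # iteration limit
        iterations_run += 1
        mask = step()
        fitness = fitness_of(mask)
        if fitness < best_fitness:
            best_mask, best_fitness = mask, fitness
        if best_fitness < fitness_threshold:    # fitness below the set value
            break
    return best_mask, best_fitness, iterations_run


# Hypothetical candidates and fitness values, for illustration only.
candidates = iter([[1, 0, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1]])
scores = {(1, 0, 0): 0.9, (1, 1, 0): 0.4, (0, 1, 1): 0.1, (1, 1, 1): 0.05}
result = run_feature_search(lambda: next(candidates),
                            lambda m: scores[tuple(m)],
                            max_iterations=10, fitness_threshold=0.2)
print(result)
```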
Step 8: Build the data matrix:
Build the data matrix that is input to the random forest from the optimal features selected in Step 7;
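Building the data matrix from the binary code amounts to keeping the feature columns whose bit is 1. A minimal sketch follows; the indicator names and values are hypothetical examples, not the feature set of the patent.

```python
def select_features(rows, feature_names, mask):
    """Keep the columns whose mask bit is 1 (1 = selected, 0 = not selected)."""
    if len(mask) != len(feature_names):
        raise ValueError("mask length must equal the number of features")
    keep = [i for i, bit in enumerate(mask) if bit == 1]
    names = [feature_names[i] for i in keep]
    matrix = [[row[i] for i in keep] for row in rows]
    return names, matrix


# Hypothetical indicator names and values, for illustration only.
feature_names = ["RSI", "MACD", "WR", "OBV"]
rows = [
    [55.1, 0.12, -40.0, 1.2e6],
    [61.3, 0.20, -25.0, 1.5e6],
]
names, matrix = select_features(rows, feature_names, mask=[1, 0, 1, 0])
print(names)
print(matrix)
```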
Step 9: Cross-validation on the training set and validation set:
To improve the prediction accuracy of the random forest, the training set and validation set are used for parameter tuning by cross-validation, with 90% of the data used to train the model and 10% used to validate it. A grid search algorithm is used to optimize the random forest parameters, including the tree depth, the random state, the number of variables per tree node, and the number of trees, together with the OOB misclassification rate and the variable importance estimates, so as to improve prediction accuracy and obtain a prediction model that adapts well to the data and achieves higher precision;
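The tuning procedure of Step 9 can be sketched as a grid search over parameter combinations, each scored by 10-fold cross-validation so that every fold trains on about 90% of the samples and validates on about 10%. The sketch is library-independent: `fit_and_score` is a hypothetical callable that would train a random forest on the training indices and return its validation score, and the parameter names `max_depth` and `n_trees` are illustrative.

```python
from itertools import product


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, valid_idx) pairs; with k=10 each split is about 90%/10%."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def grid_search(param_grid, n_samples, fit_and_score, k=10):
    """Return the parameter combination with the best mean validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, valid)
                  for train, valid in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, valid_idx):
    """Stand-in for training and scoring a model; peaks at depth 5, 100 trees."""
    return -abs(params["max_depth"] - 5) - abs(params["n_trees"] - 100) / 100


grid = {"max_depth": [3, 5, 7], "n_trees": [50, 100]}
best_params, best_score = grid_search(grid, n_samples=100, fit_and_score=toy_score)
print(best_params, best_score)
```

In practice `toy_score` would be replaced by a function that fits the forest on the training fold and evaluates it on the validation fold.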
Step 10: Construction of the stock trading signals, i.e. the data labels:
The data matrix constructed in Step 8 is input to the random forest algorithm as training data, and the trading signals $Y_j = \{y_1, y_2, \ldots, y_j\}$ are constructed, where $j = 1, 2, \ldots, n$ is the sample number. The trading signals are constructed as follows:
1) Calculate the daily average price $p_j$:
where $C_j$ denotes the stock price, $H_j$ denotes the highest price, and $L_j$ denotes the lowest price;
2) Calculate the cumulative return $V_j$ over the following $k$ days, $k = 1, 2, \ldots, 10$;
3) Construct the trading signal $y_j$;
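The formulas for the three sub-steps are not reproduced in this text, so the sketch below rests on explicit assumptions: the daily average price is taken as the mean of C, H and L; the k-day return is the relative change of that price from day j to day j+k; and the signal is +1, 0 or -1 depending on whether the return lies above, within, or below a symmetric threshold. The threshold value and the function names are hypothetical.

```python
def average_price(c, h, l):
    """Assumed daily average price: mean of C, H and L."""
    return (c + h + l) / 3.0


def build_labels(closes, highs, lows, k=5, threshold=0.02):
    """Build +1 / 0 / -1 trading signals from assumed k-day forward returns."""
    prices = [average_price(c, h, l) for c, h, l in zip(closes, highs, lows)]
    labels = []
    for j in range(len(prices) - k):        # the last k days have no future price
        future_return = (prices[j + k] - prices[j]) / prices[j]
        if future_return > threshold:
            labels.append(1)                # rise
        elif future_return < -threshold:
            labels.append(-1)               # fall
        else:
            labels.append(0)                # flat
    return labels


# Hypothetical flat-bar data (C = H = L) to keep the arithmetic easy to follow.
series = [100.0, 100.0, 110.0, 100.0, 90.0]
labels = build_labels(series, series, series, k=2, threshold=0.05)
print(labels)
```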
Step 11: In-sample training:
The data matrix constructed from the optimal features is input to the random forest model for training, the grid search algorithm is used to optimize the parameters on the new optimal-feature data set, and the result is compared with the actual stock trend to obtain the predicted stock trend and the prediction accuracy;
Step 12: Model evaluation:
In the random forest classification process, the classification results can be represented by a confusion matrix, as shown in Table 1 below:
Table 1 Confusion matrix
            Predicted +1   Predicted 0   Predicted -1
True +1     TP             FZ1           FN1
True 0      FP1            TZ            FN2
True -1     FP2            FZ2           TN
where TP is the correctly classified +1 class, TZ is the correctly classified 0 class, and TN is the correctly classified negative class; FP1 is the 0 class misclassified as +1, FP2 is the -1 class misclassified as +1, FZ1 is the +1 class misclassified as 0, FZ2 is the -1 class misclassified as 0, FN1 is the +1 class misclassified as -1, and FN2 is the 0 class misclassified as -1; FP = FP1 + FP2, FN = FN1 + FN2, FZ = FZ1 + FZ2, and $N = N_{TP} + N_{FN} + N_{FP} + N_{TN} + N_{TZ} + N_{FZ}$ is the total number of samples;
Accuracy is the probability that a prediction on the test set is correct; Recall is the probability that a sample that actually belongs to the positive class is predicted correctly; Precision is the probability that a sample predicted as positive is actually positive. The formulas are, respectively:
$\text{Accuracy} = (TP + TZ + TN) / N$ (10)
$\text{Recall} = TP / (TP + FN)$ (11)
$\text{Precision} = TP / (TP + FP)$ (12)
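The three measures can be computed directly from the nine cells of Table 1. The sketch below applies formulas (11) and (12) with FN = FN1 + FN2 and FP = FP1 + FP2 as defined above, and takes accuracy as the share of correctly classified samples, (TP + TZ + TN) / N, in line with its verbal definition. The cell counts in the example are hypothetical.

```python
def classification_metrics(tp, fz1, fn1, fp1, tz, fn2, fp2, fz2, tn):
    """Accuracy, recall and precision from the 3x3 confusion matrix of Table 1."""
    fp = fp1 + fp2
    fn = fn1 + fn2
    fz = fz1 + fz2
    n = tp + tz + tn + fp + fn + fz          # total number of samples
    if n == 0 or tp + fn == 0 or tp + fp == 0:
        raise ValueError("metrics are undefined for these counts")
    return {
        "accuracy": (tp + tz + tn) / n,      # correctly classified share
        "recall": tp / (tp + fn),            # formula (11)
        "precision": tp / (tp + fp),         # formula (12)
    }


# Hypothetical cell counts, listed row by row as in Table 1.
m = classification_metrics(tp=40, fz1=5, fn1=5,
                           fp1=6, tz=30, fn2=5,
                           fp2=4, fz2=5, tn=20)
print(m)
```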
F is a composite performance index formed as the weighted average of sensitivity (recall) and precision; the closer F is to 1, the better the classification result. Its formula is as follows:
The above metrics are obtained from the confusion matrix. In addition, during the generation of the random forest the bootstrap method is used to produce the training sets. Because this is sampling with replacement, only about 63% of the original data are drawn, and the remaining data do not appear in the training set; these remaining data are the out-of-bag (OOB) data. Using the out-of-bag data to estimate the generalization ability of the random forest algorithm is called OOB estimation. Taking a single tree as the unit, the accuracy measured on the OOB data is $OOB_{score}$ and the error measured is the out-of-bag error $OOB_{error}$; the average of $OOB_{error}$ over all trees is the $OOB'_{error}$ of the random forest, and the smaller $OOB'_{error}$ is, the stronger the generalization ability of the RF. The fitness value Fitness is composed of F and $OOB'_{error}$, and the smaller it is the better. The formulas are as follows:
$OOB_{error} = 1 - OOB_{score}$ (14)
$\text{Fitness} = OOB'_{error} + (1 - F)$ (15)
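Formulas (14) and (15) and the "about 63%" figure can be checked with a few lines of code. The share of distinct samples in a bootstrap sample of size n is 1 - (1 - 1/n)^n, which approaches 1 - 1/e, roughly 0.632, as n grows. In the sketch below the per-tree OOB scores and the F value are hypothetical inputs, and F is passed in as a number because its formula is not reproduced here.

```python
def expected_in_bag_fraction(n):
    """Expected share of distinct samples in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def forest_oob_error(oob_scores):
    """Average of the per-tree OOB_error = 1 - OOB_score (formula 14)."""
    errors = [1.0 - score for score in oob_scores]
    return sum(errors) / len(errors)


def fitness(oob_scores, f_value):
    """Formula (15): Fitness = OOB'_error + (1 - F); smaller is better."""
    return forest_oob_error(oob_scores) + (1.0 - f_value)


print(expected_in_bag_fraction(1000))

# Hypothetical per-tree OOB scores and F value, for illustration only.
value = fitness(oob_scores=[0.8, 0.9, 0.7], f_value=0.75)
print(value)
```

Because both terms lie in [0, 1], the fitness lies in [0, 2], and a value near 0 means low out-of-bag error and an F close to 1 at the same time.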
Step 13: Out-of-sample testing:
After the optimal parameters are determined, the trained random forest model is tested on the test data to obtain the classification results: all preprocessed sample features of the test set are used as the model input, the classification result of the T+k prediction for each sample is obtained and compared with the actual stock trend, yielding the predicted stock trend and the prediction accuracy.
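The comparison of predicted and actual trends in Step 13 can be sketched by tallying the test-set predictions into the layout of Table 1 and computing the share of correct predictions. The label lists below are hypothetical and stand in for the model's T+k predictions and the observed trends.

```python
from collections import Counter

CLASSES = (1, 0, -1)   # row and column order of Table 1


def confusion_matrix(actual, predicted):
    """Rows are true classes, columns are predicted classes, in CLASSES order."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in CLASSES] for a in CLASSES]


def accuracy(actual, predicted):
    """Share of test samples whose predicted trend matches the actual trend."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical actual and predicted trends, for illustration only.
actual = [1, 1, 0, -1, -1, 0]
predicted = [1, 0, 0, -1, 1, 0]
table = confusion_matrix(actual, predicted)
print(table)
print(accuracy(actual, predicted))
```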
CN201910180723.7A 2019-03-11 2019-03-11 A kind of stock yield prediction technique based on improvement random forests algorithm Pending CN110059852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910180723.7A CN110059852A (en) 2019-03-11 2019-03-11 A kind of stock yield prediction technique based on improvement random forests algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910180723.7A CN110059852A (en) 2019-03-11 2019-03-11 A kind of stock yield prediction technique based on improvement random forests algorithm

Publications (1)

Publication Number Publication Date
CN110059852A true CN110059852A (en) 2019-07-26

Family

ID=67316787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910180723.7A Pending CN110059852A (en) 2019-03-11 2019-03-11 A kind of stock yield prediction technique based on improvement random forests algorithm

Country Status (1)

Country Link
CN (1) CN110059852A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659719A (en) * 2019-09-19 2020-01-07 江南大学 Aluminum profile flaw detection method
CN110766222A (en) * 2019-10-22 2020-02-07 太原科技大学 Particle swarm parameter optimization and random forest based PM2.5 concentration prediction method
CN110967713A (en) * 2019-12-10 2020-04-07 南京邮电大学 Single-satellite interference source positioning method based on grid search particle swarm algorithm
CN111199426A (en) * 2019-12-31 2020-05-26 上海昌投网络科技有限公司 WeChat public number ROI estimation method and device based on random forest model
CN111209960A (en) * 2020-01-06 2020-05-29 天津工业大学 CSI system multipath classification method based on improved random forest algorithm
CN112182221A (en) * 2020-10-12 2021-01-05 哈尔滨工程大学 Knowledge retrieval optimization method based on improved random forest
CN112686296A (en) * 2020-12-29 2021-04-20 昆明理工大学 Octane loss value prediction method based on particle swarm optimization random forest parameters
CN113283472A (en) * 2021-04-20 2021-08-20 南京大学 Data feature selection method based on zero-order optimization
CN113298107A (en) * 2020-11-08 2021-08-24 北京工业大学 Waste mobile phone identification method based on differential evolution algorithm-deep forest algorithm
CN113468794A (en) * 2020-12-29 2021-10-01 重庆大学 Temperature and humidity prediction and reverse optimization method for small-sized closed space
CN113505730A (en) * 2021-07-26 2021-10-15 全景智联(武汉)科技有限公司 Model evaluation method, device, equipment and storage medium based on mass data
WO2024031332A1 (en) * 2022-08-09 2024-02-15 深圳市富途网络科技有限公司 Stock trend analysis method and apparatus based on machine learning

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110659719A (en) * 2019-09-19 2020-01-07 江南大学 Aluminum profile flaw detection method
CN110659719B (en) * 2019-09-19 2022-02-08 江南大学 Aluminum profile flaw detection method
CN110766222A (en) * 2019-10-22 2020-02-07 太原科技大学 Particle swarm parameter optimization and random forest based PM2.5 concentration prediction method
CN110766222B (en) * 2019-10-22 2023-09-19 太原科技大学 PM2.5 concentration prediction method based on particle swarm parameter optimization and random forest
CN110967713B (en) * 2019-12-10 2021-12-03 南京邮电大学 Single-satellite interference source positioning method based on grid search particle swarm algorithm
CN110967713A (en) * 2019-12-10 2020-04-07 南京邮电大学 Single-satellite interference source positioning method based on grid search particle swarm algorithm
CN111199426A (en) * 2019-12-31 2020-05-26 上海昌投网络科技有限公司 WeChat public number ROI estimation method and device based on random forest model
CN111199426B (en) * 2019-12-31 2023-09-12 上海昌投网络科技有限公司 WeChat public signal ROI estimation method and device based on random forest model
CN111209960A (en) * 2020-01-06 2020-05-29 天津工业大学 CSI system multipath classification method based on improved random forest algorithm
CN111209960B (en) * 2020-01-06 2024-01-05 天津工业大学 CSI system multipath classification method based on improved random forest algorithm
CN112182221A (en) * 2020-10-12 2021-01-05 哈尔滨工程大学 Knowledge retrieval optimization method based on improved random forest
CN112182221B (en) * 2020-10-12 2022-04-05 哈尔滨工程大学 Knowledge retrieval optimization method based on improved random forest
CN113298107A (en) * 2020-11-08 2021-08-24 北京工业大学 Waste mobile phone identification method based on differential evolution algorithm-deep forest algorithm
CN113298107B (en) * 2020-11-08 2024-05-28 北京工业大学 Waste mobile phone identification method based on differential evolution algorithm-depth forest algorithm
CN113468794A (en) * 2020-12-29 2021-10-01 重庆大学 Temperature and humidity prediction and reverse optimization method for small-sized closed space
CN112686296B (en) * 2020-12-29 2022-07-01 昆明理工大学 Octane loss value prediction method based on particle swarm optimization random forest parameters
CN112686296A (en) * 2020-12-29 2021-04-20 昆明理工大学 Octane loss value prediction method based on particle swarm optimization random forest parameters
CN113283472A (en) * 2021-04-20 2021-08-20 南京大学 Data feature selection method based on zero-order optimization
CN113505730A (en) * 2021-07-26 2021-10-15 全景智联(武汉)科技有限公司 Model evaluation method, device, equipment and storage medium based on mass data
WO2024031332A1 (en) * 2022-08-09 2024-02-15 深圳市富途网络科技有限公司 Stock trend analysis method and apparatus based on machine learning

Similar Documents

Publication Publication Date Title
CN110059852A (en) A kind of stock yield prediction technique based on improvement random forests algorithm
CN103166830B (en) A kind of Spam Filtering System of intelligent selection training sample and method
CN109118013A (en) A kind of management data prediction technique, readable storage medium storing program for executing and forecasting system neural network based
CN111148118A (en) Flow prediction and carrier turn-off method and system based on time sequence
Maknickienė et al. Application of neural network for forecasting of exchange rates and forex trading
CN103105246A (en) Greenhouse environment forecasting feedback method of back propagation (BP) neural network based on improvement of genetic algorithm
CN111723523B (en) Estuary surplus water level prediction method based on cascade neural network
CN109143408B (en) Dynamic region combined short-time rainfall forecasting method based on MLP
CN110348608A (en) A kind of prediction technique for improving LSTM based on fuzzy clustering algorithm
CN107220841A (en) A kind of clustering system based on business data
CN105956798A (en) Sparse random forest-based method for assessing running state of distribution network device
CN110210974A (en) A kind of insider trading discriminating conduct based on particle group optimizing Incremental support vector machine
CN109002839A (en) Efficient feature selection method under a kind of more attributive character environment
Zhang et al. Grade prediction of student academic performance with multiple classification models
CN116702132A (en) Network intrusion detection method and system
Goni et al. Graduate admission chance prediction using deep neural network
CN115018357A (en) Farmer portrait construction method and system for production performance improvement
CN116993548A (en) Incremental learning-based education training institution credit assessment method and system for LightGBM-SVM
Sun Real estate evaluation model based on genetic algorithm optimized neural network
Gökçe et al. Performance comparison of simple regression, random forest and XGBoost algorithms for forecasting electricity demand
Ullah et al. Adaptive data balancing method using stacking ensemble model and its application to non-technical loss detection in smart grids
Sugumar et al. A technique to stock market prediction using fuzzy clustering and artificial neural networks
CN108537663A (en) One B shareB trend forecasting method
Yu et al. Loan Approval Prediction Improved by XGBoost Model Based on Four-Vector Optimization Algorithm
CN115936773A (en) Internet financial black product identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190726