CN115392553A - Method for predicting stock price by using similarity of stock historical K-line data - Google Patents

Method for predicting stock price by using similarity of stock historical K-line data

Info

Publication number
CN115392553A
Authority
CN
China
Prior art keywords
stock
close
stocks
similarity
price
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210976776.1A
Other languages
Chinese (zh)
Inventor
庞小锋
谢博驰
邓亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Expo Finance And Information Technology Co ltd
Original Assignee
Wuhan Expo Finance And Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Expo Finance And Information Technology Co ltd filed Critical Wuhan Expo Finance And Information Technology Co ltd
Priority to CN202210976776.1A priority Critical patent/CN115392553A/en
Publication of CN115392553A publication Critical patent/CN115392553A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for predicting stock prices using the similarity of historical K-line data of stocks, relating to the field of securities investment information processing. The method comprises the following steps: S1, preprocessing the historical K-line data of stocks; S2, calculating the similarity among all stocks and, for a target stock, selecting the several stocks with the highest similarity as similar stocks; S3, training on the historical K-line data of the similar stocks to obtain a model; and S4, using the model to predict the price of the target stock for several days in the future.

Description

Method for predicting stock price by using similarity of stock historical K-line data
Technical Field
The invention relates to the field of processing of securities investment information, in particular to a method for predicting stock prices by utilizing the similarity of historical K-line data of stocks.
Background
The prior patent CN 201810137791.0, "A similar K-line prediction method and device", searches for the one or more stocks with the highest similarity according to historical K-line data of stocks and measures the similarity of two K-line segments with a correlation coefficient, so as to predict the future trend of a stock's K-line. This traditional method predicts the trend approximately correctly but is not ideal for predicting fluctuations, as reflected in differences in fluctuation amplitude and the like.
The prior patent CN 202010187881.8, "Similar K-line search method and search system for stock trend prediction", can match candidate stocks by K-line similarity in different dimensions but does not provide a prediction method.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a method for predicting the stock price by using the similarity of the historical K-line data of the stocks, thereby improving the prediction effect.
In order to achieve the above purposes, the technical scheme adopted by the invention is as follows: a method for predicting stock prices by using similarity of historical K-line data of stocks comprises the following steps:
S1, preprocessing the historical K-line data of stocks;
S2, calculating the similarity among all stocks and, for the target stock, selecting the several stocks with the highest similarity as similar stocks;
S3, training on the historical K-line data of the similar stocks to obtain a model;
S4, using the model to predict the price of the target stock for several days in the future.
On the basis of the technical scheme, the step S1 comprises the following steps:
S101, calculating the ROC of High, Low, Open, Close and Vol;
S102, calculating MACD and RSI;
S103, normalizing the feature values of High, Low, Open, Close, Vol, MACD and RSI; the normalization is calculated as follows:
$$z_i = \frac{x_i - \mathrm{mean}(X)}{\mathrm{std}(X)}$$
feature vector $X = (x_1, x_2, \dots, x_n)$;
where n represents the number of days and $x_n$ the feature value on day n; mean(X) represents the mean of X and std(X) the standard deviation of the feature vector; the stock similarity calculation is based on the normalized feature values of the closing price;
S104, performing dimension reduction on the normalized feature values, converting the floating-point feature values into integers in a specific range.
On the basis of the above technical solution, in step S101, ROC is calculated according to the following formula:
$$\mathrm{ROC}(N) = \frac{P(N) - P(0)}{P(0)}$$
where P(N) represents the stock price on the current day, P(0) the stock price N days ago, and ROC(N) the rate of change over N days; ROC is calculated for High, Low, Open, Close, and Vol, respectively.
On the basis of the above technical solution, in step S102,
MACD is calculated as follows:
(1) Calculating the EMA;
$$\mathrm{EMA}(Close, N) = \frac{2 P_N + (N - 1)\,\mathrm{EMA}(Close, N-1)}{N + 1}$$
where Close indicates that the EMA is taken over the closing price, N represents the number of days, $P_N$ represents the closing price of the current day, EMA(Close, N) represents the EMA of the current day, and EMA(Close, N-1) represents the EMA of the previous day;
(2) Calculating DIF and DEA;
DIF = EMA(12) - EMA(26);
Note: DIF is obtained by subtracting the slow exponential moving average (26-day EMA) from the fast exponential moving average (12-day EMA);
DEA = EMA(DIF, 9);
Note: EMA(DIF, 9) indicates that the EMA is taken over DIF; the 9-day EMA of DIF is the DEA;
(3) Calculating the MACD;
MACD(N) = 2(DIF - DEA);
RSI is calculated as follows:
$$\mathrm{RSI}(N) = 100 - \frac{100}{1 + \mathrm{RS}(N)}$$
where RS(N) represents the average rise of the closing price over N days divided by the average fall of the closing price over N days.
On the basis of the above technical solution, in step S104, the normalized feature values are dimension-reduced using the SAX method, converting the floating-point feature values into integers in a specific range; the SAX conversion process is illustrated as follows:
$$Z = \begin{pmatrix} z_{1,1} & \cdots & z_{1,7} \\ \vdots & \ddots & \vdots \\ z_{t,1} & \cdots & z_{t,7} \end{pmatrix}$$
Z represents the normalized feature value matrix; the horizontal direction represents the 7 feature dimensions: MACD, RSI, High, Low, Open, Close, Vol, and the vertical direction represents the time dimension t;
the feature value matrix is traversed, and each feature value is mapped to an integer in a specific range according to its numerical value.
On the basis of the above technical solution, the step S2 includes the following steps:
Step S201, traversing the other stocks S2, and selecting the normalized closing-price feature vectors $Y_{t,close}$ and $X_{t,close}$ of the target stock S1 and the other stock S2; a normalized closing-price feature vector is $Z_{t,close} = (z_1, z_2, \dots, z_n)$, where n represents the number of days and $z_n$ the normalized feature value of the closing price on day n;
Step S202, aligning the data lengths of $Y_{t,close}$ and $X_{t,close}$ by taking the time points at which both the target stock S1 and the other stock S2 have data, to obtain $Y_t$ and $X_t$;
Step S203, calculating the similarity of $Y_t$ and $X_t$ using a cointegration test algorithm.
On the basis of the above technical solution, in step S203, the calculation method includes:
(1) Cointegration means that there is a long-term stable relationship between the variables $Y_t$ and $X_t$, described by the following cointegration equation:
$$Y_t = \alpha_0 + \alpha_1 X_t + \mu_t$$
Note: $Y_t$ and $X_t$ are time series (t = 1, …, n denotes day t), and $Y_t$ and $X_t$ are the values of Y and X on day t, respectively; $\alpha_0$ and $\alpha_1$ are the parameters of a linear regression model; $\mu_t$ is the disequilibrium error; if the long-term stable relationship between $Y_t$ and $X_t$ holds, the disequilibrium error $\mu_t$ should be a stationary time series with zero expectation, i.e., an I(0) series with a mean of 0;
(2) Estimating the cointegration equation by the OLS method to obtain:
$$\hat{Y}_t = \hat{\alpha}_0 + \hat{\alpha}_1 X_t, \qquad \hat{\mu}_t = Y_t - \hat{Y}_t$$
Note: $\hat{Y}_t$, $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are the fitted values of $Y_t$, $\alpha_0$ and $\alpha_1$, respectively; $\hat{\mu}_t$ represents the residual series;
(3) Testing the order of integration of $\hat{\mu}_t$ to obtain the probability p that the target stock S1 and the other stock S2 are dissimilar;
(4) After the traversal, sorting the probabilities p that the other stocks S2 are dissimilar to the target stock S1, and selecting the k stocks with the smallest p as similar stocks.
On the basis of the above technical solution, step S3 includes the following steps:
S301, using the feature value data of all k similar stocks in the machine learning training process, with p the number of days to predict; to predict High, Low, Open and Close, 4 models need to be trained;
$x_1$    $y_1$
…    …
$x_n$    $y_n$
…    …
$x_k$    $y_k$
$X = (x_1, \dots, x_n, \dots, x_k)$    $Y = (y_1, \dots, y_n, \dots, y_k)$
the training data are described as follows:
$x_n$ (n = 1, 2, …, k) represents the SAX-converted feature values of the nth similar stock:
the horizontal direction represents the 7 feature dimensions: MACD, RSI, High, Low, Open, Close, Vol; the vertical direction represents the time dimension, and the training period is t1; $x_n$ is the independent variable; $x_n$ is the same whichever of High, Low, Open, Close and Vol is concerned, and identical data are used during training;
$y_n$ (n = 1, 2, …, k) represents the normalized feature values of the nth similar stock:
$y_n$ is the dependent variable, with time length p; $y_n$ is the feature data of $x_n$ shifted backward in time by t1; the feature dimension selected for $y_n$ differs among High, Low, Open and Close and is consistent with the dimension to be predicted;
S302, using the GBR algorithm to train on the training data X and Y for High, Low, Open and Close respectively, iterating M times, to obtain 4 GBR models $F_{M,High}(x)$, $F_{M,Low}(x)$, $F_{M,Open}(x)$ and $F_{M,Close}(x)$.
On the basis of the above technical solution, step S302 includes the following steps:
Step S3021, initializing the weak learner;
$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$$
Note: n represents the number of samples, and L(y, F(x)) represents a differentiable loss function;
Step S3022, calculating the iteration parameters of the learner:
(a) Calculating the pseudo-residuals using the negative gradient:
$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$$
Note: m = 1, …, M denotes the mth iteration; i = 1, …, n denotes the ith sample;
(b) Fitting a weak learner $h_m(x)$ to the pseudo-residuals and establishing the terminal regions $R_{jm}$ (j = 1, …, m);
(c) For each terminal region (i.e., each leaf), calculating γ:
$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)$$
Step S3023, updating the model with the iteration parameters of the learner according to the following formula, and iterating M times to obtain a strong learning model;
$$F_m(x) = F_{m-1}(x) + \alpha\,\gamma_m$$
Note: α represents the learning rate.
On the basis of the above technical solution, step S4 includes the following steps:
S401, selecting the test data of the target stock over a test period of length t2;
$$TX = \begin{pmatrix} tx_{1,1} & \cdots & tx_{1,7} \\ \vdots & \ddots & \vdots \\ tx_{t2,1} & \cdots & tx_{t2,7} \end{pmatrix}$$
Note: TX represents the SAX-converted feature values of the stock to be predicted:
the horizontal direction represents the 7 feature dimensions: MACD, RSI, High, Low, Open, Close, Vol;
the vertical direction represents the time dimension, and the test period is t2;
Step S402, using the 4 GBR models $F_{M,High}(x)$, $F_{M,Low}(x)$, $F_{M,Open}(x)$ and $F_{M,Close}(x)$ and the test data to predict High, Low, Open and Close, respectively;
Step S403, according to the normalization formula
$$z = \frac{x - \mathrm{mean}(X)}{\mathrm{std}(X)}$$
the prediction can be denormalized as $x = \mathrm{std}(X) \times z + \mathrm{mean}(X)$, giving the predicted price of the target stock.
The invention has the beneficial effects that:
According to the invention, a machine learning gradient boosting regression algorithm is introduced, and the data are dimension-reduced using the symbolic aggregate approximation (SAX) technique, which greatly improves the time efficiency of training and prediction with the machine learning model; a cointegration method is introduced to compare the similarity of time series, and the historical K-line data of the one or more stocks with the highest similarity are then used for training and prediction, giving a highly accurate prediction; the important technical indicators MACD (moving average convergence divergence), RSI (relative strength index) and ROC (rate of change) are used in calculating similar stocks and training the machine learning model, which improves the prediction effect.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flow chart of the pretreatment of the present invention;
FIG. 3 is a table of eigenvalue mapping breakpoints of the present invention;
FIG. 4 is a flow chart of the calculation of similar stocks of the present invention;
FIG. 5 is a flow chart of GBR algorithm training data of the present invention;
FIG. 6 is an example diagram of K-line stock price prediction according to the present invention;
FIG. 7 is a diagram of a similar K-line embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout.
The technical solution and the beneficial effects of the invention will become clearer from the following description of specific embodiments with reference to the accompanying drawings. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and should not be construed as limiting the present invention.
Referring to fig. 1, an embodiment of the present invention provides a method for stock price prediction by using similarity of historical K-line data of stocks, and the general flow is as follows:
1. Referring to fig. 2, the historical K-line data of the stock, including the highest price High, the lowest price Low, the opening price Open, the closing price Close, the trading volume Vol, and the like, are preprocessed.
(I) First, calculate the ROC of High, Low, Open, Close, and Vol.
ROC is calculated as follows:
$$\mathrm{ROC}(N) = \frac{P(N) - P(0)}{P(0)}$$
P(N) represents the stock price on the current day, P(0) the stock price N days ago, and ROC(N) the rate of change over N days; ROC is calculated for High, Low, Open, Close, and Vol, respectively.
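As an illustration of the ROC step, the following is a minimal Python sketch. The formula image is missing from this text, so the fractional form (P(N) - P(0)) / P(0) is an assumption taken from the symbol definitions above; the function name `roc` and the sample prices are hypothetical.

```python
def roc(prices, n):
    """Rate of change over n days: (P(N) - P(0)) / P(0),
    where P(0) is the value n days before the current day.

    Assumption: a plain fractional change (no x100 scaling)."""
    out = []
    for i, p in enumerate(prices):
        if i < n or prices[i - n] == 0:
            out.append(None)  # not enough history, or ratio undefined
        else:
            base = prices[i - n]
            out.append((p - base) / base)
    return out


# Hypothetical closing prices; the same function would be applied to
# High, Low, Open, Close and Vol separately.
closes = [10.0, 11.0, 12.1, 11.0]
roc_1 = roc(closes, 1)
```

Days without enough history are returned as `None` so that the caller decides how to handle the warm-up period.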
(II) Then calculate MACD and RSI.
MACD is calculated as follows:
1) Calculate the EMA (exponential moving average).
$$\mathrm{EMA}(Close, N) = \frac{2 P_N + (N - 1)\,\mathrm{EMA}(Close, N-1)}{N + 1}$$
Note: Close indicates that the EMA is taken over the closing price, N represents the number of days, $P_N$ denotes the closing price of the current day, EMA(Close, N) denotes the EMA of the current day, and EMA(Close, N-1) denotes the EMA of the previous day.
2) Calculate DIF (difference value) and DEA (the smoothed moving average of DIF).
DIF = EMA(12) - EMA(26)
Note: DIF is obtained by subtracting the slow exponential moving average (26-day EMA) from the fast exponential moving average (12-day EMA).
DEA = EMA(DIF, 9)
Note: EMA(DIF, 9) indicates that the EMA is taken over DIF; the 9-day EMA of DIF is the DEA.
3) Calculate the MACD.
MACD(N) = 2(DIF - DEA)
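The three MACD steps can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: the EMA recursion is the commonly used form (2·P + (N-1)·EMA_prev) / (N+1), the EMA is seeded with the first observation, and the names `ema` and `macd` are hypothetical.

```python
def ema(values, span):
    """Exponential moving average:
    EMA_t = (2 * v_t + (span - 1) * EMA_{t-1}) / (span + 1).

    Assumption: the first value seeds the recursion."""
    out = []
    prev = None
    for v in values:
        prev = v if prev is None else (2 * v + (span - 1) * prev) / (span + 1)
        out.append(prev)
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (DIF, DEA, MACD) series for a list of closing prices."""
    dif = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    dea = ema(dif, signal)                      # DEA = EMA(DIF, 9)
    hist = [2 * (d - e) for d, e in zip(dif, dea)]  # MACD = 2(DIF - DEA)
    return dif, dea, hist


# Hypothetical steadily rising closes: the fast EMA stays above the slow one.
rising = [float(i) for i in range(1, 41)]
dif, dea, hist = macd(rising)
```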
RSI is calculated as follows:
$$\mathrm{RSI}(N) = 100 - \frac{100}{1 + \mathrm{RS}(N)}$$
Note: RS(N) represents the average rise of the closing price over N days divided by the average fall of the closing price over N days.
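A short Python sketch of the RSI calculation follows. It assumes simple (unsmoothed) averages of the daily rises and falls over the last N changes and the standard form RSI = 100 - 100 / (1 + RS); the text does not say which averaging variant is intended, and the function name `rsi` is hypothetical.

```python
def rsi(closes, n):
    """Relative strength index over the last n daily changes.

    Assumption: RS is the simple average rise divided by the simple
    average fall (no Wilder smoothing)."""
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    changes = [b - a for a, b in zip(closes[:-1], closes[1:])][-n:]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        return 100.0  # RS is unbounded when there are no falling days
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical closes with equal rises and falls give a neutral RSI.
neutral = rsi([10.0, 11.0, 10.0, 11.0, 10.0], 4)
```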
(III) Normalize the feature values of High, Low, Open, Close, Vol, MACD and RSI obtained in the previous steps.
The normalization is calculated as follows:
$$z_i = \frac{x_i - \mathrm{mean}(X)}{\mathrm{std}(X)}$$
Feature vector $X = (x_1, x_2, \dots, x_n)$, where n denotes the number of days and $x_n$ the feature value on day n.
mean(X) represents the mean of X, and std(X) represents the standard deviation of the feature vector; the stock similarity calculation is based on the normalized feature values of the closing price.
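The normalization is a standard z-score and can be sketched in Python with the standard library. The text does not say whether std(X) is the population or the sample standard deviation; the population form (`statistics.pstdev`) is an assumption here, and the function name `normalize` is hypothetical.

```python
from statistics import mean, pstdev


def normalize(xs):
    """Z-score a feature vector: z_i = (x_i - mean(X)) / std(X).

    Returns the normalized values together with mean(X) and std(X),
    which are needed later to undo the normalization.
    Assumption: population standard deviation."""
    m, s = mean(xs), pstdev(xs)
    if s == 0:
        raise ValueError("constant series cannot be normalized")
    return [(x - m) / s for x in xs], m, s


# Hypothetical closing prices.
z, m, s = normalize([10.0, 12.0, 14.0])
```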
(IV) Finally, reduce the dimensionality of the normalized feature values using the SAX (symbolic aggregate approximation) method, converting the floating-point feature values into integers in a specific range.
The SAX conversion process is illustrated as follows:
$$Z = \begin{pmatrix} z_{1,1} & \cdots & z_{1,7} \\ \vdots & \ddots & \vdots \\ z_{t,1} & \cdots & z_{t,7} \end{pmatrix}$$
Z represents the normalized feature value matrix; the horizontal direction represents the 7 feature dimensions: MACD, RSI, High, Low, Open, Close, Vol, and the vertical direction represents the time dimension t.
The feature value matrix is traversed, and each feature value is mapped to an integer in a specific range according to its numerical value.
The mapping specific process is as follows:
a) Referring to fig. 3, the number in the leftmost column indicates the alphabet size As, and the numbers to its right form the list of numeric boundaries $Bl = (Bl_1, \dots, Bl_{As-1})$ for the mapped integers, which has length As-1.
b) Selecting As: the larger As is, the more accurate but the more computationally expensive the calculation; the smaller As is, the less accurate but the cheaper the calculation.
c) Traverse Bl: if the feature value is less than $Bl_j$, the feature value is mapped to j; otherwise, the traversal continues.
d) If no mapping value is found, the feature value is mapped to As by default.
The normalized eigenvalues are floating point numbers, which are very time consuming if directly involved in the calculation. After SAX is converted into small integers in a specific range, the calculation amount is greatly reduced, and the efficiency of machine learning training and prediction in the following process is improved.
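The mapping procedure in steps a) to d) can be sketched in Python as follows. The breakpoint table of fig. 3 is not reproduced in this text, so the values below are the commonly published SAX breakpoints for equiprobable regions of a standard normal distribution and are an assumption; the names `BREAKPOINTS`, `sax_symbol` and `sax_matrix` are hypothetical.

```python
from bisect import bisect_right

# Assumed breakpoint lists Bl, keyed by alphabet size As (length As - 1).
BREAKPOINTS = {
    3: [-0.43, 0.43],
    4: [-0.67, 0.0, 0.67],
    5: [-0.84, -0.25, 0.25, 0.84],
}


def sax_symbol(value, alphabet_size):
    """Map a normalized value to an integer in 1..As.

    Equivalent to traversing Bl and returning the first j with
    value < Bl_j, falling back to As when no such j exists."""
    bl = BREAKPOINTS[alphabet_size]
    return bisect_right(bl, value) + 1


def sax_matrix(rows, alphabet_size):
    """Apply the mapping to every cell of a (time x feature) matrix."""
    return [[sax_symbol(v, alphabet_size) for v in row] for row in rows]


# Hypothetical normalized feature rows (two days, three features).
symbols = sax_matrix([[-1.0, 0.5, 2.0], [-0.67, 0.0, 0.1]], 4)
```

Using `bisect_right` rather than an explicit loop keeps the lookup logarithmic in As while preserving the strict "less than" comparison of step c).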
2. Calculate similar stocks for all stocks: the similarity between all stocks is calculated using a cointegration test algorithm, and for each stock the k stocks with the highest similarity are selected.
Referring to FIG. 4, for stock S1, the flow of computing its similar stocks is as follows:
1) Traverse the other stocks S2 and select the normalized closing-price feature vectors $Y_{t,close}$ and $X_{t,close}$ of S1 and S2.
A normalized closing-price feature vector is $Z_{t,close} = (z_1, z_2, \dots, z_n)$, where n represents the number of days and $z_n$ the normalized feature value of the closing price on day n.
2) Align the data lengths of $Y_{t,close}$ and $X_{t,close}$ (taking the time points at which both S1 and S2 have data) to obtain $Y_t$ and $X_t$.
3) Calculate the similarity (dissimilarity probability) of $Y_t$ and $X_t$ using a cointegration test algorithm (the Engle-Granger cointegration test), as follows:
(I) Cointegration means that there is a long-term stable relationship between the variables $Y_t$ and $X_t$, described by the following cointegration equation.
$$Y_t = \alpha_0 + \alpha_1 X_t + \mu_t$$
Note on the formula: $Y_t$ and $X_t$ are time series (t = 1, …, n denotes day t), and $Y_t$ and $X_t$ are the values of Y and X on day t, respectively; $\alpha_0$ and $\alpha_1$ are the parameters of a linear regression model; $\mu_t$ is the disequilibrium error; if the long-term stable relationship between $Y_t$ and $X_t$ holds, then $\mu_t$ should be a stationary time series with zero expected value, i.e., an I(0) series with a mean of 0.
(II) First, estimate the cointegration equation by the OLS (ordinary least squares) method to obtain:
$$\hat{Y}_t = \hat{\alpha}_0 + \hat{\alpha}_1 X_t$$
and
$$\hat{\mu}_t = Y_t - \hat{Y}_t$$
Note on the formula: $\hat{Y}_t$, $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are the fitted values of $Y_t$, $\alpha_0$ and $\alpha_1$, respectively; $\hat{\mu}_t$ represents the residual series.
(III) Then test the order of integration of $\hat{\mu}_t$ to obtain the probability p that S1 and S2 are dissimilar.
If $\hat{\mu}_t$ is a stationary series I(0), the variables $Y_t$ and $X_t$ are considered to be cointegrated of order (1,1); otherwise, $Y_t$ and $X_t$ are considered to have no cointegration relationship.
Specifically, an ADF (augmented Dickey-Fuller) test, i.e., a unit root test, is performed on $\hat{\mu}_t$. If the series is stationary, there is no unit root; otherwise, there is a unit root.
The null hypothesis H0 of the ADF test is that a unit root exists; the confidence of this hypothesis (the p-value of the test) is calculated and taken as the probability that $Y_t$ and $X_t$ have no cointegration relationship, which is also the probability that S1 and S2 are dissimilar.
4) After the traversal, sort the probabilities p that the other stocks are dissimilar to S1, and select the k stocks with the smallest p as similar stocks.
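The similar-stock search can be sketched in pure Python as below. This is a simplified stand-in, not the patent's procedure: it fits the cointegration equation by OLS and then ranks candidates by a plain Dickey-Fuller t-statistic on the residuals (no constant, no lagged differences) instead of an ADF p-value, on the assumption that a more negative statistic corresponds to a smaller p. A full Engle-Granger test with p-values is available in libraries such as statsmodels (`statsmodels.tsa.stattools.coint`). All function names and the sample series are hypothetical, and the series are assumed to be already aligned.

```python
def ols_residuals(y, x):
    """Fit y_t = a0 + a1 * x_t by ordinary least squares; return residuals."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    a1 = sxy / sxx
    a0 = my - a1 * mx
    return [b - (a0 + a1 * a) for a, b in zip(x, y)]


def df_statistic(u):
    """t-statistic of rho in  du_t = rho * u_{t-1} + e_t.

    Strongly negative values indicate a stationary (mean-reverting)
    residual series, i.e. evidence against a unit root."""
    lag = u[:-1]
    diff = [b - a for a, b in zip(u[:-1], u[1:])]
    sll = sum(v * v for v in lag)
    rho = sum(l * d for l, d in zip(lag, diff)) / sll
    resid = [d - rho * l for l, d in zip(lag, diff)]
    s2 = sum(e * e for e in resid) / (len(diff) - 1)
    return rho / (s2 / sll) ** 0.5


def most_similar(target, candidates, k):
    """Return the k candidate names with the most negative statistic."""
    scores = {name: df_statistic(ols_residuals(target, series))
              for name, series in candidates.items()}
    return sorted(scores, key=scores.get)[:k]


# Hypothetical data: "A" tracks the target linearly; "B" does not.
days = list(range(40))
target = [2.0 * t + (0.1 if t % 2 == 0 else -0.1) for t in days]
candidates = {"A": [float(t) for t in days], "B": [float(t * t) for t in days]}
top = most_similar(target, candidates, 1)
```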
3. Training with the historical K-line data of similar stocks: the SAX-converted feature value data of the similar stocks are divided by time into a training period and a testing period, and the training-period data are trained with the machine learning Gradient Boosting Regression (GBR) algorithm to obtain a gradient boosting regression model.
GBR is a technique that achieves good accuracy by combining a set of relatively weak learners, each of which learns from the errors of its predecessors; weak learners include linear regression, logistic regression, polynomial regression, stepwise regression, ridge regression, lasso regression, and elastic net regression.
1) The feature value data of all k similar stocks are used in the machine learning training process; the number of days to predict is p.
To predict High, Low, Open and Close, 4 models need to be trained.
$x_1$    $y_1$
…    …
$x_n$    $y_n$
…    …
$x_k$    $y_k$
$X = (x_1, \dots, x_n, \dots, x_k)$    $Y = (y_1, \dots, y_n, \dots, y_k)$
The training data are described as follows:
$x_n$ (n = 1, 2, …, k) represents the SAX-converted feature values of the nth similar stock; the horizontal direction represents the 7 feature dimensions: MACD, RSI, High, Low, Open, Close, Vol. The vertical direction represents the time dimension, and the training period is t1. $x_n$ is the independent variable. $x_n$ is the same whichever of High, Low, Open, Close and Vol is concerned, and identical data are used during training.
$y_n$ (n = 1, 2, …, k) represents the normalized feature values of the nth similar stock: $y_n$ is the dependent variable and has time length p. $y_n$ is the feature data of $x_n$ shifted backward in time by t1. For High, Low, Open and Close, the feature dimension selected for $y_n$ is consistent with the dimension to be predicted; for example, when Close is predicted, the normalized feature values of Close are selected as training data.
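One possible reading of how a single training pair is assembled is sketched below in Python. The exact layout is not recoverable from this text, so this is an assumption: the first t1 rows of a similar stock's SAX matrix are flattened into $x_n$, and the p normalized target values that follow the window form $y_n$. The function name and the toy data (two features instead of seven) are hypothetical.

```python
def build_training_pair(sax_rows, target_norm, t1, p):
    """Build (x_n, y_n) for one similar stock.

    sax_rows:    list of per-day SAX integer rows (time x features)
    target_norm: normalized values of the dimension being predicted
    x_n: the first t1 days of SAX features, flattened
    y_n: the next p normalized target values (shifted by t1)"""
    if len(sax_rows) < t1 + p or len(target_norm) < t1 + p:
        raise ValueError("need at least t1 + p days of data")
    x = [v for row in sax_rows[:t1] for v in row]
    y = list(target_norm[t1:t1 + p])
    return x, y


# Toy example with 2 features per day, t1 = 2 and p = 2.
sax_rows = [[1, 2], [3, 4], [2, 2], [4, 1]]
close_norm = [0.1, 0.2, 0.3, 0.4]
x_n, y_n = build_training_pair(sax_rows, close_norm, t1=2, p=2)
```

Repeating this for each of the k similar stocks would give the sets X and Y; only `target_norm` changes between the High, Low, Open and Close models.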
2) Using the GBR algorithm, train on the training data X and Y for High, Low, Open and Close respectively, iterating M times, to obtain 4 GBR models $F_{M,High}(x)$, $F_{M,Low}(x)$, $F_{M,Open}(x)$ and $F_{M,Close}(x)$.
Referring to fig. 5, the procedure for obtaining the GBR model from the training data is as follows:
(I) Initialize the weak learner.
$$F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$$
Note: n represents the number of samples (i.e., the number k of similar K-lines from the previous step), and L(y, F(x)) represents the differentiable loss function.
(II) Calculate the iteration parameters of the learner:
a) Calculate the pseudo-residuals using the negative gradient:
$$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$$
Note: m = 1, …, M denotes the mth iteration; i = 1, …, n denotes the ith sample.
b) Fit a weak learner $h_m(x)$ to the pseudo-residuals and establish the terminal regions $R_{jm}$ (j = 1, …, m).
c) For each terminal region (i.e., each leaf), calculate γ:
$$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)$$
(III) Update the model with the iteration parameters of the learner according to the following formula, and iterate M times to obtain the strong learning model $F_M(x)$:
$$F_m(x) = F_{m-1}(x) + \alpha\,\gamma_m$$
Note on the formula: α represents the learning rate.
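The boosting loop above can be illustrated with a small pure-Python sketch. It assumes the squared-error loss L = ½(y - F)², for which the pseudo-residual is simply y - F and the optimal leaf value γ is the mean residual in the leaf, and it uses one-split regression stumps as the weak learner; the patent does not fix either choice, and all names are hypothetical. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would be the more likely choice.

```python
def fit_stump(X, r):
    """Best one-split regression stump for residuals r (squared error).

    Returns (feature, threshold, left_value, right_value); the leaf
    values are the mean residuals, i.e. the optimal gamma per leaf."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[f] <= thr]
            right = [ri for row, ri in zip(X, r) if row[f] > thr]
            gl, gr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - gl) ** 2 for v in left)
                   + sum((v - gr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, thr, gl, gr)
    if best is None:  # no feature varies: fall back to a single leaf
        g = sum(r) / len(r)
        return 0, float("inf"), g, g
    return best[1:]


def fit_gbr(X, y, M=100, alpha=0.5):
    """Gradient boosting: F_m = F_{m-1} + alpha * gamma_m."""
    f0 = sum(y) / len(y)            # F_0 = argmin_gamma sum L(y_i, gamma)
    pred = [f0] * len(y)
    stumps = []
    for _ in range(M):
        r = [yi - pi for yi, pi in zip(y, pred)]   # pseudo-residuals
        f, thr, gl, gr = fit_stump(X, r)
        stumps.append((f, thr, gl, gr))
        pred = [pi + alpha * (gl if row[f] <= thr else gr)
                for row, pi in zip(X, pred)]
    return f0, alpha, stumps


def predict(model, row):
    f0, alpha, stumps = model
    return f0 + alpha * sum(gl if row[f] <= thr else gr
                            for f, thr, gl, gr in stumps)


# Toy data: a step function that a single split can represent.
model = fit_gbr([[1], [2], [3], [4]], [1.0, 1.0, 3.0, 3.0])
```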
4. Price prediction using the model: the GBR model $F_M(x)$ obtained by training in the previous step is applied to the test data to predict the price of the stock for the next p days.
1) Select the test data of the stock to be predicted over a test period of length t2.
The test data are described as follows:
$$TX = \begin{pmatrix} tx_{1,1} & \cdots & tx_{1,7} \\ \vdots & \ddots & \vdots \\ tx_{t2,1} & \cdots & tx_{t2,7} \end{pmatrix}$$
TX represents the SAX-converted feature values of the stock to be predicted; the horizontal direction represents the 7 feature dimensions: MACD, RSI, High, Low, Open, Close, Vol.
The vertical direction represents the time dimension, and the test period is t2.
2) Using the 4 GBR models $F_{M,High}(x)$, $F_{M,Low}(x)$, $F_{M,Open}(x)$ and $F_{M,Close}(x)$ and the test data, predict High, Low, Open and Close, respectively.
3) Denormalize the predicted data to obtain the predicted stock price.
According to the normalization formula
$$z = \frac{x - \mathrm{mean}(X)}{\mathrm{std}(X)}$$
the prediction can be denormalized as $x = \mathrm{std}(X) \times z + \mathrm{mean}(X)$, giving the stock price.
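Denormalization simply inverts the z-score. The Python sketch below shows a round trip on hypothetical prices; it assumes that mean(X) and std(X) were stored when the history was normalized and that the population standard deviation was used, and the function name `denormalize` is hypothetical.

```python
from statistics import mean, pstdev


def denormalize(z_values, mean_x, std_x):
    """Invert the z-score: x = std(X) * z + mean(X)."""
    return [std_x * z + mean_x for z in z_values]


# Round trip on hypothetical closing prices.
history = [10.0, 12.0, 14.0]
m, s = mean(history), pstdev(history)
z = [(x - m) / s for x in history]
restored = denormalize(z, m, s)
```

The model's predicted normalized values would be passed as `z_values`, with `mean_x` and `std_x` taken from the target stock's own price history.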
Referring to fig. 6 and 7, an example of stock price prediction using similar K-line historical data is shown.
The figures are as follows:
1. Fig. 6 is the example for the Three-in-One stock.
2. Fig. 7 is the K-line chart of Boss Electrical Appliances, one of the stocks similar to Three-in-One.
The detailed procedure of the example is as follows:
1. First, the historical K-line data of the Three-in-One stock are obtained for the most recent three months, and the K-line data are preprocessed.
2. Then, using the normalized closing-price feature vector $Y_{t,close}$ of Three-in-One obtained by preprocessing, the probability p that $Y_{t,close}$ has no cointegration relationship with the normalized closing-price feature vector $X_{t,close}$ of each other stock is calculated, and the 5 stocks with the smallest p are selected as similar stocks, including Boss Electrical Appliances, Kolun Pharmaceutical and the like.
3. For the highest price, the lowest price, the opening price and the closing price, the SAX-converted feature values of the similar stocks (Boss Electrical Appliances, Kolun Pharmaceutical and the like) are divided by time into a training period (two months) and a testing period (one month), the number of days to predict is 3, and the training-period data are trained with the GBR algorithm for each price to obtain 4 GBR models.
4. For the highest price, the lowest price, the opening price and the closing price, the price (normalized value) of the stock for the next 3 days is predicted using the corresponding GBR model and test data, and the result is finally denormalized to obtain the predicted stock price.
In the description of the specification, reference to the description of "one embodiment", "preferably", "an example", "a specific example" or "some examples", etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention, and schematic representations of the terms in this specification do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The present invention is not limited to the above-described embodiments, and it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements are also considered to be within the scope of the present invention. Those not described in detail in this specification are within the skill of the art.

Claims (10)

1. A method for predicting stock prices by using similarity of historical K-line data of stocks is characterized by comprising the following steps:
S1, preprocessing the historical K-line data of stocks;
S2, calculating the similarity among all stocks and, for the target stock, selecting the several stocks with the highest similarity as similar stocks;
S3, training on the historical K-line data of the similar stocks to obtain a model;
S4, using the model to predict the price of the target stock for several days in the future.
2. The method for predicting a stock price using the similarity of the stock history K-line data as set forth in claim 1, wherein the step S1 comprises the steps of:
S101, calculating the rate of change (ROC) of High, Low, Open, Close and Vol;
S102, calculating MACD and RSI;
S103, normalizing the feature values of High, Low, Open, Close, Vol, MACD and RSI; the normalization is calculated as follows:
Figure RE-FDA0003881508940000011
for the feature vector X = (x_1, x_2, …, x_n),
where n represents the number of days and x_n represents the feature value on day n; mean(X) represents the mean of X and std(X) represents the standard deviation of X; the subsequent stock similarity calculation is based on the normalized feature values of the closing price;
S104, performing dimension reduction on the normalized feature values, converting the floating-point feature values into integers in a specific range.
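The normalization of step S103 can be sketched as follows. This is a minimal illustration, assuming the formula in the figure is the standard z-score z = (x − mean(X)) / std(X), which agrees with the inverse formula stated in claim 10. The function name `normalize`, the sample prices and the choice of the population standard deviation are assumptions and do not come from the specification.

```python
from statistics import fmean, pstdev

def normalize(values):
    """Z-score a feature vector X = (x_1, ..., x_n).

    Returns the normalized values together with mean(X) and std(X),
    which are needed later to undo the normalization.
    Assumption: population standard deviation (the source does not say which).
    """
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(x - mu) / sigma for x in values], mu, sigma

closes = [10.0, 10.5, 10.2, 10.8, 11.0]  # made-up closing prices
z, mu, sigma = normalize(closes)
```

Keeping `mu` and `sigma` alongside the normalized series is a design choice that makes the later inverse normalization possible without recomputing them.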
3. The method for forecasting stock prices using the similarity of stock history K-line data as claimed in claim 2, characterized in that: in step S101, ROC is calculated as follows:
Figure RE-FDA0003881508940000012
where P(N) represents the stock price on the current day, P(0) represents the stock price N days earlier, and ROC(N) represents the N-day rate of change; ROC is calculated separately for High, Low, Open, Close and Vol.
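A possible implementation of the ROC calculation is sketched below. It assumes the common definition ROC(N) = (P(N) − P(0)) / P(0), expressed as a ratio rather than a percentage; the figure itself is not reproduced in this text, so the exact scaling is an assumption, as is the function name `roc`.

```python
def roc(prices, n):
    """N-day rate of change for each day of a price series.

    For day t the result is (P(t) - P(t - n)) / P(t - n), i.e. the change
    relative to the price n days earlier. The first n days have no
    reference price and are reported as None.
    Assumption: ratio form, not multiplied by 100.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    out = [None] * min(n, len(prices))
    for t in range(n, len(prices)):
        out.append((prices[t] - prices[t - n]) / prices[t - n])
    return out
```

The same function would be applied separately to the High, Low, Open, Close and Vol series.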
4. The method for predicting a stock price using the similarity of the stock history K-line data as claimed in claim 2, wherein, in step S102,
MACD is calculated as follows:
(1) Calculating the EMA;
Figure RE-FDA0003881508940000021
where Close indicates that the EMA is taken over the closing price, N represents the number of days, P_N represents the closing price of the current day, EMA(Close, N) represents the EMA of the current day, and EMA(Close, N−1) represents the EMA of the previous day;
(2) Calculating DIF and DEA;
DIF = EMA(Close, 12) − EMA(Close, 26);
Note: DIF is obtained by subtracting the slow exponential moving average (26-day EMA) from the fast exponential moving average (12-day EMA);
DEA = EMA(DIF, 9);
Note: EMA(DIF, 9) denotes the EMA taken over DIF; the 9-day EMA of DIF is the DEA;
(3) Calculating the MACD;
MACD(N)=2(DIF–DEA);
RSI is calculated as follows:
Figure RE-FDA0003881508940000022
where RS(N) represents the average rise of the closing price over N days divided by the average fall of the closing price over N days.
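The indicator calculations of this claim can be sketched as follows. The DIF, DEA and MACD relations are taken from the text above; the EMA recursion with smoothing factor 2 / (N + 1), the seeding of the EMA with the first value, the use of simple averages for RS and the form RSI = 100 − 100 / (1 + RS) are conventional choices assumed here because the corresponding figures are not reproduced in this text. All function names are illustrative.

```python
def ema(series, n):
    """Exponential moving average; assumed smoothing factor 2 / (n + 1),
    seeded with the first value of the series."""
    alpha = 2.0 / (n + 1)
    out = [series[0]]
    for value in series[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out

def macd(closes, fast=12, slow=26, signal=9):
    """MACD = 2 * (DIF - DEA), with DIF = EMA(12) - EMA(26), DEA = EMA(DIF, 9)."""
    dif = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    dea = ema(dif, signal)
    return [2 * (d - e) for d, e in zip(dif, dea)]

def rsi(closes, n):
    """RSI of the latest day from the last n daily changes of the close.

    RS is the average rise divided by the average fall over n days.
    Assumption: RSI = 100 - 100 / (1 + RS), simple (unsmoothed) averages.
    """
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    window = closes[-(n + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_rise = sum(c for c in changes if c > 0) / n
    avg_fall = sum(-c for c in changes if c < 0) / n
    if avg_fall == 0:
        return 100.0  # no falling day in the window
    return 100.0 - 100.0 / (1.0 + avg_rise / avg_fall)
```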
5. The method for predicting a stock price using the similarity of the stock history K-line data as set forth in claim 2, wherein: in step S104, the normalized eigenvalue is subjected to dimension reduction by using an SAX method, and the floating point eigenvalue is converted into an integer in a specific range; the SAX conversion process is illustrated as follows:
Figure RE-FDA0003881508940000031
represents the normalized feature-value matrix, in which the horizontal direction represents the 7 feature dimensions (MACD, RSI, High, Low, Open, Close and Vol) and the vertical direction represents the time dimension t;
the feature-value matrix is traversed, and each feature value is mapped to an integer in a specific range according to its magnitude.
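The value-to-integer mapping can be illustrated with the sketch below. It assumes the breakpoints of classic SAX, namely the quantiles that split a standard normal distribution into equally probable bins, and an alphabet size of 8; neither the breakpoints nor the integer range is stated in the claim, and the piecewise aggregation step of classic SAX is left out because the claim describes only the mapping. Function names are illustrative.

```python
from bisect import bisect_right
from statistics import NormalDist

def sax_breakpoints(alphabet_size):
    """Cut points that divide N(0, 1) into equally probable bins."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]

def sax_symbols(matrix, alphabet_size=8):
    """Map every normalized value of a (time x feature) matrix to an
    integer in 0 .. alphabet_size - 1 according to the bin it falls in."""
    cuts = sax_breakpoints(alphabet_size)
    return [[bisect_right(cuts, value) for value in row] for row in matrix]
```

Because the input has already been z-scored, the Gaussian breakpoints spread the integers roughly evenly over the range.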
6. The method for predicting a stock price using the similarity of the stock history K-line data as set forth in claim 1, wherein the step S2 comprises the steps of:
Step S201, traversing the other stocks S2 and, for each of them, selecting the normalized closing-price feature vectors Y_{t,close} of the target stock S1 and X_{t,close} of the other stock S2; a normalized closing-price feature vector has the form Z_{t,close} = (z_1, z_2, …, z_n), where n represents the number of days and z_n represents the normalized feature value of the closing price on day n;
Step S202, aligning the data lengths of Y_{t,close} and X_{t,close} by keeping only the time points at which both the target stock S1 and the other stock S2 have data, to obtain Y_t and X_t;
Step S203, calculating the similarity of Y_t and X_t by using a cointegration test algorithm.
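The alignment of step S202 can be sketched as follows, assuming each series is held as a mapping from an ISO-formatted date string to the normalized closing price. This data layout and the function name `align_on_common_dates` are assumptions made for illustration.

```python
def align_on_common_dates(target, other):
    """Keep only the dates on which both stocks have data.

    target, other: dicts mapping 'YYYY-MM-DD' -> normalized close.
    Returns (Y_t, X_t, dates) in chronological order; ISO date strings
    sort chronologically, so a plain sort is sufficient.
    """
    dates = sorted(target.keys() & other.keys())
    return [target[d] for d in dates], [other[d] for d in dates], dates
```

Days on which either stock was suspended or not yet listed simply drop out of both series.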
7. The method for predicting stock prices using the similarity of the stock history K-line data as set forth in claim 6, wherein the calculating method comprises:
(1) Cointegration means that there is a long-term stable relationship between the variables Y_t and X_t, described by the following cointegration equation:
Y_t = α_0 + α_1 X_t + μ_t
Note: Y_t and X_t are time series, where t = 1, …, n denotes day t, and Y_t and X_t are the values of Y and X on day t, respectively; α_0 and α_1 are the parameters of the linear regression model; μ_t is the non-equilibrium error; if the long-term stable relationship between Y_t and X_t holds, the non-equilibrium error μ_t is a stationary time series with an expected value of zero, i.e., an I(0) series with a mean of 0;
(2) The cointegration equation is estimated by the OLS method to obtain:
Figure RE-FDA0003881508940000041
and
Figure RE-FDA0003881508940000042
Note:
Figure RE-FDA0003881508940000043
and
Figure RE-FDA0003881508940000044
are the fitted values of Y_t, α_0 and α_1;
Figure RE-FDA0003881508940000045
represents the residual sequence;
(3) Testing the order of integration of
Figure RE-FDA0003881508940000046
to obtain the probability p that the target stock S1 and the other stock S2 are not similar;
(4) After the traversal, the other stocks S2 are sorted by the probability p of not being similar to the target stock S1, and the k stocks with the smallest p are selected as similar stocks.
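The two-step procedure of this claim resembles the Engle-Granger approach, which can be sketched as below. The sketch fits the cointegration equation by OLS, forms the residuals, and computes a simple Dickey-Fuller t-statistic on them (no constant, no lagged differences). Converting that statistic into the probability p requires tabulated critical values, which a library routine such as the `coint` function in statsmodels supplies; here the statistic itself is used for ranking as a stand-in, with a more negative value meaning stronger evidence of a stable relationship. The function names, the synthetic data and this substitution are all assumptions.

```python
import random
from statistics import fmean

def ols(x, y):
    """Fit y = a0 + a1 * x by ordinary least squares; return (a0, a1)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    a1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - a1 * mx, a1

def residual_df_stat(y, x):
    """Regress Y_t on X_t, then return the Dickey-Fuller t-statistic of
    the residual sequence (regression of its first difference on its lag)."""
    a0, a1 = ols(x, y)
    e = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    lagged = e[:-1]
    diff = [b - a for a, b in zip(e, e[1:])]
    s_ll = sum(v * v for v in lagged)
    rho = sum(l * d for l, d in zip(lagged, diff)) / s_ll
    sse = sum((d - rho * l) ** 2 for l, d in zip(lagged, diff))
    std_err = (sse / (len(diff) - 1) / s_ll) ** 0.5
    return rho / std_err

def most_similar(target, candidates, k):
    """Return the names of the k candidates whose residuals look most
    stationary (most negative statistic first); stand-in for smallest p."""
    scored = sorted(
        (residual_df_stat(target, series), name)
        for name, series in candidates.items()
    )
    return [name for _, name in scored[:k]]

# Synthetic data: one candidate shares the target's trend, one does not.
rng = random.Random(0)
walk = [0.0]
other_walk = [0.0]
for _ in range(199):
    walk.append(walk[-1] + rng.gauss(0, 1))
    other_walk.append(other_walk[-1] + rng.gauss(0, 1))
target = [2.0 * w + 1.0 + rng.gauss(0, 0.1) for w in walk]
candidates = {"tracks_target": walk, "independent": other_walk}
print(most_similar(target, candidates, 1))
```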
8. The method for predicting a stock price using the similarity of the stock history K-line data as set forth in claim 1, wherein the step S3 comprises the steps of:
S301, in the machine learning training process, using the feature-value data of all k similar stocks, with p denoting the number of days to predict; to predict High, Low, Open and Close, 4 models need to be trained;
Figure RE-FDA0003881508940000047
the training data are described as follows:
x_n, n = 1, 2, …, k, represents the SAX-converted feature values of the n-th similar stock:
the horizontal direction represents the 7 feature dimensions (MACD, RSI, High, Low, Open, Close and Vol); the vertical direction represents the time dimension, and the training period is t1; x_n is the independent variable; x_n does not differ among High, Low, Open, Close and Vol, i.e., the same data are used when training each model;
y_n, n = 1, 2, …, k, represents the normalized feature values of the n-th similar stock:
y_n is the dependent variable, with a time length of p; y_n is the feature data of x_n shifted backward in time by t1; y_n differs among High, Low, Open and Close, the selected feature dimension being the one to be predicted;
S302, using the GBR algorithm to train on the training data X and Y separately for High, Low, Open and Close, iterating M times, to obtain 4 GBR models F_{M,High}(x), F_{M,Low}(x), F_{M,Open}(x) and F_{M,Close}(x).
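One possible reading of the training data layout in step S301 is sketched below: for each similar stock, the first t1 days of SAX integers form the independent variable, and the following p days of the normalized target dimension form the dependent variable. The claim leaves the exact layout open, so the flattening of the t1 × 7 window, the function name `build_training_pair` and the argument names are assumptions.

```python
FEATURES = ["MACD", "RSI", "High", "Low", "Open", "Close", "Vol"]

def build_training_pair(sax_rows, norm_rows, t1, p, target):
    """Build (x_n, y_n) for one similar stock.

    sax_rows:  per-day rows of 7 SAX integers, in FEATURES order.
    norm_rows: per-day rows of 7 normalized values, same order and dates.
    x_n is the t1-day SAX window flattened row by row; y_n holds the
    normalized `target` column for the p days that follow the window.
    """
    if len(sax_rows) < t1 + p or len(norm_rows) < t1 + p:
        raise ValueError("need at least t1 + p days of data")
    col = FEATURES.index(target)
    x = [value for row in sax_rows[:t1] for value in row]
    y = [row[col] for row in norm_rows[t1:t1 + p]]
    return x, y
```

Calling the function four times with `target` set to "High", "Low", "Open" and "Close" yields the same x_n each time and a different y_n, which matches the statement that the independent variable is shared by the 4 models.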
9. The method for predicting a stock price using the similarity of the stock history K-line data as set forth in claim 1, wherein the step S302 comprises the steps of:
Step S3021, initializing the weak learner:
Figure RE-FDA0003881508940000051
Note: n represents the number of samples, and L(y, F(x)) represents a differentiable loss function;
Step S3022, calculating the iteration parameters of the learner:
(a) calculating the pseudo-residuals using the negative gradient:
Figure RE-FDA0003881508940000052
Note: m = 1, …, M denotes the m-th iteration; i = 1, …, n denotes the i-th sample;
(b) fitting a weak learner h_m(x) to the pseudo-residuals and establishing terminal regions R_{jm}, j = 1, …, m;
(c) for each terminal region, calculating γ:
Figure RE-FDA0003881508940000053
S3023, updating the model with the iteration parameters of the learner according to the following formula, and iterating M times to obtain a strong learner model:
F_m(x) = F_{m−1}(x) + α γ_m
Note: α represents the learning rate.
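The boosting loop of steps S3021 to S3023 can be illustrated with the following sketch. It assumes a squared-error loss, for which the initial constant is the mean of y, the pseudo-residual is y − F(x) and the optimal γ of a terminal region is the mean residual in that region, and it uses depth-1 regression trees (stumps) as weak learners. The claim names neither the loss nor the tree depth, so both are assumptions, as are all function names.

```python
from statistics import fmean

def fit_stump(xs, residuals):
    """Weak learner: the one-feature threshold split with the lowest
    squared error. Returns (feature, threshold, left_gamma, right_gamma)."""
    best = None
    for f in range(len(xs[0])):
        # every unique value except the largest is a valid threshold
        for thr in sorted({row[f] for row in xs})[:-1]:
            left = [r for row, r in zip(xs, residuals) if row[f] <= thr]
            right = [r for row, r in zip(xs, residuals) if row[f] > thr]
            lg, rg = fmean(left), fmean(right)
            err = sum((r - lg) ** 2 for r in left) + sum((r - rg) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, thr, lg, rg)
    if best is None:  # no split possible: a single constant region
        gamma = fmean(residuals)
        return (0, float("inf"), gamma, gamma)
    return best[1:]

def predict_stump(stump, row):
    f, thr, left_gamma, right_gamma = stump
    return left_gamma if row[f] <= thr else right_gamma

def fit_gbr(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared-error loss (assumed)."""
    f0 = fmean(ys)                      # S3021: best constant model
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):           # m = 1 .. M
        residuals = [y - p for y, p in zip(ys, preds)]   # (a) negative gradient
        stump = fit_stump(xs, residuals)                 # (b) regions, (c) gamma
        stumps.append(stump)
        # S3023: F_m(x) = F_{m-1}(x) + alpha * gamma_m
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, xs)]
    return f0, stumps, learning_rate

def predict_gbr(model, row):
    f0, stumps, learning_rate = model
    return f0 + learning_rate * sum(predict_stump(s, row) for s in stumps)
```

In practice a library implementation would normally be preferred; the hand-written loop is shown only to make the correspondence with the claimed steps visible.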
10. The method for predicting a stock price using the similarity of the stock history K-line data as set forth in claim 1, wherein the step S4 comprises the steps of:
S401, selecting the test data of the target stock in the test period, with a time length of t2;
Figure RE-FDA0003881508940000061
Note: TX represents the SAX-converted feature values of the stock to be predicted:
the horizontal direction represents the 7 feature dimensions (MACD, RSI, High, Low, Open, Close and Vol);
the vertical direction represents the time dimension, and the test period is t2;
Step S402, using the 4 GBR models F_{M,High}(x), F_{M,Low}(x), F_{M,Open}(x) and F_{M,Close}(x) and the test data to predict High, Low, Open and Close, respectively;
Step S403, according to the normalization formula
Figure RE-FDA0003881508940000062
inverse normalization is performed as x = std(X) × z + mean(X), giving the predicted price of the target stock.
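The inverse normalization of step S403 amounts to one line per predicted value, as sketched below. The sketch assumes that mean(X) and std(X) are the statistics of the target stock's price series that were used when it was normalized; the claim does not say over which window they are taken. The function name and the numbers are illustrative.

```python
def denormalize(z_values, mean, std):
    """Apply x = std(X) * z + mean(X) to each predicted normalized value."""
    return [std * z + mean for z in z_values]

# Made-up normalized predictions for 3 days, with mean 10.0 and std 2.0.
predicted_close = denormalize([0.0, 1.0, -0.5], mean=10.0, std=2.0)
```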
CN202210976776.1A 2022-08-15 2022-08-15 Method for predicting stock price by using similarity of stock historical K-line data Pending CN115392553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210976776.1A CN115392553A (en) 2022-08-15 2022-08-15 Method for predicting stock price by using similarity of stock historical K-line data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210976776.1A CN115392553A (en) 2022-08-15 2022-08-15 Method for predicting stock price by using similarity of stock historical K-line data

Publications (1)

Publication Number Publication Date
CN115392553A true CN115392553A (en) 2022-11-25

Family

ID=84118115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210976776.1A Pending CN115392553A (en) 2022-08-15 2022-08-15 Method for predicting stock price by using similarity of stock historical K-line data

Country Status (1)

Country Link
CN (1) CN115392553A (en)

Similar Documents

Publication Publication Date Title
CN107765347B (en) Short-term wind speed prediction method based on Gaussian process regression and particle filtering
CN109376892B (en) Equipment state prediction method based on life cycle stage of equipment
CN111680870B (en) Comprehensive evaluation method for quality of target motion trail
CN110751339A (en) Method and device for predicting corrosion rate of pipeline and computer equipment
CN116448419A (en) Zero sample bearing fault diagnosis method based on depth model high-dimensional parameter multi-target efficient optimization
CN110163743A (en) A kind of credit-graded approach based on hyperparameter optimization
CN114239397A (en) Soft measurement modeling method based on dynamic feature extraction and local weighted deep learning
CN115115416B (en) Commodity sales predicting method
CN105046203B (en) The adaptive hierarchy clustering method of satellite telemetering data based on angle DTW distances
CN109359388A (en) A kind of Complex simulation systems credibility evaluation method
CN117686442B (en) Method, system, medium and equipment for detecting diffusion concentration of chloride ions
CN105894138A (en) Optimum weighted composite prediction method for shipment amount of manufacturing industry
Ding et al. Dirichlet process mixture models with shrinkage prior
CN111967168A (en) Optimization design method for accelerated degradation test scheme
CN115392553A (en) Method for predicting stock price by using similarity of stock historical K-line data
CN116151108A (en) Method and device for predicting residual life of aluminum electrolysis cell
CN110673470B (en) Industrial non-stationary process soft measurement modeling method based on local weighting factor model
CN114491699A (en) Three-dimensional CAD software usability quantification method and device based on expansion interval number
Gong et al. Confidence calibration for intent detection via hyperspherical space and rebalanced accuracy-uncertainty loss
CN113506021A (en) Index dimensionless processing method for comprehensive evaluation
CN113035363A (en) Probability density weighted genetic metabolic disease screening data mixed sampling method
CN110580494A (en) Data analysis method based on quantile logistic regression
CN114386196B (en) Method for evaluating mechanical property prediction accuracy of plate strip
CN115184859B (en) Method for eliminating ranging and angle measurement errors under construction of non-line-of-sight propagation scene
Thant et al. Impact of Normalization Techniques in Microarray Data Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination