AU2018101131A4 - A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm - Google Patents

Info

Publication number
AU2018101131A4
Authority
AU
Australia
Prior art keywords
stock
data
value
error
algorithm
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2018101131A
Inventor
Yitao Liu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-08-11
Filing date
2018-08-11
Publication date
2018-09-13
Application filed by Individual
Priority to AU2018101131A
Application granted
Publication of AU2018101131A4
Legal status: Ceased (current)
Anticipated expiration

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

There is provided a parameter-optimization process, based on the minimum prediction error, for choosing the optimal parameter k for KNN, comprising: defining the range of k; running the KNN algorithm with every k-value in the defined range to obtain different predicted prices; comparing the predicted prices with their actual prices; calculating the error between them; and finding the smallest error and, from it, the corresponding k-value, which is the best k-value.
Fig. 1: Step 1: The acquisition of the financial stock data; Step 2: The design of the KNN algorithm; Step 3: The parameter optimization of the KNN algorithm; Step 4: The real-time prediction through the method.

Description

A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm
Field of the invention
[1] This invention relates to information processing, in particular to a real-time, accurate stock price prediction system based on the KNN algorithm.
Background of the invention
[2] With the development of society and the economy, more and more people try to increase their personal wealth by investing in stocks. A stock is a type of negotiable security: a certificate issued by an incorporated company that certifies the shares held by a shareholder (Reference [1]). Stock investment means using one's capital to buy and sell stocks and earning profits from the price difference between buying and selling in the stock market (Reference [2]). However, stock investing is very risky; in general, the chance of losing money is as high as the chance of making money. This is due to the complexity of the stock market and the uncertainty of stock prices. The stock market is a secondary market built on the primary market, so it is more complex than the primary market and has broader functions and influence (Reference [3]). The stock market is closely tied to a region's economy and society, and the factors that affect prices are correspondingly complicated. A company's operating condition, its industry outlook, and the region's overall economy all continuously affect the stock price. Social and political factors can also change the price trend dramatically, and psychological factors cannot be ignored either. These complicated factors make the stock market highly risky and uncertain.
[3] Therefore, for investors, continuous analysis and accurate prediction of stock prices are extremely important. So far, there are two main methods of stock analysis: Fundamental Analysis and Technical Analysis. Fundamental Analysis focuses on the intrinsic value of the enterprise. By analyzing in detail the factors that determine the value of a company and the price of its stock, such as the macroeconomic situation, the future development of the industry, and the company's business situation, people can roughly estimate the long-term investment value and margin of safety of a listed company, compare them with the current stock price, and form a corresponding investment recommendation (Reference [4]). However, Fundamental Analysis has some limitations. As described above, it places high demands on investors. Ordinary investors have limited access to policy, industry, and company information, and institutions normally do not publish their research reports, so the sources of information are relatively closed. Moreover, because fraudulent and misleading information is widespread, investors also need strong information-processing skills. Compared with large capitalists, ordinary investors are at a disadvantage in capital, access to information, and knowledge, so they are unlikely to earn stable excess profits from stock investment by using Fundamental Analysis (Reference [5]).
[4] Therefore, an ordinary investor who wants to understand how stock prices change must rely on Technical Analysis, which examines historical price patterns and various technical indicators. As a technique based on real data, Technical Analysis has several main advantages. First, and most importantly, it is straightforward: ordinary investors usually cannot pick out the useful information from complex data, but Technical Analysis can extract it and present it directly. Second, it is versatile: different investors have different investment goals, and Technical Analysis offers different types of analysis to meet those needs. Finally, it is uniform: although stocks of different companies have different prices and trends, they can be treated in the same way from the data point of view, so by comparing them investors can judge whether to buy or sell different stocks.
[5] Over the past decades, the rapid development of Technical Analysis has benefited from the increasing maturity of big data, cloud computing, machine learning, and related technologies. As machine-learning theory and technology have developed, the analysis and prediction of the stock market have become more advanced. More and more researchers hope to use machine-learning algorithms to analyze and mine the stock market in depth and discover the rules behind its rises and falls. In recent years, studying stocks with machine-learning algorithms has been a popular topic for scholars both domestically and internationally. Some researchers start from the influence of news reports (References [6], [7]) and social media (References [8], [9]) on the market, while others build their models directly from historical prices (Reference [5]). Although these algorithms differ, they all make full use of the computer's ability to calculate quickly and process complex data.
[6] Therefore, this invention adopts a machine-learning algorithm, the KNN algorithm, to predict stock prices accurately.
SUMMARY OF THE INVENTION
[7] There is provided a parameter-optimization process, based on the minimum prediction error, for choosing the optimal parameter k for KNN, comprising: defining the range of k; running the KNN algorithm with every k-value in the defined range to obtain different predicted prices; comparing the predicted prices with their actual prices; calculating the error between them; and finding the smallest error and, from it, the corresponding k-value, which is the best k-value.
[8] The parameter-optimization process is designed to improve the KNN algorithm: with the best parameter k, the KNN algorithm gives its best prediction.
[9] This invention focuses on the real-time, accurate prediction of stock prices. Following the idea of Technical Analysis, it uses machine-learning technology to analyze and predict stock prices. It is useful and convenient for ordinary investors because it can predict stock prices without requiring much professional information or knowledge: given only certain stock data, it provides reliable predictions. The method is based on the KNN algorithm. The KNN algorithm, also known as the K-Nearest Neighbor algorithm, is an effective classification method. It measures the distance between a new input and the known data, chooses the k closest data points, and determines the category of the new input from these k neighbors. Because a stock price is not a category, the invention uses KNN regression instead: it averages the prices of the k neighbors and returns the result as the predicted price of the new stock. The invention achieves its goal in five basic steps: 1) calculate the Euclidean distances between the new data and the known data in the database; 2) sort the Euclidean distances in ascending order; 3) select the k data points closest to the new data (smallest Euclidean distance); 4) use regression to compute the average of the prices of these k closest data points; 5) return this average as the predicted price of the new stock.
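As a rough illustration of the five steps above, the following Python sketch computes a KNN-regression price prediction. It assumes NumPy; the function name `knn_predict_price` and the toy data are hypothetical, not taken from the patent, and the code is a sketch under these assumptions rather than the inventor's implementation.

```python
import numpy as np


def knn_predict_price(X_db: np.ndarray, y_db: np.ndarray, x_new: np.ndarray, k: int) -> float:
    """Hypothetical helper: predict a price as the mean price of the k nearest rows."""
    if not 1 <= k <= len(X_db):
        raise ValueError("k must satisfy 1 <= k <= number of database rows")
    # 1) Euclidean distance from x_new to every row of the database
    distances = np.sqrt(((X_db - x_new) ** 2).sum(axis=1))
    # 2) Row indexes ordered by ascending distance (stable sort keeps ties in row order)
    order = np.argsort(distances, kind="stable")
    # 3) Keep the k closest rows
    nearest = order[:k]
    # 4) + 5) Average their prices and return the average as the prediction
    return float(y_db[nearest].mean())


# Toy data (made up): two characteristic values per row and one price per row
X_db = np.array([[1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
y_db = np.array([100.0, 102.0, 150.0])
print(knn_predict_price(X_db, y_db, np.array([1.5, 1.5]), k=2))  # 101.0
```

The two rows nearest to [1.5, 1.5] have prices 100 and 102, so the prediction is their mean, 101.0.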
[10] In addition, to obtain better predictions, a parameter-optimization program is designed. In the KNN algorithm, the choice of the new data's k neighbors is important: different k-values give different predicted prices, so what is the best k-value? The parameter-optimization program answers this question; after running it, the best k-value, in other words the best parameter, is obtained.
[11] Using the optimized k-value, the invention is able to predict stock prices accurately.
Brief description of the drawings
[12] Features of the present disclosure are illustrated by way of non-limiting examples, and like numerals indicate like elements, in which:
Fig. 1 illustrates a real-time stock prediction method based on the KNN algorithm in accordance with the present invention;
Fig. 2 illustrates a process of parameter optimization of the KNN algorithm in accordance with the present invention;
Fig. 3 illustrates an implementation process in accordance with the present invention;
Fig. 4 illustrates a format of the data in accordance with the present invention;
Fig. 5(a) illustrates the trend of the predicted prices in accordance with the present invention; and Fig. 5(b) illustrates the trend of the actual prices.
Best modes of the invention
[13] To overcome the shortcomings of existing technology, this invention offers a real-time stock prediction method based on the KNN algorithm. By thoroughly mining the effective information in stock data, it addresses the problem of predicting stock prices accurately and precisely, and provides ordinary investors with a scientific basis for stock investment.
[14] The technical solution of the invention is implemented as follows.
[15] The real-time stock prediction method based on the KNN algorithm includes the acquisition of the financial stock data, the design of the KNN algorithm, the parameter optimization of the KNN algorithm (flow chart), and the real-time prediction through the method.
[16] Fig. 1 illustrates a real-time stock prediction method based on the KNN algorithm in accordance with the present invention.
[17] Step 1: The acquisition of the financial stock data: the invention uses a web crawler to obtain stock data.
[18] Step 2: The design of the KNN algorithm: the K-Nearest Neighbor algorithm (KNN algorithm for short) classifies by measuring the distances between characteristic values. Its idea is simple: if most of the k samples most similar to a new sample belong to a certain category, then the new sample also belongs to that category. In general, a new data point is input, the k data points in the sample database most similar to it are selected, and the category to which most of the selected data belong is chosen. The algorithm can be roughly divided into 5 steps: 1) calculate the distances between the current point and the known points in the database; 2) sort the distances in ascending order; 3) select the k points nearest to the current point; 4) determine the frequency of occurrence of each category among the selected points; 5) return the category with the highest frequency as the predicted category of the current point.
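For comparison with the regression variant the invention actually uses, the classic majority-vote KNN classifier described in the five steps above might look like this in Python. The helper name `knn_classify` and the "up"/"down" labels are illustrative assumptions, not part of the patent.

```python
from collections import Counter

import numpy as np


def knn_classify(X_db: np.ndarray, labels: list, x_new: np.ndarray, k: int):
    """Hypothetical helper: most frequent label among the k nearest rows of X_db."""
    # Steps 1-3: Euclidean distances, ascending sort, keep the k nearest indexes
    distances = np.linalg.norm(X_db - x_new, axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    # Step 4: count how often each label occurs among the neighbours
    votes = Counter(labels[i] for i in nearest)
    # Step 5: return the label with the highest frequency
    return votes.most_common(1)[0][0]


# Made-up points with illustrative labels
X_db = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
labels = ["down", "down", "up", "up"]
print(knn_classify(X_db, labels, np.array([5.5, 5.0]), k=3))  # up
```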
[19] Because this invention is designed to predict stock prices, its algorithm differs from the 5 steps above: the final output is a price rather than a label, so the invention uses regression instead of classification.
[20] First, the sample database is obtained from the stock's historical data. Assuming the size of the sample database is m, the samples can be expressed as:
$$\{(X_1, y_1), (X_2, y_2), \ldots, (X_m, y_m)\} \tag{1}$$
[21] In formula (1), $X_i$ represents the data of the stock, so a matrix of the stock data can be built:
$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix} \tag{2}$$
$y_i$ represents the price of the stock, so a matrix of the stock prices can be built:
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \tag{3}$$
[22] Assuming each stock has n characteristic values:
$$X_i = [x_{i1}, x_{i2}, \ldots, x_{in}], \quad i = 1, 2, \ldots, m \tag{4}$$
[23] The goal of the invention is to input a new stock $\hat{X}$ and then predict and output its price $\hat{y}$. The specific steps of the method are as follows:
Step 2.1: Calculate the distances between the current data and the known data from the database:
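Step 2.1.1: Combine X in formula (2) with $X_i$ in formula (4) to form the database X, in which each row represents one stock's data and each stock has n characteristic values:
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \tag{5}$$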
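Step 2.1.2: Input a new stock data point $\hat{X} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n]$ and duplicate it into m rows to form an m × n matrix, which makes it convenient to find the distances between $\hat{X}$ and all stocks $X_i$ in the database:
$$T = \begin{bmatrix} \hat{x}_1 & \hat{x}_2 & \cdots & \hat{x}_n \\ \hat{x}_1 & \hat{x}_2 & \cdots & \hat{x}_n \\ \vdots & \vdots & \ddots & \vdots \\ \hat{x}_1 & \hat{x}_2 & \cdots & \hat{x}_n \end{bmatrix} \tag{6}$$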
Step 2.1.3: Finding the Euclidean distance: the Euclidean metric is a commonly used definition of distance. It is the true distance between two points in n-dimensional space, or the natural length of a vector (the distance from the point to the origin). For two points x and y in n-dimensional space, the Euclidean distance is:
$$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2} \tag{7}$$
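As a quick arithmetic check of formula (7), take two made-up points (not data from the patent) in 3-dimensional space, x = (1, 2, 3) and y = (4, 6, 3):
$$d(x, y) = \sqrt{(1-4)^2 + (2-6)^2 + (3-3)^2} = \sqrt{9 + 16 + 0} = \sqrt{25} = 5$$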
The following 4 steps apply formula (7) in this invention:
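Step 2.1.3.1: Calculate the difference in each characteristic value between the new data and every data point in the database by subtracting X in equation (5) from T in equation (6):
$$T - X = \begin{bmatrix} \hat{x}_1 - x_{11} & \hat{x}_2 - x_{12} & \cdots & \hat{x}_n - x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{x}_1 - x_{m1} & \hat{x}_2 - x_{m2} & \cdots & \hat{x}_n - x_{mn} \end{bmatrix} \tag{8}$$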
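Step 2.1.3.2: Square every element of the matrix T - X in equation (8):
$$Q = \begin{bmatrix} (\hat{x}_1 - x_{11})^2 & \cdots & (\hat{x}_n - x_{1n})^2 \\ \vdots & \ddots & \vdots \\ (\hat{x}_1 - x_{m1})^2 & \cdots & (\hat{x}_n - x_{mn})^2 \end{bmatrix} \tag{9}$$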
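Step 2.1.3.3: Find the sum of each row of Q in equation (9). Each sum combines the squared differences over all characteristic values between the new data and one data point in the database:
$$d = \begin{bmatrix} d_1 \\ \vdots \\ d_m \end{bmatrix}, \quad d_i = \sum_{j=1}^{n} (\hat{x}_j - x_{ij})^2 \tag{10}$$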
Step 2.1.3.4: Take the square root of each row of d in (10) to obtain the Euclidean distance D:
$$D = \begin{bmatrix} \sqrt{d_1} \\ \vdots \\ \sqrt{d_m} \end{bmatrix} \tag{11}$$
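Steps 2.1.2 to 2.1.3.4 map directly onto array operations. The following NumPy sketch (the function name is hypothetical) builds the tiled matrix T explicitly so that each line mirrors one of formulas (6) to (11); it is an illustration, not the patent's code.

```python
import numpy as np


def euclidean_distances_by_tiling(X: np.ndarray, x_new: np.ndarray) -> np.ndarray:
    """Hypothetical helper: distances from x_new to every row of X (Steps 2.1.2-2.1.3.4)."""
    m = X.shape[0]
    T = np.tile(x_new, (m, 1))  # Step 2.1.2: repeat the new row m times, m x n, formula (6)
    diff = T - X                # Step 2.1.3.1: T - X, formula (8)
    Q = diff ** 2               # Step 2.1.3.2: element-wise square, formula (9)
    d = Q.sum(axis=1)           # Step 2.1.3.3: row sums, formula (10)
    return np.sqrt(d)           # Step 2.1.3.4: square roots, formula (11)


X = np.array([[1.0, 2.0, 3.0], [4.0, 6.0, 3.0]])
x_new = np.array([4.0, 6.0, 3.0])
print(euclidean_distances_by_tiling(X, x_new))  # [5. 0.]
```

In practice NumPy broadcasting (`X - x_new`) produces the same differences without materializing T, which saves memory when m is large.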
Step 2.2: Sort the Euclidean distances in an ascending order:
Step 2.2.1: Convert the matrix D in equation (11) into a list:
$$D = [D_1, D_2, \ldots, D_m], \quad D_i = \sqrt{d_i} \tag{12}$$
Step 2.2.2: Sort the list D in ascending order to get a new list S:
$$S = [s_1, s_2, \ldots, s_m], \quad D_{s_1} \le D_{s_2} \le \cdots \le D_{s_m} \tag{13}$$
The elements of list S are the original indexes of the elements of D, arranged so that the corresponding distances are in ascending order.
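In NumPy terms, list S in formula (13) is what `np.argsort` returns: positions into D, not the distances themselves. A small sketch with made-up distances follows; the stable sort order for ties is an assumption, since the patent does not say how equal distances are ordered.

```python
import numpy as np

D = np.array([0.8, 0.1, 0.5, 0.3])  # made-up distances, formula (12)
S = np.argsort(D, kind="stable")    # indexes into D, formula (13); ties keep original order
print(S)     # [1 3 2 0]
print(D[S])  # [0.1 0.3 0.5 0.8]
```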
Step 2.3: Select k closest (smallest Euclidean distance) data to the new data:
Step 2.3.1: From list S in equation (13), select the first k values $s_1, s_2, \ldots, s_k$ (that is, the indexes of the k smallest distances in list D of equation (12)).
Step 2.3.2: Convert the price matrix y in equation (3) into a list:
$$y = [y_1, y_2, \ldots, y_m] \tag{14}$$
Step 2.3.3: Using the values $s_1, s_2, \ldots, s_k$ selected in Step 2.3.1 as indexes, find the corresponding values in list y of equation (14). These values form a new list $y_k$, the prices of the k stocks closest to the new stock:
$$y_k = [y_{s_1}, y_{s_2}, \ldots, y_{s_k}] \tag{15}$$
Step 2.4: Use regression to find the average of the k closest values (prices): add all elements of list $y_k$ in equation (15), then divide by k:
$$y_{avg} = \frac{1}{k} \sum_{i=1}^{k} y_{s_i} \tag{16}$$
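With made-up distances and prices (not from the patent), Steps 2.3.1 to 2.4 amount to indexing the price list with the first k entries of S and averaging. A minimal NumPy sketch:

```python
import numpy as np

D = np.array([0.8, 0.1, 0.5, 0.3])          # made-up distances, formula (12)
y = np.array([120.0, 100.0, 110.0, 104.0])  # made-up prices, formula (14)
k = 2

S = np.argsort(D, kind="stable")  # formula (13)
y_k = y[S[:k]]                    # prices of the k nearest rows, formula (15)
y_avg = y_k.sum() / k             # formula (16)
print(y_k, y_avg)                 # [100. 104.] 102.0
```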
Step 2.5: Return this average value as the predicted price $\hat{y}$ of the new stock:
$$\hat{y} = y_{avg} \tag{17}$$
[24] Step 3: The process of parameter optimization of the KNN algorithm is shown in Fig. 2. When applying the KNN algorithm, an important question is how to determine the value of k, and what the most reasonable range of k is. To solve this problem, the following procedure is designed. First, pick some stock data with known prices and run the KNN algorithm with different k-values. Next, compare the predicted prices with the stocks' actual prices, calculate the errors, and find the smallest error. The k-value that was used in the KNN algorithm to obtain this error is the best k-value.
[25] Step 3.1: Initialize the program.
[26] Step 3.2: Define the range of k: $k_{\min} \le k \le k_{\max}$. The minimum value $k_{\min}$ cannot be smaller than 1, and the maximum value $k_{\max}$ cannot be larger than the size m of the sample database.
[27] Step 3.3: Input the first k-value: $k = k_{\min}$.
[28] Step 3.4: Pick some stock data with known prices $(X_p, y_p)$, input the data $X_p$, and run the KNN algorithm with the current k-value to get a predicted price $\hat{y}_p$.
[29] Step 3.5: Compute the error between the predicted price $\hat{y}_p$ and the actual price $y_p$:
$$\Delta = \hat{y}_p - y_p \tag{18}$$
Store the value Δ in a list E.
[30] Step 3.6: Increase the current k-value: k = k + 1.
[31] Step 3.7: Check whether the current k-value is not larger than the preset maximum $k_{\max}$. If so, jump back to Step 3.4; otherwise, continue to the next step.
[32] Step 3.8: From list E, find the smallest Δ, $\Delta_{\min}$. The k-value that was used to obtain $\Delta_{\min}$ is the best k-value.
[33] Step 3.9: End.
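A minimal Python sketch of the Steps 3.1 to 3.9 loop follows, assuming NumPy and hypothetical function names. Two assumptions go beyond the text. First, the error for a given k is taken as the absolute difference between predicted and actual price, because a signed difference from formula (18) would make the "smallest" error favour large under-predictions. Second, when several known-price samples are used, their absolute errors are averaged. Ties are resolved in favour of the smaller k, because `np.argmin` returns the first minimum.

```python
import numpy as np


def knn_predict(X_db, y_db, x_new, k):
    """Hypothetical helper: KNN regression, mean price of the k nearest rows."""
    distances = np.sqrt(((X_db - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(distances, kind="stable")[:k]
    return float(y_db[nearest].mean())


def optimize_k(X_db, y_db, X_p, y_p, k_min, k_max):
    """Hypothetical helper: try every k in [k_min, k_max] and keep the one with least error.

    Assumption: error = mean absolute difference between predicted and actual
    prices over the known-price samples (X_p, y_p).
    """
    if not 1 <= k_min <= k_max <= len(X_db):      # Step 3.2 constraints
        raise ValueError("need 1 <= k_min <= k_max <= database size m")
    E = []                                        # Step 3.5: list of errors, one per k
    for k in range(k_min, k_max + 1):             # Steps 3.3, 3.6 and 3.7
        preds = np.array([knn_predict(X_db, y_db, x, k) for x in X_p])  # Step 3.4
        E.append(float(np.mean(np.abs(preds - y_p))))                     # Step 3.5 (assumed |error|)
    best_index = int(np.argmin(E))                # Step 3.8: position of the smallest error
    return k_min + best_index, E


# Made-up data: one characteristic value per row, price rising with it
X_db = np.arange(10, dtype=float).reshape(-1, 1)
y_db = 100.0 + np.arange(10, dtype=float)
X_p = np.array([[4.5]])   # known-price sample that is not in the database
y_p = np.array([104.5])
best_k, E = optimize_k(X_db, y_db, X_p, y_p, k_min=3, k_max=6)
print(best_k, E)  # 4 [0.5, 0.0, 0.5, 0.0]
```

With an even k the neighbours sit symmetrically around 4.5, so the average hits the true price exactly; odd k values pull the average toward one side.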
[34] Step 4: The real-time prediction through the method: by running the KNN algorithm with the parameter optimized in Step 3, stock prices can be predicted accurately in real time.
[35] The implementation process is shown in Fig. 3. The steps in Fig. 3 are described below.
[36] Step 1: Use pandas-datareader to download stock data from https://finance.yahoo.com/quote/MSFT?ltr=1; the format of the data is shown in Fig. 4.
[37] Step 2: Pick 1323 records from the columns 'Open', 'High', 'Low', and 'Close'. Split the picked data into two parts, Train Data and Test Data, in the ratio 7:3. Train Data is used for parameter optimization, and Test Data is used for real-time prediction. Train Data contains 926 records and Test Data contains 397.
[38] Step 3: Split Train Data into two parts, History Data and Optimization Data, in the ratio 4:1. History Data provides the information about the environment; in other words, it is the database. Optimization Data is used to find the best parameter. History Data contains 740 records and Optimization Data contains 186.
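The stated counts can be checked with simple arithmetic. The sketch below uses a hypothetical helper and assumes the first part of each split is rounded down, which reproduces the numbers given above. The patent does not say how records are assigned to each part (for example, chronologically), so only the sizes are computed.

```python
def split_sizes(n_rows: int, first_ratio: float) -> tuple[int, int]:
    """Hypothetical helper: sizes of a two-way split; the first part is rounded down (assumption)."""
    first = int(n_rows * first_ratio)
    return first, n_rows - first


train, test = split_sizes(1323, 0.7)                 # 7:3 split
history, optimization = split_sizes(train, 0.8)      # 4:1 split of Train Data
print(train, test, history, optimization)            # 926 397 740 186
```

If a chronological split were assumed for a date-sorted DataFrame `df`, these sizes would correspond to `df.iloc[:926]` and `df.iloc[926:]`.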
[39] Step 4: Parameter optimization, the basic steps are as follows:
Step 4.1: Use History Data as database X, its size m = 740.
Step 4.2: Input Optimization Data as the new inputs $\hat{X}$; the total number of new inputs is 186.
Step 4.3: Set the range of k, $2 \le k \le 186$, and calculate the prediction error for each k = 2, 3, … in this range.
For a single stock, the prediction error is given by formula (18). Because there are 186 stocks here, the formula is slightly different. As in formula (18), $\hat{y}$ represents the predicted price and $y$ the actual price, and n represents the number of data points (n = 186); the prediction error is then:
(19)
Store all the prediction errors Δ in a list E .
Step 4.4: From list E, find the smallest prediction error $\Delta_{\min}$. The k-value corresponding to $\Delta_{\min}$ is the best parameter.
[40] Step 5: Input all the data from Test Data and run the KNN algorithm with the optimized parameter to get the prediction results. The trend of the predicted prices is shown in Fig. 5(a), and the trend of the actual prices in Fig. 5(b). The two charts are very similar, which indicates that the prediction results are reliable.
[41] In addition, the prediction error for Test Data can be calculated with formula (19), as in parameter optimization; for Test Data, Δ = 0.01. This relatively small prediction error indicates that the prediction is valid and reliable.
Reference:
[1] Xiaosheng Su, Caicai Ding, Jiachen Du. Basic Knowledge of the Securities Market. Tsinghua University Press: 84.
[2] https://baike.baidu.com/item/%E7%82%92%E8%82%A1/524402?fr=aladdin
[3] https://baike.baidu.com/item/%E8%82%A1%E7%A5%A8%E5%B8%82%E5%9C%BA/233854?fr=aladdin
[4] https://baike.baidu.com/item/%E5%9F%BA%E6%9C%AC%E5%88%86%E6%9E%90
[5] Z Hao. A Stock Prediction System Based on Data Mining. 2017.
[6] Robert P. Schumaker, Hsinchun Chen. A Quantitative Stock Prediction System based on Financial News[J]. Information Processing & Management, 2009(5): 571-583.
[7] X Li, H Xie, Y Song, Q Li. Does Summarization Help Stock Prediction? A News Impact Analysis[J]. IEEE Intelligent Systems, 2015(3): 26-34.
[8] M Skuza, A Romanowski. Sentiment analysis of Twitter data within big data distributed environment for stock prediction[C]. Conference on Computer Science & Information Systems, 2015: 1349-1354.
[9] Z Chen, X Du. Study of Stock Prediction Based on Social Network[C]. International Conference on Social Computing, 2013(1): 913-916.

Claims (2)

1. A parameter-optimization process, based on the minimum prediction error, for choosing the optimal parameter k for KNN, comprising: defining the range of k; running the KNN algorithm with every k-value in the defined range to obtain different predicted prices; comparing the predicted prices with their actual prices; calculating the error between them; and finding the smallest error and, from it, the corresponding k-value, which is the best k-value.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2018101131A AU2018101131A4 (en) 2018-08-11 2018-08-11 A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2018101131A AU2018101131A4 (en) 2018-08-11 2018-08-11 A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm

Publications (1)

Publication Number Publication Date
AU2018101131A4 true AU2018101131A4 (en) 2018-09-13

Family

ID=63452279

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018101131A Ceased AU2018101131A4 (en) 2018-08-11 2018-08-11 A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm

Country Status (1)

Country Link
AU (1) AU2018101131A4 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949852A (en) * 2020-08-31 2020-11-17 东华理工大学 Macroscopic economy analysis method and system based on internet big data

Similar Documents

Publication Publication Date Title
Tsai et al. Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches
Li et al. The effect of news and public mood on stock movements
Kalra et al. Efficacy of news sentiment for stock market prediction
Won et al. Using genetic algorithm based knowledge refinement model for dividend policy forecasting
Stanisic et al. Corporate bankruptcy prediction in the Republic of Serbia
Huang et al. Financial speculation or capital investment? Evidence from relationship between corporate financialization and green technology innovation
Soroushyar Auditor characteristics and the financial reporting quality: the moderating role of the client business strategy
Baser et al. Gold commodity price prediction using tree-based prediction models
Li et al. Optimization of investment strategies through machine learning
Chen et al. Prioritizing real estate enterprises based on credit risk assessment: an integrated multi-criteria group decision support framework
Colapinto et al. Goal programming for financial portfolio management: a state-of-the-art review
AU2018101131A4 (en) A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm
Tirea et al. Intelligent stock market analysis system-a fundamental and macro-economical analysis approach
Mimovic et al. A Multicriteria Decision Making Approach to Performance Evaluation of Mutua Funds: A Case Study in Serbia
Wei et al. Forecasting and trading Bitcoin with machine learning techniques and a hybrid volatility/sentiment leverage
Du et al. Design and Implementation of China Financial Risk Monitoring and Early Warning System Based on Deep Learning
Udagawa Mining stock price changes for profitable trade using candlestick chart patterns
Chen The apply of ID3 in stock analysis
Thanathamathee et al. Discovering Future Earnings Patterns through FP-Growth and ECLAT Algorithms with Optimized Discretization
Rong [Retracted] Dynamic Cause Analysis of Quantitative Investment Using Grey Correlation Analysis
Dou et al. Establishment and Analysis of Multi-Factor Stock Selection Model Based on Support Vector Machine in CSI 300 Index Constituent Stocks Market
Hu et al. Research on the rules of ESG performance and value creation based on rough sets
Kim New technologies and stock returns
Celina et al. Algorithm-Driven Predictive Analysis of Blue-Chip Stocks in the Murky Indian Environment
Li et al. XGBoost-based Survival Analysis in Business Risk Prediction

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry