AU2018101131A4

AU2018101131A4 - A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm

Info

Publication number: AU2018101131A4
Application number: AU2018101131A
Authority: AU
Inventors: Yitao Liu
Original assignee: Individual
Current assignee: Individual
Priority date: 2018-08-11
Filing date: 2018-08-11
Publication date: 2018-09-13
Anticipated expiration: 2026-08-11

Abstract

Abstract There is provided a parameter-optimization processing based on the minimum prediction error for choosing the optimal parameter k for KNN, comprising: defining the range of k; running KNN algorithm using every different k-value in defined range to get different predicted prices; comparing the predicted prices with their actual prices; calculating the error between them; and finding the smallest error, according to this error, finding the corresponding k-value, which is the best k-value. Step 1: The acquisition of the financial stock data Step 2: Ihe design of the KN N algorithm Step 3: The parameter optimization of the KNN algorithm Step 4: The real-time prediction through the method Fig.1

Description

A Real-time Accurate Stock Price Prediction System Based on KNN Algorithm

Field of the invention [1] This invention relates to information processing, particularly, a real-time accurate stock prediction system based on KNN algorithm.

Background of the invention [2] With the development of society and economy, more and more people start to improve their personal asset wealth by investing in stocks. Stock is a kind of negotiable securities, it is a voucher issued by incorporated company that certifies the share held by a shareholder (Reference [1]). And the so-called stock investment is that people use their capital to buy and sell stocks, gain profits through the price difference of the stocks between buying and selling in the stock market (Reference [2]). However, stock investing is very risky. In general, the risks of losing money and making money are equally high. The reason attributes to the complexity of stock market and the uncertainty of stock price. Stock market is a secondary market based on primary market. Therefore, it is more complex than primary market, and has larger function and influence (Reference [3]). Stock market is closely related to a region’s economy and society, and the factors that affect price are more complicated. The state of operation, the outlook of industry, and the region’s overall economy are all constantly affecting the price of the stock. Social and political factors can also change the trend of the price hugely. In addition, the psychological factor is also a part that cannot be ignored. Such complicated factors determine the high risk and uncertainty of stock market.

[3] Therefore, for investors, the continuous analysis and accurate prediction for stock price are extremely important. By far, there are two main methods for stock analyzing: Fundamental Analysis and Technical Analysis. Fundamental Analysis mainly focuses on the intrinsic value of the enterprise, by detailly analyzing the factors which determine the value of company and the price of stock such as the macroeconomic situation, the future of industry development, the business situation of the company and so on, people can roughly estimate the long-term investment value and margin of safety of the listed companies, compare with the current stock price, and form the corresponding investment proposal (Reference [4]). However, there are some limitations in Fundamental Analysis. As mentioned above, Fundamental Analysis has high requirements for investors. For ordinary investors, the access of getting policies and information from industry and companies is limited, and the institutions normally do not publish their research report, so the source of information is relatively closed. What’s more, because there are a lot of cheaters and misleading message in society, investors are also required to have strong information processing ability. Comparing with big capitalists, ordinary investors are at a disadvantage in capital amount, message accessing, and knowledge composition, so it is not quite likely to get stable excess profit on stock investment by using the Fundamental Analysis (Reference [5]).

[4] Therefore, if an ordinary investor wants to master the change pattern of stock, it must depend the Technical Analysis. By analyzing the history pattern of price and many different technical indexes, people can achieve their goals. As a technique based on real data, Technical Analysis has several main advantages. Firstly, it is straightforward. It is the biggest advantage. Ordinary investors usually cannot pick the useful information from the complex data, but Technical Analysis can help with it and present the information in a straightforward way. Secondly, it is multifarious. Because different investors have different investment goals, Technical Analysis can exactly satisfy those requests with different type. Last, Technical Analysis has unity. Although stocks from different companies have different prices and trends, they have unity from the data point of view. By comparison, investors can well judge whether to buy or to sell different stocks.

[5] Over the past decades, the rapid development of Technical Analysis benefits from the increasing maturity of big data, cloud computing, machine learning and so on. As the development of theory and technology of machine learning, the analysis and prediction toward stock market become more advanced. More and more researchers hope to use machine learning algorithm to deeply analyze and excavate stock market and get the rule of falling and rising In recent years, researching stock through machine learning algorithm is a popular topic for scholars both at home and abroad. Some people start with the influence of news reports (Reference [6], [7]), and social media (Reference [8], [9]) to the market, some other people build their models directly from the historical prices (Reference [5]). Although these algorithms are different, they all fully utilize the computer’s ability of fast calculating and dealing complex data.

[6] Therefore, this invention adopts the machine learning algorithm (KNN algorithm) to solve the accurate prediction of stock price.

SUMMARY OF THE INVENTION

[7] There is provided a parameter-optimization processing based on the minimum prediction error for choosing the optimal parameter k for KNN, comprising: defining the range of /; running KNN algorithm using every different /-value in defined range to get different predicted prices; comparing the predicted prices with their actual prices; calculating the error between them; and finding the smallest error, according to this error, finding the corresponding /-value, which is the best /-value.

[8] The parameter-optimization processing is designed to improve our KNN algorithm. With the best parameter /, KNN algorithm is able to give its best prediction.

[9] This invention is designed to focus on the real-time accurate prediction of stock price. Based on the idea of Technical Analysis, it uses the technology of machine learning to analyze and predict stock price. It is useful and convenient for ordinary investor, because it can predict stock price accurately without requiring too much professional information and knowledge. Only certain stock data is needed, and it will provide some reliable prediction. The method used in this invention is based on KNN algorithm. KNN algorithm, also known as K-Nearest Neighbor algorithm, is a valid method for classification. It measures the distance between new input and known data, and choose k closest data. Based on these k neighbors, the algorithm determines the category of new data. In this case, because stock price has no category, the invention will use the algorithm of regression to calculate the average prices of these k neighbors, and return the result as predicted price of new stock. In this invention, there are five basic steps to achieve its goal: 1) Calculate the Euclidean distances between the new data and the known data from the database; 2) Sort the Euclidean distances in an ascending order; 3) Select k closest (smallest Euclidean distance) data to the new data; 4) Use the algorithm of regression to find the average prices of the k closest prices; 5) Return this value as the predicted price of the new stock.

[ 10] Also, in order to get a better prediction, a parameter-optimization program is designed. In KNN algorithm, choosing new data’s k neighbors is important. Different k-values would bring us different predicted prices, then what is the best k-value? In order to find the best k-value, parameter-optimization program is needed. After running the parameter-optimization program, we finally find the best k-value, in other words, the best parameter.

[11] Using the optimized k-value, the invention is able to predict stock price accurately.

Brief description of the drawings [12] Features of the present disclosure are illustrated by way of non-limiting examples, and like numerals indicate like elements, in which:

Fig. 1 illustrates a real-time stock prediction method based on the KNN algorithm in accordance with the present invention;

Fig. 2 illustrates a process of parameter optimization of the KNN algorithm in accordance with the present invention;

Fig. 3 illustrates an implementation process in accordance with the present invention;

Fig. 4 illustrates a format of the data in accordance with the present invention;

Fig. 5(a) illustrates tendency of predicted prices in accordance with the present invention; and Fig. 5(b) illustrates tendency of actual prices.

Best modes of the invention [13] In order to solve the shortcomings of the existing technology, this invention offers a real-time stock prediction method that based on the KNN algorithm. By thoroughly excavating the effective information from the stock data, it can effectively solve the problem about predicting stock price accurately and precisely, and provides scientific basis of stock investment for ordinary investors.

[ 14] The technical program of the invention is realized below.

[15] A real-time stock prediction method based on the KNN algorithm, including: the acquisition of the financial stock data, the design of the KNN algorithm, the parameter optimization of the KNN algorithm (flow chart), and the real-time prediction through the method.

[16] Fig. 1 illustrates a real-time stock prediction method based on the KNN algorithm in accordance with the present invention.

[17] Step 1: The acquisition of the financial stock data: the invention uses the method of web crawler to obtain stock data.

[18] Step 2: The design of the KNN algorithm: K-Nearest Neighbor algorithm, short as KNN algorithm, uses the measuring of the distance of different characteristic values to do the classification. Its idea is simple: if the most samples of a new sample’s most similar k samples belong to a certain category, then this new sample also belongs to that category. In general, it is to input a new data, then select the first k data from the sample database that is most similar to the new input data, then choose the category where most selected data belong to. Its algorithm can be roughly divided into 5 steps: 1) Calculate the distances between the current point and the known points from the database; 2) Sort the distances in an ascending order; 3) Select / nearest points from the current point; 4) Determine the frequency of occurrence of the category of each selected points; 5) Return the category with the highest frequency as the predicted category of the current point.

[19] In particular to this invention, because it is designed to predict stock price, there are some particularity in this case. Therefore, based on the 5 steps above, the invention has some difference in algorithm. Because the final output is a data about price rather than a label, this invention adopts the algorithm of regression instead of classification.

[20] First, the sample database is obtained from the stock’s historical data. Assume the size of the sample database is m, then the samples can be expressed like this:

(1) [21] In formula (1), X t represents the data of the stock, so we can build a matrix about the data of the stock:

(2) ja represents the price of the stock, so we can build a matrix about the price of the stock:

(3) [22] Assume each stock has n characteristic values, then:

(4) [23] The goal of the invention is to input a new stock X , then predict and export its price y . The specific steps of the method are as follows:

Step 2.1: Calculate the distances between the current data and the known data from the database:

Step 2.1.1: Combine X in formula (2) and Xf in formula (4), set the database X. Each row represents a stock’s data, each stock has n characteristic values:

(5)

Step 2.1.2: Input a new stock data X : X = [x ,,x 2,...,x ], duplicate its data by m rows and form a m x n matrix, in convenient to find the distances between X and all stocks Xf in database:

(6)

Step 2.1.3: Finding Euclidean distance: Euclidean metric is a commonly used distance definition. It refers the real distance between two points in n-dimensional space, or the natural length of a vector (the distance from the point to the origin). Assume there are two points x and y in a n -dimensional space, then the formula of Euclidean distance is:

(7)

The following 4 steps are the applications of formula (7) in this invention:

Step 2.1.3.1: Calculate the difference of each characteristic values between new data and every data in the database. Subtract T in equation (6) by X in equation (5):

(8)

Step 2.1.3.2: Square every element of matrix T -X in equation (8):

(9)

Step 2.1.3.3: Find the sum of each row of Q in equation (9). Each sum represents

a value that relates every characteristic value of new data with data in database:

(10)

Step 2.1.3.4: Find the square root of each row of d in (10), then we finally get the Euclidean distance D:

(Π)

Step 2.2: Sort the Euclidean distances in an ascending order:

Step 2.2.1: Convert matrix D in equation (11) into a list:

(12)

Step 2.2.2: Sort the list D in an ascending order, get a new list s :

(13)

List S represents the original indexes of elements in D in a sorted way. The elements in S represents the indexes of elements in D.

Step 2.3: Select k closest (smallest Euclidean distance) data to the new data:

Step 2.3.1: From list 5 in equation (13), select k smallest values Sl,S2,...,Sk (it is, the indexes of the k smallest distances in list D of equation (12)).

Step 2.3.2: Convert the price matrix y in equation (3) into a list:

(14)

Step 2.3.3: According the selected values we found in Step 2.3.1 (Sl,S2,...,Sk) as indexes, find the corresponding values in list y of equation (14). These values make up a new list yk. These values are the k closest prices to the new stock’s price:

(15)

Step 2.4: Use the algorithm of regression to find the average value of the k closest values (prices). Add all elements in list yk in equation (15), then divide by/:

(16)

Step 2.5: Return this average value as the predicted price (y ) of the new stock: y„ = yavg (17) [24] Step 3: The process of parameter optimization of the KNN algorithm is shown in Fig. 2. While applying the KNN algorithm, there is an important question: How to determine the value of /. What is the most reasonable range of /? In order to solve this problem, we designed some procedures. First, pick some stock data with known prices, use different /-values, and run the KNN algorithm. Next, compare the predicted prices with the stock’s actual prices, calculate the error, and find one smallest error. According to this error, find out the corresponding /-value that is being used in KNN algorithm to get the error. This /-value is the best /-value.

[25] Step 3.1: Initialize the program.

[26] Step 3.2: Define the range of k. kmm <k< kmn. The minimum value £min cannot be smaller than 1, the maximum value £max cannot be bigger than the size of sample database m.

[27] Step 3.3: Input the first k-value: k = kmm .

[28] Step 3.4: Pick some stock data with known prices [Xpi,yp^, input data Xp., and use the current /<-value to run KNN algorithm, get a predicted price y.

[29] Step 3.5: Test the error between predicted price y and actual price y : Δ = ypc-yp (18)

Store the Δ value into a list E .

[30] Step 3.6: Change the current k-value: k = k +1.

[31] Step 3.7: Decide: if the current k-value is not bigger than the preseted maximum kmM . If the result is yes, then jump back to Step 3.4. If the result is no, then continue to execute the next step.

[32] Step 3.8: From list E , find the smallest Δ : Δηιιη . According to this Δηιιη , find the corresponding k-value that is being used to get Δηιιη. This k-value is the best k-value.

[33] Step 3.9: End.

[34] Step 4: The real-time prediction through the method: Using the parameter optimized by Step 3 and running KNN algorithm, the real-time stock information can be accurately predicted in real time.

[35] The implementation process is shown in Fig. 3. The steps in Fig. 3 are described below.

[36] Step 1: Use pandasdatareader to download stock data from https://fmance.yahoo.com/quote/MSFT?ltr=l, the format of the data is shown in Fig. 4: [37] Step 2: Pick 1323 data from column Open’, ‘High’, ‘Low’, ‘Close’. Separate the picked data into two parts: Train Data and Test Data, with the ratio of 7:3. Train Data is used for parameter optimization. Test Data is used for real-time prediction. The number of Train Data is 926, and the number of Test Data is 397.

[38] Step 3: Separate Train Data into two parts: History Data and Optimization Data, with the ratio of 4:1. History Data is used for the information from the environment, in other word, it is the database. Optimization Data is used for finding the best parameter. The number of History Data is 740, and the number of Optimization Data: 186.

[39] Step 4: Parameter optimization, the basic steps are as follows:

Step 4.1: Use History Data as database X, its size m = 740.

Step 4.2: Input Optimization Data as new inputs X , the total number of new inputs is 186.

Step 4.3: Set the range of k: 2 < k < 186 , calculate the prediction error when / = 2,3,...,

For every single stock, the formula of prediction error is shown in formula (18). However, there are 186 stocks, so the formula for this case has a little bit difference. Like formula (18), y represents predicted price, y represents actual price, n represents the number of data: « = 186, then the formula of prediction error is:

(19)

Store all the prediction errors Δ in a list E .

Step 4.4: From list E , find the smallest prediction error Δηιιιι. According to Amin , find the corresponding /-value. This /-value is the best parameter.

[40] Step 5: Input all the data from Test Data, run KNN algorithm with optimized parameter, get the prediction results. The tendency of predicted prices is shown in Fig. 5(a), the tendency of actual prices is shown in Fig. 5(b). It is noteworthy that the two charts are very similar. In other words, the prediction result is pretty reliable.

[41] Also, by using formula (19) in parameter-optimization, the prediction error for Test Data can be calculated. The prediction error for Test Data: Δ = 0.01. The prediction error is relatively small, so it proves that the prediction is valid and reliable.

Reference:

[1] Xiaosheng Su, Caicai Ding, Jiachen Du. Basic Knowledge of the Securities Market. Tsinghua University Press: 84.

[2] https://baike.baidu.com/item/%E7%82%92%E8%82%Al/524402?fr=aladdin [3] https://baike.baidu.com/item/%E8%82%Al%E7%A5%A8%E5%B8%82%E5%9C%BA/233854?fr=aladdin [4] https://baike.baidu.com/item/%E5%9F%BA%E6%9C%AC%E5%88%86%E6%9E%90 [5] Z Hao. A Stock Prediction System Based on Data Mining. 2017.

[6] Robert P. Schumaker, Hsinchu Chen. A Quantitative Stock Prediction System based on Financial News[J]. Information Processing & Management, 2009(5): 571-583.

[7] X Li, H Xie, Y Song, Q Li. Intelligent Systems IEEE[J], Does Summarization Help Stock Prediction? ANews Impace Analysis [J]. Intelligent Systems IEEE, 2015(3): 26-34.

[8] M Skuza, A Romanowski. Sentiment analysis of Twitter data within bid data distributed environment for stock prediction[C]. Conference on Computer Science & Information Systems, 2015:1349-1354.

[9] Z Chen, X Du. Study of Stock Prediction Based on Social Network[C]. International Conference on Social Computing, 2013(1): 913-916.

Claims

1. A parameter-optimization processing based on the minimum prediction error for chooing the optimal parameter k for KNN, comprising: defining the range of k; running KNN algorithm using every different A'-value in defined range to get different predicted prices; comparing the predicted prices with their actual prices; calculating the error between them; and finding the smallest error, according to this error, finding the corresponding A'-value, which is the best /<-value.