AU2018100320A4 - A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators - Google Patents

Info

Publication number
AU2018100320A4
Authority
AU
Australia
Prior art keywords
sentimental
stock
indicators
model
predict
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2018100320A
Inventor
Jiajian Ji
Zelin Li
Yifan LIU
Zengchang Qin
Bingxin Wang
Kai ZHENG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liu Yifan Miss
Original Assignee
Liu Yifan Miss
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liu Yifan Miss
Priority to AU2018100320A
Application granted
Publication of AU2018100320A4
Ceased
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a system for stock volatility prediction using Long Short-Term Memory (LSTM) neural networks with sentimental indicators. By using these networks, the invention achieves good performance on long-term sequences. In addition, the sentimental indicators, which are extracted and calculated from an online forum, improve the prediction accuracy. The main steps are: collect posts from the online forum using a crawler; build a sentiment analysis model from human-labelled posts; calculate sentimental indicators from the predicted sentiment tendency and generate a sentimental time series; and build a prediction model using LSTM to predict the volatility of a stock, given a specific stock number.

Description

DESCRIPTION
TITLE A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators
FIELD OF THE INVENTION
The present invention generally relates to a stock volatility prediction system with sentimental indicators, and in particular to a method based on Long Short-Term Memory (LSTM) neural networks.
BACKGROUND OF THE INVENTION
In the past, predicting stock prices was considered impossible. Research on stocks began in 1900, when Bachelier first applied statistical methods to stock data and found that the mathematical expectation of stock fluctuations is zero. Early researchers made many attempts to predict volatility, but the results were not ideal. In 1970, Fama proposed the Efficient Market Hypothesis (EMH) and claimed that future stock prices are entirely unpredictable because existing share prices always incorporate and reflect all relevant information.
However, because stock investors are not fully rational and do not have complete information about the stock market, they may make impulsive judgments. Simon's bounded rationality and Kahneman's prospect theory illustrate that investors are susceptible to herd instinct and can easily make impulsive judgments, which can make the market inefficient. With the rapid development of social networking applications, we address the problem of obtaining more useful information from the market by collecting data from online forums. For example, Bollen et al. studied the correlations between tweets and the Dow Jones Industrial Average (DJIA), using the tracking tools OpinionFinder (OF) and Google-Profile of Mood States (GPOMS). By considering such information, the prediction accuracy was raised by 13%. Si et al. proposed a method that leverages topic sentiment from Twitter to help predict stock volatility, while O'Conner established a strong relation linking a brand's popularity and its tweets with its stock price.
The data we collect differs from that used in the research above. Instead of using data from general social media, we use data directly from investors who post comments in the sub-forum of a particular stock, which lets us gather highly relevant public opinion. The data is pre-processed and manually labelled, and it will be open to the public for research purposes. We develop two sentiment classification models using Word2Vec and logistic regression. Both models are combined with Long Short-Term Memory (LSTM) neural networks to capture the temporal information of stock prices as well as sentimental information. We found that the model with sentimental information outperforms the one without. This research shows the nontrivial fact that using posted comments improves model performance in financial time-series prediction. The related references are:
[1] E. Fama, Efficient capital markets: A review of theory and empirical work, Journal of Finance 25 (1970) 383-417.
[2] M. Rzepczynski, Beyond greed and fear: Understanding behavioral finance and the psychology of investing, OUP Catalogue (78) (2007) 99-101.
[3] X. Zhang, H. Fuehres, P. A. Gloor, Predicting stock market indicators through Twitter "I hope it is not as bad as I fear", Procedia - Social and Behavioral Sciences 26 (2011) 55-62.
[4] D. Hirshleifer, T. Shumway, Good day sunshine: Stock returns and the weather, Journal of Finance 58 (3) (2003) 1009-1032.
[5] J. Bollen, H. Mao, X. Zeng, Twitter mood predicts the stock market, Journal of Computational Science 2 (1) (2010) 1-8.
SUMMARY OF THE INVENTION
The purpose of this invention is to use a sentiment model to predict stock volatility more accurately. The present invention discloses a method that uses LSTM neural networks for sentiment classification and then builds a stock prediction model so that investors can form investment strategies.
We obtain the data, namely the posts, from the East Money Forum, the largest and most influential Chinese stock forum. It has one sub-forum for each individual stock. We select 11 stocks whose sub-forums contain at least 500 posts. After crawling the posts from the Internet, we remove uninformative characters from each sentence, such as commas and semicolons. Since each Chinese word generally contains about 2-4 Chinese characters, we need to segment the posts into words for further analysis. We use a Python package named Jieba for segmentation. It constructs a directed acyclic graph of all possible word combinations and uses dynamic programming to find the most probable one based on word frequency.
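The segmentation idea described above (a directed acyclic graph of candidate words, then dynamic programming over word frequencies) can be illustrated with a minimal sketch. This is not Jieba's actual code: the dictionary `FREQ`, its counts, and the function names are hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical toy dictionary (word -> frequency); NOT Jieba's real dictionary.
FREQ = {"股票": 50, "市场": 40, "上涨": 30,
        "股": 5, "票": 5, "市": 5, "场": 5, "上": 8, "涨": 8}
TOTAL = sum(FREQ.values())


def strip_punctuation(text):
    """Remove whitespace and common ASCII/Chinese punctuation marks."""
    return re.sub(r"[\s,;:.!?,;:。!?、]+", "", text)


def build_dag(sentence):
    """Map each start index to the end indices that form a dictionary word."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # unknown characters become single-character words
    return dag


def segment(sentence):
    """Choose the maximum log-probability path through the DAG, right to left."""
    n = len(sentence)
    dag = build_dag(sentence)
    best = {n: (0.0, n)}  # position -> (best log-probability, end of chosen word)
    for i in range(n - 1, -1, -1):
        best[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1) / TOTAL) + best[j + 1][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment(strip_punctuation("股票市场,上涨!")))
```

With this toy dictionary the two-character words win over single characters because their combined log-probability is higher; a real dictionary would be far larger.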
We perform sentiment classification using LSTM neural networks. To train the LSTM model, we use a sentiment polarity model and a word2vec model. The polarity model assigns each word a sentiment weight. The weighted sum of the sentiment scores of all terms determines the sentiment polarity of the post: if the sum is greater than 0, the post is considered positive, and if it is less than 0, the post is considered negative. The sentiment score of a given post is calculated by:
$$s = f\left(\sum_{i=1}^{N} w_i x^{(i)}\right)$$
where $x^{(i)}$ is the feature value of the $i$-th term, $w_i$ is its sentiment weight, and $N$ is the number of terms in the post. The function $f(\cdot)$ is a sigmoid function, which compresses the result to between 0 and 1:
$$f(z) = \frac{1}{1 + e^{-z}}$$
Because $f(z) > 0.5$ exactly when $z > 0$, thresholding the score at 0.5 is equivalent to checking the sign of the weighted sum. This sentiment polarity model is simply a logistic regression model. By tuning the weights, the model can map all texts into either positive (1) or negative (-1). The weights are tuned with the gradient descent algorithm.
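As a rough illustration of the polarity model, the sketch below trains a logistic regression with batch gradient descent on a tiny made-up data set. The vocabulary, the feature vectors (term counts), the learning rate, and the use of 0/1 training labels that are mapped to +1/-1 at prediction time are all assumptions for the example, not details taken from the invention.

```python
import math


def sigmoid(z):
    """Compress a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, features):
    """Sentiment score: sigmoid of the weighted sum of term features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def train(posts, labels, lr=0.5, epochs=200):
    """Batch gradient descent on the log-loss; labels are 1 (positive) or 0 (negative)."""
    weights = [0.0] * len(posts[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(posts, labels):
            err = score(weights, x) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk
        weights = [w - lr * g / len(posts) for w, g in zip(weights, grad)]
    return weights


def polarity(weights, features):
    """Map the score to +1 (positive) or -1 (negative) with a 0.5 threshold."""
    return 1 if score(weights, features) > 0.5 else -1


# Hypothetical vocabulary: ["rise", "good", "fall", "bad"]; features are term counts.
posts = [[1, 1, 0, 0], [2, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 2]]
labels = [1, 1, 0, 0]
weights = train(posts, labels)
print(polarity(weights, [1, 0, 0, 0]), polarity(weights, [0, 0, 1, 0]))
```

Terms that appear mostly in positive posts end up with positive weights, so the sign of the weighted sum decides the label.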
The word vector model is used to find a vector space in which words that are semantically similar lie closer to each other than other pairs of words. Word vectors are representations of natural language that can increase the capacity of the model. Word2vec maps words into such a semantic space.
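To make "closer in the vector space" concrete, the following sketch compares hand-written three-dimensional vectors with cosine similarity. The vectors are invented for illustration and are not the output of a trained word2vec model; cosine similarity is one common closeness measure, assumed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings; real word2vec vectors typically have 100+ dimensions.
VECTORS = {
    "rise": [0.9, 0.1, 0.3],
    "surge": [0.8, 0.2, 0.4],
    "fall": [-0.7, 0.3, 0.1],
}

print(cosine(VECTORS["rise"], VECTORS["surge"]))
print(cosine(VECTORS["rise"], VECTORS["fall"]))
```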
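Compared with classical RNN models such as the Elman network or the Jordan network, LSTM introduces a new structure called the memory cell. It contains a forget gate so that the cell can forget its previous states if needed. The state of the forget gate $f_t$ is computed as:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
where $h_{t-1}$ is the hidden state of the previous time step, $x_t$ is the current input, and $\sigma$ is the sigmoid function.
The state of the input gate $i_t$ is computed as:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
The state of the output gate $o_t$ is computed as:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
Compute a new candidate value $\tilde{C}_t$:
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
Update the new cell state $C_t$:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Put the cell state through tanh, and multiply it by the output gate:
$$h_t = o_t \odot \tanh(C_t)$$
$W_f$, $W_i$, $W_C$, $W_o$ are the weight parameters for the forget gate, input gate, cell state, and output gate, and $b_f$, $b_i$, $b_C$, $b_o$ are the corresponding bias terms.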
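The gate computations described above can be sketched as a single forward step of an LSTM cell in plain Python. This follows the standard LSTM formulation (sigmoid gates over the concatenation of the previous hidden state and the current input, with bias terms); the parameter layout, the helper names, and the zero-initialised weights are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]    # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]    # input gate
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]    # output gate
    c_hat = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(hidden, inputs):
    """Hypothetical helper: all-zero weights and biases, for demonstration only."""
    params = {}
    for gate in ("f", "i", "o", "c"):
        params["W_" + gate] = [[0.0] * (hidden + inputs) for _ in range(hidden)]
        params["b_" + gate] = [0.0] * hidden
    return params


h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], zero_params(2, 3))
print(h, c)
```

With all-zero parameters every gate equals 0.5 and the candidate is 0, so the cell state is simply halved; trained weights would of course give input-dependent gates.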
By combining the sentiment weight from the sentiment polarity model with the word weight from the word2vec model, the LSTM takes a sentence in vector representation as input and outputs the predicted sentiment.
DESCRIPTION OF DRAWING
The following drawings are only for the purpose of description and explanation and not for limitation, wherein:
Figure 1:
This figure gives an overall view of the process, which begins with data gathering and processing and ends with the final actionable strategy calculated from the LSTM sentiment model and the stock prediction model.
Figure 2:
This is a detailed view of the first step, data pre-processing, and more specifically data collection. In this process, a crawler retrieves all the stock data for the stock number entered by the user. When the crawler receives a stock number, it initialises a queue of URLs to be collected. The crawler then resolves the DNS for the URLs in the queue and extracts the host's IP address. Afterwards, the crawler uses regular expressions to extract and store the data obtained from the website, and removes the URL from the queue. Lastly, the crawler checks the remaining URLs in the queue; if it is not empty, the crawler repeats the data collection process.
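The queue-driven loop in Figure 2 might look roughly like the sketch below. To keep it runnable offline, the network is replaced by a small in-memory dictionary; the page names, HTML snippets, and regular expressions are hypothetical, and the DNS resolution step is omitted because an HTTP library would normally handle it.

```python
import re
from collections import deque

# Hypothetical in-memory "site" standing in for real HTTP requests.
FAKE_PAGES = {
    "list,600198.html": '<a href="post,1.html">p1</a><a href="post,2.html">p2</a>',
    "post,1.html": '<div class="post">Looks set to rise</div>',
    "post,2.html": '<div class="post">Sell now</div><a href="post,1.html">back</a>',
}


def fetch(url):
    """Stand-in for an HTTP GET; returns the page body or an empty string."""
    return FAKE_PAGES.get(url, "")


def crawl(stock_number):
    """Visit every reachable page once and collect the post texts."""
    queue = deque(["list,%s.html" % stock_number])  # initial URL queue
    seen = set(queue)
    posts = []
    while queue:  # repeat until the queue is empty
        url = queue.popleft()
        html = fetch(url)
        posts.extend(re.findall(r'<div class="post">(.*?)</div>', html))
        for link in re.findall(r'href="([^"]+)"', html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return posts


print(crawl("600198"))
```

The `seen` set is an addition not mentioned in the description; without it, pages that link to each other would be fetched forever.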
Figure 3:
According to this figure, sentiment classification is applied to the data collected from the websites: the words are processed and converted into vectors by the Word2vec model for further calculation. The calculation is then based on LSTM (Long Short-Term Memory). Manually labelled data is first used to train the LSTM model so that it becomes more accurate, and unlabelled data is then fed in to predict the sentiment behind the data. When this is done, the change in the stock market is calculated.
Figure 4:
This step builds the stock prediction model, which combines the sentiment time series and the stock time series. The two series are matched together, and the historical data is used to train the LSTM model. The model is trained with windows of K days, for which we used a range of three to five days. After the calculation is finished, we select the model that predicts most accurately among all candidates and use it for the subsequent investment strategy.
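The window construction and the search for the best K can be sketched as follows. The predictor used here is a deliberately trivial majority vote, a stand-in for the trained LSTM, and the up/down series is made-up data; only the windowing and selection logic is the point of the example.

```python
def make_windows(series, k):
    """Pair each run of k consecutive days with the label of the following day."""
    return [(series[i:i + k], series[i + k]) for i in range(len(series) - k)]


def majority_predict(window):
    """Stand-in predictor (NOT the LSTM): predict 1 (rise) if at least half the days rose."""
    return 1 if sum(window) * 2 >= len(window) else 0


def accuracy_for_k(series, k, predict=majority_predict):
    """Fraction of windows of length k whose next-day label is predicted correctly."""
    samples = make_windows(series, k)
    hits = sum(1 for window, label in samples if predict(window) == label)
    return hits / len(samples)


def best_k(series, k_values, predict=majority_predict):
    """Return the window length with the highest accuracy on the history."""
    return max(k_values, key=lambda k: accuracy_for_k(series, k, predict))


# Hypothetical daily labels: 1 = price rose, 0 = price fell.
history = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
print(best_k(history, range(3, 6)))
```

In practice the accuracy should be measured on held-out days rather than on the same history used for training, which this sketch does not attempt.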
Figure 5:
The last step is making strategies. We use the prediction results on the historical data to learn the best k and obtain the corresponding best prediction model. Then, we use a time series of length k to predict the next day's volatility. If the stock is predicted to fall the next day, the investor sells all the stock he or she holds for cash; otherwise, he or she continues to hold the stock.
Figure 6:
Stock price and investor's net worth (per share) for Stock 600198. The net worth is computed by assuming that the investor holds 1 share of stock at the beginning and trades according to our predictions.
Figure 7:
This figure compares the results with and without sentimental indicators. The accuracy varies as k changes, so a best k can be found for building the model. In addition, the sentimental indicators help to improve the prediction accuracy.
Figure 8:
The results are consistent across different stocks in most cases, which indicates the importance of the sentimental indicators.
DESCRIPTION OF PREFERRED EMBODIMENTS
Step 1: DATA COLLECTION
1. We provide a crawler for collecting comments from the East Money Forum, the largest stock forum in China. The crawler visits the URL of the online forum, copies the comments posted there, and creates a file in ".xlsx" format under the directory. Finally, the crawler writes the comments into the Excel table under the labels title, comment, and date.
2. We need to install the DaZhiHui365[3] software on the computer. We manually collect the price information of each stock in this application. The information is collected in .txt format and is finally converted to .xlsx format.
Step 2: SENTIMENT CLASSIFICATION
In order to classify the sentiment of each comment, we preprocess the data collected from the forum. We remove uninformative material from the comments, such as punctuation marks, and split each comment into small segments. Before training the sentiment classification model, we need to manually label some of the comments we have collected. We then use the Word2vec model to train the LSTM sentiment classification model, which assigns a positive or negative mark to every word. Finally, we send the unlabelled comments into the trained model and calculate the accuracy of the model. The trained model is saved under the directory.
Step 3: STOCK VOLATILITY PREDICTION MODEL
The sentiment model labels the remaining comments as positive or negative. K is a parameter that specifies the length of the time window and is determined through an iterative experimental process. We consider values of K between 3 and 30, and use the data of K days to predict day K+1. After crawling finishes, two series are formed: a sentiment time series and a stock time series. They are matched and combined to generate the volatility, bullishness, and number-of-comments time series from the raw data files. We then try each K, evaluate the accuracy, select the K with the highest accuracy, and save the result.
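One way to turn labelled posts into daily indicator series and match them with the price series is sketched below. The text does not give the bullishness formula, so the log-ratio used here is an assumption, as are the field names, the ISO date strings, and the sample data; the volatility series is left out for brevity.

```python
import math
from collections import defaultdict


def daily_indicators(labelled_posts):
    """labelled_posts: iterable of (date, polarity) pairs with polarity +1 or -1."""
    counts = defaultdict(lambda: [0, 0])  # date -> [positive count, negative count]
    for date, polarity in labelled_posts:
        counts[date][0 if polarity > 0 else 1] += 1
    return {
        date: {
            # Assumed formula: log-ratio of positive to negative posts, smoothed by 1.
            "bullishness": math.log((1 + pos) / (1 + neg)),
            "num_comments": pos + neg,
        }
        for date, (pos, neg) in counts.items()
    }


def join_series(indicators, prices):
    """Keep only days that have both a closing price and forum posts, in date order."""
    rows = []
    for date in sorted(prices):
        if date in indicators:
            ind = indicators[date]
            rows.append((date, prices[date], ind["bullishness"], ind["num_comments"]))
    return rows


# Hypothetical sample data.
posts = [("2017-01-03", 1), ("2017-01-03", 1), ("2017-01-03", -1), ("2017-01-04", -1)]
prices = {"2017-01-03": 10.0, "2017-01-05": 10.5}
indicators = daily_indicators(posts)
print(join_series(indicators, prices))
```

Days without matching data on both sides are dropped here; filling them with neutral values would be an equally plausible choice.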
Step 4: MAKE INVESTING STRATEGY
Given K days of data, the best model outputs a prediction probability p. If p is below 0.5, the investor should sell the stock; if p is greater than 0.5, the investor should buy it.
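The threshold rule can be turned into a small net-worth simulation, in the spirit of Figure 6 where the investor starts with one share. The prices and probabilities below are made-up numbers, trades are assumed to execute at the same day's price with no fees, and the case p = 0.5 (which the text leaves open) is treated as "do nothing".

```python
def simulate(prices, probabilities, threshold=0.5):
    """Track net worth day by day, starting with one share and no cash.

    prices[t] is the price on day t; probabilities[t] is the predicted
    probability that the price rises on day t + 1.
    """
    shares, cash = 1.0, 0.0
    net_worth = []
    for price, p in zip(prices, probabilities):
        if p < threshold and shares > 0:
            # Predicted fall: sell everything for cash.
            cash, shares = shares * price, 0.0
        elif p > threshold and shares == 0:
            # Predicted rise while holding cash: buy back with all of it.
            shares, cash = cash / price, 0.0
        net_worth.append(cash + shares * price)
    return net_worth


# Hypothetical prices and model outputs.
print(simulate([10.0, 8.0, 12.0], [0.2, 0.9, 0.7]))
```

In this toy run the investor sells at 10, buys back at 8, and so holds 1.25 shares when the price reaches 12.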

Claims (2)

  1. CLAIM
    1. A method for risk prediction and medication guidance of osteoporosis based on exome analysis and panel design, which selected 47 genes after a systematic search, and tools including BWA, Picard-tools, GATK, ANNOVAR, and a Perl program we designed will be used.
  2. 2. A method for risk prediction and medication guidance as claim 1, which can be used for achieving the tailored therapy.
AU2018100320A 2018-03-15 2018-03-15 A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators Ceased AU2018100320A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2018100320A AU2018100320A4 (en) 2018-03-15 2018-03-15 A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2018100320A AU2018100320A4 (en) 2018-03-15 2018-03-15 A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators

Publications (1)

Publication Number Publication Date
AU2018100320A4 (en) 2018-04-26

Family

ID=61973031

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018100320A Ceased AU2018100320A4 (en) 2018-03-15 2018-03-15 A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators

Country Status (1)

Country Link
AU (1) AU2018100320A4 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986470A (en) * 2018-08-20 2018-12-11 华南理工大学 The Travel Time Estimation Method of particle swarm algorithm optimization LSTM neural network
WO2021174824A1 (en) * 2020-03-05 2021-09-10 苏州浪潮智能科技有限公司 Sentence-level convolution lstm training method, and device and readable medium
CN117132004A (en) * 2023-10-27 2023-11-28 四川省建筑设计研究院有限公司 Public place people stream density prediction method, system and equipment based on neural network
CN117132004B (en) * 2023-10-27 2024-02-09 四川省建筑设计研究院有限公司 Public place people stream density prediction method, system and equipment based on neural network

Similar Documents

Publication Publication Date Title
Swathi et al. An optimal deep learning-based LSTM for stock price prediction using twitter sentiment analysis
de Oliveira Carosia et al. Investment strategies applied to the Brazilian stock market: a methodology based on sentiment analysis with deep learning
Mankar et al. Stock market prediction based on social sentiments using machine learning
CN108694476A (en) A kind of convolutional neural networks Stock Price Fluctuation prediction technique of combination financial and economic news
Liu et al. Stock volatility prediction using recurrent neural networks with sentiment analysis
Jammalamadaka et al. Predicting a stock portfolio with the multivariate bayesian structural time series model: Do news or emotions matter?
CN111209738A (en) Multi-task named entity recognition method combining text classification
AU2018100320A4 (en) A New System for Stock Volatility Prediction by Using Long Short-Term Memory with Sentimental Indicators
CN112069320B (en) Span-based fine-grained sentiment analysis method
CN115310722A (en) Agricultural product price prediction method based on data statistics
CN106202299B (en) Disabled person authoritative user recommendation method based on disabled person characteristics
Hwang et al. Recent deep learning methods for tabular data
Rajkar et al. Stock market price prediction and analysis
Duan et al. Deep neural networks for stock price prediction
Ravichandran et al. Stock trend prediction using deep neural networks in time series and social sentiment analysis
Jaybhay et al. Stock market prediction model by combining numeric and news textual mining
Sharma et al. Stock market prediction using historical stock prices and dependence on other companies in automotive sector
GUMUS et al. Stock market prediction by combining stock price information and sentiment analysis
Komori Convolutional neural network for stock price prediction using transfer learning
Kayım et al. Financial Instrument Forecast with Artificial Intelligence
Yao et al. Forecasting crude oil futures using an ensemble model including investor sentiment and attention
Duan et al. Mining opinion and sentiment for stock return prediction based on web-forum messages
Thazhackal et al. A hybrid deep learning model to predict business closure from reviews and user attributes using sentiment aligned topic model
Manzoor et al. Stock exchange prediction using financial news and sentiment analysis
CN110334848A (en) A kind of stock advance-decline prediction method extracted based on news features with Recognition with Recurrent Neural Network

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry