CN106022522A - Method and system for predicting stocks based on big data published by internet - Google Patents
Method and system for predicting stocks based on big data published by internet
- Publication number
- CN106022522A CN201610338598.4A
- Authority
- CN
- China
- Prior art keywords
- stock
- data
- days
- day
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Marketing (AREA)
- Development Economics (AREA)
- Human Resources & Organizations (AREA)
- General Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Technology Law (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention discloses a method and system for predicting stocks based on big data published on the Internet. The method comprises the following steps: crawling stock-related information before a trading day; performing feature extraction on the crawled data, constructing a training dataset, and training a prediction model with Group Lasso, where the model is evaluated by the return earned over a period of time under a trading pattern of selling, at each day's open, the stocks bought on the previous trading day and buying the stocks recommended for the current trading day; and then constructing a new test set from the data crawled on the trading day and applying the trained prediction model to obtain the final recommended stocks. The method and system provide a new, useful and reliable information source for quantitative stock selection or stock prediction; combined with traditional information, this additional information reflects the market more fully. On this basis, the stock prediction model obtained with machine learning can better capture the internal operating mechanism of the market and can effectively improve investors' returns.
Description
Technical field
The present invention relates to a big-data stock prediction method, and in particular to a big-data stock prediction method and system based on data published on the Internet, such as retail investors' trading operations, analysts' predictions, investor comments, news, announcements, historical stock prices, capital flows and fundamentals.
Background technology
Before the 1970s, equity investment was a qualitative exercise that made no use of data; it was a subjective art. As computers became widespread, many people began to study the laws that drive stock price changes. Traditional fundamental research methods were replaced by models, concepts such as the price-to-earnings and price-to-book ratios were born, and quantitative investment arose.
The move from subjective judgment to quantitative investment was a shift from art to science. Before the 1970s a fundamental analyst could follow only 20 to 50 stocks, so coverage was very limited. With quantitative models all stocks could be covered, which was a great leap. In addition, as computer processing power grew, the consumption of information also leapt ahead: three indicators used to suffice, whereas more and more indicators are consulted now and the resulting predictions have become more accurate.
With the arrival of the 21st century, quantitative investment ran into a new bottleneck: homogeneous competition. The quantitative models of different institutions have converged, so their investment results rise and fall together. "Can patterns be found with larger data before the reported figures are seen?" is the question that big-data strategy entrepreneurs are trying to answer.
The investment model designed by Robert Shiller, 2013 Nobel laureate in economics, is still highly regarded in the industry. His model mainly refers to three variables: the cash flow of the investment project plan, the estimated cost of corporate capital, and the stock market's reaction to the investment (market sentiment). He holds that the market itself carries an element of subjective judgment, that investor sentiment affects investment behavior, and that investment behavior directly affects asset prices. By analyzing news, research reports, social information, search behavior and the like, a computer can extract useful information with natural language processing and analyze it with machine learning. Quantitative investment could formerly cover only a few dozen strategies, whereas big-data investment can cover thousands.
Traditional stock prediction thus relies on a stock's historical price trend and capital flows, together with information such as its market capitalization and price-to-earnings ratio. Now that the Internet deeply affects many traditional industries, and in contrast to the decades before the Internet was invented or widely adopted, the Internet offers a great deal of stock-related data beyond those traditional stock data, including publicly disclosed actual trades of retail investors, analysts' predictions, investor comments, news and announcements. To some extent this information reflects the current stock market and can also reveal expectations about the future market. The present invention combines these new, useful data with traditional data and uses techniques such as natural language processing and machine learning to build a big-data stock prediction model.
Summary of the invention:
Object of the invention: to address the problems in the prior art, the invention proposes a big-data quantitative stock selection method and system based on the operating behavior of retail investors and analysts published on the Internet, to serve as an investment reference for retail investors, fund companies and others.
Technical solution: the invention proposes a method for predicting stocks based on big data published on the Internet, comprising the following steps:
1) Crawl stock-related information before the trading day.
The crawling method is as follows: first crawl a number of proxy IPs, then use the Scrapy framework to crawl data from the relevant websites, convert the data to JSON format, and store it in a MongoDB database.
The information crawled includes, from websites such as Xueqiu (Snowball), Gold Compass, Guba, Phoenix Finance and Sina Finance, retail investors' trading operations on stocks, analysts' predictions, investor comments, news and announcements, together with each stock's historical price data, market capitalization, return on equity, return on assets, earnings-per-share growth rate, current liability ratio, enterprise value multiple, year-on-year net profit growth rate, ownership concentration, free-float market capitalization, and return and volatility over the most recent month.
2) Perform feature extraction on the data crawled in step 1, construct a training dataset, and train the prediction model with Group Lasso.
Construction of the training dataset: the dataset consists of the data for the 5 trading days in the week preceding the current trading day. For each of these 5 trading days, every stock is represented by a feature part and a class label. The features are a vector obtained by processing the relevant information, and the label indicates whether the stock's price rises on the next trading day (1 if it rises, 0 otherwise). This yields the initial training matrix. Because the data contain redundancy, this step first filters out samples with insufficient information. Specifically, samples for which retail investors performed fewer than 10 operations on the stock that day in the crawled data are removed.
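The labelling and filtering rule above can be illustrated as follows. This is a sketch under stated assumptions: the dictionary keys are hypothetical, and "rises on the next trading day" is taken here to mean that the next close exceeds the current close, which the source does not spell out.

```python
def build_training_rows(samples, min_ops=10):
    """Build labelled rows and drop samples with too few investor operations.

    samples: list of dicts with hypothetical keys
      'features'   - list of floats for one stock on one trading day
      'ops_count'  - number of retail-investor operations on that stock that day
      'close'      - closing price that day
      'next_close' - closing price on the next trading day
    Returns a list of rows, each being the features followed by a 0/1 label.
    """
    rows = []
    for s in samples:
        # Filter criterion from the text: fewer than 10 operations is too little information.
        if s["ops_count"] < min_ops:
            continue
        # Assumed definition of "rises": next close strictly above the current close.
        label = 1 if s["next_close"] > s["close"] else 0
        rows.append(list(s["features"]) + [label])
    return rows
```

Each returned row has one more column than the feature vector, which matches the layout of a feature part plus a class label.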
The vector characterizing a stock is extracted as follows:
For retail investor operation data, investors are divided into 10 groups according to their return in the previous month. For each group, and for each look-back window such as the previous 1, 3, 7, 15 and 30 days, features such as the group's number of buys and sells of the stock, its holdings, its change in position and its average return are extracted;
For analyst prediction data, features such as the number of analyst buy and sell predictions for the stock in each look-back window (such as the previous 1, 3, 7, 15 and 30 days) are extracted;
For investor comment data, features such as the number of comments on the stock and the mean and variance of the comments' sentiment values in each look-back window (such as the previous 1, 3, 7, 15 and 30 days) are extracted;
For news data, features such as the number of news items about the stock and the mean and variance of their sentiment values in each look-back window (such as the previous 1, 3, 7, 15 and 30 days) are extracted;
For announcement data, features such as the number of announcements about the stock in each look-back window (such as the previous 1, 3, 7, 15 and 30 days) and the total number of occurrences, in each announcement, of words from the announcement keyword dictionary are extracted;
For historical price data, features such as the stock's opening, closing, highest and lowest prices in each look-back window (such as the previous 1, 3, 7, 15 and 30 days), their ratios to the prices of the preceding 30 days, and the slopes of the 3-day, 7-day, 10-day, 15-day and 30-day lines are extracted;
For capital flow data, features such as the ratio of main-force capital inflow to outflow for the stock in each look-back window (such as the previous 1, 3, 7, 15 and 30 days) are extracted;
For other information, features such as the stock's current market capitalization, return on equity, return on assets, earnings-per-share growth rate, current liability ratio, enterprise value multiple, year-on-year net profit growth rate, ownership concentration, free-float market capitalization, and return and volatility over the most recent month are extracted;
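The grouped, windowed features for retail investor operations can be sketched as below. The source does not say how the 10 groups are formed, so equal-sized rank buckets are an assumption here; the tuple layout of an operation and the function names are likewise hypothetical, and only buy and sell counts are shown.

```python
WINDOWS = (1, 3, 7, 15, 30)  # look-back windows in days, as listed in the text


def assign_groups(returns_by_investor, n_groups=10):
    """Rank investors by previous-month return and bucket them into n_groups.

    Returns a dict investor -> group index, where 0 is the lowest-return group.
    Equal-sized rank buckets are an assumption, not the patent's stated rule.
    """
    ranked = sorted(returns_by_investor, key=returns_by_investor.get)
    n = len(ranked)
    return {inv: min(i * n_groups // n, n_groups - 1) for i, inv in enumerate(ranked)}


def window_counts(ops, groups, stock, n_groups=10, windows=WINDOWS):
    """Count buys and sells of one stock per investor group and look-back window.

    ops: list of (investor, stock, action, days_ago) with action 'buy' or 'sell'
         and days_ago >= 1 counted back from the current trading day.
    Returns a flat list ordered group -> window -> (buys, sells).
    """
    feats = []
    for g in range(n_groups):
        for w in windows:
            buys = sells = 0
            for inv, s, action, days_ago in ops:
                if s == stock and groups.get(inv) == g and 1 <= days_ago <= w:
                    if action == "buy":
                        buys += 1
                    elif action == "sell":
                        sells += 1
            feats.extend([buys, sells])
    return feats
```

With 10 groups, 5 windows and 2 counts, this block alone contributes 100 entries to the feature vector; the other statistics named above (holdings, position change, average return) would be appended in the same group-by-window order.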
Text data such as investor comments, news and announcements are first segmented into words with natural language processing techniques, using two dictionaries: a financial sentiment dictionary and an announcement keyword dictionary. The sentiment value of each investor comment or news item is then computed from the financial sentiment words that occur in the text, and for announcements the number of occurrences of the corresponding keywords is counted. The financial sentiment dictionary lists stock sentiment keywords together with the sentiment score of each keyword, and the announcement keyword dictionary lists announcement-related keywords. Both dictionaries were obtained by manual annotation through crowdsourcing.
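A dictionary-based scoring step of this kind might look like the sketch below. The lexicon entries and scores are invented for illustration, the texts are assumed to be already segmented into tokens, and summing the scores of matched words is an assumed aggregation rule; the source only says the sentiment value is computed from the sentiment words that occur.

```python
from statistics import mean, pvariance

# Hypothetical dictionaries; the patent's dictionaries are crowdsourced and not reproduced here.
SENTIMENT_LEXICON = {"bullish": 1.0, "surge": 0.8, "bearish": -1.0, "loss": -0.6}
ANNOUNCEMENT_KEYWORDS = {"dividend", "merger", "suspension"}


def sentiment_value(tokens):
    """Sum the lexicon scores of the tokens in one comment or news item."""
    return sum(SENTIMENT_LEXICON.get(t, 0.0) for t in tokens)


def keyword_hits(tokens):
    """Count how many tokens of one announcement appear in the keyword dictionary."""
    return sum(1 for t in tokens if t in ANNOUNCEMENT_KEYWORDS)


def sentiment_features(tokenized_texts):
    """Return (count, mean sentiment, variance of sentiment) for one look-back window."""
    scores = [sentiment_value(tokens) for tokens in tokenized_texts]
    if not scores:
        return (0, 0.0, 0.0)
    # Population variance is used so that a single text gives variance 0.
    return (len(scores), mean(scores), pvariance(scores))
```

Calling `sentiment_features` once per look-back window on the comments or news items that fall inside it gives the count, mean and variance features described for those data types.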
Because the feature extraction divides retail investors into 10 groups by previous-month return, each of these groups corresponds to one group in the Group Lasso sense. Features within a group are strongly correlated, whereas features in different groups are only weakly correlated. During model training, the features in the same group should therefore be considered as a whole. The Group Lasso algorithm from machine learning takes these factors into account well, which is why it is chosen.
The Group Lasso algorithm is expressed as follows:
Here the estimated weight vector is the model training result, X is the training sample matrix, Y is the class vector of the samples, I_g is the index set of the features belonging to the g-th group, where g = 1, ..., G, and the weights indexed by I_g are the trained weight values corresponding to the features of the g-th group.
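For reference, a standard form of the Group Lasso objective that is consistent with these symbol definitions is sketched below. The squared-error loss, the symbol β for the weight vector and the regularization strength λ are assumptions rather than details confirmed by the source, which may use a different loss (for example a logistic loss, given the 0/1 labels) or per-group weights.

```latex
\hat{\beta} \;=\; \arg\min_{\beta} \; \lVert Y - X\beta \rVert_2^2 \;+\; \lambda \sum_{g=1}^{G} \lVert \beta_{I_g} \rVert_2
```

In this form the penalty is the sum of the unsquared Euclidean norms of the per-group weight blocks, which is what drives entire groups of weights to zero at once.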
During model training, cross-validation is used. In each round, the stocks in the test fold are sorted by predicted probability in descending order and those with the highest predicted probability are selected. The model's parameters are then tuned according to the total return over two weeks of the following trading pattern: at each day's open, sell the stocks bought on the previous trading day and buy the stocks recommended for the current trading day.
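The return-based evaluation can be illustrated with a small calculation. This sketch assumes equal weighting across the recommended stocks, daily compounding and no transaction costs, none of which the source specifies; the function name and input layout are hypothetical.

```python
def strategy_return(daily_picks):
    """Total return of buying each day's picks at the open and selling at the next open.

    daily_picks: one entry per trading day, each a list of
                 (open_price_today, open_price_next_day) pairs for that day's picks.
    Assumes equal weights, daily compounding and zero transaction costs.
    """
    total = 1.0
    for picks in daily_picks:
        if not picks:
            continue  # no recommendation that day, capital stays in cash
        day_return = sum(nxt / buy - 1.0 for buy, nxt in picks) / len(picks)
        total *= 1.0 + day_return
    return total - 1.0
```

Running this over the 10 trading days of a two-week validation period for each candidate parameter setting, and keeping the setting with the highest value, would be one way to realize the tuning rule described above.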
3) Construct a new test set from the data crawled on the trading day, and predict with the model trained in step 2 to obtain the final recommended stocks.
The invention also proposes a system for predicting stocks based on big data published on the Internet, comprising a data crawling and storage module, a prediction model training module and a stock prediction module. The data crawling and storage module crawls and stores stock-related information; the prediction model training module constructs a training dataset from the data crawled before the trading day and trains the prediction model with Group Lasso; the stock prediction module constructs a new test set from the data crawled on the trading day and uses the trained prediction model to predict the final recommended stocks.
The system further comprises a display module for presenting the stock prediction results to the client.
Beneficial effects: the invention provides new, useful and reliable information sources for quantitative stock selection or stock prediction. Data such as retail investors' operations, analysts' predictions, news, announcements and research reports are novel sources compared with traditional data such as historical stock prices and capital flows. To some extent this information reflects the current stock market and can also reveal expectations about the future market. Because much of it is text, crawling and analyzing these data in real time is harder than crawling and processing traditional stock data. The invention uses a Scrapy-based crawler and techniques such as natural language processing to crawl and process these data in real time, and combining them with traditional data such as historical stock prices and capital flows reflects the market better. Because part of the extracted features come from dividing retail investors into groups by previous-month return, features within a group are strongly correlated and features in different groups are only weakly correlated, so model training should consider the features of a group as a whole. The Group Lasso algorithm from machine learning takes these factors into account well, so the resulting stock prediction model is better able to capture the internal operating mechanism of the market and substantially increases the returns brought to investors.
Brief description of the drawings
Fig. 1 is the overall architecture diagram of the stock prediction system of the invention;
Fig. 2 is the architecture diagram of the data crawling and storage module of the invention;
Fig. 3 is the architecture diagram of the prediction model training module of the invention;
Fig. 4 is the architecture diagram of the stock prediction module of the invention.
Detailed description of the invention
The invention is further illustrated below with reference to a specific embodiment. It should be understood that the embodiment only illustrates the invention and does not limit its scope; after reading this disclosure, modifications of equivalent form made by those skilled in the art all fall within the scope defined by the appended claims.
Fig. 1 shows the overall framework of the stock prediction system, which comprises four modules: the data crawling and storage module, the stock prediction model training module, the stock prediction module and the display module. The invention is implemented in Python and uses MongoDB as its database.
The data crawling and storage module is shown in Fig. 2. The crawler uses the Scrapy framework, a fast, high-level web crawling system developed in Python that is mainly used to visit websites automatically and extract structured data from their pages. Scrapy handles network communication with the efficient Twisted asynchronous networking library; the overall Scrapy architecture is shown in Fig. 3.
In the crawler, to get around the anti-crawling measures of websites such as Xueqiu, a number of proxy IPs are crawled first. The Scrapy framework is then used to crawl data from Xueqiu, Gold Compass, Guba, Phoenix Finance, Sina Finance, Cninfo (Juchao Information) and other websites, and the data are converted to JSON format and stored in the MongoDB database. From Xueqiu, the operation data of some retail investors, investor comments, news and announcements can be crawled; from Gold Compass, analysts' predictions; from Guba, investor comments; from Phoenix Finance and Sina Finance, news and the stocks' historical prices, capital flows and fundamentals; and from Cninfo, announcements.
The stock prediction model training module is shown in Fig. 4. It first constructs the training dataset for machine learning, which consists of the data for the 5 trading days in the week preceding the current trading day. For each of these 5 trading days, each of the 2780 A-share stocks is represented by features and a class label. The features form a single vector of about 700 dimensions, and the label indicates whether the stock's price rises on the next trading day (1 if it rises, 0 otherwise). This gives a matrix of about 5 × 2780 × 701, which is the initial training set.
Table 1: Composition of the roughly 700-dimensional feature vector
Because little data can be crawled for some stock-days, the original 700-dimensional vector may describe them in a distorted way, so the training module filters out samples with insufficient information. The filter criterion can be adjusted according to the evaluation criterion; at present the invention removes samples for which retail investors performed fewer than 10 operations on the stock that day in the crawled data. This yields the filtered training set.
The model is then trained with the Group Lasso algorithm from machine learning, with statistics of the same type forming one group. Unlike conventional machine learning problems, the model is not judged here by accuracy, F1 or similar metrics. Instead it is judged by the return earned over the period when the model recommends 8 stocks each day and, at each day's open, the stocks bought on the previous trading day are sold and the stocks recommended for the current trading day are bought. The model's parameters are tuned accordingly. The Group Lasso algorithm is expressed as follows:
Here the estimated weight vector is the model training result, X is the training sample matrix, Y is the class vector of the samples, I_g is the index set of the features belonging to the g-th group, where g = 1, ..., G, and the weights indexed by I_g are the trained weight values corresponding to the features of the g-th group.
This yields the big-data stock prediction model. About 10 hours before the market opens on each trading day, the invention trains the model for that day.
The prediction module of the big-data stock prediction model is shown in Fig. 4. Features are extracted from the data crawled that day to obtain the test dataset, giving 2780 samples for the 2780 A-share stocks. Samples with little information are then removed using the same filtering method as for the training data, giving the filtered test set. Finally, the trained big-data stock prediction model predicts on the filtered test set, and the 8 stocks with the highest output probability are selected as the recommended stocks for the next trading day.
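The final selection step amounts to a top-k ranking by predicted probability. A minimal sketch follows, with a hypothetical function name and a plain dictionary standing in for the model's output:

```python
def recommend(probabilities, k=8):
    """Return the k stock codes with the highest predicted probability of rising.

    probabilities: dict mapping stock code -> predicted probability (hypothetical input).
    Ties keep the original dictionary order because Python's sort is stable.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:k]]
```

If fewer than k stocks survive the filtering step, the function simply returns all of them in descending order of probability.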
Claims (9)
1. A method for predicting stocks based on big data published on the Internet, comprising the following steps:
1) crawling stock-related data before the trading day;
2) performing feature extraction on the data crawled in step 1, constructing a training set, and training a big-data stock prediction model with the Group Lasso algorithm;
3) constructing a new test set from the data crawled on the trading day, and predicting with the prediction model trained in step 2 to obtain the final recommended stocks.
2. The method for predicting stocks based on big data published on the Internet according to claim 1, wherein the stock information in step 1 is obtained by first crawling a number of proxy IPs, then using the Scrapy framework to crawl data from the relevant websites, converting the data to JSON format, and storing it in a MongoDB database.
3. The method for predicting stocks based on big data published on the Internet according to claim 1, wherein the information crawled in step 1 includes, from websites such as Xueqiu (Snowball), Gold Compass, Guba, Phoenix Finance and Sina Finance, retail investors' trading operations on stocks, analysts' predictions, investor comments, news, announcements and research reports, together with each stock's historical price data, market capitalization, return on equity, return on assets, earnings-per-share growth rate, current liability ratio, enterprise value multiple, year-on-year net profit growth rate, ownership concentration, free-float market capitalization, and return and volatility over the most recent month.
4. The method for predicting stocks based on big data published on the Internet according to claim 1, wherein step 2 filters out data with insufficient information, the filter criterion being: samples for which retail investors performed fewer than 10 operations on the stock that day in the crawled data are removed.
5. The method for predicting stocks based on big data published on the Internet according to claim 1, wherein the training dataset constructed in step 2 consists of the data for the 5 trading days in the week preceding the current trading day; for each of these 5 trading days, every stock is represented by features and a class label, the features being a vector obtained by processing the relevant information and the label indicating whether the stock's price rises on the next trading day (1 if it rises, 0 otherwise), which yields the initial training matrix.
Method based on data prediction stock big disclosed in the Internet the most according to claim 5, described sign stock is special
The vectorial extracting method levied is:
Data are operated for stock invester, according to earning rate last month of stock invester, stock invester is divided into 10 groups, the group of each grade is carried
Take this group to first 1 day of this stock, 3 days, 7 days, 15 days, buying number, sell in each timestamp in the timestamp such as 30 days
Number, the amount of holding position, position in storehouse knots modification, this group are in the feature such as average return of each timestamp;
For analyst's prediction data, extraction and analysis teacher to first 1 day of this stock, 3 days, 7 days, 15 days, in the timestamp such as 30 days
Buying number, sell the features such as number in each timestamp;
For stock invester's comment data, extraction and analysis teacher to first 1 day of this stock, 3 days, 7 days, 15 days, in the timestamp such as 30 days every
The comment number of this stock in individual timestamp, the average of emotion value of each comment, the feature such as variance;
For news data, extraction and analysis teacher to first 1 day of this stock, 3 days, 7 days, 15 days, in the timestamp such as 30 days each time
Between the news number of this stock in stamp, the average of the emotion value of each news, the feature such as variance;
For advertisement data, extraction and analysis teacher to first 1 day of this stock, 3 days, 7 days, 15 days, in the timestamp such as 30 days each time
Between the bulletin number of this stock in stamp, the spy such as the summation of the number of times that word in bulletin keywords database corresponding in each bulletin occurs
Levy;
For historical stock price data, extraction and analysis teacher to first 1 day of this stock, 3 days, 7 days, 15 days, in the timestamp such as 30 days every
The opening price of this stock, closing price, highest price, lowest price and the ratio of first 30 days prices in individual timestamp, line slope on the 3rd, 7
Day feature such as line slope, line slope, line slope, line slopes on the 30th on the 15th on the 10th;
For funds flow data, extraction and analysis teacher to first 1 day of this stock, 3 days, 7 days, 15 days, in the timestamp such as 30 days every
The feature such as ratio of the amount of flowing to of this stock main force fund and discharge in individual timestamp;
For other information datas, extract the current market value of this stock, net assets income ratio, Return on Assets, earnings per share increasing
Long rate, ratio of current liabilities, enterprise value multiple, net profit year-on-year growth rate, Equity Concentration Ratio, free flow market value and
The features such as the stock earnings price ratio of nearly month and stability bandwidth;
For text data such as investor comments, news and bulletins, the text is first segmented into words using natural language processing techniques, based on two dictionaries: the financial sentiment dictionary and the bulletin keyword dictionary. The sentiment value of each investor comment, news item, etc. is then calculated from the financial sentiment words that occur in the text, as is the number of occurrences of the corresponding keywords in each bulletin. The financial sentiment dictionary lists stock sentiment keywords and the sentiment score of each keyword; the bulletin keyword dictionary lists keywords related to bulletins. Both dictionaries are obtained by manual annotation through crowdsourcing.
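The dictionary-based scoring described above can be sketched as follows. This is only an illustration: the text is assumed to be already segmented into tokens, the sentiment value is assumed to be the sum of the scores of matching words (the claims do not state the aggregation), and the dictionary entries shown are hypothetical.

```python
def sentiment_value(tokens, sentiment_dict):
    """Sum the scores of all tokens found in the financial sentiment dictionary.

    Summation is an assumed aggregation; tokens not in the dictionary score 0.
    """
    return sum(sentiment_dict.get(token, 0.0) for token in tokens)


def keyword_hits(tokens, keywords):
    """Count how many tokens of a bulletin appear in the bulletin keyword dictionary."""
    return sum(1 for token in tokens if token in keywords)


if __name__ == "__main__":
    # Hypothetical dictionary entries, for illustration only.
    sentiment_dict = {"surge": 1.0, "loss": -0.5}
    bulletin_keywords = {"dividend", "merger"}

    print(sentiment_value(["profit", "surge", "loss"], sentiment_dict))
    print(keyword_hits(["dividend", "merger", "dividend"], bulletin_keywords))
```

In practice the token lists would come from a word segmenter applied to the crawled text; that step is outside this sketch.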
7. The method for predicting stocks based on big data published on the Internet according to claim 6, wherein, because the investor data are processed during feature extraction by dividing investors into 10 groups according to their rate of return in the previous month, each of these groups corresponds to one feature group; features within a group are strongly correlated, whereas features in different groups are only weakly correlated. So that the features in the same group are considered as a whole, the Group Lasso algorithm from machine learning is used on this basis to train the prediction model. In the Group Lasso formulation, the model training result is the fitted weight vector, X is the training sample matrix, Y is the vector of sample class labels, I_g denotes the indices of the features belonging to the g-th group, where g = 1, ..., G, and the sub-vector of weights indexed by I_g holds the trained weights of the features belonging to the g-th group;
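The Group Lasso formula referred to above is not reproduced in this text. For orientation, a common textbook form of the Group Lasso objective that is consistent with the symbols defined above is sketched below. The weight symbol β, the regularization strength λ, the squared-error loss, and the group-size factor √p_g (with p_g the number of features in I_g) are assumptions taken from the standard formulation and may differ from the formula in the original filing.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2}\,\lVert Y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_{I_g} \rVert_2 ,
\qquad p_g = \lvert I_g \rvert
```

Because the penalty uses the unsquared Euclidean norm of each sub-vector, all weights of a group tend to be set to zero together or kept together, which is what allows the features of one group to be treated as a whole.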
During model training, cross-validation is used: in each round, the test set is sorted in descending order of predicted probability and the stocks with the highest predicted probability are selected; the following trading scheme is then simulated: at the open of each day, the stocks bought on the previous trading day are sold and the stocks recommended for the current trading day are bought. The total rate of return of this scheme over two weeks is used to tune the parameters of the model.
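The evaluation scheme described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions: the function names, the equal split of capital across the selected stocks, the compounding of daily returns, and the absence of transaction costs are not taken from the claims.

```python
def top_k(probabilities, k):
    """Sort stocks by predicted probability, descending, and keep the top k."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:k]]


def total_return(daily_picks, open_prices):
    """Simulate buying at the open of day t and selling at the open of day t + 1.

    daily_picks: list where daily_picks[t] holds the stock codes bought on day t.
    open_prices: {code: [opening prices]}, with one more entry than daily_picks.
    Capital is split equally among the picks of each day (an assumption).
    """
    capital = 1.0
    for t, picks in enumerate(daily_picks):
        if not picks:
            continue  # nothing recommended that day, capital stays in cash
        day_return = sum(
            open_prices[code][t + 1] / open_prices[code][t] - 1.0 for code in picks
        ) / len(picks)
        capital *= 1.0 + day_return
    return capital - 1.0


if __name__ == "__main__":
    picks = [top_k({"A": 0.9, "B": 0.2}, 1), top_k({"A": 0.1, "B": 0.8}, 1)]
    prices = {"A": [10.0, 11.0, 11.0], "B": [20.0, 20.0, 22.0]}
    print(total_return(picks, prices))
```

To tune the model, one would run this simulation over ten trading days (two weeks) for each candidate parameter setting and keep the setting with the highest total return.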
8. A system for predicting stocks based on big data published on the Internet, comprising a data crawling and storage module, a prediction model training module and a stock prediction module; wherein the data crawling and storage module crawls and stores stock-related information; the prediction model training module constructs a training data set from the data crawled before the trading day and trains the prediction model using Group Lasso; and the stock prediction module constructs a new test set from the data crawled on the trading day and uses the trained prediction model to predict the stocks that are finally recommended.
9. The system for predicting stocks based on big data published on the Internet according to claim 8, further comprising a display module for presenting the stock prediction results to the client.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610338598.4A CN106022522A (en) | 2016-05-20 | 2016-05-20 | Method and system for predicting stocks based on big data published by internet |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610338598.4A CN106022522A (en) | 2016-05-20 | 2016-05-20 | Method and system for predicting stocks based on big data published by internet |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106022522A true CN106022522A (en) | 2016-10-12 |
Family
ID=57096593
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610338598.4A Pending CN106022522A (en) | 2016-05-20 | 2016-05-20 | Method and system for predicting stocks based on big data published by internet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106022522A (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897932A (en) * | 2017-01-19 | 2017-06-27 | 沃民高新科技(北京)股份有限公司 | Data method of replacing and device |
CN107689000A (en) * | 2017-08-16 | 2018-02-13 | 北京国新汇金股份有限公司 | A kind of financial information management system |
CN108446984A (en) * | 2018-03-20 | 2018-08-24 | 张家林 | A kind of investment data management method and device |
CN108596765A (en) * | 2018-04-28 | 2018-09-28 | 国信优易数据有限公司 | A kind of Electronic Finance resource recommendation method and device |
CN108629690A (en) * | 2018-04-28 | 2018-10-09 | 福州大学 | Futures based on deeply study quantify transaction system |
CN108647823A (en) * | 2018-05-10 | 2018-10-12 | 北京航空航天大学 | Stock certificate data analysis method based on deep learning and device |
CN108830722A (en) * | 2018-06-27 | 2018-11-16 | 东莞市波动赢机器人科技有限公司 | Based on transaction machine people recommended method, electronic equipment and the storage medium to liquidate |
CN108921444A (en) * | 2018-07-12 | 2018-11-30 | 李俊山 | Based on block chain technology distribution formula number exchange stock index sample acquisition system |
CN109087205A (en) * | 2018-08-10 | 2018-12-25 | 北京字节跳动网络技术有限公司 | Prediction technique and device, the computer equipment and readable storage medium storing program for executing of public opinion index |
CN109102319A (en) * | 2018-06-27 | 2018-12-28 | 众安信息技术服务有限公司 | Plate index preparation method, device and the server of block chain cryptographic assets |
CN109146166A (en) * | 2018-08-09 | 2019-01-04 | 南京安链数据科技有限公司 | A kind of personal share based on the marking of investor's content of the discussions slumps prediction model |
CN109255714A (en) * | 2018-08-27 | 2019-01-22 | 深圳市利讯互联网金融服务有限公司 | Machine learning fund optimum decision system and its preferred method |
CN109272405A (en) * | 2018-09-30 | 2019-01-25 | 大唐碳资产有限公司 | Carbon transaction in assets method and system |
WO2019019346A1 (en) * | 2017-07-25 | 2019-01-31 | 上海壹账通金融科技有限公司 | Asset allocation strategy acquisition method and apparatus, computer device, and storage medium |
WO2019041520A1 (en) * | 2017-08-31 | 2019-03-07 | 平安科技(深圳)有限公司 | Social data-based method of recommending financial product, electronic device and medium |
CN109739895A (en) * | 2018-12-07 | 2019-05-10 | 中国联合网络通信集团有限公司 | A kind of virtual article trading prediction technique and device |
CN110163758A (en) * | 2019-06-03 | 2019-08-23 | 成都慧财智科技有限公司 | Artificial intelligence Stock investment analysis system |
WO2019192135A1 (en) * | 2018-04-03 | 2019-10-10 | 平安科技(深圳)有限公司 | Electronic device, bond yield analysis method, system, and storage medium |
WO2019205378A1 (en) * | 2018-04-26 | 2019-10-31 | 平安科技(深圳)有限公司 | Method and apparatus for selecting investment stocks based on public sentiment factor, and storage medium |
CN110400225A (en) * | 2019-07-29 | 2019-11-01 | 北京北信源软件股份有限公司 | A kind of market value of stock management method |
WO2019218517A1 (en) * | 2018-05-16 | 2019-11-21 | 平安科技(深圳)有限公司 | Server, method for processing text data and storage medium |
WO2019242143A1 (en) * | 2018-06-21 | 2019-12-26 | 平安科技(深圳)有限公司 | Stocks selling early-warning method and apparatus, and computer-readable storage medium |
CN110809778A (en) * | 2018-03-30 | 2020-02-18 | 加藤宽之 | Stock price prediction support system and method |
TWI692735B (en) * | 2018-10-12 | 2020-05-01 | 台北富邦商業銀行股份有限公司 | Exposure management system of corporate finance |
2016-05-20: CN201610338598.4A filed, published as CN106022522A (status: Pending)
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897932A (en) * | 2017-01-19 | 2017-06-27 | 沃民高新科技(北京)股份有限公司 | Data method of replacing and device |
WO2019019346A1 (en) * | 2017-07-25 | 2019-01-31 | 上海壹账通金融科技有限公司 | Asset allocation strategy acquisition method and apparatus, computer device, and storage medium |
CN107689000A (en) * | 2017-08-16 | 2018-02-13 | 北京国新汇金股份有限公司 | A kind of financial information management system |
WO2019041520A1 (en) * | 2017-08-31 | 2019-03-07 | 平安科技(深圳)有限公司 | Social data-based method of recommending financial product, electronic device and medium |
CN108446984A (en) * | 2018-03-20 | 2018-08-24 | 张家林 | A kind of investment data management method and device |
CN110809778A (en) * | 2018-03-30 | 2020-02-18 | 加藤宽之 | Stock price prediction support system and method |
WO2019192135A1 (en) * | 2018-04-03 | 2019-10-10 | 平安科技(深圳)有限公司 | Electronic device, bond yield analysis method, system, and storage medium |
WO2019205378A1 (en) * | 2018-04-26 | 2019-10-31 | 平安科技(深圳)有限公司 | Method and apparatus for selecting investment stocks based on public sentiment factor, and storage medium |
CN108596765A (en) * | 2018-04-28 | 2018-09-28 | 国信优易数据有限公司 | A kind of Electronic Finance resource recommendation method and device |
CN108629690A (en) * | 2018-04-28 | 2018-10-09 | 福州大学 | Futures based on deeply study quantify transaction system |
CN108647823A (en) * | 2018-05-10 | 2018-10-12 | 北京航空航天大学 | Stock certificate data analysis method based on deep learning and device |
WO2019218517A1 (en) * | 2018-05-16 | 2019-11-21 | 平安科技(深圳)有限公司 | Server, method for processing text data and storage medium |
WO2019242143A1 (en) * | 2018-06-21 | 2019-12-26 | 平安科技(深圳)有限公司 | Stocks selling early-warning method and apparatus, and computer-readable storage medium |
CN108830722A (en) * | 2018-06-27 | 2018-11-16 | 东莞市波动赢机器人科技有限公司 | Based on transaction machine people recommended method, electronic equipment and the storage medium to liquidate |
CN109102319A (en) * | 2018-06-27 | 2018-12-28 | 众安信息技术服务有限公司 | Plate index preparation method, device and the server of block chain cryptographic assets |
CN108921444A (en) * | 2018-07-12 | 2018-11-30 | 李俊山 | Based on block chain technology distribution formula number exchange stock index sample acquisition system |
CN109146166A (en) * | 2018-08-09 | 2019-01-04 | 南京安链数据科技有限公司 | A kind of personal share based on the marking of investor's content of the discussions slumps prediction model |
CN109087205A (en) * | 2018-08-10 | 2018-12-25 | 北京字节跳动网络技术有限公司 | Prediction technique and device, the computer equipment and readable storage medium storing program for executing of public opinion index |
CN109087205B (en) * | 2018-08-10 | 2020-09-18 | 北京字节跳动网络技术有限公司 | Public opinion index prediction method and device, computer equipment and readable storage medium |
CN109255714A (en) * | 2018-08-27 | 2019-01-22 | 深圳市利讯互联网金融服务有限公司 | Machine learning fund optimum decision system and its preferred method |
CN109272405A (en) * | 2018-09-30 | 2019-01-25 | 大唐碳资产有限公司 | Carbon transaction in assets method and system |
TWI692735B (en) * | 2018-10-12 | 2020-05-01 | 台北富邦商業銀行股份有限公司 | Exposure management system of corporate finance |
CN109739895A (en) * | 2018-12-07 | 2019-05-10 | 中国联合网络通信集团有限公司 | A kind of virtual article trading prediction technique and device |
CN110163758A (en) * | 2019-06-03 | 2019-08-23 | 成都慧财智科技有限公司 | Artificial intelligence Stock investment analysis system |
CN110400225A (en) * | 2019-07-29 | 2019-11-01 | 北京北信源软件股份有限公司 | A kind of market value of stock management method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106022522A (en) | Method and system for predicting stocks based on big data published by internet | |
Bhowmik et al. | Stock market volatility and return analysis: A systematic literature review | |
Meng et al. | Reinforcement learning in financial markets | |
US20130138577A1 (en) | Methods and systems for predicting market behavior based on news and sentiment analysis | |
Raman et al. | Mapping ESG trends by distant supervision of neural language models | |
Brown et al. | Financial statement adequacy and firms’ MD&A disclosures | |
Eisfeld | Entry and acquisitions in software markets | |
Yin et al. | Daily investor sentiment, order flow imbalance and stock liquidity: evidence from the Chinese stock market | |
Fang et al. | Practical machine learning approach to capture the scholar data driven alpha in AI industry | |
CN116775975A (en) | Deep learning network for analysis of complex news text public opinion in financial field | |
Singh et al. | FII flow and Indian stock market: A causal study | |
Li et al. | Forecasting stock prices changes using long-short term memory neural network with symbolic genetic programming | |
Huang et al. | Autonomous self-evolving forecasting models for price movement in high frequency trading: Evidence from Taiwan | |
Teplova et al. | A retail investor in a cobweb of social networks | |
Eickhoff et al. | Stock analysts vs. the crowd: a study on mutual prediction | |
Carter et al. | The IPO window of opportunity for digital product and service firms | |
Li | Related research on news sentiment tendency and stock price fluctuation | |
Reintjes | Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid-and Long-Term Returns | |
Coiro | Tesla: Is Now the Time to Invest? An examination of Tesla, social media, and its effect on stock | |
BEŞER et al. | The impact of foreign direct investment on tax revenues: Evidence from selected transition economies | |
Puspita Sari | INFLUENCE OF INVESTORS’ATTENTION ON STOCK RETURN, LIQUIDITY, AND RETURN VOLATILITY COMPARISON BETWEEN MANUFACTURE COMPANIES IN INDONESIA AND INDIA | |
Raju et al. | Machine Learning Algorithms for Prediction of Stock Market: A Systematic Literature Review | |
Fang et al. | Practical Machine Learning Approach for Stock Trading Strategies using Alternative Dataset | |
Zheng | To What Extent Can Social Media Be Used to Identify Potential Investments? | |
Marjanovic | Extreme Views on Reddit: Information or Noise? |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161012 |
RJ01 | Rejection of invention patent application after publication | |