CN106780036A - Mood index construction method based on Internet data collection - Google Patents
- Publication number
- CN106780036A (application CN201611030961.2A)
- Authority
- CN
- China
- Prior art keywords
- stock
- concern
- page
- index
- month
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/06—Asset management; Financial planning or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/986—Document structures and storage, e.g. HTML extensions
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Accounting & Taxation (AREA)
- Marketing (AREA)
- Technology Law (AREA)
- Game Theory and Decision Science (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Operations Research (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The present invention discloses a mood index construction method based on Internet data collection, comprising the following steps. Step 1: obtain, from Internet data, the degree of attention paid to a given investment. Step 2: construct a positive/negative sentiment indicator. Step 3: obtain two indicators from the open market, the number of IPOs in the current month and the same-day stock price volatility. Step 4: compute the market investor sentiment index with the formula

$$\text{SentimentIndex} = \frac{A}{A+B+C+D}\,\text{Focus} + \frac{B}{A+B+C+D}\,\text{PosNegSentiment} + \frac{C}{A+B+C+D}\,\text{IPOnum} + \frac{D}{A+B+C+D}\,\text{Volatility},$$

where Focus, PosNegSentiment, IPOnum and Volatility are the degree of attention, the positive/negative sentiment indicator, the current month's IPO count and the same-day stock price volatility, and A, B, C and D are their respective weights. By using IT techniques to obtain the sentiment index of market investors, the method can provide important additional information for investment decisions in financial markets.
Description
Technical field
The invention belongs to the field of data analysis, and in particular relates to a method for constructing a mood index for stock price prediction.
Background art
In recent years, a large body of academic research in finance has found that non-standardized data play a key role in explaining and analyzing financial market fluctuations. Examples include an index of investors' emotional state about the market built by a specific process, indices of regulatory policy uncertainty, and the positive or negative statements about investments that investors post online. Therefore, as distinct from the standardized financial data in general use today, such as a stock's opening price, closing price and trading volume, a market sentiment index of investors can be constructed to predict the daily trend of stock prices.
Summary of the invention
The purpose of the present invention is to provide a mood index construction method based on Internet data collection that uses IT techniques to obtain the sentiment index of market investors and thereby provides important additional information for investment decisions in financial markets.
To achieve the above purpose, the solution of the invention is as follows:
A mood index construction method based on Internet data collection comprises the following steps:
Step 1: obtain, from Internet data, the degree of attention paid to a given investment;
Step 2: construct a positive/negative sentiment indicator;
Step 3: obtain two indicators from the open market, the number of IPOs in the current month and the same-day stock price volatility;
Step 4: compute the market investor sentiment index with the following formula:
$$\text{SentimentIndex} = \frac{A}{A+B+C+D}\,\text{Focus} + \frac{B}{A+B+C+D}\,\text{PosNegSentiment} + \frac{C}{A+B+C+D}\,\text{IPOnum} + \frac{D}{A+B+C+D}\,\text{Volatility}$$

where Focus, PosNegSentiment, IPOnum and Volatility are the degree of attention, the positive/negative sentiment indicator, the current month's IPO count and the same-day stock price volatility, and A, B, C and D are their respective weights.
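As a minimal sketch (not the patent's own code), the Step 4 formula is a weighted average whose weights are normalized by A + B + C + D. The function and parameter names below are hypothetical, and the four inputs are assumed to be already computed for one stock and one time period.

```python
def sentiment_index(focus, pos_neg_sentiment, ipo_num, volatility, a, b, c, d):
    """Market investor sentiment index of Step 4 (weights normalized by A+B+C+D)."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("A + B + C + D must be non-zero")
    # Equivalent to [A/total]*Focus + [B/total]*PosNegSentiment + ...
    return (a * focus + b * pos_neg_sentiment
            + c * ipo_num + d * volatility) / total


print(sentiment_index(10, 0.5, 20, 0.02, a=1, b=2, c=3, d=4))
```

Because the normalized weights always sum to 1, the index is a convex combination only when A, B, C and D are all non-negative. The text does not say how negative regression weights, or the very different scales of an IPO count and a volatility, should be handled, so standardizing the inputs beforehand may be worth considering.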
Step 1 specifically comprises: searching the keywords posted from IP addresses corresponding to the location of a listed company's headquarters; for a given investment, collecting a relevant index per fixed time period and using it as the degree of attention, where the relevant index is the total number of keyword occurrences within the fixed time period.
The fixed time period is one month, one week, one day, one hour or one minute.
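The attention measure of Step 1 can be sketched as a keyword count per time bucket, restricted to posts whose IP address maps to the headquarters location. This sketch assumes the posts have already been collected as hypothetical (timestamp, region, text) tuples and that the IP-to-region lookup happened elsewhere; plain substring counting is used, which also works for Chinese text without word segmentation.

```python
from collections import Counter
from datetime import datetime


def bucket_key(ts: datetime, period: str) -> str:
    """Label of the fixed time period (month/week/day/hour/minute) containing ts."""
    if period == "month":
        return ts.strftime("%Y-%m")
    if period == "week":
        year, week, _ = ts.isocalendar()  # ISO year and week number
        return f"{year}-W{week:02d}"
    if period == "day":
        return ts.strftime("%Y-%m-%d")
    if period == "hour":
        return ts.strftime("%Y-%m-%d %H")
    if period == "minute":
        return ts.strftime("%Y-%m-%d %H:%M")
    raise ValueError(f"unknown period: {period}")


def focus(posts, keywords, hq_region, period="day"):
    """Total keyword occurrences per period, counting only posts from hq_region.

    posts: iterable of hypothetical (timestamp, region, text) tuples; region is
    assumed to come from an IP-geolocation lookup performed elsewhere.
    """
    counts = Counter()
    for ts, region, text in posts:
        if region != hq_region:
            continue  # keep only posts whose IP maps to the headquarters location
        hits = sum(text.count(k) for k in keywords)
        if hits:
            counts[bucket_key(ts, period)] += hits
    return dict(counts)
```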
Step 2 specifically comprises:
Step 21: using an ORACLE relational database with the tables guba, authors, articles and comments, build the database as follows:
1) Send an HTTP request to the stock homepage URL, obtain the HTML content and parse it; record page = 1.
2) Find the post-list (newlist) tag and traverse its child nodes:
i. store each post's title, author, stock bar ID, publication date and last-update date in table articles;
ii. visit the post URL, obtain the HTML content of the detail page and parse it; store the content, read count and comment count in table comments;
iii. visit the author URL, obtain the HTML content of the author page and parse it; store the author information in table authors;
iv. visit the post's stock URL and store the stock information in table guba.
3) Obtain the total number of posts and the number of posts per page, and determine whether the current page is the last page; if it is not, visit the next page, record page = page + 1, and repeat step 2).
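The page loop in steps 1) to 3) can be sketched as a generator. Here fetch_page is a hypothetical callable standing in for the HTTP request and parsing; it is assumed to return the posts on one page together with the total post count and the number of posts per page, from which the last page is derived.

```python
import math
from typing import Callable, Iterator, List, Tuple

# Hypothetical signature: fetch_page(page) -> (posts_on_page, total_posts, posts_per_page)
FetchPage = Callable[[int], Tuple[List[dict], int, int]]


def crawl_pages(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every post on every list page, following steps 1) to 3)."""
    page = 1  # step 1): start on the first list page
    while True:
        posts, total_posts, per_page = fetch_page(page)
        yield from posts  # step 2): the caller stores each post in the tables
        last_page = math.ceil(total_posts / per_page) if per_page > 0 else page
        if page >= last_page:  # step 3): stop after the last page
            return
        page += 1  # otherwise record page = page + 1 and repeat
```

Yielding posts rather than writing them directly keeps the pagination logic separate from the storage step, so the same loop can feed any of the four tables.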
Step 22: based on the semantic data obtained from the web pages, perform the analysis with the SOSA algorithm, using a general-purpose Chinese sentiment-polarity word dictionary from the semantic analysis field. The SOSA algorithm repeatedly polls nodes at random and changes node states accordingly; the temperature is then gradually lowered and the next round is repeated.
Step 23: when the iteration count reaches a preset value or a stopping criterion is met, terminate the repetition and take the current result as the positive/negative sentiment indicator.
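The text does not define SOSA beyond random node polling, state changes and a falling temperature, which matches simulated annealing. The sketch below is therefore an assumption, not the patent's algorithm: a generic simulated-annealing labeler in which words are nodes of a similarity graph, each node's state is a polarity of +1 or -1, seed words from the polarity dictionary stay fixed, and the energy penalizes similar words with opposite polarity. The final aggregation of word polarities into PosNegSentiment, (positive hits - negative hits) / all hits, is likewise a hypothetical choice.

```python
import math
import random


def anneal_polarity(edges, seeds, t0=2.0, cooling=0.9, t_min=0.01,
                    steps_per_temp=100, max_rounds=200, rng=None):
    """Assign +1/-1 polarity to words by simulated annealing (assumed reading of SOSA).

    edges: {(word_a, word_b): weight}, weight > 0 meaning the words are similar.
    seeds: {word: +1 or -1} from a polarity dictionary; these are never flipped.
    Energy E = -sum(w * s_a * s_b): similar words with equal polarity lower E.
    """
    rng = rng if rng is not None else random.Random(0)
    neighbours = {}
    for (a, b), w in edges.items():
        neighbours.setdefault(a, []).append((b, w))
        neighbours.setdefault(b, []).append((a, w))
    free = sorted(n for n in neighbours if n not in seeds)
    state = dict(seeds)
    for n in free:
        state[n] = rng.choice((1, -1))  # random initial polarity

    t, rounds = t0, 0
    while t > t_min and rounds < max_rounds:  # step 23 stopping criteria
        for _ in range(steps_per_temp if free else 0):
            n = rng.choice(free)  # randomly poll one node
            local = sum(w * state[m] for m, w in neighbours[n])
            delta = 2 * state[n] * local  # energy change if n is flipped
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                state[n] = -state[n]
        t *= cooling  # lower the temperature, then run the next round
        rounds += 1
    return state


def pos_neg_sentiment(texts, polarity):
    """Hypothetical indicator: (positive hits - negative hits) / all hits."""
    pos = neg = 0
    for text in texts:
        for word, p in polarity.items():
            hits = text.count(word)
            if p > 0:
                pos += hits
            else:
                neg += hits
    return (pos - neg) / (pos + neg) if pos + neg else 0.0
```

At low temperature only energy-lowering or neutral flips survive, which is where the annealing loop behaves like a greedy search. Words not connected, directly or indirectly, to any seed word end up with an arbitrary polarity, so the similarity graph would need to link every candidate word to the dictionary.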
In step 4 above, the weights are computed as follows: query the real-time return of the stock and substitute it, together with the degree of attention (Focus), positive/negative sentiment indicator (PosNegSentiment), current month's IPO count (IPOnum) and same-day stock price volatility (Volatility) obtained in steps 1 to 3, into the following equation:

$$R = A\cdot\text{Focus} + B\cdot\text{PosNegSentiment} + C\cdot\text{IPOnum} + D\cdot\text{Volatility} + \varepsilon,$$

where R is the real-time return of the stock and ε is the residual. Assuming the residual is normally distributed, the values of A, B, C and D are obtained by multiple linear regression.
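Under normally distributed residuals, ordinary least squares gives the maximum-likelihood estimate of A, B, C and D. The sketch below solves the normal equations (X^T X) b = X^T y in pure Python with no intercept, because the equation above has no constant term; in practice a library routine such as numpy.linalg.lstsq would be the usual choice. Each row of X is assumed to hold (Focus, PosNegSentiment, IPOnum, Volatility) for one observation, and y the matching real-time return.

```python
def ols_weights(X, y):
    """Least-squares weights without intercept, solving (X^T X) b = X^T y.

    X: list of rows (Focus, PosNegSentiment, IPOnum, Volatility); y: returns.
    """
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    m = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X^T X | X^T y]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("regressors are collinear; weights not identifiable")
        m[col], m[piv] = m[piv], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]  # A, B, C, D
```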
With the above scheme, the invention exploits the influence of non-standardized data on finance: it organically combines non-standardized online sentiment data with standardized financial data into a non-standardized index intended to measure market speculative sentiment accurately. Fund managers, risk-control managers and investors can adjust their investment strategies according to this information and avoid market risk.
Detailed description of embodiments
The technical solution and its beneficial effects are described in detail below with reference to a specific embodiment.
The present invention provides a mood index construction method based on Internet data collection. By designing a corresponding index, it uses IT techniques to obtain lawfully from the network the information required to compute that index.
In theory, the sentiment of investors in the market can, to a certain extent, predict future prices. Factors such as the number of IPOs in the current month, the number of new users added in the current month and the stock market's volatility during the month each reflect market sentiment from one angle. The problem is therefore how to use actually available data, together with post-collection processing of the information (for example, semantic analysis of certain keywords and re-compilation of certain feature information), to characterize investor sentiment scientifically and accurately. The market sentiment index can be built from factors such as the current month's IPO count, the current month's number of new users and the month's stock market volatility.
Internet users can exchange views freely on online discussion boards, and each post carries the poster's network IP address. We expect that a large-scale search on important keywords posted from the IP addresses corresponding to the location of a listed company's headquarters can yield a measure of the attention that local investors, relative to non-local investors, pay to a given stock. For every stock, we collect this index monthly, weekly, daily, hourly or even per minute. The degree of attention (Focus) is the cumulative number of keyword occurrences within the specified time period, and it is the first input used to build the mood index.
Second, users' positive and negative statements about investments must be obtained from web pages. The summary (list) page generally stores some basic information about each post, including the read count, comment count, stock link, post title, author, publication date and last-update date. The detail page generally contains child tags holding the post's stock ID, the author's information and, for every comment, the commenter's information, posting time and comment content. The author page contains the author's nickname, influence, registration date, visit count, followed stocks, and all of the author's posts and comments. Given these features of the web pages, we plan to use an ORACLE relational database divided into four tables: guba, authors, articles and comments.
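The four-table layout can be sketched as DDL. To keep the example runnable without an Oracle server, it swaps in Python's built-in sqlite3 module for ORACLE. Column names follow the fields named in the text where it names them (title, authorid, gubaid, content, readcount, commentcount); the key columns, the renamed date columns (published_time for the text's "time", last_updated for "last") and the author and stock field names are assumptions.

```python
import sqlite3

# sqlite3 stands in for ORACLE; keys and several column names are hypothetical.
SCHEMA = """
CREATE TABLE guba (
    gubaid TEXT PRIMARY KEY,          -- stock bar / stock identifier
    stock_name TEXT                   -- hypothetical stock information field
);
CREATE TABLE authors (
    authorid TEXT PRIMARY KEY,
    nickname TEXT,
    influence REAL,
    registered_on TEXT,
    visit_count INTEGER
);
CREATE TABLE articles (
    articleid INTEGER PRIMARY KEY,    -- hypothetical surrogate key
    title TEXT,
    authorid TEXT REFERENCES authors(authorid),
    gubaid TEXT REFERENCES guba(gubaid),
    published_time TEXT,              -- the text's "time" field (publication date)
    last_updated TEXT                 -- the text's "last" field (last-update date)
);
CREATE TABLE comments (
    articleid INTEGER REFERENCES articles(articleid),
    content TEXT,
    readcount INTEGER,
    commentcount INTEGER
);
"""


def create_db(path=":memory:"):
    """Open a SQLite database and create the four tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```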
After analyzing the specific format of the web pages and designing the data storage structure, our concrete procedure can be divided into the following three steps:
1. Send an HTTP request to the stock homepage URL http://xxx.com, obtain the HTML content and parse it; record page = 1.
2. Find the <ul class="newlist"> tag and traverse its child <li> nodes:
i. store the post's title (title), authorid (author), gubaid (stock bar ID), time (publication date) and last (last-update date) in table articles;
ii. visit the post URL (the href attribute of <a class="note"></a>), obtain the HTML content of the detail page and parse it; store content (content), readcount (read count) and commentcount (comment count) in table comments;
iii. visit the author URL (the href attribute of <cite class="aut"><a></a></cite>), obtain the HTML content of the author page and parse it; store the author information in table authors;
iv. visit the post's stock URL (<a class="balink"></a>) and store the stock information in table guba.
3. Find the <div id="pageArea"></div> tag, obtain the total number of posts and the number of posts per page, and determine whether the current page is the last page. If it is not, visit http://xxx.com/default_N.html, where N = page + 1, record page = N, and repeat step 2.
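Step 2's tag lookup and step 3's last-page test can be sketched with Python's standard html.parser module. The tag and class names come from the text; the sample markup, the data-pager attribute and its "total|per_page" format are hypothetical, since the real pager markup is not given, and http://xxx.com remains the text's placeholder domain.

```python
import math
from html.parser import HTMLParser


class ListPageParser(HTMLParser):
    """Collects post links from <ul class="newlist"> and pager data from <div id="pageArea">."""

    def __init__(self):
        super().__init__()
        self.in_list = False
        self.in_cite = False
        self.current = None   # dict for the <li> currently being parsed
        self.capture = None   # key whose text content is being collected
        self.posts = []
        self.total = self.per_page = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "ul" and "newlist" in classes:
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.current = {}
            self.posts.append(self.current)
        elif tag == "cite" and "aut" in classes and self.current is not None:
            self.in_cite = True
        elif tag == "a" and self.current is not None:
            if "note" in classes:        # post detail page (step ii)
                self.current["post_url"] = a.get("href")
                self.capture = "title"
            elif "balink" in classes:    # stock bar page (step iv)
                self.current["stock_url"] = a.get("href")
            elif self.in_cite:           # author page (step iii)
                self.current["author_url"] = a.get("href")
                self.capture = "author"
        elif tag == "div" and a.get("id") == "pageArea":
            # Hypothetical pager format "total|per_page"; the real attribute is unknown.
            total, per_page = (a.get("data-pager") or "0|0").split("|")[:2]
            self.total, self.per_page = int(total), int(per_page)

    def handle_endtag(self, tag):
        if tag == "a":
            self.capture = None
        elif tag == "cite":
            self.in_cite = False
        elif tag == "li":
            self.current = None
        elif tag == "ul":
            self.in_list = False

    def handle_data(self, data):
        if self.capture and self.current is not None:
            self.current[self.capture] = self.current.get(self.capture, "") + data.strip()

    def next_page_url(self, page):
        """URL of page N = page + 1, or None when `page` is the last page."""
        if self.per_page <= 0 or page >= math.ceil(self.total / self.per_page):
            return None
        return f"http://xxx.com/default_{page + 1}.html"
```

The parser only extracts URLs and pager counts; fetching the detail, author and stock pages it points to would be separate requests.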
Based on the semantic data obtained from the web pages, the SOSA algorithm is applied using a general-purpose Chinese sentiment-polarity word dictionary from the semantic analysis field. The SOSA algorithm repeatedly polls (selects and tests) nodes at random and changes their states accordingly; the temperature is then gradually lowered and the next round is repeated. The procedure is similar to a greedy algorithm. [Pseudocode of the algorithm: not reproduced in this text.]
This yields the second input variable of the mood index, the positive/negative sentiment indicator (PosNegSentiment). Two further indicators, the current month's IPO count (IPOnum) and the same-day stock price volatility (Volatility), can be obtained from the open market. The market investor sentiment index can then be computed in real time from the following equations:
$$R = A\cdot\text{Focus} + B\cdot\text{PosNegSentiment} + C\cdot\text{IPOnum} + D\cdot\text{Volatility} + \varepsilon,$$

where R is the real-time return of the stock and ε is the residual.
Assuming that the model residual is normally distributed, the weights A, B, C and D can be estimated by multiple linear regression, and the continuously updated market investor sentiment index is then given by:
$$\text{SentimentIndex} = \frac{A}{A+B+C+D}\,\text{Focus} + \frac{B}{A+B+C+D}\,\text{PosNegSentiment} + \frac{C}{A+B+C+D}\,\text{IPOnum} + \frac{D}{A+B+C+D}\,\text{Volatility}$$
The higher the investor sentiment index, the stronger the speculative mood of market participants, indicating rising market risk that calls for caution; the lower the index, the weaker the speculative mood, indicating relatively low market risk. The market sentiment index organically combines non-standardized online sentiment data with standardized financial data into a non-standardized index intended to measure market speculative sentiment accurately. Fund managers, risk-control managers and investors can adjust their investment strategies according to this information and avoid market risk.
The above embodiment only illustrates the technical idea of the invention and does not limit its scope of protection. Any change made on the basis of the technical solution in accordance with the technical idea proposed by the invention falls within the scope of protection of the invention.
Claims (5)
1. A mood index construction method based on Internet data collection, characterized by comprising the following steps:
Step 1: obtaining, from Internet data, the degree of attention paid to a given investment;
Step 2: constructing a positive/negative sentiment indicator;
Step 3: obtaining two indicators from the open market, the number of IPOs in the current month and the same-day stock price volatility;
Step 4: computing the market investor sentiment index according to the following formula:

$$\text{SentimentIndex} = \frac{A}{A+B+C+D}\,\text{Focus} + \frac{B}{A+B+C+D}\,\text{PosNegSentiment} + \frac{C}{A+B+C+D}\,\text{IPOnum} + \frac{D}{A+B+C+D}\,\text{Volatility}$$

wherein Focus, PosNegSentiment, IPOnum and Volatility denote the degree of attention, the positive/negative sentiment indicator, the current month's IPO count and the same-day stock price volatility, and A, B, C and D are their respective weights.
2. The mood index construction method based on Internet data collection according to claim 1, characterized in that step 1 specifically comprises: searching the keywords posted from the IP addresses corresponding to the location of a listed company's headquarters; and, for a given investment, collecting a relevant index per fixed time period and using it as the degree of attention, the relevant index being the total number of keyword occurrences within the fixed time period.
3. The mood index construction method based on Internet data collection according to claim 2, characterized in that the fixed time period is one month, one week, one day, one hour or one minute.
4. The mood index construction method based on Internet data collection according to claim 1, characterized in that step 2 specifically comprises:
Step 21: using an ORACLE relational database with the tables guba, authors, articles and comments, building the database as follows:
1) sending an HTTP request to the stock homepage URL, obtaining the HTML content and parsing it, and recording page = 1;
2) finding the post-list (newlist) tag and traversing its child nodes:
i. storing each post's title, author, stock bar ID, publication date and last-update date in table articles;
ii. visiting the post URL, obtaining the HTML content of the detail page and parsing it, and storing the content, read count and comment count in table comments;
iii. visiting the author URL, obtaining the HTML content of the author page and parsing it, and storing the author information in table authors;
iv. visiting the post's stock URL and storing the stock information in table guba;
3) obtaining the total number of posts and the number of posts per page and determining whether the current page is the last page; if not, visiting the next page, recording page = page + 1, and repeating step 2);
Step 22: based on the semantic data obtained from the web pages, performing the analysis with the SOSA algorithm using a general-purpose Chinese sentiment-polarity word dictionary from the semantic analysis field, wherein the SOSA algorithm repeatedly polls nodes at random and changes node states accordingly, after which the temperature is gradually lowered and the next round is repeated;
Step 23: when the iteration count reaches a preset value or a stopping criterion is met, terminating the repetition and taking the current result as the positive/negative sentiment indicator.
5. The mood index construction method based on Internet data collection according to claim 1, characterized in that, in step 4, the weights are computed as follows: querying the real-time return of the stock and substituting it, together with the degree of attention (Focus), positive/negative sentiment indicator (PosNegSentiment), current month's IPO count (IPOnum) and same-day stock price volatility (Volatility) obtained in steps 1 to 3, into the following equation:

$$R = A\cdot\text{Focus} + B\cdot\text{PosNegSentiment} + C\cdot\text{IPOnum} + D\cdot\text{Volatility} + \varepsilon,$$

where R is the real-time return of the stock and ε is the residual; and, assuming that the residual is normally distributed, obtaining the values of A, B, C and D by multiple linear regression.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611030961.2A CN106780036A (en) | 2016-11-16 | 2016-11-16 | Mood index construction method based on Internet data collection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106780036A true CN106780036A (en) | 2017-05-31 |
Family
ID=58970932
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611030961.2A Pending CN106780036A (en) | 2016-11-16 | 2016-11-16 | Mood index construction method based on Internet data collection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106780036A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104050163A (en) * | 2013-03-11 | 2014-09-17 | 捷达世软件(深圳)有限公司 | Content recommendation system and method |
CN104408083A (en) * | 2014-10-27 | 2015-03-11 | 六盘水职业技术学院 | Socialized media analyzing system |
CN105740353A (en) * | 2016-01-26 | 2016-07-06 | 中国人民解放军国防科学技术大学 | Calculation method and system for relevance degree of individual share and article |
CN105956770A (en) * | 2016-05-03 | 2016-09-21 | 中国科学院大学 | Stock market risk prediction platform and text excavation method thereof |
CN106056449A (en) * | 2016-05-26 | 2016-10-26 | 黑龙江省容维投资顾问有限责任公司 | Stock information push system and push method |
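Non-Patent Citations (2)
Title |
---|
Du Weifu et al. (杜伟夫等): "一种新的情感词汇语义倾向计算方法" (A new method for computing the semantic orientation of sentiment words), 《计算机研究与发展》 (Journal of Computer Research and Development) * |
Ma Tao et al. (马涛等): "投资者关注度对股市收益影响研究" (A study of the effect of investor attention on stock market returns), 《特区经济》 (Special Zone Economy) * |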
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109493228A (en) * | 2018-12-12 | 2019-03-19 | 安徽省泰岳祥升软件有限公司 | A kind of method and device generating stock news in brief model |
CN110096631A (en) * | 2019-03-19 | 2019-08-06 | 北京师范大学 | A kind of stock market's mood report-generating method of the text analyzing of posting based on stock forum |
CN110096631B (en) * | 2019-03-19 | 2021-03-05 | 北京师范大学 | Stock market emotion report generation method based on postings text analysis of stock forum |
CN110990672A (en) * | 2019-11-20 | 2020-04-10 | 国元证券股份有限公司 | Method for analyzing heat degree of individual stock bar |
CN110990672B (en) * | 2019-11-20 | 2023-06-06 | 国元证券股份有限公司 | Heat analysis method for single-strand bar |
CN113393330A (en) * | 2021-07-11 | 2021-09-14 | 北京天仪百康科贸有限公司 | Financial wind control management system based on block chain |
CN113393330B (en) * | 2021-07-11 | 2022-12-23 | 深圳市鼎驰科技发展有限公司 | Financial wind control management system based on block chain |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Groß-Klußmann et al. | Buzzwords build momentum: Global financial Twitter sentiment and the aggregate stock market | |
Patthi et al. | Altmetrics–a collated adjunct beyond citations for scholarly impact: a systematic review | |
Wildgaard et al. | A review of the characteristics of 108 author-level bibliometric indicators | |
US8166032B2 (en) | System and method for sentiment-based text classification and relevancy ranking | |
TWI601088B (en) | Topic management network public opinion evaluation management system and method | |
Ruiz et al. | Correlating financial time series with micro-blogging activity | |
US8781989B2 (en) | Method and system to predict a data value | |
CN103177090B (en) | A kind of topic detection method and device based on big data | |
CN106780036A (en) | Mood index construction method based on Internet data collection |
CN105740353A (en) | Calculation method and system for relevance degree of individual share and article | |
Huang et al. | It is an equal failing to trust everybody and to trust nobody: Stock price prediction using trust filters and enhanced user sentiment on Twitter | |
Mukherjee | Do open‐access journals in library and information science have any scholarly impact? A bibliometric study of selected open‐access journals using Google Scholar | |
Dong et al. | Micro-blog social moods and Chinese stock market: The influence of emotional valence and arousal on Shanghai Composite Index volume | |
Jin et al. | CT-Rank: A Time-aware Ranking Algorithm for Web Search. | |
CN110096631B (en) | Stock market emotion report generation method based on postings text analysis of stock forum | |
Yu et al. | Dynamic effects of climate policy uncertainty on green bond volatility: an empirical investigation based on TVP-VAR models | |
US9262395B1 (en) | System, methods, and data structure for quantitative assessment of symbolic associations | |
Han et al. | Prediction of investor-specific trading trends in South Korean stock markets using a BILSTM prediction model based on sentiment analysis of financial news articles | |
Zhao et al. | Dynamic impacts of online investor sentiment on international crude oil prices | |
Zhang et al. | Stock trend forecasting method based on sentiment analysis and system similarity model | |
Wong et al. | Predictive power of public emotions as extracted from daily news articles on the movements of stock market indices | |
Grant et al. | EDGAR extraction system: An automated approach to analyze employee stock option disclosures | |
Sakaji et al. | Verification of Data Similarity using Metadata on a Data Exchange Platform | |
Chen et al. | Quantifying the effect of real estate news on Chinese stock movements | |
Wang et al. | Building consumer confidence index based on social media big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |