CN106780036A - A sentiment index construction method based on internet data collection - Google Patents

A sentiment index construction method based on internet data collection

Info

Publication number
CN106780036A
Authority
CN
China
Prior art keywords
stock
concern
page
index
month
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611030961.2A
Other languages
Chinese (zh)
Inventor
都科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Master Orange (xiamen) Technology Co Ltd
Original Assignee
Master Orange (xiamen) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Master Orange (xiamen) Technology Co Ltd
Priority to CN201611030961.2A
Publication of CN106780036A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present invention discloses a sentiment index construction method based on internet data collection, comprising the following steps: Step 1, obtain from internet data the degree of concern for a given investment; Step 2, construct a positive/negative sentiment indicator; Step 3, obtain two indicators from the open market: the current-month IPO count and the same-day stock price volatility; Step 4, compute the market investor sentiment index according to the following formula: Market investor sentiment index = [A/(A+B+C+D)] * degree of concern + [B/(A+B+C+D)] * positive/negative sentiment indicator + [C/(A+B+C+D)] * current-month IPO count + [D/(A+B+C+D)] * same-day stock price volatility; where A, B, C and D are the weights of the degree of concern, the positive/negative sentiment indicator, the current-month IPO count and the same-day stock price volatility, respectively. By obtaining the sentiment index of market investors through IT techniques, the method can provide important additional information for investment decisions in financial markets.

Description

A sentiment index construction method based on internet data collection
Technical field
The invention belongs to the field of data analysis technology, and in particular relates to a method for constructing a sentiment index for stock price prediction.
Background art
In recent years, a large body of academic research in finance has found that non-standardized data play a critical role in explaining and analyzing financial market fluctuations. Examples include indices of investors' mood swings about the market constructed by specific methods, indices of regulatory policy uncertainty, and the positive and negative comments on investments that investors post online. Therefore, in contrast to the standardized financial data in general use today, such as stock opening prices, closing prices and trading volumes, an investor sentiment index for the market can be constructed to predict the daily movement of stock prices.
Summary of the invention
The purpose of the present invention is to provide a sentiment index construction method based on internet data collection that obtains the sentiment index of market investors through IT techniques and provides important additional information for investment decisions in financial markets.
To achieve the above purpose, the solution of the present invention is:
A sentiment index construction method based on internet data collection, comprising the following steps:
Step 1, obtain from internet data the degree of concern for a given investment;
Step 2, construct a positive/negative sentiment indicator;
Step 3, obtain two indicators from the open market: the current-month IPO count and the same-day stock price volatility;
Step 4, compute the market investor sentiment index according to the following formula:
Market investor sentiment index = [A/(A+B+C+D)] * degree of concern (Focus) + [B/(A+B+C+D)] * positive/negative sentiment indicator (PosNegSentiment) + [C/(A+B+C+D)] * current-month IPO count (IPOnum) + [D/(A+B+C+D)] * same-day stock price volatility (Volatility)
where A, B, C and D are the weights of the degree of concern, the positive/negative sentiment indicator, the current-month IPO count and the same-day stock price volatility, respectively.
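The Step 4 formula can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation; the function name sentiment_index and its parameter names are assumptions chosen to mirror the labels Focus, PosNegSentiment, IPOnum and Volatility used in the text.

```python
def sentiment_index(focus, pos_neg_sentiment, ipo_num, volatility, a, b, c, d):
    """Weighted combination from Step 4; each weight is normalized by A+B+C+D."""
    total = a + b + c + d
    if total == 0:
        # The formula divides by A+B+C+D, so a zero sum is undefined.
        raise ValueError("weights A+B+C+D must not sum to zero")
    return (a * focus + b * pos_neg_sentiment + c * ipo_num + d * volatility) / total
```

Because every weight is divided by the same sum, multiplying A, B, C and D by a common non-zero factor leaves the index unchanged.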
The details of step 1 are as follows: a search is performed on the keywords posted from IP addresses corresponding to the location of a listed company's headquarters; for a given investment, a correlation index is collected per fixed time period and used as the degree of concern, where the correlation index is the total number of keyword occurrences within the fixed time period.
The fixed time period is a month, a week, a day, an hour or a minute.
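The counting in step 1 can be sketched as below, assuming the posts have already been filtered to IP addresses at the company's headquarters location (that filtering is not shown). The function name degree_of_concern, the (timestamp, text) input layout and the bucket key formats are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime

# strftime patterns used as bucket keys for each supported period.
PERIOD_FORMATS = {
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
}

def bucket_key(posted_at, period):
    """Map a timestamp to the label of the fixed time period it falls in."""
    if period == "week":
        year, week, _ = posted_at.isocalendar()  # ISO year and week number
        return f"{year}-W{week:02d}"
    return posted_at.strftime(PERIOD_FORMATS[period])

def degree_of_concern(posts, keywords, period="day"):
    """Sum keyword occurrences per time period.

    posts: iterable of (datetime, text) pairs.
    keywords: iterable of keyword strings to count in each text.
    """
    counts = Counter()
    for posted_at, text in posts:
        hits = sum(text.count(kw) for kw in keywords)
        counts[bucket_key(posted_at, period)] += hits
    return dict(counts)
```

A period that contains posts but no keyword hits is reported with a count of zero, which keeps quiet periods visible in the series.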
The details of step 2 are as follows:
Step 21, use an ORACLE relational database comprising table guba, table authors, table articles and table comments, and build the database as follows:
1) Send an HTTP request to the stock homepage URL, obtain and parse the HTML content, and record page=1;
2) Locate the newlist tag and traverse its child nodes;
i. Store the post's title, author, stock forum, publication date and last update date in table articles;
ii. Visit the post URL, obtain and parse the detail page HTML content, and store the content, read count and comment count in table comments;
iii. Visit the author URL, obtain and parse the author page HTML content, and store the author information in table authors;
iv. Visit the post's stock URL and store the stock information in table guba;
3) Obtain the total number of posts and the number of posts per page, and determine whether the current page is the last page; if it is not, visit the next page, record page=page+1, and repeat step 2);
Step 22, based on the semantic data obtained from the web pages, perform the analysis with the SOSA algorithm using a Chinese sentiment polarity lexicon in general use in the semantic analysis field; the SOSA algorithm repeatedly polls nodes at random and changes their state in the manner described above; the temperature is then gradually lowered and the next round of operations is repeated;
Step 23, when the number of iterations reaches a preset value, or when the stopping criterion is met, end the iteration and use the current result as the positive/negative sentiment indicator.
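The text does not specify the SOSA algorithm in enough detail to reproduce it, so the sketch below is not SOSA. It shows only the simpler underlying idea of scoring posts against a polarity lexicon by counting positive and negative words. The function names and the toy lexicon entries in the usage note are hypothetical.

```python
def polarity_score(text, positive_words, negative_words):
    """Return (pos - neg) / (pos + neg), a value in [-1, 1].

    Returns 0.0 when no lexicon word occurs in the text.
    """
    pos = sum(text.count(w) for w in positive_words)
    neg = sum(text.count(w) for w in negative_words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)

def pos_neg_sentiment(texts, positive_words, negative_words):
    """Average the per-post polarity scores; 0.0 for an empty collection."""
    scores = [polarity_score(t, positive_words, negative_words) for t in texts]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with the toy lexicon positive = ["利好", "看涨"] and negative = ["利空", "看跌"], a post containing two positive words and one negative word scores (2 - 1) / 3. Substring counting is a simplification; a real Chinese pipeline would normally segment the text into words first.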
In step 4, the weights are computed as follows: query the real-time return of the stock, and substitute the degree of concern, the positive/negative sentiment indicator, the current-month IPO count and the same-day stock price volatility obtained in steps 1 to 3 into the following formula,
Real-time stock return = A * degree of concern + B * positive/negative sentiment indicator + C * current-month IPO count + D * same-day stock price volatility + residual
Assuming that the residual follows a normal distribution, the values of A, B, C and D are obtained by multiple linear regression.
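A minimal sketch of the weight estimation, assuming ordinary least squares without an intercept (the formula above has none), solved through the normal equations in plain Python. For independent, identically distributed normal residuals, the least-squares estimate coincides with the maximum-likelihood estimate. The function name fit_weights and the input layout are illustrative assumptions, not part of the source.

```python
def fit_weights(rows, returns):
    """Least-squares fit of return = A*focus + B*sentiment + C*ipo + D*vol + residual.

    rows: list of (focus, sentiment, ipo_num, volatility) tuples.
    returns: observed real-time stock returns, one per row.
    Solves the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination.
    """
    k = 4
    if len(rows) != len(returns) or len(rows) < k:
        raise ValueError("need at least four observations, one return per row")
    # Augmented matrix [X^T X | X^T y], one row per coefficient.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, returns))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear; weights are not identified")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(k):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * p for x, p in zip(m[row], m[col])]
    return tuple(m[i][k] / m[i][i] for i in range(k))
```

The returned tuple (A, B, C, D) can then be normalized by A+B+C+D as in the Step 4 formula. With real data a numerical library would usually be preferred for stability; the hand-written solver is used here only to keep the sketch self-contained.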
With the above scheme, the present invention exploits the influence of non-standardized data on the financial field: it combines non-standard online sentiment data with standard financial data to form a non-standardized index that can accurately measure speculative sentiment in the market. Fund managers, risk control managers and investors at large can adjust their investment strategies according to this information and avoid market risk.
Detailed description of the embodiments
The technical scheme and beneficial effects of the present invention are described in detail below with reference to a specific embodiment.
The present invention provides a sentiment index construction method based on internet data collection: a corresponding index is designed, and the information required to compute it is lawfully obtained from the network by IT techniques.
In theory, the sentiment index of investors in the market can predict future prices to some extent. Factors such as the number of IPOs in the current month, the number of new users in the current month and the stock market volatility in the current month each reflect market sentiment from one angle. The questions are therefore how to portray investor sentiment scientifically and accurately from the data actually available, and how to organize the information after collection, for example by semantic analysis of certain keywords and recompilation of certain characteristic information. The market sentiment index can be built from factors such as the current-month IPO count, the current-month number of new users and the current-month stock market volatility.
In such a discussion forum, all internet users can communicate freely, and each user's network IP is attached to their posts. We expect that a large-scale search of the important keywords in opinions posted from IP addresses corresponding to the location of a listed company's headquarters makes it possible to construct a measure of how much attention local investors, relative to non-local investors, pay to a given stock. For each stock, we collect its correlation index monthly, weekly, daily, hourly or even per minute. The degree of concern (Focus) is the accumulated number of keyword occurrences in the given period, and it is the first indicator used to build the sentiment index.
Secondly, users' positive and negative comments on investments must be obtained from web pages. The summary page generally stores basic information about each post, including the read count, comment count, stock link, post title, author, publication date and last update date. The detail page generally contains the sub-tags of the post, the stock post number, the author information, and, for each comment, the commenter's information, posting time and content. The author page contains the author's nickname, influence, registration date, visit count, followed stocks, and all posts and comments. Given these page features, we propose to use an ORACLE relational database, divided into four tables: table guba, table authors, table articles and table comments.
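The four-table layout can be sketched as follows. The source names ORACLE; sqlite3 from the Python standard library is used here only as a stand-in so the example runs without a database server. The table names and the fields title, authorid, gubaid, time, last, content, readcount and commentcount come from the text; every other column, all types and the keys are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE guba (
    gubaid TEXT PRIMARY KEY,      -- stock forum identifier
    name   TEXT                   -- assumed column: stock or forum name
);
CREATE TABLE authors (
    authorid   TEXT PRIMARY KEY,
    nickname   TEXT,              -- author page fields listed in the text
    influence  TEXT,
    registered TEXT,
    visits     INTEGER
);
CREATE TABLE articles (
    articleid INTEGER PRIMARY KEY,               -- assumed surrogate key
    title     TEXT,
    authorid  TEXT REFERENCES authors(authorid),
    gubaid    TEXT REFERENCES guba(gubaid),
    "time"    TEXT,                              -- publication date
    "last"    TEXT                               -- last update date
);
CREATE TABLE comments (
    articleid    INTEGER REFERENCES articles(articleid),  -- assumed link to the post
    content      TEXT,
    readcount    INTEGER,
    commentcount INTEGER
);
"""

def create_database(path=":memory:"):
    """Create the four tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Dates are stored as text to keep the sketch neutral about the site's date format. The columns time and last are quoted because they can collide with SQL keywords.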
After analyzing the specific format of the web pages and designing the data storage structure, our program can be implemented in the following three steps:
1. Send an HTTP request to the stock homepage URL http://xxx.com, obtain and parse the HTML content, and record page=1;
2. Locate the <ul class="newlist"> tag and traverse its <li> child nodes;
i. Store the post's title (title), authorid (author), gubaid (stock forum), time (publication date) and last (last update date) in table articles.
ii. Visit the post URL (the href attribute of <a class="note"></a>), obtain and parse the detail page HTML content, and store content (content), readcount (read count) and commentcount (comment count) in table comments.
iii. Visit the author URL (the href attribute of <cite class="aut"><a></a></cite>), obtain and parse the author page HTML content, and store the author information in table authors.
iv. Visit the post's stock URL (<a class="balink"></a>) and store the stock information in table guba.
3. Locate the <div id="pageArea"></div> tag, obtain the total number of posts and the number of posts per page, and determine whether the current page is the last page. If it is not, visit http://xxx.com/default_N.html, where N=page+1, record page=N, and repeat step 2.
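The list traversal and pagination above can be sketched with the standard library HTML parser. The selectors (ul class newlist, a class note) and the default_N.html pattern come from the text. The names NewListParser, page_url and crawl are hypothetical, the fetch function is supplied by the caller, and, as a simplification, the post totals are passed in as parameters rather than parsed from the pageArea element.

```python
from html.parser import HTMLParser

class NewListParser(HTMLParser):
    """Collect the href of each <a class="note"> inside <ul class="newlist">."""

    def __init__(self):
        super().__init__()
        self.in_list = False
        self.depth = 0          # nesting depth of <ul> elements inside the newlist
        self.post_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "ul":
            if self.in_list:
                self.depth += 1
            elif "newlist" in classes:
                self.in_list = True
        elif self.in_list and tag == "a" and "note" in classes:
            href = attrs.get("href")
            if href:
                self.post_urls.append(href)

    def handle_endtag(self, tag):
        if tag == "ul" and self.in_list:
            if self.depth:
                self.depth -= 1
            else:
                self.in_list = False

def page_url(base, page):
    """Page 1 is the homepage; later pages follow default_N.html as in step 3."""
    return base if page == 1 else f"{base}/default_{page}.html"

def crawl(fetch, base, total_posts, posts_per_page):
    """Visit every list page; fetch is a caller-supplied function url -> html."""
    last_page = max(1, -(-total_posts // posts_per_page))  # ceiling division
    urls = []
    for page in range(1, last_page + 1):
        parser = NewListParser()
        parser.feed(fetch(page_url(base, page)))
        urls.extend(parser.post_urls)
    return urls
```

Injecting fetch keeps the sketch independent of any HTTP client and lets the parsing be exercised on stored pages. Visiting each post, author and stock URL and writing the rows to the tables would follow the same pattern.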
Based on the semantic data obtained from the web pages, the analysis is performed with the SOSA algorithm using a Chinese sentiment polarity lexicon in general use in the semantic analysis field. The SOSA algorithm repeatedly polls (selects and tests) nodes at random and changes their state in the manner described above. The temperature is then gradually lowered and the next round of operations is repeated. The procedure resembles a greedy algorithm; the pseudocode of the algorithm is as follows
This yields the second input variable of the sentiment index, the positive/negative sentiment indicator (PosNegSentiment). We can also obtain two indicators from the open market: the current-month IPO count (IPOnum) and the same-day stock price volatility (Volatility). Our market investor sentiment index can then be derived in real time from the following equation:
Real-time stock return = A * degree of concern (Focus) + B * positive/negative sentiment indicator (PosNegSentiment) + C * current-month IPO count (IPOnum) + D * same-day stock price volatility (Volatility) + residual
Assuming that the model residual follows a normal distribution, the weights A, B, C and D can be obtained by multiple linear regression, and the market investor sentiment index, updated in real time, is then given by:
Market investor sentiment index = [A/(A+B+C+D)] * degree of concern (Focus) + [B/(A+B+C+D)] * positive/negative sentiment indicator (PosNegSentiment) + [C/(A+B+C+D)] * current-month IPO count (IPOnum) + [D/(A+B+C+D)] * same-day stock price volatility (Volatility)
The higher the investor sentiment index, the higher the speculative sentiment of market participants, indicating increased market risk that calls for caution; the lower the index, the lower the speculative sentiment, indicating relatively low market risk. The market sentiment index combines non-standard online sentiment data with standard financial data to form a non-standardized index that can accurately measure speculative sentiment in the market. Fund managers, risk control managers and investors at large can adjust their investment strategies according to this information and avoid market risk.
The above embodiment merely illustrates the technical idea of the present invention and does not limit its scope of protection; any change made on the basis of the technical scheme in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims (5)

1. A sentiment index construction method based on internet data collection, characterized by comprising the following steps:
Step 1, obtain from internet data the degree of concern for a given investment;
Step 2, construct a positive/negative sentiment indicator;
Step 3, obtain two indicators from the open market: the current-month IPO count and the same-day stock price volatility;
Step 4, compute the market investor sentiment index according to the following formula:
Market investor sentiment index = [A/(A+B+C+D)] * degree of concern + [B/(A+B+C+D)] * positive/negative sentiment indicator + [C/(A+B+C+D)] * current-month IPO count + [D/(A+B+C+D)] * same-day stock price volatility
where A, B, C and D are the weights of the degree of concern, the positive/negative sentiment indicator, the current-month IPO count and the same-day stock price volatility, respectively.
2. The sentiment index construction method based on internet data collection as claimed in claim 1, characterized in that the details of step 1 are as follows: a search is performed on the keywords posted from IP addresses corresponding to the location of a listed company's headquarters; for a given investment, a correlation index is collected per fixed time period and used as the degree of concern, where the correlation index is the total number of keyword occurrences within the fixed time period.
3. The sentiment index construction method based on internet data collection as claimed in claim 2, characterized in that the fixed time period is a month, a week, a day, an hour or a minute.
4. The sentiment index construction method based on internet data collection as claimed in claim 1, characterized in that the details of step 2 are as follows:
Step 21, use an ORACLE relational database comprising table guba, table authors, table articles and table comments, and build the database as follows:
1) Send an HTTP request to the stock homepage URL, obtain and parse the HTML content, and record page=1;
2) Locate the newlist tag and traverse its child nodes;
i. Store the post's title, author, stock forum, publication date and last update date in table articles;
ii. Visit the post URL, obtain and parse the detail page HTML content, and store the content, read count and comment count in table comments;
iii. Visit the author URL, obtain and parse the author page HTML content, and store the author information in table authors;
iv. Visit the post's stock URL and store the stock information in table guba;
3) Obtain the total number of posts and the number of posts per page, and determine whether the current page is the last page; if it is not, visit the next page, record page=page+1, and repeat step 2);
Step 22, based on the semantic data obtained from the web pages, perform the analysis with the SOSA algorithm using a Chinese sentiment polarity lexicon in general use in the semantic analysis field; the SOSA algorithm repeatedly polls nodes at random and changes their state in the manner described above; the temperature is then gradually lowered and the next round of operations is repeated;
Step 23, when the number of iterations reaches a preset value, or when the stopping criterion is met, end the iteration and use the current result as the positive/negative sentiment indicator.
5. The sentiment index construction method based on internet data collection as claimed in claim 1, characterized in that, in step 4, the weights are computed as follows: query the real-time return of the stock, and substitute the degree of concern, the positive/negative sentiment indicator, the current-month IPO count and the same-day stock price volatility obtained in steps 1 to 3 into the following formula,
Real-time stock return = A * degree of concern + B * positive/negative sentiment indicator + C * current-month IPO count + D * same-day stock price volatility + residual
Assuming that the residual follows a normal distribution, the values of A, B, C and D are obtained by multiple linear regression.
CN201611030961.2A 2016-11-16 2016-11-16 A sentiment index construction method based on internet data collection Pending CN106780036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611030961.2A CN106780036A (en) 2016-11-16 2016-11-16 A sentiment index construction method based on internet data collection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611030961.2A CN106780036A (en) 2016-11-16 2016-11-16 A sentiment index construction method based on internet data collection

Publications (1)

Publication Number Publication Date
CN106780036A (en) 2017-05-31

Family

ID=58970932

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611030961.2A Pending CN106780036A (en) 2016-11-16 2016-11-16 A sentiment index construction method based on internet data collection

Country Status (1)

Country Link
CN (1) CN106780036A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109493228A (en) * 2018-12-12 2019-03-19 安徽省泰岳祥升软件有限公司 A kind of method and device generating stock news in brief model
CN110096631A (en) * 2019-03-19 2019-08-06 北京师范大学 A kind of stock market's mood report-generating method of the text analyzing of posting based on stock forum
CN110990672A (en) * 2019-11-20 2020-04-10 国元证券股份有限公司 Method for analyzing heat degree of individual stock bar
CN113393330A (en) * 2021-07-11 2021-09-14 北京天仪百康科贸有限公司 Financial wind control management system based on block chain

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050163A (en) * 2013-03-11 2014-09-17 捷达世软件(深圳)有限公司 Content recommendation system and method
CN104408083A (en) * 2014-10-27 2015-03-11 六盘水职业技术学院 Socialized media analyzing system
CN105740353A (en) * 2016-01-26 2016-07-06 中国人民解放军国防科学技术大学 Calculation method and system for relevance degree of individual share and article
CN105956770A (en) * 2016-05-03 2016-09-21 中国科学院大学 Stock market risk prediction platform and text excavation method thereof
CN106056449A (en) * 2016-05-26 2016-10-26 黑龙江省容维投资顾问有限责任公司 Stock information push system and push method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杜伟夫等: "一种新的情感词汇语义倾向计算方法 ", 《计算机研究与发展》 *
马涛等: "《投资者关注度对股市收益影响研究》", 《特区经济》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109493228A (en) * 2018-12-12 2019-03-19 安徽省泰岳祥升软件有限公司 A kind of method and device generating stock news in brief model
CN110096631A (en) * 2019-03-19 2019-08-06 北京师范大学 A kind of stock market's mood report-generating method of the text analyzing of posting based on stock forum
CN110096631B (en) * 2019-03-19 2021-03-05 北京师范大学 Stock market emotion report generation method based on postings text analysis of stock forum
CN110990672A (en) * 2019-11-20 2020-04-10 国元证券股份有限公司 Method for analyzing heat degree of individual stock bar
CN110990672B (en) * 2019-11-20 2023-06-06 国元证券股份有限公司 Heat analysis method for single-strand bar
CN113393330A (en) * 2021-07-11 2021-09-14 北京天仪百康科贸有限公司 Financial wind control management system based on block chain
CN113393330B (en) * 2021-07-11 2022-12-23 深圳市鼎驰科技发展有限公司 Financial wind control management system based on block chain

Similar Documents

Publication Publication Date Title
Groß-Klußmann et al. Buzzwords build momentum: Global financial Twitter sentiment and the aggregate stock market
Patthi et al. Altmetrics–a collated adjunct beyond citations for scholarly impact: a systematic review
Wildgaard et al. A review of the characteristics of 108 author-level bibliometric indicators
US8166032B2 (en) System and method for sentiment-based text classification and relevancy ranking
TWI601088B (en) Topic management network public opinion evaluation management system and method
Ruiz et al. Correlating financial time series with micro-blogging activity
US8781989B2 (en) Method and system to predict a data value
CN103177090B (en) A kind of topic detection method and device based on big data
CN106780036A (en) A sentiment index construction method based on internet data collection
CN105740353A (en) Calculation method and system for relevance degree of individual share and article
Huang et al. It is an equal failing to trust everybody and to trust nobody: Stock price prediction using trust filters and enhanced user sentiment on Twitter
Mukherjee Do open‐access journals in library and information science have any scholarly impact? A bibliometric study of selected open‐access journals using Google Scholar
Dong et al. Micro-blog social moods and Chinese stock market: The influence of emotional valence and arousal on Shanghai Composite Index volume
Jin et al. CT-Rank: A Time-aware Ranking Algorithm for Web Search.
CN110096631B (en) Stock market emotion report generation method based on postings text analysis of stock forum
Yu et al. Dynamic effects of climate policy uncertainty on green bond volatility: an empirical investigation based on TVP-VAR models
US9262395B1 (en) System, methods, and data structure for quantitative assessment of symbolic associations
Han et al. Prediction of investor-specific trading trends in South Korean stock markets using a BILSTM prediction model based on sentiment analysis of financial news articles
Zhao et al. Dynamic impacts of online investor sentiment on international crude oil prices
Zhang et al. Stock trend forecasting method based on sentiment analysis and system similarity model
Wong et al. Predictive power of public emotions as extracted from daily news articles on the movements of stock market indices
Grant et al. EDGAR extraction system: An automated approach to analyze employee stock option disclosures
Sakaji et al. Verification of Data Similarity using Metadata on a Data Exchange Platform
Chen et al. Quantifying the effect of real estate news on Chinese stock movements
Wang et al. Building consumer confidence index based on social media big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170531)