CN107025264A - An automatic stock selection method based on news big data - Google Patents

An automatic stock selection method based on news big data

Info

Publication number
CN107025264A
Authority
CN
China
Prior art keywords
vocabulary
industry
news
emotion
news content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710076418.4A
Other languages
Chinese (zh)
Inventor
张铁军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minnan Normal University
Original Assignee
Minnan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Minnan Normal University
Priority to CN201710076418.4A
Publication of CN107025264A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an automatic stock selection method based on news big data. The technical scheme is as follows. First, an emotion vocabulary and an industry vocabulary are stored in memory; both are derived from the Special Chinese Finance and Economics Dictionary. Internet financial news is obtained in real time via RSS and updated once every half hour. The server parses the text of the day's news, and the text analysis comprises two parts: 1) emotion dimension analysis of the news content, which calculates the sentiment orientation of the news content; and 2) industry dimension analysis of the news content, which calculates the industry attention reflected in the news content. Stock rankings are calculated from the sentiment orientation and the industry attention, and the top-ranked stocks are selected as investment targets. By filtering and counting, from multiple angles, the words in the news that carry emotion-dimension and industry-attention indicators, the invention can mine more information that influences stock prices.

Description

An automatic stock selection method based on news big data
Technical field
The present invention relates to the field of information retrieval, and specifically to an automatic stock selection method based on news big data.
Background
Quantitative investment now plays an increasingly important role in asset management. Practitioners use computers to treat historical stock price and volume data as a data resource, validate it with mathematical models, and use the results for quantitative stock investment. Existing schemes focus on the characteristics of individual stocks and do not select stocks automatically from the angle of industry attention. Media coverage can also give some early warning of stock trends, so the present invention mines the sentiment orientation of news by statistical analysis of vocabulary in order to select stocks automatically. In brief, the sentiment orientation of news content falls into two cases, positive emotion and negative emotion: positive emotion is represented by the ratio of positive emotion words in the news content, and negative emotion by the ratio of negative words. The technical problem to be solved by the invention is how to use the sentiment orientation reflected in news content to serve quantitative investment.
Summary of the invention
The object of the invention is to provide an automatic stock selection method based on news big data. First, an emotion vocabulary and an industry vocabulary are stored in memory; both are derived from the Special Chinese Finance and Economics Dictionary. Internet financial news is obtained in real time via RSS and updated once per hour. The server parses and analyzes the day's news content, and the analysis comprises two parts: 1) emotion dimension analysis of the news content, which calculates the sentiment orientation of the news content; and 2) industry dimension analysis of the news content, which calculates the industry attention reflected in the news content. Stock rankings are calculated from the sentiment orientation and the industry attention, and the top-ranked stocks are selected as investment targets.
The news content is parsed into a set of words $\{w_1, w_2, \dots, w_t\}$ ($t$ is the total number of words), which includes $r$ positive emotion words and $s$ negative emotion words. On day $i$, the positive word ratio, denoted here $P_i$, is $P_i = r/t$ and represents the positive emotion of the news; the negative word ratio, denoted here $N_i$, is $N_i = s/t$ and represents the negative emotion of the news;
On day $i$, the attention of industry $x$, denoted here $H_i^x$, is calculated as $H_i^x = y/t$, where $y$ is the number of words in the news content related to industry $x$ and $t$ is the total number of words;
On day $i$, the positive attention of industry $x$ is $PH_i^x = H_i^x \times P_i$, and the negative attention of industry $x$ is $NH_i^x = H_i^x \times N_i$;
Over the past month, the cumulative heat (attention) of industry $x$ is accumulated from the daily values for $i \in \{1, \dots, m\}$, where $m$ is the number of days in the month;
After 23:00 on the last day of each month, the month's cumulative heat is calculated for all industries $x \in \{1, \dots, 24\}$, 24 industries in total. The 24 industries are sorted by cumulative heat from high to low, and the method selects the stocks of all companies in the top-ranked industry as the investment targets for the following month.
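As a worked illustration of the daily quantities, using made-up counts rather than data from the patent, suppose one day's news contains $t = 1000$ words, of which $r = 50$ are positive, $s = 20$ are negative, and $y = 30$ relate to industry $x$. The symbols $P_i$, $N_i$ and $H_i^x$ are notation assumed here for the positive ratio, the negative ratio and the industry attention.

```latex
\[
P_i = \frac{r}{t} = \frac{50}{1000} = 0.05, \qquad
N_i = \frac{s}{t} = \frac{20}{1000} = 0.02, \qquad
H_i^x = \frac{y}{t} = \frac{30}{1000} = 0.03
\]
\[
H_i^x \times P_i = 0.03 \times 0.05 = 0.0015, \qquad
H_i^x \times N_i = 0.03 \times 0.02 = 0.0006
\]
```

In this example the industry's positive attention for the day is 0.0015 and its negative attention is 0.0006.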
The theoretical basis of the invention is the following analysis. Emotion words carry psychological meaning: positive words convey a positive psychological cue, and negative words convey a negative one. For example, words such as "limit-up", "favorable" and "bumper harvest" reflect a positive attitude in the news content, whereas words such as "limit-down", "weak" and "dispirited" reflect a negative attitude. When the ratio of negative words in the news rises, the market forms pessimistic expectations and the downside risk of the stock market increases. Industry words point strongly to a particular industry: for example, "non-performing loan" generally points to listed companies in banking, and "passenger car" to listed companies in the automobile industry. When the ratio of a given industry's words in the news rises, market attention turns to that industry, and its listed companies will attract more investors' attention.
The invention selects stocks along the emotion dimension and the industry dimension of news big data. Existing schemes focus on the characteristics of individual stocks and do not select stocks automatically from the angle of industry heat. This scheme determines the sentiment orientation and industry attention reflected in news content through vocabulary association, which is an innovation over the prior art. Automatic stock selection from news big data has two advantages: 1) its theoretical basis, the linkage between news emotion (the intensity of positive and negative emotion) and the stock market and listed companies, is widely confirmed; 2) sentiment orientation and industry attention are extracted automatically, and stocks are ranked and screened fully automatically.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the flow chart of the algorithm executed by the present invention.
Embodiment
The invention is described in further detail below with reference to its main flow chart and its algorithm flow chart.
Keywords: emotion vocabulary, industry vocabulary. The emotion vocabulary is the set of words in the emotion word list and has two parts, positive words and negative words. The industry vocabulary is the word list obtained by compiling the common keywords of each industry. Both vocabularies are derived from the Special Chinese Finance and Economics Dictionary, which the applicant has compiled and bound as a book.
For example, positive words include: successful, outstanding, richly endowed, leading, improve, innovate. Negative words include: failure, loss, deficiency, bad review, recall, depression, and so on.
Industry vocabulary: for example, in banking the common keywords are interest, loan, China Banking Regulatory Commission, central bank, interest rate, credit, and so on. In real estate the common keywords are home purchase, first home, house, land plot, commodity housing, real estate market, and so on.
Industry companies: representative companies in banking are Minsheng Bank, China Merchants Bank, Bank of Nanjing, Ping An Bank, and so on. Representative companies in real estate are Vanke A, Poly Real Estate, China Fortune Land Development, Country Garden, and so on.
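The word lists and company lists described above could be held in memory roughly as follows. This is only a minimal sketch: the variable names and the `classify_word` helper are hypothetical, and the entries are English glosses of a few examples from the text, whereas a real lexicon would hold the Chinese dictionary entries.

```python
# Illustrative in-memory word lists (hypothetical names; a handful of the
# examples from the text, given as English glosses of the Chinese entries).
POSITIVE_WORDS = frozenset({"successful", "outstanding"})
NEGATIVE_WORDS = frozenset({"failure", "loss", "recall", "depression"})

# Industry name -> keywords that point to that industry.
INDUSTRY_WORDS = {
    "banking": frozenset({"interest", "loan", "interest rate", "credit"}),
    "real estate": frozenset({"house", "real estate market"}),
}

# Industry name -> representative listed companies.
INDUSTRY_COMPANIES = {
    "banking": ["Minsheng Bank", "China Merchants Bank"],
    "real estate": ["Poly Real Estate", "Country Garden"],
}


def classify_word(word):
    """Return (emotion, industries); emotion is 'positive', 'negative' or None."""
    if word in POSITIVE_WORDS:
        emotion = "positive"
    elif word in NEGATIVE_WORDS:
        emotion = "negative"
    else:
        emotion = None
    industries = sorted(
        name for name, words in INDUSTRY_WORDS.items() if word in words
    )
    return emotion, industries
```

Sets give constant-time membership checks, which matters when every word of a day's news is looked up against both vocabularies.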
News is obtained by monitoring public news sources over RSS, for example the People's Daily Online RSS feed and the Xinhuanet (www.xinhuanet.com) RSS feed. To keep the data timely, the method updates the news once every hour.
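An hourly RSS poll of this kind might look like the sketch below, which uses only the Python standard library. The function names are hypothetical, the feed URLs are left to the caller because the text does not give them, and the parser assumes plain RSS 2.0 `<item>` elements with `<title>` and `<description>` children.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss_items(xml_data):
    """Return (title, description) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        items.append((title, description))
    return items


def fetch_feed(url, timeout=10):
    """Download one feed as raw bytes so the XML parser can honor its declared encoding."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def poll_feeds(feed_urls, handle_items, interval_seconds=3600, rounds=None):
    """Fetch each feed, hand its items to handle_items, sleep, and repeat.

    rounds=None polls until interrupted; an integer limits the number of passes.
    """
    done = 0
    while rounds is None or done < rounds:
        for url in feed_urls:
            try:
                handle_items(parse_rss_items(fetch_feed(url)))
            except (OSError, ET.ParseError) as exc:
                # A failed feed should not stop the other feeds from updating.
                print(f"skipping {url}: {exc}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_seconds)
```

The default interval of 3600 seconds matches the hourly update described in the text; deduplicating items seen in earlier passes is left out of the sketch.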
Assume that each day's news content (a day runs from 00:00 to 24:00 Beijing time; on the last day of the month it runs from 00:00 to 23:00; the same applies below) consists of $t$ Chinese words, including $r$ positive emotion words and $s$ negative emotion words. On day $i$, the positive word ratio, denoted here $P_i$, is $P_i = r/t$ and represents the positive emotion of the news; the negative word ratio, denoted here $N_i$, is $N_i = s/t$ and represents the negative emotion of the news.
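The two daily ratios can be computed as in the following sketch. It assumes the day's news has already been segmented into a list of words (the text does not name a word segmentation tool) and that every occurrence of a word counts toward $t$, $r$ and $s$; the function name is hypothetical.

```python
def sentiment_ratios(words, positive_words, negative_words):
    """Return (r / t, s / t) for one day's list of words.

    t is the number of words, r the number of positive-word occurrences,
    and s the number of negative-word occurrences.
    """
    t = len(words)
    if t == 0:
        # No news text for the day: report neutral ratios instead of dividing by zero.
        return 0.0, 0.0
    r = sum(1 for word in words if word in positive_words)
    s = sum(1 for word in words if word in negative_words)
    return r / t, s / t
```

For example, `sentiment_ratios(["profit", "loss", "bank", "profit"], {"profit"}, {"loss"})` gives a positive ratio of 0.5 and a negative ratio of 0.25.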
Industry dimension analysis of the news
Following the 28 first-level industries of the Shenyin & Wanguo industry classification (2014), this patent also uses 28 industry dimensions, each corresponding to one industry. The method assigns every industry an "industry heat", which represents how much attention the news pays to that industry. Assume that on day $i$ the heat of industry $x$, denoted here $H_i^x$, is calculated as $H_i^x = y/t$, where $y$ is the number of words related to industry $x$ and $t$ is the total number of words. The higher $H_i^x$ is, the more the news reports on industry $x$ and the higher that industry's heat. If the number of words related to industry $x$ on day $i$ is 0, then $H_i^x = 0$.
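The daily industry heat, $y/t$ for each industry, can be sketched as below. The input is assumed to be a pre-segmented list of words plus a mapping from industry name to keyword set; both the function name and that data layout are illustrative choices, not taken from the patent.

```python
def industry_heat(words, industry_words):
    """Return {industry: y / t} for one day's list of words.

    y is the number of occurrences of that industry's keywords and t is
    the total number of words; an industry with no matches gets 0.0.
    """
    t = len(words)
    heat = {}
    for industry, keywords in industry_words.items():
        y = sum(1 for word in words if word in keywords)
        heat[industry] = y / t if t else 0.0
    return heat
```

With `words = ["loan", "house", "loan", "rain"]` and keyword sets `{"loan"}` for banking and `{"house"}` for real estate, banking gets a heat of 0.5 and real estate 0.25.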
Stock ranking calculation
At 23:00 on the last day of every month, the method ranks the stocks. It calculates the industry heat for every day of the month, together with the daily positive emotion and negative emotion of the news, for days $i \in \{1, \dots, m\}$, where $m$ is the number of days in the month.
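1) Positive and negative industry heat
On day $i$, the positive heat of industry $x$ is $PH_i^x = H_i^x \times P_i$. Similarly, on day $i$, the negative heat of industry $x$ is $NH_i^x = H_i^x \times N_i$. Here $H_i^x$ is the heat of industry $x$ on day $i$, and $P_i$ and $N_i$ are that day's positive and negative word ratios.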
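The product step can be sketched in a few lines. The function name is hypothetical; it assumes the day's industry heat is available as a dictionary and the day's positive and negative word ratios as two numbers.

```python
def signed_heat(heat, positive_ratio, negative_ratio):
    """Return {industry: (positive_heat, negative_heat)} for one day.

    Each industry's heat is multiplied by the day's positive word ratio
    and, separately, by the day's negative word ratio.
    """
    return {
        industry: (value * positive_ratio, value * negative_ratio)
        for industry, value in heat.items()
    }
```

Because the sentiment ratios are computed over all of the day's news, every industry is scaled by the same two factors on a given day; only the industry heat differs between industries.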
2) Monthly cumulative heat of an industry
Over the past month, the cumulative heat of industry $x$ is accumulated from the daily values for $i \in \{1, \dots, m\}$, where $m$ is the number of days in the month.
After 23:00 on the last day of every month, the month's cumulative heat is calculated for all industries $x \in \{1, \dots, 28\}$, 28 industries in total. The 28 industries are sorted by cumulative heat from high to low. Suppose industry $x_1$ contains $y_1$ companies, industry $x_2$ contains $y_2$ companies, and industry $x_3$ contains $y_3$ companies; the method selects the stocks of all companies in the top-ranked industry as the investment targets.
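The month-end ranking could be sketched as follows. The accumulation formula is not legible in this text, so the sketch assumes, purely for illustration, that an industry's monthly value is the sum over days of positive heat minus negative heat; the `combine` parameter lets a different rule be substituted. All function names and the sample company names are hypothetical.

```python
from collections import defaultdict


def monthly_ranking(daily_signed_heat, combine=lambda pos, neg: pos - neg):
    """Rank industries by cumulative heat, highest first.

    daily_signed_heat holds one {industry: (positive_heat, negative_heat)}
    dictionary per day. ASSUMPTION: the daily contribution is pos - neg;
    the source does not confirm this rule.
    """
    totals = defaultdict(float)
    for day in daily_signed_heat:
        for industry, (pos, neg) in day.items():
            totals[industry] += combine(pos, neg)
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def select_stocks(ranking, industry_companies):
    """Return every company of the top-ranked industry (empty list if no ranking)."""
    if not ranking:
        return []
    top_industry = ranking[0][0]
    return list(industry_companies.get(top_industry, []))
```

A scheduler would call `monthly_ranking` once after 23:00 on the last day of the month and pass the result to `select_stocks` to obtain the holdings for the following month.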

Claims (2)

1. An automatic stock selection method based on news big data, characterized in that: first, an emotion vocabulary and an industry vocabulary are stored in memory, both derived from the Special Chinese Finance and Economics Dictionary; Internet financial news is obtained in real time via RSS and updated once per hour; the server parses and analyzes the day's news content, and the analysis comprises two parts: 1) emotion dimension analysis of the news content, which calculates the sentiment orientation of the news content; 2) industry dimension analysis of the news content, which calculates the industry attention reflected in the news content; stock rankings are calculated from the sentiment orientation and the industry attention, and the top-ranked stocks are selected as investment targets.
2. The automatic stock selection method based on news big data according to claim 1, characterized in that: the news content is parsed into a set of words $\{w_1, w_2, \dots, w_t\}$ ($t$ is the total number of words), which includes $r$ positive emotion words and $s$ negative emotion words; on day $i$, the positive word ratio, denoted here $P_i$, is $P_i = r/t$ and represents the positive emotion of the news; the negative word ratio, denoted here $N_i$, is $N_i = s/t$ and represents the negative emotion of the news;
On day $i$, the attention of industry $x$, denoted here $H_i^x$, is calculated as $H_i^x = y/t$, where $y$ is the number of words in the news content related to industry $x$ and $t$ is the total number of words;
On day $i$, the positive attention of industry $x$ is $PH_i^x = H_i^x \times P_i$, and the negative attention of industry $x$ is $NH_i^x = H_i^x \times N_i$;
Over the past month, the cumulative heat (attention) of industry $x$ is accumulated from the daily values for $i \in \{1, \dots, m\}$, where $m$ is the number of days in the month;
After 23:00 on the last day of each month, the month's cumulative heat is calculated for all industries $x \in \{1, \dots, 24\}$, 24 industries in total; the 24 industries are sorted by cumulative heat from high to low; the method selects the stocks of all companies in the top-ranked industry as the investment targets for the following month.
CN201710076418.4A 2017-02-13 2017-02-13 An automatic stock selection method based on news big data Pending CN107025264A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710076418.4A | 2017-02-13 | 2017-02-13 | An automatic stock selection method based on news big data

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710076418.4A | 2017-02-13 | 2017-02-13 | An automatic stock selection method based on news big data

Publications (1)

Publication Number | Publication Date
CN107025264A (en) | 2017-08-08

Family

ID=59526166

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710076418.4A (Pending, published as CN107025264A (en)) | An automatic stock selection method based on news big data | 2017-02-13 | 2017-02-13

Country Status (1)

Country Link
CN (1) CN107025264A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213934A (en) * 2018-08-23 2019-01-15 阿里巴巴集团控股有限公司 A kind of processing method of resource, device and equipment
CN110889024A (en) * 2019-10-25 2020-03-17 武汉灯塔之光科技有限公司 Method and device for calculating information-related stock
CN111241399A (en) * 2020-01-10 2020-06-05 杜长江 Method for evaluating attention of listed companies
CN112862617A (en) * 2019-11-27 2021-05-28 泰康保险集团股份有限公司 Data processing method, system, storage medium and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226554A (en) * 2012-12-14 2013-07-31 西藏同信证券有限责任公司 Automatic stock matching and classifying method and system based on news data
CN103778215A (en) * 2014-01-17 2014-05-07 北京理工大学 Stock market forecasting method based on sentiment analysis and hidden Markov fusion model
CN105022825A (en) * 2015-07-22 2015-11-04 中国人民解放军国防科学技术大学 Financial variety price prediction method capable of combining financial news mining and financial historical data
CN105740353A (en) * 2016-01-26 2016-07-06 中国人民解放军国防科学技术大学 Calculation method and system for relevance degree of individual share and article
CN105786962A (en) * 2016-01-15 2016-07-20 优品财富管理有限公司 Big data index analysis method and system based on news transmissibility
CN106384166A (en) * 2016-09-12 2017-02-08 中山大学 Deep learning stock market prediction method combined with financial news

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213934A (en) * 2018-08-23 2019-01-15 阿里巴巴集团控股有限公司 A kind of processing method of resource, device and equipment
CN110889024A (en) * 2019-10-25 2020-03-17 武汉灯塔之光科技有限公司 Method and device for calculating information-related stock
CN112862617A (en) * 2019-11-27 2021-05-28 泰康保险集团股份有限公司 Data processing method, system, storage medium and electronic device
CN111241399A (en) * 2020-01-10 2020-06-05 杜长江 Method for evaluating attention of listed companies
CN111241399B (en) * 2020-01-10 2023-07-04 杜长江 Evaluation method for attention of marketing company

Legal Events

Date Code Title Description
PB01 Publication
CB03 Change of inventor or designer information

Inventor after: Wang Tiejun

Inventor before: Zhang Tiejun

SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170808