CN107025264A - A kind of automatic share-selecting method based on news big data - Google Patents
A kind of automatic share-selecting method based on news big data
- Publication number
- CN107025264A (application CN201710076418.4A)
- Authority
- CN
- China
- Prior art keywords
- vocabulary
- industry
- news
- emotion
- news content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an automatic stock-selection method based on news big data. The technical scheme is as follows. First, an emotion vocabulary and an industry vocabulary are stored in memory; both are derived from the Special Chinese finance and economics dictionary. Internet financial news is obtained in real time via RSS and updated once every half hour. The server parses the text of the current day's news, and the text analysis of the news content comprises two parts: 1) emotion-dimension analysis of the news content, which calculates the sentiment orientation of the news content; 2) industry-dimension analysis of the news content, which calculates the industry attention reflected in the news content. A stock ranking is calculated from the sentiment orientation and the industry attention, and the top-ranked stocks are selected as investment targets. By filtering and counting, from multiple angles, the words in the news that carry emotion-dimension and industry-attention indicators, the invention can mine more information that affects stock prices.
Description
Technical field
The present invention relates to the field of information retrieval, and specifically to an automatic stock-selection method based on news big data.
Background art
At present, quantitative investment plays an increasingly important role in asset management. Practitioners use computers to take historical stock price and volume data as a data resource, verify it through mathematical models, and apply the results to quantitative investment in stocks. Existing schemes focus on analyzing the characteristics of individual stocks and do not select stocks automatically from the angle of industry attention. At the same time, the guidance of public opinion by the mass media also has a certain early-warning effect on stock trends; the present invention mines the sentiment orientation of news by statistical analysis of vocabulary and uses it to select stocks automatically. Briefly, the sentiment orientation of news content falls into two cases: positive emotion and negative emotion. Positive emotion is represented by the ratio of positive emotion words in the news content, and negative emotion by the ratio of negative words in the news content. How to make the sentiment orientation reflected in news content serve quantitative investment is the technical problem the present invention sets out to solve.
Summary of the invention
The object of the present invention is to provide an automatic stock-selection method based on news big data. First, an emotion vocabulary and an industry vocabulary are stored in memory; both are derived from the Special Chinese finance and economics dictionary. Internet financial news is obtained in real time via RSS and updated once per hour. The server parses and analyzes the current day's news content, and the news content analysis comprises two parts: 1) emotion-dimension analysis of the news content, which calculates the sentiment orientation of the news content; 2) industry-dimension analysis of the news content, which calculates the industry attention reflected in the news content. A stock ranking is calculated from the sentiment orientation and the industry attention, and the top-ranked stocks are selected as investment targets.
The news content is parsed into a set of words, $W = \{w_1, \dots, w_t\}$, where $t$ is the total number of words, of which $r$ are positive emotion words and $s$ are negative emotion words. On day $i$, the positive word ratio is $P_i = r/t$, which represents the positive emotion of the news; the negative word ratio is $N_i = s/t$, which represents the negative emotion of the news;
On day $i$, the attention of industry $x$ is $H_{x,i}$, calculated as $H_{x,i} = y/t$, where $y$ is the number of words related to industry $x$ in the news content and $t$ is the total number of words;
On day $i$, the positive attention of industry $x$ is set as $H^{+}_{x,i} = H_{x,i} \times P_i$, and the negative attention of industry $x$ is set as $H^{-}_{x,i} = H_{x,i} \times N_i$;
Over the past month, the cumulative heat (attention) of industry $x$ is set as $C_x$, computed over the days $i \in \{1, \dots, m\}$, where $m$ is the number of days in the month;
After 23:00 on the evening of the last day of each month, the monthly cumulative heat $C_x$ of all industries is calculated, where $x \in \{1, \dots, 24\}$, 24 industries in total; the 24 industries are sorted by the value of $C_x$ from high to low; the method selects the stocks of all companies in the top-ranked industry as the investment targets for the next month.
The theoretical foundation of the present invention rests on the following analysis. Emotion words carry psychological meaning: positive words convey positive psychological cues, and negative words convey negative ones. For example, words such as "limit-up", "good", and "good harvest" reflect a positive attitude in news content, while words such as "limit-down", "weakness", and "dispirited" reflect a negative attitude. When the ratio of negative words in the news increases, the market shows pessimistic expectations and the downside risk of the stock market increases. Industry words point strongly to specific industries: for example, "non-performing loan" usually points to listed banking companies, and "passenger car" usually points to listed companies in the automobile industry. When the ratio of a given industry's words in the news increases, market attention turns to that industry, and its listed companies receive attention from more investors.
The present invention selects stocks along the emotion dimension and the industry dimension of news big data. Existing schemes focus on analyzing the characteristics of individual stocks and do not select stocks automatically from the angle of industry heat. This scheme determines, through vocabulary association, the sentiment orientation and the industry attention reflected in news content, which is an innovation over the prior art. The advantages of automatic stock selection from news big data are: 1) the theory of a linkage between news emotion (the intensity of positive and negative emotion) and the stock market and listed companies has been widely confirmed; 2) sentiment orientation and industry attention are extracted automatically, and stocks are ranked and screened fully automatically.
Brief description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is a flow chart of the algorithm executed by the present invention.
Embodiment
The present invention is described in further detail below with reference to its main flow chart and its algorithm flow chart.
Keywords: emotion vocabulary, industry vocabulary. The emotion vocabulary refers to the words in the emotion word list and comprises two parts: positive words and negative words. The industry vocabulary is the word list obtained by collecting the common keywords of each industry. Both vocabularies are derived from the 《Special Chinese finance and economics dictionary》, which the applicant has compiled and bound into a book.
For example, positive words include: successful, outstanding, richly endowed, leading, improving, innovating. Negative words include: failure, loss, deficiency, negative review, recall, depression, etc.
Industry vocabulary: for example, in banking, the common keywords are interest, loan, the China Banking Regulatory Commission, central bank, interest rate, credit, etc. In the real estate industry, the common keywords are home purchase, first home, housing, land plot, commercial housing, property market, etc.
Industry companies: the representative companies of banking are Minsheng Bank, China Merchants Bank, Bank of Nanjing, Ping An Bank, etc. The representative enterprises of the real estate industry are Vanke A, Poly Real Estate, China Fortune, Country Garden, etc.
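As a rough illustration of how the two word lists and the industry-to-company mapping described above might be held in memory, the following Python sketch uses plain sets and dictionaries. The variable names, the function `classify_word`, and the English stand-in words are assumptions made for illustration; the actual dictionary is in Chinese and is not reproduced in this document.

```python
# Hypothetical in-memory layout for the word lists; all names are illustrative.
POSITIVE_WORDS = {"successful", "outstanding", "leading", "improving", "innovating"}
NEGATIVE_WORDS = {"failure", "loss", "deficiency", "recall", "depression"}

# Industry name -> keywords commonly associated with that industry.
INDUSTRY_WORDS = {
    "banking": {"interest", "loan", "central bank", "interest rate", "credit"},
    "real estate": {"home purchase", "housing", "land plot", "property market"},
}

# Industry name -> representative listed companies.
INDUSTRY_COMPANIES = {
    "banking": ["Minsheng Bank", "China Merchants Bank"],
    "real estate": ["Poly Real Estate", "Country Garden"],
}


def classify_word(word):
    """Return (emotion label or None, sorted list of industries) for one word."""
    if word in POSITIVE_WORDS:
        emotion = "positive"
    elif word in NEGATIVE_WORDS:
        emotion = "negative"
    else:
        emotion = None
    industries = sorted(
        name for name, words in INDUSTRY_WORDS.items() if word in words
    )
    return emotion, industries
```

Sets give constant-time membership checks, which matters when every word of a day's news is looked up against both lists.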
News is obtained by monitoring public news sources via RSS, for example the People's Net RSS and the Xinhuanet (www.xinhuanet.com) RSS. To ensure timeliness, the method updates the news once every hour.
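The RSS acquisition step can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the feed is RSS 2.0, the sample feed text is invented for illustration, and the helper names `parse_rss_items` and `merge_new_items` are hypothetical. Downloading the feed and scheduling the hourly refresh are left out.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Extract (title, description) pairs from an RSS 2.0 document string."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        items.append((title.strip(), description.strip()))
    return items


def merge_new_items(seen, items):
    """Record unseen items in `seen` (keyed by title) and return only the new ones."""
    fresh = [it for it in items if it[0] not in seen]
    seen.update(it[0] for it in fresh)
    return fresh


# Invented sample feed, used only to exercise the functions above.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example finance feed</title>
<item><title>Bank loans rise</title><description>Credit growth improves.</description></item>
<item><title>Property market weak</title><description>Sales show a loss.</description></item>
</channel></rss>"""
```

Calling `merge_new_items` on each hourly refresh keeps an item from being counted twice when it is still present in the feed an hour later; keying on the title is an assumption, and a real feed might offer a more reliable identifier.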
Assume that the daily news content (from 0:00 to 24:00 Beijing time on a given day; on the last day of the month, from 0:00 to 23:00; the same applies below) consists of $t$ Chinese words, of which $r$ are positive emotion words and $s$ are negative emotion words. On day $i$, the positive word ratio is $P_i = r/t$, which represents the positive emotion of the news; the negative word ratio is $N_i = s/t$, which represents the negative emotion of the news.
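The daily word-ratio calculation can be sketched as follows. The sketch assumes the day's news has already been segmented into a list of words; the function name `sentiment_ratios` and the handling of an empty day are assumptions, not part of the described method.

```python
def sentiment_ratios(tokens, positive_words, negative_words):
    """Return (positive ratio r/t, negative ratio s/t) for one day's words."""
    t = len(tokens)
    if t == 0:
        # Assumption: a day with no words yields zero for both ratios.
        return 0.0, 0.0
    r = sum(1 for w in tokens if w in positive_words)  # positive emotion words
    s = sum(1 for w in tokens if w in negative_words)  # negative emotion words
    return r / t, s / t
```

For example, a day with 8 words, 2 of them positive and 1 negative, gives a positive ratio of 0.25 and a negative ratio of 0.125.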
Industry-dimension analysis of news
Following the 28 first-level industry categories of the 《Shenyin & Wanguo Industry Classification》 (2014), the industry dimension of this patent also has 28 dimensions, each corresponding to one industry. The method sets an "industry heat" for each industry; industry heat represents the degree of attention the news pays to a specific industry. Assume that the heat of industry $x$ on day $i$ is $H_{x,i}$, calculated as $H_{x,i} = y/t$, where $y$ is the number of words related to industry $x$ and $t$ is the total number of words. The higher $H_{x,i}$ is, the more the news reports on industry $x$ and the higher the heat of industry $x$. If the number of words related to industry $x$ on day $i$ is 0, then $H_{x,i} = 0$.
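A minimal sketch of the daily industry heat, assuming pre-segmented words and a mapping from industry name to its keyword set. The function name `industry_heat` and the example industries are illustrative.

```python
def industry_heat(tokens, industry_words):
    """Return {industry: y / t}, where y counts that industry's words among t tokens."""
    t = len(tokens)
    heat = {}
    for industry, words in industry_words.items():
        y = sum(1 for w in tokens if w in words)
        # With no related words y is 0, so the heat is 0; an empty day also gives 0.
        heat[industry] = y / t if t else 0.0
    return heat
```

With 8 words of which 2 are banking keywords and none are real estate keywords, the result is 0.25 for banking and 0.0 for real estate.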
Stock ranking calculation
The method ranks stocks at 23:00 on the evening of the last day of each month. It calculates the daily industry heat $H_{x,i}$ for the month, together with the daily news positive emotion $P_i$ and negative emotion $N_i$, where $i \in \{1, \dots, m\}$ and $m$ is the number of days in the month.
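The month-end trigger can be sketched with the standard `calendar` module. The function names are hypothetical, and the sketch assumes the `datetime` passed in is already expressed in Beijing time.

```python
import calendar
from datetime import datetime


def days_in_month(year, month):
    """Return m, the number of days in the given month."""
    return calendar.monthrange(year, month)[1]


def is_ranking_time(now):
    """True if `now` is on the last day of its month, at or after 23:00."""
    return now.day == days_in_month(now.year, now.month) and now.hour >= 23
```

`calendar.monthrange` accounts for leap years, so February 2016 yields 29 days and February 2017 yields 28.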
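1) Positive and negative industry heat
On day $i$, the positive heat of industry $x$ is set as $H^{+}_{x,i} = H_{x,i} \times P_i$, where $H_{x,i}$ is the industry heat and $P_i$ the positive word ratio of that day. Similarly, on day $i$, the negative heat of industry $x$ is set as $H^{-}_{x,i} = H_{x,i} \times N_i$, where $N_i$ is the negative word ratio of that day.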
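The two products can be computed for all industries at once. This is a minimal sketch; the function name `daily_signed_heat` and the tuple layout of the result are assumptions.

```python
def daily_signed_heat(heat_by_industry, positive_ratio, negative_ratio):
    """Return {industry: (positive heat, negative heat)} for one day.

    Positive heat is heat * positive ratio; negative heat is heat * negative ratio.
    """
    return {
        industry: (h * positive_ratio, h * negative_ratio)
        for industry, h in heat_by_industry.items()
    }
```

An industry with heat 0.25 on a day with positive ratio 0.5 and negative ratio 0.25 gets a positive heat of 0.125 and a negative heat of 0.0625.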
2) Monthly cumulative heat of an industry
Over the past month, the cumulative heat of industry $x$ is set as $C_x$, computed over the days $i \in \{1, \dots, m\}$, where $m$ is the number of days in the month.
After 23:00 on the evening of the last day of each month, the monthly cumulative heat $C_x$ of all industries is calculated, where $x \in \{1, \dots, 28\}$, 28 industries in total. The 28 industries are sorted by the value of $C_x$ from high to low. Assume that industry $x_1$ contains $y_1$ companies, industry $x_2$ contains $y_2$ companies, and industry $x_3$ contains $y_3$ companies; the method selects the stocks of all companies in the top-ranked industry as the investment targets.
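The final sorting and selection step can be sketched as below. The sketch takes the cumulative heat values as given, however they were accumulated, and does not assume a particular accumulation formula. The function names and the alphabetical tie-break are assumptions.

```python
def rank_industries(cumulative_heat):
    """Sort industry names by cumulative heat, highest first.

    Ties are broken alphabetically by name (an assumption; the text does not
    say how ties are handled).
    """
    return sorted(cumulative_heat, key=lambda name: (-cumulative_heat[name], name))


def select_stocks(cumulative_heat, industry_companies):
    """Return all companies of the top-ranked industry, or [] if there is none."""
    ranking = rank_industries(cumulative_heat)
    if not ranking:
        return []
    return list(industry_companies.get(ranking[0], []))
```

Given cumulative heat values of 1.2 for banking, 2.5 for real estate, and 0.7 for automobile, real estate ranks first and all of its listed companies are returned.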
Claims (2)
1. An automatic stock-selection method based on news big data, characterized in that: an emotion vocabulary and an industry vocabulary are first stored in memory, both derived from the Special Chinese finance and economics dictionary; Internet financial news is obtained in real time via RSS and updated once per hour; the server parses and analyzes the current day's news content, the news content analysis comprising two parts: 1) emotion-dimension analysis of the news content, which calculates the sentiment orientation of the news content; 2) industry-dimension analysis of the news content, which calculates the industry attention reflected in the news content; a stock ranking is calculated from the sentiment orientation and the industry attention, and the top-ranked stocks are selected as investment targets.
2. The automatic stock-selection method based on news big data according to claim 1, characterized in that: the news content is parsed into a set of words, $W = \{w_1, \dots, w_t\}$, where $t$ is the total number of words, of which $r$ are positive emotion words and $s$ are negative emotion words; on day $i$, the positive word ratio is $P_i = r/t$, which represents the positive emotion of the news; the negative word ratio is $N_i = s/t$, which represents the negative emotion of the news;
On day $i$, the attention of industry $x$ is $H_{x,i}$, calculated as $H_{x,i} = y/t$, where $y$ is the number of words related to industry $x$ in the news content and $t$ is the total number of words;
On day $i$, the positive attention of industry $x$ is set as $H^{+}_{x,i} = H_{x,i} \times P_i$, and the negative attention of industry $x$ is set as $H^{-}_{x,i} = H_{x,i} \times N_i$;
Over the past month, the cumulative heat (attention) of industry $x$ is set as $C_x$, computed over the days $i \in \{1, \dots, m\}$, where $m$ is the number of days in the month;
After 23:00 on the evening of the last day of each month, the monthly cumulative heat $C_x$ of all industries is calculated, where $x \in \{1, \dots, 24\}$, 24 industries in total; the 24 industries are sorted by the value of $C_x$ from high to low; the method selects the stocks of all companies in the top-ranked industry as the investment targets for the next month.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710076418.4A CN107025264A (en) | 2017-02-13 | 2017-02-13 | A kind of automatic share-selecting method based on news big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710076418.4A CN107025264A (en) | 2017-02-13 | 2017-02-13 | A kind of automatic share-selecting method based on news big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107025264A (en) | 2017-08-08 |
Family
ID=59526166
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710076418.4A Pending CN107025264A (en) | 2017-02-13 | 2017-02-13 | A kind of automatic share-selecting method based on news big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107025264A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103226554A (en) * | 2012-12-14 | 2013-07-31 | 西藏同信证券有限责任公司 | Automatic stock matching and classifying method and system based on news data |
CN103778215A (en) * | 2014-01-17 | 2014-05-07 | 北京理工大学 | Stock market forecasting method based on sentiment analysis and hidden Markov fusion model |
CN105022825A (en) * | 2015-07-22 | 2015-11-04 | 中国人民解放军国防科学技术大学 | Financial variety price prediction method capable of combining financial news mining and financial historical data |
CN105740353A (en) * | 2016-01-26 | 2016-07-06 | 中国人民解放军国防科学技术大学 | Calculation method and system for relevance degree of individual share and article |
CN105786962A (en) * | 2016-01-15 | 2016-07-20 | 优品财富管理有限公司 | Big data index analysis method and system based on news transmissibility |
CN106384166A (en) * | 2016-09-12 | 2017-02-08 | 中山大学 | Deep learning stock market prediction method combined with financial news |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109213934A (en) * | 2018-08-23 | 2019-01-15 | 阿里巴巴集团控股有限公司 | A kind of processing method of resource, device and equipment |
CN110889024A (en) * | 2019-10-25 | 2020-03-17 | 武汉灯塔之光科技有限公司 | Method and device for calculating information-related stock |
CN112862617A (en) * | 2019-11-27 | 2021-05-28 | 泰康保险集团股份有限公司 | Data processing method, system, storage medium and electronic device |
CN111241399A (en) * | 2020-01-10 | 2020-06-05 | 杜长江 | Method for evaluating attention of listed companies |
CN111241399B (en) * | 2020-01-10 | 2023-07-04 | 杜长江 | Evaluation method for attention of marketing company |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kravchenko et al. | The digitalization as a global trend and growth factor of the modern economy | |
Magri | Debt maturity choice of nonpublic Italian firms | |
CN109767318A (en) | Loan product recommended method, device, equipment and storage medium | |
Gallagher et al. | Standardizing sustainable development: A comparison of development banks in the Americas | |
CN110428322A (en) | A kind of adaptation method and device of business datum | |
US20090254469A1 (en) | System for Cash, Expense And Withdrawal Allocation Across Assets and Liabilities to Maximize Net Worth Over a Specified Period | |
CN107025264A (en) | A kind of automatic share-selecting method based on news big data | |
Hanh | Does WTO accession matter for the dynamics of foreign direct investment and trade? Vietnam’s new evidence 1 | |
Chun-Hao et al. | A bibliometric study of financial risk literature: a historic approach | |
Ashton et al. | Remaking mortgage markets by remaking mortgages: US housing finance after the crisis | |
US20210295434A1 (en) | Platform for research, analysis, and communications compliance of investment data | |
Norisnita et al. | Application of theory of planned behavior (TPB) in cryptocurrency investment prediction: A literature review | |
Tröster et al. | Delivering on promises? The expected impacts and implementation challenges of the economic partnership agreements between the European Union and Africa | |
Yudowati et al. | Big data framework for auditing process | |
US8626658B1 (en) | Methods, systems and apparatus for providing a dynamic account list in an online financial services system | |
Avgouleas et al. | The architecture of decentralised finance platforms: a new open finance paradigm | |
Ray | Who controls multilateral development finance? | |
Miori et al. | Clustering Uniswap v3 traders from their activity on multiple liquidity pools, via novel graph embeddings | |
US10394885B1 (en) | Methods, systems and computer program products for generating personalized financial podcasts | |
Yan | Financial Modeling using R | |
Kim | Empirical evidence of faulty credit scoring and business failure in P2P lending | |
US20160371779A1 (en) | System implementing smart beta factor deposition based on assets in existing portfolio | |
Ge et al. | CSNCD: China Stock News Co-mention Dataset | |
Gunay et al. | Extreme return connectedness between DeFi tokens and traditional financial markets: an entrepreneurial perspective | |
Li | Essays in financial technology: banking efficiency and application of machine learning models in Supply Chain Finance and credit risk assessment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | CB03 | Change of inventor or designer information | Inventor after: Wang Tiejun; Inventor before: Zhang Tiejun |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170808 |