CN107122481A - Real-time online prediction method for news popularity - Google Patents
Real-time online prediction method for news popularity
- Publication number
- CN107122481A (application CN201710308998.5A)
- Authority
- CN
- China
- Prior art keywords
- heat value
- word
- hot
- hot word
- popularity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a real-time online method for predicting news popularity, comprising two parts: hot event analysis and modeling, and latest news popularity prediction. The hot words and hot word pairs of all events are merged into a single heat value table of hot words and hot word pairs, and the heat values a hot word has in each event are added up to obtain the current heat values of the hot words and hot word pairs. The heat value table is updated continuously. The words and word combinations of a news item are scored against the heat value table obtained in the hot event analysis and modeling steps: the heat values of the words and word combinations are looked up in the table, and the heat values of identical words and word combinations are accumulated to obtain the heat value of each word and word combination in the current news item. The popularity of all words and word combinations in the news item is then added up to obtain the news popularity, which is the predicted news popularity. The invention can analyze hot topics comprehensively and update hot news in a timely manner.
Description
Technical field
The present invention relates to the field of news and information, and in particular to a real-time online method for predicting news popularity.
Background
With the rapid development of Internet technology, online public opinion increasingly affects the stable development of society, and monitoring it is an important part of how the government maintains social stability. As one link in public opinion monitoring, the prediction of hot news is particularly critical. With its distinctive propagation characteristics and real-time interactivity, microblogging has changed the way traditional news media spread information. In particular, the combination of microblogs and mobile terminals lets microblog posts be forwarded or commented on more quickly, and the large volume of user comments and exchanges on a microblog platform can rapidly converge into viewpoints and so form a public opinion trend. The inherent openness, real-time nature, interactivity, massive scale, and easy retrievability of microblogs form the basis for hot news prediction: the popularity of a news item can be judged by comprehensively analyzing the volume of microblog posts about it.
Traditional judgments of hot public opinion topics rely solely on data such as the number of clicks, forwards, and comments. Such hot topic prediction techniques cannot analyze the features of a hot topic comprehensively, and they do not extract hot news promptly enough.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a real-time online method for predicting news popularity that can analyze hot topics comprehensively and update hot news in a timely manner.
The object of the present invention is achieved through the following technical solution:
A real-time online method for predicting news popularity comprises the following two parts:
Hot event analysis and modeling, comprising the following steps:
S01: For a hot event that has occurred, determine keywords manually and, based on those keywords, crawl information related to the hot event from the network;
S02: Assess the event heat value: score the popularity of the event using the total amount of information crawled from the network; the larger the total, the higher the score, with no upper limit;
S03: Perform hot word analysis on the crawled information and find the 20% of words with the highest popularity;
S04: Model the known hot events, analyzing the contribution rate of each term to the event's popularity and the joint contribution rate of term combinations;
S05: Calculate the heat values of hot words from the event heat value, the contribution rate of each hot word to the event, and the contribution rate of each hot word combination to the event. The formulas are: hot word heat value = event heat value × hot word frequency / sum of the frequencies of all hot words; hot word pair heat value = event heat value × hot word pair frequency / sum of the frequencies of all hot word pairs;
S06: Merge the hot words and hot word pairs of all events into a single heat value table of hot words and hot word pairs, adding up the heat values a hot word has in each event to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs;
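Steps S05 and S06 can be illustrated with a minimal sketch. It assumes the frequencies of the hot words (or hot word pairs) in each event have already been counted; the function names, event heat values, and frequencies below are invented for the example and do not come from the patent.

```python
from collections import defaultdict


def allocate_heat(event_heat, frequencies):
    """S05: split an event's heat value across its hot words (or hot word
    pairs) in proportion to each term's frequency."""
    total = sum(frequencies.values())
    if total == 0:
        return {}
    return {term: event_heat * freq / total for term, freq in frequencies.items()}


def merge_heat_tables(per_event_tables):
    """S06: merge the per-event tables, summing the heat values a term
    received in each event."""
    merged = defaultdict(float)
    for table in per_event_tables:
        for term, heat in table.items():
            merged[term] += heat
    return dict(merged)


# Hypothetical data: two events with made-up heat values and frequencies.
event_a = allocate_heat(1000.0, {"football": 30, "final": 10})
event_b = allocate_heat(400.0, {"football": 5, "transfer": 15})
heat_table = merge_heat_tables([event_a, event_b])
print(heat_table)
```

The same two functions work for hot word pairs if each pair is used as a dictionary key, for example as a tuple of two words.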
Latest news popularity prediction, comprising the following steps:
S11: Collect information from various sources in real time, including but not limited to the content of news sites, microblogs, forums, and post bars (tieba);
S12: Segment the collected information into words and remove stop words to obtain the words relevant to the news;
S13: Score the popularity of the words and word combinations obtained above against the heat value table of hot words and hot word pairs produced in the hot event analysis and modeling steps; that is, look up the heat values of the words and of the word combinations in the heat value table and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Add up the popularity of all words and word combinations in the news item to obtain the news popularity, which is the predicted news popularity.
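The prediction steps S12 to S14 might be sketched as follows. The sketch assumes the article has already been segmented into words (for Chinese text a word segmenter would be needed, which is not shown), that each occurrence of a word contributes its table value, and that a word combination is any unordered pair of distinct words in the article. These choices, the function names, and the sample table are assumptions for illustration only.

```python
from itertools import combinations


def remove_stop_words(tokens, stop_words):
    """S12 (second half): drop stop words from an already segmented article."""
    return [tok for tok in tokens if tok not in stop_words]


def predict_news_heat(tokens, word_heat, pair_heat):
    """S13-S14: look up word and word-pair heat values and add them up.

    pair_heat is keyed by tuples of two words in alphabetical order.
    Words and pairs missing from the tables contribute nothing.
    """
    # Every occurrence of a word adds its heat value, so repeats accumulate.
    word_score = sum(word_heat.get(tok, 0.0) for tok in tokens)
    # Each unordered pair of distinct words in the article is looked up once.
    distinct = sorted(set(tokens))
    pair_score = sum(pair_heat.get(pair, 0.0) for pair in combinations(distinct, 2))
    return word_score + pair_score


# Hypothetical heat value table and article.
word_heat = {"football": 850.0, "final": 250.0}
pair_heat = {("final", "football"): 120.0}
tokens = remove_stop_words(["the", "football", "final", "football"], {"the"})

news_heat = predict_news_heat(tokens, word_heat, pair_heat)
print(news_heat)
```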
Further, the network in step S01 includes major news websites, microblogs, WeChat, forums, post bars (tieba), government websites, and other channels, and the crawled information includes articles, microblog posts, and WeChat content containing the keywords.
Further, the event heat value in step S02 is calculated as Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100.
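As a rough illustration of this formula, the sketch below reads the sum as running over information sources, each with its own item count and weight k. That reading, the source names, the fallback weight, and the sample counts are assumptions made for the example; the text only fixes the formula and the 1 to 100 range for k.

```python
def event_heat_value(counts_by_source, weights, default_weight=1):
    """Compute Hotvalue = sum(count * k) over information sources.

    counts_by_source: number of crawled public opinion items per source.
    weights: weight k per source; each k must lie in the range 1..100.
    default_weight: k used for sources without an explicit weight (an assumption).
    """
    total = 0
    for source, count in counts_by_source.items():
        k = weights.get(source, default_weight)
        if not 1 <= k <= 100:
            raise ValueError(f"weight for {source!r} must be between 1 and 100, got {k}")
        total += count * k
    return total


# Hypothetical crawl result for one event.
counts = {"official_news": 12, "forum": 150, "microblog": 3400}
weights = {"official_news": 100, "forum": 5}  # microblog falls back to k = 1

hot_value = event_heat_value(counts, weights)
print(hot_value)
```

Because the score is a plain weighted sum, it has no upper bound, which matches the statement in step S02 that the score grows with the amount of information.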
Further, the hot word analysis in step S03 comprises the following: first remove stop words, then score each word by its frequency of occurrence, where word frequency is the number of times the word occurs across all the content; sort by score and take the 20% of words with the highest popularity.
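A minimal sketch of this hot word analysis is shown below. It assumes the crawled documents are already segmented into word lists and rounds the 20% cut-off up to a whole number of words; the rounding rule, the function name, and the sample documents are assumptions for illustration.

```python
from collections import Counter


def top_hot_words(documents, stop_words, percent=20):
    """Count word frequencies over all documents, ignoring stop words, and
    return the top `percent` percent of distinct words as (word, count) pairs."""
    counts = Counter(
        tok for doc in documents for tok in doc if tok not in stop_words
    )
    ranked = counts.most_common()  # sorted by count, highest first
    # Integer ceiling of len(ranked) * percent / 100.
    keep = (len(ranked) * percent + 99) // 100
    return ranked[:keep]


# Hypothetical, already segmented documents for one topic.
documents = [
    ["football", "match", "football", "the"],
    ["football", "goal", "coach", "referee"],
]
hot_words = top_hot_words(documents, stop_words={"the"})
print(hot_words)
```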
The beneficial effects of the present invention are as follows: by analyzing current hot events, the invention sorts out and scores the popularity of each individual hot event to form a hot event table, then scores the collected hot vocabulary against that table and obtains the corresponding heat values through a series of calculations, thereby analyzing current hot events comprehensively in real time and updating hot news promptly.
Detailed description
A real-time online method for predicting news popularity comprises the following two parts:
Hot event analysis and modeling, comprising the following steps:
S01: For a hot event that has occurred, determine keywords manually and, based on those keywords, crawl information related to the hot event from the network, including articles, microblog posts, WeChat content, and the like that contain the keywords, from channels such as major news websites, microblogs, WeChat, forums, post bars (tieba), and government websites.
S02: Assess the event heat value: score the popularity of the event using the total amount of information crawled from the network; the larger the total, the higher the score, with no upper limit. Weights can be set manually for different information sources: the more important the source, the higher its weight in the score. The weights can be adjusted continually to suit the business scenario. For news events, for example, highly credible websites such as People's Daily Online and Xinhuanet (www.xinhuanet.com) can be given higher weights; for entertainment events, microblog posts from celebrity "Big V" accounts can be given higher scores.
S03: Perform hot word analysis on the crawled information and find the 20% of words with the highest popularity. Hot word analysis first removes stop words, then scores each word by its frequency of occurrence, where word frequency is the number of times the word occurs across all the content; the words are sorted by score and the top 20% by popularity are taken. For example, if "football" appears 10 times in the text of a sports news article, the word frequency of "football" in that article is 10. The word frequencies of "football" across all news under the topic are added up, and if the total ranks in the top 20% of all word frequencies, "football" is one of the hot words of that topic.
S04: Model the known hot events, analyzing the contribution rate of each term to the event's popularity and the joint contribution rate of term combinations;
S05: Calculate the heat values of hot words from the event heat value, the contribution rate of each hot word to the event, and the contribution rate of each hot word combination to the event. The formulas are: hot word heat value = event heat value × hot word frequency / sum of the frequencies of all hot words; hot word pair heat value = event heat value × hot word pair frequency / sum of the frequencies of all hot word pairs;
S06: Merge the hot words and hot word pairs of all events into a single heat value table of hot words and hot word pairs, adding up the heat values a hot word has in each event to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs;
Further, the event heat value in step S02 is calculated as Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100. For example, k is 100 for news from high-weight website sources such as People's Daily Online and Xinhuanet (www.xinhuanet.com), and k is 1 for news from ordinary self-media users on microblogs and similar platforms.
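Step S04 speaks of the contribution of terms and of term combinations without saying how combinations are counted. One possible reading, sketched below, treats a hot word pair as two hot words occurring in the same document and takes its contribution rate as its share of all pair occurrences. The co-occurrence rule, the function names, and the sample documents are assumptions, not part of the described method.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(documents, hot_words):
    """Count, for each unordered pair of hot words, the number of documents
    in which both words occur. Pairs are tuples in alphabetical order."""
    counts = Counter()
    for doc in documents:
        present = sorted(set(doc) & hot_words)
        counts.update(combinations(present, 2))
    return counts


def contribution_rates(frequencies):
    """Turn raw frequencies into shares of the total (they sum to 1)."""
    total = sum(frequencies.values())
    if total == 0:
        return {}
    return {term: freq / total for term, freq in frequencies.items()}


# Hypothetical, already segmented documents about one event.
documents = [
    ["football", "final", "coach"],
    ["football", "final"],
    ["coach", "transfer"],
]
hot_words = {"football", "final", "coach"}

pairs = pair_frequencies(documents, hot_words)
rates = contribution_rates(pairs)
print(rates)
```

Multiplying each rate by the event heat value gives the hot word pair heat value of step S05.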
Latest news popularity prediction, comprising the following steps:
S11: Collect information from various sources in real time, including but not limited to the content of news sites, microblogs, forums, and post bars (tieba);
S12: Segment the collected information into words and remove stop words to obtain the words relevant to the news;
S13: Score the popularity of the words and word combinations obtained above against the heat value table of hot words and hot word pairs produced in the hot event analysis and modeling steps, using the formula Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100. That is, look up the heat values of the words and of the word combinations in the heat value table and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Add up the popularity of all words and word combinations in the news item to obtain the news popularity, which is the predicted news popularity.
The above is only a preferred embodiment of the present invention. It should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications, and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the relevant art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the protection scope of the appended claims.
Claims (4)
1. A real-time online method for predicting news popularity, characterized in that it comprises the following two parts:
Hot event analysis and modeling, comprising the following steps:
S01: For a hot event that has occurred, determine keywords manually and, based on those keywords, crawl information related to the hot event from the network;
S02: Assess the event heat value: score the popularity of the event using the total amount of information crawled from the network; the larger the total, the higher the score, with no upper limit;
S03: Perform hot word analysis on the crawled information and find the 20% of words with the highest popularity;
S04: Model the known hot events, analyzing the contribution rate of each term to the event's popularity and the joint contribution rate of term combinations;
S05: Calculate the heat values of hot words from the event heat value, the contribution rate of each hot word to the event, and the contribution rate of each hot word combination to the event. The formulas are: hot word heat value = event heat value × hot word frequency / sum of the frequencies of all hot words; hot word pair heat value = event heat value × hot word pair frequency / sum of the frequencies of all hot word pairs;
S06: Merge the hot words and hot word pairs of all events into a single heat value table of hot words and hot word pairs, adding up the heat values a hot word has in each event to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs;
Latest news popularity prediction, comprising the following steps:
S11: Collect information from various sources in real time, including but not limited to the content of news sites, microblogs, forums, and post bars (tieba);
S12: Segment the collected information into words and remove stop words to obtain the words relevant to the news;
S13: Score the popularity of the words and word combinations obtained above against the heat value table of hot words and hot word pairs produced in the hot event analysis and modeling steps; that is, look up the heat values of the words and of the word combinations in the heat value table and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Add up the popularity of all words and word combinations in the news item to obtain the news popularity, which is the predicted news popularity.
2. The real-time online method for predicting news popularity according to claim 1, characterized in that: the network in step S01 includes major news websites, microblogs, WeChat, forums, post bars (tieba), government websites, and other channels, and the crawled information includes articles, microblog posts, and WeChat content containing the keywords.
3. The real-time online method for predicting news popularity according to claim 1, characterized in that: the event heat value in step S02 is calculated as Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100.
4. The real-time online method for predicting news popularity according to claim 1, characterized in that: the hot word analysis in step S03 comprises the following: first remove stop words, then score each word by its frequency of occurrence, where word frequency is the number of times the word occurs across all the content; sort by score and take the 20% of words with the highest popularity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710308998.5A CN107122481B (en) | 2017-05-04 | 2017-05-04 | Real-time online prediction method for news popularity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710308998.5A CN107122481B (en) | 2017-05-04 | 2017-05-04 | Real-time online prediction method for news popularity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107122481A (en) | 2017-09-01 |
CN107122481B (en) | 2020-06-30 |
Family ID: 59726634
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710308998.5A Active CN107122481B (en) | 2017-05-04 | 2017-05-04 | Real-time online prediction method for news popularity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107122481B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344316A (en) * | 2018-08-14 | 2019-02-15 | 优视科技(中国)有限公司 | News temperature calculates method and device |
CN109376231A (en) * | 2018-09-29 | 2019-02-22 | 杭州凡闻科技有限公司 | A kind of media hotspot tracking and system |
CN109885656A (en) * | 2019-02-18 | 2019-06-14 | 国家计算机网络与信息安全管理中心 | Microblogging forwarding prediction technique and device based on quantization temperature |
CN110457594A (en) * | 2019-08-01 | 2019-11-15 | 深圳市顶尖传诚科技有限公司 | A kind of hot spot of public opinions prediction technique based on big data |
CN110750682A (en) * | 2018-07-06 | 2020-02-04 | 武汉斗鱼网络科技有限公司 | Title hot word automatic metering method, storage medium, electronic equipment and system |
CN112597280A (en) * | 2020-12-28 | 2021-04-02 | 上海朝阳永续信息技术股份有限公司 | Method for automatically discovering hot keywords and hot news |
CN113535956A (en) * | 2021-07-26 | 2021-10-22 | 北京清博智能科技有限公司 | News hotspot prediction method based on medium contribution degree |
CN114938477A (en) * | 2022-06-23 | 2022-08-23 | 阿里巴巴(中国)有限公司 | Video topic determination method, device and equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923544A (en) * | 2009-06-15 | 2010-12-22 | 北京百分通联传媒技术有限公司 | Method for monitoring and displaying Internet hot spots |
CN102945290A (en) * | 2012-12-03 | 2013-02-27 | 北京奇虎科技有限公司 | Hot microblog topic digging device and method |
US20130226560A1 (en) * | 2010-02-05 | 2013-08-29 | Jebu Ittiachen | System and method for discovering story trends in real time from user generated content |
CN104035960A (en) * | 2014-05-08 | 2014-09-10 | 东莞市巨细信息科技有限公司 | Internet information hotspot predicting method |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101923544A (en) * | 2009-06-15 | 2010-12-22 | 北京百分通联传媒技术有限公司 | Method for monitoring and displaying Internet hot spots |
US20130226560A1 (en) * | 2010-02-05 | 2013-08-29 | Jebu Ittiachen | System and method for discovering story trends in real time from user generated content |
CN102945290A (en) * | 2012-12-03 | 2013-02-27 | 北京奇虎科技有限公司 | Hot microblog topic digging device and method |
CN104035960A (en) * | 2014-05-08 | 2014-09-10 | 东莞市巨细信息科技有限公司 | Internet information hotspot predicting method |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110750682B (en) * | 2018-07-06 | 2022-08-16 | 武汉斗鱼网络科技有限公司 | Title hot word automatic metering method, storage medium, electronic equipment and system |
CN110750682A (en) * | 2018-07-06 | 2020-02-04 | 武汉斗鱼网络科技有限公司 | Title hot word automatic metering method, storage medium, electronic equipment and system |
CN109344316A (en) * | 2018-08-14 | 2019-02-15 | 优视科技(中国)有限公司 | News temperature calculates method and device |
CN109376231A (en) * | 2018-09-29 | 2019-02-22 | 杭州凡闻科技有限公司 | A kind of media hotspot tracking and system |
CN109885656A (en) * | 2019-02-18 | 2019-06-14 | 国家计算机网络与信息安全管理中心 | Microblogging forwarding prediction technique and device based on quantization temperature |
CN109885656B (en) * | 2019-02-18 | 2021-06-29 | 国家计算机网络与信息安全管理中心 | Microblog forwarding prediction method and device based on quantification heat degree |
CN110457594B (en) * | 2019-08-01 | 2021-06-01 | 深圳市顶尖传诚科技有限公司 | Big data-based public opinion hotspot prediction method |
CN110457594A (en) * | 2019-08-01 | 2019-11-15 | 深圳市顶尖传诚科技有限公司 | A kind of hot spot of public opinions prediction technique based on big data |
CN112597280A (en) * | 2020-12-28 | 2021-04-02 | 上海朝阳永续信息技术股份有限公司 | Method for automatically discovering hot keywords and hot news |
WO2022141803A1 (en) * | 2020-12-28 | 2022-07-07 | 上海朝阳永续信息技术股份有限公司 | Method for automatically discovering hot keywords and hot news |
CN113535956A (en) * | 2021-07-26 | 2021-10-22 | 北京清博智能科技有限公司 | News hotspot prediction method based on medium contribution degree |
CN114938477A (en) * | 2022-06-23 | 2022-08-23 | 阿里巴巴(中国)有限公司 | Video topic determination method, device and equipment |
CN114938477B (en) * | 2022-06-23 | 2024-05-03 | 阿里巴巴(中国)有限公司 | Video topic determination method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107122481B (en) | 2020-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107122481A (en) | News temperature real-time online Forecasting Methodology | |
US11847612B2 (en) | Social media profiling for one or more authors using one or more social media platforms | |
Raisi et al. | Cyberbullying identification using participant-vocabulary consistency | |
CN106980692A (en) | A kind of influence power computational methods based on microblogging particular event | |
US8965867B2 (en) | Measuring and altering topic influence on edited and unedited media | |
CN105069099B (en) | A kind of information recommendation method and system | |
US20160019659A1 (en) | Predicting the business impact of tweet conversations | |
Subramanian | The growth of global internet censorship and circumvention: A survey | |
Trabelsi et al. | Mining social networks for software vulnerabilities monitoring | |
CN103577405A (en) | Interest analysis based micro-blogger community classification method | |
Siddiqui et al. | Bots and Gender Profiling on Twitter. | |
CN104899335A (en) | Method for performing sentiment classification on network public sentiment of information | |
CN109033286B (en) | Data statistical method and device | |
KR101326313B1 (en) | Method of classifying emotion from multi sentence using context information | |
Cha et al. | Flash floods and ripples: The spread of media content through the blogosphere | |
Asghar et al. | Political miner: opinion extraction from user generated political reviews | |
CN107526782A (en) | Demand supply matching process based on natural language analysis | |
CN110210927A (en) | A kind of IT books recommender system design based on collaborative filtering | |
Ding et al. | Click versus share: A feature-driven study of micro-video popularity and virality in social media | |
Liu et al. | Big data for social media evaluation: a case of WeChat platform rankings in China | |
Ozawa et al. | A sentiment polarity prediction model using transfer learning and its application to SNS flaming event detection | |
Trabelsi et al. | Monitoring software vulnerabilities through social networks analysis | |
Virmani et al. | HashMiner: Feature Characterisation and analysis of# Hashtag Hijacking using real-time neural network | |
JP5990474B2 (en) | Abnormality detection apparatus, program, and method for detecting specific abnormality using posted text from unspecified number of users | |
Raina et al. | Twitter sentiment analysis using apache storm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||