CN107122481A - Real-time online prediction method for news popularity - Google Patents

Real-time online prediction method for news popularity

Info

Publication number
CN107122481A
Authority
CN
China
Prior art keywords
heat value
word
hot
hot word
popularity
Prior art date
Legal status
Granted
Application number
CN201710308998.5A
Other languages
Chinese (zh)
Other versions
CN107122481B (en)
Inventor
余军
卢品吟
刘盾
张汨
Current Assignee
Chengdu Hua Seiun Technology Co Ltd
Original Assignee
Chengdu Hua Seiun Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Hua Seiun Technology Co Ltd
Priority to CN201710308998.5A
Publication of CN107122481A
Application granted
Publication of CN107122481B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking

Abstract

The invention discloses a real-time online prediction method for news popularity, comprising two major parts: hot event analysis and modeling, and latest news popularity prediction. The hot words and hot word pairs from all events are merged to form a heat value table of hot words and hot word pairs, and the heat values of each hot word across the events are summed to obtain the current heat values of the hot words and hot word pairs. The heat value table is updated continuously. Using the heat value table obtained in the hot event analysis and modeling step, the words and word combinations obtained from the news are scored for popularity: the heat value of each word and word combination is looked up in the table, and the heat values of identical words and word combinations are accumulated to obtain the heat value of each word and word combination in the current news item. The heat values of all words and word combinations in the news item are summed to obtain the news popularity, which is the predicted news popularity. The invention can analyse hot topics comprehensively and update hot news in a timely manner.

Description

Real-time online prediction method for news popularity
Technical field
The present invention relates to the field of news information, and in particular to a real-time online prediction method for news popularity.
Background art
With the rapid development of Internet technology, online public opinion increasingly affects the stable development of society, and monitoring online public opinion is an important part of how the government maintains social stability. As one link in public opinion monitoring, the prediction of hot news is particularly critical. With its distinctive propagation characteristics and real-time interactivity, microblogging has changed the way traditional news media spread information. In particular, the combination of microblogs and mobile terminals allows microblog posts to be forwarded or commented on more quickly, and the large volume of user comments and exchanged information on microblog platforms can quickly converge into viewpoints, forming a definite trend in public opinion. The inherent openness, real-time nature, interactivity, massive volume, and ease of retrieval of microblogs form the basis for hot news prediction. The popularity of a news item can be judged by comprehensively analysing the volume of information such as news and microblog posts.
Traditionally, hot topics in public opinion are judged solely by data such as the number of clicks, forwards, and comments. This kind of hot topic prediction technique cannot analyse the features of hot topics comprehensively, and it does not extract hot news promptly enough.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a real-time online prediction method for news popularity that can analyse hot topics comprehensively and update hot news in a timely manner.
The object of the invention is achieved through the following technical solution:
A real-time online prediction method for news popularity, comprising the following two major parts:
Hot event analysis and modeling, comprising the following steps:
S01: For a hot event that has occurred, manually determine keywords and, based on those keywords, crawl the various kinds of information related to the hot event from the network;
S02: Assess the event heat value: score the popularity of the event using the total amount of information crawled from the network; the larger the total, the higher the score, with no upper limit;
S03: Perform hot word analysis on the crawled information and find the top 20% of words by popularity;
S04: Perform hot event modeling on known hot events, analysing the contribution rate of each term to the event popularity and the joint contribution rate of term combinations to the event popularity;
S05: Calculate the heat values of hot words using the event heat value, the contribution rate of each hot word to the event, and the contribution rate of each hot word combination to the event. The calculation formulas are: hot word heat value = event heat value × hot word frequency / sum of the frequencies of all hot words; hot word pair heat value = event heat value × hot word pair frequency / sum of the frequencies of all hot word pairs;
S06: Merge the hot words and hot word pairs from all events to form a heat value table of hot words and hot word pairs, and sum the heat values of each hot word across the events to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs;
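The two S05 formulas can be sketched in code. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `heat_values_for_event` and the sample frequencies are hypothetical, and hot word pairs are assumed to be keyed by tuples and split by the same proportional rule as single hot words.

```python
def heat_values_for_event(event_heat, frequencies):
    """Split one event's heat value across its hot words (or hot word pairs)
    in proportion to frequency, as in the S05 formulas."""
    total = sum(frequencies.values())
    if total == 0:
        return {}  # no hot words recorded for this event
    return {key: event_heat * freq / total for key, freq in frequencies.items()}


# Hypothetical frequencies for one event with heat value 1000.
word_freqs = {"football": 10, "final": 6, "goal": 4}
pair_freqs = {("final", "football"): 3, ("football", "goal"): 1}

word_heat = heat_values_for_event(1000.0, word_freqs)
pair_heat = heat_values_for_event(1000.0, pair_freqs)
```

Because each value is a share of the event heat value, the shares for one event always sum back to that event's heat value.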
Latest news popularity prediction, comprising the following steps:
S11: Collect information from various sources in real time, including but not limited to content from news sites, microblogs, forums, and post bars (Tieba);
S12: Segment the collected information into words and remove stop words to obtain the words relevant to the news;
S13: Using the heat value table of hot words and hot word pairs obtained in the hot event analysis and modeling step, score the popularity of the words and word combinations obtained; that is, look up the heat value of each word and word combination in the heat value table and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Sum the heat values of all words and word combinations in the news item to obtain the news popularity; this is the predicted news popularity.
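Steps S13 and S14 amount to a table lookup followed by a sum. The sketch below is one possible reading, with several assumptions that the text does not settle: a word's heat value is accumulated once per occurrence, a "word combination" is taken to be an unordered pair of distinct words that both appear in the item and is counted once, and pairs are stored in the table as alphabetically sorted tuples. The function name `predict_news_heat` and the sample table are hypothetical.

```python
from collections import Counter
from itertools import combinations


def predict_news_heat(tokens, heat_table):
    """Look up words and word pairs of one news item in the heat value table
    (S13) and sum the results into a predicted popularity (S14)."""
    counts = Counter(tokens)
    per_key = {}
    # Single words: accumulate the table value once per occurrence (assumption).
    for word, n in counts.items():
        if word in heat_table:
            per_key[word] = heat_table[word] * n
    # Word pairs: each co-occurring pair counted once (assumption);
    # keys are alphabetically sorted tuples to match the table.
    for pair in combinations(sorted(counts), 2):
        if pair in heat_table:
            per_key[pair] = heat_table[pair]
    return sum(per_key.values()), per_key


# Hypothetical heat value table and segmented news item.
table = {"football": 500.0, "final": 300.0, ("final", "football"): 120.0}
tokens = ["football", "final", "football", "tickets"]
total, detail = predict_news_heat(tokens, table)
```

Words that are absent from the table, such as "tickets" here, contribute nothing to the predicted popularity.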
Further, the network in step S01 includes different channels such as major news websites, microblogs, WeChat, forums, post bars (Tieba), and government websites, and the information crawled is the content of articles, microblog posts, and WeChat posts that contain the keywords.
Further, the event heat value in step S02 is calculated as Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100.
Further, the hot word analysis in step S03 comprises the following steps: first remove stop words, then score each word by its frequency of occurrence, where the word frequency is the number of times the word occurs across all the content; finally sort the words by score and find the top 20% by popularity.
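The hot word analysis described for step S03 can be sketched as a frequency count followed by a cut at the top 20%. This is an illustrative sketch only: the stop word list, the function name `top_hot_words`, and the choice to round the 20% cut upward and keep at least one word are assumptions, and the input is assumed to be already segmented into tokens.

```python
from collections import Counter

# Hypothetical stop word list; a real deployment would use a much larger one.
STOP_WORDS = {"the", "a", "of", "in", "and"}


def top_hot_words(documents, percent=20):
    """Count word frequency over all documents, drop stop words,
    and keep the top `percent` of distinct words by frequency."""
    counts = Counter(
        token for doc in documents for token in doc if token not in STOP_WORDS
    )
    # Integer ceiling of len(counts) * percent / 100, at least 1 (assumption).
    keep = max(1, (len(counts) * percent + 99) // 100)
    return dict(counts.most_common(keep))


# Hypothetical pre-segmented documents for one topic.
docs = [
    ["the", "football", "final", "football"],
    ["football", "goal", "in", "the", "final"],
    ["fans", "tickets", "football"],
]
hot_words = top_hot_words(docs)
```

With five distinct non-stop words, the 20% cut keeps exactly one word, which is the most frequent one.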
The beneficial effects of the invention are as follows: the invention analyses current hot events, scores the popularity of each individual hot event, and compiles the results into a hot event table; it then scores the collected hot words against the hot event table and obtains the corresponding heat values through a series of calculations, thereby analysing current hot events comprehensively and in real time and updating hot news in a timely manner.
Detailed description
A real-time online prediction method for news popularity, comprising the following two major parts:
Hot event analysis and modeling, comprising the following steps:
S01: For a hot event that has occurred, manually determine keywords and, based on those keywords, crawl the various kinds of information related to the hot event from the network. The network includes different channels such as major news websites, microblogs, WeChat, forums, post bars (Tieba), and government websites, and the information includes content such as articles, microblog posts, and WeChat posts that contain the keywords.
S02: Assess the event heat value: score the popularity of the event using the total amount of information crawled from the network; the larger the total, the higher the score, with no upper limit. Weights can be set manually for different information sources: the more important the source, the higher its weight in the score. The weights can be adjusted continuously according to the business scenario. For example, for news events, information from highly credible websites such as People's Daily Online and Xinhuanet can be given higher weights; for entertainment events, microblog posts from celebrity "Big V" accounts can be given higher scores.
S03: Perform hot word analysis on the crawled information and find the top 20% of words by popularity. Hot word analysis first removes stop words and then scores each word by its frequency of occurrence, where the word frequency is the number of times the word occurs across all the content; the words are sorted by score and the top 20% by popularity are selected. For example, if "football" occurs 10 times in the text of a sports news item, the word frequency of "football" in that item is 10. The word frequencies of "football" in all news items under the topic are summed, and if the total ranks in the top 20% of all word frequencies, "football" is one of the hot words under that topic.
S04: Perform hot event modeling on known hot events, analysing the contribution rate of each term to the event popularity and the joint contribution rate of term combinations to the event popularity;
S05: Calculate the heat values of hot words using the event heat value, the contribution rate of each hot word to the event, and the contribution rate of each hot word combination to the event. The calculation formulas are: hot word heat value = event heat value × hot word frequency / sum of the frequencies of all hot words; hot word pair heat value = event heat value × hot word pair frequency / sum of the frequencies of all hot word pairs;
S06: Merge the hot words and hot word pairs from all events to form a heat value table of hot words and hot word pairs, and sum the heat values of each hot word across the events to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs;
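Steps S06 and S07 above describe merging the per-event heat values into one table and keeping it current. A minimal sketch follows, assuming each event has already produced a dictionary of heat values for its hot words and hot word pairs; the function name `merge_heat_tables` and the sample data are hypothetical, and "updating" is modeled simply as rebuilding the table from the current set of events.

```python
from collections import defaultdict


def merge_heat_tables(per_event_tables):
    """Merge per-event heat values into one table (S06): a key that appears
    in several events gets the sum of its heat values across those events."""
    merged = defaultdict(float)
    for table in per_event_tables:
        for key, value in table.items():
            merged[key] += value
    return dict(merged)


# Hypothetical per-event heat values; pair keys are sorted tuples.
event_a = {"football": 500.0, "final": 300.0, ("final", "football"): 120.0}
event_b = {"football": 80.0, "transfer": 40.0}

heat_table = merge_heat_tables([event_a, event_b])

# S07 (assumption): when a new event is modeled, rebuild the table.
event_c = {"transfer": 60.0}
heat_table = merge_heat_tables([event_a, event_b, event_c])
```

Rebuilding from scratch is the simplest update policy; the text does not say whether older events should be aged out or down-weighted.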
Further, the event heat value in step S02 is calculated as Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100. For example, k is 100 for news from high-weight website sources such as People's Daily Online and Xinhuanet, and k is 1 for posts by ordinary users on self-media channels such as microblogs.
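The Hotvalue formula can be illustrated with a short calculation. In this sketch the sum is assumed to run over information sources, with count being the number of crawled items from each source; the source names, item counts, and the function name `event_hot_value` are hypothetical, while the weights 100 and 1 follow the example in the text.

```python
def event_hot_value(counts_by_source, weights, default_weight=1):
    """Compute Hotvalue = sum[count × k] over sources, where k is the
    per-source weight and must lie in the range 1 to 100."""
    total = 0
    for source, count in counts_by_source.items():
        k = weights.get(source, default_weight)
        if not 1 <= k <= 100:
            raise ValueError(f"weight for {source!r} must be between 1 and 100")
        total += count * k
    return total


# Weights from the example in the text; counts are made up for illustration.
weights = {"people_daily_online": 100, "xinhuanet": 100, "microblog": 1}
counts = {"people_daily_online": 3, "xinhuanet": 2, "microblog": 450}

hot_value = event_hot_value(counts, weights)  # 3*100 + 2*100 + 450*1
```

Because the score has no upper limit, a handful of items from a heavily weighted source can outweigh hundreds of ordinary microblog posts.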
Latest news popularity prediction, comprising the following steps:
S11: Collect information from various sources in real time, including but not limited to content from news sites, microblogs, forums, and post bars (Tieba);
S12: Segment the collected information into words and remove stop words to obtain the words relevant to the news;
S13: Using the heat value table of hot words and hot word pairs obtained in the hot event analysis and modeling step, score the popularity of the words and word combinations obtained; the calculation formula is Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100. That is, look up the heat value of each word and word combination in the heat value table and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Sum the heat values of all words and word combinations in the news item to obtain the news popularity; this is the predicted news popularity.
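Step S12 above (segmentation and stop word removal) can be sketched as follows. Chinese text requires a dedicated word segmenter, which the text does not name; as a stand-in, this sketch splits English text with a regular expression. The stop word list and the function name `extract_news_words` are hypothetical.

```python
import re

# Hypothetical stop word list for the English stand-in example.
STOP_WORDS = {"the", "a", "is", "on", "of"}


def extract_news_words(text):
    """Split text into lowercase tokens and drop stop words (S12).
    The regex split stands in for a real Chinese word segmenter."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


words = extract_news_words("The football final is on Sunday")
```

The resulting token list is what the lookup in S13 would consume.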
The above is only a preferred embodiment of the invention. It should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications, and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims (4)

1. A real-time online prediction method for news popularity, characterised in that it comprises the following two major parts:
Hot event analysis and modeling, comprising the following steps:
S01: For a hot event that has occurred, manually determine keywords and, based on those keywords, crawl the various kinds of information related to the hot event from the network;
S02: Assess the event heat value: score the popularity of the event using the total amount of information crawled from the network; the larger the total, the higher the score, with no upper limit;
S03: Perform hot word analysis on the crawled information and find the top 20% of words by popularity;
S04: Perform hot event modeling on known hot events, analysing the contribution rate of each term to the event popularity and the joint contribution rate of term combinations to the event popularity;
S05: Calculate the heat values of hot words using the event heat value, the contribution rate of each hot word to the event, and the contribution rate of each hot word combination to the event. The calculation formulas are: hot word heat value = event heat value × hot word frequency / sum of the frequencies of all hot words; hot word pair heat value = event heat value × hot word pair frequency / sum of the frequencies of all hot word pairs;
S06: Merge the hot words and hot word pairs from all events to form a heat value table of hot words and hot word pairs, and sum the heat values of each hot word across the events to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs;
Latest news popularity prediction, comprising the following steps:
S11: Collect information from various sources in real time, including but not limited to content from news sites, microblogs, forums, and post bars (Tieba);
S12: Segment the collected information into words and remove stop words to obtain the words relevant to the news;
S13: Using the heat value table of hot words and hot word pairs obtained in the hot event analysis and modeling step, score the popularity of the words and word combinations obtained; that is, look up the heat value of each word and word combination in the heat value table and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Sum the heat values of all words and word combinations in the news item to obtain the news popularity; this is the predicted news popularity.
2. The real-time online prediction method for news popularity according to claim 1, characterised in that: the network in step S01 includes different channels such as major news websites, microblogs, WeChat, forums, post bars (Tieba), and government websites, and the information crawled is the content of articles, microblog posts, and WeChat posts that contain the keywords.
3. The real-time online prediction method for news popularity according to claim 1, characterised in that: the event heat value in step S02 is calculated as Hotvalue = sum[count × k], where count is the total number of public opinion items and k is a weight with a value from 1 to 100.
4. The real-time online prediction method for news popularity according to claim 1, characterised in that: the hot word analysis in step S03 comprises the following steps: first remove stop words, then score each word by its frequency of occurrence, where the word frequency is the number of times the word occurs across all the content; finally sort the words by score and find the top 20% by popularity.
CN201710308998.5A 2017-05-04 2017-05-04 Real-time online prediction method for news popularity Active CN107122481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710308998.5A CN107122481B (en) 2017-05-04 2017-05-04 Real-time online prediction method for news popularity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710308998.5A CN107122481B (en) 2017-05-04 2017-05-04 Real-time online prediction method for news popularity

Publications (2)

Publication Number Publication Date
CN107122481A (en) 2017-09-01
CN107122481B (en) 2020-06-30

Family

ID=59726634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710308998.5A Active CN107122481B (en) 2017-05-04 2017-05-04 Real-time online prediction method for news popularity

Country Status (1)

Country Link
CN (1) CN107122481B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923544A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for monitoring and displaying Internet hot spots
US20130226560A1 (en) * 2010-02-05 2013-08-29 Jebu Ittiachen System and method for discovering story trends in real time from user generated content
CN102945290A (en) * 2012-12-03 2013-02-27 北京奇虎科技有限公司 Hot microblog topic digging device and method
CN104035960A (en) * 2014-05-08 2014-09-10 东莞市巨细信息科技有限公司 Internet information hotspot predicting method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750682B (en) * 2018-07-06 2022-08-16 武汉斗鱼网络科技有限公司 Title hot word automatic metering method, storage medium, electronic equipment and system
CN110750682A (en) * 2018-07-06 2020-02-04 武汉斗鱼网络科技有限公司 Title hot word automatic metering method, storage medium, electronic equipment and system
CN109344316A (en) * 2018-08-14 2019-02-15 优视科技(中国)有限公司 News temperature calculates method and device
CN109376231A (en) * 2018-09-29 2019-02-22 杭州凡闻科技有限公司 A kind of media hotspot tracking and system
CN109885656A (en) * 2019-02-18 2019-06-14 国家计算机网络与信息安全管理中心 Microblogging forwarding prediction technique and device based on quantization temperature
CN109885656B (en) * 2019-02-18 2021-06-29 国家计算机网络与信息安全管理中心 Microblog forwarding prediction method and device based on quantification heat degree
CN110457594B (en) * 2019-08-01 2021-06-01 深圳市顶尖传诚科技有限公司 Big data-based public opinion hotspot prediction method
CN110457594A (en) * 2019-08-01 2019-11-15 深圳市顶尖传诚科技有限公司 A kind of hot spot of public opinions prediction technique based on big data
CN112597280A (en) * 2020-12-28 2021-04-02 上海朝阳永续信息技术股份有限公司 Method for automatically discovering hot keywords and hot news
WO2022141803A1 (en) * 2020-12-28 2022-07-07 上海朝阳永续信息技术股份有限公司 Method for automatically discovering hot keywords and hot news
CN113535956A (en) * 2021-07-26 2021-10-22 北京清博智能科技有限公司 News hotspot prediction method based on medium contribution degree
CN114938477A (en) * 2022-06-23 2022-08-23 阿里巴巴(中国)有限公司 Video topic determination method, device and equipment
CN114938477B (en) * 2022-06-23 2024-05-03 阿里巴巴(中国)有限公司 Video topic determination method, device and equipment

Also Published As

Publication number Publication date
CN107122481B (en) 2020-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant