CN107122481B - Real-time online prediction method for news popularity - Google Patents


Info

Publication number
CN107122481B
Authority
CN
China
Prior art keywords
hot
news
heat
vocabulary
words
Legal status
Active
Application number
CN201710308998.5A
Other languages
Chinese (zh)
Other versions
CN107122481A (en)
Inventor
余军
卢品吟
刘盾
张汨
Current Assignee
Chengdu Chinamcloud Technology Co ltd
Original Assignee
Chengdu Chinamcloud Technology Co ltd
Priority date
2017-05-04
Filing date
2017-05-04
Publication date
2020-06-30
Application filed by Chengdu Chinamcloud Technology Co ltd
Priority to CN201710308998.5A
Publication of CN107122481A
Application granted
Publication of CN107122481B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Operations Research (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a real-time online method for predicting news popularity, consisting of two parts: hot event analysis and modeling, and popularity prediction for the latest news. In the first part, the hot words and hot word pairs from all events are combined into a heat value table, in which the heat values of each hot word across all events are added to give its current heat value; the table of hot words and hot word pairs is continuously updated. In the second part, the words and word combinations of a news item are scored against this table: their heat values are looked up in the table, and the heat values of identical words and word combinations are accumulated to give the heat value of each word and word combination in the current news item. The heat values of all words and word combinations in the news item are then added to give its popularity, which is the predicted news popularity. The method can comprehensively analyze hot topics and update hot news in a timely manner.

Description

Real-time online prediction method for news popularity
Technical Field
The invention relates to the field of news information, in particular to a real-time online prediction method for news popularity.
Background
With the rapid development of Internet technology, online public opinion increasingly affects social stability, and monitoring online public opinion is an important means by which the government maintains social stability. Within public opinion monitoring, predicting hot news is particularly critical. Microblogs, with their distinctive propagation and real-time interaction characteristics, have changed the way traditional news spreads. In particular, on mobile devices microblog posts can be forwarded or commented on more quickly, and the large volume of user comments and exchanges on a microblog platform can quickly be gathered into viewpoints, forming a public opinion trend. The openness, timeliness, interactivity, scale and ease of monitoring of microblogs form the basis of hot news prediction: the popularity of a news item can be judged by comprehensively analyzing the volume of discussion about it on the microblog platform.
Traditionally, hot public opinion topics are judged only from data such as the number of clicks, forwards and comments. Such hot topic prediction techniques cannot comprehensively analyze the characteristics of hot topics, and hot news is not extracted in a timely manner.
Disclosure of Invention
The purpose of the invention is to overcome these shortcomings of the prior art and provide a real-time online news popularity prediction method that comprehensively analyzes hot topics and updates hot news in a timely manner.
The purpose of the invention is realized by the following technical scheme:
A real-time online news popularity prediction method comprises the following two parts:
The hot event analysis and modeling part comprises the following steps:
S01: Manually determine keywords for hot events that have occurred, and crawl information related to these events from the network based on those keywords;
S02: Evaluate the event heat value, i.e. score the heat of the event by the total amount of information crawled from the network: the larger the total amount, the higher the score, with no upper limit;
S03: Perform hot word analysis on the crawled information and identify the 20% of words with the highest heat;
S04: Model the known hot events, analyzing the contribution rate of each term to the event heat and the joint contribution rate of term combinations to the event heat;
S05: Calculate the heat value of each hot word using the event heat value, the contribution rate of the hot word to the event and the contribution rate of hot word combinations to the event, where: heat value of a hot word = event heat value × frequency of the hot word / sum of the frequencies of all hot words;
S06: Combine the hot words and hot word pairs from all events into a heat value table of hot words and hot word pairs, adding each hot word's heat values across all events to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs.
The latest news popularity prediction part comprises the following steps:
S11: Collect information from various sources in real time, including but not limited to news, microblogs, forums and posts;
S12: Segment the collected information into words and remove stop words to obtain the words related to the news;
S13: Score the heat of the obtained words and word combinations against the heat value table of hot words and hot word pairs built in the hot event analysis and modeling part, i.e. look up the heat values of the words and word combinations in the table, and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Add the heat values of all words and word combinations in the news item to obtain its popularity, which is the predicted news popularity.
Further, the network in step S01 covers articles, microblog posts, WeChat posts and other content containing the keywords across channels such as news websites, microblogs, WeChat, forums, post bars and government websites.
Further, the event heat value in step S02 is calculated as Hotvalue = sum(count × k), where count is the total number of public opinion items and k is a weight with a value of 1-100.
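As an illustration only, the S02 formula can be sketched in Python. The sketch assumes that count is tallied separately for each information source and that each source carries its own manually set weight k (for example 100 for high-credibility news sites and 1 for self-media channels, as in the detailed description). The `SourceCount` type, the `event_heat` function and the source labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceCount:
    """Crawled volume for one information source (hypothetical structure)."""
    source: str  # illustrative label, e.g. a news site or microblog channel
    count: int   # number of public opinion items about the event from this source
    k: int       # manually set source weight, 1-100


def event_heat(counts: list[SourceCount]) -> int:
    """Hotvalue = sum(count * k) over all sources; the score has no upper cap."""
    total = 0
    for c in counts:
        if not 1 <= c.k <= 100:
            raise ValueError(f"weight for {c.source!r} must be in 1-100, got {c.k}")
        total += c.count * c.k
    return total


# Usage: 12 items from a high-credibility site (k=100) and 3400 microblog posts (k=1)
print(event_heat([
    SourceCount("high-credibility news site", 12, 100),
    SourceCount("microblog", 3400, 1),
]))  # 4600
```

Because the weights multiply raw volume, a handful of items from a heavily weighted source can outweigh thousands of self-media posts; the 1-100 range check simply enforces the bound stated in the text.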
Further, the word analysis in step S03 comprises removing stop words, scoring each word by its frequency of occurrence, and selecting the 20% of words with the highest scores.
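A minimal sketch of the S03 selection follows, assuming the crawled content has already been segmented into words (the segmenter is not shown). Rounding the 20% cut-off up, and keeping at least one word, are assumptions of this sketch; the function name is hypothetical.

```python
import math
from collections import Counter


def top_hot_words(tokenized_docs, stop_words, fraction=0.2):
    """Return the top `fraction` of distinct words by total frequency.

    tokenized_docs: list of already-segmented documents (lists of words).
    Rounding the cut-off up with math.ceil is an assumption of this sketch.
    """
    # Frequency = number of occurrences across all crawled content, stop words excluded
    freq = Counter(w for doc in tokenized_docs for w in doc if w not in stop_words)
    if not freq:
        return {}
    n_keep = max(1, math.ceil(len(freq) * fraction))
    return dict(freq.most_common(n_keep))


docs = [
    ["football", "match", "the", "goal"],
    ["football", "goal", "fans"],
    ["football", "coach", "the"],
]
print(top_hot_words(docs, stop_words={"the"}))  # {'football': 3}
```

Here five distinct words remain after removing "the", so the top 20% is a single word, "football", which occurs three times across the documents.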
Beneficial effects of the invention: current hot events are analyzed and each is scored; the events are sorted by heat to form a hot event list; the collected hot words are then scored according to this list, and their heat values are obtained through the calculations described above. Current hot events are thus analyzed comprehensively in real time, and news hot spots are updated in a timely manner.
Detailed Description
A real-time online news popularity prediction method comprises the following two parts:
The hot event analysis and modeling part comprises the following steps:
S01: Manually determine keywords for hot events that have occurred, and crawl information related to these events from the network based on those keywords. The information includes articles, microblog posts, WeChat posts and other content containing the keywords across channels such as news websites, microblogs, WeChat, forums, post bars and government websites.
S02: Evaluate the event heat value, i.e. score the heat of the event by the total amount of information crawled from the network: the larger the total amount, the higher the score, with no upper limit. Weights can be set manually for different information sources: the more important a source, the greater its weight in the score, and the weights can be adjusted continuously to suit the business scenario. For news events, for example, sources with high public credibility such as People's Daily Online and Xinhuanet can be given higher weights, while for entertainment events, microblog posts from celebrity "Big V" (verified, high-follower) accounts can be given more weight.
S03: Perform hot word analysis on the crawled information and identify the 20% of words with the highest heat. Hot word analysis first removes stop words and then scores each word by its frequency, i.e. the number of times the word occurs across all the content; the highest-scoring words are the hot words. For example, if the word "football" occurs 10 times in one sports news article, its occurrences are summed over all news articles under that topic; if "football" then ranks in the top 20% of words under the topic, it is one of the topic's hot words.
S04: Model the known hot events, analyzing the contribution rate of each term to the event heat and the joint contribution rate of term combinations to the event heat;
S05: Calculate the heat value of each hot word using the event heat value, the contribution rate of the hot word to the event and the contribution rate of hot word combinations to the event, where: heat value of a hot word = event heat value × frequency of the hot word / sum of the frequencies of all hot words;
S06: Combine the hot words and hot word pairs from all events into a heat value table of hot words and hot word pairs, adding each hot word's heat values across all events to obtain the current heat values of the hot words and hot word pairs;
S07: Continuously update the heat value table of hot words and hot word pairs.
Further, the event heat value in step S02 is calculated as Hotvalue = sum(count × k), where count is the total number of public opinion items and k is a weight with a value of 1-100; for example, k is 100 for news from high-weight websites such as People's Daily Online and Xinhuanet, and 1 for news from self-media channels such as microblogs.
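The S05 formula and the table merge of S06/S07 can be sketched as follows. The patent states the formula only for single hot words; applying the same proportional split to hot word pairs (keyed here as sorted tuples, normalized over that event's pair frequencies) is an assumption, as are the function names.

```python
def hot_term_heat(event_heat, freqs):
    """S05: heat of each term = event heat * term frequency / sum of all term frequencies."""
    total = sum(freqs.values())
    if total == 0:
        return {}
    return {term: event_heat * f / total for term, f in freqs.items()}


def update_heat_table(table, event_heat, word_freqs, pair_freqs):
    """S06/S07: add one event's hot word and hot word pair heat values into the
    shared table, so each term's value is the sum of its heat across all events.

    Assumption: pairs use the same formula as single words, normalized only
    over that event's pair frequencies.
    """
    for freqs in (word_freqs, pair_freqs):
        for term, heat in hot_term_heat(event_heat, freqs).items():
            table[term] = table.get(term, 0.0) + heat


table = {}
update_heat_table(table, 4600, {"football": 3, "goal": 2}, {("football", "goal"): 2})
update_heat_table(table, 1000, {"football": 1, "coach": 1}, {})
print(table["football"], table["coach"])  # 3260.0 500.0
```

Calling `update_heat_table` again for each newly analyzed event is one simple reading of "continuously updating" the table: "football" receives 2760.0 from the first event and 500.0 from the second.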
The latest news popularity prediction part comprises the following steps:
S11: Collect information from various sources in real time, including but not limited to news, microblogs, forums and posts;
S12: Segment the collected information into words and remove stop words to obtain the words related to the news;
S13: Score the heat of the words and word combinations obtained, using the heat value table of hot words and hot word pairs built in the hot event analysis and modeling part, with the formula Hotvalue = sum(count × k), where count is the total number of public opinion items and k is a weight with a value of 1-100. That is, look up the heat values of the words and word combinations in the heat value table, and accumulate the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: Add the heat values of all words and word combinations in the news item to obtain its popularity, which is the predicted news popularity.
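Steps S12 to S14 can be sketched as a table lookup over one segmented news item. This sketch assumes that every occurrence of a word adds that word's heat, that word combinations are unordered pairs of distinct co-occurring words stored as sorted tuples, and that each distinct pair is counted once; the function name is hypothetical.

```python
from itertools import combinations


def predict_popularity(tokens, stop_words, heat_table):
    """S12-S14: score one segmented news item against the heat value table.

    tokens: the item's words after segmentation (segmenter not shown).
    heat_table: word -> heat and (word_a, word_b) -> heat, pairs as sorted tuples.
    """
    words = [w for w in tokens if w not in stop_words]  # S12: drop stop words
    popularity = 0.0
    # S13: look up each word; repeated occurrences of a word accumulate its heat
    for w in words:
        popularity += heat_table.get(w, 0.0)
    # Word combinations: each distinct co-occurring pair is looked up once (assumption)
    for pair in combinations(sorted(set(words)), 2):
        popularity += heat_table.get(pair, 0.0)
    # S14: the total is the predicted popularity of the news item
    return popularity


table = {"football": 3260.0, "goal": 1840.0, ("football", "goal"): 4600.0, "coach": 500.0}
print(predict_popularity(["the", "football", "goal", "football"], {"the"}, table))  # 12960.0
```

Words absent from the table contribute nothing, so a news item built only from unseen vocabulary scores 0 until new hot events add those words to the table.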
The foregoing describes preferred embodiments of the invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that it may be used in various other combinations, modifications and environments within the scope of the concept disclosed herein, as described above or as apparent to those skilled in the relevant art. Modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. A real-time online news popularity prediction method, characterized by comprising the following two parts:
the hot event analysis and modeling part comprises the following steps:
S01: manually determining keywords for hot events that have occurred, and crawling information related to the hot events from the network based on the manually determined keywords;
S02: evaluating the event heat value, namely scoring the heat of the event by the total amount of information crawled from the network, wherein the larger the total amount, the higher the score, with no upper limit;
S03: performing hot word analysis on the crawled information and identifying the 20% of words with the highest heat;
S04: modeling the known hot events, and analyzing the contribution rate of each term to the event heat and the joint contribution rate of term combinations to the event heat;
S05: calculating the heat value of each hot word using the event heat value, the contribution rate of the hot word to the event and the contribution rate of hot word combinations to the event, wherein the heat value of a hot word = the event heat value × the frequency of the hot word / the sum of the frequencies of all hot words;
S06: combining the hot words and hot word pairs from all events into a heat value table of hot words and hot word pairs, and adding the heat values of each hot word across all events to obtain the current heat values of the hot words and hot word pairs;
S07: continuously updating the heat value table of hot words and hot word pairs;
the latest news popularity prediction part comprises the following steps:
S11: collecting information from various sources in real time, including but not limited to news, microblogs, forums and posts;
S12: performing word segmentation on the collected information and removing stop words to obtain words and word combinations related to the information content;
S13: scoring the heat of the obtained words and word combinations against the heat value table of hot words and hot word pairs obtained in the hot event analysis and modeling part, namely looking up the heat values of the words and word combinations in the heat value table, and accumulating the heat values of identical words and word combinations to obtain the heat value of each word and word combination in the current news item;
S14: adding the heat values of all words and word combinations in the news item to obtain the news popularity, which is the predicted news popularity.
2. The real-time online news popularity prediction method according to claim 1, wherein the network in step S01 covers articles, microblog posts and WeChat posts containing the keywords in channels including news websites, microblogs, WeChat, forums, post bars and government websites.
3. The real-time online news popularity prediction method according to claim 1, wherein the event heat value in step S02 is calculated as Hotvalue = sum(count × k), where count is the total number of public opinion items and k is a weight with a value of 1-100.
4. The real-time online news popularity prediction method according to claim 1, wherein the word analysis in step S03 comprises: first removing stop words, then scoring each word by its frequency of occurrence, the frequency being the number of times the word occurs across all content, and selecting the 20% of words with the highest heat according to the score.
CN201710308998.5A 2017-05-04 2017-05-04 Real-time online prediction method for news popularity Active CN107122481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710308998.5A CN107122481B (en) 2017-05-04 2017-05-04 Real-time online prediction method for news popularity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710308998.5A CN107122481B (en) 2017-05-04 2017-05-04 Real-time online prediction method for news popularity

Publications (2)

Publication Number Publication Date
CN107122481A (en) 2017-09-01
CN107122481B (en) 2020-06-30

Family

ID=59726634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710308998.5A Active CN107122481B (en) 2017-05-04 2017-05-04 Real-time online prediction method for news popularity

Country Status (1)

Country Link
CN (1) CN107122481B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750682B (en) * 2018-07-06 2022-08-16 武汉斗鱼网络科技有限公司 Title hot word automatic metering method, storage medium, electronic equipment and system
CN109344316B (en) * 2018-08-14 2022-04-29 阿里巴巴(中国)有限公司 News popularity calculation method and device
CN109376231A (en) * 2018-09-29 2019-02-22 杭州凡闻科技有限公司 A kind of media hotspot tracking and system
CN109885656B (en) * 2019-02-18 2021-06-29 国家计算机网络与信息安全管理中心 Microblog forwarding prediction method and device based on quantification heat degree
CN110457594B (en) * 2019-08-01 2021-06-01 深圳市顶尖传诚科技有限公司 Big data-based public opinion hotspot prediction method
CN112597280A (en) * 2020-12-28 2021-04-02 上海朝阳永续信息技术股份有限公司 Method for automatically discovering hot keywords and hot news
CN113535956A (en) * 2021-07-26 2021-10-22 北京清博智能科技有限公司 News hotspot prediction method based on medium contribution degree
CN114938477B (en) * 2022-06-23 2024-05-03 阿里巴巴(中国)有限公司 Video topic determination method, device and equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101923544A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for monitoring and displaying Internet hot spots
CN102945290A (en) * 2012-12-03 2013-02-27 北京奇虎科技有限公司 Hot microblog topic digging device and method
CN104035960A (en) * 2014-05-08 2014-09-10 东莞市巨细信息科技有限公司 Internet information hotspot predicting method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8429170B2 (en) * 2010-02-05 2013-04-23 Yahoo! Inc. System and method for discovering story trends in real time from user generated content

Also Published As

Publication number Publication date
CN107122481A (en) 2017-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant