CN106156364A - Method and system for calculating the dynamic influence of news events based on time flow - Google Patents

Method and system for calculating the dynamic influence of news events based on time flow

Info

Publication number
CN106156364A
Authority
CN
China
Prior art keywords
media event
event
website
time
comment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610625873.0A
Other languages
Chinese (zh)
Inventor
陈雁
韩修龙
代臻
李平
孙先
胡栋
赵刚
郭培伦
彭欣宇
陈凯琪
杨先凤
朱鹏军
刘婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Petroleum University
Priority to CN201610625873.0A
Publication of CN106156364A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention discloses a method and system for calculating the dynamic influence of news events based on time flow. Reports of the same event are obtained from different news websites, together with each website's page views and comment counts for that event, the corresponding browsing and comment times, the grade of each website reporting the event, and the category of the event. From these data, the time-flow-based dynamic influence of each event is calculated. Because the required data are easy to obtain, the method is simpler to implement and more efficient to run than other methods of calculating event influence; the chosen data are reasonable, and the calculation is easy to understand.

Description

Method and system for calculating the dynamic influence of news events based on time flow
Technical field
The present invention relates to the Internet field, in particular to the field of news event reporting on the Internet, and specifically to a method and system for calculating the dynamic influence of news events based on time flow.
Background art
With the development of the Internet, people encounter many news events every day, but most people care mainly about the events with greater impact. This places high demands on how the influence of a news event is calculated: the calculated influence must correctly reflect the impact of the event.
How to measure the influence of news events is a problem of broad concern in both academia and industry. For the same news event, different websites use different calculation methods. The simplest is manual labeling: if the annotator believes the event will have a large impact, the event is labeled as highly influential; otherwise it is labeled as having little influence. This method reflects only one individual's judgment and does not take into account the views of the general public. Many other methods of evaluating event influence exist, but each has shortcomings of one kind or another.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a method and system for calculating the dynamic influence of news events based on time flow, which calculates the influence of an event from its development trend so that the result better matches the public's understanding.
The object of the invention is achieved through the following technical solution: a method for calculating the dynamic influence of news events based on time flow, comprising the following steps:
S1. Determine the target news websites to be crawled;
S2. Obtain the overall ranking of each target website from an authoritative website rating service, store the rankings in a separate table in the database, and use the grade of the target news website as one feature of a news event;
S3. Crawl the news events of the target websites, create corresponding tables for the crawled news events, and store the data in the database;
S4. Retrieve the crawled news events from the database, remove stop words, and clean out information unrelated to the news events;
S5. Number each cleaned news event from each website and cluster identical events, obtaining a list of identical events and the websites that reported each news event;
S6. After obtaining the list of identical events, determine the time period T over which the dynamic influence of news events is calculated;
S7. Determine the category of each news event obtained, and use the category as one feature of the news event;
S8. From the database, obtain for each event in the list of identical events the number of comments, number of views, comment times and browsing times within time period T, together with the total number of comments, number of views, comment times and browsing times of each reporting website within the same period. Use the ratio of the number of views of the news event on a website within T to the total number of views of that website within T as one feature of the news event, and the ratio of the number of comments on the news event on a website within T to the total number of comments on that website within T as another feature;
S9. Determine the weight corresponding to each feature of the news event;
S10. Calculate the dynamic influence of the news event from the weights corresponding to the features, using the following formula:
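$$E_{\mathrm{influence}} = C_m \cdot \sum_{i=1}^{n} \mathrm{rank}_i \cdot \left( 0.4 \cdot \frac{\mathrm{browse}_i}{\mathrm{allbrowse}_i} + 0.6 \cdot \frac{\mathrm{comment}_i}{\mathrm{allcomment}_i} \right)$$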
Here $n$ is the number of websites; the normalized weights of the $n$ websites are $\mathrm{rank}_1, \mathrm{rank}_2, \ldots, \mathrm{rank}_n$; $C_m$ is the weight corresponding to the event category, where $m$ is one of 1, 2, 3; and the chosen time period T is one day. Within that day, the comment counts and page views of the $n$ websites for this event are $(\mathrm{comment}_1, \mathrm{browse}_1), (\mathrm{comment}_2, \mathrm{browse}_2), \ldots, (\mathrm{comment}_n, \mathrm{browse}_n)$, and the total comment counts and page views of the corresponding websites within the same period are $(\mathrm{allcomment}_1, \mathrm{allbrowse}_1), (\mathrm{allcomment}_2, \mathrm{allbrowse}_2), \ldots, (\mathrm{allcomment}_n, \mathrm{allbrowse}_n)$.
News events are divided into three categories. The first is an event recognized as a public news event. The second is an event that begins as a minor event and, as it develops, gradually becomes one the public cares about greatly. The third is an event that begins as a minor event, does not last long, and has no subsequent escalation.
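The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `event_influence`, the tuple layout of `sites`, and the choice to treat a website with zero total views or comments as contributing a ratio of 0 are not specified in the source.

```python
def event_influence(category_weight, sites):
    """Compute E_influence = C_m * sum(rank_i * (0.4*browse ratio + 0.6*comment ratio)).

    sites: iterable of (rank, browse, allbrowse, comment, allcomment) tuples,
    one per website that reported the event within the time period T.
    """
    total = 0.0
    for rank, browse, allbrowse, comment, allcomment in sites:
        # Assumption: a site with no traffic in T contributes a ratio of 0.
        browse_ratio = browse / allbrowse if allbrowse else 0.0
        comment_ratio = comment / allcomment if allcomment else 0.0
        total += rank * (0.4 * browse_ratio + 0.6 * comment_ratio)
    return category_weight * total


# Hypothetical figures for two websites with equal normalized rank weights.
example_sites = [
    (0.5, 100, 1000, 10, 100),
    (0.5, 200, 1000, 20, 100),
]
influence = event_influence(0.8, example_sites)
```

The weights 0.4 and 0.6 are hard-coded here because the formula fixes them; a real implementation might expose them as parameters.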
When crawling the news events of a target website, the content crawled for each news event should include:
a. the time at which the target website reported the news;
b. the number of comments on the news event as reported by the target website, and the time of each comment;
c. the number of views of the news event as reported by the target website, and the time of each view.
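A crawled record holding items a to c might look like the sketch below. The class name `CrawledNews` and its field names are hypothetical; the source only lists what must be captured, not how it is stored.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class CrawledNews:
    """Hypothetical record for one news item crawled from one target website."""
    site: str
    report_time: datetime                                           # item a
    comment_times: List[datetime] = field(default_factory=list)    # item b
    browse_times: List[datetime] = field(default_factory=list)     # item c

    @property
    def comment_count(self) -> int:
        # The comment count is derived from the stored comment timestamps.
        return len(self.comment_times)

    @property
    def browse_count(self) -> int:
        # The page-view count is derived from the stored view timestamps.
        return len(self.browse_times)


item = CrawledNews(site="website1", report_time=datetime(2016, 2, 25, 8, 0))
item.comment_times.append(datetime(2016, 2, 25, 9, 30))
```

Keeping the individual timestamps, rather than only the totals, is what makes it possible to count comments and views inside an arbitrary time period T later on.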
The steps for clustering all news items into identical events are as follows:
S501: segment the content of each news item into words;
S502: apply an LDA model to the segmented result to obtain the topic distribution of each news item;
S503: use KL divergence to calculate the similarity between news events;
S504: when the KL divergence of the news events is greater than the set threshold, attribute them to the same event.
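Steps S503 and S504 can be sketched as follows, assuming the LDA topic distributions from S502 are already available as lists of probabilities. Note that raw KL divergence is smaller for more similar distributions, so this sketch makes an explicit assumption: it converts a symmetrized KL divergence into a similarity score `1 / (1 + divergence)` and applies the threshold to that score. The function names, the symmetrization, and the greedy single-pass grouping are all illustrative choices, not taken from the source.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def similarity(p, q):
    """Assumed similarity: 1 / (1 + symmetrized KL divergence), in (0, 1]."""
    sym = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + sym)


def cluster(topic_dists, threshold):
    """Greedy grouping: join the first cluster whose seed item is similar enough."""
    clusters = []  # each cluster is a list of indices into topic_dists
    for idx, dist in enumerate(topic_dists):
        for members in clusters:
            if similarity(topic_dists[members[0]], dist) > threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Hypothetical topic distributions for three news items over three topics.
dists = [[0.7, 0.2, 0.1], [0.68, 0.22, 0.1], [0.1, 0.1, 0.8]]
groups = cluster(dists, threshold=0.9)
```

With these inputs the first two items, whose topic distributions are nearly identical, fall into one group and the third stands alone.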
A system for calculating the dynamic influence of news events based on time flow, comprising:
A news event acquisition module, which obtains the news events of the target websites;
A data storage module, which stores the crawled news events and the related results produced during data processing;
A data cleaning module, which retrieves the crawled data from the database and cleans it;
An event clustering module, which clusters identical events among the cleaned news events;
An influence calculation module, which calculates the time-flow-based dynamic influence of the identical news events after clustering.
The beneficial effects of the invention are as follows: the invention provides a method and system for calculating the dynamic influence of news events based on time flow, which calculates the influence of an event from its development trend so that the result better matches the public's understanding. The system can quickly calculate the dynamic influence of events and recommend the currently most influential news events to users, providing strong support for the dissemination of news events.
Brief description of the drawings
Fig. 1 is a flowchart of the method for calculating the dynamic influence of events based on time flow;
Fig. 2 is a framework diagram of the system for calculating the dynamic influence of events based on time flow.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following.
A method for calculating the dynamic influence of news events based on time flow, comprising the following steps:
S1. Determine the target news websites to be crawled;
S2. Obtain the overall ranking of each target website from an authoritative website rating service, store the rankings in a separate table in the database, and use the grade of the target news website as one feature of a news event. Once the rankings of all target websites have been obtained from the rating service, normalize all the rankings to determine each website's rank score among the target websites, which is the weight of this feature;
S3. Crawl the news events of the target websites, create corresponding tables for the crawled news events, and store the data in the database;
S4. Retrieve the crawled news events from the database, remove stop words, and clean out information unrelated to the news events, which improves the accuracy of the clustering of identical events;
S5. Number each cleaned news event from each website with a globally unique number and cluster identical events, obtaining a list of identical events and the websites that reported each news event;
S6. After obtaining the list of identical events, determine the time period T over which the dynamic influence of news events is calculated. The time period may differ for different calculation scales and can be chosen according to business needs; in the present invention the time period is 24 hours;
S7. Determine the category of each news event obtained, and use the category as one feature of the news event; different event categories carry different weights when the final event influence is calculated. News events are divided into three categories. The first, category C1, is an event recognized as a public news event, such as the Olympic Games or the Two Sessions; such events are inherently highly influential, so a penalty factor is applied when setting their weight so that the final calculated influence is more convincing. The second, category C3, is an event that begins as a minor event and, as it develops, gradually becomes one the public cares about greatly; such events are given a larger weight. The third, category C2, is an event that begins as a minor event, does not last long, and has no subsequent escalation. In the present invention, the weight of category C1 is 0.8, the weight of category C2 is 1, and the weight of category C3 is 1.2;
S8. From the database, obtain for each event in the list of identical events the number of comments, number of views, comment times and browsing times within time period T, together with the total number of comments, number of views, comment times and browsing times of each reporting website within the same period. Use the ratio of the number of views of the news event on a website within T to the total number of views of that website within T as one feature of the news event, and the ratio of the number of comments on the news event on a website within T to the total number of comments on that website within T as another feature. In the present invention the weight of the page-view feature is 0.4 and the weight of the comment-count feature is 0.6;
S9. Determine the weight corresponding to each feature of the news event;
S10. Calculate the dynamic influence of the news event from the weights corresponding to the features, using the following formula:
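$$E_{\mathrm{influence}} = C_m \cdot \sum_{i=1}^{n} \mathrm{rank}_i \cdot \left( 0.4 \cdot \frac{\mathrm{browse}_i}{\mathrm{allbrowse}_i} + 0.6 \cdot \frac{\mathrm{comment}_i}{\mathrm{allcomment}_i} \right)$$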
Here $n$ is the number of websites; the normalized weights of the $n$ websites are $\mathrm{rank}_1, \mathrm{rank}_2, \ldots, \mathrm{rank}_n$; $C_m$ is the weight corresponding to the event category, where $m$ is one of 1, 2, 3; and the chosen time period T is one day. Within that day, the comment counts and page views of the $n$ websites for this event are $(\mathrm{comment}_1, \mathrm{browse}_1), (\mathrm{comment}_2, \mathrm{browse}_2), \ldots, (\mathrm{comment}_n, \mathrm{browse}_n)$, and the total comment counts and page views of the corresponding websites within the same period are $(\mathrm{allcomment}_1, \mathrm{allbrowse}_1), (\mathrm{allcomment}_2, \mathrm{allbrowse}_2), \ldots, (\mathrm{allcomment}_n, \mathrm{allbrowse}_n)$.
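Step S2 says the website rankings are normalized but does not say how. The sketch below shows one plausible scheme, purely as an assumption: each website's weight is the inverse of its rank position, rescaled so that the weights sum to 1. The category weights are the values stated in step S7; the names `normalize_ranks` and `CATEGORY_WEIGHTS` are illustrative.

```python
# Category weights as stated in step S7 of the description.
CATEGORY_WEIGHTS = {"C1": 0.8, "C2": 1.0, "C3": 1.2}


def normalize_ranks(rankings):
    """Turn rank positions (1 = best) into weights that sum to 1.

    Assumption: inverse-rank weighting. The source only says the rankings
    are normalized, so any monotone scheme summing to 1 would fit equally.
    """
    inverse = {site: 1.0 / position for site, position in rankings.items()}
    total = sum(inverse.values())
    return {site: value / total for site, value in inverse.items()}


# Hypothetical overall rankings for three target websites.
weights = normalize_ranks({"website1": 1, "website2": 2, "website3": 4})
```

Under this scheme a better-ranked website always receives a larger weight, which matches the intent of using website grade as a feature.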
When crawling the news events of a target website, the content crawled for each news event should include:
a. the time at which the target website reported the news;
b. the number of comments on the news event as reported by the target website, and the time of each comment;
c. the number of views of the news event as reported by the target website, and the time of each view.
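Because each comment and view is stored with its time, the per-period counts needed in step S8 can be derived by filtering timestamps. The following sketch assumes a half-open window `[start, start + 24 h)`; the function names and the boundary convention are assumptions, not part of the source.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, start, length=timedelta(hours=24)):
    """Count timestamps falling in the half-open window [start, start + length)."""
    end = start + length
    return sum(1 for t in timestamps if start <= t < end)


def share(event_times, site_times, start):
    """Ratio of the event's count to the whole site's count within the window."""
    total = count_in_window(site_times, start)
    return count_in_window(event_times, start) / total if total else 0.0


start = datetime(2016, 2, 25)
# Hypothetical comment times: one inside the window, one exactly at its end.
event_comments = [datetime(2016, 2, 25, 1, 0), datetime(2016, 2, 26, 0, 0)]
site_comments = event_comments + [
    datetime(2016, 2, 25, 5, 0),
    datetime(2016, 2, 25, 23, 59),
]
comment_share = share(event_comments, site_comments, start)
```

The same two functions apply unchanged to view timestamps, giving the page-view ratio used alongside the comment ratio.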
The steps for clustering all news items into identical events are as follows:
S501: segment the content of each news item into words;
S502: apply an LDA model to the segmented result to obtain the topic distribution of each news item;
S503: use KL divergence to calculate the similarity between news events;
S504: when the KL divergence of the news events is greater than the set threshold, attribute them to the same event.
A system for calculating the dynamic influence of news events based on time flow, comprising:
A news event acquisition module, which obtains the news events of the target websites;
A data storage module, which stores the crawled news events and the related results produced during data processing;
A data cleaning module, which retrieves the crawled data from the database and cleans it;
An event clustering module, which clusters identical events among the cleaned news events;
An influence calculation module, which calculates the time-flow-based dynamic influence of the identical news events after clustering.
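One way to picture how the five modules fit together is the sketch below. The class name `InfluencePipeline`, the callables passed to it, and the in-memory dictionary standing in for the database are all illustrative assumptions; the source does not specify module interfaces. The toy `cluster` and `score` functions in the usage lines are placeholders, not the LDA clustering or the influence formula.

```python
from typing import Any, Callable, Dict, List


class InfluencePipeline:
    """Hypothetical wiring of the five modules; not the patent's actual interfaces."""

    def __init__(
        self,
        acquire: Callable[[], List[Any]],
        clean: Callable[[Any], Any],
        cluster: Callable[[List[Any]], List[List[Any]]],
        score: Callable[[List[Any]], float],
    ) -> None:
        self.acquire = acquire      # news event acquisition module
        self.clean = clean          # data cleaning module
        self.cluster = cluster      # event clustering module
        self.score = score          # influence calculation module
        self.storage: Dict[str, list] = {}  # stand-in for the data storage module

    def run(self) -> List[float]:
        raw = self.acquire()
        self.storage["raw"] = raw
        cleaned = [self.clean(item) for item in raw]
        self.storage["cleaned"] = cleaned
        groups = self.cluster(cleaned)
        self.storage["clusters"] = groups
        return [self.score(group) for group in groups]


# Toy stand-ins: exact-match grouping and group size as a placeholder score.
pipeline = InfluencePipeline(
    acquire=lambda: ["  Two Sessions open ", "two sessions open", "Local fair"],
    clean=lambda text: text.strip().lower(),
    cluster=lambda items: [[i for i in items if i == key] for key in dict.fromkeys(items)],
    score=lambda group: float(len(group)),
)
scores = pipeline.run()
```

Storing the intermediate results at each stage mirrors the description of the data storage module, which keeps both the crawled events and the results produced during processing.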
The invention is implemented in three main steps, as shown in Fig. 1. In step 1, all current news events are crawled from the different news websites; the crawled data include the time of occurrence, number of comments, number of views, comment times and browsing times of each news event.
Once the news event data have been obtained, step 2 extracts identical events from them. The method used is LDA (Latent Dirichlet Allocation), a classic topic-model algorithm in natural language processing, which effectively groups identical news events together and greatly helps the subsequent calculation.
In step 3, the event aggregation result produced by step 2 is combined with the weights of the relevant event features to calculate the dynamic influence of the event.
When the influence of an event is calculated, the calculation uses each previously determined feature of the event together with its corresponding weight.
The whole flow of the application is illustrated below with a concrete example, and the calculation of influence is described specifically.
Embodiment: calculate the real-time dynamic influence of the 2016 national Two Sessions (NPC and CPPCC). In terms of event category, the Two Sessions are clearly a public news event that people across the country follow closely, so the category is easily determined to be a public news event. To obtain a more convincing dynamic influence result, a penalty factor $W_{\mathrm{category}} = 0.8$ is applied in the calculation. The steps in Fig. 1 are then followed:
Step 1: crawl news events from the target news websites. Crawling begins one week before the Two Sessions open, that is, from 00:00 on 2016.2.25. Suppose there are 10 target websites, $\mathrm{website}_1, \mathrm{website}_2, \ldots, \mathrm{website}_9, \mathrm{website}_{10}$. The website assessment module in Fig. 2 is used to obtain the grade rankings of the 10 websites, assumed to be $\mathrm{rank}_1, \mathrm{rank}_2, \ldots, \mathrm{rank}_9, \mathrm{rank}_{10}$. The crawler continuously obtains news events from the news websites; each crawled news event from each website has its number of comments, number of views, the corresponding comment times and browsing times, and the time of occurrence of the event.
Step 2: extract the news events that appeared on each website between 00:00 on 2016.2.25 and 00:00 on 2016.2.26. LDA is first used to extract topics from the news events of the ten websites; after the topic distribution of each news item is obtained, KL divergence is used to calculate the similarity between news events, yielding the news reports about the Two Sessions in this period.
Step 3: suppose that after step 2, 8 of the 10 websites reported news events related to the national Two Sessions within that day, namely $\mathrm{website}_1, \mathrm{website}_2, \ldots, \mathrm{website}_8$. The numbers of comments and views of these 8 websites' Two Sessions reports within that day are $(\mathrm{comment}_1, \mathrm{browse}_1), (\mathrm{comment}_2, \mathrm{browse}_2), \ldots, (\mathrm{comment}_8, \mathrm{browse}_8)$, and the corresponding total numbers of comments and views of the 8 target websites within the period are $(\mathrm{allcomment}_1, \mathrm{allbrowse}_1), (\mathrm{allcomment}_2, \mathrm{allbrowse}_2), \ldots, (\mathrm{allcomment}_8, \mathrm{allbrowse}_8)$. Note that when a website publishes many reports on the Two Sessions within the time period, the processing and calculation method are the same.
With these data, the time-flow-based dynamic influence of the news event can be calculated. According to the data in the example, the influence of the Two Sessions in the period from 00:00 on 2016.2.25 to 00:00 on 2016.2.26 is:
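$$E_{\mathrm{influence}} = W_{\mathrm{category}} \cdot \sum_{i=1}^{8} \mathrm{rank}_i \cdot \left( 0.4 \cdot \frac{\mathrm{browse}_i}{\mathrm{allbrowse}_i} + 0.6 \cdot \frac{\mathrm{comment}_i}{\mathrm{allcomment}_i} \right).$$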
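To make the arithmetic concrete, here is a worked instance with purely hypothetical figures, using the feature weights given in step S8 (0.4 for page views, 0.6 for comments). Assume, for brevity, that each of the 8 reporting websites has a normalized rank weight of 0.1, that the event accounts for 5% of each website's views, and that it accounts for 10% of each website's comments. None of these numbers comes from the source.

```latex
\begin{align*}
\text{per-site term} &= 0.4 \cdot 0.05 + 0.6 \cdot 0.10 = 0.02 + 0.06 = 0.08 \\
\sum_{i=1}^{8} \mathrm{rank}_i \cdot 0.08 &= 8 \cdot 0.1 \cdot 0.08 = 0.064 \\
E_{\mathrm{influence}} &= W_{\mathrm{category}} \cdot 0.064 = 0.8 \cdot 0.064 = 0.0512
\end{align*}
```

The rank weights of the 8 websites sum to 0.8 rather than 1 in this illustration because the weights are normalized over all 10 target websites, two of which did not report the event.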

Claims (5)

1. A method for calculating the dynamic influence of news events based on time flow, characterized in that the method comprises the following steps:
S1. Determine the target news websites to be crawled;
S2. Obtain the overall ranking of each target website from an authoritative website rating service, store the rankings in a separate table in the database, and use the grade of the target news website as one feature of a news event;
S3. Crawl the news events of the target websites, create corresponding tables for the crawled news events, and store the data in the database;
S4. Retrieve the crawled news events from the database, remove stop words, and clean out information unrelated to the news events;
S5. Number each cleaned news event from each website and cluster identical events, obtaining a list of identical events and the websites that reported each news event;
S6. After obtaining the list of identical events, determine the time period T over which the dynamic influence of news events is calculated;
S7. Determine the category of each news event obtained, and use the category as one feature of the news event;
S8. From the database, obtain for each event in the list of identical events the number of comments, number of views, comment times and browsing times within time period T, together with the total number of comments, number of views, comment times and browsing times of each reporting website within the same period; use the ratio of the number of views of the news event on a website within T to the total number of views of that website within T as one feature of the news event, and the ratio of the number of comments on the news event on a website within T to the total number of comments on that website within T as another feature;
S9. Determine the weight corresponding to each feature of the news event;
S10. Calculate the dynamic influence of the news event from the weights corresponding to the features, using the following formula:
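$$E_{\mathrm{influence}} = C_m \cdot \sum_{i=1}^{n} \mathrm{rank}_i \cdot \left( 0.4 \cdot \frac{\mathrm{browse}_i}{\mathrm{allbrowse}_i} + 0.6 \cdot \frac{\mathrm{comment}_i}{\mathrm{allcomment}_i} \right)$$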
where $n$ is the number of websites; the normalized weights of the $n$ websites are $\mathrm{rank}_1, \mathrm{rank}_2, \ldots, \mathrm{rank}_n$; $C_m$ is the weight corresponding to the event category, where $m$ is one of 1, 2, 3; the chosen time period T is one day; within that day, the comment counts and page views of the $n$ websites for this event are $(\mathrm{comment}_1, \mathrm{browse}_1), (\mathrm{comment}_2, \mathrm{browse}_2), \ldots, (\mathrm{comment}_n, \mathrm{browse}_n)$, and the total comment counts and page views of the corresponding websites within the same period are $(\mathrm{allcomment}_1, \mathrm{allbrowse}_1), (\mathrm{allcomment}_2, \mathrm{allbrowse}_2), \ldots, (\mathrm{allcomment}_n, \mathrm{allbrowse}_n)$.
2. The method for calculating the dynamic influence of news events based on time flow according to claim 1, characterized in that news events are divided into three categories: the first is an event recognized as a public news event; the second is an event that begins as a minor event and, as it develops, gradually becomes one the public cares about greatly; the third is an event that begins as a minor event, does not last long, and has no subsequent escalation.
3. The method for calculating the dynamic influence of news events based on time flow according to claim 1, characterized in that, when crawling the news events of a target website, the content crawled for each news event should include:
a. the time at which the target website reported the news;
b. the number of comments on the news event as reported by the target website, and the time of each comment;
c. the number of views of the news event as reported by the target website, and the time of each view.
4. The method for calculating the dynamic influence of news events based on time flow according to claim 1, characterized in that the steps for clustering all news items into identical events are as follows:
S501: segment the content of each news item into words;
S502: apply an LDA model to the segmented result to obtain the topic distribution of each news item;
S503: use KL divergence to calculate the similarity between news events;
S504: when the KL divergence of the news events is greater than the set threshold, attribute them to the same event.
5. A system for calculating the dynamic influence of news events based on time flow, characterized in that it comprises:
A news event acquisition module, which obtains the news events of the target websites;
A data storage module, which stores the crawled news events and the related results produced during data processing;
A data cleaning module, which retrieves the crawled data from the database and cleans it;
An event clustering module, which clusters identical events among the cleaned news events;
An influence calculation module, which calculates the time-flow-based dynamic influence of the identical news events after clustering.
CN201610625873.0A 2016-08-02 2016-08-02 A kind of method and system of calculating media event dynamic effect power based on time stream Pending CN106156364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610625873.0A CN106156364A (en) 2016-08-02 2016-08-02 A kind of method and system of calculating media event dynamic effect power based on time stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610625873.0A CN106156364A (en) 2016-08-02 2016-08-02 A kind of method and system of calculating media event dynamic effect power based on time stream

Publications (1)

Publication Number Publication Date
CN106156364A true CN106156364A (en) 2016-11-23

Family

ID=57328692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610625873.0A Pending CN106156364A (en) 2016-08-02 2016-08-02 A kind of method and system of calculating media event dynamic effect power based on time stream

Country Status (1)

Country Link
CN (1) CN106156364A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967364A (en) * 2017-12-22 2018-04-27 新华网股份有限公司 Web documents transmissibility appraisal procedure and device
CN110598151A (en) * 2019-09-09 2019-12-20 河南牧业经济学院 Method and system for judging news spreading effect
CN111949847A (en) * 2020-08-14 2020-11-17 中国工商银行股份有限公司 Information evaluation method, information evaluation device, computer system, and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101814171A (en) * 2009-02-24 2010-08-25 李晓萌 Media-oriented network influence index calculation method
CN103077190A (en) * 2012-12-20 2013-05-01 人民搜索网络股份公司 Hot event ranking method based on order learning technology
CN103164427A (en) * 2011-12-13 2013-06-19 中国移动通信集团公司 Method and device of news aggregation
CN104216954A (en) * 2014-08-20 2014-12-17 北京邮电大学 Prediction device and prediction method for state of emergency topic
CN104598450A (en) * 2013-10-30 2015-05-06 北大方正集团有限公司 Popularity analysis method and system of network public opinion event
CN105550365A (en) * 2016-01-15 2016-05-04 中国科学院自动化研究所 Visualization analysis system based on text topic model

Similar Documents

Publication Publication Date Title
CN105260474B (en) A kind of microblog users influence power computational methods based on information exchange network
CN102663101B (en) A kind of user gradation sort algorithm based on Sina's microblogging
CN102279851B (en) Intelligent navigation method, device and system
CN102377790B (en) A kind of method and apparatus of propelling data
CN103617169B (en) A kind of hot microblog topic extracting method based on Hadoop
CN103678613B (en) Method and device for calculating influence data
CN103177090B (en) A kind of topic detection method and device based on big data
CN103617289B (en) Micro-blog recommendation method based on user characteristics and cyberrelationship
CN105279288A (en) Online content recommending method based on deep neural network
CN107066476A (en) A kind of real-time recommendation method based on article similarity
CN106776841A (en) The acquisition methods and system of a kind of internet public feelings event propagation index
CN106528693A (en) Individualized learning-oriented educational resource recommendation method and system
CN104394118A (en) User identity identification method and system
CN102750320B (en) Method, device and system for calculating network video real-time attention
CN106021577B (en) Information pushing method and device and electronic equipment
TW201214167A (en) Matching text sets
CN101819573A (en) Self-adaptive network public opinion identification method
CN105354305A (en) Online-rumor identification method and apparatus
CN103309894B (en) Based on search implementation method and the system of user property
CN104462383A (en) Movie recommendation method based on feedback of users' various behaviors
CN102955813B (en) A kind of information search method and system
CN103324745A (en) Text garbage identifying method and system based on Bayesian model
CN108009220A (en) A kind of method for being detected in network hotspot public sentiment event and positioning abnormal user
CN105260899A (en) Electronic business subject credibility evaluation method and system
CN106156364A (en) A kind of method and system of calculating media event dynamic effect power based on time stream

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161123
