CN106156364A - Method and system for calculating the dynamic influence of news events based on time flow - Google Patents
Method and system for calculating the dynamic influence of news events based on time flow
- Publication number
- CN106156364A (application CN201610625873.0A)
- Authority
- CN
- China
- Prior art keywords
- news event
- event
- website
- time
- comment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention discloses a method and system for calculating the dynamic influence of news events based on time flow. Reports on the same event are obtained from different news websites, together with each website's view count and comment count for the event, the corresponding browsing and comment times, the rank of each website reporting the event, and the category of the event. From these data, the time-flow-based dynamic influence of each event is calculated. Because the required data are easy to obtain, the method is simpler to implement and more efficient to run than other methods of calculating event influence; the chosen data are reasonable, and the calculation is easy to understand.
Description
Technical field
The present invention relates to the Internet field, in particular to news event reporting on the Internet, and specifically to a method and system for calculating the dynamic influence of news events based on time flow.
Background
With the development of the Internet, people encounter many news events every day, but most people care mainly about the events with greater impact. This places high demands on how the influence of a news event is calculated: the calculated influence must correctly reflect the impact of the event.
How to measure the influence of a news event is a problem of broad concern in both academia and industry. For the same news event, different websites use different calculation methods. The simplest is manual labeling: if the annotator believes the event will have a large impact, the event is rated as highly influential, and otherwise as having little influence. This method reflects only one individual's judgment and does not take into account the views of the general public. Many other methods for evaluating event influence exist, but each has shortcomings of one kind or another.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a method and system for calculating the dynamic influence of news events based on time flow, which can calculate the influence of an event from its development trend, so that the result better matches the public's understanding.
The object of the present invention is achieved through the following technical solution: a method for calculating the dynamic influence of news events based on time flow, comprising the following steps:
S1. Determine the target news websites to be crawled;
S2. Obtain the overall ranking of each target website from an authoritative website ranking service, store the rankings in a separate table in the database, and use the rank of the target news website as a feature of the news event;
S3. Crawl the news events of the target websites, create corresponding tables for the crawled news events, and store the data in the database;
S4. Retrieve the crawled news events from the database, remove stop words, and clean out information unrelated to the news events;
S5. Number each cleaned news event of each website and cluster identical events, obtaining a list of identical events and the websites that reported each news event;
S6. After obtaining the list of identical events, determine the time period T over which the dynamic influence of news events is calculated;
S7. Determine the category of each news event obtained, and use the category as a feature of the news event;
S8. From the database, obtain the number of comments, number of views, comment times, and browsing times of each event in the list of identical events within time period T, together with the number of comments, number of views, comment times, and browsing times of each website that reported the event within the same period. Use the ratio of the number of views of the news event on a website within T to the total number of views of that website within T as one feature of the news event, and the ratio of the number of comments on the news event on a website within T to the total number of comments on that website within T as another feature;
S9. Determine the weight corresponding to each feature of the news event;
S10. Calculate the dynamic influence of the news event from the feature weights. The calculation formula is as follows:
[formula not reproduced in this text]
where n is the number of websites; rank_1, rank_2, …, rank_n are the normalized weights of the n websites; C_m is the weight corresponding to the event category, with m being one of 1, 2, 3; and the chosen time period T is one day. Within that day, the comment count and view count of the n websites for the event are (comment_1, browse_1), (comment_2, browse_2), …, (comment_n, browse_n), and the total comment count and view count of the corresponding websites in the same period are (allcomment_1, allbrowse_1), (allcomment_2, allbrowse_2), …, (allcomment_n, allbrowse_n).
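The text states that the website rankings are normalized into weights rank_1 … rank_n but does not say how. The following is a minimal sketch of one possible normalization, assuming that each ranking is a position (1 = best) and that a better position should receive a larger weight; the function name `normalize_ranks` and the reciprocal scoring are illustrative assumptions, not taken from the patent.

```python
def normalize_ranks(rank_positions):
    """Convert ranking positions (1 = best) into weights that sum to 1.

    Assumption (not from the patent): each site is scored as 1/position,
    so a better-ranked site gets a larger weight, and the scores are then
    divided by their sum.
    """
    if not rank_positions:
        raise ValueError("at least one website is required")
    if any(pos <= 0 for pos in rank_positions.values()):
        raise ValueError("rank positions must be positive")
    raw = {site: 1.0 / pos for site, pos in rank_positions.items()}
    total = sum(raw.values())
    return {site: score / total for site, score in raw.items()}


# Hypothetical ranking positions for three target websites.
weights = normalize_ranks({"website_1": 1, "website_2": 2, "website_3": 4})
```

Any other monotone mapping from position to score would fit the description equally well; the only property the text requires is that the resulting weights are normalized.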
News events are divided into three categories. The first is an event already recognized as a public news event. The second starts as a minor event and, as it develops, gradually becomes an event of great public concern. The third starts as a minor event, does not last long, and has no subsequent escalation.
When crawling the news events of a target website, the crawled content for each news event should include:
A. the time at which the target website reported the news;
B. the number of comments on the news event on the target website, and the time of each comment;
C. the number of views of the news event on the target website, and the time of each view.
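The three crawled items above map naturally onto a small record type. The sketch below is one hypothetical layout, assuming that the crawler stores one timestamp per comment and per view so that the counts can be derived from the lists; the class and field names are illustrative and do not appear in the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class CrawledNewsItem:
    """One news report crawled from one target website (hypothetical layout)."""
    site: str                      # which target website published the report
    reported_at: datetime          # item A: time the site reported the news
    comment_times: List[datetime] = field(default_factory=list)  # item B
    view_times: List[datetime] = field(default_factory=list)     # item C

    @property
    def comment_count(self) -> int:
        # The comment count is derived from the stored comment timestamps.
        return len(self.comment_times)

    @property
    def view_count(self) -> int:
        # The view count is derived from the stored view timestamps.
        return len(self.view_times)


# Illustrative values only.
item = CrawledNewsItem(
    site="website_1",
    reported_at=datetime(2016, 2, 25, 8, 0),
    comment_times=[datetime(2016, 2, 25, 9, 0), datetime(2016, 2, 25, 10, 30)],
    view_times=[datetime(2016, 2, 25, 8, 5)],
)
```

Keeping the raw timestamps, and not only the totals, is what makes it possible to count comments and views inside an arbitrary time period T later on.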
The steps for clustering identical events across all news items are as follows:
S501: segment the content of each news item into words;
S502: apply an LDA model to the segmented result to obtain the topic distribution of each news item;
S503: use KL divergence to calculate the similarity between news events;
S504: when the KL divergence between news events is greater than the set threshold, assign them to the same event.
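Steps S503 and S504 can be sketched as follows, assuming that the topic distributions from S502 are already available as lists of probabilities (segmentation and LDA training are not shown). KL divergence is a dissimilarity measure that equals 0 for identical distributions, so this sketch treats two items as the same event when a symmetrized divergence is at or below the threshold. That direction is an assumption that departs from the literal wording of S504, and the symmetrization and the greedy grouping strategy are likewise illustrative choices rather than the patent's stated method.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    # eps guards against log(0) when q has zero-probability topics.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p), so the measure is symmetric."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def group_same_events(topic_dists, threshold):
    """Greedily group items whose divergence from a cluster's first member
    is at most `threshold` (assumed reading of S504; see the note above)."""
    clusters = []  # each cluster is a list of item indices
    for i, dist in enumerate(topic_dists):
        for cluster in clusters:
            representative = topic_dists[cluster[0]]
            if symmetric_kl(dist, representative) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical topic distributions for three news items.
dists = [
    [0.80, 0.10, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.10, 0.80],
]
clusters = group_same_events(dists, threshold=0.1)
```

With these illustrative inputs the first two items, which share a dominant topic, end up in one group and the third forms its own group.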
A system for calculating the dynamic influence of news events based on time flow, comprising:
a news event acquisition module, which obtains the news events of the target websites;
a data storage module, which stores the crawled news events and the intermediate results produced during data processing;
a data cleaning module, which retrieves the crawled data from the database and cleans it;
an event clustering module, which clusters identical events among the cleaned news events;
an influence calculation module, which calculates the time-flow-based dynamic influence of the identical news events obtained by clustering.
The beneficial effects of the invention are as follows: the invention provides a method and system for calculating the dynamic influence of news events based on time flow, which can calculate the influence of an event from its development trend, so that the result better matches the public's understanding. The system can quickly calculate the dynamic influence of events and recommend the currently most influential news events to users, providing strong support for the dissemination of news events.
Brief description of the drawings
Fig. 1 is a flowchart of the method for calculating the dynamic influence of events based on time flow;
Fig. 2 is a framework diagram of the system for calculating the dynamic influence of events based on time flow.
Detailed description of the invention
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following.
A method for calculating the dynamic influence of news events based on time flow, comprising the following steps:
S1. Determine the target news websites to be crawled;
S2. Obtain the overall ranking of each target website from an authoritative website ranking service, store the rankings in a separate table in the database, and use the rank of the target news website as a feature of the news event. Once the rankings of all target websites have been obtained from the authoritative ranking service, normalize all the rankings to determine each website's rank score among the target websites, which is the weight of this feature;
S3. Crawl the news events of the target websites, create corresponding tables for the crawled news events, and store the data in the database;
S4. Retrieve the crawled news events from the database, remove stop words, and clean out information unrelated to the news events, which improves the accuracy of the clustering of identical events;
S5. Assign each cleaned news event of each website a globally unique number and cluster identical events, obtaining a list of identical events and the websites that reported each news event;
S6. After obtaining the list of identical events, determine the time period T over which the dynamic influence of news events is calculated. The time period may differ for different calculation scales and can be chosen according to business needs; in the present invention the time period is 24 hours;
S7. Determine the category of each news event obtained, and use the category as a feature of the news event. Different event categories carry different weights in the final influence calculation. News events are divided into three categories. The first, category C1, is an event already recognized as a public news event, such as the Olympic Games or the Two Sessions; such events are inherently highly influential, so a penalty factor is applied when setting the weight, making the final calculated influence more convincing. The second, category C3, starts as a minor event and, as it develops, gradually becomes an event of great public concern; such events are given a larger weight. The third, category C2, starts as a minor event, does not last long, and has no subsequent escalation. In the present invention, the weight of category C1 is 0.8, the weight of category C2 is 1, and the weight of category C3 is 1.2;
S8. From the database, obtain the number of comments, number of views, comment times, and browsing times of each event in the list of identical events within time period T, together with the number of comments, number of views, comment times, and browsing times of each website that reported the event within the same period. Use the ratio of the number of views of the news event on a website within T to the total number of views of that website within T as one feature of the news event, and the ratio of the number of comments on the news event on a website within T to the total number of comments on that website within T as another feature. In the present invention, the weight of the view-count feature is 0.4 and the weight of the comment-count feature is 0.6;
S9. Determine the weight corresponding to each feature of the news event;
S10. Calculate the dynamic influence of the news event from the feature weights. The calculation formula is as follows:
[formula not reproduced in this text]
where n is the number of websites; rank_1, rank_2, …, rank_n are the normalized weights of the n websites; C_m is the weight corresponding to the event category, with m being one of 1, 2, 3; and the chosen time period T is one day. Within that day, the comment count and view count of the n websites for the event are (comment_1, browse_1), (comment_2, browse_2), …, (comment_n, browse_n), and the total comment count and view count of the corresponding websites in the same period are (allcomment_1, allbrowse_1), (allcomment_2, allbrowse_2), …, (allcomment_n, allbrowse_n).
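The formula itself did not survive in this text, so the following is only a sketch of one combination that is consistent with the variable definitions and weights stated in S7, S8 and S10: a per-site weighted sum of the comment ratio and view ratio, scaled by the site's normalized rank weight, summed over sites, and multiplied by the category weight. The overall shape of the expression is an assumption, not the patent's confirmed formula; the names `SiteStats` and `dynamic_influence` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# Weights stated in the description.
CATEGORY_WEIGHTS = {1: 0.8, 2: 1.0, 3: 1.2}  # C1, C2, C3
VIEW_WEIGHT = 0.4
COMMENT_WEIGHT = 0.6


@dataclass
class SiteStats:
    rank_weight: float   # normalized rank_i of the website
    comments: int        # comment_i: comments on the event within T
    views: int           # browse_i: views of the event within T
    all_comments: int    # allcomment_i: all comments on the site within T
    all_views: int       # allbrowse_i: all views on the site within T


def dynamic_influence(category: int, sites: Iterable[SiteStats]) -> float:
    """Assumed form: C_m * sum_i rank_i * (0.6 * comment ratio + 0.4 * view ratio)."""
    c_m = CATEGORY_WEIGHTS[category]
    total = 0.0
    for s in sites:
        # Guard against sites with no recorded activity in the period.
        comment_ratio = s.comments / s.all_comments if s.all_comments else 0.0
        view_ratio = s.views / s.all_views if s.all_views else 0.0
        total += s.rank_weight * (COMMENT_WEIGHT * comment_ratio
                                  + VIEW_WEIGHT * view_ratio)
    return c_m * total


# Illustrative numbers only; they are not data from the patent.
sites = [
    SiteStats(rank_weight=0.6, comments=50, views=1000,
              all_comments=500, all_views=10000),
    SiteStats(rank_weight=0.4, comments=10, views=200,
              all_comments=1000, all_views=10000),
]
score = dynamic_influence(1, sites)
```

Under this assumed form, a category C1 event is damped by the 0.8 penalty factor, while a category C3 event with the same engagement ratios would score 1.5 times as high.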
When crawling the news events of a target website, the crawled content for each news event should include:
A. the time at which the target website reported the news;
B. the number of comments on the news event on the target website, and the time of each comment;
C. the number of views of the news event on the target website, and the time of each view.
The steps for clustering identical events across all news items are as follows:
S501: segment the content of each news item into words;
S502: apply an LDA model to the segmented result to obtain the topic distribution of each news item;
S503: use KL divergence to calculate the similarity between news events;
S504: when the KL divergence between news events is greater than the set threshold, assign them to the same event.
A system for calculating the dynamic influence of news events based on time flow, comprising:
a news event acquisition module, which obtains the news events of the target websites;
a data storage module, which stores the crawled news events and the intermediate results produced during data processing;
a data cleaning module, which retrieves the crawled data from the database and cleans it;
an event clustering module, which clusters identical events among the cleaned news events;
an influence calculation module, which calculates the time-flow-based dynamic influence of the identical news events obtained by clustering.
The present invention is implemented mainly in three steps, as shown in Fig. 1. In step 1, all current news events are crawled from the different news websites; the crawled data include the time of occurrence of each news event, the number of comments, the number of views, the comment times, and the browsing times.
Once the news event data have been obtained, step 2 extracts identical events from them. The method used is LDA (Latent Dirichlet Allocation), a classic topic-model algorithm in natural language processing, which effectively groups identical news events together and greatly helps the subsequent calculation.
In step 3, the dynamic influence of the event is calculated from the event aggregation result produced in step 2 combined with the weights of the event's features.
The influence of an event is calculated from the previously determined features of the event and their corresponding weights.
The whole flow of the application is illustrated below with a concrete example, together with a specific description of the influence calculation.
Embodiment: calculate the real-time dynamic influence of the 2016 national Two Sessions (NPC and CPPCC). When classifying the event, the Two Sessions are clearly a public news event and a focus of attention for people across the country, so the category is easily determined as a public news event. To obtain a more convincing dynamic influence result, a penalty factor W_category = 0.8 is applied in the calculation. The steps in Fig. 1 are then followed:
Step1: Crawl news events from the target news websites. Crawling starts one week before the Two Sessions are held, that is, from 00:00 on 2016.2.25. Assume there are 10 target websites: website_1, website_2, …, website_9, website_10. The website assessment module in Fig. 2 is used to obtain the rankings of the 10 websites, denoted rank_1, rank_2, …, rank_9, rank_10. The crawler continuously obtains news events from the news websites; each crawled news event carries its number of comments, its number of views, the corresponding comment times and browsing times, and the time of occurrence of the event.
Step2: Extract the news events that occurred on each website between 00:00 on 2016.2.25 and 00:00 on 2016.2.26. LDA is first used to extract topics from the news events of the ten websites; after the topic distribution of each news item has been obtained, KL divergence is used to calculate the similarity between news events, yielding the news reports about the Two Sessions in this period.
Step3: Assume that after step 2, 8 of the 10 websites reported news events about the national Two Sessions during that day: website_1, website_2, …, website_8. The comment counts and view counts of these 8 websites for news about the Two Sessions within that day are (comment_1, browse_1), (comment_2, browse_2), …, (comment_8, browse_8), and the total comment counts and view counts of the 8 target websites within the same period are (allcomment_1, allbrowse_1), (allcomment_2, allbrowse_2), …, (allcomment_8, allbrowse_8). Note that when a website carries many reports about the Two Sessions within the time period, the processing and calculation are the same.
With these data, the time-flow-based dynamic influence of the news event can be calculated. According to the data in the example, the influence of the Two Sessions in the period from 00:00 on 2016.2.25 to 00:00 on 2016.2.26 is:
[formula not reproduced in this text]
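The per-period counts used in Step3 can be derived from the crawled comment and view timestamps. The following sketch counts the timestamps that fall inside a 24-hour period starting at 00:00 on 2016.2.25, as in the example. Treating the period as half-open (start included, end excluded) is an assumption, and the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def count_in_window(timestamps, start, length=timedelta(hours=24)):
    """Count timestamps t with start <= t < start + length.

    The half-open interval is an assumption; the patent only says that the
    time period T is 24 hours.
    """
    end = start + length
    return sum(1 for t in timestamps if start <= t < end)


# Period from the example: 00:00 on 2016.2.25 to 00:00 on 2016.2.26.
window_start = datetime(2016, 2, 25)

# Illustrative comment timestamps for one report; not data from the patent.
comment_times = [
    datetime(2016, 2, 24, 23, 59),  # before the period
    datetime(2016, 2, 25, 0, 0),    # first instant of the period
    datetime(2016, 2, 25, 13, 30),  # inside the period
    datetime(2016, 2, 26, 0, 0),    # first instant of the next period
]
comments_in_period = count_in_window(comment_times, window_start)
```

When a website carries several reports of the same event, the counts for each report can simply be summed, which matches the note that the processing is the same in that case.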
Claims (5)
1. A method for calculating the dynamic influence of news events based on time flow, characterized in that the method comprises the following steps:
S1. Determine the target news websites to be crawled;
S2. Obtain the overall ranking of each target website from an authoritative website ranking service, store the rankings in a separate table in the database, and use the rank of the target news website as a feature of the news event;
S3. Crawl the news events of the target websites, create corresponding tables for the crawled news events, and store the data in the database;
S4. Retrieve the crawled news events from the database, remove stop words, and clean out information unrelated to the news events;
S5. Number each cleaned news event of each website and cluster identical events, obtaining a list of identical events and the websites that reported each news event;
S6. After obtaining the list of identical events, determine the time period T over which the dynamic influence of news events is calculated;
S7. Determine the category of each news event obtained, and use the category as a feature of the news event;
S8. From the database, obtain the number of comments, number of views, comment times, and browsing times of each event in the list of identical events within time period T, together with the number of comments, number of views, comment times, and browsing times of each website that reported the event within the same period; use the ratio of the number of views of the news event on a website within T to the total number of views of that website within T as one feature of the news event, and the ratio of the number of comments on the news event on a website within T to the total number of comments on that website within T as another feature;
S9. Determine the weight corresponding to each feature of the news event;
S10. Calculate the dynamic influence of the news event from the feature weights. The calculation formula is as follows:
[formula not reproduced in this text]
where n is the number of websites; rank_1, rank_2, …, rank_n are the normalized weights of the n websites; C_m is the weight corresponding to the event category, with m being one of 1, 2, 3; and the chosen time period T is one day. Within that day, the comment count and view count of the n websites for the event are (comment_1, browse_1), (comment_2, browse_2), …, (comment_n, browse_n), and the total comment count and view count of the corresponding websites in the same period are (allcomment_1, allbrowse_1), (allcomment_2, allbrowse_2), …, (allcomment_n, allbrowse_n).
2. The method for calculating the dynamic influence of news events based on time flow according to claim 1, characterized in that news events are divided into three categories: the first is an event already recognized as a public news event; the second starts as a minor event and, as it develops, gradually becomes an event of great public concern; the third starts as a minor event, does not last long, and has no subsequent escalation.
3. The method for calculating the dynamic influence of news events based on time flow according to claim 1, characterized in that, when crawling the news events of a target website, the crawled content for each news event should include:
A. the time at which the target website reported the news;
B. the number of comments on the news event on the target website, and the time of each comment;
C. the number of views of the news event on the target website, and the time of each view.
4. The method for calculating the dynamic influence of news events based on time flow according to claim 1, characterized in that the steps for clustering identical events across all news items are as follows:
S501: segment the content of each news item into words;
S502: apply an LDA model to the segmented result to obtain the topic distribution of each news item;
S503: use KL divergence to calculate the similarity between news events;
S504: when the KL divergence between news events is greater than the set threshold, assign them to the same event.
5. A system for calculating the dynamic influence of news events based on time flow, characterized in that it comprises:
a news event acquisition module, which obtains the news events of the target websites;
a data storage module, which stores the crawled news events and the intermediate results produced during data processing;
a data cleaning module, which retrieves the crawled data from the database and cleans it;
an event clustering module, which clusters identical events among the cleaned news events;
an influence calculation module, which calculates the time-flow-based dynamic influence of the identical news events obtained by clustering.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610625873.0A CN106156364A (en) | 2016-08-02 | 2016-08-02 | Method and system for calculating the dynamic influence of news events based on time flow |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610625873.0A CN106156364A (en) | 2016-08-02 | 2016-08-02 | Method and system for calculating the dynamic influence of news events based on time flow |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106156364A true CN106156364A (en) | 2016-11-23 |
Family
ID=57328692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610625873.0A Pending CN106156364A (en) | 2016-08-02 | 2016-08-02 | Method and system for calculating the dynamic influence of news events based on time flow |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106156364A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107967364A (en) * | 2017-12-22 | 2018-04-27 | 新华网股份有限公司 | Web documents transmissibility appraisal procedure and device |
CN110598151A (en) * | 2019-09-09 | 2019-12-20 | 河南牧业经济学院 | Method and system for judging news spreading effect |
CN111949847A (en) * | 2020-08-14 | 2020-11-17 | 中国工商银行股份有限公司 | Information evaluation method, information evaluation device, computer system, and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101814171A (en) * | 2009-02-24 | 2010-08-25 | 李晓萌 | Media-oriented network influence index calculation method |
CN103077190A (en) * | 2012-12-20 | 2013-05-01 | 人民搜索网络股份公司 | Hot event ranking method based on order learning technology |
CN103164427A (en) * | 2011-12-13 | 2013-06-19 | 中国移动通信集团公司 | Method and device of news aggregation |
CN104216954A (en) * | 2014-08-20 | 2014-12-17 | 北京邮电大学 | Prediction device and prediction method for state of emergency topic |
CN104598450A (en) * | 2013-10-30 | 2015-05-06 | 北大方正集团有限公司 | Popularity analysis method and system of network public opinion event |
CN105550365A (en) * | 2016-01-15 | 2016-05-04 | 中国科学院自动化研究所 | Visualization analysis system based on text topic model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105260474B (en) | A kind of microblog users influence power computational methods based on information exchange network | |
CN102663101B (en) | A kind of user gradation sort algorithm based on Sina's microblogging | |
CN102279851B (en) | Intelligent navigation method, device and system | |
CN102377790B (en) | A kind of method and apparatus of propelling data | |
CN103617169B (en) | A kind of hot microblog topic extracting method based on Hadoop | |
CN103678613B (en) | Method and device for calculating influence data | |
CN103177090B (en) | A kind of topic detection method and device based on big data | |
CN103617289B (en) | Micro-blog recommendation method based on user characteristics and cyberrelationship | |
CN105279288A (en) | Online content recommending method based on deep neural network | |
CN107066476A (en) | A kind of real-time recommendation method based on article similarity | |
CN106776841A (en) | The acquisition methods and system of a kind of internet public feelings event propagation index | |
CN106528693A (en) | Individualized learning-oriented educational resource recommendation method and system | |
CN104394118A (en) | User identity identification method and system | |
CN102750320B (en) | Method, device and system for calculating network video real-time attention | |
CN106021577B (en) | Information pushing method and device and electronic equipment | |
TW201214167A (en) | Matching text sets | |
CN101819573A (en) | Self-adaptive network public opinion identification method | |
CN105354305A (en) | Online-rumor identification method and apparatus | |
CN103309894B (en) | Based on search implementation method and the system of user property | |
CN104462383A (en) | Movie recommendation method based on feedback of users' various behaviors | |
CN102955813B (en) | A kind of information search method and system | |
CN103324745A (en) | Text garbage identifying method and system based on Bayesian model | |
CN108009220A (en) | A kind of method for being detected in network hotspot public sentiment event and positioning abnormal user | |
CN105260899A (en) | Electronic business subject credibility evaluation method and system | |
CN106156364A (en) | A kind of method and system of calculating media event dynamic effect power based on time stream |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20161123 |