CN109947935A - The generation method and device of media event - Google Patents
- Publication number
- CN109947935A (application number CN201810938137.XA)
- Authority
- CN
- China
- Prior art keywords
- news
- event
- media event
- subevent
- history
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide a method and a device for generating a media event, comprising: obtaining news data collected within a preset time period, the news data including the news content of each collected news item; clustering the collected news items according to their news content, where the news items clustered into one class constitute a news subevent, and a news subevent describes the news items of the same class among those collected within the preset time period; for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, the event similarity being determined from the event keyword features of the history media event and of the news subevent; and merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event. The application reduces the clustering workload and shortens the time needed to generate a media event, thereby ensuring the timeliness of media events.
Description
Technical field
This application relates to the field of information processing, and in particular to a method and a device for generating a media event.
Background
With the rapid development of Internet technology, the network has become the main channel through which media outlets publish news and users obtain it. Because the Internet is open, the news published on websites can be numerous and disordered, and news items describing the same media event may be scattered across different websites, which makes it hard for users to gain an in-depth understanding. To help users understand a media event in depth, the news items under the same media event are generally integrated and presented to users as a special topic.
In the prior art, each time a media event needs to be generated for some news item, the news data published over a long period before and after that news item must be obtained, and all of the collected news data is then cluster-analyzed to generate the media event. Every generation of a media event therefore requires clustering a large amount of data. Because the volume of news data is large, the clustering computation is heavy and time-consuming, so the timeliness of media events cannot be guaranteed.
Therefore, a technical solution is needed that speeds up the generation of media events and thereby ensures their timeliness.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method and a device for generating a media event, so as to solve the prior-art problem that generating a media event requires processing a large amount of data and takes a long time, so that the timeliness of media events cannot be guaranteed.
To solve the above technical problem, the embodiments of the present application are implemented as follows.
An embodiment of the present application provides a method for generating a media event, comprising:
obtaining news data collected within a preset time period, where the news data includes the news content of each collected news item;
clustering the collected news items according to the news content of each news item, where the news items clustered into one class constitute a news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined from the event keyword features of the history media event and of the news subevent;
merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
An embodiment of the present application also provides an apparatus for generating a media event, comprising:
an obtaining module, configured to obtain news data collected within a preset time period, where the news data includes the news content of each collected news item;
a clustering module, configured to cluster the collected news items according to the news content of each news item, where the news items clustered into one class constitute a news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
a searching module, configured to search, for each news subevent, a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined from the event keyword features of the history media event and of the news subevent;
a merging module, configured to merge the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
An embodiment of the present application also provides a device for generating a media event, comprising:
a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to implement the following process:
obtaining news data collected within a preset time period, where the news data includes the news content of each collected news item;
clustering the collected news items according to the news content of each news item, where the news items clustered into one class constitute a news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined from the event keyword features of the history media event and of the news subevent;
merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
An embodiment of the present application also provides a storage medium for storing computer-executable instructions which, when executed, implement the following process:
obtaining news data collected within a preset time period, where the news data includes the news content of each collected news item;
clustering the collected news items according to the news content of each news item, where the news items clustered into one class constitute a news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined from the event keyword features of the history media event and of the news subevent;
merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
The method and device for generating a media event provided by the embodiments of the present application cluster the news collected within a period of time to generate the individual news subevents of that period. Each news subevent is then merged with the history media event whose event similarity with it reaches the set threshold, and the merged media event is taken as the generated media event. Because media events are merged continuously, each generation only needs to cluster the news that newly appeared within a period of time, which reduces the clustering workload, shortens the time needed to generate a media event, and thereby ensures the timeliness of media events.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first flowchart of the method for generating a media event provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a specific structure of the media event generation server provided by an embodiment of the present application;
Fig. 3 is a schematic flow diagram of the media event generation method provided by an embodiment of the present application;
Fig. 4 is a second flowchart of the method for generating a media event provided by an embodiment of the present application;
Fig. 5 is a module diagram of the apparatus for generating a media event provided by an embodiment of the present application;
Fig. 6 is a structural diagram of the device for generating a media event provided by an embodiment of the present application.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
The idea of the embodiments of the present application is to cluster, once every period of time, the news collected within that period, obtaining the individual news subevents of the period, and then to merge each news subevent obtained by clustering with the history media event whose event similarity with that news subevent reaches a preset threshold, thereby obtaining a media event. Because media events are merged continuously, each generation only needs to cluster the news that newly appeared within a period of time. This avoids clustering the large volume of news from a long time span, reduces the number of news items to be clustered, and so reduces the time spent on clustering and ensures the timeliness of media events. On this basis, the embodiments of the present application provide a method and a device for generating a media event, which are described in detail below.
It should be noted that multiple news items of the same class constitute a media event. When the number of news items in a media event reaches a preset threshold, the media event can be published by the relevant website as a news special topic.
An embodiment of the present application provides a method for generating a media event. The method can be applied on the server side; that is, the executing subject of the method may be a server, and specifically may be a media event generating apparatus deployed on the server.
Fig. 1 is a first flowchart of the method for generating a media event provided by an embodiment of the present application. The method shown in Fig. 1 includes at least the following steps:
Step 102: obtain the news data collected within a preset time period, where the news data includes the news content of each collected news item.
In the embodiments of the present application, a web crawler can continuously crawl, from each news website, the news data of the news most recently published by that website. The news data of a crawled news item may include its news content, publication time, publishing website address, publishing medium, news type, and similar information. Specifically, the news content may include the headline and the body text.
In a specific implementation, the news crawler can crawl from each news website, at intervals of a set duration, the information on the news that the website published during that interval. For example, every 2 minutes the news crawler can crawl from each news website the information on the news published by that website in those 2 minutes. The length of the crawl interval can be set according to the actual application scenario; the embodiments of the present application do not limit its specific value.
In some embodiments, the news crawler can also crawl the information on each news website's newly published news in real time.
In the embodiments of the present application, if the news crawler crawls the news websites at a high frequency, each crawl may return only a few news items, and it is unnecessary to run a clustering operation after every crawl. To save resources, clustering can therefore be performed after the crawled news has accumulated for a period of time.
In a specific implementation, the news data crawled each time can be written into a cache and read back from the cache when a clustering operation needs to be performed.
In addition, the crawled news data can be written into the cache through a message queue (MQ), which decouples the services and smooths peak loads.
It should be noted that in some cases the news data crawled from the news websites may contain missing or invalid data. To ensure the accuracy of the news data, the crawled news data can be cleaned before it is written into the cache, and only the cleaned news data is then written into the cache.
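The buffering and cleaning described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the application: `NewsItem`, `NewsBuffer`, and the cleaning rule in `is_valid` are hypothetical names and assumptions, and Python's in-process `queue.Queue` merely stands in for a real message queue.

```python
import queue
from dataclasses import dataclass


@dataclass
class NewsItem:
    # Hypothetical record for one crawled news item.
    title: str
    body: str
    published_at: str
    url: str


def is_valid(item):
    """Assumed cleaning rule: drop items whose title, body, or URL is missing."""
    return bool(item.title.strip()) and bool(item.body.strip()) and bool(item.url.strip())


class NewsBuffer:
    def __init__(self):
        self._mq = queue.Queue()  # in-process stand-in for the message queue (MQ)
        self._cache = []          # stand-in for the cache read by the clustering job

    def publish(self, item):
        """Called by the crawler for every crawled news item."""
        self._mq.put(item)

    def flush_to_cache(self):
        """Move queued items into the cache, cleaning them on the way."""
        while True:
            try:
                item = self._mq.get_nowait()
            except queue.Empty:
                break
            if is_valid(item):
                self._cache.append(item)

    def drain(self):
        """Return the accumulated batch and reset the cache (once per preset period)."""
        self.flush_to_cache()
        batch, self._cache = self._cache, []
        return batch
```

A clustering job would call `drain()` once per preset time period and cluster only the returned batch.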
Therefore, in the embodiments of the present application, the media event generating apparatus can obtain, once every preset time period, the news data collected within that period. The preset time period can be of any length, such as half an hour, one hour, or two hours. Its specific value can be set according to the actual application scenario and is not limited by the embodiments of the present application.
Step 104: cluster the collected news items according to the news content of each news item, where the news items clustered into one class constitute a news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period.
In the embodiments of the present application, a clustering algorithm can be used to cluster the collected news. Specifically, a hierarchical clustering algorithm can be used; for example, the centroid linkage clustering algorithm can be used to cluster the collected news.
Clustering news with the centroid linkage clustering algorithm includes the following process:
First, news items of the same type are selected from the collected news, and each news item is placed in its own cluster, so that each cluster contains exactly one news item. The distance between each cluster and every other cluster is calculated, and clusters whose distance meets a preset condition are merged. Then the distances between each newly generated cluster and all existing clusters are recalculated, and clusters whose distance meets the preset condition are merged again, until none of the calculated distances meets the preset condition. In this way, news items of the same type are clustered into individual news subevents.
Here, the news type refers to categories such as entertainment news, political news, and social news; news of the same type is news whose type is identical. In the clustering process above, the distance between one cluster and another can be the distance between the cluster centers.
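The clustering loop above can be sketched as follows. This is an illustrative variant rather than the exact procedure of the application: it assumes each news item of one type has already been turned into a numeric feature vector, it uses Euclidean distance between centroids, it merges the single closest pair per round, and it treats "distance not greater than `max_distance`" as the preset condition. All function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def centroid_linkage(vectors, max_distance):
    """Return clusters as sorted lists of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]  # one news item per cluster
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            ci = centroid([vectors[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([vectors[k] for k in clusters[j]])
                d = euclidean(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:  # no remaining pair meets the preset condition
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters
```

Each returned cluster would correspond to one news subevent. Recomputing every centroid in each round keeps the sketch short; a production version would cache them.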
Specifically, a news subevent describes the news items of the same class among those collected within the preset time period. For example, if the news items of one class all relate to the news subevent "The U.S. government announced global safeguard measures on imported large washing machines and photovoltaic products, lasting four years and three years respectively, with tariffs of up to 30% and 50% respectively", then that news subevent describes this class of news among the collected news.
Step 106: for each news subevent, search the history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined from the event keyword features of the history media event and of the news subevent.
In step 106, searching the history media event set, for each news subevent, for a history media event whose event similarity with the news subevent reaches the preset threshold includes at least the following steps (1), (2), and (3):
Step (1): extract the keywords of each news subevent;
Step (2): for each news subevent, screen out from the history media event set the history media event subclass whose events share a keyword with the news subevent;
Step (3): search the screened-out history media event subclass for a history media event whose event similarity with the news subevent reaches the preset threshold.
The keywords in step (1) are representative words that characterize the news subevent. For example, if the news subevent is "On March 10, 2018, the US President announced at the White House that the United States will impose a 25% tariff on imported steel products and a 10% tariff on imported aluminium products", the corresponding keywords may be "US President", "U.S.", "import", "tariff", "steel", "aluminium products", and so on.
It should be noted that, in the embodiments of the present application, a keyword extraction algorithm can be used to extract the keywords of a news subevent; for example, the term frequency-inverse document frequency (TF-IDF) algorithm can be used. The extraction process is as follows:
First, the news content of the news subevent is split into words, and the frequency with which each word occurs, i.e. the term frequency (TF), is calculated by Formula 1. Then the inverse document frequency (IDF) of each word is calculated by Formula 2. Finally, the TF-IDF value of each word is calculated by Formula 3.
$$\mathrm{TF} = \frac{\text{number of occurrences of the word in the news subevent}}{\text{total number of words in the news subevent}} \qquad \text{(Formula 1)}$$
$$\mathrm{IDF} = \log\frac{\text{total number of news subevents}}{\text{number of news subevents containing the word} + 1} \qquad \text{(Formula 2)}$$
$$\text{TF-IDF} = \mathrm{TF} \times \mathrm{IDF} \qquad \text{(Formula 3)}$$
In the embodiments of the present application, when the TF-IDF value of each word is calculated, all news items under a news subevent can be treated as one document. Thus, in Formula 1, the number of occurrences of a word is its number of occurrences in all news items under the news subevent. In Formula 2, the total number of news subevents is the number of news subevents generated by the clustering in step 104.
After the TF-IDF value of each word in the news subevent has been calculated, a preset number of words can be chosen as the keywords of the news subevent in descending order of TF-IDF value; alternatively, the words whose TF-IDF value exceeds a preset value can be taken as the keywords of the news subevent.
For ease of understanding, an example follows.
For example, in a specific embodiment, clustering the news collected within the preset time period yields three news subevents: news subevent 1, news subevent 2, and news subevent 3. News subevent 1 includes news 1, news 2, and news 3; news subevent 2 includes news 4 and news 5; and news subevent 3 includes news 6, news 7, and news 8.
When the keywords of the news subevents are calculated, news 1, news 2, and news 3 can be merged into one news document, denoted document 1 of news subevent 1; news 4 and news 5 can be merged into one news document, denoted document 2 of news subevent 2; and news 6, news 7, and news 8 can be merged into one news document, denoted document 3 of news subevent 3.
Thus, when the term frequency of a word in news subevent 1 is calculated, the ratio of the word's number of occurrences in document 1 to the total number of words in document 1 is taken as its term frequency. The inverse document frequency of the word is then calculated, and finally the product of the term frequency and the inverse document frequency is taken as the TF-IDF value of the word.
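Formulas 1 to 3, with each news subevent treated as one merged document, can be sketched as below. This is a minimal illustration under stated assumptions: the news text is assumed to be tokenized already, the natural logarithm is assumed for Formula 2, and the names `tf_idf` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: dict mapping subevent id -> list of tokens from all its news items.

    Returns a dict mapping subevent id -> {word: TF-IDF value}.
    """
    total = len(documents)  # total number of news subevents
    doc_freq = Counter()    # number of subevents containing each word
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)
        n = len(tokens)
        scores[doc_id] = {
            # Formula 1 times Formula 2, i.e. Formula 3
            word: (count / n) * math.log(total / (doc_freq[word] + 1))
            for word, count in counts.items()
        }
    return scores


def top_keywords(word_scores, k):
    """Pick the k words with the largest TF-IDF values (ties broken alphabetically)."""
    ranked = sorted(word_scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:k]]
```

Note that with the "+ 1" in the denominator of Formula 2, a word that occurs in every news subevent receives a negative IDF, so it naturally ranks below words specific to one subevent.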
Specifically, for step (2), a history news library is established in advance, which stores the mapping between each history media event and its keywords. When storing this mapping, the history news library can use the unique identifier (identification, ID) of a history media event to represent the event. One such mapping between history media events and keywords is shown in Table 1.
Of course, Table 1 is only an illustrative example and does not limit the number of history media events stored in the history news library, their IDs, or the corresponding keywords.
Table 1
| Media event ID | Keywords |
| --- | --- |
| 5669865 | U.S., US President, import, tariff, ... |
| 7628921 | China, Wenchuan, earthquake, ... |
| 8559872 | World Cup, football, ... |
In a specific embodiment, after the keywords of each news subevent have been extracted in step (1), the keywords of each news subevent are compared with the keywords of each history media event in the history news library to determine whether the library contains history media events that share a keyword with the news subevent.
Then, for each news subevent, the history media events that share a keyword with it are screened out to form a set, called the history media event subclass. It should be noted that each news subevent corresponds to one history media event subclass.
In addition, in the embodiments of the present application, a history media event is considered to have the same keyword as a news subevent as long as the two share at least one keyword.
For example, suppose the keywords of a history media event in the history news library are "U.S., US President, tariff", and the keywords of one of the news subevents generated by clustering are "U.S., ZTE, ban". The history media event and the news subevent share the keyword "U.S.", so the history media event is considered to have the same keyword as the news subevent.
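The keyword screening of step (2) can be sketched with an inverted index from keyword to history media event IDs. This is only one possible way to realize the comparison; the application does not prescribe a data structure, and `build_keyword_index` and `candidate_events` are hypothetical names.

```python
from collections import defaultdict


def build_keyword_index(history_keywords):
    """history_keywords: dict mapping history event ID -> iterable of keywords.

    Returns a dict mapping keyword -> set of history event IDs.
    """
    index = defaultdict(set)
    for event_id, keywords in history_keywords.items():
        for keyword in keywords:
            index[keyword].add(event_id)
    return index


def candidate_events(subevent_keywords, index):
    """IDs of history events sharing at least one keyword with the news subevent."""
    candidates = set()
    for keyword in subevent_keywords:
        candidates |= index.get(keyword, set())
    return candidates
```

With the entries of Table 1, a news subevent whose keywords are "U.S.", "ZTE", and "ban" would yield only event 5669865 as a candidate, because a single shared keyword is enough.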
In the embodiments of the present application, when the history media event set is searched for a history media event whose event similarity with a news subevent reaches the preset threshold, the history media event subclass that shares a keyword with the news subevent is first screened out from the history media event set, and the history media event matching the news subevent is then searched for within that subclass. With this two-step search, the history media events are first coarsely screened by keyword, and only the screened-out subclass is searched for a history media event whose event similarity with the news subevent reaches the preset threshold. This reduces the workload of finding the matching history media event and improves search efficiency.
Specifically, in step (3), searching the screened-out history media event subclass for a history media event whose event similarity with the news subevent reaches the preset threshold includes the following process:
Calculate the event similarity value between the news subevent and each history media event in the history media event subclass; then compare each event similarity value with the preset threshold to find the history media event whose event similarity value reaches the preset threshold.
Specifically, cosine similarity, the SimHash algorithm, or the longest common subsequence (LCS) algorithm can be used to calculate the event similarity value between the news subevent and each history media event in the history media event subclass.
The following takes cosine similarity as an example to describe in detail how the event similarity value between a news subevent and a history media event is calculated.
First, the term frequency of each keyword of the news subevent and of each keyword of the history media event is determined. The keywords of the news subevent and the keywords of the history media event are combined into a keyword set. The vector of the news subevent, denoted vector a, is determined from the term frequencies in the news subevent of the keywords in that set; the vector of the history media event, denoted vector b, is determined from the term frequencies in the history media event of the keywords in that set. Vector a is the keyword feature of the news subevent, and vector b is the keyword feature of the history media event. Finally, the similarity value between the news subevent and the history media event is calculated from vector a and vector b by Formula 4, the cosine of the angle between the two vectors:
$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert} = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^2}\,\sqrt{\sum_{i} b_i^2}} \qquad \text{(Formula 4)}$$
In Formula 4, cos θ denotes the similarity value between the news subevent and the history media event, and θ denotes the angle between vector a and vector b.
It should be noted that the higher the calculated event similarity value between two events, the more likely it is that the two events describe the same event.
For ease of understanding, an example follows.
For example, in a specific embodiment, the news subevent is denoted event A and the history media event is denoted event B. The keywords of event A are "U.S.", "taxation", "washing machine", and "solar energy", and the keywords of event B are "U.S.", "taxation", "steel products", and "aluminium products". The keyword set of event A and event B is therefore [U.S., taxation, washing machine, solar energy, steel products, aluminium products]. In event A, the term frequencies of "U.S.", "taxation", "washing machine", "solar energy", "steel products", and "aluminium products" are m1, m2, m3, m4, m5, and m6 respectively; in event B, the term frequencies of the same six keywords are n1, n2, n3, n4, n5, and n6 respectively. The vector a of event A is therefore [m1, m2, m3, m4, m5, m6], and the vector b of event B is [n1, n2, n3, n4, n5, n6]. Finally, vector a and vector b are substituted into Formula 4 to calculate the similarity value between event A and event B.
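The cosine similarity calculation in this example can be sketched as follows. It is a minimal illustration: the function name is hypothetical, term frequencies are passed as dictionaries, and a keyword absent from one event is assumed to have term frequency 0 there, which is a simplification of the m5, m6, n3, n4 entries above.

```python
import math


def cosine_similarity(freq_a, freq_b):
    """freq_a, freq_b: dicts mapping keyword -> term frequency in each event."""
    vocabulary = sorted(set(freq_a) | set(freq_b))      # the combined keyword set
    a = [freq_a.get(word, 0.0) for word in vocabulary]  # vector a
    b = [freq_b.get(word, 0.0) for word in vocabulary]  # vector b
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an empty keyword feature
    return dot / (norm_a * norm_b)


# Made-up term frequencies for event A and event B, for illustration only.
event_a = {"U.S.": 0.4, "taxation": 0.3, "washing machine": 0.2, "solar energy": 0.1}
event_b = {"U.S.": 0.4, "taxation": 0.3, "steel products": 0.2, "aluminium products": 0.1}
similarity = cosine_similarity(event_a, event_b)
```

Because term frequencies are non-negative, the result always lies between 0 and 1, which makes it straightforward to compare against a preset threshold.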
In a specific implementation, after the event similarity values between the news subevent and each history media event in the history media event subclass have been calculated, each event similarity value is compared with the preset threshold. If an event similarity value is greater than or equal to the preset threshold, the history media event corresponding to that value is taken as the history media event found.
In some cases, multiple event similarity values may be greater than or equal to the preset threshold. In that case, the history media event with the largest event similarity value can be selected as the history media event found.
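The threshold comparison and the choice of the largest value can be sketched as below. The function name is hypothetical, and returning `None` when no value reaches the threshold is an assumption of this sketch; how that case is handled is outside this passage.

```python
def find_matching_event(similarities, threshold):
    """similarities: dict mapping history event ID -> event similarity value.

    Returns the ID with the largest similarity that is >= threshold, or None.
    """
    qualified = {
        event_id: value
        for event_id, value in similarities.items()
        if value >= threshold
    }
    if not qualified:
        return None
    return max(qualified, key=qualified.get)
```

For instance, with similarity values 0.5, 0.9, and 0.85 for three candidate events and a threshold of 0.8, two events qualify and the one with 0.9 is selected.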
When the news collected within the preset time period is clustered in step 104, multiple news subevents may be produced, and the history media event subclass of each news subevent contains one or more history media events. If the event similarity values between each news subevent and each history media event in its subclass are calculated serially, the calculation takes a long time. To shorten this process, the embodiments of the present application can calculate the event similarity values with multiple threads running in parallel.
Therefore, calculating the event similarity value between the news subevent and each history media event in the history media event subclass includes the following process:
The event similarity values between each news subevent and each history media event in its history media event subclass are calculated by multiple similarity calculation threads, where the similarity calculation threads execute in parallel.
For example, clustering yields news subevent A and news subevent B. The history media event subset screened out as sharing keywords with news subevent A includes history media event 1, history media event 2 and history media event 3, and the history media event subset screened out as sharing keywords with news subevent B includes history media event 4 and history media event 5. To determine the history media event whose event similarity with news subevent A reaches the preset threshold, the event similarity values between news subevent A and each of history media events 1, 2 and 3 must be calculated. To determine the history media event whose event similarity with news subevent B reaches the preset threshold, the event similarity values between news subevent B and each of history media events 4 and 5 must be calculated.
If a single thread is used, the 5 event similarity values must be calculated serially. If multiple similarity calculation threads are used, the threads calculate in parallel, with each similarity calculation thread calculating one event similarity value. In that case the number of similarity calculation threads is 5, denoted similarity calculation thread 1, similarity calculation thread 2, similarity calculation thread 3, similarity calculation thread 4 and similarity calculation thread 5. When the event similarity values are calculated, similarity calculation thread 1 can calculate the event similarity value between news subevent A and history media event 1, similarity calculation thread 2 can calculate the event similarity value between news subevent A and history media event 2, and so on. Only 1/5 of the original time is then needed to complete the calculation of the event similarity values, which greatly shortens the time consumed in calculating event similarity.
It should be noted that each similarity calculation thread may randomly read a news subevent and a history media event whose event similarity value has not yet been calculated and perform the calculation; alternatively, it may read such news subevents and history media events in sequence according to a set rule.
In the embodiment of the present application, the event similarity values between the news subevents and each history media event in the history media event subset are calculated in parallel by multiple similarity calculation threads. This greatly shortens the time consumed by the step of calculating event similarity values and, in turn, the time needed to generate media events, further ensuring the real-time performance of the media events.
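The parallel calculation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function names, the dictionary layouts and the use of Jaccard overlap of keyword sets as the event similarity measure are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def event_similarity(sub_keywords, hist_keywords):
    """Jaccard overlap of two keyword sets (an assumed measure, for illustration only)."""
    a, b = set(sub_keywords), set(hist_keywords)
    return len(a & b) / len(a | b) if (a or b) else 0.0


def compute_similarities(subevents, history, candidates, max_workers=5):
    """Score every (news subevent, candidate history media event) pair in parallel.

    subevents:  subevent ID -> keywords
    history:    history media event ID -> keywords
    candidates: subevent ID -> IDs of the history media events sharing a keyword
    """
    pairs = [(s, h) for s, hs in candidates.items() for h in hs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each pair is handed to a worker thread; results come back in input order.
        scores = pool.map(
            lambda p: event_similarity(subevents[p[0]], history[p[1]]), pairs
        )
        return dict(zip(pairs, scores))
```

With subevents A and B and history media events 1 to 5 as in the example above, the function returns five values, one per pair. Note that in standard CPython builds threads do not execute Python bytecode in parallel, so a CPU-bound similarity function may need a process pool to obtain a real speed-up.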
Step 108, the news belonging to the news subevent is merged into the history media event found, obtaining the merged media event.
Specifically, in step 108 above, merging the news belonging to the news subevent into the history media event found actually means adding the news belonging to the news subevent under the history media event found.
For example, in a specific embodiment, the news subevent includes three news items, namely news 1, news 2 and news 3, and a certain history media event includes four news items, namely news 4, news 5, news 6 and news 7. If the event similarity between the news subevent and the history media event is determined to reach the set threshold, news 1, news 2 and news 3 are merged under the history media event, and the merged media event includes seven news items in total: news 1, news 2, news 3, news 4, news 5, news 6 and news 7.
In addition, in the embodiment of the present application, after the news belonging to the news subevent is merged into the history media event found, the news corresponding to the merged media event has changed. To keep the stored mapping relations between media events and their corresponding news accurate, after step 108 is executed, the method provided by the embodiment of the present application further includes:
Updating the mapping relations between the merged media event and the news belonging to that media event.
Continuing the example above, before merging, the mapping relations between the history media event ID and the four news items news 4, news 5, news 6 and news 7 are stored. After the news corresponding to the news subevent is merged into the history media event, news 1, news 2 and news 3 also belong to the history media event. The mapping relations between the history media event ID and the news belonging to the history media event therefore need to be updated; after the update, what is stored is the mapping relations between the history media event ID and news 1, news 2, news 3, news 4, news 5, news 6 and news 7.
Specifically, the mapping relations between a media event and the news belonging to it can be written to the event buffer and stored in key-value (Key-Value) form, where the Key can be the ID of the media event and the Value can be the IDs of the news corresponding to the media event.
In the embodiment of the present application, as newly generated news subevents are continuously merged with history media events, the news corresponding to each media event keeps growing. Continuously updating the mapping relations between media events and news improves the accuracy of the stored mapping relations between media events and news.
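The merge and the mapping update can be sketched with an in-memory dictionary standing in for the event buffer. The function name `merge_subevent`, the string IDs and the duplicate check are assumptions for illustration; the source only specifies that the Key is the media event ID and the Value is the IDs of its news.

```python
def merge_subevent(event_cache, history_event_id, subevent_news_ids):
    """Add the subevent's news IDs under the history media event, skipping duplicates."""
    news_ids = event_cache.setdefault(history_event_id, [])
    for news_id in subevent_news_ids:
        if news_id not in news_ids:
            news_ids.append(news_id)
    return news_ids


# Key: media event ID; Value: IDs of the news belonging to that media event.
event_cache = {"history_event": ["news4", "news5", "news6", "news7"]}
merge_subevent(event_cache, "history_event", ["news1", "news2", "news3"])
```

After the call, the entry for the history media event holds all seven news IDs, matching the example above.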
In addition, in the embodiment of the present application, to facilitate subsequent merging between history media events and newly generated news subevents, after step 108 is executed, the mapping relations between history media event IDs and keywords stored in the history news library also need to be updated.
Specifically, when the mapping relations between the history media event ID and keywords in the history news library are updated, the keywords shared by the news subevent and the history media event can be used as the keywords of the merged media event.
For ease of understanding, the above process is illustrated with an example below.
For example, when the collected news is clustered, one news subevent obtained is the news subevent related to the U.S. government announcing that it will "apply global safeguard measures of 4 years and 3 years to imported large washing machines and photovoltaic products respectively, and impose tariffs with maximum rates of 30% and 50% respectively"; the event ID of this news subevent is 4569865.
The representative news constituting this news subevent is shown in Table 2.
The keywords corresponding to the above news subevent are extracted; the extracted keywords are "U.S.", "trade", "taxation", "import", "washing machine" and "solar energy". No history media event whose event similarity value with this news subevent reaches the preset threshold is found in the history media event set. Therefore, the news subevent can be treated as a newly added media event, and the mapping relations between the ID of the news subevent and its keywords are stored in the history news library. One possible storage form is as follows:
U.S. ------ 4569865
Trade ------ 4569865
Taxation ------ 4569865
Import ------ 4569865
Washing machine ------ 4569865
Solar energy ------ 4569865
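The storage form above amounts to an inverted index from keyword to media event ID, which is also what makes it cheap to screen out the history media events that share a keyword with a new news subevent. A minimal sketch follows, assuming an in-memory dictionary in place of the history news library; `index_event` and `candidate_events` are hypothetical names.

```python
from collections import defaultdict


def index_event(keyword_index, event_id, keywords):
    """Store one keyword -> event ID mapping per keyword."""
    for keyword in keywords:
        keyword_index[keyword].add(event_id)


def candidate_events(keyword_index, keywords):
    """Return the IDs of all stored events that share at least one keyword."""
    found = set()
    for keyword in keywords:
        found |= keyword_index.get(keyword, set())
    return found


keyword_index = defaultdict(set)
index_event(
    keyword_index,
    4569865,
    ["U.S.", "trade", "taxation", "import", "washing machine", "solar energy"],
)
```

Looking up the keywords of a later subevent, for instance `["trade war", "U.S.", "taxation", "import", "steel", "aluminium product"]`, returns event 4569865 as a candidate, because "U.S.", "taxation" and "import" are shared.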
Subsequently, when newly collected news is clustered, one news subevent obtained is the news subevent related to the U.S. President announcing at the White House that "the U.S. will impose a 25% tariff on imported steel products and a 10% tariff on imported aluminium products"; the event ID of this news subevent is 5789123.
Table 2
The representative news constituting this news subevent is shown in Table 3.
The keywords corresponding to the above news subevent are extracted; the extracted keywords are "trade war", "U.S.", "taxation", "import", "steel" and "aluminium product". In the history media event set, the event similarity between this news subevent and the history media event whose ID is 4569865 is found to reach the preset threshold. Therefore, the news subevent can be merged with the history media event whose ID is 4569865.
The ID of the media event obtained after merging is denoted 4569865, and the news under this media event includes news D1, news D2, news D3, news D4, news D5, news D6, news D7, news D8 and news D9. The keywords corresponding to the merged media event are "U.S.", "trade war", "US President", "import" and "taxation". The mapping relations between this media event and its keywords stored in the history news library are updated, and the updated mapping relations are as follows:
U.S. ------ 4569865
Trade war ------ 4569865
US President ------ 4569865
Import ------ 4569865
Taxation ------ 4569865
Table 3
In addition, in the embodiment of the present application, after the merged media event is obtained, the merged media event also needs to be processed further, so that it is published as Special Topics in Journalism once the news under it meets the publication condition of Special Topics in Journalism. Therefore, after step 108 is executed, the method provided by the embodiment of the present application further includes the following step 1, step 2 and step 3:
Step 1: judge whether the merged media event meets the condition for publication as Special Topics in Journalism; if so, execute step 2;
Step 2: determine the event information of the merged media event, where the event information includes the identification information of the merged media event, the hot-value information of the merged media event, and the representative news of the merged media event;
Step 3: send the merged media event and the corresponding event information to the News demand side.
It should be noted that, in step 1 above, judging whether the merged media event meets the condition for publication as Special Topics in Journalism mainly means checking whether the number of news items corresponding to the merged media event reaches a preset quantity value. This preset quantity value can be the minimum number of news items required for Special Topics in Journalism.
Specifically, in step 2 above, the identification information of the media event includes the uniform resource locator (Uniform Resource Locator, URL) of the media event, the title of the media event, the version number of the media event, and similar information.
In addition, the event information further includes the start time of the media event.
Specifically, when the representative news of the merged media event is determined, the similarity value between the media event and each news item can be calculated, and a news item whose similarity value is greater than or equal to a preset similarity value is chosen as the representative news of the media event. If multiple similarity values are greater than or equal to the preset similarity value, the news item with the highest similarity value is chosen as the representative news of the media event. If no news item has a similarity value greater than or equal to the preset similarity value, a news item that occurred recently and has a higher media score is chosen as the representative news of the media event.
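The selection rule for the representative news can be sketched as follows. The field names `similarity`, `published_at` and `media_score` are hypothetical, and the fallback ordering (most recent first, with the media score breaking ties) is one assumed reading of "occurred recently and has a higher media score"; the source does not say how the two criteria are combined.

```python
def pick_representative(news_items, min_similarity):
    """Return the ID of the representative news item of a media event.

    news_items: non-empty list of dicts with the keys
    'id', 'similarity', 'published_at' and 'media_score'.
    """
    qualified = [n for n in news_items if n["similarity"] >= min_similarity]
    if qualified:
        # Several items may qualify; the most similar one wins.
        return max(qualified, key=lambda n: n["similarity"])["id"]
    # Assumed fallback: newest item first, media score as the tie-breaker.
    return max(news_items, key=lambda n: (n["published_at"], n["media_score"]))["id"]
```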
The hot value of a media event can be calculated from the comment count, forward count, read count and similar figures of the media event. Specifically, weights corresponding to the comment count, forward count and read count can be preset; the products of the comment count, forward count and read count with their respective weights are then calculated, and the sum of these three products is taken as the hot value of the media event.
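The weighted sum described above is small enough to show directly. The default weights below are arbitrary placeholders chosen for illustration; the source only states that the weights are preset.

```python
def hot_value(comments, forwards, reads, weights=(0.5, 0.3, 0.2)):
    """Sum of each count multiplied by its preset weight."""
    w_comment, w_forward, w_read = weights
    return comments * w_comment + forwards * w_forward + reads * w_read
```

For instance, with integer weights `(3, 2, 1)`, a media event with 10 comments, 20 forwards and 100 reads has a hot value of 3*10 + 2*20 + 1*100 = 170.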
The above News demand side generally refers to websites that can publish news.
Specifically, in the embodiment of the present application, the clustering of news is generally performed by the cluster service set on the server; merging the news corresponding to a news subevent into the history media event whose event similarity with it reaches the preset threshold is usually performed by the event merging module set on the server; the mapping between a media event and the news belonging to it is typically stored in the event buffer set on the server; and the event analysis service set on the server analyzes the event to determine the event information of the merged media event and sends the generated Special Topics in Journalism to the demand side (a news publishing application) for publication.
Fig. 2 shows a schematic structural diagram of a media event generation server used to execute the generation method of the media event in the embodiment of the present application. The server shown in Fig. 2 includes a cluster service 201, an event merging module 202, an event buffer 203 and an event analysis service 204.
Specifically, the news crawler sends the news data crawled from each news website to the cluster service 201 through MQ. The cluster service 201 clusters the news accumulated over a period of time, and the news gathered into one class constitutes one news subevent. The cluster service 201 then sends the news subevents obtained by clustering to the event merging module 202, and the event merging module 202 matches each news subevent with the history media event whose event similarity with it reaches the preset threshold, obtaining the merged media event. The event merging module 202 writes the merged media event and its corresponding news into the event buffer 203, so that the event buffer 203 stores the mapping relations between the merged media event and its corresponding news. In addition, the event merging module 202 sends the news subevent newly added to the merged media event to the event analysis service 204 through MQ, so that the event analysis service 204 analyzes the newly added news subevent and, after the entire media event meets the publication condition of Special Topics in Journalism, sends the media event to each demand side.
In addition, in the embodiment of the present application, when generating media events, the server generally executes each step through the sub-threads set on the server. Fig. 3 shows a schematic flow diagram of generating a media event through the sub-threads set on the media event generation server in the embodiment of the present application.
In Fig. 3, the page distribution pipeline collects news data from each news website and sends the collected news data to the page cache, which stores the news data. The page distribution sub-thread then sends the news data stored in the page cache to the page clustering sub-thread, which clusters the received news and generates news subevents. The page clustering sub-thread sends the generated news subevents to the event merging sub-thread, which merges each received news subevent with the history media event whose event similarity with it reaches the preset threshold, obtaining the merged media event. On the one hand, the event merging sub-thread sends the merged media event to the event analysis module through the event sending sub-thread; the event analysis module judges whether the merged media event can be published as Special Topics in Journalism and, after determining that it can, determines the event information of the media event and then sends the media event and the event information to the News demand side for publication. On the other hand, the event merging sub-thread writes the merged media event into the event buffer, and the cleaning sub-thread cleans up the expired events in the event buffer.
In the event buffer, media events and their corresponding news are stored in Key-Value form; the Key can be the media event ID, and the Value can be the IDs of the news corresponding to the media event.
Specifically, in the embodiment of the present application, a media event is considered an expired event when the time span between its generation time and the current time reaches a set length. The generation time here refers to the time at which the media event was generated.
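The expiry rule applied by the cleaning sub-thread can be sketched as follows. This is an illustrative assumption about the data layout: the event buffer and the generation times are kept in two plain dictionaries keyed by media event ID, and times are expressed in seconds.

```python
import time


def purge_expired(event_cache, created_at, max_age_seconds, now=None):
    """Remove media events whose age has reached max_age_seconds.

    event_cache: media event ID -> news IDs
    created_at:  media event ID -> generation time (seconds since the epoch)
    Returns the IDs of the events that were removed.
    """
    now = time.time() if now is None else now
    expired = [eid for eid, ts in created_at.items() if now - ts >= max_age_seconds]
    for eid in expired:
        event_cache.pop(eid, None)
        del created_at[eid]
    return expired
```

Passing `now` explicitly keeps the function deterministic, which is convenient when the cleaning logic is exercised outside a running server.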
Fig. 4 shows a second flow chart of the generation method of a media event provided by the embodiment of the present application. The method shown in Fig. 4 includes at least the following steps:
Step 402, the news crawler crawls news data from each news website and sends the crawled news data to the cluster service.
The news data includes the news content of each news item, the type to which the news belongs, the link to the web page on which the news was published, the publishing medium of the news, and so on.
Step 404, the cluster service clusters the news accumulated over a preset duration; the news gathered into one class constitutes one news subevent.
A news subevent is used to describe the news that belongs to the same class among the crawled news.
Step 406, the keywords corresponding to each news subevent are extracted.
Specifically, the keywords corresponding to a news subevent can be extracted by the TF-IDF algorithm.
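A minimal TF-IDF keyword extractor is sketched below to make this step concrete. It assumes the text has already been tokenized (Chinese news would first need a word segmenter, which is outside this sketch), and it uses a smoothed IDF; the function name, the `top_k` parameter and the smoothing are illustrative choices, not details given in the source.

```python
import math
from collections import Counter


def tfidf_keywords(target_tokens, corpus_tokens, top_k=5):
    """Rank the tokens of one news subevent by TF-IDF and return the top_k terms.

    target_tokens: tokens of the subevent's news content
    corpus_tokens: list of token lists used as the reference corpus
    """
    term_counts = Counter(target_tokens)
    n_docs = len(corpus_tokens)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus_tokens if term in doc)
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1  # smoothed IDF
        scores[term] = (count / len(target_tokens)) * idf
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]
```

Terms that are frequent in the subevent but rare in the reference corpus score highest, so common function words fall to the bottom of the ranking.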
Step 408, for each news subevent, the history media event subset that shares keywords with the news subevent is screened out of the history media event set.
Step 410, the event similarity value between the news subevent and each history media event in the history media event subset is calculated.
Step 412, the news subevent is merged with the history media event whose event similarity value is greater than or equal to the preset threshold.
If multiple similarity values are greater than or equal to the preset threshold, the history media event with the largest similarity value among them can be determined as the history media event that matches the news subevent.
In step 412, merging the news subevent with the matching history media event can be understood as adding the news corresponding to the news subevent under the history media event.
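The matching rule of step 412 can be sketched as a small function. The name `best_match` and the dictionary input are assumptions for illustration; returning `None` corresponds to the case described earlier in which no history media event reaches the threshold and the news subevent is treated as a newly added media event.

```python
def best_match(similarities, threshold):
    """Pick the history media event to merge into, or None if none qualifies.

    similarities: history media event ID -> event similarity value
    """
    qualified = {eid: s for eid, s in similarities.items() if s >= threshold}
    if not qualified:
        return None  # no match: the subevent becomes a new media event
    # Several events may reach the threshold; the most similar one is chosen.
    return max(qualified, key=qualified.get)
```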
Step 414, the merged media event and the news belonging to it are written to the event buffer.
Specifically, media events and their corresponding news are stored in the event buffer in Key-Value form; the Key can be the media event ID, and the Value can be the IDs of the news corresponding to the media event.
Step 416, judge whether the merged media event meets the condition for publication as Special Topics in Journalism; if so, execute step 418; otherwise, end.
Step 418, determine the event information corresponding to the merged media event, and send the merged media event and the corresponding event information to the News demand side.
The specific implementation of each step in the embodiment corresponding to Fig. 4 is the same as that of the corresponding step in the embodiments corresponding to Fig. 1 to Fig. 3. For the specific implementation of each step in the embodiment corresponding to Fig. 4, reference can therefore be made to the embodiments corresponding to Fig. 1 to Fig. 3, and details are not repeated here.
The generation method of a media event provided by the embodiment of the present application clusters the news collected within a period of time to generate the news subevents of that period; each news subevent is then merged with the history media event whose event similarity with it reaches the set threshold, and the merged media event is taken as the generated media event. Because media events are merged continuously, each generation of media events only needs to cluster the news that newly occurred within a period of time. This reduces the clustering workload and shortens the time needed to generate media events, thereby ensuring the real-time performance of the media events.
Corresponding to the above method, the embodiment of the present application also provides a generating means of a media event, used to execute the generation method of a media event provided by the embodiment of the present application. Fig. 5 is a schematic diagram of the module composition of the generating means of a media event provided by the embodiment of the present application. The device shown in Fig. 5 includes:
An obtaining module 501, used to obtain the news data collected in a preset time period, where the news data includes the news content corresponding to each collected news item;
A cluster module 502, used to cluster the collected news according to the news content corresponding to each news item, where the news gathered into one class constitutes one news subevent, and a news subevent is used to describe the news that belongs to the same class among the news collected in the preset time period;
A searching module 503, used to search, for each news subevent, the history media event set for the history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined based on the event keyword features of the history media event and the news subevent;
A merging module 504, used to merge the news belonging to the news subevent into the history media event found, obtaining the merged media event.
Optionally, the searching module 503 includes:
An extraction unit, used to extract the keywords corresponding to each news subevent;
A screening unit, used to screen out of the history media event set, for each news subevent, the history media event subset that shares keywords with the news subevent;
A first searching unit, used to search the screened-out history media event subset for the history media event whose event similarity with the news subevent reaches the preset threshold.
Optionally, the first searching unit is specifically used to:
Calculate the event similarity value between the news subevent and each history media event in the history media event subset, and compare each event similarity value with the preset threshold to find the history media event corresponding to the event similarity value that reaches the preset threshold.
Optionally, the first searching unit is also specifically used to:
Calculate, by multiple similarity calculation threads, the event similarity value between each news subevent and each history media event in the history media event subset, where the multiple similarity calculation threads execute in parallel.
Optionally, the device provided by the embodiment of the present application further includes:
A judgment module, used to judge whether the merged media event meets the condition for publication as Special Topics in Journalism;
A determining module, used to determine the event information of the merged media event if the merged media event meets the condition for publication as Special Topics in Journalism, where the event information includes the identification information of the merged media event, the hot-value information of the merged media event, and the representative news of the merged media event;
A sending module, used to send the merged media event and the corresponding event information to the News demand side.
Optionally, the device provided by the embodiment of the present application further includes:
An update module, used to update the mapping relations between the merged media event and the news belonging to that media event.
The generating means of a media event provided by the embodiment of the present application clusters the news collected within a period of time to generate the news subevents of that period; each news subevent is then merged with the history media event whose event similarity with it reaches the set threshold, and the merged media event is taken as the generated media event. Because media events are merged continuously, each generation of media events only needs to cluster the news that newly occurred within a period of time. This reduces the clustering workload and shortens the time needed to generate media events, thereby ensuring the real-time performance of the media events.
Further, based on the above generation method of a media event, the embodiment of the present application also provides a generating device of a media event. Fig. 6 is a structural schematic diagram of the generating device of a media event provided by the embodiment of the present application.
As shown in Fig. 6, the generating device of a media event may differ considerably depending on its configuration or performance. It can include one or more processors 601 and a memory 602, and the memory 602 can store one or more application programs or data. The memory 602 can be transient storage or persistent storage. An application program stored in the memory 602 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the generating device of a media event. Further, the processor 601 can be set to communicate with the memory 602 and to execute, on the generating device of a media event, the series of computer-executable instructions in the memory 602. The generating device of a media event can also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input/output interfaces 605, one or more keyboards 606, and so on.
In a specific embodiment, the generating device of a media event includes a processor, a memory, and a computer program that is stored on the memory and can run on the processor. When executed by the processor, the computer program implements each process of the above embodiment of the generation method of a media event, specifically including the following steps:
Obtaining the news data collected in a preset time period, where the news data includes the news content corresponding to each collected news item;
Clustering the collected news according to the news content corresponding to each news item, where the news gathered into one class constitutes one news subevent, and a news subevent is used to describe the news that belongs to the same class among the news collected in the preset time period;
For each news subevent, searching the history media event set for the history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined based on the event keyword features of the history media event and the news subevent;
Merging the news belonging to the news subevent into the history media event found, obtaining the merged media event.
Optionally, when the computer-executable instructions are executed, searching the history media event set, for each news subevent, for the history media event whose event similarity with the news subevent reaches the preset threshold includes:
Extracting the keywords corresponding to each news subevent;
For each news subevent, screening out of the history media event set the history media event subset that shares keywords with the news subevent;
Searching the screened-out history media event subset for the history media event whose event similarity with the news subevent reaches the preset threshold.
Optionally, when the computer-executable instructions are executed, searching the screened-out history media event subset for the history media event whose event similarity with the news subevent reaches the preset threshold includes:
Calculating the event similarity value between the news subevent and each history media event in the history media event subset;
Comparing each event similarity value with the preset threshold to find the history media event corresponding to the event similarity value that reaches the preset threshold.
Optionally, when the computer-executable instructions are executed, calculating the event similarity value between the news subevent and each history media event in the history media event subset includes:
Calculating, by multiple similarity calculation threads, the event similarity value between each news subevent and each history media event in the history media event subset, where the multiple similarity calculation threads execute in parallel.
Optionally, when the computer-executable instructions are executed, after the news belonging to the news subevent is merged into the history media event found and the merged media event is obtained, the method further includes:
Judging whether the merged media event meets the condition for publication as Special Topics in Journalism;
If so, determining the event information of the merged media event, where the event information includes the identification information of the merged media event, the hot-value information of the merged media event, and the representative news of the merged media event;
Sending the merged media event and the corresponding event information to the News demand side.
Optionally, when the computer-executable instructions are executed, after the news belonging to the news subevent is merged into the history media event found and the merged media event is obtained, the method further includes:
Updating the mapping relations between the merged media event and the news belonging to that media event.
The generating device of a media event provided by the embodiment of the present application clusters the news collected within a period of time to generate the news subevents of that period; each news subevent is then merged with the history media event whose event similarity with it reaches the set threshold, and the merged media event is taken as the generated media event. Because media events are merged continuously, each generation of media events only needs to cluster the news that newly occurred within a period of time. This reduces the clustering workload and shortens the time needed to generate media events, thereby ensuring the real-time performance of the media events.
Further, based on the above generation method of a media event, the embodiment of the present application also provides a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium can be a USB flash disk, a CD, a hard disk, or the like. When the computer-executable instructions stored on the storage medium are executed by a processor, the following process, applied between a main program and a target program, can be implemented:
Obtain the news data collected in preset time period;It wherein, include each news collected in above-mentioned news data
Corresponding news content;
The news of collection is clustered according to news content corresponding to above-mentioned each news;Wherein, it is a kind of for gathering
News constitutes a news subevent;Belong to same class in the news that the news subevent is used to describe to collect in preset time period
News;
For each above-mentioned news subevent, from the thing searched in history media event set between the news subevent
Part similarity reaches the history media event of preset threshold;Wherein, above-mentioned event similarity be based on above-mentioned history media event and
The event keyword feature of the news subevent is determined;
The news for belonging to the news subevent is incorporated into the history media event found, the news thing after being merged
Part.
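The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `Event`, `jaccard`, `cluster` and `generate_events` are hypothetical, the event similarity is approximated by Jaccard overlap of word sets, the clustering is a simple greedy single-pass scheme, and keeping an unmatched subevent as a new event is an assumption made only to keep the example complete.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A media event or news subevent: news ids plus keyword features."""
    news_ids: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Keyword overlap in [0, 1]; an assumed stand-in for the event similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(news: dict, threshold: float) -> list:
    """Greedy single-pass clustering (assumed): a news item joins the first
    subevent that is similar enough, otherwise it starts a new subevent."""
    subevents = []
    for news_id, text in news.items():
        words = set(text.lower().split())
        for sub in subevents:
            if jaccard(words, sub.keywords) >= threshold:
                sub.news_ids.add(news_id)
                sub.keywords |= words
                break
        else:
            subevents.append(Event({news_id}, words))
    return subevents


def generate_events(news: dict, history: list, threshold: float = 0.3) -> list:
    """Cluster the newly collected news, then merge each subevent into the
    first historical event whose similarity reaches the threshold."""
    for sub in cluster(news, threshold):
        match = next((h for h in history
                      if jaccard(sub.keywords, h.keywords) >= threshold), None)
        if match is None:
            history.append(sub)  # assumption: an unmatched subevent becomes a new event
        else:
            match.news_ids |= sub.news_ids
            match.keywords |= sub.keywords
    return history
```

Only the news in `news` (the current time window) is clustered; the historical events are touched only by the cheaper lookup and merge, which is the source of the reduced clustering workload described in the text.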
Optionally, when the computer-executable instructions stored on the storage medium are executed by the processor, the searching, for each news subevent, of the historical media event set for a historical media event whose event similarity to the news subevent reaches the preset threshold includes:
Extracting the keywords corresponding to each news subevent;
For each news subevent, screening out of the historical media event set the subset of historical media events that share a keyword with the news subevent;
Searching the screened-out subset of historical media events for a historical media event whose event similarity to the news subevent reaches the preset threshold.
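The screening step above can be sketched with an inverted index from keywords to historical events, so that the similarity is later computed only for events sharing at least one keyword with the subevent. The function names and the dictionary layout are assumptions made for illustration; the application does not prescribe a data structure.

```python
from collections import defaultdict


def build_keyword_index(history: dict) -> dict:
    """Map each keyword to the ids of the historical events that contain it.

    `history` maps a hypothetical event id to that event's keyword set.
    """
    index = defaultdict(set)
    for event_id, keywords in history.items():
        for kw in keywords:
            index[kw].add(event_id)
    return index


def screen_candidates(sub_keywords: set, index: dict) -> set:
    """Return the ids of historical events sharing at least one keyword."""
    candidates = set()
    for kw in sub_keywords:
        candidates |= index.get(kw, set())
    return candidates
```

With such an index, the cost of screening depends on the number of keywords of the subevent rather than on the size of the whole historical event set.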
Optionally, when the computer-executable instructions stored on the storage medium are executed by the processor, the searching of the screened-out subset of historical media events for a historical media event whose event similarity to the news subevent reaches the preset threshold includes:
Separately calculating the event similarity value between the news subevent and each historical media event in the subset of historical media events;
Determining the historical media event corresponding to an event similarity value that meets a preset condition as the historical media event that matches the news subevent.
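As an illustration of these two steps, the sketch below computes a similarity value for every candidate and then applies a preset condition. Both choices are assumptions: cosine similarity over keyword weights stands in for the event similarity, and the preset condition is taken to be "the highest value, provided it reaches the threshold". The names `cosine` and `best_match` are hypothetical.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two keyword-weight dictionaries."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def best_match(sub: dict, candidates: dict, threshold: float):
    """Return the id of the best-scoring candidate that reaches the
    threshold, or None when no candidate qualifies."""
    scores = {eid: cosine(sub, feats) for eid, feats in candidates.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Other preset conditions, such as accepting every candidate at or above the threshold, would only change the last two lines of `best_match`.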
Optionally, when the computer-executable instructions stored on the storage medium are executed by the processor, the separately calculating of the event similarity value between the news subevent and each historical media event in the subset of historical media events includes:
Calculating, by multiple similarity calculation threads, the event similarity value between each news subevent and each historical media event in the subset of historical media events, wherein the multiple similarity calculation threads execute in parallel.
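A thread-based version of the similarity calculation could look like the following sketch, which uses the standard-library `concurrent.futures.ThreadPoolExecutor`. The function names, the Jaccard measure and the default of four workers are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def jaccard(a: set, b: set) -> float:
    """Keyword overlap in [0, 1]; an assumed stand-in for the event similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def parallel_similarities(sub_keywords: set, candidates: dict, workers: int = 4) -> dict:
    """Compute the similarity between one subevent and every candidate
    historical event, distributing the work over a pool of threads."""
    ids = list(candidates)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so ids and values stay aligned.
        values = pool.map(lambda eid: jaccard(sub_keywords, candidates[eid]), ids)
        return dict(zip(ids, values))
```

In standard CPython builds, the global interpreter lock limits how much pure-Python, CPU-bound work speeds up under threads, so a process pool or a native similarity routine may be a better fit when the computation itself is the bottleneck.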
Optionally, when the computer-executable instructions stored on the storage medium are executed by the processor, after the news belonging to the news subevent is merged into the found historical media event to obtain the merged media event, the method further includes:
Judging whether the merged media event meets the condition for publication as a news special topic;
If so, determining the event information of the merged media event, wherein the event information includes the identification information of the merged media event, the popularity information of the merged media event, and the representative news of the merged media event;
Sending the merged media event and the corresponding event information to the news requesting party.
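The judgment and event-information steps can be sketched as below. The text here does not define the publication condition, the popularity measure, or how the representative news is chosen, so every concrete choice is an assumption: a minimum number of news items as the condition, the sum of a hypothetical `views` field as popularity, and the most-viewed item as the representative news.

```python
def build_event_info(event_id: str, news: list, min_news: int = 3):
    """Return the event information if the merged event qualifies for
    publication as a special topic, otherwise None.

    Each element of `news` is a dict with hypothetical keys "id" and "views".
    """
    if len(news) < min_news:  # assumed publication condition
        return None
    representative = max(news, key=lambda n: n["views"])  # assumed selection rule
    return {
        "event_id": event_id,
        "popularity": sum(n["views"] for n in news),  # assumed popularity measure
        "representative_news": representative["id"],
    }
```

The returned dictionary corresponds to the three pieces of event information named in the text and would be sent to the requesting party together with the merged event.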
Optionally, when the computer-executable instructions stored on the storage medium are executed by the processor, after the news belonging to the news subevent is merged into the found historical media event to obtain the merged media event, the method further includes:
Updating the mapping relationship between the merged media event and the news belonging to that media event.
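One possible way to keep this mapping relationship is a pair of dictionaries, one from event to news and one from news to event, updated together after each merge. The structure and the name `update_mapping` are assumptions for illustration; the application only requires that the mapping be updated.

```python
def update_mapping(event_to_news: dict, news_to_event: dict,
                   event_id: str, merged_news_ids: set) -> None:
    """Record that the given news items now belong to the merged event."""
    event_to_news.setdefault(event_id, set()).update(merged_news_ids)
    for news_id in merged_news_ids:
        news_to_event[news_id] = event_id
```

Keeping both directions allows looking up all news of an event as well as the event a given news item belongs to without scanning.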
When executed by a processor, the computer-executable instructions stored on the storage medium provided by the embodiments of the present application cluster the news collected within a period of time to generate the individual news subevents of that period. Each news subevent is then merged with the historical media event whose event similarity to that news subevent reaches the set threshold, and the merged media event is taken as the generated media event. Because media events are merged continuously, each time a media event is generated only the news newly arising within a period of time needs to be clustered. This reduces the clustering workload and shortens the time needed to generate a media event, thereby ensuring the timeliness of media events.
All the embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
The above is merely an embodiment of the present application and is not intended to limit it. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of its claims.
Claims (12)
1. A method for generating a media event, characterized by comprising:
Obtaining news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
Clustering the collected news according to the news content corresponding to each news item, wherein the news clustered into one class constitutes one news subevent, and the news subevent describes the news of the same class among the news collected within the preset time period;
For each news subevent, searching a historical media event set for a historical media event whose event similarity to the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the historical media event and of the news subevent;
Merging the news belonging to the news subevent into the found historical media event to obtain a merged media event.
2. The method according to claim 1, characterized in that the searching, for each news subevent, of the historical media event set for a historical media event whose event similarity to the news subevent reaches the preset threshold comprises:
Extracting the keywords corresponding to each news subevent;
For each news subevent, screening out of the historical media event set the subset of historical media events that share a keyword with the news subevent;
Searching the screened-out subset of historical media events for a historical media event whose event similarity to the news subevent reaches the preset threshold.
3. The method according to claim 2, characterized in that the searching of the screened-out subset of historical media events for a historical media event whose event similarity to the news subevent reaches the preset threshold comprises:
Separately calculating the event similarity value between the news subevent and each historical media event in the subset of historical media events;
Comparing each event similarity value with the preset threshold to find the historical media event corresponding to an event similarity value that reaches the preset threshold.
4. The method according to claim 3, characterized in that the separately calculating of the event similarity value between the news subevent and each historical media event in the subset of historical media events comprises:
Calculating, by multiple similarity calculation threads, the event similarity value between each news subevent and each historical media event in the subset of historical media events, wherein the multiple similarity calculation threads execute in parallel.
5. The method according to any one of claims 1-4, characterized in that, after the news belonging to the news subevent is merged into the found historical media event to obtain the merged media event, the method further comprises:
Judging whether the merged media event meets the condition for publication as a news special topic;
If so, determining the event information of the merged media event, wherein the event information includes the identification information of the merged media event, the popularity information of the merged media event, and the representative news of the merged media event;
Sending the merged media event and the corresponding event information to the news requesting party.
6. The method according to any one of claims 1-4, characterized in that, after the news belonging to the news subevent is merged into the found historical media event to obtain the merged media event, the method further comprises:
Updating the mapping relationship between the merged media event and the news belonging to that media event.
7. A device for generating a media event, characterized by comprising:
An obtaining module, configured to obtain news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
A clustering module, configured to cluster the collected news according to the news content corresponding to each news item, wherein the news clustered into one class constitutes one news subevent, and the news subevent describes the news of the same class among the news collected within the preset time period;
A searching module, configured to search, for each news subevent, a historical media event set for a historical media event whose event similarity to the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the historical media event and of the news subevent;
A merging module, configured to merge the news belonging to the news subevent into the found historical media event to obtain a merged media event.
8. The device according to claim 7, characterized in that the searching module comprises:
An extraction unit, configured to extract the keywords corresponding to each news subevent;
A screening unit, configured to screen, for each news subevent, out of the historical media event set the subset of historical media events that share a keyword with the news subevent;
A first searching unit, configured to search the screened-out subset of historical media events for a historical media event whose event similarity to the news subevent reaches the preset threshold.
9. The device according to claim 8, characterized in that the searching unit is specifically configured to:
Separately calculate the event similarity value between the news subevent and each historical media event in the subset of historical media events; and compare each event similarity value with the preset threshold to find the historical media event corresponding to an event similarity value that reaches the preset threshold.
10. The device according to claim 9, characterized in that the searching unit is further specifically configured to:
Calculate, by multiple similarity calculation threads, the event similarity value between each news subevent and each historical media event in the subset of historical media events, wherein the multiple similarity calculation threads execute in parallel.
11. The device according to any one of claims 7-10, characterized in that the device further comprises:
A judging module, configured to judge whether the merged media event meets the condition for publication as a news special topic;
A determining module, configured to determine the event information of the merged media event if the merged media event meets the condition for publication as a news special topic, wherein the event information includes the identification information of the merged media event, the popularity information of the merged media event, and the representative news of the merged media event;
A sending module, configured to send the merged media event and the corresponding event information to the news requesting party.
12. The device according to any one of claims 7-10, characterized in that the device further comprises:
An update module, configured to update the mapping relationship between the merged media event and the news belonging to that media event.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810938137.XA CN109947935A (en) | 2018-08-17 | 2018-08-17 | The generation method and device of media event |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109947935A (en) | 2019-06-28 |
Family
ID=67005801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810938137.XA Withdrawn CN109947935A (en) | 2018-08-17 | 2018-08-17 | The generation method and device of media event |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109947935A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110990705A (en) * | 2019-12-06 | 2020-04-10 | 腾讯科技(深圳)有限公司 | News processing method, device, equipment and medium |
CN111460289A (en) * | 2020-03-27 | 2020-07-28 | 北京百度网讯科技有限公司 | News information pushing method and device |
CN112307366A (en) * | 2020-10-30 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Information display method and device and computer storage medium |
CN113420153A (en) * | 2021-08-23 | 2021-09-21 | 人民网科技(北京)有限公司 | Topic making method, device and equipment based on topic library and event library |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101174273A (en) * | 2007-12-04 | 2008-05-07 | 清华大学 | News event detecting method based on metadata analysis |
CN103020251A (en) * | 2012-12-20 | 2013-04-03 | 人民搜索网络股份公司 | Automatic mining system and method of news events in large-scale data |
CN105787095A (en) * | 2016-03-16 | 2016-07-20 | 广州索答信息科技有限公司 | Automatic generation method and device for internet news |
CN106021418A (en) * | 2016-05-13 | 2016-10-12 | 北京奇虎科技有限公司 | News event clustering method and device |
CN107832444A (en) * | 2017-11-21 | 2018-03-23 | 北京百度网讯科技有限公司 | Event based on search daily record finds method and device |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110990705A (en) * | 2019-12-06 | 2020-04-10 | 腾讯科技(深圳)有限公司 | News processing method, device, equipment and medium |
CN110990705B (en) * | 2019-12-06 | 2024-04-12 | 深圳市雅阅科技有限公司 | News processing method, device, equipment and medium |
CN111460289A (en) * | 2020-03-27 | 2020-07-28 | 北京百度网讯科技有限公司 | News information pushing method and device |
CN111460289B (en) * | 2020-03-27 | 2024-03-29 | 北京百度网讯科技有限公司 | News information pushing method and device |
CN112307366A (en) * | 2020-10-30 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Information display method and device and computer storage medium |
CN112307366B (en) * | 2020-10-30 | 2023-09-19 | 抖音视界有限公司 | Information display method and device and computer storage medium |
CN113420153A (en) * | 2021-08-23 | 2021-09-21 | 人民网科技(北京)有限公司 | Topic making method, device and equipment based on topic library and event library |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104182389B (en) | A kind of big data analyzing business intelligence service system based on semanteme | |
Heymann et al. | Can social bookmarking improve web search? | |
CN109947935A (en) | The generation method and device of media event | |
CN103488680A (en) | Combinators to build a search engine | |
CN106383887A (en) | Environment-friendly news data acquisition and recommendation display method and system | |
CN105718590A (en) | Multi-tenant oriented SaaS public opinion monitoring system and method | |
CN104182482B (en) | A kind of news list page determination methods and the method for screening news list page | |
Gossen et al. | iCrawl: Improving the freshness of web collections by integrating social web and focused web crawling | |
Rousseau | A view on big data and its relation to Informetrics | |
Yao et al. | Provenance-based indexing support in micro-blog platforms | |
CN102253939A (en) | Searching method and system based on cloud computing technology | |
Gupta et al. | A review on search engine optimization: Basics | |
CN106021418A (en) | News event clustering method and device | |
Lee et al. | An automatic topic ranking approach for event detection on microblogging messages | |
CN104298669A (en) | Person geographic information mining model based on social network | |
Zobaed et al. | Big Data in the Cloud. | |
Khodaei et al. | Temporal-textual retrieval: Time and keyword search in web documents | |
Kumar et al. | Design of a mobile Web crawler for hidden Web | |
US9092338B1 (en) | Multi-level caching event lookup | |
Hurst et al. | Social streams blog crawler | |
Huang et al. | Design a batched information retrieval system based on a concept-lattice-like structure | |
Di Crescenzo et al. | HERMEVENT: a news collection for emerging-event detection | |
Han et al. | A real-time knowledge extracting system from social big data using distributed architecture | |
Mokbel et al. | Microblogs data management systems: querying, analysis, and visualization | |
Wang et al. | Efficiently identify local frequent keyword co-occurrence patterns in geo-tagged Twitter stream |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20190628 |