CN109947935A - The generation method and device of media event - Google Patents

The generation method and device of media event

Publication number
CN109947935A
Authority
CN
China
Prior art keywords
news
event
media event
subevent
history
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810938137.XA
Other languages
Chinese (zh)
Inventor
刘利超
石涛
李涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kylin Seing Network Technology Ltd By Share Ltd
Original Assignee
Kylin Seing Network Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kylin Seing Network Technology Ltd By Share Ltd
Priority to CN201810938137.XA
Publication of CN109947935A
Legal status: Withdrawn

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a method and device for generating media events. The method comprises: obtaining news data collected within a preset time period, the news data including the news content corresponding to each collected news item; clustering the collected news items according to the news content corresponding to each news item, where the news items clustered into one class constitute one news subevent, and a news subevent describes the news items of the same class among those collected within the preset time period; for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, the event similarity being determined based on the event keyword features of the history media event and of the news subevent; and merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event. The application reduces the clustering workload and shortens the time needed to generate a media event, thereby ensuring the timeliness of media events.

Description

The generation method and device of media event
Technical field
This application relates to the field of information processing, and in particular to a method and device for generating media events.
Background art
With the rapid development of Internet technology, the network has become the main channel through which media outlets publish news and users obtain it. Because the Internet is open, the news published on websites may be numerous and unordered, and the news describing one media event may be scattered across different websites, which makes it hard for users to gain an in-depth understanding. To help users understand a media event in depth, the news items belonging to the same media event are generally integrated and presented to users as a special topic.
In the prior art, each time a media event needs to be generated for a given news item, the news data published over a long period before and after that news item must be obtained, and cluster analysis is then performed on all of the collected news data to generate the media event. Every generation of a media event therefore requires clustering a large amount of data. Because the volume of news data is large, the clustering is computationally expensive and time-consuming, so the timeliness of media events cannot be guaranteed.
Therefore, a technical solution is needed that speeds up the generation of media events and thereby guarantees their timeliness.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method and device for generating media events, to solve the prior-art problem that generating a media event requires processing a large amount of data and takes a long time, so that the timeliness of media events cannot be guaranteed.
To solve the above technical problem, the embodiments of the present application are implemented as follows:
An embodiment of the present application provides a method for generating media events, comprising:
obtaining news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
clustering the collected news items according to the news content corresponding to each news item, wherein the news items clustered into one class constitute one news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the history media event and of the news subevent;
merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
An embodiment of the present application also provides an apparatus for generating media events, comprising:
an obtaining module, configured to obtain news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
a clustering module, configured to cluster the collected news items according to the news content corresponding to each news item, wherein the news items clustered into one class constitute one news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
a searching module, configured to search, for each news subevent, a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the history media event and of the news subevent;
a merging module, configured to merge the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
An embodiment of the present application also provides a device for generating media events, comprising:
a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to carry out the following process:
obtaining news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
clustering the collected news items according to the news content corresponding to each news item, wherein the news items clustered into one class constitute one news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the history media event and of the news subevent;
merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
An embodiment of the present application also provides a storage medium for storing computer-executable instructions which, when executed, carry out the following process:
obtaining news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
clustering the collected news items according to the news content corresponding to each news item, wherein the news items clustered into one class constitute one news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period;
for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the history media event and of the news subevent;
merging the news items belonging to the news subevent into the history media event found, to obtain a merged media event.
In the method and device for generating media events provided by the embodiments of the present application, the news collected over a period of time is clustered to generate the news subevents of that period; each news subevent is then merged with the history media event whose event similarity with it reaches the set threshold, and the merged media event is taken as the generated media event. Because media events are merged continually, each generation only needs to cluster the news newly published within one period. This reduces the clustering workload and shortens the time needed to generate a media event, thereby ensuring the timeliness of media events.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first flow chart of the method for generating media events provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a specific structure of a media event generation server provided by an embodiment of the present application;
Fig. 3 is a schematic flow diagram of the media event generation method provided by an embodiment of the present application;
Fig. 4 is a second flow chart of the method for generating media events provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of the modules of the apparatus for generating media events provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of the device for generating media events provided by an embodiment of the present application.
Detailed description of embodiments
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
The idea of the embodiments of the present application is to cluster, at regular intervals, the news collected during the latest interval to obtain the news subevents of that interval, and then to merge each news subevent obtained by clustering with the history media event whose event similarity with that news subevent reaches a preset threshold, thereby obtaining a media event. Because media events are merged continually, each generation only needs to cluster the news newly published within one interval. This avoids clustering the large volume of news from a long time span, reduces the number of news items clustered and therefore the time spent on clustering, and so preserves the timeliness of media events. On this basis, the embodiments of the present application provide a method and device for generating media events, which are described in detail below.
It should be noted that multiple news items of the same class constitute a media event. When the number of news items in a media event reaches a preset threshold, the media event can be published by the relevant website as a news special topic.
An embodiment of the present application provides a method for generating media events. The method can be applied on the server side, that is, the method can be executed by a server, and specifically by an apparatus for generating media events deployed on the server.
Fig. 1 is a first flow chart of the method for generating media events provided by an embodiment of the present application. The method shown in Fig. 1 includes at least the following steps:
Step 102, obtain the news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item.
In the embodiments of the present application, a web crawler can continuously crawl each news website for the news data of its most recently published news. The news data of a crawled news item may include the news content, the publication time, the address of the publishing website, the publishing media outlet and the type of the news. Specifically, the news content may include the news headline and the body text.
In a specific implementation, the news crawler can, at intervals of a set duration, crawl each news website for information about the news published by that site during the interval. For example, the crawler can crawl each news website every 2 minutes for the news published in those 2 minutes. The length of the crawl interval can be configured according to the actual application scenario; the embodiments of the present application do not limit its specific value.
In certain embodiments, the news crawler can also crawl each news website in real time for information about its newly published news.
In the embodiments of the present application, if the crawler crawls the news websites at a high frequency, each crawl may return only a small number of news items, and there is no need to run a clustering operation after every crawl. To save resources, clustering can therefore be run after the crawled news has accumulated for a period of time.
In a specific implementation, the news data obtained by each crawl can be written to a cache, and read back from the cache when a clustering operation needs to be run.
In addition, the crawled news data can be written to the cache through a message queue (MQ), which decouples the services and smooths peak loads.
It should be noted that, in some cases, the news data crawled from the news websites may contain missing or invalid data. To ensure the accuracy of the news data, the crawled news data can therefore be cleaned before it is written to the cache, and only the cleaned news data is written.
Accordingly, in the embodiments of the present application, the apparatus for generating media events can, at intervals of the preset time period, obtain the news data collected within that period. The preset time period can be any length of time, such as half an hour, one hour or two hours. Its specific value can be configured according to the actual application scenario and is not limited by the embodiments of the present application.
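The buffering described above can be sketched roughly as follows. This is a minimal illustration only: Python's in-process `queue.Queue` stands in for the message queue and cache, and the record fields (`title`, `body`, `url`) and the function names `publish` and `drain_batch` are assumptions made for the example, not names from the application.

```python
import queue

def clean(record):
    """Return a cleaned copy of the record, or None if title or body is missing."""
    title = (record.get("title") or "").strip()
    body = (record.get("body") or "").strip()
    if not title or not body:
        return None
    return {**record, "title": title, "body": body}

def publish(mq, record):
    """Clean a crawled record and, if it is valid, buffer it for later clustering."""
    cleaned = clean(record)
    if cleaned is not None:
        mq.put(cleaned)

def drain_batch(mq):
    """Read everything buffered so far; intended to run once per preset time period."""
    batch = []
    while True:
        try:
            batch.append(mq.get_nowait())
        except queue.Empty:
            return batch

# Hypothetical crawl results: one valid record and two with missing data.
mq = queue.Queue()
publish(mq, {"title": " Tariff announced ", "body": "Full text ...", "url": "https://example.com/a"})
publish(mq, {"title": "", "body": "missing title"})
publish(mq, {"title": "Quake report", "body": None})
batch = drain_batch(mq)
```

In a deployment the queue would more plausibly be an external message broker, and `drain_batch` would be triggered by a timer set to the preset time period.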
Step 104, cluster the collected news items according to the news content corresponding to each news item, wherein the news items clustered into one class constitute one news subevent, and the news subevent describes the news items of the same class among those collected within the preset time period.
In the embodiments of the present application, the collected news can be clustered with a clustering algorithm. Specifically, a hierarchical clustering algorithm can be used. For example, the centroid linkage clustering algorithm can be used to cluster the collected news.
Clustering the news with the centroid linkage algorithm specifically includes the following process:
Select the news items of the same type from the collected news and place each news item in its own cluster, so that each cluster initially contains a single news item. Compute the distance between each cluster and every other cluster, and merge the clusters whose distance meets a preset condition. Then recompute the distances between each newly generated cluster and all existing clusters, and again merge the clusters whose distance meets the preset condition, until none of the computed distances meets the preset condition. In this way, news items of the same type are clustered into individual news subevents.
Here, the type of a news item refers to categories such as entertainment news, current political news and social news, and news items of the same type are those whose types are identical. In the clustering process above, the distance computed between one cluster and another can be the distance between the cluster centre points.
Specifically, a news subevent describes the news items of the same class among those collected within the preset time period. For example, if the news items of one class are all related to the news subevent "the U.S. government announced global safeguard measures on imported large washing machines and photovoltaic products for periods of 4 years and 3 years respectively, imposing tariffs of up to 30% and 50% respectively", then that news subevent describes the news items of this class among the collected news.
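The centroid linkage procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: each news item of one type has already been turned into a numeric feature vector (the application does not say how), the "preset condition" is taken to be a maximum Euclidean distance between cluster centroids, and the name `centroid_linkage` and its parameters are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def centroid_linkage(vectors, max_distance):
    """Repeatedly merge the two clusters with the closest centroids.

    Stops when the closest pair is farther apart than max_distance.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]  # one news item per cluster
    while len(clusters) > 1:
        centers = [centroid([vectors[i] for i in c]) for c in clusters]
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centers[a], centers[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > max_distance:  # no pair meets the assumed preset condition
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical 2-D feature vectors for five news items of one type.
vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
subevents = centroid_linkage(vectors, max_distance=1.0)
```

Each returned cluster corresponds to one news subevent. The pairwise search is quadratic per merge, which is acceptable for the small batches that periodic clustering is meant to produce.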
Step 106, for each news subevent, search the history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the history media event and of the news subevent.
In step 106, searching the history media event set, for each news subevent, for a history media event whose event similarity with the news subevent reaches the preset threshold includes at least the following steps (1), (2) and (3):
Step (1), extract the keywords corresponding to each news subevent;
Step (2), for each news subevent, screen out from the history media event set the subset of history media events that share a keyword with the news subevent;
Step (3), search the screened history media event subset for a history media event whose event similarity with the news subevent reaches the preset threshold.
The keywords in step (1) are representative words that characterise the news subevent. For example, if the news subevent is "on March 10th, 2018, the US President announced at the White House that the U.S. will impose a 25% tariff on imported steel products and a 10% tariff on imported aluminium products", the keywords corresponding to the news subevent can be "US President", "U.S.", "import", "tariff", "steel", "aluminium products" and so on.
It should be noted that, in the embodiments of the present application, a keyword extraction algorithm can be used to extract the keywords of a news subevent. For example, the term frequency-inverse document frequency (TF-IDF) algorithm can be used. The specific extraction process is as follows:
First, split the news content corresponding to the news subevent into multiple words, and compute by formula 1 the frequency with which each word occurs, that is, its term frequency (TF). Then compute by formula 2 the inverse document frequency (IDF) of each word. Finally, compute by formula 3 the TF-IDF value of each word in the news subevent.
Term frequency = number of occurrences of the word in the news subevent / total number of words in the news subevent    (Formula 1)
Inverse document frequency = log(total number of news subevents / (number of news subevents containing the word + 1))    (Formula 2)
TF-IDF value = term frequency × inverse document frequency    (Formula 3)
In the embodiments of the present application, when computing the TF-IDF value of each word, all news items under one news subevent can be treated as a single document. In formula 1, the number of occurrences of a word is therefore its number of occurrences across all news items under the news subevent. In formula 2, the total number of news subevents is the number of news subevents generated by clustering in step 104.
After the TF-IDF value of each word in the news subevent has been computed, a preset number of words can be chosen as the keywords of the news subevent in descending order of TF-IDF value. Alternatively, the words whose TF-IDF value is greater than a preset value can be taken as the keywords of the news subevent.
For ease of understanding, an example is given below.
In one specific embodiment, clustering the news collected within the preset time period yields three news subevents: news subevent 1, news subevent 2 and news subevent 3. News subevent 1 includes news 1, news 2 and news 3; news subevent 2 includes news 4 and news 5; and news subevent 3 includes news 6, news 7 and news 8.
When computing the keywords of news subevent 1, news 1, news 2 and news 3 can be merged into one news document, denoted document 1 corresponding to news subevent 1. Likewise, news 4 and news 5 are merged into document 2 corresponding to news subevent 2, and news 6, news 7 and news 8 are merged into document 3 corresponding to news subevent 3.
To compute the term frequency of a word in news subevent 1, the ratio of the number of occurrences of the word in document 1 to the total number of words in document 1 is computed and taken as the term frequency of the word. The inverse document frequency of the word is then computed, and the product of the term frequency and the inverse document frequency is taken as the TF-IDF value of the word.
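Formulas 1 to 3 can be sketched as follows, treating each news subevent as one merged document as described above. The sketch assumes the text has already been split into words; the token lists, the function names `tf_idf` and `top_keywords`, and the use of the natural logarithm are assumptions for illustration, since the application does not specify a tokenizer or a logarithm base.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: one token list per news subevent (its merged news text).

    Returns one dict per document mapping each word to its TF-IDF value,
    following formulas 1 to 3.
    """
    n_docs = len(documents)
    doc_freq = Counter()  # number of subevents containing each word
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / (doc_freq[word] + 1))
            for word, count in counts.items()
        })
    return scores

def top_keywords(score, k):
    """Pick the k words with the highest TF-IDF value (ties broken alphabetically)."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]

# Hypothetical merged documents for three news subevents.
documents = [
    "us tariff washing machine tariff solar".split(),
    "earthquake rescue earthquake".split(),
    "football world cup football".split(),
]
scores = tf_idf(documents)
keywords = top_keywords(scores[0], 2)
```

Note that with the "+ 1" in the denominator of formula 2, a word that appears in every news subevent receives a negative inverse document frequency, so it naturally sinks to the bottom of the ranking.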
Specifically, for step (2), a history news library is established in advance, and the mapping between each history media event and the keywords of that history media event is stored in the history news library. When this mapping is stored, the unique identifier (ID) of a history media event can be used to represent the history media event. One possible mapping between history media events and keywords stored in the history news library is shown in Table 1.
Table 1 is only an illustrative example and does not limit the number of history media events stored in the history news library, their IDs or their corresponding keywords.
Table 1
Media event ID    Keywords
5669865           U.S., US President, import, tariff ...
7628921           China, Wenchuan, earthquake ...
8559872           World Cup, football ...
In one specific embodiment, after the keywords corresponding to each news subevent have been extracted in step (1), the keywords of each news subevent are compared with the keywords of each history media event in the history news library, to determine whether the history news library contains a history media event that shares a keyword with the news subevent.
Then, for each news subevent, the history media events that share a keyword with it are screened out to form a set, referred to as a history media event subset. It should be noted that each news subevent corresponds to one history media event subset.
In addition, in the embodiments of the present application, a history media event is considered to share a keyword with a news subevent as long as the two have at least one identical keyword.
For example, suppose the keywords of a history media event in the history news library are "U.S., US President, tariff", and the keywords of one of the news subevents generated by clustering are "U.S., ZTE, ban". The history media event and the news subevent have one identical keyword, "U.S.", so the history media event is considered to share a keyword with the news subevent.
In the embodiments of the present application, when searching the history media event set for a history media event whose event similarity with a news subevent reaches the preset threshold, the subset of history media events that share a keyword with the news subevent is screened out first, and the history media event matching the news subevent is then searched for within that subset. This two-step search first screens the history media events coarsely by keyword and then looks within the screened subset for a history media event whose event similarity with the news subevent reaches the preset threshold, which reduces the work needed to find a matching history media event and improves search efficiency.
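The coarse keyword screening of step (2) can be sketched as follows. The application only says that a mapping from history media events to keywords is stored; the inverted index (keyword to event IDs) used here is an assumed implementation choice, the function names are hypothetical, and the data reproduces only the visible entries of Table 1.

```python
# Visible entries of Table 1 (the elided keywords are not reproduced).
history_keywords = {
    5669865: {"U.S.", "US President", "import", "tariff"},
    7628921: {"China", "Wenchuan", "earthquake"},
    8559872: {"World Cup", "football"},
}

def build_inverted_index(history_keywords):
    """Map each keyword to the set of history media event IDs that carry it."""
    index = {}
    for event_id, keywords in history_keywords.items():
        for keyword in keywords:
            index.setdefault(keyword, set()).add(event_id)
    return index

def candidate_events(subevent_keywords, index):
    """Return IDs of history media events sharing at least one keyword."""
    candidates = set()
    for keyword in subevent_keywords:
        candidates |= index.get(keyword, set())
    return candidates

index = build_inverted_index(history_keywords)
# Hypothetical keywords of a freshly clustered news subevent.
subset = candidate_events({"U.S.", "ZTE", "ban"}, index)
```

With an index of this kind, the cost of screening depends on the number of keywords in the news subevent rather than on the number of stored history media events.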
Specifically, in step (3), searching the screened history media event subset for a history media event whose event similarity with the news subevent reaches the preset threshold includes the following process:
Compute the event similarity value between the news subevent and each history media event in the history media event subset; then compare each event similarity value with the preset threshold to find the history media event whose event similarity value reaches the preset threshold.
Specifically, cosine similarity, the Simhash algorithm or the longest common subsequence (LCS) algorithm can be used to compute the event similarity value between the news subevent and each history media event in the history media event subset.
The following describes in detail, using cosine similarity as an example, how the event similarity value between a news subevent and a history media event is computed.
First, determine the term frequency of each keyword of the news subevent and the term frequency of each keyword of the history media event. Combine the keywords of the news subevent and the keywords of the history media event into one keyword set. From the term frequency in the news subevent of each keyword in the keyword set, determine the vector corresponding to the news subevent, denoted vector a; from the term frequency in the history media event of each keyword in the keyword set, determine the vector corresponding to the history media event, denoted vector b. Vector a is the keyword feature of the news subevent, and vector b is the keyword feature of the history media event. Finally, compute the similarity value between the news subevent and the history media event from vector a and vector b by formula 4.
cos θ = (a · b) / (|a| × |b|) = Σ(a_i × b_i) / (√(Σ a_i²) × √(Σ b_i²))    (Formula 4)
In formula 4, cos θ is the similarity value between the news subevent and the history media event, θ is the angle between vector a and vector b, and a_i and b_i are the i-th components of vector a and vector b.
It should be noted that the higher the event similarity value computed between two events, the more likely it is that the two events describe the same event.
For ease of understanding, an example is given below.
In one specific embodiment, the news subevent is denoted event A and the history media event is denoted event B. The keywords of event A are "U.S.", "taxation", "washing machine" and "solar energy", and the keywords of event B are "U.S.", "taxation", "steel products" and "aluminium products". The keyword set of event A and event B is therefore [U.S., taxation, washing machine, solar energy, steel products, aluminium products]. In event A, the term frequencies of "U.S.", "taxation", "washing machine", "solar energy", "steel products" and "aluminium products" are m1, m2, m3, m4, m5 and m6 respectively; in event B, the term frequencies of the same keywords are n1, n2, n3, n4, n5 and n6 respectively. The vector a corresponding to event A is therefore [m1, m2, m3, m4, m5, m6], and the vector b corresponding to event B is [n1, n2, n3, n4, n5, n6]. Finally, vector a and vector b are substituted into formula 4 to compute the similarity value between event A and event B.
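The cosine similarity calculation for events A and B can be sketched as follows. The term frequency values are made up for illustration (the text leaves them symbolic as m1..m6 and n1..n6), and the function names are hypothetical; a keyword absent from an event is given a term frequency of 0.

```python
import math

def keyword_vectors(tf_a, tf_b):
    """Build aligned term frequency vectors over the union of both keyword sets."""
    vocab = sorted(set(tf_a) | set(tf_b))
    a = [tf_a.get(word, 0.0) for word in vocab]
    b = [tf_b.get(word, 0.0) for word in vocab]
    return vocab, a, b

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Assumed term frequencies for event A (news subevent) and event B (history event).
tf_a = {"U.S.": 0.4, "taxation": 0.3, "washing machine": 0.2, "solar energy": 0.1}
tf_b = {"U.S.": 0.4, "taxation": 0.3, "steel products": 0.2, "aluminium products": 0.1}
vocab, a, b = keyword_vectors(tf_a, tf_b)
similarity = cosine_similarity(a, b)
```

With these assumed values the two events agree on "U.S." and "taxation" and differ on the product keywords, so the similarity is high but below 1.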
In a specific implementation, after the event similarity value between the news subevent and each history media event in the history media event subset has been computed, each event similarity value is compared with the preset threshold. If an event similarity value is greater than or equal to the preset threshold, the history media event corresponding to that value is taken as the history media event found.
In some cases, several event similarity values may be greater than or equal to the preset threshold. In that case, the history media event with the largest event similarity value can be chosen as the history media event found.
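The threshold comparison and the choice of the largest similarity value can be sketched as follows. The function name `find_history_event`, the similarity values and the threshold are hypothetical; returning `None` when nothing reaches the threshold is an assumption of this sketch, since the handling of that case is not described in this passage.

```python
def find_history_event(similarities, threshold):
    """Return the ID of the history media event with the highest similarity
    that is greater than or equal to the threshold, or None if there is none.

    similarities: dict mapping history media event ID -> event similarity value.
    """
    qualifying = {eid: s for eid, s in similarities.items() if s >= threshold}
    if not qualifying:
        return None
    return max(qualifying, key=qualifying.get)

# Hypothetical similarity values between one news subevent and three candidates.
similarities = {5669865: 0.83, 7628921: 0.10, 8559872: 0.72}
match = find_history_event(similarities, threshold=0.7)
no_match = find_history_event(similarities, threshold=0.9)
```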
When the news collected within the preset time period is clustered in step 104, multiple news subevents may be produced, and the history media event subset corresponding to each news subevent includes one or more history media events. Computing the event similarity value between each news subevent and each history media event in its subset serially takes a long time. To shorten this process, the embodiments of the present application can compute the event similarity values with multiple threads in parallel.
Accordingly, computing the event similarity value between the news subevent and each history media event in the history media event subset specifically includes the following process:
Compute the event similarity value between each news subevent and each history media event in its history media event subset with multiple similarity calculation threads, wherein the multiple similarity calculation threads execute in parallel.
For example, clustering yields news subevent A and news subevent B. The history media event subclass screened out as having the same keywords as news subevent A includes history media event 1, history media event 2 and history media event 3, and the history media event subclass screened out as having the same keywords as news subevent B includes history media event 4 and history media event 5. To determine the history media event whose event similarity with news subevent A reaches the preset threshold, the event similarity values between news subevent A and each of history media event 1, history media event 2 and history media event 3 must be calculated. To determine the history media event whose event similarity with news subevent B reaches the preset threshold, the event similarity values between news subevent B and each of history media event 4 and history media event 5 must be calculated.
With a single thread, the 5 event similarity values have to be calculated serially. With multiple similarity calculation threads, the threads calculate in parallel, and each similarity calculation thread calculates one event similarity value. In that case there are 5 similarity calculation threads, denoted similarity calculation thread 1, similarity calculation thread 2, similarity calculation thread 3, similarity calculation thread 4 and similarity calculation thread 5. For instance, similarity calculation thread 1 can calculate the event similarity value between news subevent A and history media event 1, similarity calculation thread 2 can calculate the event similarity value between news subevent A and history media event 2, and so on. The calculation then takes only about 1/5 of the original time, which substantially reduces the time spent calculating event similarity.
It should be noted that each similarity calculation thread can randomly read a news subevent and a history media event whose event similarity value has not yet been calculated and perform the calculation; alternatively, the news subevents and history media events whose event similarity values have not yet been calculated can be read in turn according to a set rule.
In the embodiment of the present application, the event similarity values between the news subevents and each history media event in the history media event subclass are calculated in parallel by multiple similarity calculation threads. This substantially reduces the time spent on the step of calculating event similarity values and, in turn, shortens the generation time of the media event, further ensuring the real-time performance of the media event.
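The five-pair example above could be run on a thread pool roughly as follows. This is a sketch under stated assumptions: `keyword_overlap` is a placeholder similarity (Jaccard overlap of keyword lists) rather than the patent's formula, and the keyword lists, function names and the pool size of 5 are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def keyword_overlap(keywords_a, keywords_b):
    """Placeholder similarity: Jaccard overlap of two keyword lists."""
    a, b = set(keywords_a), set(keywords_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def compute_similarities(subevents, candidates, history, similarity_fn, max_workers=5):
    """Compute one similarity value per (subevent, candidate history event) pair in parallel."""
    pairs = [(sid, hid) for sid, hids in candidates.items() for hid in hids]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per pair; the pool hands tasks to idle worker threads.
        futures = {
            pair: pool.submit(similarity_fn, subevents[pair[0]], history[pair[1]])
            for pair in pairs
        }
        return {pair: future.result() for pair, future in futures.items()}

# Hypothetical keyword lists for subevents A and B and history events 1 to 5.
subevents = {"A": ["U.S.", "taxation", "washing machine"], "B": ["election", "vote"]}
history = {
    1: ["U.S.", "taxation", "steel products"],
    2: ["U.S.", "trade"],
    3: ["taxation", "budget"],
    4: ["election", "poll"],
    5: ["vote", "recount"],
}
candidates = {"A": [1, 2, 3], "B": [4, 5]}

results = compute_similarities(subevents, candidates, history, keyword_overlap)
```

In CPython, threads executing pure-Python, CPU-bound code are serialized by the global interpreter lock, so the ideal 1/5 figure would generally require a process pool or a similarity function that releases the lock; the structure of the sketch stays the same in either case.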
Step 108, the news belonging to the news subevent is merged into the history media event found, to obtain the merged media event.
Specifically, in step 108, merging the news belonging to the news subevent into the history media event found actually means adding that news under the history media event found.
For example, in a specific embodiment, the news subevent includes three news items, news 1, news 2 and news 3, and a certain history media event includes four news items, news 4, news 5, news 6 and news 7. If the event similarity between the news subevent and the history media event is determined to reach the set threshold, news 1, news 2 and news 3 are merged under the history media event, and the merged media event includes seven news items in total: news 1, news 2, news 3, news 4, news 5, news 6 and news 7.
In addition, in the embodiment of the present application, after the news belonging to the news subevent has been merged into the history media event found, the news corresponding to the merged media event has changed. To keep the stored mapping relations between the media event and its corresponding news accurate, after step 108 has been executed, the method provided by the embodiment of the present application further includes:
Updating the mapping relations between the merged media event and the news belonging to that media event.
Continuing the example above, before merging, the mapping relations between the history media event ID and the four news items news 4, news 5, news 6 and news 7 are stored. After the news corresponding to the news subevent is merged into the history media event, news 1, news 2 and news 3 also belong to the history media event. The mapping relations between the history media event ID and the news belonging to that history media event therefore need to be updated; after the update, the stored mapping relations are those between the history media event ID and news 1, news 2, news 3, news 4, news 5, news 6 and news 7.
Specifically, the mapping relations between a media event and the news belonging to it can be written to the event buffer and stored in key-value (Key-Value) form, where the Key can be the ID of the media event and the Value can be the IDs of the news corresponding to that media event.
In the embodiment of the present application, as newly generated news subevents are continually merged with history media events, the news corresponding to each media event keeps growing. Continually updating the mapping relations between media events and news improves the accuracy of the stored mapping relations between media events and news.
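The merge and the mapping update described above amount to appending news IDs to a key-value entry. A minimal sketch follows, assuming the event buffer is an in-memory dictionary keyed by media event ID; the function name and the ID strings are hypothetical.

```python
def merge_news_into_event(event_buffer, history_event_id, subevent_news_ids):
    """Add the subevent's news IDs under the history media event and return the updated list.

    event_buffer maps a media event ID (Key) to the list of its news IDs (Value).
    """
    news_ids = event_buffer.setdefault(history_event_id, [])
    for news_id in subevent_news_ids:
        if news_id not in news_ids:  # avoid duplicates if a merge is replayed
            news_ids.append(news_id)
    return news_ids

# The history media event holds news 4 to 7; the subevent contributes news 1 to 3.
event_buffer = {"E1": ["news4", "news5", "news6", "news7"]}
merged = merge_news_into_event(event_buffer, "E1", ["news1", "news2", "news3"])
```

After the call, the entry for "E1" lists all seven news items, which matches the updated mapping relations in the example above.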
In addition, in the embodiment of the present application, to facilitate subsequent merging between history media events and newly generated news subevents, after step 108 has been executed, the mapping relations between history media event IDs and keywords stored in the history news library also need to be updated.
Specifically, when the mapping relations between history media event IDs and keywords in the history news library are updated, the keywords shared by the news subevent and the history media event can be used as the keywords of the merged media event.
To make the above process easier to understand, an example is given below.
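The keyword update rule stated above (keep the keywords that the news subevent and the history media event have in common) can be sketched as a set intersection. The layout of the history news library as a dictionary from keyword to a set of event IDs is an assumption, as is the function name.

```python
def update_keyword_index(index, event_id, subevent_keywords, history_keywords):
    """Re-map event_id to the keywords shared by the subevent and the history event.

    index maps a keyword to the set of media event IDs stored under it.
    Returns the keyword set of the merged media event.
    """
    merged = set(subevent_keywords) & set(history_keywords)
    # Drop the event from keywords it no longer carries after the merge.
    for keyword in set(history_keywords) - merged:
        ids = index.get(keyword, set())
        ids.discard(event_id)
        if not ids:
            index.pop(keyword, None)
    for keyword in merged:
        index.setdefault(keyword, set()).add(event_id)
    return merged
```

Other rules, such as keeping the union of both keyword sets, would fit the same structure; only the first line of the function body would change.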
For example, when the collected news is clustered, one resulting news subevent relates to the U.S. government announcing that it "will apply global safeguard measures lasting 4 years and 3 years to imported large washing machines and photovoltaic products respectively, and impose tariffs with maximum rates of up to 30% and 50% respectively"; the event ID of this news subevent is 4569865.
The representative news items constituting the news subevent are shown in Table 2.
The keywords corresponding to the news subevent are extracted; the extracted keywords are "U.S.", "trade", "taxation", "import", "washing machine" and "solar energy". No history media event whose event similarity value with this news subevent reaches the preset threshold is found in the history media event set. The news subevent can therefore be treated as a newly added media event, and the mapping relations between the ID of the news subevent and its keywords are stored in the history news library. One possible storage form is as follows:
U.S. --- --- 4569865
Trade --- --- 4569865
Taxation --- --- 4569865
Import --- --- 4569865
Washing machine --- --- 4569865
Solar energy --- --- 4569865
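The keyword-to-event-ID storage form listed above behaves like an inverted index, which makes it cheap to screen for history media events that share a keyword with a new news subevent. The following is a sketch under that assumption; the function names are hypothetical.

```python
from collections import defaultdict

def build_keyword_index(event_keywords):
    """Build a keyword -> set of event IDs mapping from event ID -> keyword list."""
    index = defaultdict(set)
    for event_id, keywords in event_keywords.items():
        for keyword in keywords:
            index[keyword].add(event_id)
    return index

def candidate_events(index, subevent_keywords):
    """Return IDs of history events sharing at least one keyword with the subevent."""
    candidates = set()
    for keyword in subevent_keywords:
        # .get avoids creating empty entries for unseen keywords.
        candidates |= index.get(keyword, set())
    return candidates

index = build_keyword_index({
    4569865: ["U.S.", "trade", "taxation", "import", "washing machine", "solar energy"],
})
matches = candidate_events(index, ["trade war", "U.S.", "taxation", "import", "steel", "aluminium product"])
```

Only the events returned by `candidate_events` then need an event similarity calculation, which keeps the comparison set small as the history news library grows.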
Subsequently, when the newly collected news is clustered, one resulting news subevent relates to the US President announcing at the White House that "the U.S. will impose a 25% tariff on imported steel products and a 10% tariff on imported aluminium products"; the event ID of this news subevent is 5789123.
Table 2
The representative news items constituting the news subevent are shown in Table 3.
The keywords corresponding to the news subevent are extracted; the extracted keywords are "trade war", "U.S.", "taxation", "import", "steel" and "aluminium product". In the history media event set, the event similarity between this news subevent and the history media event with ID 4569865 is found to reach the preset threshold, so the news subevent can be merged with the history media event with ID 4569865.
The ID of the media event obtained after merging is recorded as 4569865, and the news under this media event includes news D1, news D2, news D3, news D4, news D5, news D6, news D7, news D8 and news D9. The keywords corresponding to the merged media event are "U.S.", "trade war", "US President", "import" and "taxation". The mapping relations between this media event and its keywords stored in the history news library are updated; the updated mapping relations are as follows:
U.S. --- --- 4569865
Trade war --- --- 4569865
US President --- --- 4569865
Import --- --- 4569865
Taxation --- --- 4569865
Table 3
In addition, in the embodiment of the present application, after the merged media event is obtained, the merged media event needs to be processed further so that, once the news under the media event meets the publication condition of Special Topics in Journalism, the Special Topics in Journalism is published. Therefore, after step 108 has been executed, the method provided by the embodiment of the present application further includes the following step 1, step 2 and step 3:
Step 1: judging whether the media event after merging meets the condition as Special Topics in Journalism publication;If so, executing Step 2;
Step 2: determining the event information of the media event after merging;Wherein, which includes the news after merging The representative news of the temperature information of media event after the identification information of event, merging and the media event after merging;
Step 3: sending the merged media event and the corresponding event information to the News demand side.
It should be noted that, in step 1 above, judging whether the merged media event meets the condition for Special Topics in Journalism publication mainly means checking whether the number of news items corresponding to the merged media event reaches a preset quantity; this preset quantity can be the minimum number of news items required for Special Topics in Journalism.
Specifically, in step 2 above, the identification information of the media event includes information such as the uniform resource locator (Uniform Resource Locator, URL) of the media event, the title of the media event and the version number of the media event.
In addition, the event information further includes the start time of the media event.
Specifically, when the representative news of the merged media event is determined, the similarity value between the media event and each of its news items can be calculated, and a news item whose similarity value is greater than or equal to the preset similarity value is chosen as the representative news of the media event. If several similarity values are greater than or equal to the preset similarity value, the news item with the highest similarity value is chosen as the representative news of the media event. If no news item has a similarity value greater than or equal to the preset similarity value, a news item that occurred recently and has a higher media score is chosen as the representative news of the media event.
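The selection rule for the representative news can be sketched as below. The record fields (`similarity`, `published_at`, `media_score`) are hypothetical names, and the fallback ordering (most recent first, with the media score breaking ties) is one assumed reading of "occurred recently and has a higher media score"; the text does not say how the two criteria are combined.

```python
def pick_representative(news_items, preset_similarity):
    """Return the ID of the representative news item, or None for an empty list.

    Each item is a dict with 'id', 'similarity', 'published_at' and 'media_score'.
    """
    if not news_items:
        return None
    qualified = [n for n in news_items if n["similarity"] >= preset_similarity]
    if qualified:
        # Several items may qualify; keep the one most similar to the event.
        return max(qualified, key=lambda n: n["similarity"])["id"]
    # Assumed fallback: newest item, higher media score breaks ties.
    return max(news_items, key=lambda n: (n["published_at"], n["media_score"]))["id"]
```

For instance, with a preset similarity value of 0.7, an item scoring 0.9 is preferred over one scoring 0.8; if no item reaches the preset value, the newest item is returned.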
The hot value of a media event can be calculated from the comment count, forwarding count, reading count and similar figures of the media event. Specifically, weights corresponding to the comment count, forwarding count and reading count can be preset; the product of each count and its corresponding weight is then calculated, and the sum of the three products is determined as the hot value of the media event.
The News demand side generally refers to websites that can publish news.
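The hot value is a weighted sum, which can be written directly. The weights below are illustrative assumptions; the text only states that they are preset.

```python
def hot_value(comments, forwards, reads, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of comment, forwarding and reading counts.

    The default weights are hypothetical; the source leaves them unspecified.
    """
    w_comment, w_forward, w_read = weights
    return comments * w_comment + forwards * w_forward + reads * w_read
```

With these example weights, 10 comments, 20 forwards and 100 reads give 10 × 0.5 + 20 × 0.3 + 100 × 0.2 = 31.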
Specifically, in the embodiment of the present application, clustering of the news is generally executed by the cluster service set on the server; merging the news corresponding to a news subevent into the history media event whose event similarity with it reaches the preset threshold is usually executed by the event merging module set on the server; the mapping between a media event and the news belonging to it is typically stored in the event buffer set on the server; and the event analysis service set on the server analyzes the event to determine the event information of the merged media event and sends the generated Special Topics in Journalism to the demand side (a news publishing application program) for publication.
Fig. 2 shows a specific structural schematic diagram of a media event generation server for executing the generation method of media event in the embodiment of the present application. The server shown in Fig. 2 includes a cluster service 201, an event merging module 202, an event buffer 203 and an event analysis service 204.
Specifically, the news crawler sends the news data crawled from each news website to the cluster service 201 through MQ, and the cluster service 201 clusters the news accumulated over a period of time, where the news clustered into one class constitutes one news subevent. The cluster service 201 then sends the news subevents obtained by clustering to the event merging module 202, which merges each news subevent with the history media event whose event similarity with it reaches the preset threshold, to obtain the merged media event. The event merging module 202 writes the merged media event and its corresponding news to the event buffer 203, so that the event buffer 203 stores the mapping relations between the merged media event and its corresponding news. In addition, the event merging module 202 also sends the news subevent newly added to the merged media event to the event analysis service 204 through MQ, so that the event analysis service 204 analyzes the newly added news subevent and, after the media event as a whole meets the publication condition of Special Topics in Journalism, sends the Special Topics in Journalism to each demand side.
In addition, in the embodiment of the present application, during generation of a media event the server generally executes each step through sub-threads set on the server. Fig. 3 shows a flow diagram of generating a media event through the sub-threads set on the media event generation server in the embodiment of the present application.
In Fig. 3, the page distribution pipeline collects news data from each news website and sends the collected news data to the page cache, which stores it. The page distribution sub-thread then sends the news data stored in the page cache to the page clustering sub-thread, which clusters the received news to generate news subevents. The page clustering sub-thread sends the generated news subevents to the event merging sub-thread, which merges each received news subevent with the history media event whose event similarity with it reaches the preset threshold, to obtain the merged media event. On the one hand, the event merging sub-thread sends the merged media event to the event analysis module through the event sending sub-thread; the event analysis module judges whether the merged media event can be published as Special Topics in Journalism and, after determining that it can, determines the event information of the media event and then sends the media event and the event information to the News demand side for publication. On the other hand, the event merging sub-thread writes the merged media event to the event buffer, and the cleaning sub-thread cleans up expired events in the event buffer.
In the event buffer, media events and their corresponding news are stored in Key-Value form, where the Key can be the media event ID and the Value can be the IDs of the news corresponding to that media event.
Specifically, in the embodiment of the present application, a media event is regarded as an expired event when the time span between its generation time and the current time reaches a set length of time. The generation time here refers to the time at which the media event was generated.
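The expiry rule applied by the cleaning sub-thread can be sketched as follows, assuming the event buffer is a dictionary and the generation times are kept in a second dictionary keyed by the same media event IDs. Both structures and the function name are assumptions made for illustration.

```python
import time

def clean_expired_events(event_buffer, generated_at, max_age_seconds, now=None):
    """Remove events whose age has reached max_age_seconds; return the removed IDs.

    generated_at maps a media event ID to its generation time (seconds since epoch).
    """
    now = time.time() if now is None else now
    expired = [eid for eid, ts in generated_at.items() if now - ts >= max_age_seconds]
    for event_id in expired:
        event_buffer.pop(event_id, None)
        generated_at.pop(event_id, None)
    return expired
```

Passing `now` explicitly makes the rule easy to check: with `now=1000` and a limit of 600 seconds, an event generated at time 0 is removed while one generated at time 900 is kept.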
Fig. 4 shows a second method flow chart of the generation method of media event provided by the embodiment of the present application. The method shown in Fig. 4 includes at least the following steps:
Step 402, news data is crawled from each news website by the news crawler, and the crawled news data is sent to the cluster service.
The news data includes the news content of each news item, the type to which the news belongs, the link to the web page on which the news was published, the publication medium of the news, and so on.
Step 404, the cluster service clusters the news accumulated over the preset duration, where the news clustered into one class constitutes one news subevent.
The news subevent is used to describe the news belonging to the same class among the crawled news.
Step 406, keyword corresponding to each news subevent is extracted.
Specifically, the keywords corresponding to a news subevent can be extracted by the TF-IDF algorithm.
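A minimal TF-IDF keyword extraction might look like the following. It is a sketch only: the text is assumed to be tokenized already, the smoothed inverse document frequency is one common variant rather than a formula given in the source, and the function name and `top_k` parameter are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(subevent_tokens, corpus_tokens, top_k=3):
    """Rank the tokens of a news subevent by TF-IDF and return the top_k terms.

    subevent_tokens: list of tokens from the subevent's news content.
    corpus_tokens: list of token lists, one per reference document.
    """
    term_counts = Counter(subevent_tokens)
    total_terms = len(subevent_tokens)
    n_docs = len(corpus_tokens)
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for doc in corpus_tokens if term in doc)
        # Smoothed IDF so that unseen terms do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1
        scores[term] = (count / total_terms) * idf
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda term: (-scores[term], term))[:top_k]
```

Terms that appear in every reference document, such as function words, receive the lowest inverse document frequency and therefore fall to the bottom of the ranking.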
Step 408, for each news subevent, the history media event subclass that has the same keywords as the news subevent is screened from the history media event set.
Step 410, each history media event in history media event subclass and the news subevent are calculated separately Between event similarity value.
Step 412, the news subevent and event similarity value are greater than or equal to the history media event of preset threshold It merges.
If several event similarity values are greater than or equal to the preset threshold, the history media event with the largest event similarity value among them can be chosen as the history media event matching the news subevent.
In step 412, merging the news subevent with the matching history media event can in effect be regarded as adding the news corresponding to the news subevent under the history media event.
Step 414, the merged media event and the news belonging to that media event are written to the event buffer.
Specifically, storing media event and corresponding news, above-mentioned Key in the form of Key-Value in event buffer It can be media event ID, Value can be news ID corresponding with the media event.
Step 416, judge whether the media event after merging meets the condition of Special Topics in Journalism publication;If so, executing step Rapid 418;Otherwise, terminate.
Step 418, event information corresponding to the media event after merging is determined, by media event after merging and right The event information answered is sent to News demand side.
The specific implementation of each step in the embodiment corresponding to Fig. 4 is the same as that of the corresponding step in the embodiments corresponding to Fig. 1 to Fig. 3. For the specific implementation of each step in the embodiment corresponding to Fig. 4, refer to the embodiments corresponding to Fig. 1 to Fig. 3; details are not repeated here.
The generation method of media event provided by the embodiment of the present application clusters the news collected over a period of time to generate the news subevents of that period, and then merges each news subevent with the history media event whose event similarity with it reaches the set threshold, taking the merged media event as the generated media event. Because media events are merged continually, each time a media event is generated only the news newly occurring within a period of time needs to be clustered. This reduces the clustering workload and shortens the time needed to generate a media event, thereby ensuring the real-time performance of the media event.
Corresponding to the above method, the embodiment of the present application also provides a generating means of media event for executing the generation method of media event provided by the embodiment of the present application. Fig. 5 is a schematic diagram of the module composition of the generating means of media event provided by the embodiment of the present application. The device shown in Fig. 5 includes:
Module 501 is obtained, for obtaining the news data collected in preset time period;Wherein, it is wrapped in above-mentioned news data Include news content corresponding to each news of collection;
Cluster module 502, for clustering the collected news according to the news content corresponding to each news item; wherein the news clustered into one class constitutes one news subevent, and the news subevent is used to describe the news belonging to the same class among the news collected in the preset time period;
Searching module 503 is searched new with this for being directed to each above-mentioned news subevent from history media event set Hear the history media event that the event similarity between subevent reaches preset threshold;Wherein, above-mentioned event similarity is based on upper The event keyword feature for stating history media event and the news subevent is determined;
Merging module 504 is obtained for the news for belonging to the news subevent to be incorporated into the history media event found Media event after to merging.
Optionally, above-mentioned searching module 503, comprising:
Extraction unit, for extracting keyword corresponding to each above-mentioned news subevent;
Screening unit is screened and is somebody's turn to do from above-mentioned history media event set for being directed to each above-mentioned news subevent News subevent has the history media event subclass of same keyword;
First searching unit, for being searched and the news subevent from the above-mentioned history media event subclass filtered out Between event similarity reach the history media event of preset threshold.
Optionally, the above first searching unit is specifically used for:
Calculating the event similarity value between the news subevent and each history media event in the above history media event subclass; and comparing each event similarity value with the preset threshold, to find the history media event corresponding to an event similarity value that reaches the preset threshold.
Optionally, the above first searching unit is also specifically used for:
Each above-mentioned news subevent and above-mentioned history media event subclass are calculated by multiple similarity calculation threads In each history media event between event similarity value;Wherein, above-mentioned multiple similarity calculation thread parallels execute.
Optionally, device provided by the embodiments of the present application, further includes:
Judgment module, for judging whether the media event after above-mentioned merging meets the condition issued as Special Topics in Journalism;
Determining module, if meeting the condition as Special Topics in Journalism publication for the media event after above-mentioned merging, it is determined that The event information of media event after above-mentioned merging;Wherein, above-mentioned event information includes the mark of the media event after above-mentioned merging The temperature information of media event after knowing information, above-mentioned merging and the representative news of the media event after above-mentioned merging;
Sending module, for by after above-mentioned merging media event and corresponding above-mentioned event information be sent to news need The side of asking.
Optionally, device provided by the embodiments of the present application, further includes:
Update module, the mapping for updating between the media event after merging and the news for belonging to the media event are closed System.
The generating means of media event provided by the embodiment of the present application clusters the news collected over a period of time to generate the news subevents of that period, and then merges each news subevent with the history media event whose event similarity with it reaches the set threshold, taking the merged media event as the generated media event. Because media events are merged continually, each time a media event is generated only the news newly occurring within a period of time needs to be clustered. This reduces the clustering workload and shortens the time needed to generate a media event, thereby ensuring the real-time performance of the media event.
Further, the generation method based on above-mentioned media event, the embodiment of the present application also provides a kind of media events Generating device, Fig. 6 be media event provided by the embodiments of the present application generating device structural schematic diagram.
As shown in Fig. 6, the generating device of media event may vary considerably depending on configuration or performance, and may include one or more processors 601 and a memory 602, in which one or more application programs or data can be stored. The memory 602 can be transient storage or persistent storage. The application program stored in the memory 602 may include one or more modules (not shown), and each module may include a series of computer-executable instructions for the generating device of media event. Further, the processor 601 can be configured to communicate with the memory 602 and to execute, on the generating device of media event, the series of computer-executable instructions in the memory 602. The generating device of media event may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input/output interfaces 605, one or more keyboards 606, and so on.
In a specific embodiment, the generating device of media event includes a processor, a memory, and a computer program stored on the memory and runnable on the processor. When executed by the processor, the computer program implements each process of the above generation method embodiment of media event, specifically including the following steps:
Obtain the news data collected in preset time period;It wherein, include each news collected in above-mentioned news data Corresponding news content;
The collected news is clustered according to the news content corresponding to each news item; wherein the news clustered into one class constitutes one news subevent, and the news subevent is used to describe the news belonging to the same class among the news collected in the preset time period;
For each above-mentioned news subevent, from the thing searched in history media event set between the news subevent Part similarity reaches the history media event of preset threshold;Wherein, above-mentioned event similarity be based on above-mentioned history media event and The event keyword feature of the news subevent is determined;
The news for belonging to the news subevent is incorporated into the history media event found, the news thing after being merged Part.
Optionally, when the computer-executable instructions are executed, searching the history media event set, for each above news subevent, for the history media event whose event similarity with the news subevent reaches the preset threshold includes:
Extract keyword corresponding to each above-mentioned news subevent;
For each above-mentioned news subevent, screening from above-mentioned history media event set has with the news subevent The history media event subclass of same keyword;
From the event similarity searched in the above-mentioned history media event subclass filtered out between the news subevent Reach the history media event of preset threshold.
Optionally, when the computer-executable instructions are executed, searching the history media event subclass screened out above for the history media event whose event similarity with the news subevent reaches the preset threshold includes:
It calculates separately between each history media event in the news subevent and above-mentioned history media event subclass Event similarity value;
Each event similarity value is compared with the preset threshold, reaches the preset threshold to search History media event corresponding to event similarity value.
Optionally, computer executable instructions when executed, calculate separately the news subevent and above-mentioned history news The event similarity value between each history media event in event subclass, comprising:
Each above-mentioned news subevent and above-mentioned history media event subclass are calculated by multiple similarity calculation threads In each history media event between event similarity value;Wherein, above-mentioned multiple similarity calculation thread parallels execute.
Optionally, computer executable instructions are when executed, above-mentioned to be incorporated into the news for belonging to the news subevent The history media event found, after the media event after being merged, the above method further include:
Whether the media event after judging above-mentioned merging meets the condition as Special Topics in Journalism publication;
If so, determining the event information of the media event after above-mentioned merging;Wherein, above-mentioned event information includes above-mentioned conjunction The temperature information and the news thing after above-mentioned merging of media event after the identification information of media event after and, above-mentioned merging The representative news of part;
By after above-mentioned merging media event and corresponding above-mentioned event information be sent to News demand side.
Optionally, computer executable instructions are when executed, above-mentioned to be incorporated into the news for belonging to the news subevent The history media event found, after the media event after being merged, the above method further include:
Update the mapping relations between the media event after merging and the news for belonging to the media event.
The generating device of media event provided by the embodiment of the present application clusters the news collected over a period of time to generate the news subevents of that period, and then merges each news subevent with the history media event whose event similarity with it reaches the set threshold, taking the merged media event as the generated media event. Because media events are merged continually, each time a media event is generated only the news newly occurring within a period of time needs to be clustered. This reduces the clustering workload and shortens the time needed to generate a media event, thereby ensuring the real-time performance of the media event.
Further, based on the above generation method of media event, the embodiment of the present application also provides a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium can be a USB flash disk, an optical disc, a hard disk or the like. When executed by a processor, the computer-executable instructions stored on the storage medium can implement the following process:
Obtain the news data collected within a preset time period, where the news data includes the news content corresponding to each collected news item;
Cluster the collected news according to the news content corresponding to each news item, where the news grouped into one class constitutes one news subevent, and the news subevent describes the news of the same class among the news collected within the preset time period;
For each news subevent, search the history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, where the event similarity is determined based on the event keyword features of the history media event and of the news subevent;
Merge the news belonging to the news subevent into the found history media event to obtain the merged media event.
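The four steps above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the greedy single-pass clustering, the Jaccard measure over word sets, the `Event` class, the threshold defaults, and the choice to start a new event when no history event matches are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A media event: the news ids it contains plus its keyword set."""
    news_ids: list = field(default_factory=list)
    keywords: set = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Set-overlap similarity in [0, 1]; an assumed stand-in measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(news: dict, threshold: float) -> list:
    """Greedy single-pass clustering: each news item joins the first
    subevent whose keyword set is similar enough, else starts a new one."""
    subevents = []
    for news_id, text in news.items():
        words = set(text.lower().split())
        for sub in subevents:
            if jaccard(words, sub.keywords) >= threshold:
                sub.news_ids.append(news_id)
                sub.keywords |= words
                break
        else:
            subevents.append(Event([news_id], words))
    return subevents


def generate_events(news: dict, history: list,
                    cluster_thr: float = 0.3, merge_thr: float = 0.3) -> list:
    """Cluster the newly collected news, then merge each subevent into the
    best-matching history event if its similarity reaches merge_thr."""
    for sub in cluster(news, cluster_thr):
        best = max(history, key=lambda e: jaccard(sub.keywords, e.keywords),
                   default=None)
        if best is not None and jaccard(sub.keywords, best.keywords) >= merge_thr:
            best.news_ids.extend(sub.news_ids)
            best.keywords |= sub.keywords
        else:
            # No history event matched: kept as a new event (an assumption).
            history.append(sub)
    return history
```

Only the news in the current time window is clustered; the history events are touched solely through the similarity lookup and the merge, which is where the reduced clustering workload comes from.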
Optionally, when the computer executable instructions stored on the storage medium are executed by a processor, searching the history media event set, for each news subevent, for a history media event whose event similarity with the news subevent reaches a preset threshold includes:
Extract the keywords corresponding to each news subevent;
For each news subevent, screen out from the history media event set the history media event subset that has the same keywords as the news subevent;
Search the screened-out history media event subset for a history media event whose event similarity with the news subevent reaches the preset threshold.
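One way to picture the keyword screening step is an inverted index from keywords to history events. The sketch below is illustrative only: the function names are hypothetical, and treating "has the same keywords" as "shares at least one keyword" is an assumption, since the text does not fix the exact rule.

```python
from collections import defaultdict


def build_keyword_index(history_keywords: dict) -> dict:
    """Map each keyword to the ids of history events that contain it."""
    index = defaultdict(set)
    for event_id, keywords in history_keywords.items():
        for kw in keywords:
            index[kw].add(event_id)
    return index


def screen_candidates(sub_keywords: set, index: dict) -> set:
    """Return ids of history events sharing at least one keyword
    with the news subevent (the assumed screening rule)."""
    candidates = set()
    for kw in sub_keywords:
        candidates |= index.get(kw, set())
    return candidates
```

Screening first means the more expensive similarity calculation runs only against the candidate subset rather than the whole history media event set.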
Optionally, when the computer executable instructions stored on the storage medium are executed by a processor, searching the screened-out history media event subset for a history media event whose event similarity with the news subevent reaches the preset threshold includes:
Calculate the event similarity value between the news subevent and each history media event in the history media event subset;
Determine the history media event corresponding to an event similarity value that meets the preset condition as the history media event matching the news subevent.
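The two steps above (score every candidate, then pick the one that meets the condition) might look like the following sketch. Cosine similarity over keyword-weight vectors and the "highest score at or above the threshold" rule are assumptions chosen for illustration; the text only requires that the similarity be based on event keyword features and that a preset condition be met.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_history_event(sub_vec: dict, candidates: dict, threshold: float):
    """Score every candidate, then return the id of the highest-scoring one
    whose similarity reaches the threshold, or None if none does."""
    scores = {eid: cosine(sub_vec, vec) for eid, vec in candidates.items()}
    qualifying = {eid: s for eid, s in scores.items() if s >= threshold}
    return max(qualifying, key=qualifying.get) if qualifying else None
```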
Optionally, when the computer executable instructions stored on the storage medium are executed by a processor, calculating the event similarity value between the news subevent and each history media event in the history media event subset includes:
Calculate, by multiple similarity calculation threads, the event similarity value between each news subevent and each history media event in the history media event subset, where the multiple similarity calculation threads execute in parallel.
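A thread pool is one straightforward way to express the parallel similarity calculation. The sketch below assumes a Jaccard measure over keyword sets and a hypothetical `parallel_similarities` helper; neither comes from the source.

```python
from concurrent.futures import ThreadPoolExecutor


def jaccard(a: set, b: set) -> float:
    """Set-overlap similarity in [0, 1]; an assumed stand-in measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def parallel_similarities(subevents: dict, history: dict, workers: int = 4) -> dict:
    """Compute the similarity of every (subevent, history event) pair
    on a pool of worker threads and return {(sub_id, hist_id): value}."""
    pairs = [(s, h) for s in subevents for h in history]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = pool.map(
            lambda p: jaccard(subevents[p[0]], history[p[1]]), pairs
        )
        # map() yields results in input order, so zip keeps pairs aligned.
        return dict(zip(pairs, values))
```

In standard CPython, threads running pure-Python arithmetic are serialized by the global interpreter lock, so a real speedup would likely need a process pool or a similarity routine implemented in native code; the sketch shows only the structure of the parallel step.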
Optionally, when the computer executable instructions stored on the storage medium are executed by a processor, after the news belonging to the news subevent is merged into the found history media event to obtain the merged media event, the method further includes:
Judge whether the merged media event meets the condition for publication as a news special topic;
If so, determine the event information of the merged media event, where the event information includes the identification information of the merged media event, the popularity (heat) information of the merged media event, and the representative news of the merged media event;
Send the merged media event and the corresponding event information to the news demand side.
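The publication check and the assembly of the event information can be sketched as below. The passage does not define the publication condition, the heat measure, or how the representative news is chosen, so every rule in this example (minimum news count, heat as total views, most-viewed item as representative) and the function name are hypothetical.

```python
from typing import Optional


def build_event_info(event_id: str, news: list, min_news: int = 3) -> Optional[dict]:
    """Return the event information to send if the merged event qualifies
    for publication as a special topic, else None.

    Assumptions: `news` is a list of (news_id, view_count) pairs; an event
    qualifies when it has at least `min_news` items; heat is the total view
    count; the representative news is the most-viewed item.
    """
    if len(news) < min_news:
        return None
    representative = max(news, key=lambda item: item[1])[0]
    return {
        "event_id": event_id,
        "heat": sum(views for _, views in news),
        "representative_news": representative,
    }
```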
Optionally, when the computer executable instructions stored on the storage medium are executed by a processor, after the news belonging to the news subevent is merged into the found history media event to obtain the merged media event, the method further includes:
Update the mapping relations between the merged media event and the news belonging to that media event.
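The mapping update can be kept as a pair of dictionaries, one per lookup direction. The `EventNewsMapping` class below is a hypothetical sketch; it assumes each news item belongs to at most one media event, which the source does not state.

```python
from collections import defaultdict


class EventNewsMapping:
    """Bidirectional mapping between media events and their news."""

    def __init__(self):
        self.event_to_news = defaultdict(set)
        self.news_to_event = {}

    def merge(self, event_id: str, news_ids: list) -> None:
        """Record that the given news now belong to the merged event."""
        for news_id in news_ids:
            previous = self.news_to_event.get(news_id)
            if previous is not None and previous != event_id:
                # Drop the stale link so each news maps to one event only.
                self.event_to_news[previous].discard(news_id)
            self.event_to_news[event_id].add(news_id)
            self.news_to_event[news_id] = event_id
```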
When executed by a processor, the computer executable instructions stored on the storage medium provided by the embodiments of the present application cluster the news collected in a period of time to generate the individual news subevents for that period. Each news subevent is then merged with the history media event whose event similarity with that subevent reaches the set threshold, and the merged media event is taken as the generated media event. Because media events are merged continuously, each generation run only needs to cluster the news that newly occurred in a period of time. This reduces the clustering workload and shortens the time needed to generate media events, thereby ensuring that media events are generated in real time.
All the embodiments in this specification are described in a progressive manner. The same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for related details, refer to the description of the method embodiment.
The above description is only an example of the present application and is not intended to limit it. For those skilled in the art, various modifications and changes to this application are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of this application.

Claims (12)

1. A method for generating media events, characterized by comprising:
obtaining news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
clustering the collected news according to the news content corresponding to each news item, wherein the news grouped into one class constitutes one news subevent, and the news subevent describes the news of the same class among the news collected within the preset time period;
for each news subevent, searching a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the history media event and of the news subevent;
merging the news belonging to the news subevent into the found history media event to obtain a merged media event.
2. The method according to claim 1, characterized in that the searching, for each news subevent, of the history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold comprises:
extracting the keywords corresponding to each news subevent;
for each news subevent, screening out from the history media event set the history media event subset that has the same keywords as the news subevent;
searching the screened-out history media event subset for a history media event whose event similarity with the news subevent reaches the preset threshold.
3. The method according to claim 2, characterized in that the searching of the screened-out history media event subset for a history media event whose event similarity with the news subevent reaches the preset threshold comprises:
calculating the event similarity value between the news subevent and each history media event in the history media event subset;
comparing each event similarity value with the preset threshold to find the history media event corresponding to an event similarity value that reaches the preset threshold.
4. The method according to claim 3, characterized in that the calculating of the event similarity value between the news subevent and each history media event in the history media event subset comprises:
calculating, by multiple similarity calculation threads, the event similarity value between each news subevent and each history media event in the history media event subset, wherein the multiple similarity calculation threads execute in parallel.
5. The method according to any one of claims 1-4, characterized in that, after the merging of the news belonging to the news subevent into the found history media event to obtain the merged media event, the method further comprises:
judging whether the merged media event meets the condition for publication as a news special topic;
if so, determining the event information of the merged media event, wherein the event information includes the identification information of the merged media event, the popularity (heat) information of the merged media event, and the representative news of the merged media event;
sending the merged media event and the corresponding event information to the news demand side.
6. The method according to any one of claims 1-4, characterized in that, after the merging of the news belonging to the news subevent into the found history media event to obtain the merged media event, the method further comprises:
updating the mapping relations between the merged media event and the news belonging to that media event.
7. A device for generating media events, characterized by comprising:
an obtaining module, configured to obtain news data collected within a preset time period, wherein the news data includes the news content corresponding to each collected news item;
a clustering module, configured to cluster the collected news according to the news content corresponding to each news item, wherein the news grouped into one class constitutes one news subevent, and the news subevent describes the news of the same class among the news collected within the preset time period;
a searching module, configured to search, for each news subevent, a history media event set for a history media event whose event similarity with the news subevent reaches a preset threshold, wherein the event similarity is determined based on the event keyword features of the history media event and of the news subevent;
a merging module, configured to merge the news belonging to the news subevent into the found history media event to obtain a merged media event.
8. The device according to claim 7, characterized in that the searching module comprises:
an extraction unit, configured to extract the keywords corresponding to each news subevent;
a screening unit, configured to screen out from the history media event set, for each news subevent, the history media event subset that has the same keywords as the news subevent;
a first searching unit, configured to search the screened-out history media event subset for a history media event whose event similarity with the news subevent reaches the preset threshold.
9. The device according to claim 8, characterized in that the searching unit is specifically configured to:
calculate the event similarity value between the news subevent and each history media event in the history media event subset; and compare each event similarity value with the preset threshold to find the history media event corresponding to an event similarity value that reaches the preset threshold.
10. The device according to claim 9, characterized in that the searching unit is further specifically configured to:
calculate, by multiple similarity calculation threads, the event similarity value between each news subevent and each history media event in the history media event subset, wherein the multiple similarity calculation threads execute in parallel.
11. The device according to any one of claims 7-10, characterized in that the device further comprises:
a judgment module, configured to judge whether the merged media event meets the condition for publication as a news special topic;
a determining module, configured to determine the event information of the merged media event if the merged media event meets the condition for publication as a news special topic, wherein the event information includes the identification information of the merged media event, the popularity (heat) information of the merged media event, and the representative news of the merged media event;
a sending module, configured to send the merged media event and the corresponding event information to the news demand side.
12. The device according to any one of claims 7-10, characterized in that the device further comprises:
an update module, configured to update the mapping relations between the merged media event and the news belonging to that media event.
CN201810938137.XA 2018-08-17 2018-08-17 The generation method and device of media event Withdrawn CN109947935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810938137.XA CN109947935A (en) 2018-08-17 2018-08-17 The generation method and device of media event

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810938137.XA CN109947935A (en) 2018-08-17 2018-08-17 The generation method and device of media event

Publications (1)

Publication Number Publication Date
CN109947935A true CN109947935A (en) 2019-06-28

Family

ID=67005801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810938137.XA Withdrawn CN109947935A (en) 2018-08-17 2018-08-17 The generation method and device of media event

Country Status (1)

Country Link
CN (1) CN109947935A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990705A (en) * 2019-12-06 2020-04-10 腾讯科技(深圳)有限公司 News processing method, device, equipment and medium
CN111460289A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 News information pushing method and device
CN112307366A (en) * 2020-10-30 2021-02-02 北京字节跳动网络技术有限公司 Information display method and device and computer storage medium
CN113420153A (en) * 2021-08-23 2021-09-21 人民网科技(北京)有限公司 Topic making method, device and equipment based on topic library and event library

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101174273A (en) * 2007-12-04 2008-05-07 清华大学 News event detecting method based on metadata analysis
CN103020251A (en) * 2012-12-20 2013-04-03 人民搜索网络股份公司 Automatic mining system and method of news events in large-scale data
CN105787095A (en) * 2016-03-16 2016-07-20 广州索答信息科技有限公司 Automatic generation method and device for internet news
CN106021418A (en) * 2016-05-13 2016-10-12 北京奇虎科技有限公司 News event clustering method and device
CN107832444A (en) * 2017-11-21 2018-03-23 北京百度网讯科技有限公司 Event based on search daily record finds method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990705A (en) * 2019-12-06 2020-04-10 腾讯科技(深圳)有限公司 News processing method, device, equipment and medium
CN110990705B (en) * 2019-12-06 2024-04-12 深圳市雅阅科技有限公司 News processing method, device, equipment and medium
CN111460289A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 News information pushing method and device
CN111460289B (en) * 2020-03-27 2024-03-29 北京百度网讯科技有限公司 News information pushing method and device
CN112307366A (en) * 2020-10-30 2021-02-02 北京字节跳动网络技术有限公司 Information display method and device and computer storage medium
CN112307366B (en) * 2020-10-30 2023-09-19 抖音视界有限公司 Information display method and device and computer storage medium
CN113420153A (en) * 2021-08-23 2021-09-21 人民网科技(北京)有限公司 Topic making method, device and equipment based on topic library and event library

Similar Documents

Publication Publication Date Title
CN104182389B (en) A kind of big data analyzing business intelligence service system based on semanteme
Heymann et al. Can social bookmarking improve web search?
CN109947935A (en) The generation method and device of media event
CN103488680A (en) Combinators to build a search engine
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN105718590A (en) Multi-tenant oriented SaaS public opinion monitoring system and method
CN104182482B (en) A kind of news list page determination methods and the method for screening news list page
Gossen et al. iCrawl: Improving the freshness of web collections by integrating social web and focused web crawling
Rousseau A view on big data and its relation to Informetrics
Yao et al. Provenance-based indexing support in micro-blog platforms
CN102253939A (en) Searching method and system based on cloud computing technology
Gupta et al. A review on search engine optimization: Basics
CN106021418A (en) News event clustering method and device
Lee et al. An automatic topic ranking approach for event detection on microblogging messages
CN104298669A (en) Person geographic information mining model based on social network
Zobaed et al. Big Data in the Cloud.
Khodaei et al. Temporal-textual retrieval: Time and keyword search in web documents
Kumar et al. Design of a mobile Web crawler for hidden Web
US9092338B1 (en) Multi-level caching event lookup
Hurst et al. Social streams blog crawler
Huang et al. Design a batched information retrieval system based on a concept-lattice-like structure
Di Crescenzo et al. HERMEVENT: a news collection for emerging-event detection
Han et al. A real-time knowledge extracting system from social big data using distributed architecture
Mokbel et al. Microblogs data management systems: querying, analysis, and visualization
Wang et al. Efficiently identify local frequent keyword co-occurrence patterns in geo-tagged Twitter stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20190628)