CN105354186A - News event extraction method and system - Google Patents

Publication number
CN105354186A
Authority
CN
China
Prior art keywords
news
sentence
container
timestamp
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510749707.7A
Other languages
Chinese (zh)
Inventor
蒋昌俊
闫春钢
陈闳中
丁志军
吴亚光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201510749707.7A (CN105354186A)
Priority to PCT/CN2016/070992 (WO2017075912A1)
Publication of CN105354186A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing

Abstract

A news event extraction method and system are provided. The news event extraction method comprises: according to a query word, acquiring a news sentence set comprising the query word from a news corpus; for news sentences with accurate time, extracting the time thereof; classifying news sentences with same time into a same time stamp container; for each time stamp container, collecting statistics on frequency of occurrence of each word in news sentences in the time stamp container, and establishing a corresponding feature vector; for a news sentence without accurate time, and for different time stamp containers, establishing a phrase vector of same dimensions as the feature vectors of the time stamp containers, and calculating similarity between the phrase vector and the feature vectors of the time stamp containers; and if a greatest value of the calculated similarity is larger than a set threshold, adding the news sentence without accurate time to a time stamp container corresponding to the highest similarity. According to the method and system provided by the present invention, sentences without accurate time can be correctly classified.

Description

News event extraction method and system
Technical field
The present invention relates to data processing technology, and in particular to a news event extraction method and system.
Background
News reports are factual, fresh, important, and highly time-sensitive, and they convey a large amount of information in a short space. Because the Internet is open, online news is heterogeneous, redundant, and dynamically changing; information describing the same news item is usually scattered across different websites and expressed in different forms. News event extraction is one of the most important tools for quickly and accurately finding the information a user needs in this disorderly flood of data. Existing unsupervised news event extraction methods generally discard news sentences that contain no time and determine the importance of an event from the frequency of the extracted news events. A considerable share of news sentences omit the specific time and rely on the reader assuming that the news is recent, so existing extraction techniques cannot extract these news events. This easily biases the extraction of major events and reduces the accuracy of the event importance ranking.
In view of this, how to include news that contains no time in news event extraction, so as to reduce extraction bias, has become an urgent problem for those skilled in the art.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a news event extraction method and system that solve the prior-art problem of inaccurate event importance ranking caused by excluding news that contains no time from news event extraction.
To achieve the above object and other related objects, the present invention provides a news event extraction method comprising: obtaining, according to a query word, a set of news sentences containing the query word from a news corpus; dividing the set of news sentences into news sentences that contain an accurate time and news sentences that do not; for the news sentences that contain an accurate time, extracting the time; establishing multiple timestamp containers for the different times, and placing news sentences with the same time into the same timestamp container; for each timestamp container, counting the frequency of each word in its news sentences and establishing a corresponding feature vector; for a news sentence that does not contain an accurate time, establishing, for each timestamp container and from the segmented words of the sentence, a phrase vector with the same dimensions as that container's feature vector, and calculating the similarity between the phrase vector and the container's feature vector; and, if the maximum calculated similarity is greater than a set threshold, adding the news sentence that does not contain an accurate time to the timestamp container with the highest similarity.
Optionally, the similarity comprises cosine similarity.
Optionally, the news event extraction method further comprises: for each timestamp container, counting the number of sentences in the container that contain the query word.
Optionally, the news event extraction method further comprises: processing different query words according to the above news event extraction method, counting the number of sentences for the different query words in each timestamp container, and obtaining a ranking of the query words.
Optionally, the news event extraction method further comprises: modifying the threshold.
Optionally, the data in the news corpus are obtained by dividing the content of collected news documents into news sentences and storing the news sentences in the news corpus.
Optionally, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable; $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$; and $s.t$ denotes the time label of sentence $s$.
Optionally, the feature phrase consists of the words $w_j$ in the phrases related to $q$; the feature vector $\vec{W} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$ represents the document word frequency of each word $w_j$, where the document word frequency $F_{w_i}$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
Optionally, the phrase vector is $\vec{s}_i = (a_1, a_2, \ldots, a_k)$, the similarity comprises cosine similarity, and the cosine similarity is
$$\mathrm{Similarity}(\vec{s}_i, \vec{W}) = \frac{\sum_{i=1}^{k} a_i \times F_{w_i}}{\left(\sum_{i=1}^{k} a_i^{2}\right)^{1/2} \times \left(\sum_{i=1}^{k} F_{w_i}^{2}\right)^{1/2}}.$$
Optionally, the query word is determined according to a news event.
The present invention also provides a news event extraction system comprising: a news sentence acquisition module, configured to obtain, according to a query word, a set of news sentences containing the query word from a news corpus, and to divide the set into news sentences that contain an accurate time and news sentences that do not; a timed-news processing module, configured to extract the time from the news sentences that contain an accurate time, establish multiple timestamp containers for the different times, place news sentences with the same time into the same timestamp container, and, for each timestamp container, count the frequency of each word in its news sentences and establish a corresponding feature vector; and an untimed-news processing module, configured to establish, for each timestamp container and from the segmented words of a news sentence that does not contain an accurate time, a phrase vector with the same dimensions as that container's feature vector, calculate the similarity between the phrase vector and the container's feature vector, and, if the maximum calculated similarity is greater than a set threshold, add the sentence to the timestamp container with the highest similarity.
Optionally, the news sentence acquisition module is further configured to divide the content of collected news documents into news sentences and store the news sentences in the news corpus.
Optionally, the similarity comprises cosine similarity.
Optionally, the news event extraction system further comprises a news event statistics module, configured to count the number of sentences for the different query words in each timestamp container and obtain a ranking of the query words.
As described above, the news event extraction method and system of the present invention have the following beneficial effects: (1) sentences that contain no time element can be correctly classified, which makes the ranking of news event importance more accurate; (2) the number of extracted sentences is enriched, which makes the differences in importance between news events more pronounced; (3) the timestamp containers are used to reject unrelated sentences, which reduces interference from other news in the ranking.
Brief description of the drawings
Fig. 1 is a flow diagram of an embodiment of the news event extraction method of the present invention.
Fig. 2 is a flow diagram of the extraction process in an embodiment of the news event extraction method of the present invention.
Fig. 3 is a flow diagram of the classification of sentences without an accurate time in an embodiment of the news event extraction method of the present invention.
Fig. 4 is a module diagram of an embodiment of the news event extraction system of the present invention.
Description of reference numerals
1 news event extraction system
11 news sentence acquisition module
12 timed-news processing module
13 untimed-news processing module
S1 to S3 steps
Detailed description
The embodiments of the present invention are described below through specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention.
It should be noted that the drawings provided in this embodiment illustrate the basic concept of the invention only schematically. They show only the components related to the invention rather than the number, shape, and size of the components in an actual implementation; in practice, the form, number, and proportion of each component may vary freely, and the component layout may be more complex.
The present invention provides a news event extraction method. In one embodiment, as shown in Fig. 1, the news event extraction method comprises:
Step S1: according to a query word, obtain a set of news sentences containing the query word from a news corpus, and divide the set into news sentences that contain an accurate time and news sentences that do not. The data in the news corpus are obtained by dividing the content of collected news documents into news sentences and storing the news sentences in the news corpus. The query word can be determined according to a news event. The symbol "q" denotes a query word, the symbol "C" denotes a corpus, and the symbol "s" denotes a sentence. In one embodiment, the query word can be determined from the events that draw the most attention and the most mentions in reports from various sources.
Step S2: for the news sentences that contain an accurate time, extract the time; establish multiple timestamp containers for the different times, and place news sentences with the same time into the same timestamp container; for each timestamp container, count the frequency of each word in its news sentences and establish a corresponding feature vector. In one embodiment, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable; $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$; and $s.t$ denotes the time label of sentence $s$. The feature phrase consists of the words $w_j$ in the phrases related to $q$. The feature vector $\vec{W} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$ represents the document word frequency of each word $w_j$, where the document word frequency $F_{w_i}$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
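Step S2 can be illustrated with a minimal Python sketch. It assumes the sentences have already been segmented into words and tagged with a date string; the function names `build_containers` and `feature_vector` and the sample data are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict


def build_containers(dated_sentences):
    """Group (date, tokens) pairs into timestamp containers keyed by date."""
    containers = defaultdict(list)
    for date, tokens in dated_sentences:
        containers[date].append(tokens)
    return dict(containers)


def feature_vector(container):
    """Count how often each word occurs across all sentences of one container."""
    counts = Counter()
    for tokens in container:
        counts.update(tokens)
    return counts


# Illustrative, pre-segmented sentences with their extracted dates.
dated = [
    ("2013-05-12", ["earthquake", "rescue", "team"]),
    ("2013-05-12", ["earthquake", "relief"]),
    ("2013-05-13", ["earthquake", "aftershock"]),
]
containers = build_containers(dated)
features = {date: feature_vector(sents) for date, sents in containers.items()}
```

Storing the feature vector as a sparse word-to-count mapping is a design choice of this sketch; a dense vector of length k over a fixed vocabulary would express the same counts.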
Step S3: for a news sentence that does not contain an accurate time, establish, for each timestamp container and from the segmented words of the sentence, a phrase vector with the same dimensions as that container's feature vector, and calculate the similarity between the phrase vector and the container's feature vector; if the maximum calculated similarity is greater than a set threshold, add the sentence to the timestamp container with the highest similarity. The similarity comprises cosine similarity. The news event extraction method further comprises modifying the threshold; the user can modify the threshold according to actual conditions. In one embodiment, the phrase vector is $\vec{s}_i = (a_1, a_2, \ldots, a_k)$, and the cosine similarity is
$$\mathrm{Similarity}(\vec{s}_i, \vec{W}) = \frac{\sum_{i=1}^{k} a_i \times F_{w_i}}{\left(\sum_{i=1}^{k} a_i^{2}\right)^{1/2} \times \left(\sum_{i=1}^{k} F_{w_i}^{2}\right)^{1/2}}.$$
The maximum similarity is
$$\mathrm{MaxSimilarity}_{V_{t_i}} = \left\{ \max\left(\mathrm{Similarity}(\vec{s}_i, \vec{W})\right),\ s_i \in (V_{t_i}, V_{t_0}) \right\}.$$
If the maximum calculated similarity is greater than the set threshold, the news sentence that does not contain an accurate time is added to the timestamp container with the highest similarity. A sentence added to a timestamp container in this way is called a valid sentence; in its definition, the time of sentence $s$ is $t_i$, and the threshold is adjusted according to actual conditions. Each timestamp container may hold valid sentences, and all valid sentences of the timestamp container with time $t_i$ form its valid sentence set.
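The assignment in step S3 might look like the following Python sketch. It uses the standard cosine similarity, with the product of the two vector norms in the denominator, and assumes sparse word-count vectors. The names `cosine_similarity` and `assign_undated`, the sample containers, and the threshold of 0.5 are illustrative assumptions, not values from the patent.

```python
import math
from collections import Counter


def cosine_similarity(phrase, feature):
    """Standard cosine similarity between two sparse word-count vectors."""
    dot = sum(count * feature.get(word, 0) for word, count in phrase.items())
    norm_p = math.sqrt(sum(c * c for c in phrase.values()))
    norm_f = math.sqrt(sum(c * c for c in feature.values()))
    if norm_p == 0 or norm_f == 0:
        return 0.0
    return dot / (norm_p * norm_f)


def assign_undated(tokens, features, threshold):
    """Return the date of the best-matching container, or None if no
    container's similarity exceeds the threshold."""
    best_date, best_sim = None, 0.0
    for date, feature in features.items():
        # Restrict the phrase vector to the container's vocabulary so that
        # both vectors share the same dimensions.
        phrase = Counter(t for t in tokens if t in feature)
        sim = cosine_similarity(phrase, feature)
        if sim > best_sim:
            best_date, best_sim = date, sim
    return best_date if best_sim > threshold else None


# Hypothetical feature vectors of two timestamp containers.
features = {
    "2013-05-12": Counter({"earthquake": 2, "rescue": 1}),
    "2013-05-13": Counter({"election": 3}),
}
matched = assign_undated(["earthquake", "rescue", "continues"], features, 0.5)
unmatched = assign_undated(["weather"], features, 0.5)
```

In this example the first sentence scores 3/sqrt(10), about 0.95, against the first container and is assigned to it, while the second sentence shares no words with any container and stays unassigned.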
In one embodiment, the news event extraction method further comprises: for each timestamp container, counting the number of sentences in the container that contain the query word. In one embodiment, the news event extraction method further comprises: processing different query words according to the above news event extraction method, counting the number of sentences for the different query words in each timestamp container, and obtaining a ranking of the query words.
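The counting and ranking described above could be sketched as follows. The text does not say how the per-container counts are combined into one ranking, so summing them per query word is an assumption of this sketch; `count_sentences`, `rank_query_words`, and the sample data are hypothetical.

```python
def count_sentences(containers):
    """Number of sentences per timestamp container for one query word."""
    return {date: len(sentences) for date, sentences in containers.items()}


def rank_query_words(containers_by_query):
    """Rank query words by total sentence count, highest first (ties by name)."""
    totals = {
        query: sum(count_sentences(containers).values())
        for query, containers in containers_by_query.items()
    }
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Illustrative containers: query word -> date -> sentences.
example = {
    "earthquake": {"2013-05-12": ["s1", "s2", "s3"], "2013-05-13": ["s4"]},
    "election": {"2013-05-12": ["s5", "s6"]},
}
ranking = rank_query_words(example)
```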
In one embodiment, the overall framework of the news event extraction method is shown in Fig. 2, and the process of classifying sentences without an accurate time is shown in Fig. 3. The process is as follows. The collected news corpus is stored in a database in the form (title, time, content). The content of each document is then split into sentences at Chinese sentence-ending punctuation such as "。", "！", and "？", and the sentences are stored in the same form (title, time, content (sentence)). The sentences in the corpus fall into three classes:
1. Sentences containing a precise date: AD (Absolute Date) denotes a time expression that is complete and accurate to the day, such as 2010.10.1 or May 12, 2008, and can be converted directly into the form YYYY-MM-DD.
2. Sentences containing a date relative to the publication time: DCT-RD (date of creation, relative date) denotes an expression that carries no precise date itself; the date can be obtained from the document's publication time by semantic analysis and then converted into the form YYYY-MM-DD.
3. Sentences containing no precise date: UD (Underspecified Date) denotes an expression from which no precise date can be obtained and which cannot be converted into the form YYYY-MM-DD.
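Splitting document content into sentence-level records can be sketched as below, assuming the three terminators named in the text and a record layout of (title, time, sentence). The helper names and the sample document are hypothetical.

```python
import re

# A sentence is a run of non-terminator characters, optionally followed by
# one of the Chinese terminators named in the text: 。 ！ ？
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def split_sentences(content):
    """Split document content into sentences, keeping the end punctuation."""
    return [s.strip() for s in _SENTENCE.findall(content) if s.strip()]


def to_sentence_records(title, time, content):
    """Store each sentence in the same (title, time, sentence) form."""
    return [(title, time, sentence) for sentence in split_sentences(content)]


records = to_sentence_records(
    "地震快讯", "2013-05-12", "今天发生地震。救援已经开始！情况如何？"
)
```

Text after the last terminator is kept as a final sentence, so a document that does not end in punctuation loses no content.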
Sentence-level corpus data are then obtained with the query word, and the following algorithm extracts the sentence times and classifies the sentences by timestamp:
(1) Create a timestamp container $V_0$ for sentences without a precise date.
(2) Match each $s_i \in S(q|c)$ against regex (a regular expression for times) to obtain the precise date contained in sentence $s_i$. If no precise date exists, match $s_i$ against R-Words (for example "a year ago" or "a week later") to obtain DateDistance, the distance in time from the DCT. If DateDistance does not exist, put $s_i$ into $V_0$. If DateDistance exists, compute the date from DateDistance and the DCT (for example, if the report is dated May 12, 2013, then "a year ago" is May 12, 2012).
(3) If the timestamp container $V_t$ for date $t$ exists, put $s_i$ into $V_t$; if it does not exist, create $V_t$ and put $s_i$ into it.
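The date-resolution steps above can be sketched in Python under several assumptions: the absolute-date pattern covers only numeric forms such as 2010.10.1 or 2008年5月12日, the relative-word table `RELATIVE_WORDS` is a tiny hypothetical stand-in for R-Words, and DateDistance is modelled as a (years, days) offset from the DCT. None of these details are specified in the source.

```python
import re
from datetime import date, timedelta

# Absolute dates such as 2010.10.1, 2008-05-12 or 2008年5月12日.
ABSOLUTE_DATE = re.compile(r"(\d{4})[-./年](\d{1,2})[-./月](\d{1,2})")

# Hypothetical relative-word table: phrase -> (year offset, day offset).
RELATIVE_WORDS = {
    "a year ago": (-1, 0),
    "a week later": (0, 7),
    "yesterday": (0, -1),
}


def shift(dct, years, days):
    """Move the document creation time (DCT) by whole years and days."""
    try:
        base = dct.replace(year=dct.year + years)
    except ValueError:  # 29 February moved into a non-leap year
        base = dct.replace(year=dct.year + years, day=28)
    return base + timedelta(days=days)


def resolve_date(sentence, dct):
    """Return the sentence's date, or None when it stays underspecified."""
    match = ABSOLUTE_DATE.search(sentence)
    if match:
        year, month, day = map(int, match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            pass  # digits that do not form a real calendar date
    for phrase, (years, days) in RELATIVE_WORDS.items():
        if phrase in sentence:
            return shift(dct, years, days)
    return None


def classify(sentences, dct):
    """Fill timestamp containers keyed by YYYY-MM-DD; the rest go to V0."""
    containers, v0 = {}, []
    for sentence in sentences:
        resolved = resolve_date(sentence, dct)
        if resolved is None:
            v0.append(sentence)
        else:
            containers.setdefault(resolved.isoformat(), []).append(sentence)
    return containers, v0


dct = date(2013, 5, 12)
containers, v0 = classify(
    [
        "The quake struck a year ago.",
        "Talks were held on 2010.10.1.",
        "Officials gave no details.",
    ],
    dct,
)
```

With a report date of May 12, 2013, the sentence containing "a year ago" lands in the container for 2012-05-12, which matches the example in the text.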
The similarity between sentences and the feature phrase is then calculated. This serves two purposes: first, to place some of the sentences that do not contain an accurate time into the correct timestamp container; second, to remove from each timestamp container the sentences that are unrelated to its features. The algorithm steps are as follows:
(1) Segment all sentences $s_i \in V_t$ into words, count the frequency of each word $w_i$, and establish the feature vector $\vec{W}$.
(2) For each sentence $s_i \in V_t$, establish a vector $\vec{s}_i$ of dimension $k$ (the same dimension as the feature vector of the timestamp container).
(3) Calculate the cosine similarity between each sentence $s_i \in V_t$ and the feature vector $\vec{W}$.
(4) Find the sentence $s_w$ with the highest similarity to the feature vector and record that similarity as $\mathrm{MaxSimilarity} = \max\left(\mathrm{Similarity}(\vec{s}_i, \vec{W})\right)$.
(5) Set a threshold. For $s_i \in V_t$, if its similarity falls below the threshold, remove $s_i$ from $V_t$; for $s_i \in V_0$, if its similarity exceeds the threshold, put $s_i$ into $V_t$.
In practice, as a rule of thumb, when MaxSimilarity is closer to 1, a sentence whose similarity to the feature phrase differs considerably may still describe the same event, so the threshold can be set lower. When MaxSimilarity is far from 1, the threshold needs to be set higher so that rejection is more accurate. The similarity threshold is obtained through repeated trials and manual observation and can be modified as the situation requires.
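The rejection part of this algorithm, steps (1) to (5) applied to the sentences already in a container, can be sketched as follows. The sketch uses the standard cosine similarity with a fixed threshold, computes the feature vector once before pruning, and keeps sentences whose similarity equals the threshold; these choices, the helper names, and the sample data are assumptions, since the text leaves the exact comparison open.

```python
import math
from collections import Counter


def cosine(a, b):
    """Standard cosine similarity between two sparse word-count vectors."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_container(container, threshold):
    """Drop sentences whose similarity to the container's feature vector is
    below the threshold; also report MaxSimilarity for threshold tuning."""
    feature = Counter(token for sentence in container for token in sentence)
    sims = [cosine(Counter(sentence), feature) for sentence in container]
    max_similarity = max(sims, default=0.0)
    kept = [s for s, sim in zip(container, sims) if sim >= threshold]
    return kept, max_similarity


# Illustrative container with one unrelated sentence.
container = [
    ["earthquake", "rescue"],
    ["earthquake", "relief"],
    ["concert", "tickets"],
]
kept, max_similarity = prune_container(container, 0.6)
```

Here the two earthquake sentences score 0.75 and the unrelated sentence scores 0.5, so a threshold of 0.6 removes only the latter. Returning MaxSimilarity lets a caller raise or lower the threshold along the lines of the rule of thumb in the text.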
Finally, the number of sentences is counted. The events are sorted by the number of sentences corresponding to the query word, so that the events corresponding to the query word are shown on a timeline ranked by importance. For example, searching the database for documents whose text contains "Obama" returns 6418 records. Splitting these records into sentences yields 20468 distinct sentences containing "Obama". Extracting the sentence times then yields 3209 sentences with an accurate time. Comparing the times of these 3209 sentences gives 158 different timestamps, and the sentences are inserted into the corresponding timestamp containers. The same procedure is applied to the keyword "earthquake"; the results are shown in the table below. The table shows that correctly classifying sentences that contain no time element gives about 14.6% of the events, on average, a corrected rank.
Keyword                                Obama    Earthquake
Total number of events                 158      197
Number of events whose rank changed    22       30
Share of total                         13.9%    15.2%
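The shares in the table and the average of about 14.6% quoted in the text can be checked with a few lines of Python. The counts are taken from the table; the variable names are illustrative.

```python
# (total events, events whose rank changed) per keyword, from the table.
results = {"Obama": (158, 22), "Earthquake": (197, 30)}

# Share of events whose rank changed, as a fraction of all events.
shares = {key: changed / total for key, (total, changed) in results.items()}

# Unweighted mean of the two shares, as reported in the text.
average = sum(shares.values()) / len(shares)

percentages = {key: round(value * 100, 1) for key, value in shares.items()}
average_percentage = round(average * 100, 1)
```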
The present invention also provides a news event extraction system, which can use the news event extraction method described above. In one embodiment, as shown in Fig. 4, the news event extraction system 1 comprises a news sentence acquisition module 11, a timed-news processing module 12, and an untimed-news processing module 13, wherein:
The news sentence acquisition module 11 is configured to obtain, according to a query word, a set of news sentences containing the query word from a news corpus, and to divide the set into news sentences that contain an accurate time and news sentences that do not. The data in the news corpus are obtained by dividing the content of collected news documents into news sentences and storing the news sentences in the news corpus. The query word can be determined according to a news event. In one embodiment, the news sentence acquisition module 11 is further configured to divide the content of collected news documents into news sentences and store the news sentences in the news corpus.
The timed-news processing module 12 is connected to the news sentence acquisition module 11 and is configured to extract the time from the news sentences that contain an accurate time; establish multiple timestamp containers for the different times and place news sentences with the same time into the same timestamp container; and, for each timestamp container, count the frequency of each word in its news sentences and establish a corresponding feature vector. In one embodiment, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable; $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$; and $s.t$ denotes the time label of sentence $s$. The feature phrase consists of the words $w_j$ in the phrases related to $q$. The feature vector $\vec{W} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$ represents the document word frequency of each word $w_j$, where the document word frequency $F_{w_i}$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
The untimed-news processing module 13 is connected to the timed-news processing module 12 and the news sentence acquisition module 11 and is configured to establish, for each timestamp container and from the segmented words of a news sentence that does not contain an accurate time, a phrase vector with the same dimensions as that container's feature vector, and to calculate the similarity between the phrase vector and the container's feature vector; if the maximum calculated similarity is greater than a set threshold, the sentence is added to the timestamp container with the highest similarity. The similarity comprises cosine similarity. The threshold can be modified; the user can modify it according to actual conditions. In one embodiment, the phrase vector is $\vec{s}_i = (a_1, a_2, \ldots, a_k)$, and the cosine similarity is
$$\mathrm{Similarity}(\vec{s}_i, \vec{W}) = \frac{\sum_{i=1}^{k} a_i \times F_{w_i}}{\left(\sum_{i=1}^{k} a_i^{2}\right)^{1/2} \times \left(\sum_{i=1}^{k} F_{w_i}^{2}\right)^{1/2}}.$$
In one embodiment, the news event extraction system 1 further comprises a news event statistics module, configured to count the number of sentences for the different query words in each timestamp container and obtain a ranking of the query words. The news event statistics module is also configured to count, for each timestamp container, the number of sentences in the container that contain the query word.
In summary, the news event extraction method and system of the present invention can correctly classify sentences that contain no time element, placing sentences that express a news event but contain no time into the correct timestamp container. This increases the number of extracted news events and improves the accuracy of the event importance ranking. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial utility value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (10)

1. A news event extraction method, characterized in that the news event extraction method comprises:
obtaining, according to a query word, a set of news sentences containing the query word from a news corpus; dividing the set of news sentences into news sentences that contain an accurate time and news sentences that do not;
for the news sentences that contain an accurate time, extracting the time; establishing multiple timestamp containers for the different times, and placing news sentences with the same time into the same timestamp container; for each timestamp container, counting the frequency of each word in its news sentences and establishing a corresponding feature vector;
for a news sentence that does not contain an accurate time, establishing, for each timestamp container and from the segmented words of the sentence, a phrase vector with the same dimensions as that container's feature vector, and calculating the similarity between the phrase vector and the container's feature vector; if the maximum calculated similarity is greater than a set threshold, adding the news sentence that does not contain an accurate time to the timestamp container with the highest similarity.
2. The news event extraction method according to claim 1, characterized in that the similarity comprises cosine similarity.
3. The news event extraction method according to claim 1, characterized in that the news event extraction method further comprises: processing different query words according to the above news event extraction method, counting the number of sentences for the different query words in each timestamp container, and obtaining a ranking of the query words.
4. The news event extraction method according to claim 1, characterized in that the news event extraction method further comprises: modifying the threshold.
5. The news event extraction method according to claim 1, characterized in that the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable; $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$; and $s.t$ denotes the time label of sentence $s$.
6. The news event extraction method according to claim 5, characterized in that the feature phrase consists of the words $w_j$ in the phrases related to $q$; the feature vector $\vec{W} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$ represents the document word frequency of each word $w_j$, where the document word frequency $F_{w_i}$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
7. The news event extraction method according to claim 6, characterized in that the phrase vector is $\vec{s}_i = (a_1, a_2, \ldots, a_k)$, the similarity comprises cosine similarity, and the cosine similarity is
$$\mathrm{Similarity}(\vec{s}_i, \vec{W}) = \frac{\sum_{i=1}^{k} a_i \times F_{w_i}}{\left(\sum_{i=1}^{k} a_i^{2}\right)^{1/2} \times \left(\sum_{i=1}^{k} F_{w_i}^{2}\right)^{1/2}}.$$
8. A news event extraction system, characterized in that the news event extraction system comprises:
a news sentence acquisition module, configured to obtain, according to a query word, a set of news sentences containing the query word from a news corpus, and to divide the set of news sentences into news sentences that contain an accurate time and news sentences that do not;
a timed-news processing module, configured to extract the time from the news sentences that contain an accurate time; establish multiple timestamp containers for the different times and place news sentences with the same time into the same timestamp container; and, for each timestamp container, count the frequency of each word in its news sentences and establish a corresponding feature vector;
an untimed-news processing module, configured to establish, for each timestamp container and from the segmented words of a news sentence that does not contain an accurate time, a phrase vector with the same dimensions as that container's feature vector, and to calculate the similarity between the phrase vector and the container's feature vector; if the maximum calculated similarity is greater than a set threshold, the news sentence that does not contain an accurate time is added to the timestamp container with the highest similarity.
9. The news event extraction system according to claim 8, characterized in that the news sentence acquisition module is further configured to divide the content of collected news documents into news sentences and store the news sentences in the news corpus.
10. The news event extraction system according to claim 8, characterized in that the news event extraction system further comprises a news event statistics module, configured to count the number of sentences for the different query words in each timestamp container and obtain a ranking of the query words.
CN201510749707.7A 2015-11-05 2015-11-05 News event extraction method and system Pending CN105354186A (en)

Priority Applications (2)

Application Number                   Priority Date    Filing Date    Title
CN201510749707.7A (CN105354186A)     2015-11-05       2015-11-05     News event extraction method and system
PCT/CN2016/070992 (WO2017075912A1)   2015-11-05       2016-01-15     News events extracting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510749707.7A CN105354186A (en) 2015-11-05 2015-11-05 News event extraction method and system

Publications (1)

Publication Number Publication Date
CN105354186A true CN105354186A (en) 2016-02-24

Family

ID=55330160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510749707.7A Pending CN105354186A (en) 2015-11-05 2015-11-05 News event extraction method and system

Country Status (2)

Country Link
CN (1) CN105354186A (en)
WO (1) WO2017075912A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990437A (en) * 2019-12-05 2020-04-10 大众问问(北京)信息科技有限公司 Data fusion method and device and computer equipment

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241438B (en) * 2018-09-27 2022-06-24 国家计算机网络与信息安全管理中心 Element-based cross-channel hot event discovery method and device and storage medium
CN110162651B (en) * 2019-04-23 2023-07-14 南京邮电大学 News content image-text disagreement identification system and identification method based on semantic content abstract
CN112199601B (en) * 2020-11-09 2022-11-08 中国电子科技集团公司第二十八研究所 News recommendation method based on event popularity of mass news data
CN112528625B (en) * 2020-12-11 2024-02-23 北京百度网讯科技有限公司 Event extraction method, device, computer equipment and readable storage medium
CN112765950A (en) * 2021-01-08 2021-05-07 首都师范大学 Template library generation method and system based on cosine similarity and storage medium
CN114880588A (en) * 2022-06-13 2022-08-09 四川封面传媒科技有限责任公司 News popularity prediction method based on knowledge graph

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101174273A (en) * 2007-12-04 2008-05-07 清华大学 News event detecting method based on metadata analysis
CN101872343A (en) * 2009-04-24 2010-10-27 罗彤 Semi-supervised mass data hierarchy classification method
CN103020251A (en) * 2012-12-20 2013-04-03 人民搜索网络股份公司 Automatic mining system and method of news events in large-scale data
CN103020159A (en) * 2012-11-26 2013-04-03 百度在线网络技术(北京)有限公司 Method and device for news presentation facing events
CN103164427A (en) * 2011-12-13 2013-06-19 中国移动通信集团公司 Method and device of news aggregation
JP5238339B2 (en) * 2008-04-24 2013-07-17 日本放送協会 Receiver and program for digital broadcasting
US20140156642A1 (en) * 2012-12-04 2014-06-05 At&T Intellectual Property I, L.P. Generating And Using Temporal Metadata Partitions
US20140181131A1 (en) * 2012-12-26 2014-06-26 David Ross Timeline wrinkling system and method
CN104077391A (en) * 2014-06-30 2014-10-01 北京奇虎科技有限公司 Method, server, client and system for providing special news search
CN104090918A (en) * 2014-06-16 2014-10-08 北京理工大学 Sentence similarity calculation method based on information amount
CN104318242A (en) * 2014-10-08 2015-01-28 中国人民解放军空军工程大学 High-efficiency SVM active half-supervision learning algorithm
CN104408093A (en) * 2014-11-14 2015-03-11 中国科学院计算技术研究所 News event element extracting method and device

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
HECTOR LLORENS 等: "TimeML Events Recognition and Classification:Learning CRF Models with Semantic Roles", 《PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS (COLING 2010)》 *
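仲兆满 et al.: "An efficient method for extracting the publication time of Web news", 《小型微型计算机系统》 *
夏威 et al.: "A news event extraction method based on a Markov model", 《桂林电子科技大学学报》 *
张先飞 et al.: "Extraction of temporal information in event detection and description", 《情报学报》 *
林静 et al.: "Automatic TIMEX2 tagging of Chinese temporal information", 《清华大学学报(自然科学版)》 *
梁军 et al.: "HLA optimistic time synchronization based on time uncertainty", 《计算机工程》 *
索红光 et al.: "Timestamp-based automatic multi-document summarization", 《计算机工程》 *
蔡华利 et al.: "Analysis and extraction of temporal information in Web news on emergencies", 《计算机工程与应用》 *
黄卿 et al.: "Timestamp protocol processing based on event order", 《计算机工程》 *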

Also Published As

Publication number Publication date
WO2017075912A1 (en) 2017-05-11

Similar Documents

Publication Publication Date Title
CN105354186A (en) News event extraction method and system
Trupthi et al. Sentiment analysis on twitter using streaming API
CN106874378B (en) Method for constructing knowledge graph based on entity extraction and relation mining of rule model
CN106599054B (en) Method and system for classifying and pushing questions
CN106651696B (en) Approximate question pushing method and system
CN103279478B (en) A kind of based on distributed mutual information file characteristics extracting method
CN106951438A (en) A kind of event extraction system and method towards open field
CN103605658B (en) A kind of search engine system analyzed based on text emotion
CN104408093A (en) News event element extracting method and device
CN106055538A (en) Automatic extraction method for text labels in combination with theme model and semantic analyses
CN104199972A (en) Named entity relation extraction and construction method based on deep learning
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN104881458A (en) Labeling method and device for web page topics
CN104077417A (en) Figure tag recommendation method and system in social network
CN104484431A (en) Multi-source individualized news webpage recommending method based on field body
CN104484380A (en) Personalized search method and personalized search device
CN106844482B (en) Search engine-based retrieval information matching method and device
CN107239564A (en) A kind of text label based on supervision topic model recommends method
CN112347352A (en) Course recommendation method and device and storage medium
CN106681985A (en) Establishment system of multi-field dictionaries based on theme automatic matching
CN106294786A (en) A kind of code search method and system
AU2018100678A4 (en) News events extracting method and system
Sitorus et al. Sensing trending topics in twitter for greater Jakarta area
CN106776724B (en) Question classification method and system
CN114077705A (en) Method and system for portraying media account on social platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160224