CN105354186A - News event extraction method and system - Google Patents
- Publication number
- CN105354186A (application number CN201510749707.7A)
- Authority
- CN
- China
- Prior art keywords
- news
- sentence
- container
- timestamp
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
Abstract
A news event extraction method and system are provided. The news event extraction method comprises: according to a query word, acquiring a news sentence set comprising the query word from a news corpus; for news sentences with accurate time, extracting the time thereof; classifying news sentences with same time into a same time stamp container; for each time stamp container, collecting statistics on frequency of occurrence of each word in news sentences in the time stamp container, and establishing a corresponding feature vector; for a news sentence without accurate time, and for different time stamp containers, establishing a phrase vector of same dimensions as the feature vectors of the time stamp containers, and calculating similarity between the phrase vector and the feature vectors of the time stamp containers; and if a greatest value of the calculated similarity is larger than a set threshold, adding the news sentence without accurate time to a time stamp container corresponding to the highest similarity. According to the method and system provided by the present invention, sentences without accurate time can be correctly classified.
Description
Technical field
The present invention relates to data processing technology, and in particular to a news event extraction method and system.
Background
News reports are truthful, fresh, important and highly time-sensitive, and can convey a large amount of information in a short space. Because the Internet is open, online news is heterogeneous, redundant and constantly changing: the information describing a single news item is usually scattered across different websites and presented in different forms. News event extraction is one of the most important tools for finding the information a user needs quickly and accurately in this disorderly flood of data. Existing unsupervised news event extraction methods generally discard news sentences that contain no time and determine the importance of an event from the frequency of the extracted news events. A considerable share of news sentences, however, omit a specific time because they assume by default that the news is recent. Such news events cannot be extracted by existing techniques, which easily biases the extraction of major events and reduces the accuracy of the event importance ranking.
In view of this, how to include news without a time during news event extraction, so as to reduce extraction bias, has become a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide a news event extraction method and system that solve the prior-art problem of inaccurate event importance ranking caused by excluding news without a time during news event extraction.
To achieve the above and other related objects, the present invention provides a news event extraction method comprising: obtaining, according to a query word, a set of news sentences containing the query word from a news corpus; dividing the set into news sentences containing an accurate time and news sentences not containing an accurate time; for the news sentences containing an accurate time, extracting the time; establishing a timestamp container for each distinct time and placing news sentences with the same time in the same timestamp container; for each timestamp container, counting the frequency of each word in its news sentences and establishing a corresponding feature vector; for each news sentence not containing an accurate time, establishing from its segmented words, for each timestamp container in turn, a phrase vector with the same dimension as that container's feature vector, and calculating the similarity between the phrase vector and the feature vector; and, if the maximum calculated similarity is greater than a set threshold, adding the news sentence to the timestamp container with the highest similarity.
Optionally, the similarity comprises cosine similarity.
Optionally, the news event extraction method further comprises: for each timestamp container, counting the number of sentences in the container that contain the query word.
Optionally, the news event extraction method further comprises: processing different query words according to the above news event extraction method, counting the number of sentences for each query word in each timestamp container, and obtaining a ranking of the query words.
Optionally, the news event extraction method further comprises: modifying the threshold.
Optionally, the data in the news corpus is obtained by splitting the content of collected news documents into news sentences and storing the news sentences in the news corpus.
Optionally, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable, $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$, and $s.t$ denotes the time label of sentence $s$.
Optionally, the feature phrase is $W = \{w_1, w_2, \dots, w_k\}$, where $w_j$ denotes each word in the phrases related to $q$; the feature vector $\vec{F} = (tf_1, tf_2, \dots, tf_k)$ gives the document term frequency of each word $w_j$, where the document term frequency $tf_i$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
Optionally, the phrase vector is $\vec{s} = (s_1, s_2, \dots, s_k)$, the similarity comprises cosine similarity, and the cosine similarity is $\cos(\vec{s}, \vec{F}) = \dfrac{\sum_{j=1}^{k} s_j\, tf_j}{\sqrt{\sum_{j=1}^{k} s_j^2}\,\sqrt{\sum_{j=1}^{k} tf_j^2}}$.
Optionally, the query word is determined according to a news event.
The present invention also provides a news event extraction system comprising: a news sentence acquisition module for obtaining, according to a query word, a set of news sentences containing the query word from a news corpus, and dividing the set into news sentences containing an accurate time and news sentences not containing an accurate time; a timed news processing module for extracting the time from the news sentences containing an accurate time, establishing a timestamp container for each distinct time, placing news sentences with the same time in the same timestamp container, and, for each timestamp container, counting the frequency of each word in its news sentences and establishing a corresponding feature vector; and an untimed news processing module for establishing, from the segmented words of each news sentence not containing an accurate time and for each timestamp container in turn, a phrase vector with the same dimension as that container's feature vector, calculating the similarity between the phrase vector and the feature vector, and, if the maximum calculated similarity is greater than a set threshold, adding the news sentence to the timestamp container with the highest similarity.
Optionally, the news sentence acquisition module is further used for splitting the content of collected news documents into news sentences and storing the news sentences in the news corpus.
Optionally, the similarity comprises cosine similarity.
Optionally, the news event extraction system further comprises a news event statistics module for counting the number of sentences for each query word in each timestamp container and obtaining a ranking of the query words.
As described above, the news event extraction method and system of the present invention have the following beneficial effects: (1) sentences that contain no time element can be correctly classified, making the ranking of news event importance more accurate; (2) the number of extracted sentences is increased, making the differences in importance between news events more pronounced; (3) the timestamp containers are used to reject unrelated sentences, reducing interference from other news when ranking hot topics.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the news event extraction method of the present invention.
Fig. 2 is a flow chart of the extraction process in an embodiment of the news event extraction method of the present invention.
Fig. 3 is a flow chart of classifying sentences that contain no accurate time in an embodiment of the news event extraction method of the present invention.
Fig. 4 is a block diagram of an embodiment of the news event extraction system of the present invention.
Description of reference numerals
1 news event extraction system
11 news sentence acquisition module
12 timed news processing module
13 untimed news processing module
S1 ~ S3 steps
Detailed description
Embodiments of the present invention are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. The present invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the present invention.
It should be noted that the drawings provided with this embodiment illustrate the basic concept of the present invention only schematically. They show only the components related to the present invention rather than the number, shape and size of components in an actual implementation; in an actual implementation the form, quantity and proportion of each component may vary freely, and the component layout may be more complex.
The present invention provides a news event extraction method. In one embodiment, as shown in Fig. 1, the news event extraction method comprises:
Step S1: according to a query word, obtain from a news corpus the set of news sentences containing the query word, and divide the set into news sentences containing an accurate time and news sentences not containing an accurate time. The data in the news corpus is obtained by splitting the content of collected news documents into news sentences and storing those sentences in the news corpus. The query word can be determined according to a news event. The symbol "q" can denote a query word, the symbol "C" a corpus, and the symbol "s" a sentence. In one embodiment, the query word can be determined from the events that attract high attention and are mentioned most often in reports from all sources.
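Step S1 can be illustrated with a minimal sketch. It assumes the corpus is already a list of sentence strings, that a plain substring test stands in for query matching, and that an ISO-style `YYYY-MM-DD` pattern stands in for "accurate time"; the names `DATE_PATTERN` and `select_and_partition` are hypothetical and do not come from the patent.

```python
import re

# Assumption: an ISO-style date marks a sentence as "containing an accurate time".
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

def select_and_partition(corpus, query):
    """Return (dated, undated) sentences from `corpus` that contain `query`."""
    matched = [s for s in corpus if query in s]
    dated = [s for s in matched if DATE_PATTERN.search(s)]
    undated = [s for s in matched if not DATE_PATTERN.search(s)]
    return dated, undated

corpus = [
    "On 2013-05-12 an earthquake struck the region.",
    "Rescue teams reached the earthquake zone.",
    "The election results were announced on 2013-05-13.",
]
dated, undated = select_and_partition(corpus, "earthquake")
```

Here `dated` holds the first sentence and `undated` the second; the third sentence is dropped because it does not contain the query word.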
Step S2: for the news sentences containing an accurate time, extract the time; establish a timestamp container for each distinct time and place news sentences with the same time in the same timestamp container; for each timestamp container, count the frequency of each word in its news sentences and establish a corresponding feature vector. In one embodiment, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable, $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$, and $s.t$ denotes the time label of sentence $s$. The feature phrase is $W = \{w_1, w_2, \dots, w_k\}$, where $w_j$ denotes each word in the phrases related to $q$; the feature vector $\vec{F} = (tf_1, tf_2, \dots, tf_k)$ gives the document term frequency of each word $w_j$, where the document term frequency $tf_i$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
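The container and feature-vector construction of step S2 might look like the sketch below. It assumes sentences arrive already segmented into tokens and already labelled with a date string, and it normalizes counts to relative frequencies, which is one plausible reading of "document term frequency" rather than something the patent text confirms. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_containers(dated_sentences):
    """Group (date, tokens) pairs into timestamp containers keyed by date."""
    containers = defaultdict(list)
    for date, tokens in dated_sentences:
        containers[date].append(tokens)
    return dict(containers)

def feature_vector(container):
    """Return (vocabulary, relative term frequencies) for one container."""
    counts = Counter(tok for tokens in container for tok in tokens)
    vocab = sorted(counts)            # fixed word order defines the k dimensions
    total = sum(counts.values())
    return vocab, [counts[w] / total for w in vocab]

sentences = [
    ("2013-05-12", ["earthquake", "hits", "city"]),
    ("2013-05-12", ["earthquake", "rescue"]),
    ("2013-05-13", ["rescue", "continues"]),
]
containers = build_containers(sentences)
vocab, tf = feature_vector(containers["2013-05-12"])
```

For the 2013-05-12 container the vocabulary has four words and "earthquake" accounts for two of the five tokens. Because cosine similarity is scale-invariant, using raw counts instead of relative frequencies would give the same similarities later on.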
Step S3: for each news sentence not containing an accurate time, establish from its segmented words, for each timestamp container in turn, a phrase vector with the same dimension as that container's feature vector, and calculate the similarity between the phrase vector and the feature vector; if the maximum calculated similarity is greater than the set threshold, add the sentence to the timestamp container with the highest similarity. The similarity comprises cosine similarity. The news event extraction method further comprises modifying the threshold; a user can modify the threshold according to actual conditions. In one embodiment, the phrase vector is $\vec{s} = (s_1, s_2, \dots, s_k)$, the similarity comprises cosine similarity, and the cosine similarity is $\cos(\vec{s}, \vec{F}) = \dfrac{\sum_{j=1}^{k} s_j\, tf_j}{\sqrt{\sum_{j=1}^{k} s_j^2}\,\sqrt{\sum_{j=1}^{k} tf_j^2}}$.
The maximum similarity is $MaxSimilarity(s) = \max_{t_i} \cos(\vec{s}, \vec{F}_{t_i})$, where $\vec{F}_{t_i}$ is the feature vector of the timestamp container $V_{t_i}$.
If the maximum calculated similarity is greater than the set threshold, the news sentence not containing an accurate time is added to the timestamp container with the highest similarity. A sentence added to a timestamp container in this way is called an effective sentence: for an effective sentence, $s.t = t_i$ (the time of sentence $s$ is $t_i$) and $MaxSimilarity(s) > \theta$, where $\theta$ is the threshold adjusted according to actual conditions. Each timestamp container may hold effective sentences; all effective sentences of the timestamp container with time $t_i$ are called the effective sentence set of that container.
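The assignment rule of step S3 can be sketched as follows. The sketch assumes each container is summarized as a `(vocab, feature)` pair, builds the phrase vector by counting the sentence's tokens over that container's vocabulary, and returns `None` when no container exceeds the threshold. The names `cosine` and `assign` and the sample numbers are illustrative assumptions.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign(tokens, container_features, threshold):
    """Return (best_date or None, max_similarity) for an undated sentence."""
    counts = Counter(tokens)
    best_date, best_sim = None, 0.0
    for date, (vocab, feature) in container_features.items():
        phrase = [counts[w] for w in vocab]   # same dimension as `feature`
        sim = cosine(phrase, feature)
        if sim > best_sim:
            best_date, best_sim = date, sim
    if best_sim > threshold:
        return best_date, best_sim
    return None, best_sim

features = {
    "2013-05-12": (["city", "earthquake", "rescue"], [0.2, 0.6, 0.2]),
    "2013-05-13": (["election", "vote"], [0.5, 0.5]),
}
date, sim = assign(["earthquake", "rescue", "teams"], features, threshold=0.5)
```

With these sample values the sentence is assigned to the 2013-05-12 container; raising the threshold to 0.9 leaves it unassigned, which mirrors how the threshold controls how many undated sentences are admitted.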
In one embodiment, the news event extraction method further comprises: for each timestamp container, counting the number of sentences in the container that contain the query word. In one embodiment, the method further comprises: processing different query words according to the above news event extraction method, counting the number of sentences for each query word in each timestamp container, and obtaining a ranking of the query words.
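The counting and ranking step can be sketched as below. It assumes that an "event" is one (query word, timestamp) pair and that importance is simply the number of sentences in the corresponding container; the text does not spell out the ranking rule in more detail, so `rank_events` is a hypothetical helper under that reading.

```python
def rank_events(containers_by_query):
    """Rank (query, date) events by the number of sentences in their container."""
    events = [
        (query, date, len(sentences))
        for query, containers in containers_by_query.items()
        for date, sentences in containers.items()
    ]
    # Largest containers first; ties keep their original order (sorted is stable).
    return sorted(events, key=lambda event: event[2], reverse=True)

containers_by_query = {
    "Obama": {"2013-05-12": ["s1", "s2", "s3"], "2013-05-13": ["s4"]},
    "earthquake": {"2013-05-12": ["s5", "s6"]},
}
ranking = rank_events(containers_by_query)
```

In this toy input the "Obama" event on 2013-05-12 ranks first with three sentences, followed by the "earthquake" event with two.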
In one embodiment, the overall framework of the news event extraction method is shown in Fig. 2, and the process of classifying sentences that contain no accurate time is shown in Fig. 3. The process is as follows. The collected news corpus is stored in a database in the form (title, time, content). The content of each document is then split into sentences at Chinese sentence terminators such as "。", "！" and "？", and the sentences are likewise stored in the form (title, time, content (sentence)). Sentences in the corpus fall into three classes:
1. Sentences containing an exact date. AD (Absolute Date) denotes a time expression that is complete and accurate to the day, such as "2010.10.1" or "May 12, 2008", and can be converted directly to the form YYYY-MM-DD.
2. Sentences containing a date relative to the publication time. DCT-RD (date of creation, relative date) denotes an expression that does not itself carry an exact date but whose date can be obtained from the document's publication time through semantic analysis and then converted to the form YYYY-MM-DD.
3. Sentences not containing an exact date. UD (Underspecified Date) denotes an expression from which no exact date can be obtained and which cannot be converted to the form YYYY-MM-DD.
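The sentence-splitting step described above can be sketched like this, assuming the terminators are the full-width Chinese marks 。, ！ and ？ and that each record is a plain `(title, time, sentence)` tuple. The names `SENTENCE` and `to_sentence_records` are hypothetical.

```python
import re

# One or more non-terminator characters followed by an optional terminator.
SENTENCE = re.compile(r"[^。！？]+[。！？]?")

def to_sentence_records(title, time, content):
    """Split `content` into sentences and keep the document's title and time."""
    return [
        (title, time, sentence.strip())
        for sentence in SENTENCE.findall(content)
        if sentence.strip()
    ]

records = to_sentence_records("标题", "2013-05-12", "地震发生。救援开始！情况如何？")
```

Each record keeps the publication time of its source document, which is what later allows relative expressions to be resolved against the document creation time.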
Sentence-level corpus data is then retrieved with the query word, and the following algorithm extracts each sentence's time and classifies the sentences by timestamp:
(1) Create the timestamp container $V_0$ for sentences without an exact date.
(2) Match each $s_i \in S(q|c)$ against regex (a regular expression for times) to obtain $t_{s_i}$ (the exact date contained in sentence $s_i$). If $t_{s_i}$ does not exist, match $s_i$ against R-Words (for example "the year before" or "a week later") to obtain DateDistance (the distance of the date from the DCT). If DateDistance does not exist, put $s_i$ into $V_0$. If DateDistance exists, compute $t_{s_i}$ from DateDistance and the DCT (for example, if the report is dated May 12, 2013, then "the year before" is May 12, 2012).
(3) If $V_{t_{s_i}}$ exists ($V_t$ denotes the timestamp container whose date is $t$), put $s_i$ into $V_{t_{s_i}}$; if $V_{t_{s_i}}$ does not exist, create $V_{t_{s_i}}$ and put $s_i$ into it.
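The time-extraction algorithm above can be sketched as follows. The regular expression and the relative-word table are illustrative assumptions only (the patent does not list its actual patterns or R-Words), and the container for undated sentences is represented by the key `None`.

```python
import re
from collections import defaultdict
from datetime import date, timedelta

# Assumed absolute-date pattern: 2010.10.1, 2010-10-01 or 2008年5月12日.
ABSOLUTE = re.compile(r"(\d{4})[.\-年](\d{1,2})[.\-月](\d{1,2})日?")

# Hypothetical R-Words table: phrase -> function mapping the DCT to a date.
# Note: replace(year=...) would fail for a DCT of February 29.
RELATIVE = {
    "the year before": lambda dct: dct.replace(year=dct.year - 1),
    "yesterday": lambda dct: dct - timedelta(days=1),
}

def resolve_date(sentence, dct):
    """Return the sentence's date, or None if it is underspecified (UD)."""
    match = ABSOLUTE.search(sentence)
    if match:                                   # AD: absolute date
        year, month, day = map(int, match.groups())
        return date(year, month, day)
    for phrase, shift in RELATIVE.items():      # DCT-RD: relative to creation time
        if phrase in sentence:
            return shift(dct)
    return None

def bucket(sentences_with_dct):
    """Place each (sentence, dct) pair in its timestamp container; key None is V_0."""
    containers = defaultdict(list)
    for sentence, dct in sentences_with_dct:
        containers[resolve_date(sentence, dct)].append(sentence)
    return containers

dct = date(2013, 5, 12)
containers = bucket([
    ("The quake struck on 2010.10.1 at noon.", dct),
    ("It happened the year before.", dct),
    ("Rescue work continues.", dct),
])
```

Calling `isoformat()` on a resolved date gives the YYYY-MM-DD form used in the text. A production version would need to validate matched dates, since `date(...)` raises `ValueError` for impossible values.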
The similarity between sentences and the feature phrase is then calculated. This has two purposes: first, to place some of the sentences without an accurate time into the correct timestamp container; second, to reject sentences in each timestamp container that are unrelated to its features. The algorithm is as follows:
(1) Segment all sentences $s_i \in V_t$ into words, count the frequency $tf_i$ with which each word $w_i$ occurs, and establish the feature vector $\vec{F}_t = (tf_1, tf_2, \dots, tf_k)$.
(2) For each sentence $s_i \in V_t$, establish a vector $\vec{s}_i$ of dimension $k$ (the same as the dimension of the timestamp container's feature vector).
(3) Calculate the cosine similarity $\cos(\vec{s}_i, \vec{F}_t)$ between each sentence $s_i \in V_t$ and the feature vector $\vec{F}_t$.
(4) Find the sentence $s_w$ with the highest similarity to the feature vector and denote that similarity MaxSimilarity.
(5) Set a threshold $\theta$. For $s_i \in V_t$, if $\cos(\vec{s}_i, \vec{F}_t) < \theta$, remove $s_i$ from $V_t$; for $s_i \in V_0$, if $\cos(\vec{s}_i, \vec{F}_t) > \theta$, put $s_i$ into $V_t$.
In practice, experience shows that when MaxSimilarity is close to 1, a sentence whose similarity to the feature phrase differs considerably may still describe the same event, so the threshold can be set lower; when MaxSimilarity is far from 1, the threshold needs to be set higher so that rejection is more accurate. The similarity threshold is obtained through repeated trials and manual observation and can be modified as circumstances require.
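The rejection half of the algorithm, removing sentences that are unrelated to a container's features, can be sketched as below. The sketch assumes tokenized sentences, raw counts for the feature vector (cosine similarity is unaffected by scaling), and a fixed threshold chosen by hand; `prune_container` is a hypothetical name.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prune_container(container, threshold):
    """Split a container's sentences into (kept, removed) by similarity to its features."""
    counts = Counter(tok for tokens in container for tok in tokens)
    vocab = sorted(counts)
    feature = [counts[w] for w in vocab]
    kept, removed = [], []
    for tokens in container:
        sentence_counts = Counter(tokens)
        vector = [sentence_counts[w] for w in vocab]
        if cosine(vector, feature) >= threshold:
            kept.append(tokens)
        else:
            removed.append(tokens)
    return kept, removed

container = [
    ["earthquake", "rescue"],
    ["earthquake", "city"],
    ["football", "match"],
]
kept, removed = prune_container(container, threshold=0.6)
```

With this input the two earthquake sentences score 0.75 and are kept, while the football sentence scores 0.5 and is removed. The threshold of 0.6 is arbitrary; as the text notes, a workable value has to be found by trial and observation.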
Finally, the number of sentences is counted. Events are sorted by the number of sentences corresponding to the query word, so that the events corresponding to the query word are displayed on a timeline ranked by importance. For example, searching the database for documents whose text contains "Obama" returns 6418 records. Splitting these records into sentences yields 20468 distinct sentences containing "Obama". Extracting the time of each sentence yields 3209 sentences with an accurate time. Comparing the times of these 3209 sentences yields 158 distinct timestamps, and the sentences are placed in the corresponding timestamp containers. The keyword "earthquake" is processed in the same way, giving the results in the table below. As the table shows, once sentences without a time element are correctly classified, about 14.6% of events on average obtain a corrected rank.
Keyword | Obama | Earthquake |
---|---|---|
Total number of events | 158 | 197 |
Number of events whose rank changed | 22 | 30 |
Share | 13.9% | 15.2% |
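The percentages in the table can be checked with a few lines of arithmetic. The sketch below uses only the counts reported above and assumes the quoted average is the mean of the two per-keyword shares; the pooled share gives the same rounded value.

```python
# (total events, events whose rank changed) as reported in the table.
rows = {"Obama": (158, 22), "Earthquake": (197, 30)}

shares = {keyword: changed / total for keyword, (total, changed) in rows.items()}
mean_share = sum(shares.values()) / len(shares)

total_events = sum(total for total, _ in rows.values())
total_changed = sum(changed for _, changed in rows.values())
pooled_share = total_changed / total_events

for keyword, share in shares.items():
    print(f"{keyword}: {share:.1%}")
print(f"mean of shares: {mean_share:.1%}, pooled: {pooled_share:.1%}")
```

Both ways of averaging round to the same figure, so the text's "about 14.6%" does not depend on which one was intended.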
The present invention also provides a news event extraction system, which can use the news event extraction method described above. In one embodiment, as shown in Fig. 4, the news event extraction system 1 comprises a news sentence acquisition module 11, a timed news processing module 12 and an untimed news processing module 13, wherein:
The news sentence acquisition module 11 is used for obtaining, according to a query word, the set of news sentences containing the query word from a news corpus, and for dividing the set into news sentences containing an accurate time and news sentences not containing an accurate time. The query word can be determined according to a news event. In one embodiment, the news sentence acquisition module 11 is further used for splitting the content of collected news documents into news sentences and storing the news sentences in the news corpus.
The timed news processing module 12 is connected to the news sentence acquisition module 11 and is used for extracting the time from the news sentences containing an accurate time; establishing a timestamp container for each distinct time and placing news sentences with the same time in the same timestamp container; and, for each timestamp container, counting the frequency of each word in its news sentences and establishing a corresponding feature vector. In one embodiment, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable, $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$, and $s.t$ denotes the time label of sentence $s$. The feature phrase is $W = \{w_1, w_2, \dots, w_k\}$, where $w_j$ denotes each word in the phrases related to $q$; the feature vector $\vec{F} = (tf_1, tf_2, \dots, tf_k)$ gives the document term frequency of each word $w_j$, where the document term frequency $tf_i$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
The untimed news processing module 13 is connected to the timed news processing module 12 and the news sentence acquisition module 11. It is used for establishing, from the segmented words of each news sentence not containing an accurate time and for each timestamp container in turn, a phrase vector with the same dimension as that container's feature vector, and for calculating the similarity between the phrase vector and the feature vector; if the maximum calculated similarity is greater than the set threshold, the sentence is added to the timestamp container with the highest similarity. The similarity comprises cosine similarity. The threshold can be modified; a user can modify it according to actual conditions. In one embodiment, the phrase vector is $\vec{s} = (s_1, s_2, \dots, s_k)$, the similarity comprises cosine similarity, and the cosine similarity is $\cos(\vec{s}, \vec{F}) = \dfrac{\sum_{j=1}^{k} s_j\, tf_j}{\sqrt{\sum_{j=1}^{k} s_j^2}\,\sqrt{\sum_{j=1}^{k} tf_j^2}}$, where $\vec{F} = (tf_1, tf_2, \dots, tf_k)$ is the feature vector of the timestamp container.
In one embodiment, the news event extraction system 1 further comprises a news event statistics module for counting the number of sentences for each query word in each timestamp container and obtaining a ranking of the query words. The news event statistics module is also used for counting, for each timestamp container, the number of sentences in the container that contain the query word.
In summary, the news event extraction method and system of the present invention can correctly classify sentences that contain no time element: sentences that describe a news event but contain no time are placed in the correct timestamp container, which increases the number of news events extracted and improves the accuracy of the event importance ranking. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall be covered by the claims of the present invention.
Claims (10)
1. A news event extraction method, characterized in that the news event extraction method comprises:
obtaining, according to a query word, a set of news sentences containing the query word from a news corpus; dividing the set into news sentences containing an accurate time and news sentences not containing an accurate time;
for the news sentences containing an accurate time, extracting the time; establishing a timestamp container for each distinct time, and placing news sentences with the same time in the same timestamp container; for each timestamp container, counting the frequency of each word in its news sentences and establishing a corresponding feature vector;
for each news sentence not containing an accurate time, establishing from its segmented words, for each timestamp container in turn, a phrase vector with the same dimension as that container's feature vector, and calculating the similarity between the phrase vector and the feature vector; if the maximum calculated similarity is greater than a set threshold, adding the news sentence to the timestamp container with the highest similarity.
2. The news event extraction method according to claim 1, characterized in that the similarity comprises cosine similarity.
3. The news event extraction method according to claim 1, characterized in that the method further comprises: processing different query words according to the above news event extraction method, counting the number of sentences for each query word in each timestamp container, and obtaining a ranking of the query words.
4. The news event extraction method according to claim 1, characterized in that the method further comprises: modifying the threshold.
5. The news event extraction method according to claim 1, characterized in that the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable, $C(q)$ denotes the set of sentences in the news corpus $C$ that match the query word $q$, and $s.t$ denotes the time label of sentence $s$.
6. The news event extraction method according to claim 5, characterized in that the feature phrase is $W = \{w_1, w_2, \dots, w_k\}$, where $w_j$ denotes each word in the phrases related to $q$; the feature vector $\vec{F} = (tf_1, tf_2, \dots, tf_k)$ gives the document term frequency of each word $w_j$, where the document term frequency $tf_i$ denotes the frequency with which the $i$-th word occurs in the document and $k$ denotes the number of words in the document.
7. The news event extraction method according to claim 6, characterized in that the phrase vector is $\vec{s} = (s_1, s_2, \dots, s_k)$, the similarity comprises cosine similarity, and the cosine similarity is $\cos(\vec{s}, \vec{F}) = \dfrac{\sum_{j=1}^{k} s_j\, tf_j}{\sqrt{\sum_{j=1}^{k} s_j^2}\,\sqrt{\sum_{j=1}^{k} tf_j^2}}$.
8. A news event extraction system, characterized in that the news event extraction system comprises:
a news sentence acquisition module for obtaining, according to a query word, a set of news sentences containing the query word from a news corpus, and dividing the set into news sentences containing an accurate time and news sentences not containing an accurate time;
a timed news processing module for extracting the time from the news sentences containing an accurate time; establishing a timestamp container for each distinct time, and placing news sentences with the same time in the same timestamp container; and, for each timestamp container, counting the frequency of each word in its news sentences and establishing a corresponding feature vector;
an untimed news processing module for establishing, from the segmented words of each news sentence not containing an accurate time and for each timestamp container in turn, a phrase vector with the same dimension as that container's feature vector, and calculating the similarity between the phrase vector and the feature vector; if the maximum calculated similarity is greater than a set threshold, adding the news sentence to the timestamp container with the highest similarity.
9. The news event extraction system according to claim 8, characterized in that the news sentence acquisition module is further used for splitting the content of collected news documents into news sentences and storing the news sentences in the news corpus.
10. The news event extraction system according to claim 8, characterized in that the system further comprises a news event statistics module for counting the number of sentences for each query word in each timestamp container and obtaining a ranking of the query words.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510749707.7A CN105354186A (en) | 2015-11-05 | 2015-11-05 | News event extraction method and system |
PCT/CN2016/070992 WO2017075912A1 (en) | 2015-11-05 | 2016-01-15 | News events extracting method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510749707.7A CN105354186A (en) | 2015-11-05 | 2015-11-05 | News event extraction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105354186A true CN105354186A (en) | 2016-02-24 |
Family
ID=55330160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510749707.7A Pending CN105354186A (en) | 2015-11-05 | 2015-11-05 | News event extraction method and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105354186A (en) |
WO (1) | WO2017075912A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110990437A (en) * | 2019-12-05 | 2020-04-10 | 大众问问(北京)信息科技有限公司 | Data fusion method and device and computer equipment |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109241438B (en) * | 2018-09-27 | 2022-06-24 | 国家计算机网络与信息安全管理中心 | Element-based cross-channel hot event discovery method and device and storage medium |
CN110162651B (en) * | 2019-04-23 | 2023-07-14 | 南京邮电大学 | News content image-text disagreement identification system and identification method based on semantic content abstract |
CN112199601B (en) * | 2020-11-09 | 2022-11-08 | 中国电子科技集团公司第二十八研究所 | News recommendation method based on event popularity of mass news data |
CN112528625B (en) * | 2020-12-11 | 2024-02-23 | 北京百度网讯科技有限公司 | Event extraction method, device, computer equipment and readable storage medium |
CN112765950A (en) * | 2021-01-08 | 2021-05-07 | 首都师范大学 | Template library generation method and system based on cosine similarity and storage medium |
CN114880588A (en) * | 2022-06-13 | 2022-08-09 | 四川封面传媒科技有限责任公司 | News popularity prediction method based on knowledge graph |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101174273A (en) * | 2007-12-04 | 2008-05-07 | 清华大学 | News event detecting method based on metadata analysis |
CN101872343A (en) * | 2009-04-24 | 2010-10-27 | 罗彤 | Semi-supervised mass data hierarchy classification method |
CN103020251A (en) * | 2012-12-20 | 2013-04-03 | 人民搜索网络股份公司 | Automatic mining system and method of news events in large-scale data |
CN103020159A (en) * | 2012-11-26 | 2013-04-03 | 百度在线网络技术(北京)有限公司 | Method and device for news presentation facing events |
CN103164427A (en) * | 2011-12-13 | 2013-06-19 | 中国移动通信集团公司 | Method and device of news aggregation |
JP5238339B2 (en) * | 2008-04-24 | 2013-07-17 | 日本放送協会 | Receiver and program for digital broadcasting |
US20140156642A1 (en) * | 2012-12-04 | 2014-06-05 | At&T Intellectual Property I, L.P. | Generating And Using Temporal Metadata Partitions |
US20140181131A1 (en) * | 2012-12-26 | 2014-06-26 | David Ross | Timeline wrinkling system and method |
CN104077391A (en) * | 2014-06-30 | 2014-10-01 | 北京奇虎科技有限公司 | Method, server, client and system for providing special news search |
CN104090918A (en) * | 2014-06-16 | 2014-10-08 | 北京理工大学 | Sentence similarity calculation method based on information amount |
CN104318242A (en) * | 2014-10-08 | 2015-01-28 | 中国人民解放军空军工程大学 | High-efficiency SVM active half-supervision learning algorithm |
CN104408093A (en) * | 2014-11-14 | 2015-03-11 | 中国科学院计算技术研究所 | News event element extracting method and device |
2015
- 2015-11-05 CN CN201510749707.7A patent/CN105354186A/en active Pending
2016
- 2016-01-15 WO PCT/CN2016/070992 patent/WO2017075912A1/en active Application Filing
Non-Patent Citations (9)
Title |
---|
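HECTOR LLORENS et al.: "TimeML Events Recognition and Classification: Learning CRF Models with Semantic Roles", 《PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS (COLING 2010)》 * |
仲兆满 et al.: "An efficient method for extracting the publication time of Web news", 《小型微型计算机系统》 * |
夏威 et al.: "A news event extraction method based on a Markov model", 《桂林电子科技大学学报》 * |
张先飞 et al.: "Temporal information extraction in event detection and description", 《情报学报》 * |
林静 et al.: "Automatic TIMEX2 annotation of Chinese temporal information", 《清华大学学报(自然科学版)》 * |
梁军 et al.: "HLA optimistic time synchronization based on time uncertainty", 《计算机工程》 * |
索红光 et al.: "Timestamp-based automatic multi-document summarization", 《计算机工程》 * |
蔡华利 et al.: "Analysis and extraction of temporal information in Web news about emergencies", 《计算机工程与应用》 * |
黄卿 et al.: "Timestamp protocol processing based on event order", 《计算机工程》 * |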
Also Published As
Publication number | Publication date |
---|---|
WO2017075912A1 (en) | 2017-05-11 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105354186A (en) | News event extraction method and system | |
Trupthi et al. | Sentiment analysis on twitter using streaming API | |
CN106874378B (en) | Method for constructing knowledge graph based on entity extraction and relation mining of rule model | |
CN106599054B (en) | Method and system for classifying and pushing questions | |
CN106651696B (en) | Approximate question pushing method and system | |
CN103279478B (en) | A kind of based on distributed mutual information file characteristics extracting method | |
CN106951438A (en) | A kind of event extraction system and method towards open field | |
CN103605658B (en) | A kind of search engine system analyzed based on text emotion | |
CN104408093A (en) | News event element extracting method and device | |
CN106055538A (en) | Automatic extraction method for text labels in combination with theme model and semantic analyses | |
CN104199972A (en) | Named entity relation extraction and construction method based on deep learning | |
CN110362678A (en) | A kind of method and apparatus automatically extracting Chinese text keyword | |
CN104881458A (en) | Labeling method and device for web page topics | |
CN104077417A (en) | Figure tag recommendation method and system in social network | |
CN104484431A (en) | Multi-source individualized news webpage recommending method based on field body | |
CN104484380A (en) | Personalized search method and personalized search device | |
CN106844482B (en) | Search engine-based retrieval information matching method and device | |
CN107239564A (en) | A kind of text label based on supervision topic model recommends method | |
CN112347352A (en) | Course recommendation method and device and storage medium | |
CN106681985A (en) | Establishment system of multi-field dictionaries based on theme automatic matching | |
CN106294786A (en) | A kind of code search method and system | |
AU2018100678A4 (en) | News events extracting method and system | |
Sitorus et al. | Sensing trending topics in twitter for greater Jakarta area | |
CN106776724B (en) | Question classification method and system | |
CN114077705A (en) | Method and system for portraying media account on social platform | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20160224 |