AU2018100678A4 - News events extracting method and system - Google Patents


Info

Publication number
AU2018100678A4
Authority
AU
Australia
Prior art keywords
news
timestamp
container
sentences
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
AU2018100678A
Inventor
Hongzhong CHEN
Zhijun Ding
Changjun JIANG
Yaguang WU
Chungang YAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from PCT/CN2016/070992 (WO2017075912A1)
Application filed by Tongji University
Priority to AU2018100678A (AU2018100678A4)
Application granted
Publication of AU2018100678A4
Anticipated expiration
Expired


Abstract

The present invention provides a news event extraction method and system. The news event extraction method comprises: acquiring a news sentence set containing a query word from a news corpus according to the query word; for news sentences containing accurate time, extracting the time in the news sentences; classifying news sentences with the same time into the same timestamp container; for each timestamp container, statistically collecting the occurrence frequency of each word in the news sentences in the timestamp container, and establishing a corresponding characteristic vector; for each news sentence containing no accurate time, establishing, for each timestamp container, a phrase vector with the same dimensions as the characteristic vector of that timestamp container, and calculating a similarity between the phrase vector and the characteristic vector of the timestamp container; and if the maximum value of the calculated similarities is greater than a set threshold, adding the news sentence containing no accurate time into the timestamp container corresponding to the highest similarity. The present invention can correctly classify sentences containing no time element.
[Representative drawing: flowchart of steps S1 (acquire the news sentence set for the query word and divide it into sentences with and without accurate time), S2 (build timestamp containers and their characteristic vectors from the sentences with accurate time) and S3 (add each sentence without accurate time to the most similar timestamp container when the maximum similarity exceeds the set threshold).]

Description

NEWS EVENT EXTRACTION METHOD AND SYSTEM
Background of the Present Invention
Field of Invention
The present invention relates to a data processing technology, in particular to a news event extraction method and system.
Description of Related Arts
News reports are truthful, fresh, important and extremely timely, and can provide a great amount of information to people in a short length. Due to the open nature of the Internet, news on the Internet is heterogeneous, redundant and dynamically changing; information which describes the same news is usually scattered across different websites, and the patterns of presentation differ. In order to rapidly and accurately find the information needed by users in a disordered flood of data, news event extraction technology is one of the most important tools. Existing unsupervised-learning news event extraction methods usually discard news sentences containing no time and determine the importance of events according to the frequency of the extracted news events. Since a considerable part of news sentences describe the latest news by default and contain no specific time, these news events cannot be extracted by the existing news extraction technologies, which easily biases the extraction of major events and decreases the accuracy of event importance ranking.
In view of this, how to include news containing no time during news event extraction to reduce extraction deviation becomes a problem which needs to be urgently solved by one skilled in the art.
Summary of the Present Invention
In view of the above-mentioned disadvantages of the prior art, the purpose of the present invention is to provide a news event extraction method and system, which are used for solving the problem that the accuracy of event importance ranking is not high since news containing no time are not included during news event extraction in the prior art.
In order to realize the above-mentioned purpose and other related purposes, the present invention provides a news event extraction method. The news event extraction method comprises: acquiring a news sentence set containing a query word from a news corpus according to the query word; dividing the news sentence set into news sentences containing accurate time and news sentences containing no accurate time; aiming at the news sentence containing accurate time, extracting time in the news sentences; establishing a plurality of timestamp containers aiming at different time, and classifying news sentences with the same time into the same timestamp container; aiming at each timestamp container, statistically collecting occurrence frequency of each word in the news sentences in the timestamp container, and establishing a corresponding characteristic vector; for the news sentences containing no accurate time, establishing a phrase vector with the same dimensions as the characteristic vector of the timestamp container according to word segmentation of the news sentence containing no accurate time respectively aiming at different timestamp containers, and calculating a similarity between the phrase vector and the characteristic vector of the timestamp container; and if a maximum value of the calculated similarities is greater than a set threshold, adding the news sentence containing no accurate time into a timestamp container corresponding to the highest similarity.
Alternatively, the similarity comprises cosine similarity.
Alternatively, the news event extraction method further comprises: for each timestamp container, statistically collecting a number of sentences containing the query word in the timestamp container.
Alternatively, the news event extraction method further comprises: processing different query words according to the news event extraction method, statistically collecting a number of sentences containing different query words in each timestamp container and obtaining a ranking result of the query words.
Alternatively, the news event extraction method further comprises: modifying the threshold.
Alternatively, a mode of acquiring data in the news corpus comprises: dividing contents of a collected news document into news sentences and storing the news sentences into the news corpus.
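Alternatively, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable; $C(q)$ denotes a set of sentences matched with a query word $q$ in a news corpus $C$; and $s.t$ denotes a time label of a sentence $s$.
Alternatively, a characteristic phrase is $W_{t_i} = \{ w_1, w_2, \ldots, w_k \}$, where $w_j$ denotes each word in sentences related to $q$; and the characteristic vector is $\vec{V}_{t_i} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$, where $F_{w_j}$ denotes document word frequency of each word $w_j$, the document word frequency is $F_{w_j} = N_{w_j} / \sum_{i=1}^{k} N_{w_i}$, $N_{w_j}$ represents occurrence times of the word $w_j$ in a document and $k$ denotes a number of words contained in the document.
Alternatively, the phrase vector is $\vec{s} = (F^{s}_{w_1}, F^{s}_{w_2}, \ldots, F^{s}_{w_k})$, where $F^{s}_{w_j}$ is the frequency of the word $w_j$ in the sentence $s$, the similarity comprises cosine similarity and the cosine similarity is $\cos(\vec{s}, \vec{V}_{t_i}) = \dfrac{\vec{s} \cdot \vec{V}_{t_i}}{\lVert \vec{s} \rVert \, \lVert \vec{V}_{t_i} \rVert}$.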
Alternatively, the query word is determined according to news events.
The present invention further provides a news event extraction system. The news event extraction system comprises: a news sentence acquisition module used for acquiring a news sentence set containing a query word from a news corpus according to the query word; and dividing the news sentence set into news sentences containing accurate time and news sentences containing no accurate time; a time-containing news processing module used for, aiming at the news sentence containing accurate time, extracting time in the news sentences; establishing a plurality of timestamp containers aiming at different time, and classifying news sentences with the same time into the same timestamp container; and aiming at each timestamp container, statistically collecting occurrence frequency of each word in the news sentences in the timestamp container, and establishing a corresponding characteristic vector; and a no-time-containing news processing module used for establishing a phrase vector with the same dimensions as the characteristic vector of the timestamp container according to word segmentation of the news sentence containing no accurate time respectively aiming at different timestamp containers, and calculating a similarity between the phrase vector and the characteristic vector of the timestamp container; and if a maximum value of the calculated similarities is greater than a set threshold, adding the news sentence containing no accurate time into a timestamp container corresponding to the highest similarity.
Alternatively, the news sentence acquisition module is further used for dividing contents of a collected news document into news sentences and storing the news sentences into the news corpus.
Alternatively, the similarity comprises cosine similarity.
Alternatively, the news event extraction system further comprises a news event statistics module used for statistically collecting a number of sentences containing different query words in each timestamp container and obtaining a ranking result of the query words.
As described above, the news event extraction method and system provided by the present invention have the following beneficial effects: (1) the sentences containing no time element can be correctly classified such that the accuracy of news event importance ranking is higher; (2) the number of extracted sentences is increased such that the difference in the importance of different news events is more obvious; and (3) the unrelated sentences are removed by using the timestamp containers such that the interference of other news with the ranking of important news is decreased.
Brief Description of the Drawings
FIG. 1 illustrates a flowchart of one embodiment of the news event extraction method of the present invention.
FIG. 2 illustrates an extraction flowchart of one embodiment of the news event extraction method of the present invention.
FIG. 3 illustrates a flowchart of classifying sentences containing no accurate time according to one embodiment of the news event extraction method of the present invention.
FIG. 4 illustrates a modular schematic diagram of one embodiment of the news event extraction system of the present invention.
Description of component mark numbers:
1 News event extraction system
11 News sentence acquisition module
12 Time-containing news processing module
13 No-time-containing news processing module
S1-S3 Steps
Detailed Description of the Preferred Embodiments
The implementation modes of the present invention will be described below through specific embodiments. One skilled in the art can easily understand other advantages and effects of the present invention according to contents disclosed by the description. The present invention may also be implemented or applied through other different specific implementation modes. Various modifications or changes may also be made to all details in the description based on different points of view and applications without departing from the spirit of the present invention.
It needs to be stated that the drawings provided in the following embodiments are just used for schematically describing the basic concept of the present invention, thus only illustrate components only related to the present invention and are not drawn according to the numbers, shapes and sizes of components during actual implementation, the configuration, number and scale of each component during actual implementation thereof may be freely changed, and the component layout configuration thereof may be more complex.
The present invention provides a news event extraction method. In one embodiment, as illustrated in FIG. 1, the news event extraction method comprises the following steps:
In step S1, a news sentence set containing a query word is acquired from a news corpus according to the query word; and the news sentence set is divided into news sentences containing accurate time and news sentences containing no accurate time. A mode of acquiring data in the news corpus comprises the following operations: dividing contents of a collected news document into news sentences, and storing the news sentences into the news corpus. The query word may be determined according to news events. A symbol “q” may be used for representing a query word, a symbol “C” may be used for representing a corpus and a symbol “s” may be used for representing a sentence. In one embodiment, the query word may be determined according to events which attract great attention and are the most widely reported and mentioned.
In step S2, for the news sentences containing accurate time, the time in the news sentences is extracted; a plurality of timestamp containers are established for different times, and news sentences with the same time are classified into the same timestamp container; and for each timestamp container, the occurrence frequency of each word in the news sentences in the timestamp container is statistically collected, and a corresponding characteristic vector is established. In one embodiment, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable; $C(q)$ denotes a set of sentences matched with a query word $q$ in a news corpus $C$; and $s.t$ denotes a time label of a sentence $s$. A characteristic phrase is $W_{t_i} = \{ w_1, w_2, \ldots, w_k \}$, where $w_j$ denotes each word in sentences related to $q$; and the characteristic vector is $\vec{V}_{t_i} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$, where $F_{w_j}$ denotes document word frequency of each word $w_j$, the document word frequency is $F_{w_j} = N_{w_j} / \sum_{i=1}^{k} N_{w_i}$, $N_{w_j}$ represents occurrence times of the word $w_j$ in a document and $k$ denotes a number of words contained in the document.
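Step S2 can be illustrated with a minimal Python sketch. It assumes that sentences have already been segmented into word lists and dated; the function names `build_containers` and `characteristic_vector` and the sample tokens are hypothetical and are not taken from the patent.

```python
from collections import Counter, defaultdict


def build_containers(dated_sentences):
    """Group (date, tokens) pairs into timestamp containers keyed by date."""
    containers = defaultdict(list)
    for day, tokens in dated_sentences:
        containers[day].append(tokens)
    return dict(containers)


def characteristic_vector(container):
    """Return (vocabulary, vector) where each entry is the relative
    frequency of one word among all words in the container."""
    counts = Counter(tok for tokens in container for tok in tokens)
    total = sum(counts.values())
    vocab = sorted(counts)  # fixed word order defines the k dimensions
    return vocab, [counts[w] / total for w in vocab]


# Hypothetical, already-segmented sentences with their extracted dates.
dated = [
    ("2013-05-12", ["earthquake", "hits", "city"]),
    ("2013-05-12", ["earthquake", "rescue"]),
    ("2013-05-13", ["rescue", "continues"]),
]
containers = build_containers(dated)
vocab, vec = characteristic_vector(containers["2013-05-12"])
```

Sorting the vocabulary is only one way to fix the dimension order; any stable ordering works as long as the phrase vectors compared against this container use the same one.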
In step S3, for each news sentence containing no accurate time, a phrase vector with the same dimensions as the characteristic vector of the timestamp container is established for each timestamp container according to word segmentation of the news sentence, and a similarity between the phrase vector and the characteristic vector of the timestamp container is calculated; and if a maximum value of the calculated similarities is greater than a set threshold, the news sentence containing no accurate time is added into the timestamp container corresponding to the highest similarity. The similarity comprises cosine similarity. The news event extraction method further comprises the following operation: modifying the threshold. The user may modify the threshold according to the actual situation during use. In one embodiment, the phrase vector is $\vec{s} = (F^{s}_{w_1}, F^{s}_{w_2}, \ldots, F^{s}_{w_k})$, where $F^{s}_{w_j}$ is the frequency of the word $w_j$ in the sentence $s$, the similarity comprises cosine similarity, and the cosine similarity is $\cos(\vec{s}, \vec{V}_{t_i}) = \dfrac{\vec{s} \cdot \vec{V}_{t_i}}{\lVert \vec{s} \rVert \, \lVert \vec{V}_{t_i} \rVert}$.
A maximum value of the similarities, i.e., the maximum similarity, is $\mathrm{MaxSimilarity} = \max_{i} \cos(\vec{s}, \vec{V}_{t_i})$.
If the maximum value of the calculated similarities is greater than a preset threshold, the news sentence containing no accurate time is added into the timestamp container corresponding to the highest similarity. The sentence which is added into the timestamp container is called an effective sentence, and the effective sentence is $s_{t_i}$ with $t_i = \arg\max_{i} \cos(\vec{s}, \vec{V}_{t_i})$ and $\mathrm{MaxSimilarity} > d$, where $s_{t_i}$ denotes that the time of a sentence $s$ is $t_i$, and $d$ is a threshold adjusted according to the actual situation. Each timestamp container may have corresponding effective sentences, and all effective sentences of the timestamp container corresponding to time $t_i$ are called an effective sentence set $S_{t_i}$.
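The assignment rule of step S3 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: the helper names `cosine`, `phrase_vector` and `assign`, the toy containers and the threshold values are all hypothetical. Raw word counts are used for the phrase vector, which gives the same cosine value as relative frequencies because cosine similarity ignores vector length.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_vector(tokens, vocab):
    """Project a segmented sentence onto a container's vocabulary."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def assign(tokens, containers, d):
    """Return (date, similarity) of the best container if the maximum
    similarity exceeds threshold d, otherwise (None, similarity)."""
    best_date, best_sim = None, 0.0
    for day, (vocab, vec) in containers.items():
        sim = cosine(phrase_vector(tokens, vocab), vec)
        if sim > best_sim:
            best_date, best_sim = day, sim
    return (best_date, best_sim) if best_sim > d else (None, best_sim)


# Hypothetical containers: date -> (vocabulary, characteristic vector).
containers = {
    "2013-05-12": (["city", "earthquake", "hits", "rescue"], [0.2, 0.4, 0.2, 0.2]),
    "2013-05-13": (["continues", "rescue"], [0.5, 0.5]),
}
undated = ["earthquake", "rescue", "teams"]
accepted = assign(undated, containers, d=0.5)
rejected = assign(undated, containers, d=0.9)
```

With the lower threshold the sentence joins the 2013-05-12 container; with the higher one it stays unassigned, which mirrors the threshold check described above.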
In one embodiment, the news event extraction method further comprises the following operation: for each timestamp container, statistically collecting a number of sentences containing the query word in the timestamp container. In one embodiment, the news event extraction method further comprises the following operations: processing different query words according to the above news event extraction method, statistically collecting a number of sentences containing different query words in each timestamp container and obtaining a ranking result of the query words.
In one embodiment, an integral framework included in the news event extraction method is as illustrated in FIG. 2, and a process of classifying sentences containing no accurate time is as illustrated in FIG. 3. The processing comprises the following operations: collected news corpora are stored in a database according to titles, time and contents. Then, the contents of each document are divided into sentences according to Chinese sentence end symbols such as “。”, “！” and “？”, and the sentences are also stored according to titles, time and contents (sentences). Sentences in a corpus may be divided into the following three classes: 1) sentences containing accurate date: AD (Absolute Date) denotes an expression of time which is complete and accurate to the “day”, e.g., 2010.10.1 and May 12, 2008, and the time may be directly processed into the form of YYYY-MM-DD; 2) sentences containing date related to release time: DCT-RD (date of creation-relative date) denotes that the sentences themselves do not contain accurate date, but the date may be obtained through semantic analysis according to document release time and may be processed into the form of YYYY-MM-DD; and 3) sentences containing no accurate date: UD (Underspecified Date) denotes that accurate date cannot be obtained, and the time cannot be processed into the form of YYYY-MM-DD.
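The sentence division described above can be sketched with a regular expression. The set of end symbols (Chinese full stop, exclamation mark and question mark, plus their ASCII counterparts) and the function name `split_sentences` are assumptions made for illustration.

```python
import re

# A sentence is a run of non-terminator characters followed by an
# optional end symbol (Chinese or ASCII full stop excluded on purpose
# for ASCII, since "." also appears inside dates such as 2010.10.1).
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")


def split_sentences(text):
    """Split document content into sentences, keeping the end symbol."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


sentences = split_sentences("今天发生地震。救援继续！损失多大？")
```

The ASCII full stop is deliberately left out of the terminator set so that dates written like 2010.10.1 are not cut apart before time extraction.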
Then, sentence-level corpora are acquired through query words, time in sentences is extracted step by step by adopting the following algorithm, and the sentences are classified according to time: (1) a timestamp container $V_0$ for sentences containing no accurate date is established; (2) matching is performed on each sentence $s_j \in C(q)$ by using regex (regular expression of time) to obtain $s_{jt}$ ($s_{jt}$ denotes the accurate date contained in a sentence $s_j$); if $s_{jt}$ does not exist, $s_j$ is matched with R-Words (e.g., “one year ago” and “one week later”) to obtain Date Distance (Date Distance denotes a distance in date from DCT); if Date Distance does not exist, $s_j$ is put into $V_0$; and if Date Distance exists, Date Distance and DCT are calculated to obtain $s_{jt}$ (e.g., if a reported date is May 12, 2013, one year ago is May 12, 2012); and (3) if $V_t$ ($V_t$ denotes a timestamp container corresponding to date $t$) already exists, $s_j$ is put into $V_t$; and if $V_t$ does not exist, $V_t$ is created and $s_j$ is put into $V_t$.
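The three steps above can be sketched in Python. The two regular expressions are simplified stand-ins chosen for this example (numeric dates such as 2010.10.1 and English R-Words of the form "one day/week/year ago/later"); the patent does not specify its patterns, and the names `resolve_date` and `file_sentence` are hypothetical.

```python
import re
from datetime import date, timedelta

# Assumed patterns: absolute numeric dates and a tiny set of R-Words.
ABSOLUTE = re.compile(r"(\d{4})[./-](\d{1,2})[./-](\d{1,2})")
RELATIVE = re.compile(r"\bone (day|week|year) (ago|later)\b")


def shift(dct, unit, sign):
    """Move the document creation time (DCT) by one unit, forward or back."""
    if unit == "year":
        try:
            return dct.replace(year=dct.year + sign)
        except ValueError:  # 29 February has no match in a non-leap year
            return dct.replace(year=dct.year + sign, day=28)
    return dct + timedelta(days=sign * {"day": 1, "week": 7}[unit])


def resolve_date(sentence, dct):
    """Return the sentence date as YYYY-MM-DD, or None if underspecified."""
    m = ABSOLUTE.search(sentence)
    if m:
        year, month, day = map(int, m.groups())
        try:
            return date(year, month, day).isoformat()
        except ValueError:
            pass  # digits matched but do not form a real calendar date
    m = RELATIVE.search(sentence.lower())
    if m:
        sign = -1 if m.group(2) == "ago" else 1
        return shift(dct, m.group(1), sign).isoformat()
    return None


def file_sentence(sentence, dct, containers, undated):
    """Put the sentence into its date container, creating it if needed,
    or into the undated list when no date can be resolved."""
    key = resolve_date(sentence, dct)
    if key is None:
        undated.append(sentence)
    else:
        containers.setdefault(key, []).append(sentence)


containers, undated = {}, []
release = date(2013, 5, 12)  # reported date from the example in the text
for text in ["The quake struck on 2010.10.1.",
             "One year ago the city was rebuilt.",
             "Rescue work continues."]:
    file_sentence(text, release, containers, undated)
```

The relative-date case reproduces the example in the text: with a reported date of May 12, 2013, "one year ago" resolves to 2012-05-12.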
Then, a similarity between each sentence and each characteristic phrase is calculated. There are two purposes for calculating the similarity between each sentence and each characteristic phrase: one is to classify some of the sentences containing no accurate time into the correct timestamp containers, and the other is to remove sentences which are not correlated to the characteristic phrase in each timestamp container. Specific algorithm steps are as follows: (1) word segmentation is performed on all sentences $s_i \in V_t$, the occurrence frequency $F_{w_j}$ of each word $w_j$ is statistically collected and a characteristic vector $\vec{V}_t = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$ is established; (2) a vector $\vec{s}_i$ with $k$ dimensions (which are the same as the dimensions of the characteristic vector of the timestamp container) is established for each sentence $s_i$; (3) the cosine similarity $\cos(\vec{s}_i, \vec{V}_t)$ between each sentence $s_i \in V_t$ and the characteristic vector $\vec{V}_t$ is calculated; (4) the sentence having the highest similarity to the characteristic vector is found out and the similarity is denoted as MaxSimilarity; and (5) a threshold $d$ is set; for $s_i \in V_t$, if the similarity of $s_i$ is lower than the threshold $d$, $s_i$ is removed from $V_t$; and for $s_j \in V_0$, if the maximum similarity of $s_j$ is greater than the threshold $d$, $s_j$ is put into $V_t$. In practice, according to experience, when MaxSimilarity is closer to 1, the threshold may be set lower, since the events may be the same event even though the difference between the sentence and the characteristic phrase is relatively great. When MaxSimilarity is far from 1, the threshold needs to be set higher to realize more accurate removal. The setting of the similarity threshold is obtained through repeated tests and manual observation and may be modified according to actual needs.
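The removal half of step (5) can be sketched as follows. The exact comparison used in the patent was lost with the formula images, so this example assumes the simplest reading: a dated sentence is dropped when its cosine similarity to its own container's characteristic vector is below the threshold d. The names `tf_vector`, `cosine` and `prune` and the toy data are hypothetical.

```python
import math
from collections import Counter


def tf_vector(token_lists, vocab):
    """Relative word frequencies of the given sentences over vocab."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prune(container, d):
    """Keep only sentences whose similarity to the container's
    characteristic vector is at least d (assumed removal rule)."""
    vocab = sorted({tok for tokens in container for tok in tokens})
    center = tf_vector(container, vocab)
    return [tokens for tokens in container
            if cosine(tf_vector([tokens], vocab), center) >= d]


# Hypothetical container in which one sentence is off-topic.
container = [
    ["earthquake", "rescue"],
    ["earthquake", "city"],
    ["football", "match"],
]
kept = prune(container, d=0.6)
```

In this toy container the two earthquake sentences score 0.75 against the characteristic vector and the football sentence scores 0.5, so a threshold of 0.6 removes only the unrelated sentence.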
Finally, the number of sentences is statistically collected. Ranking is performed according to the number of sentences corresponding to the query words, such that the events corresponding to the query words are ranked according to importance and a time axis is presented. For example, documents with texts containing “奥巴马” (Obama) in a database are searched to obtain 6418 records. Sentence segmentation is performed on these records to obtain a total of 20468 different sentences containing “奥巴马” (Obama). Further, time in the sentences is extracted to obtain 3209 sentences containing accurate time. The time of the 3209 sentences is compared to finally obtain 158 different timestamps, and these sentences are put into corresponding timestamp containers. For a keyword “地震” (earthquake), the same operations are performed to obtain the following results as shown in the following table. Accordingly, it can be seen that, after sentences containing time elements are correctly classified, on average about 14.6% of sentences can be correctly ranked.
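The counting and ranking step can be sketched in a few lines of Python. The data below are toy placeholders, not the figures reported in the patent, and the function names `rank_queries` and `timeline` are hypothetical.

```python
def rank_queries(containers_by_query):
    """Rank query words by the total number of sentences held in
    their timestamp containers, most sentences first."""
    totals = {
        query: sum(len(sentences) for sentences in containers.values())
        for query, containers in containers_by_query.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def timeline(containers):
    """Sentence count per date in chronological order (ISO dates sort
    correctly as plain strings)."""
    return [(day, len(containers[day])) for day in sorted(containers)]


# Toy data: query word -> {date: sentences in that timestamp container}.
containers_by_query = {
    "election": {"2013-05-12": ["s5", "s6"]},
    "earthquake": {"2013-05-13": ["s4"], "2013-05-12": ["s1", "s2", "s3"]},
}
ranking = rank_queries(containers_by_query)
axis = timeline(containers_by_query["earthquake"])
```

The ranking orders query words by how many sentences their containers hold, and the per-date counts give one possible form of the time axis mentioned above.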
The present invention further provides a news event extraction system. The news event extraction system may adopt the above-mentioned news event extraction method. In one embodiment, as illustrated in FIG. 4, the news event extraction system 1 comprises a news sentence acquisition module 11, a time-containing news processing module 12 and a no-time-containing news processing module 13, wherein:
The news sentence acquisition module 11 is used for acquiring a news sentence set containing a query word from a news corpus according to the query word; and dividing the news sentence set into news sentences containing accurate time and news sentences containing no accurate time. A mode of acquiring data in the news corpus comprises the following operations: dividing contents of a collected news document into news sentences and storing the news sentences into the news corpus. The query word may be determined according to news events. In one embodiment, the news sentence acquisition module 11 is further used for dividing contents of a collected news document into news sentences and storing the news sentences into the news corpus.
The time-containing news processing module 12 is connected with the news sentence acquisition module 11 and is used for extracting time from the news sentences containing accurate time; establishing a plurality of timestamp containers for different times, and classifying news sentences with the same time into the same timestamp container; and statistically collecting the occurrence frequency of each word in the news sentences in each timestamp container, and establishing a corresponding characteristic vector. In one embodiment, the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$, where $t_i$ is a time variable; $C(q)$ denotes a set of sentences matched with a query word $q$ in a news corpus $C$; and $s.t$ denotes a time label of a sentence $s$. A characteristic phrase is $W_{t_i} = \{ w_1, w_2, \ldots, w_k \}$, where $w_j$ denotes each word in sentences related to $q$; and the characteristic vector is $\vec{V}_{t_i} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$, where $F_{w_j}$ denotes document word frequency of each word $w_j$, the document word frequency is $F_{w_j} = N_{w_j} / \sum_{i=1}^{k} N_{w_i}$, $N_{w_j}$ represents occurrence times of the word $w_j$ in a document, and $k$ denotes a number of words contained in the document.
The no-time-containing news processing module 13 is connected with the time-containing news processing module 12 and the news sentence acquisition module 11, and is used for establishing, for each timestamp container, a phrase vector with the same dimensions as the characteristic vector of the timestamp container according to word segmentation of the news sentence containing no accurate time, and calculating a similarity between the phrase vector and the characteristic vector of the timestamp container; and if a maximum value of the calculated similarities is greater than a set threshold, adding the news sentence containing no accurate time into the timestamp container corresponding to the highest similarity. The similarity comprises cosine similarity. The news event extraction method further comprises the following operation: modifying the threshold. The user may modify the threshold according to the actual situation during use. In one embodiment, the phrase vector is $\vec{s} = (F^{s}_{w_1}, F^{s}_{w_2}, \ldots, F^{s}_{w_k})$, where $F^{s}_{w_j}$ is the frequency of the word $w_j$ in the sentence $s$, the similarity comprises cosine similarity and the cosine similarity is $\cos(\vec{s}, \vec{V}_{t_i}) = \dfrac{\vec{s} \cdot \vec{V}_{t_i}}{\lVert \vec{s} \rVert \, \lVert \vec{V}_{t_i} \rVert}$.
In one embodiment, the news event extraction system 1 further comprises a news event statistics module used for statistically collecting a number of sentences containing different query words in each timestamp container and obtaining a ranking result of the query words. The news event statistics module is further used for statistically collecting a number of sentences containing the query word in each timestamp container.
To sum up, the news event extraction method and system provided by the present invention can correctly classify sentences containing no time element and put sentences which do not contain time but themselves express news events into the correct timestamp containers, so that the number of extracted news events is increased and the accuracy of event importance ranking is improved. Therefore, the present invention effectively overcomes various disadvantages in the prior art and thus has a great industrial utilization value.
The above-mentioned embodiments are just used for exemplarily describing the principle and effects of the present invention instead of limiting the present invention. One skilled in the art may make modifications or changes to the above-mentioned embodiments without departing from the spirit and the scope of the present invention. Therefore, all equivalent modifications or changes made by those who have common knowledge in the art without departing from the spirit and technical concept disclosed by the present invention shall be still covered by the claims of the present invention.

Claims (10)

  What is claimed is:
    1. A news event extraction method, characterized in that the method comprises: acquiring a news sentence set containing a query word from a news corpus according to the query word; and dividing the news sentence set into news sentences containing accurate time and news sentences containing no accurate time; for the news sentence containing accurate time, extracting time in the news sentences; establishing a plurality of timestamp containers for different time, and classifying news sentences with the same time into the same timestamp container; and for each timestamp container, statistically collecting occurrence frequency of each word in the news sentences in the timestamp container, and establishing a corresponding characteristic vector; and for the news sentences containing no accurate time, establishing a phrase vector with the same dimensions as the characteristic vector of the timestamp container according to word segmentation of the news sentence containing no accurate time respectively aiming at different timestamp containers, and calculating a similarity between the phrase vector and the characteristic vector of the timestamp container; and if a maximum value of the calculated similarities is greater than a set threshold, adding the news sentence containing no accurate time into a timestamp container corresponding to the highest similarity.
  2. The news event extraction method according to claim 1, characterized in that the similarity comprises cosine similarity.
  3. The news event extraction method according to claim 1, characterized in that the method further comprises: processing different query words according to the news event extraction method, statistically collecting a number of sentences corresponding to different query words in each timestamp container and obtaining a ranking result of the query words.
  4. The news event extraction method according to claim 1, characterized in that the method further comprises: modifying the threshold.
  5. The news event extraction method according to claim 1, characterized in that the timestamp container is $V_{t_i} = \{\, s \mid s \in C(q),\ s.t = t_i \,\}$,
    where $t_i$ is a time variable; $C(q)$ denotes a set of sentences matched with a query word $q$ in a news corpus $C$; and $s.t$ denotes a time label of a sentence $s$.
  6. The news event extraction method according to claim 5, characterized in that the characteristic phrase is $W_{t_i} = \{ w_1, w_2, \ldots, w_k \}$,
    where $w_j$ denotes each word in sentences related to $q$; and the characteristic vector is $\vec{V}_{t_i} = (F_{w_1}, F_{w_2}, \ldots, F_{w_k})$,
    where $F_{w_j}$ denotes document word frequency of each word $w_j$, the document word frequency is $F_{w_j} = N_{w_j} / \sum_{i=1}^{k} N_{w_i}$,
    $N_{w_j}$ represents occurrence times of the word $w_j$ in a document and $k$ denotes a number of words contained in the document.
  7. The news event extraction method according to claim 6, characterized in that the phrase vector is $\vec{s} = (F^{s}_{w_1}, F^{s}_{w_2}, \ldots, F^{s}_{w_k})$, where $F^{s}_{w_j}$ is the frequency of the word $w_j$ in the sentence $s$,
    the similarity comprises cosine similarity and the cosine similarity is $\cos(\vec{s}, \vec{V}_{t_i}) = \dfrac{\vec{s} \cdot \vec{V}_{t_i}}{\lVert \vec{s} \rVert \, \lVert \vec{V}_{t_i} \rVert}$.
  8. 8. A news event extraction system, characterized in that the system comprises: a news sentence acquisition module used for acquiring a news sentence set containing a query word from a news corpus according to the query word; and dividing the news sentence set into news sentences containing accurate time and news sentences containing no accurate time; a time-containing news processing module used for extracting time in the news sentences from news sentence containing accurate time; establishing a plurality of timestamp containers aiming at different time, and classifying news sentences with the same time into the same timestamp container; and aiming at each timestamp container, statistically collecting occurrence frequency of each word in the news sentences in the timestamp container, and establishing a corresponding characteristic vector; and a no-time-containing news processing module used for establishing a phrase vector with the same dimensions as the characteristic vector of the timestamp container according to word segmentation of the news sentence containing no accurate time respectively aiming at different timestamp containers, and calculating a similarity between the phrase vector and the characteristic vector of the timestamp container; and if a maximum value of the calculated similarities is greater than a set threshold, adding the news sentence containing no accurate time into a timestamp container corresponding to the highest similarity.
  9. The news event extraction system according to claim 8, characterized in that the news sentence acquisition module is further used for dividing the contents of a collected news document into news sentences and storing the news sentences in the news corpus.
  10. The news event extraction system according to claim 8, characterized in that the news event extraction system further comprises a news event statistics module used for statistically collecting the number of sentences containing different query words in each timestamp container and obtaining a ranking result of the query words.
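
The steps recited in claims 6 to 10 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`term_frequencies`, `cosine`, `assign_untimed`, `rank_query_words`), the sample sentences, and the threshold value are all hypothetical. It assumes sentences are already segmented into word lists, that the document word frequency is the count of a word divided by the total word count of the container, and that words of the untimed sentence absent from a container's vocabulary are ignored so that the phrase vector has the same dimensions as the characteristic vector.

```python
from collections import Counter
from math import sqrt


def term_frequencies(sentences):
    """Relative frequency of each word over all sentences of one container."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_untimed(sentence, containers, threshold):
    """Add an untimed sentence to the most similar timestamp container.

    containers maps a timestamp to a list of tokenised sentences.
    Returns the chosen timestamp, or None if no similarity exceeds threshold.
    """
    best_ts, best_sim = None, 0.0
    for ts, sentences in containers.items():
        feature = term_frequencies(sentences)
        vocab = sorted(feature)  # fixed word order shared by both vectors
        container_vec = [feature[w] for w in vocab]
        counts = Counter(w for w in sentence if w in feature)
        phrase_vec = [counts[w] for w in vocab]  # same dimensions as container_vec
        sim = cosine(phrase_vec, container_vec)
        if sim > best_sim:
            best_ts, best_sim = ts, sim
    if best_ts is not None and best_sim > threshold:
        containers[best_ts].append(sentence)
        return best_ts
    return None


def rank_query_words(containers, query_words):
    """Per timestamp, rank query words by the number of sentences containing them."""
    ranking = {}
    for ts, sentences in containers.items():
        hits = {q: sum(1 for s in sentences if q in s) for q in query_words}
        ranking[ts] = sorted(query_words, key=lambda q: (-hits[q], q))
    return ranking


if __name__ == "__main__":
    containers = {
        "2015-11-05": [["earthquake", "hits", "city"],
                       ["earthquake", "damage", "city"]],
        "2015-11-06": [["election", "results", "announced"]],
    }
    untimed = ["rescue", "teams", "reach", "earthquake", "city"]
    print(assign_untimed(untimed, containers, threshold=0.3))
    print(rank_query_words(containers, ["earthquake", "election"]))
```

Because cosine similarity is invariant to scaling, the phrase vector here uses raw counts rather than normalised frequencies; a real system would also need a word segmenter and a time extractor, which this sketch leaves out.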
AU2018100678A 2015-11-05 2018-05-18 News events extracting method and system Expired AU2018100678A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2018100678A AU2018100678A4 (en) 2015-11-05 2018-05-18 News events extracting method and system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201510749707.7 2015-11-05
PCT/CN2016/070992 WO2017075912A1 (en) 2015-11-05 2016-01-15 News events extracting method and system
AU2018100678A AU2018100678A4 (en) 2015-11-05 2018-05-18 News events extracting method and system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/070992 Division WO2017075912A1 (en) 2015-11-05 2016-01-15 News events extracting method and system

Publications (1)

Publication Number Publication Date
AU2018100678A4 (en) 2018-06-14

Family

ID=62527811

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018100678A Expired AU2018100678A4 (en) 2015-11-05 2018-05-18 News events extracting method and system

Country Status (1)

Country Link
AU (1) AU2018100678A4 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950199A (en) * 2020-08-11 2020-11-17 杭州叙简科技股份有限公司 Earthquake data structured automation method based on earthquake news event
CN112101022A (en) * 2020-08-12 2020-12-18 新华智云科技有限公司 Earthquake event entity linking method
CN112101022B (en) * 2020-08-12 2024-02-20 新华智云科技有限公司 Entity linking method for seismic event
CN112650919A (en) * 2020-11-30 2021-04-13 北京百度网讯科技有限公司 Entity information analysis method, apparatus, device and storage medium
CN112650919B (en) * 2020-11-30 2023-09-01 北京百度网讯科技有限公司 Entity information analysis method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110874531B (en) Topic analysis method and device and storage medium
AU2018100678A4 (en) News events extracting method and system
CN106599054B (en) Method and system for classifying and pushing questions
WO2017075912A1 (en) News events extracting method and system
CN108573045A (en) A kind of alignment matrix similarity retrieval method based on multistage fingerprint
CN106844482B (en) Search engine-based retrieval information matching method and device
CN108363694B (en) Keyword extraction method and device
CN110210038B (en) Core entity determining method, system, server and computer readable medium thereof
KR101651780B1 (en) Method and system for extracting association words exploiting big data processing technologies
CN107168953A (en) The new word discovery method and system that word-based vector is characterized in mass text
Suarez et al. Combining financial word embeddings and knowledge-based features for financial text summarization uc3m-mc system at fns-2020
JP6867963B2 (en) Summary Evaluation device, method, program, and storage medium
JP4873739B2 (en) Text multiple topic extraction apparatus, text multiple topic extraction method, program, and recording medium
Koirala et al. A Nepali Rule Based Stemmer and its performance on different NLP applications
CN106777140B (en) Method and device for searching unstructured document
CN106776724B (en) Question classification method and system
JP2009015795A (en) Text segmentation apparatus, text segmentation method, program, and recording medium
WO2021027085A1 (en) Method and device for automatically extracting text keyword, and storage medium
Wu et al. Wn-salience: A corpus of news articles with entity salience annotations
JP2013101679A (en) Text segmentation device, method, program, and computer-readable recording medium
Prayoga et al. Unsupervised Twitter Sentiment Analysis on The Revision of Indonesian Code Law and the Anti-Corruption Law using Combination Method of Lexicon Based and Agglomerative Hierarchical Clustering
JP5215051B2 (en) Text segmentation apparatus and method, program, and computer-readable recording medium
JP2008197952A (en) Text segmentation method, its device, its program and computer readable recording medium
CN109446516B (en) Data processing method and system based on theme recommendation model
Kiomourtzis et al. NOMAD: Linguistic Resources and Tools Aimed at Policy Formulation and Validation.

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry