CN108363688B - Named entity linking method fusing prior information - Google Patents
- Publication number
- CN108363688B CN108363688B CN201810103629.7A CN201810103629A CN108363688B CN 108363688 B CN108363688 B CN 108363688B CN 201810103629 A CN201810103629 A CN 201810103629A CN 108363688 B CN108363688 B CN 108363688B
- Authority
- CN
- China
- Prior art keywords
- entity
- article
- word
- candidate
- idf
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
Abstract
The invention discloses a named entity linking method fusing prior information. The method comprises the following steps: (1) extracting a string-to-candidate-entity table, a person name list and a place name list from the Wikipedia data dump and the Freebase data dump; (2) representing each article in the Wikipedia data dump as term frequency/inverse document frequency (tf-idf) features, and extracting the commonness feature of each string relative to its candidate entities; (3) performing query expansion on the entity mentions, and generating candidate entities for each mention using the string-to-candidate-entity table from step (1); (4) extracting features of the article containing each entity mention to obtain the inverse document frequencies of its words and the important word hit rate; (5) computing, from the features extracted in steps (2) and (4), the degree of association between each entity mention and each of its candidate entities, and taking the candidate with the highest association as the entity linking result. The invention breaks through the limitation of scarce training corpora and provides reliable entity linking recommendations for the user, with prior information contributed by the entity commonness feature.
Description
Technical Field
The invention relates to natural language processing, in particular to a named entity linking method fusing prior information.
Background
Natural Language Processing (NLP) is an interdisciplinary field spanning linguistics and computer science. Named Entity Linking (NEL) is a basic task in natural language processing. It aims to eliminate the ambiguity caused by linguistic phenomena such as aliases, coreference and polysemy, and to establish a correspondence between proper nouns (entity names) appearing in a text and the entities they refer to in a knowledge base. The problem is defined as: given a piece of text and the mentions in it (a mention being a string to be linked), find the entity each mention refers to in a specified knowledge base.
Entity linking, as a technique for establishing links between text and knowledge bases, plays a very important role in information extraction. Relation Extraction is a typical example of an information extraction technique that requires entity linking. Its purpose is to extract the relationships between different entities from text, so finding the entity mentions in the text and resolving them to their corresponding entities through entity linking is a prerequisite for further analysis.
In addition, entity linking effectively adds extra information to the original text, so it can also be used in other natural language processing and text mining problems, where it helps to further understand the text and obtain better results.
Entity linking is typically implemented in multiple steps, the two most important of which are candidate generation (Candidate Generation) and candidate disambiguation (Candidate Ranking). The candidate generation step uses the current mention (the name string) to find which entities it may refer to, as candidates; the candidate disambiguation step then selects the best candidate as the final linking result based on the context of the mention and some characteristics of the candidate entities themselves.
The common practice for the candidate generation step is to construct a dictionary in advance that records which entities each name may correspond to; when entity linking is performed, the candidate entities are looked up in this dictionary based on the name of the current mention. The dictionary is typically built from information provided in the knowledge base.
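The dictionary-based candidate generation described above can be sketched as follows. This is a minimal Python illustration with an invented toy alias dictionary; a real dictionary would be built from Wikipedia anchor texts, redirect pages and disambiguation pages.

```python
from collections import defaultdict

# Hypothetical miniature alias dictionary (alias -> set of entities).
str2entity = defaultdict(set)
for alias, entity in [
    ("Big Blue", "IBM"),
    ("IBM", "IBM"),
    ("Apple", "Apple Inc."),
    ("Apple", "Apple (fruit)"),
]:
    str2entity[alias].add(entity)

def generate_candidates(mention: str) -> set:
    """Look up the candidate entities for a mention string."""
    # .get avoids inserting new keys into the defaultdict on a miss.
    return str2entity.get(mention, set())

print(generate_candidates("Apple"))  # both the company and the fruit
```

An ambiguous name like "Apple" yields several candidates, which the later disambiguation step must rank.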
In the candidate disambiguation step, common practice includes collaborative and non-collaborative approaches. A collaborative method considers multiple mentions in the same context simultaneously, preferring linking results in which the target entities of those mentions are as strongly associated with each other as possible. A non-collaborative method, by contrast, links each mention individually. The approach we use is non-collaborative: compared with collaborative methods, non-collaborative methods are generally faster, at the cost of somewhat lower accuracy.
Traditional non-collaborative methods may design a series of features, including: name features (Surface Features) measuring the similarity between the mention string and the candidate entity name, such as the number of words shared by the two; context features (Context Features) measuring the semantic match between the candidate entity and the mention's context, such as the TF-IDF similarity between the document containing the mention and the description of the candidate entity, or whether words in the candidate entity's Wikipedia page title appear in the document containing the mention; and other features, such as the number of country names co-occurring in the mention's document and the candidate entity's description, or the number of country names co-occurring in all aliases of the candidate entity and in the mention's document.
Since such heuristic features require expert knowledge, the original feature engineering becomes ineffective once the knowledge base or corpus changes; we therefore hope to achieve good results using as few features as possible.
Disclosure of Invention
The invention aims to provide a named entity linking method fusing prior information, which aims to link recognized entities in natural texts to a target knowledge base (Freebase) so as to provide a basis for subsequent tasks such as information extraction and the like.
Therefore, the IWHR (Important Word Hit Rate) feature is introduced and combined with the commonness and tf-idf features to judge the degree of match between a mention and an entity; the role of the traditional name features is covered by the candidate generation process, and the commonness feature adds prior information to the method.
Because the parameters of a typical named entity linking model must be determined from training corpora, which are difficult to obtain, the method provided by the invention combines the three features in a training-free manner and gives suggested parameter settings.
In addition, to compensate for the fact that the non-collaborative method does not consider context, query expansion is performed before entity linking; it is specially optimized for person and place names within the same article that may point to the same entity.
Tf-idf (the term frequency/inverse document frequency feature) is commonly used to measure the similarity between articles; we introduce it here to measure the similarity between the mention's context and the candidate entity's context.
Commonness reflects the prior probability that a candidate entity is the referent of a mention. Introducing this feature amounts to adding prior information which, when context is insufficient, can still be used to judge what the mention refers to. The calculation formula is as follows:
Let A_s^e be the set of anchor texts that are displayed as the name string s and link to the page corresponding to entity e, and A_s be the set of anchor texts displayed as the string s. Then:

commonness(e, s) = |A_s^e| / |A_s|    (1)
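Computed from anchor-link statistics, the commonness feature can be sketched as follows. The anchor counts here are invented for illustration; in the method they come from Wikipedia anchor links.

```python
from collections import Counter

# Hypothetical anchor statistics: (surface string, linked entity) -> count.
anchor_counts = Counter({
    ("Washington", "George Washington"): 120,
    ("Washington", "Washington, D.C."): 300,
    ("Washington", "Washington (state)"): 180,
})

def commonness(entity: str, surface: str) -> float:
    """commonness(e, s) = |A_s^e| / |A_s|: the fraction of anchors whose
    surface string is s that link to entity e."""
    total = sum(c for (s, _), c in anchor_counts.items() if s == surface)
    hits = anchor_counts.get((surface, entity), 0)
    return hits / total if total else 0.0

print(commonness("Washington, D.C.", "Washington"))  # 300 / 600 = 0.5
```

With no context at all, this prior alone would already prefer the most frequently linked sense of a name.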
IWHR remedies a deficiency of tf-idf by paying more attention to the important words appearing in the article context; its calculation formula is as follows:
Let e be a candidate entity in Wikipedia, m the string to be linked, W_d the set of words in the article containing m, and W_e the set of words in the article describing e. Then IWHR(e, m) is calculated according to equation (2), where T is a manually set idf threshold:

IWHR(e, m) = |{w ∈ W_d ∩ W_e : idf(w) > T}| / |{w ∈ W_d : idf(w) > T}|    (2)
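A sketch of the IWHR computation as reconstructed above: among the "important" words of the mention's article (idf above the threshold T), count the fraction that also occur in the candidate entity's article. The word sets, idf values and threshold below are illustrative assumptions.

```python
def iwhr(entity_words, mention_words, idf, threshold):
    """Important Word Hit Rate: of the words in the mention's article whose
    idf exceeds the threshold, the fraction that also appear in the
    candidate entity's article."""
    important = {w for w in mention_words if idf.get(w, 0.0) > threshold}
    if not important:
        return 0.0  # no important words: nothing to hit
    hits = important & set(entity_words)
    return len(hits) / len(important)

# Toy idf table and word sets (invented numbers).
idf = {"turkey": 5.2, "ankara": 7.1, "country": 1.3, "nato": 6.0}
m_words = {"turkey", "ankara", "country", "nato"}
e_words = {"turkey", "ankara", "republic"}
print(iwhr(e_words, m_words, idf, threshold=2.0))  # 2 of 3 important words hit
```

Here "country" is filtered out by its low idf, so only the three discriminative words count toward the rate.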
The invention is realized by the following technical scheme:
the named entity linking method fusing prior information comprises the following steps:
S1: extracting a string-to-candidate-entity table, a person name list and a place name list from the Wikipedia data dump and the Freebase data dump;
S2: representing each article in the Wikipedia data dump as a term frequency-inverse document frequency (tf-idf) feature, and extracting and storing the commonness feature of each string relative to its candidate entities;
S3: performing query expansion using the person name list and place name list obtained in S1, and generating candidate entities for each entity mention using the string-to-candidate-entity table obtained in S1;
S4: calculating the important word hit rate IWHR of the entity mention relative to each candidate entity;
S5: calculating the degree of association between the entity mention and each of its candidate entities from the tf-idf, commonness and IWHR features extracted in S2 and S4, and taking the candidate with the highest association as the entity linking result.
The steps can be realized in the following way:
s1 specifically includes the following steps:
S11: parsing the Wikipedia data dump to extract, for each entity e in Wikipedia, the article D_e describing it, the anchor texts A_e in the articles, the entity number W_id corresponding to each article, the redirect pages and the disambiguation pages, and from these generating the string-to-candidate-entity table str2entity;
s12: all the person names and place names in the Freebase data dump are extracted to form a person name list Pname and a place name list Plocation.
S2 specifically includes the following steps:
S21: segmenting each Wikipedia article into words with the natural language processing tool Stanford CoreNLP, and removing stop words with a stop word dictionary to obtain a word list;
S22: calculating, based on the word list, the inverse document frequency idf of every word:

idf(word) = log( N / df(word) )

where N, the document count of the corpus, is the total number of articles in Wikipedia, and df(word) is the number of articles containing the word;
S23: calculating, based on the word list, the term frequency tf of every word in each article:

tf(word, d) = count(word, d) / |d|

where count(word, d) is the number of occurrences of the word in article d and |d| is the total number of words in d;
S24: calculating the tf-idf value of every word in each Wikipedia article from the results of S22 and S23:

tfidf_word(word) = tf(word, d) × idf(word)

S25: based on tfidf_word(word) obtained in S24, keeping the tf-idf values of the top 20 words, sorted in descending order, as the tf-idf feature of the article, denoted tfidf(document);
S26: calculating the commonness feature of each string relative to each candidate entity according to the following formula:

commonness(e, m) = |A_m^e| / |A_m|    (1)

where e is a candidate entity, m is a string, A_m^e is the set of anchor texts with surface name m that link to entity e, A_m is the set of anchor texts with surface name m, and |·| denotes the number of elements in a set;
S27: storing the tf-idf feature of each article and the commonness feature of each string relative to its candidate entities.
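Steps S21-S25 can be sketched as a minimal tf-idf pipeline. The three-document corpus below is a toy stand-in for the tokenized, stop-word-filtered Wikipedia articles; the function names are illustrative, not from the original disclosure.

```python
import math
from collections import Counter

def idf_table(docs):
    """idf(w) = log(N / df(w)) over a collection of tokenized documents."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(n / c) for w, c in df.items()}

def tfidf_feature(doc, idf, top_k=20):
    """Keep the top_k words by tf-idf as the article's feature, as in S25
    (the patent keeps the top 20)."""
    tf = Counter(doc)
    total = len(doc)
    scores = {w: (c / total) * idf.get(w, 0.0) for w, c in tf.items()}
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return dict(top)

docs = [["entity", "linking", "wikipedia"],
        ["entity", "graph"],
        ["music", "band"]]
idf = idf_table(docs)
print(tfidf_feature(docs[0], idf, top_k=2))
```

The word "entity" occurs in two of the three documents, so its idf is lower and it is dropped from the top-2 feature of the first document.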
S3 specifically includes the following steps:
S31: looking up the entity mention s in Pname and Plocation; if the string is in either list, going to step S32, otherwise going to S33;
S32: checking whether a string s' occurs earlier in the article containing s such that s is a substring of s'; if so, replacing s with s' and going to S33; if not, going directly to S33;
S33: querying str2entity with the string s to obtain all candidate entities of the string.
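The query expansion of steps S31-S33 can be sketched as follows. The heuristic shown (scan the preceding text for a longer two-word string containing the mention) is an illustrative assumption standing in for the patent's exact matching procedure, and the name lists are toy data.

```python
def expand_mention(mention, article_text, name_lists):
    """If the mention is a person or place name, look for an earlier, longer
    string in the article that contains it (e.g. 'Clinton' expands to
    'Hillary Clinton'); otherwise return it unchanged."""
    if not any(mention in names for names in name_lists):
        return mention
    pos = article_text.rfind(mention)  # position of the mention occurrence
    if pos < 0:
        return mention
    tokens = article_text[:pos].split()
    for i in range(len(tokens) - 1):
        candidate = tokens[i] + " " + tokens[i + 1]
        if mention in candidate and len(candidate) > len(mention):
            return candidate  # first longer bigram containing the mention
    return mention

article = "Hillary Clinton spoke today. Later, Clinton left."
print(expand_mention("Clinton", article, [{"Clinton", "Hillary Clinton"}]))
```

After expansion, the longer string is used for the candidate-table lookup, which gives the short mention the benefit of its fuller context.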
S4 specifically includes the following steps:
S41: segmenting the article containing the entity mention into words with the Stanford CoreNLP tool and removing stop words to obtain a word list;
S42: obtaining the idf value idf(w) of each word in the article containing the mention, using the idf formula from S22;
S43: calculating the important word hit rate IWHR(e, m) according to the following formula:

IWHR(e, m) = |{w ∈ W_d ∩ W_e : idf(w) > T}| / |{w ∈ W_d : idf(w) > T}|    (2)

where e is a candidate entity, m is an entity mention, W_d is the set of words in the article containing m, W_e is the set of words in the article describing e, and T is the configured idf threshold.
In S5, the specific steps of calculating the association degrees between the entity and its respective candidate entities, and using the highest association degree as the entity link result are as follows:
S51: extracting the article d_m containing the entity mention m and the article d_e describing each candidate entity e;
S52: obtaining the tf-idf features of articles d_m and d_e from the results stored in S2;
S53: for each candidate entity e, computing the tf-idf similarity between the mention's article and the candidate entity's article:

tfidfsimilarity(e, m) = ( tfidf(d_m) · tfidf(d_e) ) / ( ||tfidf(d_m)|| × ||tfidf(d_e)|| )

where ||·|| denotes the modulus of a vector;
S54: obtaining the commonness and IWHR features of the entity mention m relative to the candidate entity e from the results of S2 and S4;
S55: for each candidate entity e, calculating the similarity between the entity mention and the candidate entity according to the following formula:

similarity(e,m) = a×log(commonness(e,m)) + b×log(tfidfsimilarity(e,m)) + c×log(IWHR(e,m))

where a, b and c are constants;
S56: computing the final entity linking result e_result:

e_result = argmax_e ( similarity(e,m) ).

a, b and c can be learned with a neural network or tuned manually; the suggested settings are 1.0, 6.0 and 1.0 respectively.
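The scoring of S55-S56 can be sketched as follows. The log-linear combination and the (1.0, 6.0, 1.0) weights follow the text; the epsilon smoothing for zero-valued features is an assumption added here to keep log() defined, and the feature values are toy numbers.

```python
import math

def link(candidates, a=1.0, b=6.0, c=1.0, eps=1e-12):
    """Return the candidate maximizing
    a*log(commonness) + b*log(tfidf similarity) + c*log(IWHR)."""
    def score(feats):
        cm, ts, iw = feats
        return (a * math.log(cm + eps)
                + b * math.log(ts + eps)
                + c * math.log(iw + eps))
    # argmax over candidates, as in e_result = argmax_e similarity(e, m)
    return max(candidates, key=lambda e: score(candidates[e]))

# candidate -> (commonness, tf-idf similarity, IWHR); invented values
cands = {
    "Turkey (country)": (0.7, 0.30, 0.50),
    "Turkey (bird)":    (0.3, 0.05, 0.10),
}
print(link(cands))
```

With b = 6.0 the tf-idf similarity dominates, which matches the suggested weighting: context agreement outweighs the prior when both are available.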
The invention uses only three features, namely entity commonness, term frequency/inverse document frequency, and important word hit rate. It breaks through the limitation of scarce training corpora and provides reliable entity linking recommendations for the user, with prior information contributed by the entity commonness feature.
Drawings
FIG. 1 is a workflow diagram of extracting resources from the Wikipedia data dump and the Freebase data dump;
FIG. 2 is a workflow diagram of the main steps of the named entity linking method fusing prior information.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
For the named entity linking task, the invention combines the three features commonness, tf-idf and IWHR to realize a named entity linking method that fuses prior information. Because few features are used, few parameters need to be fitted, which makes the method more convenient when migrating to a different knowledge base or corpus.
As shown in fig. 1 and 2, a named entity linking method fusing prior information includes the following steps:
S1: extracting a string-to-candidate-entity table, a person name list and a place name list from the Wikipedia data dump and the Freebase data dump. This is implemented as follows:
S11: parsing the Wikipedia data dump to extract, for each entity e in Wikipedia, the article D_e describing it, the anchor texts A_e in the articles, the entity number W_id corresponding to each article, the redirect pages and the disambiguation pages, and from these generating the string-to-candidate-entity table str2entity;
s12: all the person names and place names in the Freebase data dump are extracted to form a person name list Pname and a place name list Plocation.
S2: representing each article in the Wikipedia data dump as a term frequency-inverse document frequency (tf-idf) feature, and extracting and storing the commonness feature of each string relative to its candidate entities. This is implemented as follows:
S21: segmenting each Wikipedia article into words with the natural language processing tool Stanford CoreNLP, and removing stop words with a stop word dictionary to obtain a word list;
S22: calculating, based on the word list, the inverse document frequency idf of every word:

idf(word) = log( N / df(word) )

where N, the document count of the corpus, is the total number of articles in Wikipedia, and df(word) is the number of articles containing the word;
S23: calculating, based on the word list, the term frequency tf of every word in each article:

tf(word, d) = count(word, d) / |d|

where count(word, d) is the number of occurrences of the word in article d and |d| is the total number of words in d;
S24: calculating the tf-idf value of every word in each Wikipedia article from the results of S22 and S23:

tfidf_word(word) = tf(word, d) × idf(word)

S25: based on tfidf_word(word) obtained in S24, keeping the tf-idf values of the top 20 words, sorted in descending order, as the tf-idf feature of the article, denoted tfidf(document);
S26: calculating the commonness feature of each string relative to each candidate entity according to the following formula:

commonness(e, m) = |A_m^e| / |A_m|    (1)

where e is a candidate entity, m is a string, A_m^e is the set of anchor texts with surface name m that link to entity e, A_m is the set of anchor texts with surface name m, and |·| denotes the number of elements in a set;
S27: storing the tf-idf feature of each article and the commonness feature of each string relative to its candidate entities.
S3: performing query expansion using the person name list and place name list obtained in S1, and generating candidate entities for each entity mention using the string-to-candidate-entity table obtained in S1. This is implemented as follows:
S31: looking up the entity mention s in Pname and Plocation; if the string is in either list, going to step S32, otherwise going to S33;
S32: checking whether a string s' occurs earlier in the article containing s such that s is a substring of s'; if so, replacing s with s' and going to S33; if not, going directly to S33;
S33: querying str2entity with the string s to obtain all candidate entities of the string.
S4: calculating the important word hit rate IWHR of the entity mention relative to each candidate entity. This is implemented as follows:
S41: segmenting the article containing the entity mention into words with the Stanford CoreNLP tool and removing stop words to obtain a word list;
S42: obtaining the idf value idf(w) of each word in the article containing the mention, using the idf formula from S22;
S43: calculating the important word hit rate IWHR(e, m) according to the following formula:

IWHR(e, m) = |{w ∈ W_d ∩ W_e : idf(w) > T}| / |{w ∈ W_d : idf(w) > T}|    (2)

where e is a candidate entity, m is an entity mention, W_d is the set of words in the article containing m, W_e is the set of words in the article describing e, and T is the configured idf threshold.
S5: calculating the degree of association between the entity mention and each of its candidate entities from the tf-idf, commonness and IWHR features extracted in S2 and S4, and taking the candidate with the highest association as the entity linking result. This is implemented as follows:
S51: extracting the article d_m containing the entity mention m and the article d_e describing each candidate entity e;
S52: obtaining the tf-idf features of articles d_m and d_e from the results stored in S2;
S53: for each candidate entity e, computing the tf-idf similarity between the mention's article and the candidate entity's article:

tfidfsimilarity(e, m) = ( tfidf(d_m) · tfidf(d_e) ) / ( ||tfidf(d_m)|| × ||tfidf(d_e)|| )

where ||·|| denotes the modulus of a vector;
S54: obtaining the commonness and IWHR features of the entity mention m relative to the candidate entity e from the results of S2 and S4;
S55: for each candidate entity e, calculating the similarity between the entity mention and the candidate entity according to the following formula:

similarity(e,m) = a×log(commonness(e,m)) + b×log(tfidfsimilarity(e,m)) + c×log(IWHR(e,m))

where a, b and c are constants;
S56: computing the final entity linking result e_result:

e_result = argmax_e ( similarity(e,m) ).

a, b and c can be manually set to 1.0, 6.0 and 1.0 respectively.
The method is applied to the following examples in order that those skilled in the art will better understand the specific implementation of the present invention.
Examples
Taking the documents of the entity discovery and linking subtask of the 2017 Text Analysis Conference (TAC) as an example, the method is applied to text named entity linking (the resource extraction process, which is relatively complex, is not detailed here). The specific parameters and methods in each step are as follows:
1. Obtain the mentions to be linked on the original document set with a named entity recognition tool or by manual annotation; each mention is a triple of the article, the start character offset and the end character offset of the given string;
2. Write a script to extract all content in the document set (removing XML tags), with each article stored as one file;
3. Segment each article into words with the natural language processing tool Stanford CoreNLP, remove stop words, and count the total number of words in each article;
4. Count the occurrences of each word in each article and calculate the tf value of each word in each article:

tf(word, d) = count(word, d) / |d|

5. Calculate the tf-idf value of each word in each article from the word list and idf values obtained from the Wikipedia statistics and the tf values computed above:

tfidf_word(word) = tf(word, d) × idf(word)

6. In each article, take the 20 words with the highest tf-idf values, in descending order, together with those values, as the tf-idf feature of the article;
7. For each mention identified in step 1, decide whether to apply query expansion according to whether it is a person or place name. The judgment is as follows: if the mention appears in the person name list or the place name list, it is treated as a person or place name. The query expansion works as follows: determine whether a string s' occurs before s in the article containing s, where s is an abbreviation or a part of s' (for example, s' is Hilary Clinton and s is Clinton); if so, replace s with s';
8. For each mention, query the extracted string-to-(candidate entity, commonness) table to obtain the candidate entities of the string and the corresponding commonness features;
9. For each mention, calculate the tf-idf similarity between the article containing the mention and the article of each of its candidate entities:

tfidfsimilarity(e, m) = ( tfidf(d_m) · tfidf(d_e) ) / ( ||tfidf(d_m)|| × ||tfidf(d_e)|| )

10. For each mention, calculate the IWHR similarity between the article containing the mention and each of its candidate entities. Let e be a candidate entity in Wikipedia, m the string to be linked, W_d the set of words in the article containing m, and W_e the set of words in the article describing e; then IWHR(e, m) is calculated according to equation (2), where T is a manually set idf threshold:

IWHR(e, m) = |{w ∈ W_d ∩ W_e : idf(w) > T}| / |{w ∈ W_d : idf(w) > T}|    (2)
11. For each mention and each of its candidate entities, calculate the mention-entity relevance by the following formula:

similarity(e,m) = a×log(commonness(e,m)) + b×log(tfidfsimilarity(e,m)) + c×log(IWHR(e,m))    (5)

with (a, b, c) set to (1.0, 6.0, 1.0);
12. Take the e that maximizes the above formula as the linking result for m:

e_result = argmax_e ( similarity(e,m) )    (6)
the following table is the partial final link result for the selected document.
WORD | Beg | End | KBid |
---|---|---|---|
Turkey | 2279 | 2284 | m.01znc_ |
Microsoft | 2620 | 2628 | m.04sv4 |
Nam Dinh | 3703 | 3710 | m.07m1dj |
the Beatles | 2078 | 2088 | m.07c0j |
Gaisano mall | 2642 | 2653 | m.09rxbx2 |
Claims (3)
1. A named entity linking method fusing prior information is characterized by comprising the following steps:
S1: extracting a string-to-candidate-entity table, a person name list and a place name list from the Wikipedia data dump and the Freebase data dump;
S2: representing each article in the Wikipedia data dump as a term frequency-inverse document frequency (tf-idf) feature, and extracting and storing the commonness feature of each string relative to its candidate entities;
S3: performing query expansion using the person name list and place name list obtained in S1, and generating candidate entities for each entity mention using the string-to-candidate-entity table obtained in S1;
S4: calculating the important word hit rate IWHR of the entity mention relative to each candidate entity;
S5: calculating the degree of association between the entity mention and each of its candidate entities from the tf-idf, commonness and IWHR features extracted in S2 and S4, and taking the candidate with the highest association as the entity linking result;
s1 specifically includes the following steps:
S11: parsing the Wikipedia data dump to extract, for each entity e in Wikipedia, the article D_e describing it, the anchor texts A_e in the articles, the entity number W_id corresponding to each article, the redirect pages and the disambiguation pages, and from these generating the string-to-candidate-entity table str2entity;
s12: extracting all the person names and place names in the Freebase data dump to form a person name list Pname and a place name list Plocation;
s2 specifically includes the following steps:
S21: segmenting each Wikipedia article into words with the natural language processing tool Stanford CoreNLP, and removing stop words with a stop word dictionary to obtain a word list;
S22: calculating, based on the word list, the inverse document frequency idf of every word:

idf(word) = log( N / df(word) )

where N, the document count of the corpus, is the total number of articles in Wikipedia, and df(word) is the number of articles containing the word;
S23: calculating, based on the word list, the term frequency tf of every word in each article:

tf(word, d) = count(word, d) / |d|

where count(word, d) is the number of occurrences of the word in article d and |d| is the total number of words in d;
S24: calculating the tf-idf value of every word in each Wikipedia article from the results of S22 and S23:

tfidf_word(word) = tf(word, d) × idf(word)

S25: based on tfidf_word(word) obtained in S24, keeping the tf-idf values of the top 20 words, sorted in descending order, as the tf-idf feature of the article, denoted tfidf(document);
S26: calculating the commonness feature of each string relative to each candidate entity according to the following formula:

commonness(e, m) = |A_m^e| / |A_m|    (1)

where e is a candidate entity, m is a string, A_m^e is the set of anchor texts with surface name m that link to entity e, A_m is the set of anchor texts with surface name m, and |·| denotes the number of elements in a set;
S27: storing the tf-idf feature of each article and the commonness feature of each string relative to its candidate entities;
s3 specifically includes the following steps:
S31: looking up the string s corresponding to the entity mention in Pname and Plocation; if the string s is in either list, going to step S32, otherwise going to S33;
S32: checking whether a string s' occurs earlier in the article containing s such that s is a substring of s'; if so, replacing s with s' and going to S33; if not, going directly to S33;
S33: querying str2entity with the string s to obtain all candidate entities of the string s;
s4 specifically includes the following steps:
S41: segmenting the article containing the entity mention into words with the Stanford CoreNLP tool and removing stop words to obtain a word list;
S42: obtaining the idf value idf(w) of each word in the article containing the mention, using the idf formula from S22;
S43: calculating the important word hit rate IWHR(e, m) according to the following formula:

IWHR(e, m) = |{w ∈ W_d ∩ W_e : idf(w) > T}| / |{w ∈ W_d : idf(w) > T}|    (2)

where e is a candidate entity, m is an entity mention, W_d is the set of words in the article containing m, W_e is the set of words in the article describing e, and T is the configured idf threshold;
in S5, the specific steps of calculating the association degrees between the entity and its candidate entities, and using the highest association degree as the entity link result are as follows:
S51: extracting the article d_m containing the entity mention m and the article d_e describing each candidate entity e;
S52: obtaining the tf-idf features of articles d_m and d_e from the results stored in S2;
S53: for each candidate entity e, computing the tf-idf similarity between the mention's article and the candidate entity's article:

tfidfsimilarity(e, m) = ( tfidf(d_m) · tfidf(d_e) ) / ( ||tfidf(d_m)|| × ||tfidf(d_e)|| )

where ||·|| denotes the modulus of a vector;
S54: obtaining the commonness and IWHR features of the entity mention m relative to the candidate entity e from the results of S2 and S4;
S55: for each candidate entity e, calculating the similarity between the entity mention and the candidate entity according to the following formula:

similarity(e,m) = a×log(commonness(e,m)) + b×log(tfidfsimilarity(e,m)) + c×log(IWHR(e,m))

where a, b and c are constants;
S56: computing the final entity linking result e_result:

e_result = argmax_e ( similarity(e,m) ).
3. The method as claimed in claim 1, wherein a, b and c are set to 1.0, 6.0 and 1.0 respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810103629.7A CN108363688B (en) | 2018-02-01 | 2018-02-01 | Named entity linking method fusing prior information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108363688A CN108363688A (en) | 2018-08-03 |
CN108363688B true CN108363688B (en) | 2020-04-28 |
Family
ID=63004109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810103629.7A Active CN108363688B (en) | 2018-02-01 | 2018-02-01 | Named entity linking method fusing prior information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363688B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866385B (en) * | 2018-08-17 | 2024-04-05 | 阿里巴巴(中国)有限公司 | Method and device for publishing outside of electronic book and readable storage medium |
CN109325230B (en) * | 2018-09-21 | 2021-06-15 | 广西师范大学 | Word semantic relevance judging method based on wikipedia bidirectional link |
CN110147401A (en) * | 2019-05-22 | 2019-08-20 | 苏州大学 | Merge the knowledge base abstracting method of priori knowledge and context-sensitive degree |
CN111814477B (en) * | 2020-07-06 | 2022-06-21 | 重庆邮电大学 | Dispute focus discovery method and device based on dispute focus entity and terminal |
CN113392220B (en) * | 2020-10-23 | 2024-03-26 | 腾讯科技(深圳)有限公司 | Knowledge graph generation method and device, computer equipment and storage medium |
CN113157861B (en) * | 2021-04-12 | 2022-05-24 | 山东浪潮科学研究院有限公司 | Entity alignment method fusing Wikipedia |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2251795A2 (en) * | 2009-05-12 | 2010-11-17 | Comcast Interactive Media, LLC | Disambiguation and tagging of entities |
CN104462126A (en) * | 2013-09-22 | 2015-03-25 | 富士通株式会社 | Entity linkage method and device |
CN107608960A (en) * | 2017-09-08 | 2018-01-19 | 北京奇艺世纪科技有限公司 | A kind of method and apparatus for naming entity link |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9871702B2 (en) * | 2016-02-17 | 2018-01-16 | CENX, Inc. | Service information model for managing a telecommunications network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363688B (en) | Named entity linking method fusing prior information | |
CN109508414B (en) | Synonym mining method and device | |
US7346487B2 (en) | Method and apparatus for identifying translations | |
CN107870901B (en) | Method, recording medium, apparatus and system for generating similar text from translation source text | |
CN108681537A (en) | Chinese entity linking method based on neural network and word vector | |
JP6187877B2 (en) | Synonym extraction system, method and recording medium | |
CN108920633B (en) | Paper similarity detection method | |
US20140289238A1 (en) | Document creation support apparatus, method and program | |
CN111930929A (en) | Article title generation method and device and computing equipment | |
CN109033212B (en) | Text classification method based on similarity matching | |
US11537795B2 (en) | Document processing device, document processing method, and document processing program | |
WO2014002774A1 (en) | Synonym extraction system, method, and recording medium | |
CN111276149A (en) | Voice recognition method, device, equipment and readable storage medium | |
CN112069312A (en) | Text classification method based on entity recognition and electronic device | |
CN112015907A (en) | Method and device for quickly constructing discipline knowledge graph and storage medium | |
CN115033773A (en) | Chinese text error correction method based on online search assistance | |
JP6867963B2 (en) | Summary Evaluation device, method, program, and storage medium | |
Sagcan et al. | Toponym recognition in social media for estimating the location of events | |
CN110008312A (en) | A kind of document writing assistant implementation method, system and electronic equipment | |
JP6495124B2 (en) | Term semantic code determination device, term semantic code determination model learning device, method, and program | |
JP2010097239A (en) | Dictionary creation device, dictionary creation method, and dictionary creation program | |
Gaizauskas et al. | Extracting bilingual terms from the Web | |
Khodak et al. | Extending and improving wordnet via unsupervised word embeddings | |
CN109002508B (en) | Text information crawling method based on web crawler | |
CN110083817B (en) | Naming disambiguation method, device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||