CN105183770A - Chinese integrated entity linking method based on graph model - Google Patents

Publication number
CN105183770A
Authority
CN
China
Prior art keywords
entity
candidate
item
text
referent
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510475469.5A
Other languages
Chinese (zh)
Inventor
刘峤
刘瑶
秦志光
Other inventors have requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-08-06
Filing date
2015-08-06
Publication date
2015-12-23
Application filed by University of Electronic Science and Technology of China
Priority to CN201510475469.5A
Publication of CN105183770A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558: Details of hyperlinks; Management of linked annotations

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention discloses a graph-model-based integrated entity linking method for Chinese text. The method maps ambiguous entities in a text to specific real-world entities, in support of knowledge base expansion, information extraction and search engines. It comprises three main parts: candidate entity generation, entity referent graph construction, and integrated entity disambiguation. For a given text, the entity referent items in it are recognized and their candidate entities are obtained. The entity referent items and their candidate entities are treated as graph nodes to construct an entity referent graph. An in-degree and out-degree algorithm is then applied to the entity referent graph to disambiguate multiple ambiguous entities in the text at the same time. Construction of the entity referent graph does not depend entirely on the knowledge base: incremental evidence mining can also look for evidence on encyclopedia web pages. Dependency path analysis is used to find entity referent items that are possibly related. When the length of the dependency path between two entity referent items is within a set range, the two items are regarded as possibly related, and only then is it determined whether their candidate entities are related in the real world, which greatly improves the efficiency of disambiguation.

Description

A Chinese integrated entity linking method based on a graph model
Technical field
The present invention relates to the field of natural language processing (NLP), and in particular to entity linking, knowledge base expansion, information extraction, question answering systems and search engine optimization.
Background
Traditional Chinese entity linking methods compare the context similarity between an entity referent item and each of its candidate entities, and choose the candidate with the highest similarity as the link target. This approach has two defects. First, it does not exploit the semantic relatedness between the entities in a text, although this relatedness can improve the accuracy of disambiguation to a large extent. Second, it can disambiguate only one ambiguous entity at a time, which is inefficient, and similarity comparison does not work well for entity linking in short texts.
When building the entity referent graph and computing relatedness, existing integrated entity linking methods regard all entity referent items in a text as possibly related to one another, and then determine whether their candidate entities are actually related in the real world. This is unreasonable, because an entity referent item is generally possibly related to only a few other entity referent items in the text. Treating all of them as related spends a great deal of unnecessary computing time on building the graph and raises the computational cost.
Existing Chinese knowledge bases are small, and the entity information they contain is incomplete, so they cannot fully meet the requirements of entity linking. Limited by the amount of knowledge in the knowledge base, the overall quality of entity linking suffers considerably.
Summary of the invention
The invention provides a Chinese integrated entity linking method based on a graph model. It builds the entity referent graph by finding the optimal possibly related entities and by incremental evidence mining, and disambiguates multiple ambiguous entities in a text. It addresses two defects of existing entity linking methods, namely insufficient knowledge in the knowledge base and inefficient construction of the entity referent graph, and so provides a more effective entity linking method.
The Chinese integrated entity linking method based on a graph model provided by the invention comprises:
For a given text, first recognize the entity referent items in it and obtain their candidate entities. Then treat the entity referent items and candidate entities as graph nodes and the relatedness between entities as edges, and construct the entity referent graph. Finally, apply the in-degree and out-degree algorithm to the entity referent graph to disambiguate multiple ambiguous entities in the text.
The method provided by the invention further comprises:
When entity relatedness is computed during construction of the entity referent graph, if the knowledge in the current knowledge base cannot meet the needs of entity linking (that is, no relation between the entities can be found in the knowledge base), incremental evidence mining looks for evidence on the entities' interactive encyclopedia web pages.
To reduce the time spent building the entity referent graph, dependency path analysis is used to find the optimal possibly related entity referent items. Two entity referent items are considered possibly related only when the length of the dependency path between them is within a set range; only then is it determined whether their candidate entities are related in the real world.
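The dependency path filter described above can be illustrated with a minimal Python sketch. It is not the patent's implementation. It assumes the dependency parse is already available as a list of head indices (the parser itself is not shown), and it interprets "path size" as the number of dependency arcs on the tree path between two tokens. The names `path_to_root`, `dependency_distance`, `possibly_related` and `max_distance` are hypothetical.

```python
from itertools import combinations

def path_to_root(heads, i):
    """Return the token indices from token i up to the root of the parse tree."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path

def dependency_distance(heads, i, j):
    """Number of dependency arcs on the tree path between tokens i and j."""
    depth_from_i = {node: d for d, node in enumerate(path_to_root(heads, i))}
    for depth_from_j, node in enumerate(path_to_root(heads, j)):
        if node in depth_from_i:  # first shared ancestor is the lowest common ancestor
            return depth_from_i[node] + depth_from_j
    raise ValueError("tokens do not belong to the same tree")

def possibly_related(heads, mention_positions, max_distance):
    """Keep only the pairs of referent items whose path length is within the set range."""
    return [(i, j) for i, j in combinations(mention_positions, 2)
            if dependency_distance(heads, i, j) <= max_distance]

# Toy parse of "Jordan joined Bulls in Chicago" (assumed, for illustration only).
# heads[k] is the index of the head of token k; -1 marks the root ("joined").
heads = [1, -1, 1, 1, 3]
pairs = possibly_related(heads, [0, 2, 4], max_distance=2)
print(pairs)  # [(0, 2)]
```

Only the pairs that survive this filter need to have their candidate entities checked against the knowledge base, which is where the claimed time saving comes from.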
To disambiguate multiple entities in a text at the same time, the invention applies the in-degree and out-degree algorithm to the entity referent graph, ranks the candidate entities by importance according to their in-degree, out-degree and prior probability, and selects the most important candidate entity as the link target.
Compared with the prior art, the main beneficial effects of the invention are as follows:
1. The invention can disambiguate multiple entities in a text at the same time, with better accuracy than the prior art.
2. The invention builds the entity referent graph more efficiently, and the graph it builds is more accurate.
Brief description of the drawings
To illustrate the invention more clearly, the drawings it uses are briefly described below:
Fig. 1 is a flowchart of integrated entity linking.
Fig. 2 is a schematic diagram of candidate entity generation.
Fig. 3 is a schematic diagram of entity referent graph construction.
Fig. 4 is a schematic diagram of integrated entity disambiguation.
Detailed description
The core idea of the invention is to use the relations between entities in a knowledge base to build an entity referent graph: the entity referent items in the text and their candidate entities are the nodes of the graph, and an edge between two nodes represents their semantic relatedness. If two entities have no relation in the knowledge base, incremental evidence mining looks for evidence on the encyclopedia pages corresponding to the entities in order to build the graph. Finally, the in-degree and out-degree algorithm is applied to the graph, which achieves integrated linking of multiple ambiguous entities in the same text.
To make the objects, method and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings.
Fig. 1 is a flowchart of the integrated entity linking method of the invention. As shown in Fig. 1, the method consists of three main parts: candidate entity generation, entity referent graph construction, and integrated entity disambiguation. The specific embodiment is as follows:
100. Candidate entity generation
Candidate entity generation is the most basic step of the whole method. As shown in Fig. 2, it comprises two parts: entity recognition and generation of candidate entities. For entity recognition (step 201), the invention uses the part-of-speech tags produced by ICTCLAS, the word segmentation tool of the Chinese Academy of Sciences (nr marks a person name, ns a place name, nt an organization name, and nz other proper nouns). Because of the particularities of the Chinese language, and to ensure that entity recognition is accurate and complete, a name dictionary is also built for proper nouns and entity names that are hard to recognize, and is used alongside the ICTCLAS part-of-speech tags.
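A minimal Python sketch of the entity recognition step follows. It does not call ICTCLAS: it assumes the text has already been segmented and tagged into (word, tag) pairs, and it matches the four tags named above exactly. The function name `recognize_entities`, the sample sentence and the dictionary contents are hypothetical.

```python
# Part-of-speech tags that the description treats as entity markers.
ENTITY_TAGS = {"nr", "ns", "nt", "nz"}

def recognize_entities(tagged_tokens, name_dictionary=frozenset()):
    """Return entity referent items from (word, pos_tag) pairs.

    A token is kept if its tag marks a person, place, organization or other
    proper noun, or if the word appears in the hand-built name dictionary.
    """
    mentions = []
    for word, tag in tagged_tokens:
        if tag in ENTITY_TAGS or word in name_dictionary:
            if word not in mentions:  # keep each referent item once, in text order
                mentions.append(word)
    return mentions

# Assumed tagger output for a toy sentence; "公牛" is tagged as a common noun,
# so it is recovered through the name dictionary instead.
tagged = [("乔丹", "nr"), ("在", "p"), ("芝加哥", "ns"),
          ("效力", "v"), ("公牛", "n")]
mentions = recognize_entities(tagged, name_dictionary={"公牛"})
print(mentions)  # ['乔丹', '芝加哥', '公牛']
```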
For candidate entity generation (step 203), Lucene is used to index the knowledge base. The entity referent item in the input text is compared with the index of each entity in the knowledge base; where they are identical, those entities are taken as the candidate entities of the entity referent item. (Note: when the knowledge base is built, an index is created for each entity, and all candidate entities of the same ambiguous entity share the same index.)
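The lookup can be sketched in Python as follows. A plain in-memory dictionary stands in for the Lucene index here, which is a simplification and not the patent's setup. The knowledge base rows, entity identifiers and function names are hypothetical.

```python
from collections import defaultdict

def build_index(knowledge_base):
    """Map each index key (the shared ambiguous name) to the entities filed under it."""
    index = defaultdict(list)
    for entity_id, index_key in knowledge_base:
        index[index_key].append(entity_id)
    return index

def generate_candidates(mentions, index):
    """Return the candidate entities whose index key equals the referent item."""
    return {mention: list(index.get(mention, [])) for mention in mentions}

# Hypothetical knowledge base rows: (entity identifier, index key).
kb = [
    ("Michael_Jordan_(basketball)", "乔丹"),
    ("Michael_I_Jordan_(scientist)", "乔丹"),
    ("Chicago", "芝加哥"),
]
index = build_index(kb)
candidates = generate_candidates(["乔丹", "芝加哥", "公牛"], index)
print(candidates["乔丹"])  # ['Michael_Jordan_(basketball)', 'Michael_I_Jordan_(scientist)']
print(candidates["公牛"])  # []
```

A referent item with an empty candidate list, such as "公牛" in this toy data, is the case that the description later hands over to incremental evidence mining.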
101. Entity referent graph construction
The entity referent items and their candidate entities are the nodes of the entity referent graph, and the relations between entities are its edges. The entity referent graph is a directed graph. As shown in Fig. 3, its construction mainly comprises computation of the prior probability (context similarity), computation of entity relatedness, and incremental evidence mining.
The prior probability of a candidate entity is the probability that an entity referent item in the given input text refers to that candidate entity. It plays an important role in reducing the number of graph nodes and speeding up entity linking. The cosine similarity between the input text containing the entity referent item and the encyclopedia page of a candidate entity is used as the prior probability of that candidate entity; candidate entities whose prior probability is below a set value are deleted from the candidate entity set.
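A small Python sketch of this pruning step is given below. It assumes both the input text and each encyclopedia page have already been tokenized, and it uses raw term counts as the vector weights; the description does not say which weighting is used, so that choice is an assumption. The names `cosine_similarity`, `prune_candidates` and `threshold` are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bags of words built from raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def prune_candidates(text_tokens, candidate_pages, threshold):
    """Score each candidate by similarity to the text and drop those below threshold."""
    priors = {cand: cosine_similarity(text_tokens, page)
              for cand, page in candidate_pages.items()}
    return {cand: p for cand, p in priors.items() if p >= threshold}

# Toy data (assumed): tokens of the input text and of two candidate pages.
text = ["jordan", "bulls", "basketball"]
pages = {
    "player": ["basketball", "bulls", "nba"],
    "scientist": ["machine", "learning", "berkeley"],
}
kept = prune_candidates(text, pages, threshold=0.1)
print(sorted(kept))  # ['player']
```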
Relatedness computation (step 301) is the core of the graph model and the basis on which edges in the entity referent graph are created. The relatedness computation scheme of the invention is as follows:
1) Parse the input text with a dependency parse tree. For each entity referent item, find the entity referent items in the text that are optimally possibly related to it according to the dependency paths: when the length of the dependency path between two entity referent items is within a set range, the invention regards them as possibly related.
2) For the most probably related entity referent items, obtain their candidate entity sets. For all candidate entity nodes, first determine whether two entity nodes have a direct relation in the knowledge base. If they do, add a directed edge between the two nodes, pointing from the start of the relation to its end. If they have no direct relation in the knowledge base, determine whether they have an indirect relation, that is, whether both entity nodes have a relation with some third node. If they do, add two directed edges in opposite directions between the two nodes.
3) If none of the above conditions holds, or if some entity has no candidate entity in the knowledge base, incremental evidence mining (step 303) examines the entities' encyclopedia pages to find whether these nodes are semantically related. If the encyclopedia page of one entity node directly contains another entity node, the two nodes are related, and a directed edge is added from the former to the latter. If the encyclopedia page of one entity node does not directly contain the other, determine whether the encyclopedia pages of the two nodes contain one or more identical third-party entities; if so, add two directed edges in opposite directions between the two nodes. The third-party node must not be a "popular" node: for example, "China" appears on the pages of many entities that have no relation to one another, so a rule-based method is used to filter out such links. Note that no directed edge is added between candidate entities of the same entity referent item.
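Steps 2) and 3) can be sketched in Python as below. This is an illustrative reading of the edge rules, not the patent's code. The knowledge base is assumed to be a set of directed (start, end) relation pairs, each encyclopedia page is assumed to be reduced to the set of entities it mentions, and the rule-based filter for "popular" nodes is approximated by a fixed set. The sketch builds only the edges between candidate entities. All names and the toy data are hypothetical.

```python
from itertools import product

def kb_neighbours(entity, kb_relations):
    """Entities that share a direct knowledge base relation with `entity`."""
    return ({tail for head, tail in kb_relations if head == entity}
            | {head for head, tail in kb_relations if tail == entity})

def edges_between(a, b, kb_relations, pages, popular):
    """Directed edges to add between candidates a and b, checked in the order 2) then 3)."""
    # Direct relation: edge from the start of the relation to its end.
    direct = [(x, y) for x, y in ((a, b), (b, a)) if (x, y) in kb_relations]
    if direct:
        return direct
    # Indirect relation through a shared third node: two opposite edges.
    if (kb_neighbours(a, kb_relations) & kb_neighbours(b, kb_relations)) - {a, b}:
        return [(a, b), (b, a)]
    # Incremental evidence mining on encyclopedia pages.
    page_a, page_b = pages.get(a, set()), pages.get(b, set())
    mined = [(x, y) for x, y, page in ((a, b, page_a), (b, a, page_b)) if y in page]
    if mined:
        return mined
    # Shared third-party entity on both pages, ignoring "popular" nodes.
    if (page_a & page_b) - popular - {a, b}:
        return [(a, b), (b, a)]
    return []

def build_edges(related_mentions, candidates, kb_relations, pages, popular=frozenset()):
    """Candidates of the same referent item are never paired, so they get no edge."""
    edges = set()
    for m1, m2 in related_mentions:
        for a, b in product(candidates[m1], candidates[m2]):
            edges.update(edges_between(a, b, kb_relations, pages, popular))
    return edges

candidates = {"乔丹": ["MJ_player", "MJ_scientist"],
              "公牛": ["Chicago_Bulls", "Bull_animal"]}
kb_relations = {("MJ_player", "Chicago_Bulls")}
pages = {"MJ_scientist": {"China", "Berkeley"}, "Bull_animal": {"China", "Cattle"}}
edges = build_edges([("乔丹", "公牛")], candidates, kb_relations, pages, popular={"China"})
print(edges)  # {('MJ_player', 'Chicago_Bulls')}
```

Without the `popular` filter, the shared mention of "China" would wrongly connect `MJ_scientist` and `Bull_animal` in both directions, which is the failure the description's rule-based filtering is meant to prevent.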
102. Integrated entity disambiguation
As shown in Fig. 4, the core of integrated entity disambiguation is the computation of the candidate entities' in-degree and out-degree (step 401). From the entity referent graph output by step 304, compute the sum of the in-degree and out-degree of each candidate entity of every ambiguous entity; then rank the candidate entities by importance according to this degree sum and the prior probability, and select the candidate entity node with the highest importance as the final link target.
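The ranking step can be sketched in Python as follows. The description does not state how the degree sum and the prior probability are combined into one importance score. This sketch assumes a lexicographic order, with the degree sum first and the prior probability as tie-breaker, which is only one possible reading. The function name `link_entities` and the toy data are hypothetical.

```python
from collections import Counter

def link_entities(candidates, edges, priors):
    """Pick, for every referent item, the candidate with the highest importance."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1  # contributes to the out-degree of source
        degree[target] += 1  # contributes to the in-degree of target
    links = {}
    for mention, cands in candidates.items():
        if cands:  # referent items without candidates stay unlinked
            # Assumed importance: (in-degree + out-degree, prior probability).
            links[mention] = max(cands, key=lambda c: (degree[c], priors.get(c, 0.0)))
    return links

candidates = {"乔丹": ["MJ_player", "MJ_scientist"],
              "公牛": ["Chicago_Bulls", "Bull_animal"]}
edges = {("MJ_player", "Chicago_Bulls")}
priors = {"MJ_player": 0.6, "MJ_scientist": 0.7,
          "Chicago_Bulls": 0.5, "Bull_animal": 0.5}
links = link_entities(candidates, edges, priors)
print(links)  # {'乔丹': 'MJ_player', '公牛': 'Chicago_Bulls'}
```

In this toy graph the single edge outweighs the slightly higher prior of `MJ_scientist`, so both referent items are resolved together in one pass over the graph.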

Claims (4)

1. A Chinese integrated entity linking method based on a graph model, characterized in that:
For a given text, the entity referent items in it are first recognized and their candidate entities obtained. The entity referent items and candidate entities are then treated as graph nodes and the relatedness between entities as edges, and the entity referent graph is constructed. Finally, the in-degree and out-degree algorithm is applied to the entity referent graph to disambiguate multiple ambiguous entities in the text.
2. The method according to claim 1, characterized in that:
When computing entity relatedness, the invention does not depend entirely on the knowledge contained in the knowledge base. When the knowledge base cannot meet the knowledge needs of linking, incremental evidence mining looks for evidence on the entities' interactive encyclopedia pages, so that entity relatedness is computed as comprehensively as possible.
3. The method according to claim 1, characterized in that it further comprises:
When finding the optimal possibly related entity referent items, the invention does not crudely regard all entity referent items in the text as possibly related, but performs dependency path analysis on a dependency parse tree. Two entity referent items are regarded as optimally possibly related only when the length of the dependency path between them is within a set range; only then is it determined whether their candidate entities are related in the real world. This greatly improves the efficiency of disambiguation.
4. The method according to claim 1, characterized in that it further comprises:
When disambiguating multiple ambiguous entities in a text at the same time, the invention applies the in-degree and out-degree algorithm to the entity referent graph, ranks the candidate entities by importance according to their in-degree, out-degree and prior probability, and selects the most important candidate entity as the link target. The method is simple and effective.
CN201510475469.5A 2015-08-06 2015-08-06 Chinese integrated entity linking method based on graph model Pending CN105183770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510475469.5A CN105183770A (en) 2015-08-06 2015-08-06 Chinese integrated entity linking method based on graph model


Publications (1)

Publication Number Publication Date
CN105183770A (en) 2015-12-23

Family

ID=54905854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510475469.5A Pending CN105183770A (en) 2015-08-06 2015-08-06 Chinese integrated entity linking method based on graph model

Country Status (1)

Country Link
CN (1) CN105183770A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000077690A1 (en) * 1999-06-15 2000-12-21 Kanisa Inc. System and method for document management based on a plurality of knowledge taxonomies
CN102291435A (en) * 2011-07-15 2011-12-21 武汉大学 Mobile information searching and knowledge discovery system based on geographic spatiotemporal data
CN104217026A (en) * 2014-09-28 2014-12-17 福州大学 Chinese microblog tendency retrieving method based on graph model

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105589976A (en) * 2016-03-08 2016-05-18 重庆文理学院 Object entity determining method and device based on semantic correlations
CN105589976B (en) * 2016-03-08 2019-03-12 重庆文理学院 Method and device is determined based on the target entity of semantic relevancy
CN105843875A (en) * 2016-03-18 2016-08-10 北京光年无限科技有限公司 Smart robot-oriented question and answer data processing method and apparatus
CN105843875B (en) * 2016-03-18 2019-09-13 北京光年无限科技有限公司 A kind of question and answer data processing method and device towards intelligent robot
CN106407180A (en) * 2016-08-30 2017-02-15 北京奇艺世纪科技有限公司 Entity disambiguation method and apparatus
CN106503148B (en) * 2016-10-21 2019-05-31 东南大学 A kind of table entity link method based on multiple knowledge base
CN106503148A (en) * 2016-10-21 2017-03-15 东南大学 A kind of form entity link method based on multiple knowledge base
CN106909655A (en) * 2017-02-27 2017-06-30 中国科学院电子学研究所 Found and link method based on the knowledge mapping entity that production alias is excavated
CN106909655B (en) * 2017-02-27 2019-03-26 中国科学院电子学研究所 The knowledge mapping entity discovery excavated based on production alias and link method
CN106934020A (en) * 2017-03-10 2017-07-07 东南大学 A kind of entity link method based on multiple domain entity index
CN107316062A (en) * 2017-06-26 2017-11-03 中国人民解放军国防科学技术大学 A kind of name entity disambiguation method of improved domain-oriented
CN107992480A (en) * 2017-12-25 2018-05-04 东软集团股份有限公司 A kind of method, apparatus for realizing entity disambiguation and storage medium, program product
CN110569496A (en) * 2018-06-06 2019-12-13 腾讯科技(深圳)有限公司 Entity linking method, device and storage medium
CN110569496B (en) * 2018-06-06 2022-05-17 腾讯科技(深圳)有限公司 Entity linking method, device and storage medium
CN112585596A (en) * 2018-06-25 2021-03-30 易享信息技术有限公司 System and method for investigating relationships between entities
CN109933785A (en) * 2019-02-03 2019-06-25 北京百度网讯科技有限公司 Method, apparatus, equipment and medium for entity associated
CN109977228A (en) * 2019-03-21 2019-07-05 浙江大学 The information identification method of grid equipment defect text
CN109977228B (en) * 2019-03-21 2021-01-12 浙江大学 Information identification method for power grid equipment defect text
CN110795527B (en) * 2019-09-03 2022-04-29 腾讯科技(深圳)有限公司 Candidate entity ordering method, training method and related device
CN110795527A (en) * 2019-09-03 2020-02-14 腾讯科技(深圳)有限公司 Candidate entity ordering method, training method and related device
CN110888946A (en) * 2019-12-05 2020-03-17 电子科技大学广东电子信息工程研究院 Entity linking method based on knowledge-driven query
CN111428031A (en) * 2020-03-20 2020-07-17 电子科技大学 Graph model filtering method fusing shallow semantic information
CN111428031B (en) * 2020-03-20 2023-07-07 电子科技大学 Graph model filtering method integrating shallow semantic information
CN112989804A (en) * 2021-04-14 2021-06-18 广东工业大学 Entity disambiguation method based on stacked multi-head feature extractor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151223
