CN105183770A - Chinese integrated entity linking method based on graph model - Google Patents
Chinese integrated entity linking method based on graph model
- Publication number
- CN105183770A (application number CN201510475469.5A)
- Authority
- CN
- China
- Prior art keywords
- entity
- candidate
- item
- text
- referent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9558—Details of hyperlinks; Management of linked annotations
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present invention discloses a Chinese integrated entity linking method based on a graph model. The method maps ambiguous entities in a text to specific entities in the real world, in support of knowledge base expansion, information extraction and search engines. It comprises three parts: candidate entity generation, entity referent graph construction, and integrated entity disambiguation. For a given text, the entity referent items in it are recognized and their candidate entities are obtained. The entity referent items and their candidate entities are treated as graph nodes to construct an entity referent graph. An in-degree and out-degree algorithm is then applied to the entity referent graph to disambiguate multiple ambiguous entities in the text. When building the entity referent graph, the invention does not depend entirely on the knowledge base; it can also perform incremental evidence mining to find evidence on encyclopedia web pages. Dependency path analysis is used to find possibly related entity referent items: two entity referent items are regarded as possibly related when the length of the dependency path between them is within a set range. Only then is it determined whether their candidate entities are related in the real world, which greatly improves the efficiency of disambiguation.
Description
Technical field
The present invention relates to the field of natural language processing (NLP), and specifically to entity linking, knowledge base expansion, information extraction, question answering systems and search engine optimization.
Background technology
Conventional methods for Chinese entity linking compare the context similarity between an entity referent item and each of its candidate entities, then choose the candidate with the highest similarity as the link target. This approach has two defects. First, it does not use the semantic relatedness between entities in the text, even though this relatedness can greatly improve the accuracy of disambiguation. Second, it can disambiguate only one ambiguous entity at a time, which is inefficient, and similarity comparison does not work well for entity linking in short texts.
When building the entity referent graph and computing relatedness, existing integrated entity linking methods regard all entity referent items in the text as possibly related, and then determine whether their candidate entities are actually related in the real world. This is unreasonable, because an entity referent item is generally related to only a few other entity referent items in the text. Treating all entity referent items in the text as related spends a great deal of unnecessary time on building the entity referent graph and increases the computational cost.
Existing Chinese knowledge bases are small, and the entity knowledge they contain is incomplete, so they cannot fully meet the requirements of entity linking. Because of this limit on the amount of knowledge in the knowledge base, the overall performance of entity linking suffers considerably.
Summary of the invention
The invention provides a Chinese integrated entity linking method based on a graph model. It builds the entity referent graph by finding the optimal possibly related entities and by incremental evidence mining, and disambiguates multiple ambiguous entities in a text. It addresses two defects of existing entity linking methods, namely insufficient knowledge in the knowledge base and inefficient construction of the entity referent graph, and thereby provides a more effective entity linking method.
The Chinese integrated entity linking method based on a graph model provided by the invention comprises:
For a given text, the entity referent items in it are first recognized and their candidate entities are obtained. The entity referent items and candidate entities are then treated as graph nodes, and the relatedness between entities is represented as edges, to construct the entity referent graph. Finally, an out-degree and in-degree algorithm is applied to the entity referent graph to disambiguate multiple ambiguous entities in the text.
The method further comprises:
When entity relatedness is computed during construction of the entity referent graph, if the knowledge in the current knowledge base cannot supply the knowledge needed for entity linking (that is, no relation between the entities can be found in the knowledge base), incremental evidence mining searches the entities' interactive encyclopedia web pages for evidence.
To reduce the time spent building the entity referent graph, dependency path analysis is used to find the optimal possibly related entity referent items. Two entity referent items are considered possibly related only when the length of the dependency path between them is within a set range; only then is it further determined whether their candidate entities are related in the real world.
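The dependency path check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dependency tree is assumed to be given as a list `heads` in which `heads[i]` is the index of token i's head and the root has head -1, the "set range" is assumed to be a simple upper bound `max_distance`, and all function names are hypothetical.

```python
def path_to_root(heads, node):
    """Return the list of nodes from `node` up to the root of the tree."""
    path = [node]
    while heads[node] != -1:
        node = heads[node]
        path.append(node)
    return path


def dependency_distance(heads, u, v):
    """Number of edges on the tree path between tokens u and v."""
    depth_from_v = {n: d for d, n in enumerate(path_to_root(heads, v))}
    for depth_from_u, n in enumerate(path_to_root(heads, u)):
        if n in depth_from_v:  # first common ancestor
            return depth_from_u + depth_from_v[n]
    raise ValueError("tokens are not in the same tree")


def possibly_related_pairs(heads, mention_positions, max_distance):
    """Keep only pairs of referent items whose path length is within the bound."""
    pairs = []
    for i, u in enumerate(mention_positions):
        for v in mention_positions[i + 1:]:
            if dependency_distance(heads, u, v) <= max_distance:
                pairs.append((u, v))
    return pairs


# Hypothetical 5-token tree: token 1 is the root.
heads = [1, -1, 1, 2, 3]
pairs = possibly_related_pairs(heads, [0, 2, 4], max_distance=2)
print(pairs)  # → [(0, 2), (2, 4)]
```

Only the surviving pairs would then be passed on to the more expensive check of whether their candidate entities are related.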
To disambiguate multiple entities in a text simultaneously, the invention applies an out-degree and in-degree algorithm to the entity referent graph, ranks the candidate entities by importance according to their out-degree, in-degree and prior probability, and selects the most important candidate entity as the link target.
Compared with the prior art, the main beneficial effects of the invention are as follows:
1. The invention can disambiguate multiple entities in a text simultaneously, with accuracy better than the prior art.
2. The invention builds the entity referent graph more efficiently, and the resulting graph is more accurate.
Description of the drawings
To illustrate the invention more clearly, the accompanying drawings are briefly described below:
Fig. 1 is a flowchart of integrated entity linking
Fig. 2 is a schematic diagram of candidate entity generation
Fig. 3 is a schematic diagram of entity referent graph construction
Fig. 4 is a schematic diagram of integrated entity disambiguation
Embodiment
The core idea of the invention is to use the relations between entities in the knowledge base to build an entity referent graph: the entities in the text and their candidate entities are the nodes of the graph, and the edges between nodes represent their semantic relatedness. If two entities have no relation in the knowledge base, incremental evidence mining searches the encyclopedia pages corresponding to the entities for evidence with which to build the graph. Finally, an out-degree and in-degree algorithm is applied to the entity referent graph, achieving integrated linking of multiple ambiguous entities in the same text.
To make the objects, method and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the integrated entity linking method of the invention. As shown in Fig. 1, the Chinese integrated entity linking method based on a graph model consists mainly of three parts: candidate entity generation, entity referent graph construction, and integrated entity disambiguation. The specific embodiment is as follows:
100, candidate entity generation
Candidate entity generation is the most basic step of the whole method. As shown in Fig. 2, it comprises two parts: entity recognition and generation of candidate entities. For entity recognition (step 201), the invention uses the part-of-speech tags produced by ICTCLAS, the word segmentation tool of the Chinese Academy of Sciences (nr denotes a person name, ns a place name, nt an organization name, and nz other proper nouns). Because of the particularities of the Chinese language, and to keep entity recognition accurate and comprehensive, a name dictionary is also created for some proper nouns and entity names that are hard to recognize, and is used alongside the ICTCLAS part-of-speech tags.
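As a rough illustration of the recognition step, the sketch below filters already tagged tokens by the four tags named above and by a name dictionary. It does not call ICTCLAS itself: the input is assumed to be a list of (word, tag) pairs that a tagger has produced, and the function name, the sample sentence and the dictionary entry are all hypothetical.

```python
# Tags named in the text: person, place, organization, other proper noun.
ENTITY_TAGS = {"nr", "ns", "nt", "nz"}


def recognize_mentions(tagged_tokens, name_dictionary=frozenset()):
    """Return words tagged as named entities or listed in the name dictionary."""
    mentions = []
    for word, tag in tagged_tokens:
        if tag in ENTITY_TAGS or word in name_dictionary:
            if word not in mentions:  # keep first occurrence only
                mentions.append(word)
    return mentions


# Hypothetical tagger output; "中网" is tagged as a common noun here,
# so only the name dictionary can recover it.
tagged = [("李娜", "nr"), ("在", "p"), ("北京", "ns"),
          ("参加", "v"), ("中网", "n")]
print(recognize_mentions(tagged, name_dictionary={"中网"}))
# → ['李娜', '北京', '中网']
```

The dictionary lookup is what covers names that the part-of-speech tags alone would miss.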
For candidate entity generation (step 203), Lucene is used to index the knowledge base. Each entity referent item in the input text is compared with the index keys of the entities in the knowledge base; entities whose index key matches are taken as the candidate entities of that entity referent item. (Note: when the knowledge base is built, an index key is created for each entity, and all candidate entities of the same ambiguous entity share the same index key.)
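The lookup can be sketched with a plain in-memory dictionary standing in for the Lucene index. This is an assumption made for illustration only: the knowledge base is modeled as a list of (entity id, index key) pairs, and the entity ids and function names below are hypothetical.

```python
from collections import defaultdict


def build_index(knowledge_base):
    """Map each index key (the shared ambiguous name) to its entity ids."""
    index = defaultdict(list)
    for entity_id, index_key in knowledge_base:
        index[index_key].append(entity_id)
    return index


def candidates_for(mentions, index):
    """Exact-match lookup: a referent item with no matching key gets no candidates."""
    return {m: list(index.get(m, [])) for m in mentions}


# Hypothetical knowledge base entries.
kb = [("李娜 (网球运动员)", "李娜"), ("李娜 (歌手)", "李娜"), ("北京市", "北京")]
index = build_index(kb)
cands = candidates_for(["李娜", "北京", "中网"], index)
print(cands["李娜"])  # → ['李娜 (网球运动员)', '李娜 (歌手)']
```

A referent item with an empty candidate list, such as "中网" here, is the case that would later be handed to incremental evidence mining.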
101, entity referent graph construction
Entity referent items and their candidate entities are the nodes of the entity referent graph, and relations between entities are its edges. The entity referent graph is a directed graph. As shown in Fig. 3, its construction mainly comprises calculation of the prior probability (context similarity), calculation of entity relatedness, and incremental evidence mining.
The prior probability of a candidate entity is the probability that the entity referent item in the given input text refers to that candidate entity. It plays an important role in reducing the number of graph nodes and speeding up entity linking. The cosine similarity between the input text containing the entity referent item and the encyclopedia page of the candidate entity is used as the candidate entity's prior probability; candidate entities whose prior probability is below a set value are deleted from the candidate entity set.
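A minimal sketch of this pruning step follows, assuming both the input text and each encyclopedia page are already tokenized and are compared as raw term-frequency vectors. The text does not specify the term weighting, so that choice, the function names and the sample data are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two token lists using raw term frequencies."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def prune_candidates(text_tokens, candidate_pages, threshold):
    """Use the similarity as the prior and drop candidates below the threshold."""
    priors = {c: cosine_similarity(text_tokens, page)
              for c, page in candidate_pages.items()}
    return {c: p for c, p in priors.items() if p >= threshold}


# Hypothetical tokenized input text and candidate pages.
text = ["网球", "比赛", "冠军"]
pages = {
    "李娜 (网球运动员)": ["网球", "冠军", "运动员"],
    "李娜 (歌手)": ["歌手", "专辑"],
}
kept = prune_candidates(text, pages, threshold=0.1)
print(sorted(kept))  # → ['李娜 (网球运动员)']
```

The returned dictionary keeps the prior of each surviving candidate so it can be reused when candidates are ranked.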
Relatedness calculation (step 301) is the core of the graph model and the basis on which edges are created in the entity referent graph. The relatedness calculation scheme of the invention is as follows:
1) The input text is parsed into a dependency tree. For each entity referent item, the optimal possibly related entity referent items in the text are found along dependency paths: when the length of the dependency path between two entity referent items is within a set range, the invention regards them as possibly related.
2) For the entity referent items that are most likely related, their candidate entity sets are obtained. For each pair of candidate entity nodes, it is first determined whether the two nodes have a direct relation in the knowledge base. If so, one directed edge is added between them, pointing from the start of the relation to its end. If they have no direct relation, it is determined whether they have an indirect relation in the knowledge base, that is, whether both are related to a third node. If so, two directed edges in opposite directions are added between them.
3) If none of the above conditions holds, or if some entity has no candidate entity in the knowledge base, incremental evidence mining (step 303) searches the entities' encyclopedia pages to determine whether the nodes are semantically related. If the encyclopedia page of one entity node directly contains another entity node, the two nodes are related, and one directed edge is added from the former to the latter. If the page does not directly contain the other node, it is determined whether the encyclopedia pages of the two nodes contain one or more identical third-party entities; if so, two directed edges in opposite directions are added between the two nodes. The third-party node must not be a "popular" node: China, for example, appears on many entity pages without implying any relation between those entities, so a rule-based method filters out such links. Note that no directed edge is added between candidate entities of the same entity referent item.
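Steps 2) and 3) above can be sketched as a single edge-building function. This is an illustration under stated assumptions rather than the patent's code: knowledge base relations are modeled as a set of (start, end) pairs, each encyclopedia page as the set of entities it mentions, and the rule-based filter as a fixed set of "popular" entities. All names and sample data are hypothetical.

```python
def kb_neighbours(entity, kb_relations):
    """Entities linked to `entity` by a knowledge base relation in either direction."""
    return ({t for s, t in kb_relations if s == entity}
            | {s for s, t in kb_relations if t == entity})


def edges_between(a, b, kb_relations, pages, popular=frozenset()):
    """Directed edges to add between candidate entities a and b."""
    # Step 2: direct relation, edge runs from relation start to relation end.
    direct = {(s, t) for s, t in ((a, b), (b, a)) if (s, t) in kb_relations}
    if direct:
        return direct
    # Step 2: indirect relation through a shared third node, edges both ways.
    if (kb_neighbours(a, kb_relations) & kb_neighbours(b, kb_relations)) - {a, b}:
        return {(a, b), (b, a)}
    # Step 3: evidence mining, one page directly contains the other entity.
    links_a, links_b = pages.get(a, set()), pages.get(b, set())
    mined = set()
    if b in links_a:
        mined.add((a, b))
    if a in links_b:
        mined.add((b, a))
    if mined:
        return mined
    # Step 3: shared third-party entity that is not a "popular" node.
    shared = (links_a & links_b) - popular - {a, b}
    return {(a, b), (b, a)} if shared else set()


kb = {("A", "B"), ("C", "X"), ("D", "X")}
pages = {"E": {"F", "China"}, "F": {"China"},
         "G": {"China", "H"}, "I": {"China", "H"}}
popular = {"China"}
print(edges_between("A", "B", kb, pages, popular))  # → {('A', 'B')}
print(edges_between("F", "G", kb, pages, popular))  # → set()
```

The caller is expected to skip pairs of candidates that belong to the same entity referent item, since no edge is added between them.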
102, integrated entity disambiguation
As shown in Fig. 4, the core of integrated entity disambiguation is the calculation of the candidate entities' out-degree and in-degree (step 401). From the entity referent graph output by step 304, the sum of out-degree and in-degree is computed for each candidate entity of each ambiguous entity. The candidate entities are then ranked by importance according to this degree sum and their prior probability, and the candidate entity node with the highest importance is selected as the final link target.
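The ranking step might look like the sketch below. The text does not say how the degree sum and the prior probability are combined into one importance score, so the combination used here (degree sum first, prior probability as tie-breaker) is an assumption, as are the function name and the sample data.

```python
from collections import Counter


def link_mentions(candidates, priors, edges):
    """Pick, for each referent item, the candidate with the highest importance."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1  # contributes to the out-degree of src
        degree[dst] += 1  # contributes to the in-degree of dst
    result = {}
    for mention, cands in candidates.items():
        if cands:
            # Assumed ranking: degree sum first, prior probability second.
            result[mention] = max(
                cands, key=lambda c: (degree[c], priors.get(c, 0.0)))
    return result


# Hypothetical graph: only the tennis player is connected to Beijing.
candidates = {"Li Na": ["Li Na (tennis)", "Li Na (singer)"],
              "Beijing": ["Beijing"]}
priors = {"Li Na (tennis)": 0.4, "Li Na (singer)": 0.5, "Beijing": 0.9}
edges = {("Li Na (tennis)", "Beijing"), ("Beijing", "Li Na (tennis)")}
links = link_mentions(candidates, priors, edges)
print(links["Li Na"])  # → Li Na (tennis)
```

In this example the singer has the higher prior, but the tennis player wins because it is the only candidate connected to another entity in the text, which is the effect the degree-based ranking is meant to capture.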
Claims (4)
1. A Chinese integrated entity linking method based on a graph model, characterized in that:
For a given text, the entity referent items in it are first recognized and their candidate entities are obtained. The entity referent items and candidate entities are then treated as graph nodes, and the relatedness between entities is represented as edges, to construct the entity referent graph. Finally, an out-degree and in-degree algorithm is applied to the entity referent graph to disambiguate multiple ambiguous entities in the text.
2. The method according to claim 1, characterized in that:
When computing entity relatedness, the invention does not depend entirely on the knowledge contained in the knowledge base. When the knowledge base cannot supply the knowledge needed for linking, incremental evidence mining searches the entities' interactive encyclopedia pages for evidence, so that entity relatedness is computed as comprehensively as possible.
3. The method according to claim 1, characterized in that it further comprises:
When finding the optimal possibly related entity referent items, the invention does not indiscriminately regard all entity referent items in the text as possibly related, but performs dependency path analysis on a dependency tree. Two entity referent items are regarded as optimal possibly related entity referent items only when the length of the dependency path between them is within a set range; only then is it further determined whether their candidate entities are related in the real world, which greatly improves the efficiency of disambiguation.
4. The method according to claim 1, characterized in that it further comprises:
To disambiguate multiple ambiguous entities in a text simultaneously, the invention applies an out-degree and in-degree algorithm to the entity referent graph, ranks the candidate entities by importance according to their out-degree, in-degree and prior probability, and selects the most important candidate entity as the link target. The method is simple and effective.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510475469.5A CN105183770A (en) | 2015-08-06 | 2015-08-06 | Chinese integrated entity linking method based on graph model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510475469.5A CN105183770A (en) | 2015-08-06 | 2015-08-06 | Chinese integrated entity linking method based on graph model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105183770A (en) | 2015-12-23 |
Family
ID=54905854
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201510475469.5A | Pending | CN105183770A (en) | 2015-08-06 | 2015-08-06 | Chinese integrated entity linking method based on graph model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105183770A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000077690A1 (en) * | 1999-06-15 | 2000-12-21 | Kanisa Inc. | System and method for document management based on a plurality of knowledge taxonomies |
CN102291435A (en) * | 2011-07-15 | 2011-12-21 | 武汉大学 | Mobile information searching and knowledge discovery system based on geographic spatiotemporal data |
CN104217026A (en) * | 2014-09-28 | 2014-12-17 | 福州大学 | Chinese microblog tendency retrieving method based on graph model |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105589976A (en) * | 2016-03-08 | 2016-05-18 | 重庆文理学院 | Object entity determining method and device based on semantic correlations |
CN105589976B (en) * | 2016-03-08 | 2019-03-12 | 重庆文理学院 | Method and device is determined based on the target entity of semantic relevancy |
CN105843875A (en) * | 2016-03-18 | 2016-08-10 | 北京光年无限科技有限公司 | Smart robot-oriented question and answer data processing method and apparatus |
CN105843875B (en) * | 2016-03-18 | 2019-09-13 | 北京光年无限科技有限公司 | A kind of question and answer data processing method and device towards intelligent robot |
CN106407180A (en) * | 2016-08-30 | 2017-02-15 | 北京奇艺世纪科技有限公司 | Entity disambiguation method and apparatus |
CN106503148B (en) * | 2016-10-21 | 2019-05-31 | 东南大学 | A kind of table entity link method based on multiple knowledge base |
CN106503148A (en) * | 2016-10-21 | 2017-03-15 | 东南大学 | A kind of form entity link method based on multiple knowledge base |
CN106909655A (en) * | 2017-02-27 | 2017-06-30 | 中国科学院电子学研究所 | Found and link method based on the knowledge mapping entity that production alias is excavated |
CN106909655B (en) * | 2017-02-27 | 2019-03-26 | 中国科学院电子学研究所 | The knowledge mapping entity discovery excavated based on production alias and link method |
CN106934020A (en) * | 2017-03-10 | 2017-07-07 | 东南大学 | A kind of entity link method based on multiple domain entity index |
CN107316062A (en) * | 2017-06-26 | 2017-11-03 | 中国人民解放军国防科学技术大学 | A kind of name entity disambiguation method of improved domain-oriented |
CN107992480A (en) * | 2017-12-25 | 2018-05-04 | 东软集团股份有限公司 | A kind of method, apparatus for realizing entity disambiguation and storage medium, program product |
CN110569496A (en) * | 2018-06-06 | 2019-12-13 | 腾讯科技(深圳)有限公司 | Entity linking method, device and storage medium |
CN110569496B (en) * | 2018-06-06 | 2022-05-17 | 腾讯科技(深圳)有限公司 | Entity linking method, device and storage medium |
CN112585596A (en) * | 2018-06-25 | 2021-03-30 | 易享信息技术有限公司 | System and method for investigating relationships between entities |
CN109933785A (en) * | 2019-02-03 | 2019-06-25 | 北京百度网讯科技有限公司 | Method, apparatus, equipment and medium for entity associated |
CN109977228A (en) * | 2019-03-21 | 2019-07-05 | 浙江大学 | The information identification method of grid equipment defect text |
CN109977228B (en) * | 2019-03-21 | 2021-01-12 | 浙江大学 | Information identification method for power grid equipment defect text |
CN110795527B (en) * | 2019-09-03 | 2022-04-29 | 腾讯科技(深圳)有限公司 | Candidate entity ordering method, training method and related device |
CN110795527A (en) * | 2019-09-03 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Candidate entity ordering method, training method and related device |
CN110888946A (en) * | 2019-12-05 | 2020-03-17 | 电子科技大学广东电子信息工程研究院 | Entity linking method based on knowledge-driven query |
CN111428031A (en) * | 2020-03-20 | 2020-07-17 | 电子科技大学 | Graph model filtering method fusing shallow semantic information |
CN111428031B (en) * | 2020-03-20 | 2023-07-07 | 电子科技大学 | Graph model filtering method integrating shallow semantic information |
CN112989804A (en) * | 2021-04-14 | 2021-06-18 | 广东工业大学 | Entity disambiguation method based on stacked multi-head feature extractor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20151223 |