CN113361283A - Web table-oriented paired entity joint disambiguation method - Google Patents


Info

Publication number
CN113361283A
Authority
CN
China
Prior art keywords
entity
entities
column
candidate
consistency
Prior art date
Legal status
Pending
Application number
CN202110720148.2A
Other languages
Chinese (zh)
Inventor
吴天星
李林
漆桂林
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
2021-06-28
Filing date
2021-06-28
Publication date
Application filed by Southeast University
Priority to CN202110720148.2A
Publication of CN113361283A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Web table-oriented paired entity joint disambiguation method for the Web table entity linking task, which is to link entity mentions in a Web table unambiguously to entities in a knowledge base. Designed around the characteristics of tables, the method iteratively performs joint disambiguation on the pair of entity mentions with the highest confidence and thereby progressively disambiguates all entity mentions in the table. The confidence calculation combines several kinds of information: the similarity between entity mentions and candidate entities, the consistency between linked entities, and the row and column semantic consistency of the table. Because the entities linked during the iterations have high confidence, they provide reliable auxiliary information for subsequent linking, which yields high-quality joint disambiguation.

Description

Web table-oriented paired entity joint disambiguation method
Technical Field
The invention relates to a Web table-oriented paired entity joint disambiguation method and belongs to the technical field of knowledge graphs.
Background
Web tables organize data in a structured form and provide high-quality, high-density information. The Web is estimated to contain 14.1 billion tables, of which about 154 million are relational tables. To exploit this valuable data, computers must be able to understand these tables at the semantic level, and entity linking is an effective means of achieving table understanding.
Table entity linking associates the entity mentions in table cells with the corresponding entities in a knowledge graph. An effective table entity linking system should link each entity mention unambiguously to its corresponding entity in the knowledge graph based on the mention's context in the table. Whereas the context of a mention in free text has a uniform structure (the surrounding word sequence), the context of a mention in a table differs by cell position, row, and column. A table entity linking method first identifies entity mentions in the table and generates candidate entities for them; this part usually relies on heuristics that aim to find mentions and candidates as comprehensively as possible. Candidate disambiguation then selects the correct entity from each candidate set, using the mention's context in the table and the relationships between linked entities.
Mention identification and candidate generation can usually be solved well with engineering methods. Candidate disambiguation is the main difficulty of table entity linking: it requires a ranking model that scores the similarity between an entity mention and each of its candidate entities. This score must account not only for the semantic similarity between the mention and the candidate but also for the relatedness between linked entities. Disambiguation methods that exploit the relatedness between linked entities are called joint disambiguation methods. Most current joint disambiguation work selects, from the candidate sets of all mentions, entities that are as mutually related as possible, maximizing both the relatedness of the linked entities and the similarity between mentions and linked entities. Such methods achieve good results but rest on a strong assumption that does not fully hold for real knowledge graphs and Web tables. Entities in the same row but outside the primary key column tend to be strongly related to the primary key entity, but not necessarily to entities in other columns. In addition, because knowledge graphs are incomplete, linked entities in the same column may not be closely related. To address these shortcomings, the invention proposes a paired entity joint disambiguation algorithm that jointly disambiguates, in turn, the pair of entity mentions with the highest confidence in the table, which reduces the chance of introducing noise while preserving high-quality joint disambiguation.
Disclosure of Invention
Technical problem: in view of the structural characteristics of tables and the shortcomings of current joint disambiguation methods, a paired entity linking method is designed. Paired entity linking here means jointly disambiguating, in turn, the pair of entity mentions with the highest confidence in the table, which reduces the chance of introducing noise while preserving high-quality joint disambiguation. The entities already linked supply richer and more accurate context to subsequent linking steps, which improves entity linking on real Web tables.
The technical scheme is as follows:
The paired entity joint disambiguation method of the present invention comprises the following steps:
1) Combine every two entity mentions that lie in the same row or the same column of the Web table to generate all entity mention pairs.
2) Calculate the confidence of linking each entity mention pair, link the pair with the highest confidence to their respective entities, and delete the other candidate entities of those two mentions.
3) Update the confidence values between the different entity mentions in the table.
4) Iterate steps 2) and 3) until every entity mention in the table has been linked.
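Step 1) can be sketched as follows. This is a minimal illustration, assuming each entity mention is identified by the (row, column) coordinates of its cell; the function name `mention_pairs` and this input format are hypothetical, not taken from the patent.

```python
from itertools import combinations


def mention_pairs(cells):
    """Step 1: pair up every two mentions that share a row or a column.

    `cells` is a list of (row, col) coordinates, one per cell that holds an
    entity mention (a hypothetical input format chosen for this sketch).
    """
    pairs = []
    for a, b in combinations(cells, 2):
        if a[0] == b[0] or a[1] == b[1]:  # same row or same column
            pairs.append((a, b))
    return pairs


# A 2x2 table: diagonal cells share neither a row nor a column.
print(mention_pairs([(0, 0), (0, 1), (1, 0), (1, 1)]))
# [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1)), ((1, 0), (1, 1))]
```

Pairs of diagonal cells are excluded because, per step 1), only mentions in the same row or column are combined.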
In a preferred embodiment of the present invention, the confidence in step 2) is calculated as follows:
2-a) The confidence calculation incorporates the change in column semantic consistency during linking. Owing to the structure of tables, cells in the same column have similar semantic characteristics: in entity linking, the entities linked in one column usually belong to a common category, so their vector representations are similar to some extent. The column semantic consistency CSC is calculated as:
$$\mathrm{CSC} = -\operatorname{mean}\big(\operatorname{var}([e_1, e_2, \ldots, e_n])\big)$$
where $e_1, e_2, \ldots, e_n$ are the vector representations of the linked entities in a column, var computes the element-wise variance vector, and mean averages the entries of the variance vector to obtain a scalar that represents the column's semantic consistency.
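The CSC formula can be sketched in plain Python as below. This is a minimal sketch, assuming entity vectors are equal-length lists of floats and that var means population variance per embedding dimension (the text does not say which variance); `column_semantic_consistency` is a hypothetical name. It needs Python 3.8+ for `statistics.fmean`.

```python
from statistics import fmean, pvariance


def column_semantic_consistency(vectors):
    """CSC = -mean(var([e1, ..., en])) for the linked entities of one column.

    var is taken per embedding dimension across the n vectors (population
    variance is assumed), giving a variance vector; its mean is negated.
    """
    variance_vector = [pvariance(dim) for dim in zip(*vectors)]
    return -fmean(variance_vector)


tight = [[1.0, 0.0], [1.0, 0.2]]   # similar linked entities
loose = [[1.0, 0.0], [-1.0, 2.0]]  # dissimilar linked entities
print(column_semantic_consistency(tight), column_semantic_consistency(loose))
```

The tightly clustered column scores close to 0, while the spread-out column scores -1.0, matching the text's reading that a smaller variance means higher consistency.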
2-b) The confidence calculation incorporates the change in row semantic consistency during linking. Row semantic consistency characterizes how consistent the relations are between the linked entities in the other columns and the linked entities in the primary key column. It is defined as the negative mean of the relation variance vector: the smaller the variance, the larger this negative mean, the closer the relations across different rows, and the more consistent the row semantics. The row semantic consistency RSC is calculated as:
$$r = e_{\text{non-subject}} - e_{\text{subject}}$$
$$\mathrm{RSC} = -\operatorname{mean}\big(\operatorname{var}([r_1, r_2, \ldots, r_n])\big)$$
where $e_{\text{subject}}$ is the linked entity in the primary key column, $e_{\text{non-subject}}$ is the linked entity in a non-primary key column, and $r$ is the relation vector. var computes the variance vector, mean averages its entries to obtain a scalar that represents row semantic consistency, and $r_1, r_2, \ldots, r_n$ are the relation vectors formed by the linked entities of different rows.
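The RSC formula can be sketched the same way. This is a minimal sketch, assuming one primary key column and one other column whose linked-entity vectors are given row by row, and assuming population variance; the function names are hypothetical.

```python
from statistics import fmean, pvariance


def relation_vector(e_subject, e_non_subject):
    """r = e_non-subject - e_subject, component by component."""
    return [o - s for s, o in zip(e_subject, e_non_subject)]


def row_semantic_consistency(subject_vecs, non_subject_vecs):
    """RSC = -mean(var([r1, ..., rn])) for one (primary key, other) column pair.

    Row k contributes r_k from its linked primary-key entity and its linked
    entity in the other column; population variance is assumed.
    """
    relations = [relation_vector(s, o) for s, o in zip(subject_vecs, non_subject_vecs)]
    variance_vector = [pvariance(dim) for dim in zip(*relations)]
    return -fmean(variance_vector)


subjects = [[0.0, 0.0], [1.0, 1.0]]
same_relation = [[1.0, 0.0], [2.0, 1.0]]   # r1 = r2 = (1, 0)
mixed_relation = [[1.0, 0.0], [1.0, 3.0]]  # r1 = (1, 0), r2 = (0, 2)
print(row_semantic_consistency(subjects, same_relation))
print(row_semantic_consistency(subjects, mixed_relation))  # -0.625
```

When every row expresses the same relation, the relation vectors coincide and RSC reaches its maximum of 0.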
2-c) The confidence calculation incorporates the consistency between entities in the table during linking. Linked entity consistency is calculated as the cosine similarity of the entity vector representations:
$$\mathrm{EES}(e_1, e_2) = \operatorname{cosine}(e_1, e_2)$$
where $e_1$ and $e_2$ are the vector representations of the entities corresponding to the two entity mentions in the paired entity joint disambiguation process.
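EES is plain cosine similarity. A minimal sketch, assuming entity vectors are lists of floats; returning 0.0 for an all-zero vector is a choice made here, not something the text specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros (a choice made here)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def ees(e1, e2):
    """EES(e1, e2) = cosine(e1, e2) over entity vector representations."""
    return cosine(e1, e2)


print(ees([1.0, 0.0], [1.0, 0.0]), ees([1.0, 0.0], [0.0, 1.0]))  # 1.0 0.0
```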
2-d) The confidence calculation incorporates the similarity between an entity mention and a candidate entity. This similarity combines the cosine similarity between the mention's context vector and the candidate entity's context vector with the prior probability. The context of a mention is the bag of words from the cells in its row and column, and the context of a candidate entity is the bag of words from the entity's text description in the knowledge base. Each context vector is the average of all word vectors in the corresponding bag of words:
$$\mathrm{MES}(m, e) = \operatorname{cosine}(m_{\text{context}}, e_{\text{context}}) + P(e \mid m)$$
where $m_{\text{context}}$ is the context vector representation of entity mention $m$, $e_{\text{context}}$ is the context vector representation of candidate entity $e$, and $P(e \mid m)$ is the probability that $m$ links to $e$.
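MES can be sketched as below. This is a minimal sketch, assuming a hypothetical word-embedding lookup (a plain dict here), that words without an embedding are skipped, and that an empty context contributes a cosine term of 0; the toy vectors and priors are made up for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def context_vector(words, embeddings):
    """Average the word vectors of a bag of words (unknown words are skipped)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def mes(mention_context, entity_context, prior, embeddings):
    """MES(m, e) = cosine(m_context, e_context) + P(e | m)."""
    m_vec = context_vector(mention_context, embeddings)
    e_vec = context_vector(entity_context, embeddings)
    sim = cosine(m_vec, e_vec) if m_vec and e_vec else 0.0
    return sim + prior


# Toy 2-d word vectors and priors (hypothetical values).
emb = {"city": [1.0, 0.0], "illinois": [0.8, 0.2], "film": [0.0, 1.0], "musical": [0.1, 0.9]}
mention_ctx = ["city", "illinois"]  # words from the mention's row and column
print(mes(mention_ctx, ["city", "illinois"], 0.7, emb))  # candidate: Chicago (city)
print(mes(mention_ctx, ["film", "musical"], 0.3, emb))   # candidate: Chicago (film)
```

Both terms favour the city here; in general the prior can outweigh a weak context match, which is why the two are added.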
2-e) The confidence calculation combines the above information. Given a pair of entity mentions $m_i, m_j$ and their candidate entity sets $CS_i, CS_j$, the confidence is defined as $\Gamma(m_i, m_j)$. It consists of two parts: the similarity among the elements involved in linking the pair, and the change in row (column) semantic consistency caused by the linking operation. The hyper-parameter $\beta > 0$ controls the relative weight of semantic consistency, as follows:
[Formula image: definition of the confidence $\Gamma(m_i, m_j)$]
The similarity part mainly comprises three terms: the similarity between each of the two mentions and its candidate entity, and the relatedness between the two candidate entities.
[Formula image: candidate entity from $CS_i$]
and
[Formula image: candidate entity from $CS_j$]
denote candidate entities in the sets $CS_i$ and $CS_j$, respectively; MES computes the similarity between an entity mention and a candidate entity; EES measures the relatedness between the entities to be linked. $\Delta CSC_N$ and $\Delta RSC_N$ denote the normalized changes in column and row semantic consistency after the linking operation on mentions $m_i, m_j$ is completed. The normalization is:
$$\operatorname{Norm}(d) = \sigma(d) - 0.5$$
where $d$ is the change in semantic consistency; $d > 0$ means semantic consistency increased, which raises the confidence. $\sigma$ is the logistic sigmoid function, so $\operatorname{Norm}(d) \in (-0.5, 0.5)$.
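The normalization and the way the two parts might be combined can be sketched as follows. Only Norm is stated explicitly in the text; the formula for $\Gamma$ is an image, so the additive form `similarity + beta * (Norm(dCSC) + Norm(dRSC))` below is an assumption, and `norm` and `confidence` are hypothetical names.

```python
import math


def norm(d):
    """Norm(d) = sigmoid(d) - 0.5, which lies in (-0.5, 0.5)."""
    if d >= 0:
        s = 1.0 / (1.0 + math.exp(-d))
    else:  # equivalent form that avoids overflow in exp(-d) for very negative d
        z = math.exp(d)
        s = z / (1.0 + z)
    return s - 0.5


def confidence(similarity, delta_csc, delta_rsc, beta=1.0):
    """Hypothetical Gamma: similarity + beta * (Norm(dCSC) + Norm(dRSC)).

    The source gives Gamma only as an image; this additive form is an assumption.
    """
    if beta <= 0:
        raise ValueError("beta must be > 0")
    return similarity + beta * (norm(delta_csc) + norm(delta_rsc))


print(norm(0.0))  # 0.0
print(confidence(2.5, 0.1, -0.05, beta=0.5))
```

Because Norm is centred at 0, an unchanged consistency (d = 0) leaves the similarity part untouched, while gains and losses shift the confidence up or down by at most $\beta \cdot 0.5$ per term.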
Beneficial effects: compared with the prior art, the invention has the following advantages:
Current table entity linking methods mostly adopt a joint disambiguation strategy that disambiguates multiple entity mentions simultaneously, mainly through probabilistic graphical models, random walk algorithms, or iterative optimization strategies. When computing the similarity between a mention and a candidate entity, these methods assume that the entity to be linked and the entities already linked in the same row and column are as related as possible. However, entities in the same row outside the primary key column are often strongly related to the primary key entity but not necessarily to entities in other columns, and, because knowledge graphs are incomplete, linked entities in the same column may not be closely related. When weakly related entities are disambiguated jointly, they not only fail to inform each other but may also introduce noise; and an entity that cannot be linked correctly because of knowledge-graph incompleteness can in turn affect the linking of other cells. Simply abandoning joint disambiguation, however, loses important information and degrades the final linking result. The invention therefore designs a paired entity joint disambiguation method tailored to tables: it iteratively jointly disambiguates the pair of entity mentions with the highest confidence and progressively disambiguates all mentions in the table. The confidence calculation combines several kinds of information, which makes it reliable and yields high-quality joint disambiguation.
In practice, the paired method provided by the invention completes entity linking on Web tables of different types and performs better in both micro-averaged and macro-averaged accuracy.
Drawings
Fig. 1 is a schematic diagram of the framework of the present invention.
Fig. 2 is a schematic diagram of row (column) consistency calculation in the present invention.
FIG. 3 is a diagram illustrating an example of a pair-wise entity linking process in an embodiment of the invention.
Detailed Description
The following detailed description of the embodiments of the invention is provided in connection with the accompanying drawings.
The invention completes the table entity linking task with a paired entity joint disambiguation algorithm, which mainly comprises the following steps:
1) Column semantic consistency calculation.
Owing to the structure of tables, cells in the same column have similar semantic characteristics. In entity linking, the entities linked in one column usually belong to a common category, so their vector representations are similar to some extent.
Given a column of a Web table, the variance of the vector representations of the linked entities in the column is first computed element by element, giving a variance vector with the same dimensionality as the entity vectors. The variance vector measures the dispersion of the linked entities in the column: the smaller the variance, the more similar the linked entities. The invention defines column semantic consistency as the negative mean of the variance vector, so a smaller variance gives a larger negative mean, a larger column semantic consistency, and more similar linked entities. The column semantic computation in Fig. 2 illustrates the process; given the entities already linked in the column, column semantic consistency is formalized as:
$$\mathrm{CSC} = -\operatorname{mean}\big(\operatorname{var}([e_1, e_2, \ldots, e_n])\big)$$
where $e_1, e_2, \ldots, e_n$ are the vector representations of a column of linked entities in Fig. 2, var computes the variance vector, and mean averages its entries to obtain a scalar that represents the column's semantic consistency.
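A small worked example of CSC, using made-up two-dimensional vectors and assuming population variance (division by $n$), which the text does not specify:

$$e_1 = (1, 2),\ e_2 = (3, 2):\quad \operatorname{var}([e_1, e_2]) = \Big(\tfrac{(1-2)^2 + (3-2)^2}{2},\ \tfrac{(2-2)^2 + (2-2)^2}{2}\Big) = (1, 0),\quad \mathrm{CSC} = -\tfrac{1 + 0}{2} = -0.5$$

$$e_3 = (5, 2)\ \text{added}:\quad \operatorname{var}([e_1, e_2, e_3]) = \big(\tfrac{8}{3}, 0\big),\quad \mathrm{CSC} = -\tfrac{4}{3},\quad d = -\tfrac{4}{3} - \big(-\tfrac{1}{2}\big) = -\tfrac{5}{6} < 0$$

The outlying third entity spreads the column out, so the consistency change $d$ is negative, and under the normalization used in the confidence calculation it lowers the confidence of the link that would add it.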
2) Row semantic consistency calculation.
Similarly, cells in the same row of a table also share certain semantic properties. Unlike in columns, however, the cells of one row typically correspond to linked entities of different types that do not have similar properties. Row semantic consistency is therefore defined through the relations between columns, based on what different rows of the same two columns have in common.
Row semantic consistency characterizes how consistent the relations are between the linked entities in the other columns and the linked entities in the primary key column. The primary key column holds the most important content of a row, identifies the row, and usually refers to some entity. To compute the row semantic consistency between the primary key column and another column, a relation vector is first obtained as the difference between the vector of the linked entity in the non-primary key column and the vector of the linked entity in the primary key column. In general, the relation between two given columns should be the same in every row, so the relation vectors should also be close. The relation vectors between the primary key column and the other column are therefore computed first; the variance of these relation vectors is then computed element by element to obtain a relation variance vector with the same dimensionality as the entity vectors. This variance vector measures the dispersion of the relations: the smaller the variance, the more similar the relations in different rows. Row semantic consistency is the negative mean of the relation variance vector, so a smaller variance gives a larger negative mean, closer relations across rows, and more consistent row semantics. The row semantic computation in Fig. 2 illustrates the process: first compute the relation vectors between the linked entities of the two given columns, then compute row semantic consistency from the relation vectors in the same way as column semantic consistency.
$$r = e_{\text{non-subject}} - e_{\text{subject}}$$
$$\mathrm{RSC} = -\operatorname{mean}\big(\operatorname{var}([r_1, r_2, \ldots, r_n])\big)$$
The first formula gives the relation vector, where $e_{\text{subject}}$ is the linked entity in the primary key column and $e_{\text{non-subject}}$ is the linked entity in a non-primary key column; RSC defines row semantic consistency.
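A small worked example of RSC with made-up two-dimensional vectors (superscripts index rows; this notation is illustrative and not from the source):

$$e_{\text{subject}}^{(1)} = (1, 0),\ e_{\text{non-subject}}^{(1)} = (2, 1) \;\Rightarrow\; r_1 = (1, 1); \qquad e_{\text{subject}}^{(2)} = (2, 1),\ e_{\text{non-subject}}^{(2)} = (3, 2) \;\Rightarrow\; r_2 = (1, 1)$$

$$\operatorname{var}([r_1, r_2]) = (0, 0),\qquad \mathrm{RSC} = 0$$

Because variances are non-negative, RSC (like CSC) is never positive, and it reaches its maximum of 0 exactly when all relation vectors coincide, that is, when the two columns stand in the same relation in every row.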
3) Entity consistency calculation.
Entity consistency is calculated as the cosine similarity of the linked entities:
$$\mathrm{EES}(e_1, e_2) = \operatorname{cosine}(e_1, e_2)$$
where $e_1$ and $e_2$ are the vector representations of the entities corresponding to the two entity mentions in the paired entity joint disambiguation process.
4) Entity mention and candidate entity similarity calculation.
The similarity between an entity mention and a candidate entity combines the cosine similarity between the mention's context vector and the candidate entity's context vector with the prior probability. The context of a mention is the bag of words from the cells in its row and column, and the context of a candidate entity is the bag of words from the entity's text description in the knowledge base. Each context vector is the average of all word vectors in the corresponding bag of words:
$$\mathrm{MES}(m, e) = \operatorname{cosine}(m_{\text{context}}, e_{\text{context}}) + P(e \mid m)$$
where $m_{\text{context}}$ is the context vector representation of entity mention $m$, $e_{\text{context}}$ is the context vector representation of candidate entity $e$, and $P(e \mid m)$ is the probability that $m$ links to $e$, computed from entity popularity. Within a mention's candidate set, different candidates usually differ in importance or popularity. For example, across the Web the mention "Chicago" is more likely to link to the entity "Chicago" (the city) than to the entity "Chicago (Oscar-winning film)". This independent feature is very useful for entity linking, and the invention estimates the prior probability that a mention links to an entity from Wikipedia statistics: ⟨string, entity⟩ pairs are first collected from all anchor texts, redirect pages, and disambiguation pages, and the proportion of times a string links to a given entity is taken as the entity link prior probability, as follows.
[Formula image: prior probability $P(e \mid m)$]
Here the string $m$ is treated as an entity mention, $f(m, e)$ is the frequency with which string $m$ and entity $e$ co-occur, and $f(e)$ is the total number of occurrences of entity $e$. Reference prior statistics are given in the table below.
[Table image: prior probability statistics]
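Both the prior formula and the statistics table are images in the source. The sketch below assumes $P(e \mid m)$ is the fraction of a string's observed links that point to $e$, which follows "the proportion of times a string links to a given entity"; the translated text defines $f(e)$ as the total occurrences of $e$, so the exact denominator of the original formula is uncertain. The counts are made up and `link_priors` is a hypothetical name.

```python
from collections import Counter


def link_priors(observations):
    """Estimate P(e | m) from (string, entity) observations.

    Each observation is one occurrence of string m linking to entity e (e.g. an
    anchor text). The denominator counts all occurrences of the string m, which
    is an assumption of this sketch.
    """
    pair_counts = Counter(observations)
    string_counts = Counter(m for m, _ in observations)
    return {(m, e): n / string_counts[m] for (m, e), n in pair_counts.items()}


obs = [("Chicago", "Chicago")] * 3 + [("Chicago", "Chicago (film)")]
priors = link_priors(obs)
print(priors[("Chicago", "Chicago")], priors[("Chicago", "Chicago (film)")])  # 0.75 0.25
```

With this normalization the priors of one string's candidates sum to 1.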
5) Confidence calculation.
The paired entity joint disambiguation algorithm gives priority to the pair of entity mentions with the highest confidence, where the confidence mainly combines row (column) semantic consistency, the similarity between entity mentions and candidate entities, and the relatedness between entities. Given a pair of entity mentions $m_i, m_j$ and their candidate entity sets $CS_i, CS_j$, the confidence is defined as $\Gamma(m_i, m_j)$. As the following formula shows, the confidence calculation has two parts: the similarity among the elements involved in linking the pair, and the change in row (column) semantic consistency caused by the linking operation; the hyper-parameter $\beta > 0$ controls the relative weight of semantic consistency.
[Formula image: definition of the confidence $\Gamma(m_i, m_j)$]
The similarity part mainly comprises three terms: the similarity between each of the two mentions and its candidate entity, and the relatedness between the two candidate entities. In the following formula,
[Formula image: candidate entity from $CS_i$]
and
[Formula image: candidate entity from $CS_j$]
denote candidate entities in the sets $CS_i$ and $CS_j$, respectively; MES computes the similarity between an entity mention and a candidate entity, using the deep semantic matching model introduced in section 4.1; EES measures the relatedness between the entities to be linked, computed as the cosine similarity of pre-trained entity vectors.
[Formula image: similarity part of $\Gamma(m_i, m_j)$]
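Since the similarity formula is an image, the sketch below assumes the three named terms are summed, $\mathrm{MES}(m_i, e_i) + \mathrm{MES}(m_j, e_j) + \mathrm{EES}(e_i, e_j)$, and that the best candidate pair in $CS_i \times CS_j$ is the one with the highest sum. Both the summation and the maximization are assumptions, and all names and scores are hypothetical.

```python
from itertools import product


def best_candidate_pair(cands_i, cands_j, mes_i, mes_j, ees):
    """Score each (e_i, e_j) in CS_i x CS_j as MES(m_i, e_i) + MES(m_j, e_j) + EES(e_i, e_j).

    mes_i / mes_j: dict candidate -> MES score for mention m_i / m_j.
    ees: callable returning the relatedness of two candidate entities.
    Returns the highest-scoring pair and its score (taking the max is an assumption).
    """
    best, best_score = None, float("-inf")
    for e_i, e_j in product(cands_i, cands_j):
        score = mes_i[e_i] + mes_j[e_j] + ees(e_i, e_j)
        if score > best_score:
            best, best_score = (e_i, e_j), score
    return best, best_score


mes_i = {"Chicago": 1.2, "Chicago (film)": 0.9}  # MES(m_i, .) for mention "Chicago"
mes_j = {"Illinois": 1.0}                         # MES(m_j, .) for mention "Illinois"
related = {("Chicago", "Illinois"): 0.8, ("Chicago (film)", "Illinois"): 0.1}
pair, score = best_candidate_pair(list(mes_i), list(mes_j), mes_i, mes_j,
                                  lambda a, b: related[(a, b)])
print(pair, score)
```

The EES term is what lets a well-linked neighbour ("Illinois") pull the ambiguous mention toward the related candidate.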
When computing the confidence, the row (column) semantic consistency values are not used directly; their changes are used instead. In the following equation, $\Delta CSC_N$ and $\Delta RSC_N$ denote the normalized changes in column and row semantic consistency after the linking operation on mentions $m_i, m_j$ is completed. To compute $\Delta CSC_N$ (or $\Delta RSC_N$), the column (or row) semantic consistency is computed before and after the link, and the change is then normalized.
[Formula image: definitions of $\Delta CSC_N$ and $\Delta RSC_N$]
The normalization is given by the following formula, where $d$ is the change in semantic consistency; $d > 0$ means semantic consistency increased, which raises the confidence. $\sigma$ is the logistic sigmoid function, so $\operatorname{Norm}(d) \in (-0.5, 0.5)$.
$$\operatorname{Norm}(d) = \sigma(d) - 0.5$$
In the example of Fig. 3, after $m_4$ is linked to $e_4$ and $m_{12}$ to $e_{12}$, new linked entities are added to the first and third columns. The column semantic consistency of the first and third columns and the row semantic consistency between them change accordingly; a positive change indicates increased semantic consistency, and the corresponding confidence takes a higher value.
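The before/after procedure described above can be sketched for one column as follows. This is a minimal sketch, assuming population variance and the convention (not stated in the source) that a column with fewer than two linked entities has CSC = 0; the helper names are hypothetical. The row version would apply the same steps to the relation vectors $r$.

```python
import math
from statistics import fmean, pvariance


def csc(vectors):
    """CSC = -mean(var(...)); with fewer than two vectors, 0.0 is returned by convention."""
    if len(vectors) < 2:
        return 0.0
    return -fmean([pvariance(dim) for dim in zip(*vectors)])


def norm(d):
    """Norm(d) = sigmoid(d) - 0.5, computed without overflow."""
    z = math.exp(-abs(d))
    s = 1.0 / (1.0 + z) if d >= 0 else z / (1.0 + z)
    return s - 0.5


def delta_csc_normalized(column_vectors, new_vector):
    """Normalized change in column consistency when new_vector is linked into the column."""
    before = csc(column_vectors)
    after = csc(column_vectors + [new_vector])
    return norm(after - before)


column = [[1.0, 0.0], [1.0, 0.2]]
print(delta_csc_normalized(column, [1.0, 0.1]) > 0)   # consistent link raises CSC
print(delta_csc_normalized(column, [-1.0, 2.0]) < 0)  # outlier lowers CSC
```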
6) Paired entity joint disambiguation algorithm.
The paired entity joint disambiguation in a table is shown in Algorithm 1. The input is all entity mentions in the table and their candidate entity sets. The algorithm first combines every two entity mentions in the same row or column of the table to generate all entity mention pairs, i.e., mpSet (lines 1-8 of Algorithm 1). It then iteratively performs joint disambiguation:
1) The top function returns, for each pair of entity mentions $(m_i, m_j)$, the result to be linked and the corresponding confidence (lines 9-14 of Algorithm 1), where the confidence is calculated by formula 26.
2) The mostConf function sorts all entity mention pairs by confidence and links the pair with the highest confidence (lines 15-16 of Algorithm 1).
The paired entity joint disambiguation algorithm iterates continuously, linking at least one entity mention in each iteration, until all entity mentions are linked. Linking in this order safeguards the quality of each mention's link and makes the most of joint disambiguation.
[Algorithm 1 image: paired entity joint disambiguation]
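Algorithm 1 is an image in the source, so the loop below is only a sketch of the described iteration, not the authors' implementation. It assumes the mention pairs from step 1 are given, takes a hypothetical `score` callable standing in for the confidence $\Gamma$, re-scores every pair in each iteration (step 3), and stops when no pair with an unlinked, scorable mention remains. The toy scores and entity names are made up.

```python
from itertools import product


def pairwise_joint_disambiguation(candidates, pairs, score):
    """Greedy pairwise linking loop (a sketch of steps 2-4 / Algorithm 1).

    candidates: dict mention -> list of candidate entities; pruned in place to
                the chosen entity once a mention is linked.
    pairs:      list of (m_i, m_j) mention pairs from step 1.
    score:      hypothetical callable score(m_i, e_i, m_j, e_j, links) standing
                in for the confidence; it is re-evaluated every iteration, so it
                can take the current links into account (step 3).
    """
    links = {}
    while True:
        best = None  # (confidence, m_i, e_i, m_j, e_j)
        for m_i, m_j in pairs:
            if m_i in links and m_j in links:
                continue  # both mentions of this pair are already linked
            for e_i, e_j in product(candidates[m_i], candidates[m_j]):
                conf = score(m_i, e_i, m_j, e_j, links)
                if best is None or conf > best[0]:
                    best = (conf, m_i, e_i, m_j, e_j)
        if best is None:
            return links  # no pair with an unlinked, scorable mention remains
        _, m_i, e_i, m_j, e_j = best
        for m, e in ((m_i, e_i), (m_j, e_j)):
            links[m] = e
            candidates[m] = [e]  # delete the mention's other candidates


# Toy example with hypothetical MES scores and a binary relatedness table.
mes = {("m1", "Chicago"): 0.9, ("m1", "Chicago (film)"): 0.4,
       ("m2", "Illinois"): 0.8, ("m3", "Cook County"): 0.7}
related = {frozenset({"Chicago", "Illinois"}), frozenset({"Chicago", "Cook County"})}


def toy_score(m_i, e_i, m_j, e_j, links):
    ees = 1.0 if frozenset({e_i, e_j}) in related else 0.0
    return mes[(m_i, e_i)] + mes[(m_j, e_j)] + ees


cands = {"m1": ["Chicago", "Chicago (film)"], "m2": ["Illinois"], "m3": ["Cook County"]}
result = pairwise_joint_disambiguation(cands, [("m1", "m2"), ("m1", "m3")], toy_score)
print(result)  # {'m1': 'Chicago', 'm2': 'Illinois', 'm3': 'Cook County'}
```

The first iteration links the most confident pair (m1, m2); in the second, m1 already has a single remaining candidate, so linking m3 is informed by that earlier decision, which is the effect the method relies on.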
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the specific embodiments described above, which are intended to further illustrate the principles of the invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention, which is intended to be covered by the appended claims. The scope of the invention is defined by the claims and their equivalents.

Claims (2)

1. A Web table-oriented paired entity joint disambiguation method is characterized by comprising the following steps:
1) combining every two entity mentions in the same row or column of a Web table to generate all entity mention pairs;
2) calculating the confidence of linking each entity mention pair, linking the pair of entity mentions with the highest confidence to their respective entities, and deleting the other candidate entities of that pair of mentions;
3) updating the confidence values between the different entity mentions in the table;
4) iterating steps 2) and 3) until all entity mentions in the table are linked.
2. The Web table-oriented paired entity joint disambiguation method of claim 1, wherein in step 2), the confidence level is calculated as follows:
2-a) the confidence calculation introduces the change in column semantic consistency during linking; column semantic consistency is defined as the negative mean of the variance vector, and the column semantic consistency CSC is calculated as:
$$\mathrm{CSC} = -\operatorname{mean}\big(\operatorname{var}([e_1, e_2, \ldots, e_n])\big)$$
where $e_1, e_2, \ldots, e_n$ are the vector representations of a column of linked entities, var obtains the variance vector, and mean averages the values in the variance vector to obtain a scalar representing column semantic consistency;
2-b) the confidence calculation introduces the change in row semantic consistency during linking; row semantic consistency is defined as the negative mean of the relation variance vector, where the smaller the variance, the larger this negative mean, the closer the relations across different rows, and the more consistent the row semantics; the row semantic consistency RSC is calculated as:
$$r = e_{\text{non-subject}} - e_{\text{subject}}$$
$$\mathrm{RSC} = -\operatorname{mean}\big(\operatorname{var}([r_1, r_2, \ldots, r_n])\big)$$
where $e_{\text{subject}}$ is the linked entity in the primary key column, $e_{\text{non-subject}}$ is the linked entity in a non-primary key column, $r$ is the relation vector, var obtains the variance vector, mean averages the values in the variance vector to obtain a scalar representing row semantic consistency, and $r_1, r_2, \ldots, r_n$ are the relation vectors formed by the linked entities of different rows;
2-c) the confidence calculation introduces the consistency between entities in the table during linking, and the linked entity consistency EES is calculated as the cosine similarity of the entity vector representations:
$$\mathrm{EES}(e_1, e_2) = \operatorname{cosine}(e_1, e_2)$$
where $e_1$ and $e_2$ are the vector representations of the entities corresponding to the two entity mentions in the paired entity joint disambiguation process;
2-d) The confidence calculation incorporates the similarity between an entity mention and a candidate entity. The mention-candidate similarity MES combines the cosine similarity between the mention's context vector and the candidate entity's context vector with the prior probability. The mention context is the bag of words formed by all words in the same row and column as the mention; the candidate entity context is the bag of words formed by all words in the entity's textual description in the knowledge base. Each context vector is the average of all word vectors in the corresponding bag of words:
$$ \mathrm{MES}(m, e) = \operatorname{cosine}(m_{\text{context}}, e_{\text{context}}) + P(e \mid m) $$
where $m_{\text{context}}$ is the context vector representation of mention $m$, $e_{\text{context}}$ is the context vector representation of candidate entity $e$, and $P(e \mid m)$ is the probability that $m$ is linked to $e$;
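The MES computation can be sketched as follows, under several assumptions not stated in the text: word vectors come from a lookup table (the toy vectors below are made up), out-of-vocabulary words are skipped, an empty context contributes a cosine of 0, and the prior $P(e \mid m)$ is supplied by the caller because the text does not say how it is estimated. All names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def mean_vector(words: List[str], word_vectors: Dict[str, List[float]]) -> Optional[List[float]]:
    """Average the vectors of all in-vocabulary words in a bag of words."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]  # OOV words skipped (assumption)
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    denom = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if denom == 0.0 else dot / denom


def mention_entity_similarity(mention_context: List[str], entity_description: List[str],
                              word_vectors: Dict[str, List[float]], prior: float) -> float:
    """MES(m, e) = cosine(m_context, e_context) + P(e|m)."""
    m_vec = mean_vector(mention_context, word_vectors)
    e_vec = mean_vector(entity_description, word_vectors)
    context_sim = 0.0 if m_vec is None or e_vec is None else cosine(m_vec, e_vec)
    return context_sim + prior


if __name__ == "__main__":
    # Toy 2-d word vectors; a real system would use pretrained embeddings.
    wv = {"france": [1.0, 1.0], "capital": [0.0, 1.0], "paris": [1.0, 0.0]}
    # Mention "Paris" sits in a row containing "France" under a "capital" column.
    score = mention_entity_similarity(["france", "capital"], ["paris", "capital", "france"], wv, prior=0.4)
    print(round(score, 3))  # 1.349
```

Because the cosine term lies in [-1, 1] and the prior in [0, 1], MES as written ranges over [-1, 2]; the text does not rescale the sum.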
2-e) The confidence calculation combines the above sources of information. Given a pair of entity mentions $m_i, m_j$ and their candidate entity sets $CS_i, CS_j$, the confidence $\Gamma(m_i, m_j)$ consists of two parts: the similarity among the elements involved in linking the mention pair, and the change in row/column semantic consistency brought about by the linking operation. A hyperparameter $\beta > 0$ controls the relative influence of semantic consistency. The definition is as follows:
[Formula shown as an image in the original: definition of the confidence Γ(m_i, m_j)]
The similarity part has three components: the similarity between each of the two entity mentions and its own candidate entity, and the relatedness between the two candidate entities. [Symbol image in the original] and [symbol image in the original] denote the candidate entities taken from the candidate entity sets $CS_i$ and $CS_j$, respectively. MES computes the similarity between an entity mention and a candidate entity; EES measures the consistency of the linked entities; $\Delta CSC_N$ and $\Delta RSC_N$ denote the normalized changes in column and row semantic consistency, respectively, once the linking operation for the mention pair $m_i, m_j$ is completed. The normalization is:
$$ \mathrm{Norm}(d) = \sigma(d) - 0.5 $$
where $d$ is the change in semantic consistency. If $d > 0$, semantic consistency has increased and the confidence value is raised accordingly. $\sigma$ is the logistic sigmoid function, so the normalization maps $\mathrm{Norm}(d)$ into $(-0.5, 0.5)$.
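The normalization step, together with one possible way of assembling the confidence, can be sketched as follows. Only $\mathrm{Norm}(d) = \sigma(d) - 0.5$ is taken directly from the text. The additive combination in `confidence_sketch` (MES for both mentions plus EES, plus $\beta$ times the two normalized consistency changes) is an assumption read from the prose, because the actual formula for $\Gamma$ appears only as an image; the original may weight or aggregate the terms differently. The function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def norm(d: float) -> float:
    """Norm(d) = sigmoid(d) - 0.5: positive when consistency rose, negative when it fell."""
    return sigmoid(d) - 0.5


def confidence_sketch(mes_i: float, mes_j: float, ees_ij: float,
                      delta_csc: float, delta_rsc: float, beta: float) -> float:
    """HYPOTHETICAL additive form of Gamma(m_i, m_j); the original formula is an image."""
    if beta <= 0:
        raise ValueError("beta must be > 0")
    similarity = mes_i + mes_j + ees_ij                   # similarity part
    consistency = norm(delta_csc) + norm(delta_rsc)       # column and row consistency changes
    return similarity + beta * consistency


if __name__ == "__main__":
    unchanged = confidence_sketch(0.8, 0.7, 0.6, delta_csc=0.0, delta_rsc=0.0, beta=1.0)
    improved = confidence_sketch(0.8, 0.7, 0.6, delta_csc=0.3, delta_rsc=0.1, beta=1.0)
    print(improved > unchanged)  # True: linking that raises consistency raises confidence
```

In floating point, `norm` returns exactly 0.5 or -0.5 once the sigmoid saturates for very large $|d|$, even though the mathematical range is the open interval.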

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110720148.2A CN113361283A (en) 2021-06-28 2021-06-28 Web table-oriented paired entity joint disambiguation method

Publications (1)

Publication Number Publication Date
CN113361283A true CN113361283A (en) 2021-09-07

Family

ID=77536783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110720148.2A Pending CN113361283A (en) 2021-06-28 2021-06-28 Web table-oriented paired entity joint disambiguation method

Country Status (1)

Country Link
CN (1) CN113361283A (en)

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050080613A1 (en) * 2003-08-21 2005-04-14 Matthew Colledge System and method for processing text utilizing a suite of disambiguation techniques
CN101454750A (en) * 2006-03-31 2009-06-10 Google Inc. Disambiguation of named entities
US20140244550A1 (en) * 2013-02-28 2014-08-28 Microsoft Corporation Posterior probability pursuit for entity disambiguation
CN106503148A (en) * 2016-10-21 2017-03-15 Southeast University A kind of form entity link method based on multiple knowledge base
CN107102989A (en) * 2017-05-24 2017-08-29 Nanjing University A kind of entity disambiguation method based on term vector, convolutional neural networks
CN107316062A (en) * 2017-06-26 2017-11-03 National University of Defense Technology A kind of name entity disambiguation method of improved domain-oriented
CN107861939A (en) * 2017-09-30 2018-03-30 Kunming University of Science and Technology A kind of domain entities disambiguation method for merging term vector and topic model
US20180137404A1 (en) * 2016-11-15 2018-05-17 International Business Machines Corporation Joint learning of local and global features for entity linking via neural networks
CN108959461A (en) * 2018-06-15 2018-12-07 Southeast University A kind of entity link method based on graph model
CN109783651A (en) * 2019-01-29 2019-05-21 Beijing Baidu Netcom Science and Technology Co., Ltd. Extract method, apparatus, electronic equipment and the storage medium of entity relevant information
CN109815401A (en) * 2019-01-23 2019-05-28 Sichuan Yicheng Zhixun Technology Co., Ltd. A kind of name disambiguation method applied to Web people search
CN110147401A (en) * 2019-05-22 2019-08-20 Soochow University Merge the knowledge base abstracting method of priori knowledge and context-sensitive degree
CN110442710A (en) * 2019-07-03 2019-11-12 Guangzhou Tanji Technology Co., Ltd. A kind of short text semantic understanding of knowledge based map and accurate matching process and device
CN110704634A (en) * 2019-09-06 2020-01-17 Ping An Technology (Shenzhen) Co., Ltd. Method and device for checking and repairing knowledge graph link errors and storage medium
CN110765276A (en) * 2019-10-21 2020-02-07 Beijing Mininglamp Software System Co., Ltd. Entity alignment method and device in knowledge graph
US10733383B1 (en) * 2018-05-24 2020-08-04 Workday, Inc. Fast entity linking in noisy text environments
CN111581973A (en) * 2020-04-24 2020-08-25 Aerospace Information Research Institute, Chinese Academy of Sciences Entity disambiguation method and system
CN112597305A (en) * 2020-12-22 2021-04-02 Shanghai Normal University Scientific and technological literature author name disambiguation method based on deep learning and web end disambiguation device
CN112861538A (en) * 2021-02-08 2021-05-28 Harbin Institute of Technology Entity linking method based on context semantic relation and document consistency constraint
CN112883199A (en) * 2021-03-09 2021-06-01 Chongqing University Collaborative disambiguation method based on deep semantic neighbor and multi-entity association

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TIANXING WU et al.: "Neural Paired Entity Linking in Web Tables", ACM JOURNALS, 19 March 2024 (2024-03-19), pages 1 - 15 *
TAO XIN et al.: "Web Person Name Disambiguation Method Based on Combined Features", Computer Systems & Applications, vol. 24, no. 11, 31 December 2015 (2015-12-31), pages 162 - 166 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003735A (en) * 2021-12-24 2022-02-01 Beijing Daoda Tianji Technology Co., Ltd. Knowledge graph question and answer oriented entity disambiguation method based on intelligence document
CN115828854A (en) * 2023-02-17 2023-03-21 Southeast University Efficient table entity linking method based on context disambiguation
CN115828854B (en) * 2023-02-17 2023-05-02 Southeast University Efficient table entity linking method based on context disambiguation

Similar Documents

Publication Publication Date Title
Sun et al. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems
CN110188228B (en) Cross-modal retrieval method based on sketch retrieval three-dimensional model
CN112015868B (en) Question-answering method based on knowledge graph completion
US8301638B2 (en) Automated feature selection based on rankboost for ranking
CN112434517B (en) Community question-answering website answer ordering method and system combined with active learning
CN103778227A (en) Method for screening useful images from retrieved images
CN112966091B (en) Knowledge map recommendation system fusing entity information and heat
CN109389151A (en) A kind of knowledge mapping treating method and apparatus indicating model based on semi-supervised insertion
CN113361283A (en) Web table-oriented paired entity joint disambiguation method
CN110598022B (en) Image retrieval system and method based on robust deep hash network
Wang et al. Joint label completion and label-specific features for multi-label learning algorithm
CN104731882A (en) Self-adaptive query method based on Hash code weighting ranking
CN116383422B (en) Non-supervision cross-modal hash retrieval method based on anchor points
CN109933720A (en) A kind of dynamic recommendation method based on user interest Adaptive evolution
CN108733745B (en) Query expansion method based on medical knowledge
CN111090765B (en) Social image retrieval method and system based on missing multi-modal hash
CN111079011A (en) Deep learning-based information recommendation method
CN111709475B (en) N-gram-based multi-label classification method and device
Ma et al. A natural scene recognition learning based on label correlation
CN111723179A (en) Feedback model information retrieval method, system and medium based on concept map
CN114972959B (en) Remote sensing image retrieval method for sample generation and in-class sequencing loss in deep learning
CN113191450B (en) Weak supervision target detection algorithm based on dynamic label adjustment
Jomaa et al. Hyperparameter optimization with differentiable metafeatures
CN114840639A (en) ConceptNet-based information retrieval query expansion method
Xu et al. Cross-media retrieval based on pseudo-label learning and semantic consistency algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination