CN111522911B - Entity linking method, device, equipment and storage medium - Google Patents

Entity linking method, device, equipment and storage medium

Info

Publication number
CN111522911B
Authority
CN
China
Prior art keywords
entity
storage
score
candidate
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010298036.8A
Other languages
Chinese (zh)
Other versions
CN111522911A (en)
Inventor
张发恩
姜勇越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Qizhi Qingdao Technology Co ltd
Original Assignee
Innovation Qizhi Qingdao Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Qizhi Qingdao Technology Co ltd filed Critical Innovation Qizhi Qingdao Technology Co ltd
Priority to CN202010298036.8A priority Critical patent/CN111522911B/en
Publication of CN111522911A publication Critical patent/CN111522911A/en
Application granted granted Critical
Publication of CN111522911B publication Critical patent/CN111522911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an entity linking method, an entity linking device, equipment and a storage medium, wherein the entity linking method comprises the following steps: extracting text information of an entity to be put in storage, wherein the text information comprises an author name of the entity to be put in storage, and the entity to be put in storage represents a thesis to be put in storage; searching an entity matched with the name of the author of the entity to be warehoused in a scientific research worker entity library at least according to the name of the author of the entity to be warehoused and obtaining text information of a candidate entity; comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result; and when the comparison result represents that the entity to be put in storage and the candidate entity are the same author, linking the entity to be put in storage and the candidate entity. The method and the device can realize the connection between the entity to be put in storage and the entity in the entity library.

Description

Entity linking method, device, equipment and storage medium
Technical Field
The present application relates to the field of computing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for entity linking.
Background
With the emergence of a large number of papers, researchers expect that a paper management platform can provide more services while providing paper downloading.
At present, domestic paper management platforms such as CNKI, Wanfang and VIP can provide browsing, recommendation and downloading of papers according to the fields in which researchers are interested. However, most of these platforms do not perform entity linking for the authors of papers, that is, they do not determine whether a researcher name extracted from a paper refers to the real-life researcher of the same name. Instead, they rely on a claiming mode in which authors themselves judge whether a paper is their own. This approach hinders deep analysis, such as mining interpersonal relationships between fellow students of the same advisor or between teachers and students.
Disclosure of Invention
The application aims to disclose an entity linking method, an entity linking device and a storage medium, which are used for realizing entity linking so as to carry out deep analysis on a thesis as an entity.
A first aspect of the present application discloses an entity linking method, which includes:
extracting text information of an entity to be put in storage, wherein the text information comprises an author name of the entity to be put in storage, and the entity to be put in storage represents a thesis to be put in storage;
searching an entity matched with the name of the author of the entity to be put in storage in a scientific research worker entity library at least according to the name of the author of the entity to be put in storage and obtaining text information of a candidate entity;
comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result;
and when the comparison result represents that the entity to be put in storage and the candidate entity are the same author, linking the entity to be put in storage and the candidate entity.
In the first aspect of the application, an entity matched with the name of the author of the entity to be put in storage is retrieved in a scientific research worker entity library through the name of the author of the entity to be put in storage, text information of the candidate entity is obtained, the text information of the candidate entity and the text information of the entity to be put in storage can be compared, a comparison result is determined, and therefore the entity to be put in storage and the candidate entity can be linked according to the comparison result.
As an optional implementation manner, after extracting the text information of the entity to be warehoused, before retrieving, in the scientific research worker entity library, an entity matched with the author name of the entity to be warehoused according to the author name of the entity to be warehoused at least and obtaining the text information of the candidate entity, the method further includes:
obtaining a name expansion set of the entity to be warehoused according to the author name of the entity to be warehoused;
and the searching, at least according to the author name of the entity to be put in storage, for an entity matched with the author name of the entity to be put in storage in the scientific research worker entity library and obtaining text information of the candidate entity comprises:
and searching an entity matched with the author name of the entity to be warehoused in the scientific research worker entity library according to the author name and the name extension set of the entity to be warehoused, and obtaining text information of the candidate entity.
In this optional embodiment, the name extension set of the entity to be put in storage can be obtained according to the author name of the entity to be put in storage, and then the entity matched with the author name of the entity to be put in storage can be retrieved in the scientific research worker entity library according to the author name and the name extension set of the entity to be put in storage, and the text information of the candidate entity can be obtained.
As an optional implementation manner, the comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result includes:
and when the text information of the candidate entity and the text information of the entity to be put in storage have intersection, determining that the comparison result is that the candidate entity and the entity to be put in storage are the same author.
In this optional implementation manner, when there is an intersection between the text information of the candidate entity and the text information of the entity to be put in storage, it is determined that the comparison result is that the candidate entity and the entity to be put in storage are the same author.
As an optional implementation manner, the comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result includes:
when the text information of the candidate entity and the text information of the entity to be put in storage do not have an intersection, calculating a link score of the candidate entity and the entity to be put in storage;
and determining the comparison result according to the link scores of the candidate entity and the entity to be put in storage.
In this optional embodiment, when there is no intersection between the text information of the candidate entity and the text information of the entity to be put in storage, the comparison result can be determined by calculating the link score between the candidate entity and the entity to be put in storage and further according to the link score between the candidate entity and the entity to be put in storage.
As an optional implementation manner, the link score between the candidate entity and the entity to be warehoused is calculated according to the following formula:
$\mathrm{Score} = \omega_0 U + \omega_1 L + \omega_2 T + D$;
wherein Score represents the link score, U represents the unit score between the entity to be put in storage and the candidate entity, L represents the research field score between the entity to be warehoused and the candidate entity, T represents the co-authorship relationship score between the entity to be warehoused and the candidate entity, $\omega_i$ ($i = 0, 1, 2$) represent the weighting coefficients of the unit score, the research field score and the co-authorship relationship score respectively, and D represents a correction coefficient.
In this alternative embodiment, the link score between the candidate entity and the entity to be put in storage can be obtained from the research field score, the unit score, and the co-authorship relationship score.
As an optional implementation manner, the determining the comparison result according to the link score of the candidate entity and the entity to be warehoused includes:
comparing the link score with a first preset threshold;
and when the link score is greater than or equal to the first preset threshold, determining that the comparison result is that the candidate entity and the entity to be warehoused are the same author.
As an optional implementation manner, the determining the comparison result according to the link scores of the candidate entity and the entity to be warehoused further includes:
when the link score is smaller than the first preset threshold, judging whether the entity to be put in storage is a degree thesis, and if so, recalculating the link score according to the unit score, the research field score, the co-authorship relationship score, the advisor item score and the academic score of the entity to be put in storage and the candidate entity;
and comparing the recalculated link score with a second preset threshold, and if the recalculated link score is greater than or equal to the second preset threshold, determining that the comparison result is that the candidate entity and the entity to be put in storage are the same author.
In this optional embodiment, when the link score is smaller than the first preset threshold, it is determined whether the entity to be put in storage is a degree thesis. If so, the link score is recalculated according to the unit score, the research field score, the co-authorship relationship score, the advisor item score and the academic score of the entity to be put in storage and the candidate entity, and whether the candidate entity and the entity to be put in storage are the same author is then determined by comparing the recalculated link score with the second preset threshold.
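The two-threshold decision described above can be sketched as follows. This is a minimal illustration only: the function name `same_author`, the `recalculate` callback and the threshold values are hypothetical, and the text does not specify how the recalculated score combines the five component scores.

```python
from typing import Callable

def same_author(score: float,
                first_threshold: float,
                is_degree_thesis: bool,
                recalculate: Callable[[], float],
                second_threshold: float) -> bool:
    """Decide whether candidate and incoming paper share an author."""
    # First check: the ordinary link score against the first threshold.
    if score >= first_threshold:
        return True
    # Fallback: only degree theses get a recalculated score, which is
    # compared against the second threshold.
    if is_degree_thesis:
        return recalculate() >= second_threshold
    return False

# Hypothetical numbers, chosen only to exercise each branch.
direct = same_author(3.6, 3.0, False, lambda: 0.0, 4.0)
fallback = same_author(2.5, 3.0, True, lambda: 4.2, 4.0)
rejected = same_author(2.5, 3.0, False, lambda: 4.2, 4.0)
```

Passing the recalculation as a callback means the extra degree-thesis scores are only computed when the first threshold is missed.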
A second aspect of the present application discloses an entity linking apparatus, the apparatus comprising:
an extraction module, configured to extract text information of an entity to be put in storage, wherein the text information comprises an author name of the entity to be put in storage, and the entity to be put in storage represents a thesis to be put in storage;
the retrieval module is used for retrieving an entity matched with the author name of the entity to be put in storage in the scientific research worker entity library at least according to the author name of the entity to be put in storage and obtaining text information of a candidate entity;
the comparison module is used for comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result;
and the linking module is used for linking the entity to be warehoused with the candidate entity when the comparison result represents that the entity to be warehoused and the candidate entity are the same author.
The device in the second aspect of the application searches entities matched with the author names of the entities to be warehoused in the scientific research worker entity library through the author names of the entities to be warehoused by executing the entity linking method, obtains text information of the candidate entities, can compare the text information of the candidate entities with the text information of the entities to be warehoused and determine a comparison result, and therefore the entities to be warehoused and the candidate entities can be linked according to the comparison result.
A third aspect of the present application discloses an entity linking device, the device including:
a processor; and
a memory configured to store machine-readable instructions that, when executed by the processor, perform the entity linking method disclosed herein.
The device of the third aspect of the present application searches, by executing the entity linking method, an entity matching the author name of the entity to be put in storage in the scientific research worker entity library by using the author name of the entity to be put in storage, and obtains text information of a candidate entity, and can compare the text information of the candidate entity with the text information of the entity to be put in storage and determine a comparison result, so that the entity to be put in storage and the candidate entity can be linked according to the comparison result.
A fourth aspect of the present application discloses a storage medium storing a computer program which, when executed by a processor, performs the entity linking method of the present application.
In the storage medium of the fourth aspect of the present application, by executing the entity linking method, the entity matched with the author name of the entity to be put in storage is retrieved in the scientific research worker entity library by the author name of the entity to be put in storage, and the text information of the candidate entity is obtained, so that the text information of the candidate entity and the text information of the entity to be put in storage can be compared, and a comparison result can be determined, and the entity to be put in storage and the candidate entity can be linked according to the comparison result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of an entity linking method disclosed in Embodiment 1 of the present application;
Fig. 2 is a schematic structural diagram of an entity linking apparatus disclosed in Embodiment 2 of the present application;
Fig. 3 is a schematic structural diagram of an entity linking device disclosed in Embodiment 3 of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Embodiment 1
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an entity linking method disclosed in an embodiment of the present application. As shown in Fig. 1, the method comprises the following steps:
101. extracting text information of an entity to be put in storage, wherein the text information comprises an author name of the entity to be put in storage, and the entity to be put in storage represents a thesis to be put in storage;
102. searching entities matched with the author names of the entities to be warehoused in a scientific research worker entity library at least according to the author names of the entities to be warehoused and obtaining text information of candidate entities;
103. comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result;
104. and when the comparison result indicates that the entity to be warehoused and the candidate entity are the same author, linking the entity to be warehoused and the candidate entity.
In the embodiment of the application, the entity to be put in storage refers to a paper document. In the scientific research worker entity library, each paper document is a node, and each node is an entity. For example, a knowledge graph of paper documents includes 3 nodes, which are respectively a paper by author A, a paper by author B, and a paper by author C. Further, each entity has its own attributes; for example, the attributes of the paper by author B include the author name, the paper type, and the like.
In the embodiment of the present application, the text information of the entity to be warehoused includes a paper title, a paper abstract, a keyword list, an author list, a cited document list, and a paper type, where the paper type is one of a degree thesis, a journal paper, and a conference paper. Further, when the paper type is a degree thesis, the text information of the entity to be put in storage further includes a thesis level, an educational experience, an author major, advisor names, and a Chinese Library Classification number, wherein the thesis level is one of bachelor's, master's, and doctoral.
Exemplarily, it is assumed that the scientific research worker entity library has two entities, which are respectively represented by "a" and "B", and when the text information of the entity to be warehoused is extracted, the text information of the entity to be warehoused is respectively compared with the attribute information of the entity a and the attribute information of the entity B, if the author of the entity a is the same as the author of the entity to be warehoused, the entity to be warehoused is stored in the scientific research worker entity library, and the binding relationship between the entity to be warehoused and the entity a is established, so that when the user retrieves the entity a, the entity to be warehoused bound with the entity a can be provided to the user.
Therefore, in the embodiment of the application, the entity matched with the author name of the entity to be warehoused is searched in the scientific research worker entity library through the author name of the entity to be warehoused, and the text information of the candidate entity is obtained. The text information of the candidate entity can then be compared with the text information of the entity to be warehoused to determine a comparison result, and the entity to be warehoused and the candidate entity can be linked according to the comparison result. In the prior art, entity linking combines methods such as rules, machine learning and deep learning. On the one hand, these methods require a large number of training samples, and the amount of labeled data required grows greatly as the number of attributes increases; on the other hand, the need for a large number of labeled samples limits machine learning and deep learning, so that such methods have a high recall rate but low accuracy.
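Steps 101 to 104 can be sketched end to end as below. This is an illustrative outline under stated assumptions, not the implementation of the application: the `Entity` class, its fields, the `link_paper` function and the title-based comparison passed in as `same_author` are all hypothetical, and retrieval is reduced to exact name matching.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Entity:
    author_name: str
    titles: set = field(default_factory=set)    # known paper titles
    linked: list = field(default_factory=list)  # papers linked so far

def link_paper(paper: dict, library: list,
               same_author: Callable[[Entity, dict], bool]) -> Optional[Entity]:
    # Step 101: text information is assumed to be extracted into `paper`.
    # Step 102: retrieve candidates whose name matches the author name.
    candidates = [e for e in library if e.author_name == paper["author"]]
    for candidate in candidates:
        # Step 103: compare text information to get a comparison result.
        if same_author(candidate, paper):
            # Step 104: link the paper to the matching candidate.
            candidate.linked.append(paper)
            return candidate
    return None

# Two library entities share a name; only the second knows the title.
library = [Entity("Chen AaAb", {"Paper X"}), Entity("Chen AaAb", {"Paper Y"})]
paper = {"author": "Chen AaAb", "title": "Paper Y"}
linked = link_paper(paper, library, lambda e, p: p["title"] in e.titles)
```

The comparison is injected as a function so that the same outline works with any comparison rule, including a score-based one.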
As an alternative embodiment, after step 101 (extracting the text information of the entity to be put in storage) and before step 102 (searching an entity matched with the author name of the entity to be put in storage in the scientific research worker entity library at least according to the author name of the entity to be put in storage and obtaining text information of a candidate entity), the method further comprises the following step:
and obtaining a name expansion set of the entity to be warehoused according to the author name of the entity to be warehoused.
Further, step 102: the concrete steps of searching the entity matched with the name of the author of the entity to be warehoused in the scientific research worker entity library at least according to the name of the author of the entity to be warehoused and obtaining the text information of the candidate entity are as follows:
and searching entities matched with the author names of the entities to be warehoused in the scientific research worker entity library according to the author names and the name extension sets of the entities to be warehoused, and obtaining text information of the candidate entities.
Exemplarily, assuming that the author name of the entity to be put in storage is "陈AaAb", the name extension set of the entity to be put in storage can be obtained according to the conventions for writing names in Chinese and English as {"Chen AA"; "Chen AaAb"; "Chen A A"; "Chen A.A"; "AaAb Chen"; "A A Chen"}.
Therefore, in the optional embodiment, the name extension set of the entity to be warehoused can be obtained according to the author name of the entity to be warehoused, and then the entity matched with the author name of the entity to be warehoused can be searched in the entity library of the scientific research worker according to the author name and the name extension set of the entity to be warehoused, and the text information of the candidate entity can be obtained.
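A name extension set of this kind could be generated roughly as follows. The sketch assumes the name has already been romanized and split into a surname and given-name syllables; the function name `expand_name` and the exact list of variant patterns are assumptions made for illustration.

```python
def expand_name(surname: str, given_syllables: list[str]) -> set[str]:
    """Return romanized spelling variants of one author name."""
    given = "".join(given_syllables)                    # e.g. "AaAb"
    initials = [s[0].upper() for s in given_syllables]  # e.g. ["A", "A"]
    return {
        f"{surname} {given}",                # surname + full given name
        f"{surname} {''.join(initials)}",    # surname + joined initials
        f"{surname} {' '.join(initials)}",   # surname + spaced initials
        f"{surname} {'.'.join(initials)}",   # surname + dotted initials
        f"{given} {surname}",                # given name first
        f"{' '.join(initials)} {surname}",   # initials first
    }

variants = expand_name("Chen", ["Aa", "Ab"])
```

Retrieval can then query the entity library with the original name plus every element of `variants`.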
As an alternative implementation, step 103: comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining the comparison result, comprising the substeps of:
and when the text information of the candidate entity and the text information of the entity to be warehoused have intersection, determining that the candidate entity and the entity to be warehoused are the same author as the comparison result.
Exemplarily, consider the case where the scientific research worker entity library contains entities that have the same author name as the entity to be warehoused but do not all belong to the same person. Assume that the library has two entities, denoted "A" and "B", both of which have the same author name as the entity to be warehoused. By comparing the attributes of entity A and entity B with the author name of the entity to be warehoused, entity A and entity B are taken as candidate entities. Further, the attributes of entity A and entity B are compared with the text information of the entity to be put in storage; if the paper title of entity A is the same as that of the entity to be put in storage, it is determined that an intersection exists between the entity to be put in storage and entity A.
In this optional embodiment, when there is an intersection between the text information of the candidate entity and the text information of the entity to be put in storage, it is determined that the comparison result is that the candidate entity and the entity to be put in storage are the same author.
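The intersection test can be sketched with set operations. The representation of text information as a mapping from field name to a set of values, the field name `titles` and the function name `has_intersection` are assumptions; the text only gives a shared paper title as an example of an intersection.

```python
def has_intersection(candidate: dict, incoming: dict,
                     fields: tuple = ("titles",)) -> bool:
    """True if any compared field shares at least one value."""
    return any(candidate.get(f, set()) & incoming.get(f, set())
               for f in fields)

entity_a = {"titles": {"Paper Y", "Paper Z"}}
entity_b = {"titles": {"Paper X"}}
incoming = {"titles": {"Paper Y"}}

a_matches = has_intersection(entity_a, incoming)
b_matches = has_intersection(entity_b, incoming)
```

Here only entity A shares a title with the incoming paper, so only entity A would be judged to be the same author without computing a link score.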
As an optional implementation manner, the step 103 compares the text information of the candidate entity with the text information of the entity to be put into storage and determines the comparison result, and further includes the sub-steps of:
when the text information of the candidate entity and the text information of the entity to be put in storage do not have intersection, calculating the link score of the candidate entity and the entity to be put in storage;
and determining a comparison result according to the link scores of the candidate entity and the entity to be put in storage.
In this optional embodiment, when there is no intersection between the text information of the candidate entity and the text information of the entity to be put in storage, the comparison result can be determined by calculating the link score between the candidate entity and the entity to be put in storage and further according to the link score between the candidate entity and the entity to be put in storage.
As an alternative implementation, the link score between the candidate entity and the entity to be put in storage is calculated according to the following formula:
$\mathrm{Score} = \omega_0 U + \omega_1 L + \omega_2 T + D$;
wherein Score represents the link score, U represents the unit score between the entity to be put in storage and the candidate entity, L represents the research field score between the entity to be put in storage and the candidate entity, T represents the co-authorship relationship score between the entity to be put in storage and the candidate entity, $\omega_i$ ($i = 0, 1, 2$) represent the weighting coefficients of the unit score, the research field score and the co-authorship relationship score respectively, and D represents a correction coefficient.
Exemplarily, assume that the research field of the entity to be put in storage is {computer; artificial intelligence; knowledge graph; thesis knowledge graph}, and the research field of the candidate entity is {computer; artificial intelligence; image processing}. Then the research field score between the candidate entity and the entity to be put in storage is 2 points, because the first two levels of the research field of the candidate entity are the same as those of the entity to be put in storage.
In the embodiment of the application, the research field of an entity may be represented in several ways: directly by keywords; by keywords mapped through a mapping dictionary to more uniform general names; or by Chinese Library Classification numbers.
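One way to read the hierarchical example is to count how many leading levels the two research field lists share. The sketch below assumes that each list is ordered from the most general level to the most specific one; the function name `shared_levels` is hypothetical.

```python
def shared_levels(incoming: list[str], candidate: list[str]) -> int:
    """Count identical leading levels of two research field hierarchies."""
    count = 0
    for a, b in zip(incoming, candidate):
        if a != b:
            break  # stop at the first level that differs
        count += 1
    return count

incoming_field = ["computer", "artificial intelligence",
                  "knowledge graph", "thesis knowledge graph"]
candidate_field = ["computer", "artificial intelligence", "image processing"]
field_score = shared_levels(incoming_field, candidate_field)
```

For the two lists in the example the first two levels agree and the third differs, so `field_score` is 2.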
In this embodiment of the application, optionally, the research field of the entity to be warehoused may be obtained by extracting information from the profile content.
In this embodiment of the application, optionally, when the research field of the entity to be warehoused includes a plurality of keywords, the keywords whose comparison scores exceed a pre-averaging threshold are selected and their scores are averaged, thereby determining the research field score between the entity to be warehoused and the candidate entity.
Exemplarily, assume that the research field of the entity to be put in storage is {computer; artificial intelligence; knowledge graph; thesis knowledge graph}, and the research field of the candidate entity is {computer; intelligent processing; image processing}. The keyword "computer" in the entity to be put in storage is the same as the keyword "computer" in the candidate entity, so its comparison score is 1. The keyword "artificial intelligence" in the entity to be put in storage is not completely the same as the keyword "intelligent processing" in the candidate entity, so its comparison score is 0.5. The keywords "knowledge graph" and "thesis knowledge graph" in the entity to be put in storage differ from all keywords in the candidate entity, so their comparison scores are 0. The keywords "computer" and "artificial intelligence", whose scores reach the pre-averaging threshold of 0.5, are therefore selected, and their scores are averaged to obtain a research field score of approximately 0.7 (the mean of 1 and 0.5 is 0.75).
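The select-then-average step can be sketched as below, taking the per-keyword comparison scores as given. Two points are assumptions: the function name `research_field_score`, and the use of "greater than or equal to" for the threshold test, which is chosen because the example selects a keyword whose score equals the threshold of 0.5.

```python
def research_field_score(keyword_scores: dict[str, float],
                         pre_average_threshold: float = 0.5) -> float:
    """Average the comparison scores that reach the threshold."""
    selected = [s for s in keyword_scores.values()
                if s >= pre_average_threshold]
    # With no keyword selected there is nothing to average.
    return sum(selected) / len(selected) if selected else 0.0

scores = {
    "computer": 1.0,                 # identical keyword
    "artificial intelligence": 0.5,  # partially matching keyword
    "knowledge graph": 0.0,          # no match
    "thesis knowledge graph": 0.0,   # no match
}
field_score = research_field_score(scores)
```

With these inputs two keywords are selected and `field_score` evaluates to the mean of 1 and 0.5, that is 0.75, before any rounding.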
In the embodiment of the application, the unit score between the entity to be warehoused and the candidate entity is determined by comparing the study experience or work experience of the entity to be warehoused and of the candidate entity within a specified time period. For example, the paper of the entity to be warehoused was published in 2011 and its research unit is AA University, while the work experience of the candidate entity is {(2007-2010, BB University); (2010-2011, AA University)}. The unit of the entity to be warehoused partially matches that of the candidate entity, so the unit score is 0.9.
In the embodiment of the application, the result of comparing the study experience or work experience of the entity to be warehoused and the candidate entity within the specified time period may be a partial match, a complete match, or a complete mismatch, where a complete match indicates that the study experience or work experience of the candidate entity and of the entity to be warehoused is completely the same, and a complete mismatch indicates that it is completely different. Further, the unit scores corresponding to a partial match, a complete match, and a complete mismatch may be 0.9, 1, and 0 points, respectively.
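The unit score lookup can be sketched as follows. The rule used here to classify a match is an assumption made for illustration: a complete match means the paper's unit is the only unit the candidate belonged to during the period, and a partial match means it is one of several. The names `UNIT_SCORES` and `unit_score` and the chosen period are likewise hypothetical.

```python
UNIT_SCORES = {"complete": 1.0, "partial": 0.9, "mismatch": 0.0}

def unit_score(paper_unit: str, experiences: list,
               start: int, end: int) -> float:
    # Units the candidate belonged to at any time within [start, end].
    units = {unit for (first, last, unit) in experiences
             if first <= end and last >= start}
    if units == {paper_unit}:
        level = "complete"
    elif paper_unit in units:
        level = "partial"
    else:
        level = "mismatch"
    return UNIT_SCORES[level]

experience = [(2007, 2010, "BB University"), (2010, 2011, "AA University")]
score = unit_score("AA University", experience, 2007, 2011)
```

Over the period 2007 to 2011 the candidate belonged to both universities, so the paper's unit matches only partially and `score` is 0.9.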
In the embodiment of the present application, the co-authorship relationship represents the intersection of the co-authors of the candidate entity and of the entity to be warehoused, wherein the co-authorship relationship score is calculated in a stepwise manner according to the number of common co-authors. For example, if the candidate entity and the entity to be warehoused have more than three common co-authors, the co-authorship relationship score is 1; further, when two of the three co-authors appear in the same article, the co-authorship relationship score is also 1. For another example, if the candidate entity and the entity to be warehoused have only two common co-authors, the co-authorship relationship score is 0.8; if they have only one common co-author, the score is 0.6; and if they have no common co-author, the score is 0.
In the embodiment of the present application, whether an author of the candidate entity and an author of the entity to be warehoused are the same person may be determined by comparing the research units of the authors; that is, if the research unit of the author of the candidate entity is the same as that of the author of the entity to be warehoused, the two are regarded as the same person, so that the co-authorship relationship score can be determined by counting the common co-authors of the candidate entity and the entity to be warehoused.
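The stepwise scoring can be illustrated with a short sketch. It assumes that a co-author is identified by a hypothetical `(name, research_unit)` pair, following the same-unit rule above, and that three or more common co-authors already give the top score; the additional same-article rule is left out because the text does not specify it precisely.

```python
from typing import Iterable, Tuple

# Hypothetical identity of a co-author: (name, research unit).
CoAuthor = Tuple[str, str]


def coauthor_score(paper_coauthors: Iterable[CoAuthor],
                   candidate_coauthors: Iterable[CoAuthor]) -> float:
    """Step score based on the number of common co-authors."""
    common = set(paper_coauthors) & set(candidate_coauthors)
    count = len(common)
    if count >= 3:  # assumption: three or more common co-authors score 1
        return 1.0
    return {0: 0.0, 1: 0.6, 2: 0.8}[count]


paper = [("Li X", "AA University"), ("Wang Y", "AA University")]
candidate = [("Li X", "AA University"), ("Wang Y", "BB University")]
print(coauthor_score(paper, candidate))  # one common co-author
```

Because "Wang Y" is listed with different research units, only "Li X" counts as a common co-author in this usage example.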
As can be seen, in this alternative embodiment, the link score of the entity to be put in storage can be obtained according to the research field score, the unit score and the co-authorship relationship score.
Exemplarily, it is assumed that the research field score between the entity to be put in storage and the candidate entity is 2, with a weight of 0.1; the unit score between the entity to be put in storage and the candidate entity is 3, with a weight of 0.2; and the co-authorship relationship score between the entity to be put in storage and the candidate entity is 4, with a weight of 0.7. The link score between the candidate entity and the entity to be put in storage is then 2 × 0.1 + 3 × 0.2 + 4 × 0.7 = 3.6.
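The weighted sum in this example can be reproduced with a few lines of code. The sketch takes the link score as ω0·U + ω1·L + ω2·T + D and assumes that the correction coefficient D is 0, which is what the arithmetic of the example implies; the function and parameter names are illustrative only.

```python
def link_score(unit: float, field: float, coauthor: float,
               weights: tuple, correction: float = 0.0) -> float:
    """Weighted sum of the unit, research field and co-authorship scores.

    weights = (w_unit, w_field, w_coauthor); correction is the coefficient D.
    """
    w_unit, w_field, w_coauthor = weights
    return w_unit * unit + w_field * field + w_coauthor * coauthor + correction


# Values of the example: unit 3 (weight 0.2), field 2 (weight 0.1),
# co-authorship 4 (weight 0.7), no correction.
score = link_score(unit=3, field=2, coauthor=4, weights=(0.2, 0.1, 0.7))
print(round(score, 2))
```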
In the embodiment of the present application, the correction coefficient is used to correct the result when a special case is encountered, so that the result agrees with what a person would judge to be true. For example, when one of the unit score, the research field score and the co-authorship relationship score is zero, the weight of that score has no effect on the result; this can be compensated for by the correction coefficient in the embodiment of the present application.
As an alternative embodiment, the step of determining a comparison result according to the link scores of the candidate entity and the entity to be put in storage comprises the following substeps:
comparing the link score with a first preset threshold;
and when the link score is greater than or equal to the first preset threshold, determining that the comparison result is that the candidate entity and the entity to be put in storage are the same author.
In an alternative embodiment, the first preset threshold may be preset, for example, the first preset threshold may be set to 5.
As an alternative embodiment, the step of determining a comparison result according to the link scores of the candidate entity and the entity to be put in storage further comprises the substeps of:
when the link score is smaller than the first preset threshold, judging whether the entity to be put in storage is a degree thesis, and if so, recalculating the link score according to the unit score, the research field score, the co-authorship relationship score, the tutor score and the degree score of the entity to be put in storage and the candidate entity;
and comparing the recalculated link score with a second preset threshold, and if the recalculated link score is greater than or equal to the second preset threshold, determining that the candidate entity and the entity to be put in storage are the same author.
In this alternative embodiment, the second preset threshold may be preset, for example, the second preset threshold may be set to 6.
In this optional embodiment, the tutor score is determined according to a comparison between the tutors of the entity to be warehoused and the tutors of the candidate entity; for example, if the entity to be warehoused and the candidate entity have one tutor in common, the tutor score is 1, and if they have two tutors in common, the tutor score is 2.
In this optional embodiment, the degree score is determined according to a comparison between the degrees of the entity to be put in storage and the degrees of the candidate entity; for example, if the entity to be put in storage and the candidate entity share a bachelor's degree, the degree score is 1, and if they share both a bachelor's degree and a master's degree, the degree score is 2.
It can be seen that, in this optional embodiment, when the link score is smaller than the first preset threshold, it is judged whether the entity to be put in storage is a degree thesis; if so, the link score is recalculated according to the unit score, the research field score, the co-authorship relationship score, the tutor score and the degree score of the entity to be put in storage and the candidate entity, and whether the candidate entity and the entity to be put in storage are the same author is then determined by comparing the recalculated link score with the second preset threshold.
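The two-stage decision can be sketched as below. The application does not give the formula for the recalculated score, so the sketch receives it as a hypothetical callable `recalculate`; treating a non-thesis paper that misses the first threshold as "not the same author" is likewise an assumption. The default thresholds 5 and 6 are the example values mentioned above.

```python
from typing import Callable


def is_same_author(score: float,
                   is_degree_thesis: bool,
                   recalculate: Callable[[], float],
                   first_threshold: float = 5.0,
                   second_threshold: float = 6.0) -> bool:
    """Two-stage threshold decision on the link score."""
    if score >= first_threshold:
        return True
    if not is_degree_thesis:
        # Assumption: no second stage exists for other paper types.
        return False
    # Second stage: score recalculated with the tutor and degree scores.
    return recalculate() >= second_threshold


print(is_same_author(3.6, True, lambda: 6.5))
```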
In this embodiment of the application, optionally, after the entity to be warehoused is connected with the candidate entity a, if the candidate entity a has a binding relationship with the entity C in the scientist entity library before that, the link between the entity to be warehoused and the entity C may be implemented according to steps 101, 102, and 104 of this embodiment of the application.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of an entity linking device according to an embodiment of the present disclosure. As shown in fig. 2, the apparatus includes an extracting module 201, a retrieving module 202, a comparing module 203, and a linking module 204, wherein:
the extraction module 201 is configured to extract text information of an entity to be warehoused, where the text information includes an author name of the entity to be warehoused, and the entity to be warehoused represents a paper to be warehoused;
the retrieval module 202 is configured to retrieve, in the scientific research worker entity library, an entity matched with the author name of the entity to be put in storage at least according to that author name, and to obtain text information of a candidate entity;
the comparison module 203 is used for comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result;
and the linking module 204 is configured to link the entity to be warehoused with the candidate entity when the comparison result indicates that the entity to be warehoused and the candidate entity are the same author.
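The cooperation of the four modules can be pictured as a small pipeline. In the sketch below the modules are passed in as plain callables; the class name `EntityLinker`, the dictionary keys `"id"` and `"title"` and the in-memory `links` table are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EntityLinker:
    extract: Callable[[str], dict]         # extraction module
    retrieve: Callable[[dict], List[dict]]  # retrieval module
    compare: Callable[[dict, dict], bool]   # comparison module
    links: Dict[str, List[str]] = field(default_factory=dict)

    def run(self, raw_paper: str) -> List[str]:
        """Extract, retrieve candidates, compare, then link matching candidates."""
        info = self.extract(raw_paper)
        linked = []
        for candidate in self.retrieve(info):
            if self.compare(candidate, info):
                # Linking module: bind the paper to the candidate entity.
                self.links.setdefault(candidate["id"], []).append(info["title"])
                linked.append(candidate["id"])
        return linked


linker = EntityLinker(
    extract=lambda raw: {"title": raw, "author": "Chen AaAb"},
    retrieve=lambda info: [{"id": "A"}, {"id": "B"}],
    compare=lambda candidate, info: candidate["id"] == "A",
)
print(linker.run("Paper X"), linker.links)
```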
In the embodiment of the application, the entities to be put in storage refer to paper documents, wherein in the scientific research worker entity library each paper document is a node and each node is an entity. For example, the knowledge graph of paper documents includes 3 nodes, which are respectively a paper by author A, a paper by author B and a paper by author C; further, each entity has its own attributes, for example, the paper by author B has attributes such as the author name and the paper type.
In the embodiment of the present application, the text information of the entity to be warehoused includes a paper title, a paper abstract, a keyword list, an author list, a cited document list and a paper type, where the paper type is one of degree thesis, journal paper and conference paper. Further, when the paper type is degree thesis, the text information of the entity to be put in storage further includes a thesis level (one of bachelor, master and doctor), education experience, the author's major, the names of the tutors and the Chinese Library Classification number.
Exemplarily, it is assumed that the scientific research worker entity library has two entities, denoted "A" and "B". After the text information of the entity to be warehoused is extracted, it is compared with the attribute information of entity A and of entity B respectively; if the author of entity A is the same as the author of the entity to be warehoused, the entity to be warehoused is stored in the scientific research worker entity library and a binding relationship between the entity to be warehoused and entity A is established, so that when a user retrieves entity A, the entity to be warehoused bound to entity A can be provided to the user.
Therefore, in the embodiment of the application, entities matching the author name of the entity to be warehoused are retrieved from the scientific research worker entity library by means of that author name, and the text information of the candidate entity is obtained; the text information of the candidate entity can then be compared with that of the entity to be warehoused and a comparison result determined, so that the entity to be warehoused and the candidate entity can be linked according to the comparison result. The entity linking of the embodiment of the application therefore has a better recall rate and a better accuracy rate. In the prior art, by contrast, entity linking combines methods such as rules, machine learning and deep learning; such methods need a large number of training samples, and the amount of labeled data required grows greatly as the number of attributes increases. The need for a large number of labeled samples in turn limits machine learning and deep learning, so that the prior-art methods have a high recall rate but a low accuracy rate.
As an optional implementation manner, the entity linking apparatus further includes an extension module, where the extension module is configured to obtain a name extension set of the entity to be put in storage according to an author name of the entity to be put in storage.
Further, the specific way for the retrieval module 202 to retrieve the entity matched with the author name of the entity to be warehoused from the scientific research worker entity library according to the author name of the entity to be warehoused and obtain the text information of the candidate entity is as follows:
and searching entities matched with the author names of the entities to be warehoused in the scientific research worker entity library according to the author names and the name extension sets of the entities to be warehoused, and obtaining text information of the candidate entities.
Exemplarily, assuming that the author name of the entity to be put in storage is "Chen AaAb" written in Chinese, the name extension set of the entity to be put in storage can be obtained, according to the conventions for writing names in Chinese and in English, as {"Chen AA"; "Chen AaAb"; "Chen A A"; "Chen A.A"; "AaAb Chen"; "A ache"}.
Therefore, in the optional embodiment, the name extension set of the entity to be warehoused can be obtained according to the author name of the entity to be warehoused, and then the entity matched with the author name of the entity to be warehoused can be searched in the entity library of the scientific research worker according to the author name and the name extension set of the entity to be warehoused, and the text information of the candidate entity can be obtained.
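A name extension set of this kind could be generated along the following lines. The application does not state the expansion rules, so the function `expand_name`, its inputs (a romanized surname and the romanized syllables of the given name) and the five variants it produces are assumptions made for illustration; they cover only the clearly legible variants of the example.

```python
from typing import List


def expand_name(surname: str, given_parts: List[str]) -> List[str]:
    """Build common romanized spellings of a Chinese author name."""
    full_given = "".join(given_parts)
    initials = [part[0].upper() for part in given_parts]
    variants = [
        f"{surname} {''.join(initials)}",   # surname + joined initials
        f"{surname} {full_given}",          # surname + full given name
        f"{surname} {' '.join(initials)}",  # surname + spaced initials
        f"{surname} {'.'.join(initials)}",  # surname + dotted initials
        f"{full_given} {surname}",          # Western order
    ]
    # Remove duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(variants))


print(expand_name("Chen", ["Aa", "Ab"]))
```

The retrieval step can then query the entity library with the original author name together with every variant in the returned list.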
As an alternative embodiment, the specific way for the comparison module 203 to compare the text information of the candidate entity with the text information of the entity to be put in storage and determine the comparison result is as follows:
when the text information of the candidate entity and the text information of the entity to be put in storage have intersection, determining that the comparison result is that the candidate entity and the entity to be put in storage are the same author;
when the text information of the candidate entity and the text information of the entity to be put in storage do not have intersection, calculating the link score of the candidate entity and the entity to be put in storage;
and determining a comparison result according to the link scores of the candidate entity and the entity to be put in storage.
Exemplarily, consider the case where the scientific research worker entity library contains entities that have the same author name as the entity to be warehoused but do not belong to the same person. Assume that the library has two entities, denoted "A" and "B", both of which have the same author name as the entity to be warehoused; by comparing the attributes of entity A and entity B with the author name of the entity to be warehoused, entity A and entity B are taken as candidate entities. Further, the attributes of entity A and entity B are compared with the text information of the entity to be put in storage, and if the paper title of entity A is the same as the paper title of the entity to be put in storage, it is judged that the entity to be put in storage and entity A have an intersection.
In this optional embodiment, when there is an intersection between the text information of the candidate entity and the text information of the entity to be put in storage, it is determined that the comparison result is that the candidate entity and the entity to be put in storage are the same author. On the other hand, when the text information of the candidate entity and the text information of the entity to be put in storage do not have an intersection, the link score of the candidate entity and the entity to be put in storage is calculated, and the comparison result is determined according to that link score.
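The branch between the intersection test and the score-based test can be sketched as follows. The sketch assumes, as in the example above, that an intersection means the paper title already appears among the titles attached to the candidate; the keys `"title"` and `"paper_titles"` and the injected `score_fn` are hypothetical, and only the first threshold is shown.

```python
from typing import Callable


def compare_entities(paper: dict,
                     candidate: dict,
                     score_fn: Callable[[dict, dict], float],
                     first_threshold: float = 5.0) -> bool:
    """Return True when the paper and the candidate are judged to be the same author."""
    # Shortcut: an intersection of the text information settles the question.
    if paper["title"] in candidate["paper_titles"]:
        return True
    # Otherwise fall back to the link score.
    return score_fn(paper, candidate) >= first_threshold


paper = {"title": "Paper X"}
entity_a = {"paper_titles": {"Paper X", "Paper Y"}}
entity_b = {"paper_titles": {"Paper Z"}}
print(compare_entities(paper, entity_a, lambda p, c: 0.0))
print(compare_entities(paper, entity_b, lambda p, c: 3.6))
```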
As an alternative embodiment, the comparing module 203 calculates the link score between the candidate entity and the entity to be put in storage according to the following formula:
$$\mathrm{Score} = \omega_0 U + \omega_1 L + \omega_2 T + D;$$
wherein Score represents the link score, U represents the unit score between the entity to be put in storage and the candidate entity, L represents the research field score between the entity to be put in storage and the candidate entity, T represents the co-authorship relationship score between the entity to be put in storage and the candidate entity, $\omega_i$ ($i = 0, 1, 2$) respectively represent the weighting coefficient of the unit score, the weighting coefficient of the research field score and the weighting coefficient of the co-authorship relationship score, and D represents a correction coefficient.
Exemplarily, it is assumed that the research field of the entity to be put in storage is {computer; artificial intelligence; knowledge graph; paper knowledge graph} and the research field of the candidate entity is {computer; artificial intelligence; image processing}; the research field score between the candidate entity and the entity to be put in storage is then 2, i.e. the first two levels of the research field of the candidate entity are the same as those of the entity to be put in storage.
In the embodiment of the application, the research field of an entity can be represented in several forms: it can be represented directly by keywords; a mapping dictionary can be built that maps the keywords to more uniform general names; or it can be represented by Chinese Library Classification numbers.
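One way to read this example is that the research field is an ordered list running from the most general level to the most specific one, and that the score counts how many leading levels coincide. The sketch below implements that reading; the interpretation and the function name are assumptions.

```python
from typing import Sequence


def field_level_score(paper_fields: Sequence[str],
                      candidate_fields: Sequence[str]) -> int:
    """Count the leading research field levels shared by both entities."""
    score = 0
    for paper_level, candidate_level in zip(paper_fields, candidate_fields):
        if paper_level != candidate_level:
            break  # stop at the first level that differs
        score += 1
    return score


paper = ["computer", "artificial intelligence", "knowledge graph", "paper knowledge graph"]
candidate = ["computer", "artificial intelligence", "image processing"]
print(field_level_score(paper, candidate))
```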
In this embodiment of the application, optionally, the research field of the entity to be warehoused may be obtained by extracting information from the profile content.
In this embodiment, optionally, when the research field of the entity to be put in storage includes a plurality of keywords, the keywords whose comparison scores are greater than a pre-average threshold are selected from the plurality of keywords and their scores are averaged, thereby determining the research field score between the entity to be put in storage and the candidate entity.
Exemplarily, it is assumed that the research field of the entity to be put in storage is {computer; artificial intelligence; knowledge graph; paper knowledge graph} and the research field of the candidate entity is {computer; intelligent processing; image processing}. The keyword "computer" of the entity to be warehoused is the same as the keyword "computer" of the candidate entity, so its comparison score is 1; the keyword "artificial intelligence" of the entity to be warehoused is not completely the same as the keyword "intelligent processing" of the candidate entity, so its comparison score is 0.5; and the keywords "knowledge graph" and "paper knowledge graph" of the entity to be warehoused differ from all the keywords of the candidate entity, so their comparison scores are 0. The keywords "computer" and "artificial intelligence", whose scores reach the pre-average threshold of 0.5, are therefore selected, and their mean is calculated to obtain a research field score of about 0.7 (the exact mean of 1 and 0.5 is 0.75).
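The selection and averaging step can be sketched as follows. The per-keyword comparison scores are taken as given, since the text does not say how partial similarity is measured, and a keyword whose score equals the threshold is kept, because the example keeps the keyword scoring 0.5; both points are assumptions of this sketch. For the scores of the example the exact mean is 0.75, which the example reports as 0.7.

```python
from typing import Dict


def research_field_score(keyword_scores: Dict[str, float],
                         pre_average_threshold: float = 0.5) -> float:
    """Average the comparison scores of the keywords that reach the threshold."""
    kept = [s for s in keyword_scores.values() if s >= pre_average_threshold]
    if not kept:
        return 0.0  # assumption: no keyword qualifies, so the score is zero
    return sum(kept) / len(kept)


scores = {
    "computer": 1.0,
    "artificial intelligence": 0.5,
    "knowledge graph": 0.0,
    "paper knowledge graph": 0.0,
}
print(research_field_score(scores))
```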
In the embodiment of the application, the unit score between the entity to be warehoused and the candidate entity is determined by comparing the learning experience or the working experience of the entity to be warehoused and of the candidate entity within a specified time period. For example, a paper of an entity to be warehoused is published in 2011 and its research unit is AA University, while the working experience of a candidate entity is {(2007-2010, BB University); (2010-2011, AA University)}; the unit of the entity to be warehoused then partially matches the units of the candidate entity, and the unit score is 0.9.
In the embodiment of the application, the comparison result of the learning experience or the working experience between the entity to be warehoused and the candidate entity within the specified time period may be partial matching, complete matching or complete mismatching, where complete matching indicates that the learning experience or the working experience of the candidate entity and of the entity to be warehoused is completely the same, and complete mismatching indicates that it is completely different. Further, the unit scores corresponding to partial matching, complete matching and complete mismatching may be 0.9, 1 and 0, respectively.
In the embodiment of the present application, the co-authorship relationship represents the intersection of the co-authors of the candidate entity and of the entity to be warehoused, wherein the co-authorship relationship score is calculated in a stepwise manner according to the number of common co-authors. For example, if the candidate entity and the entity to be warehoused have more than three common co-authors, the co-authorship relationship score is 1; further, when two of the three co-authors appear in the same article, the co-authorship relationship score is also 1. For another example, if the candidate entity and the entity to be warehoused have only two common co-authors, the co-authorship relationship score is 0.8; if they have only one common co-author, the score is 0.6; and if they have no common co-author, the score is 0.
In the embodiment of the present application, whether an author of the candidate entity and an author of the entity to be warehoused are the same person may be determined by comparing the research units of the authors; that is, if the research unit of the author of the candidate entity is the same as that of the author of the entity to be warehoused, the two are regarded as the same person, so that the co-authorship relationship score can be determined by counting the common co-authors of the candidate entity and the entity to be warehoused.
It can be seen that, in this optional embodiment, the link score of the entity to be put in storage can be obtained according to the research field score, the unit score and the co-authorship relationship score.
Exemplarily, it is assumed that the research field score between the entity to be put in storage and the candidate entity is 2, with a weight of 0.1; the unit score between the entity to be put in storage and the candidate entity is 3, with a weight of 0.2; and the co-authorship relationship score between the entity to be put in storage and the candidate entity is 4, with a weight of 0.7. The link score between the candidate entity and the entity to be put in storage is then 2 × 0.1 + 3 × 0.2 + 4 × 0.7 = 3.6.
In the embodiment of the present application, the correction coefficient is used to correct the result when a special case is encountered, so that the result agrees with what a person would judge to be true. For example, when one of the unit score, the research field score and the co-authorship relationship score is zero, the weight of that score has no effect on the result; this can be compensated for by the correction coefficient in the embodiment of the present application.
As an optional implementation manner, the specific manner of the comparison module 203 determining the comparison result according to the link scores of the candidate entity and the entity to be put in storage is as follows:
comparing the link score with a first preset threshold;
when the link score is greater than or equal to the first preset threshold, determining that the comparison result is that the candidate entity and the entity to be put in storage are the same author;
when the link score is smaller than the first preset threshold, judging whether the entity to be put in storage is a degree thesis, and if so, recalculating the link score according to the unit score, the research field score, the co-authorship relationship score, the tutor score and the degree score of the entity to be put in storage and the candidate entity;
and comparing the recalculated link score with a second preset threshold, and if the recalculated link score is greater than or equal to the second preset threshold, determining that the candidate entity and the entity to be put in storage are the same author.
In an alternative embodiment, the first preset threshold may be preset, for example, the first preset threshold may be set to 5.
In this alternative embodiment, the second preset threshold may be preset, for example, the second preset threshold may be set to 6.
In this optional embodiment, the tutor score is determined according to a comparison between the tutors of the entity to be warehoused and the tutors of the candidate entity; for example, if the entity to be warehoused and the candidate entity have one tutor in common, the tutor score is 1, and if they have two tutors in common, the tutor score is 2.
In this optional embodiment, the degree score is determined according to a comparison between the degrees of the entity to be put in storage and the degrees of the candidate entity; for example, if the entity to be put in storage and the candidate entity share a bachelor's degree, the degree score is 1, and if they share both a bachelor's degree and a master's degree, the degree score is 2.
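Both of these scores grow by one for each item the two entities have in common, so they can be sketched as set intersections. Representing a degree as a hypothetical `(level, institution)` pair is an assumption of the sketch; the text does not say how two degrees are judged to be the same.

```python
from typing import Iterable, Tuple


def tutor_score(paper_tutors: Iterable[str],
                candidate_tutors: Iterable[str]) -> int:
    """One point for every tutor the two entities have in common."""
    return len(set(paper_tutors) & set(candidate_tutors))


def degree_score(paper_degrees: Iterable[Tuple[str, str]],
                 candidate_degrees: Iterable[Tuple[str, str]]) -> int:
    """One point for every (level, institution) degree the two entities share."""
    return len(set(paper_degrees) & set(candidate_degrees))


print(tutor_score(["Prof. Li"], ["Prof. Li", "Prof. Wang"]))
print(degree_score(
    [("bachelor", "AA University"), ("master", "AA University")],
    [("bachelor", "AA University"), ("master", "AA University")],
))
```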
It can be seen that, in this optional embodiment, when the link score is smaller than the first preset threshold, it is judged whether the entity to be put in storage is a degree thesis; if so, the link score is recalculated according to the unit score, the research field score, the co-authorship relationship score, the tutor score and the degree score of the entity to be put in storage and the candidate entity, and whether the candidate entity and the entity to be put in storage are the same author is then determined by comparing the recalculated link score with the second preset threshold.
In this embodiment of the application, optionally, after the entity to be warehoused is linked with the candidate entity A, if the candidate entity A previously had a binding relationship with an entity C in the scientific research worker entity library, the link between the entity to be warehoused and the entity C may be implemented by the extraction module 201, the retrieval module 202, the comparison module 203 and the linking module 204 of this embodiment of the application.
Example three
Referring to fig. 3, fig. 3 is a schematic structural diagram of an entity linking device according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
a processor 302; and
a memory 301 configured to store machine-readable instructions which, when executed by the processor 302, cause the entity linking method disclosed in the embodiment of the present application to be performed.
By executing the entity linking method, the device of the embodiment of the application retrieves, in the scientific research worker entity library, entities matching the author name of the entity to be put in storage by means of that author name and obtains the text information of the candidate entity; it can then compare the text information of the candidate entity with the text information of the entity to be put in storage and determine a comparison result, so that the entity to be put in storage and the candidate entity can be linked according to the comparison result.
Example four
The embodiment of the application discloses a storage medium, and the storage medium stores a computer program, wherein when the computer program is executed by a processor, the entity linking method disclosed in the embodiment of the application is executed.
The storage medium of the embodiment of the application searches entities matched with the author names of the entities to be put in storage in the scientific research worker entity library through the author names of the entities to be put in storage by executing the entity linking method, obtains text information of the candidate entities, and further can compare the text information of the candidate entities with the text information of the entities to be put in storage and determine a comparison result, so that the entities to be put in storage and the candidate entities can be linked according to the comparison result.
In the embodiments disclosed in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a positioning base station, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

Claims (8)

1. An entity linking method, the method comprising:
extracting text information of an entity to be put in storage, wherein the text information comprises an author name of the entity to be put in storage, and the entity to be put in storage represents a paper to be put in storage;
searching an entity matched with the name of the author of the entity to be warehoused in a scientific research worker entity library at least according to the name of the author of the entity to be warehoused and obtaining text information of a candidate entity;
comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result;
when the comparison result represents that the entity to be put in storage and the candidate entity are the same author, linking the entity to be put in storage and the candidate entity;
and comparing the text information of the candidate entity with the text information of the entity to be put in storage and determining a comparison result, wherein the comparing comprises the following steps:
when the text information of the candidate entity and the text information of the entity to be put in storage do not have an intersection, calculating a link score of the candidate entity and the entity to be put in storage;
determining the comparison result according to the link scores of the candidate entity and the entity to be put in storage;
wherein the link score of the candidate entity and the entity to be warehoused is calculated according to the following formula:
$\mathrm{Score} = \omega_0 U + \omega_1 L + \omega_2 T + D$;
wherein Score represents the link score, U represents the unit score between the entity to be warehoused and the candidate entity, L represents the research field score between the entity to be warehoused and the candidate entity, T represents the literary relationship score between the entity to be warehoused and the candidate entity, $\omega_i$ ($i = 0, 1, 2$) represent, respectively, the weighting coefficient of the unit score, the weighting coefficient of the research field score, and the weighting coefficient of the literary relationship score, and D represents a correction coefficient.
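As an illustration only, the weighted sum in claim 1 can be sketched in Python. The class and function names and the default weight values below are assumptions made for this sketch; the claim does not fix the values of the weighting coefficients or of the correction coefficient D, nor does it say how U, L and T themselves are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkWeights:
    """Weighting coefficients w0, w1, w2 and correction coefficient D.

    The default values are illustrative assumptions, not values from the claims.
    """
    w_unit: float = 0.5        # w0, weight of the unit score U
    w_field: float = 0.3       # w1, weight of the research field score L
    w_relation: float = 0.2    # w2, weight of the literary relationship score T
    correction: float = 0.0    # D, correction coefficient


def link_score(unit: float, field: float, relation: float,
               weights: LinkWeights = LinkWeights()) -> float:
    """Compute Score = w0*U + w1*L + w2*T + D for one candidate entity."""
    return (weights.w_unit * unit
            + weights.w_field * field
            + weights.w_relation * relation
            + weights.correction)
```

With the assumed defaults, a candidate with a full unit match, a half research field match and no literary relationship would be scored by calling `link_score(1.0, 0.5, 0.0)`.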
2. The method of claim 1, wherein after the extracting text information of the entity to be warehoused, and before the searching for an entity matching the author name of the entity to be warehoused in a scientific research worker entity library at least according to the author name of the entity to be warehoused and obtaining text information of a candidate entity, the method further comprises:
obtaining a name expansion set of the entity to be warehoused according to the author name of the entity to be warehoused;
and wherein the searching for an entity matching the author name of the entity to be warehoused in a scientific research worker entity library at least according to the author name of the entity to be warehoused and obtaining text information of a candidate entity comprises:
searching for an entity matching the author name of the entity to be warehoused in the scientific research worker entity library according to the author name and the name expansion set of the entity to be warehoused, and obtaining the text information of the candidate entity.
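A minimal sketch of the retrieval step in claim 2 follows. The claim does not say how the name expansion set is built, so the variants generated here (reversed order, comma-separated order, given-name initial) are purely hypothetical, as are the function names and the name-indexed dictionary standing in for the scientific research worker entity library.

```python
def expand_name(author_name: str) -> set[str]:
    """Return hypothetical spelling variants of a two-part 'Given Family' name."""
    parts = author_name.split()
    variants = {author_name}
    if len(parts) == 2:
        given, family = parts
        variants.add(f"{family} {given}")       # reversed order
        variants.add(f"{family}, {given}")      # bibliographic order
        variants.add(f"{given[0]}. {family}")   # initial of the given name
    return variants


def retrieve_candidates(author_name: str,
                        library: dict[str, list[dict]]) -> list[dict]:
    """Look up the author name and every expanded variant in the library."""
    candidates = []
    # Sorting keeps the lookup order deterministic across runs.
    for name in sorted({author_name} | expand_name(author_name)):
        candidates.extend(library.get(name, []))
    return candidates
```

Searching with the expansion set rather than the bare name lets a record filed under "Zhang Wei" be retrieved for a paper signed "Wei Zhang"; the later comparison step then decides whether the two are the same author.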
3. The method of claim 1, wherein comparing the text information of the candidate entity with the text information of the entity to be warehoused and determining a comparison result comprises:
when the text information of the candidate entity and the text information of the entity to be warehoused have an intersection, determining that the comparison result is that the candidate entity and the entity to be warehoused are the same author.
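The intersection test of claim 3 could be sketched as below, under the assumption that each entity's text information is held as a mapping from field name to a set of values and that the author name itself is left out of the mapping, since candidates were already retrieved by name. The field names and the function name are hypothetical; the claim does not specify which fields are compared.

```python
def has_intersection(candidate_info: dict[str, set[str]],
                     incoming_info: dict[str, set[str]]) -> bool:
    """Return True if any field present in both records shares a value."""
    shared_fields = candidate_info.keys() & incoming_info.keys()
    return any(candidate_info[field] & incoming_info[field]
               for field in shared_fields)
```

Only values under the same field are compared, so an identical string appearing under two different fields does not count as an intersection in this sketch.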
4. The method of claim 1, wherein the determining the comparison result according to the link score of the candidate entity and the entity to be warehoused comprises:
comparing the link score with a first preset threshold;
and when the link score is greater than or equal to the first preset threshold, determining that the comparison result is that the candidate entity and the entity to be warehoused are the same author.
5. The method of claim 4, wherein the determining the comparison result according to the link score of the candidate entity and the entity to be warehoused further comprises:
when the link score is smaller than the first preset threshold, judging whether the entity to be warehoused is an academic paper, and if so, recalculating the link score according to the unit score, the research field score, the literary relationship score, the tutor item score and the academic score of the entity to be warehoused and the candidate entity;
and comparing the recalculated link score with a second preset threshold, and if the recalculated link score is greater than or equal to the second preset threshold, determining that the comparison result is that the candidate entity and the entity to be warehoused are the same author.
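The two-stage decision of claims 4 and 5 can be sketched as follows. The function name and signature are assumptions. The claims do not give the formula for the recalculated score, so it is passed in as a caller-supplied callable, and treating every remaining case as "not the same author" is likewise an assumption of this sketch.

```python
from typing import Callable, Optional


def decide_same_author(score: float,
                       first_threshold: float,
                       is_academic_paper: bool,
                       recalculate: Optional[Callable[[], float]] = None,
                       second_threshold: Optional[float] = None) -> bool:
    """Apply the first threshold, then the second pass for academic papers."""
    # Claim 4: a link score at or above the first threshold is a match.
    if score >= first_threshold:
        return True
    # Claim 5: below the first threshold, only academic papers get a second pass.
    if not is_academic_paper or recalculate is None or second_threshold is None:
        return False
    # The recalculated score would also draw on the tutor item score and the
    # academic score; how they are combined is left to the caller here.
    return recalculate() >= second_threshold
```

For example, `decide_same_author(0.5, 0.7, True, lambda: 0.9, 0.85)` fails the first threshold but passes the second, so the candidate is still treated as the same author.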
6. An entity linking apparatus, the apparatus comprising:
an extraction module, configured to extract text information of an entity to be warehoused, wherein the text information comprises an author name of the entity to be warehoused, and the entity to be warehoused represents a paper to be warehoused;
a retrieval module, configured to retrieve an entity matching the author name of the entity to be warehoused in a scientific research worker entity library, at least according to the author name of the entity to be warehoused, and to obtain text information of a candidate entity;
a comparison module, configured to compare the text information of the candidate entity with the text information of the entity to be warehoused and to determine a comparison result;
a link module, configured to link the entity to be warehoused with the candidate entity when the comparison result represents that the entity to be warehoused and the candidate entity are the same author;
wherein the comparison module compares the text information of the candidate entity with the text information of the entity to be warehoused and determines the comparison result specifically by:
when the text information of the candidate entity and the text information of the entity to be warehoused do not have an intersection, calculating a link score of the candidate entity and the entity to be warehoused;
determining the comparison result according to the link score of the candidate entity and the entity to be warehoused;
wherein the link score of the candidate entity and the entity to be warehoused is calculated according to the following formula:
$\mathrm{Score} = \omega_0 U + \omega_1 L + \omega_2 T + D$;
wherein Score represents the link score, U represents the unit score between the entity to be warehoused and the candidate entity, L represents the research field score between the entity to be warehoused and the candidate entity, T represents the literary relationship score between the entity to be warehoused and the candidate entity, $\omega_i$ ($i = 0, 1, 2$) represent, respectively, the weighting coefficient of the unit score, the weighting coefficient of the research field score, and the weighting coefficient of the literary relationship score, and D represents a correction coefficient.
7. An entity linking device, characterized in that the device comprises:
a processor; and
a memory configured to store machine readable instructions that, when executed by the processor, perform the entity linking method of any of claims 1-5.
8. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, performs the entity linking method according to any one of claims 1 to 5.
CN202010298036.8A 2020-04-16 2020-04-16 Entity linking method, device, equipment and storage medium Active CN111522911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010298036.8A CN111522911B (en) 2020-04-16 2020-04-16 Entity linking method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010298036.8A CN111522911B (en) 2020-04-16 2020-04-16 Entity linking method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111522911A CN111522911A (en) 2020-08-11
CN111522911B true CN111522911B (en) 2023-04-14

Family

ID=71903621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010298036.8A Active CN111522911B (en) 2020-04-16 2020-04-16 Entity linking method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111522911B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012046906A1 (en) * 2010-10-07 2012-04-12 한국과학기술정보연구원 Device and method for providing resource search information on marked correlations between research subjects using a knowledge base from a combination of multiple resources
US8315849B1 (en) * 2010-04-09 2012-11-20 Wal-Mart Stores, Inc. Selecting terms in a document
CN106294677A (en) * 2016-08-04 2017-01-04 浙江大学 A kind of towards the name disambiguation method of China author in english literature

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6993534B2 (en) * 2002-05-08 2006-01-31 International Business Machines Corporation Data store for knowledge-based data mining system
US7788084B2 (en) * 2006-09-19 2010-08-31 Xerox Corporation Labeling of work of art titles in text for natural language processing
CN105045826A (en) * 2015-06-29 2015-11-11 华东师范大学 Entity linkage algorithm based on graph model
CN105740387B (en) * 2016-01-27 2019-04-05 北京工业大学 A kind of scientific and technical literature recommended method based on author's frequent mode
CN106202382B (en) * 2016-07-08 2019-06-14 南京柯基数据科技有限公司 Link instance method and system
CN109857793A (en) * 2018-12-28 2019-06-07 考拉征信服务有限公司 Processing method, device, electronic equipment and the storage medium of technical background data
CN110362692A (en) * 2019-07-23 2019-10-22 中南大学 A kind of academic circle construction method of knowledge based map

Also Published As

Publication number Publication date
CN111522911A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
US7313515B2 (en) Systems and methods for detecting entailment and contradiction
US20030163302A1 (en) Method and system of knowledge based search engine using text mining
CN111475623A (en) Case information semantic retrieval method and device based on knowledge graph
EP1669896A2 (en) A machine learning system for extracting structured records from web pages and other text sources
CN112650840A (en) Intelligent medical question-answering processing method and system based on knowledge graph reasoning
US20080215541A1 (en) Techniques for searching web forums
US9619481B2 (en) Method and apparatus for generating ordered user expert lists for a shared digital document
CN108595525B (en) Lawyer information processing method and system
CN102663129A (en) Medical field deep question and answer method and medical retrieval system
CN111783428B (en) Emergency management objective question automatic generation system based on deep learning
CN108681548B (en) Lawyer information processing method and system
Wei et al. Table extraction for answer retrieval
CN108681977B (en) Lawyer information processing method and system
CN114706972A (en) Unsupervised scientific and technical information abstract automatic generation method based on multi-sentence compression
US20140089246A1 (en) Methods and systems for knowledge discovery
US11409814B2 (en) Systems and methods for crawling web pages and parsing relevant information stored in web pages
JP4873739B2 (en) Text multiple topic extraction apparatus, text multiple topic extraction method, program, and recording medium
Adilaksa et al. Recommendation System for Elective Courses using Content-based Filtering and Weighted Cosine Similarity
Singh et al. Question answering chatbot using deep learning with NLP
KR101429621B1 (en) Duplication news detection system and method for detecting duplication news
CN117216221A (en) Intelligent question-answering system based on knowledge graph and construction method
CN111522911B (en) Entity linking method, device, equipment and storage medium
CN111191045A (en) Entity alignment method and system applied to knowledge graph
Ye et al. Feature extraction of travel destinations from online Chinese-language customer reviews
Indra et al. A Hybrid Information Retrieval for Indonesian Translation of Quran by Using Single Pass Clustering Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant