CN113360665A - Method and system for associating knowledge base document and knowledge graph entity - Google Patents


Info

Publication number
CN113360665A
CN113360665A CN202110601045.4A CN202110601045A CN113360665A CN 113360665 A CN113360665 A CN 113360665A CN 202110601045 A CN202110601045 A CN 202110601045A CN 113360665 A CN113360665 A CN 113360665A
Authority
CN
China
Prior art keywords
entity
candidate
text
list
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110601045.4A
Other languages
Chinese (zh)
Inventor
何吉波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Zhiyan Huijia Technology Co ltd
Original Assignee
Wuxi Zhiyan Huijia Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Zhiyan Huijia Technology Co ltd
Priority to CN202110601045.4A
Publication of CN113360665A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method and a system for associating knowledge base documents with knowledge graph entities. The method comprises: performing entity recognition on a text to obtain an entity list; searching a knowledge graph library for each entity in the entity list to obtain at least one candidate entity; calculating the similarity between first feature information of the text and second feature information of each candidate entity and of at least one associated node of that candidate entity, and weighting the calculated similarities by their corresponding weights to obtain a total similarity for each candidate entity; and associating the entity with the candidate entity whose total similarity is the largest and exceeds a threshold. The method and system can effectively improve the accuracy and recall of entity association.

Description

Method and system for associating knowledge base document and knowledge graph entity
Technical Field
The invention relates to the field of knowledge graphs, in particular to a method and a system for associating knowledge base documents and knowledge graph entities.
Background
With the rise and rapid development of the internet, knowledge engineering and artificial intelligence, the volume of text data has grown explosively. Efficient, intelligent text analysis technology is urgently needed to understand the real meaning of this data and to help people and organizations obtain useful information quickly. Entity association is one such technology: it treats words or phrases appearing in text data as entities and associates them with the corresponding entity IDs in a knowledge graph library. Entity association therefore makes the semantic information of text data much easier to understand.
The main current approach to entity association is to calculate the similarity between the context semantic vector of an entity in the text and the attribute vectors of the candidate entities in the graph, rank the similarity values, and associate the entity with the knowledge graph entity if the similarity exceeds a threshold; otherwise no association is made. One problem with this approach arises when the context describing an entity name in a knowledge base document correlates weakly with the entity's attributes in the graph but strongly with other information, such as relationship nodes reached through first-degree or second-degree relationships. In that case the entity cannot be associated with its entity ID in the graph, which lowers the accuracy and recall of entity association.
For example, consider the following text:
A few days ago, the famous singer Han Hong appeared together with Yao Ming and Zhang Ziyi at the launch event of the Tibet public welfare campaign that the singer initiated. It is reported that early next month Han Hong will join as many as a hundred caring volunteers and medical experts to form a charity convoy of aid volunteers for a 20-day public welfare trip.
Entity recognition on this text identifies three person names: Han Hong, Yao Ming and Zhang Ziyi. These three names are the entities to be linked. The context around Zhang Ziyi is entirely about public welfare, but the attribute description of the entity Zhang Ziyi stored in the knowledge graph is entirely about film and television. When the similarity between the semantic vector and the entity attributes is calculated, the score is very low and Zhang Ziyi cannot be linked. However, Zhang Ziyi has a first-degree relationship node, charity ambassador, so including that node in the calculation allows the Zhang Ziyi in the article to be linked to the Zhang Ziyi in the knowledge graph.
Disclosure of Invention
Aiming at the technical problems, the invention provides a method and a system for associating knowledge base documents and knowledge graph entities, which can improve the accuracy and recall rate of entity association.
The technical scheme for solving the technical problems is as follows:
In a first aspect, the present invention provides a method for associating knowledge base documents with knowledge graph entities, comprising:
performing entity recognition on a text to obtain an entity list;
searching a knowledge graph library for the entities in the entity list to obtain at least one candidate entity;
calculating the similarity between first feature information of the text and second feature information of each candidate entity and of at least one associated node of that candidate entity, and weighting the calculated similarities by their corresponding weights to obtain a total similarity for each candidate entity;
and associating the entity with the candidate entity whose total similarity is the largest and exceeds a threshold.
The beneficial effects of the invention are as follows:
The similarity is calculated using the feature information of the text together with the feature information of the candidate entities found for each entity in the text and of their associated nodes, which effectively improves the accuracy and recall of entity association.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the first feature information is a sum of word vectors of feature words of the text, and the second feature information is a sum of word vectors of node names and attributes.
Further, the positions of each entity in the entity list within the knowledge base document are queried to obtain a position list corresponding to the entity.
Further, the entity is given emphasized formatting at the positions in the knowledge base document recorded in the position list.
In a second aspect, the present invention further provides a system for associating knowledge base documents with knowledge graph entities, comprising:
an entity recognition module, configured to perform entity recognition on a text to obtain an entity list;
a candidate entity search module, configured to search a knowledge graph library for the entities in the entity list to obtain at least one candidate entity;
a similarity calculation module, configured to calculate the similarity between first feature information of the text and second feature information of each candidate entity and of at least one associated node of that candidate entity, and to weight the calculated similarities by their corresponding weights to obtain a total similarity for each candidate entity;
and an entity association module, configured to associate the entity with the candidate entity whose total similarity is the largest and exceeds a threshold.
Further, the first feature information is a sum of word vectors of feature words of the text, and the second feature information is a sum of word vectors of node names and attributes.
Further, the system also comprises:
a position query module, configured to query the positions of each entity in the entity list within the knowledge base document to obtain a position list corresponding to the entity.
Further, the system also comprises:
a format processing module, configured to give the entity emphasized formatting at the positions in the knowledge base document recorded in the position list.
In a third aspect, the present invention also provides an electronic device, including: the device comprises a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory are communicated through the bus when the electronic device runs, and the processor executes the machine-readable instructions to execute the steps of the method.
In a fourth aspect, the present invention also provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program performs the steps of the method.
Drawings
FIG. 1 is a flow chart of a method for associating knowledge-base documents with knowledge-graph entities according to an embodiment of the present invention;
FIG. 2 is a block diagram of a system for associating knowledge base documents with knowledge-graph entities according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram illustrating a computing device according to an embodiment of the invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
FIG. 1 is a flowchart of a method for associating knowledge base documents with knowledge graph entities according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
S1, performing entity recognition on the text to obtain an entity list;
Specifically, the text is a passage from a knowledge base document. A CRF (conditional random field) entity recognition model performs entity recognition on the knowledge base document and recognizes entities such as person names and object names, yielding the entity list of the text.
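A sequence labeller such as a CRF typically emits one tag per token, and the entity list is then read off the tag sequence. The sketch below shows only that read-off step. It assumes a BIO tagging scheme, which the text does not specify; the function name `decode_bio` and the tag names are hypothetical.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, label) pairs from BIO tags.

    Assumed scheme: "B-X" opens an entity of type X, "I-X" continues it,
    "O" marks a token outside any entity.
    """
    entities = []
    current, label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current:
                entities.append(("".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag closes the current entity.
            if current:
                entities.append(("".join(current), label))
            current, label = [], None
    if current:
        entities.append(("".join(current), label))
    return entities


tokens = list("韩红和姚明")
tags = ["B-PER", "I-PER", "O", "B-PER", "I-PER"]
entity_list = decode_bio(tokens, tags)
```

Tokens are joined without a separator because the example uses Chinese characters; a space-delimited language would need a different join.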
S2, searching a knowledge graph library for the entities in the entity list to obtain at least one candidate entity;
As known to those skilled in the art, a knowledge graph is composed of entities (nodes) and entity relationships (edges). Entities carry descriptive information such as names and attributes. Entity relationships also have names and attributes, and they are directed.
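The structure described above can be sketched as a small in-memory graph with name-based candidate lookup and first-degree and second-degree neighbour queries. This is an illustration only: the class and method names are hypothetical, exact name matching is an assumption, and neighbours are collected regardless of edge direction, which the text does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    name: str
    attributes: dict = field(default_factory=dict)


class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> Node
        self.edges = []   # directed edges: (source_id, relation_name, target_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, source_id, relation, target_id):
        self.edges.append((source_id, relation, target_id))

    def candidates(self, entity_name):
        """Candidate entities: nodes whose name equals the entity name (assumed)."""
        return [n for n in self.nodes.values() if n.name == entity_name]

    def first_degree(self, node_id):
        """Node IDs one edge away, ignoring edge direction (assumed)."""
        found = set()
        for source, _, target in self.edges:
            if source == node_id:
                found.add(target)
            elif target == node_id:
                found.add(source)
        return found

    def second_degree(self, node_id):
        """Node IDs exactly two edges away."""
        first = self.first_degree(node_id)
        second = set()
        for neighbour in first:
            second |= self.first_degree(neighbour)
        return second - first - {node_id}


kg = KnowledgeGraph()
kg.add_node(Node("z1", "Zhang Ziyi", {"occupation": "actor"}))
kg.add_node(Node("c1", "charity ambassador"))
kg.add_node(Node("o1", "public welfare foundation"))
kg.add_edge("z1", "holds_title", "c1")
kg.add_edge("c1", "appointed_by", "o1")
```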
S3, calculating the similarity between the first feature information of the text and the second feature information of each candidate entity and of at least one associated node of that candidate entity, and weighting the calculated similarities by their corresponding weights to obtain the total similarity for each candidate entity;
Specifically, the first feature information may be the sum of the word vectors of the feature words of the text in which the entity is located, as follows:
Segment the text into words and calculate the term frequency of each word (the number of occurrences of the word divided by the total number of words in the document). Rank the words by term frequency from high to low and take the top n words as the feature words of the text. Add the word vectors of the n feature words:
$$\mathrm{textVec} = \sum_{i=1}^{n} V_i$$
where $V_i$ denotes the word vector of the i-th feature word, and textVec denotes the summary vector of the text to be processed, that is, the first feature information. The word vectors can be obtained with FastText (a fast text classification algorithm) pre-trained on Chinese encyclopedia data; the word vectors are 300-dimensional, here and below.
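The feature-word selection and vector sum can be sketched as follows. The sketch assumes the text is already segmented into words and that word vectors are supplied as a plain dictionary; the function name `text_vector`, the toy 2-dimensional vectors (the text uses 300 dimensions), and the choice to treat out-of-vocabulary words as zero vectors are all assumptions.

```python
from collections import Counter


def text_vector(words, word_vectors, n):
    """Return the top-n feature words by term frequency and their vector sum."""
    total = len(words)
    # Term frequency: occurrences of a word divided by the total word count.
    tf = {word: count / total for word, count in Counter(words).items()}
    # sorted() is stable, so words with equal frequency keep first-seen order.
    feature_words = sorted(tf, key=tf.get, reverse=True)[:n]
    dim = len(next(iter(word_vectors.values())))
    text_vec = [0.0] * dim
    for word in feature_words:
        # Assumption: a word without a vector contributes nothing.
        for i, value in enumerate(word_vectors.get(word, [0.0] * dim)):
            text_vec[i] += value
    return feature_words, text_vec


words = ["charity", "tibet", "charity", "singer", "tibet", "charity"]
toy_vectors = {
    "charity": [1.0, 0.0],
    "tibet": [0.0, 2.0],
    "singer": [5.0, 5.0],
}
features, textVec = text_vector(words, toy_vectors, n=2)
```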
The associated nodes are nodes in the knowledge graph that are related to the candidate entity, such as first-degree and second-degree relationship nodes. The second feature information may be the sum of the word vectors of node names and attributes. The similarity between the first feature information of the text and the second feature information of the candidate entity node, of its first-degree relationship nodes, and of its second-degree relationship nodes is calculated separately, and the results are weighted and summed to obtain the total similarity of the candidate entity. The steps are as follows:
1) Sentence-level similarity. Split the knowledge base document into sentences at periods. For the sentence in which the entity is located, segment the sentence into words, obtain the word vector of each word, and add them to form senVec. Obtain the word vectors of the candidate entity node's name and attributes and add them to form attrVec. Then calculate the cosine similarity of senVec and attrVec:
$$\mathrm{senScore} = \frac{\mathrm{senVec} \cdot \mathrm{attrVec}}{\lVert \mathrm{senVec} \rVert \, \lVert \mathrm{attrVec} \rVert}$$
where $\lVert x \rVert$ denotes the norm of the vector x. This gives the score senScore.
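A minimal sketch of the cosine similarity used for senScore, on plain Python lists. Returning 0.0 when either vector has zero norm is an assumption made here to avoid division by zero; the text does not say how that case is handled.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


senVec = [1.0, 1.0]
attrVec = [1.0, 0.0]
senScore = cosine_similarity(senVec, attrVec)
```

The same function applies unchanged to the comparisons against textVec in the later steps.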
2) Obtain the word vectors of the candidate entity node's name and attributes and of the names and attributes of its first-degree relationship nodes, add them to form firstRelVec, and calculate the similarity between firstRelVec and the text summary vector textVec to obtain the score firstRelScore.
3) Obtain the word vectors of the candidate entity node's name and attributes and of the names and attributes of its second-degree relationship nodes, add them to form secondRelVec, and calculate the similarity between secondRelVec and the text summary vector textVec to obtain the score secondRelScore.
4) For the candidate nodes found for each entity, assign a configurable weight to each of the scores above, then compute the weighted sum.
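Step 4 can be sketched as a weighted sum over named scores. The dictionary layout and the function name `total_similarity` are hypothetical; the weights are passed in as a parameter because the text says they are configurable, and the numbers below are made up for illustration.

```python
def total_similarity(scores, weights):
    """Weighted sum of similarity scores, keyed by score name."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score supplied for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


weights = {"senScore": 0.5, "firstRelScore": 0.25, "secondRelScore": 0.25}
scores = {"senScore": 0.5, "firstRelScore": 1.0, "secondRelScore": 0.0}
total = total_similarity(scores, weights)
```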
S4, associating the entity with the candidate entity whose total similarity is the largest and exceeds the threshold.
Specifically, if more than one candidate entity is found in the knowledge graph library in step S2, feature matching and semantic calculation are performed according to step S3 and the maximum total similarity is determined, so as to find the best-matching candidate entity. It is then judged whether the maximum total similarity reaches the association threshold. If so, the entity is associated and the entity ID of the candidate entity, namely doc_id, is returned; if not, no association is made.
If only one matching entity is found, its total similarity is calculated directly according to step S3 and compared with the association threshold. If the threshold is reached, the entity is associated and doc_id is returned; otherwise no association is made.
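The selection rule in S4 can be sketched as below. The function name `associate` and the mapping from doc_id to total similarity are hypothetical. The text says both "exceeding" and "reaches" the threshold; a strict comparison is used here as an assumption.

```python
def associate(candidate_scores, threshold):
    """Return the doc_id of the best candidate, or None if no association is made.

    candidate_scores maps each candidate's doc_id to its total similarity.
    The single-candidate case needs no special handling: max() over one
    entry returns that entry.
    """
    if not candidate_scores:
        return None
    best_id = max(candidate_scores, key=candidate_scores.get)
    if candidate_scores[best_id] > threshold:  # assumption: strict comparison
        return best_id
    return None


linked = associate({"kg-001": 0.41, "kg-002": 0.67}, threshold=0.5)
```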
The method for associating knowledge base documents with knowledge graph entities provided by this embodiment extracts effective features. On the text side it uses the entity, the sentence containing the entity, and the text summary; on the graph side it uses the entity, its attributes, its first-degree relationships and related entities, and its second-degree relationships and related entities. Using the relatedness between these two sides effectively improves the accuracy and recall of entity association.
Existing entity association methods have another problem: after an entity in a document has been associated with the knowledge graph, the positions of the associated entity in the document still cannot be obtained directly, which is especially inconvenient when the document has many pages. To address this issue, optionally, in this embodiment, the method further includes:
S5, querying the positions of each entity in the entity list within the knowledge base document to obtain a position list corresponding to the entity.
Specifically, the position of an entity in the knowledge base document may be the page number on which it appears. In this embodiment, an Elasticsearch engine may be used to query the page numbers of the entity in the knowledge base document, giving a list of all page numbers on which the entity appears. Thus, when the doc_id of the entity is returned, the page number list corresponding to that doc_id can be returned as well.
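To show what the page number list contains, the sketch below builds it from page texts held in memory. This is a stand-in, not the Elasticsearch query the embodiment describes: the function name `page_list`, the list-of-strings page layout, and plain substring matching are all assumptions.

```python
def page_list(pages, entity_name):
    """Return the 1-based page numbers whose text contains the entity name.

    pages is a list of page texts, where pages[0] is page 1.
    """
    return [
        number
        for number, text in enumerate(pages, start=1)
        if entity_name in text
    ]


pages = [
    "Han Hong appeared at the launch event.",
    "The convoy will travel for 20 days.",
    "Han Hong will lead the volunteers.",
]
han_hong_pages = page_list(pages, "Han Hong")
```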
To make it easier for the user to view the association information of an entity quickly and in real time, optionally, in this embodiment, the method further includes:
S6, giving the entity emphasized formatting at the positions in the knowledge base document recorded in the position list.
Specifically, according to the page number list corresponding to doc_id, the entity can be emphasized in the content of those document pages, for example by bolding or highlighting, so that the document entity corresponding to the entity link can be found quickly and conveniently.
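A minimal sketch of the emphasis step, applied only to the pages named in the page number list. The `<b>` markers, the function name `emphasize_pages`, and the in-memory page list are assumptions; the text only says that bolding or highlighting may be used.

```python
def emphasize_pages(pages, page_numbers, entity_name,
                    open_mark="<b>", close_mark="</b>"):
    """Wrap every occurrence of entity_name on the listed pages in markers.

    pages is a list of page texts (pages[0] is page 1); pages not in
    page_numbers are returned unchanged.
    """
    wanted = set(page_numbers)
    marked = f"{open_mark}{entity_name}{close_mark}"
    return [
        text.replace(entity_name, marked) if number in wanted else text
        for number, text in enumerate(pages, start=1)
    ]


pages = ["Han Hong sings.", "No match here.", "Han Hong again."]
result = emphasize_pages(pages, [1], "Han Hong")
```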
The principle of the invention is illustrated below by processing the following text:
"A few days ago, the famous singer Han Hong appeared together with Yao Ming and Zhang Ziyi at the launch event of the Tibet public welfare campaign that the singer initiated. It is reported that early next month Han Hong will join as many as a hundred caring volunteers and medical experts to form a charity convoy of aid volunteers for a 20-day public welfare trip."
The text is first split at the period into two sentences:
Sentence 1: "A few days ago, the famous singer Han Hong appeared together with Yao Ming and Zhang Ziyi at the launch event of the Tibet public welfare campaign that the singer initiated."
Sentence 2: "It is reported that early next month Han Hong will join as many as a hundred caring volunteers and medical experts to form a charity convoy of aid volunteers for a 20-day public welfare trip."
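The split shown above, followed by picking out the sentences that contain a given entity, can be sketched as follows. Splitting on both the Chinese full stop and the ASCII period is an assumption made so the example works on English text; a bare "." split would also break decimals and abbreviations. The function name is hypothetical.

```python
import re


def sentences_with_entity(text, entity_name):
    """Split text at periods and keep the sentences mentioning the entity."""
    parts = [part.strip() for part in re.split(r"[。.]", text)]
    sentences = [part for part in parts if part]
    return [sentence for sentence in sentences if entity_name in sentence]


text = "Han Hong appeared with Yao Ming. Han Hong will lead a convoy."
yao_ming_sentences = sentences_with_entity(text, "Yao Ming")
han_hong_sentences = sentences_with_entity(text, "Han Hong")
```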
For the sentence in which each entity is located, segment the sentence into words, obtain the word vectors of all the words through FastText, and add them to form the sentence vector senVec. Obtain the word vectors of the node names and attributes of the candidate entities for Han Hong, Yao Ming and Zhang Ziyi, add them to form attrVec, and then calculate the vector similarity of senVec and attrVec:
$$\mathrm{senScore} = \frac{\mathrm{senVec} \cdot \mathrm{attrVec}}{\lVert \mathrm{senVec} \rVert \, \lVert \mathrm{attrVec} \rVert}$$
where $\lVert x \rVert$ denotes the norm of the vector x. This gives the score senScore.
Next, obtain the word vectors of the node names and attributes of the candidate entities for Han Hong, Yao Ming and Zhang Ziyi and of the names and attributes of their first-degree relationship nodes, add them to form firstRelVec, and calculate its similarity with the text summary vector textVec using the same formula as above to obtain the score firstRelScore.
Then obtain the word vectors of the node names and attributes of the candidate entities for Han Hong, Yao Ming and Zhang Ziyi and of the names and attributes of their second-degree relationship nodes, add them to form secondRelVec, and calculate its similarity with the text summary vector textVec using the same formula as above to obtain the score secondRelScore.
The calculated scores are given different, configurable weights. If the first-degree relationship score should count more, firstRelScore is given a higher weight, for example 0.7, with senScore weighted 0.2 and secondRelScore weighted 0.1. Each score is multiplied by its weight and the products are added to give sum. The sum for each entity is compared with the set threshold. If it is greater than the threshold, the entity is associated, and the page list of the associated entity is obtained from the page numbers returned for that entity by Elasticsearch.
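As a worked illustration of the weighting, take the weights from the text (0.2, 0.7, 0.1) and three hypothetical scores chosen only for this example: senScore = 0.30, firstRelScore = 0.80 and secondRelScore = 0.50. Then

$$\mathrm{sum} = 0.2 \times 0.30 + 0.7 \times 0.80 + 0.1 \times 0.50 = 0.06 + 0.56 + 0.05 = 0.67$$

With an assumed threshold of 0.5, this candidate would be associated, even though its sentence-level score alone (0.30) would not have passed. This mirrors the Zhang Ziyi case, where the first-degree relationship node carries the match.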
Fig. 2 is a block diagram of a system for associating knowledge base documents and knowledge graph entities according to an embodiment of the present invention, where functional principles of various modules in the system have been described in the foregoing method embodiment, and are not described in detail below.
As shown in FIG. 2, the system includes:
an entity recognition module, configured to perform entity recognition on a text to obtain an entity list;
a candidate entity search module, configured to search a knowledge graph library for the entities in the entity list to obtain at least one candidate entity;
a similarity calculation module, configured to calculate the similarity between first feature information of the text and second feature information of each candidate entity and of at least one associated node of that candidate entity, and to weight the calculated similarities by their corresponding weights to obtain a total similarity for each candidate entity;
and an entity association module, configured to associate the entity with the candidate entity whose total similarity is the largest and exceeds a threshold.
Optionally, in this embodiment, the first feature information is a sum of word vectors of feature words of the text, and the second feature information is a sum of word vectors of node names and attributes.
Optionally, in this embodiment, the system further includes:
a position query module, configured to query the positions of each entity in the entity list within the knowledge base document to obtain a position list corresponding to the entity.
Optionally, in this embodiment, the system further includes:
a format processing module, configured to give the entity emphasized formatting at the positions in the knowledge base document recorded in the position list.
FIG. 3 is a schematic diagram illustrating a computing device according to an exemplary embodiment of the present invention.
Referring to fig. 3, computing device 300 includes memory 310 and processor 320.
The Processor 320 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 310 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions for the processor 320 or other modules of the computer. The persistent storage may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, the persistent storage is a mass storage device (e.g., magnetic or optical disk, flash memory). In other embodiments, the persistent storage may be a removable storage device (e.g., floppy disk, optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 310 may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic and/or optical disks may also be employed. In some embodiments, the memory 310 may include a removable storage device that is readable and/or writable, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted by wireless or wired means.
The memory 310 has stored thereon executable code that, when processed by the processor 320, may cause the processor 320 to perform some or all of the methods described above.
The aspects of the invention have been described in detail hereinabove with reference to the drawings. In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. Those skilled in the art should also appreciate that the acts and modules referred to in the specification are not necessarily required by the invention. In addition, it can be understood that the steps in the method according to the embodiment of the present invention may be sequentially adjusted, combined, and deleted according to actual needs, and the modules in the device according to the embodiment of the present invention may be combined, divided, and deleted according to actual needs.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out some or all of the steps of the above-described method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform part or all of the various steps of the above-described method according to the invention.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method for associating knowledge base documents with knowledge graph entities, comprising:
performing entity recognition on a text to obtain an entity list;
searching a knowledge graph library for the entities in the entity list to obtain at least one candidate entity;
calculating the similarity between first feature information of the text and second feature information of each candidate entity and of at least one associated node of that candidate entity, and weighting the calculated similarities by their corresponding weights to obtain a total similarity for each candidate entity;
and associating the entity with the candidate entity whose total similarity is the largest and exceeds a threshold.
2. The method according to claim 1, wherein the first feature information is a sum of word vectors of feature words of the text, and the second feature information is a sum of word vectors of node names and attributes.
3. The method of claim 1 or 2, further comprising:
querying the positions, in a document of the knowledge base, of an entity in the entity list to obtain a position list corresponding to the entity.
4. The method of claim 3, further comprising:
applying emphasis formatting to the entity in the document of the knowledge base at the positions in the position list.
5. A system for associating knowledge base documents with knowledge-graph entities, comprising:
an entity identification module, configured to perform entity identification on a text to obtain an entity list;
a candidate entity searching module, configured to search a knowledge graph library according to the entities in the entity list to obtain at least one candidate entity;
a similarity calculation module, configured to calculate, respectively, the similarity between first feature information of the text and second feature information of each candidate entity and of at least one associated node of that candidate entity, and to perform weighted calculation on the calculated similarities according to corresponding weights to obtain a total similarity corresponding to each candidate entity;
and an entity association module, configured to associate the entity with the candidate entity whose total similarity is the maximum and exceeds a threshold value.
6. The system according to claim 5, wherein the first feature information is a sum of word vectors of feature words of the text, and the second feature information is a sum of word vectors of node names and attributes.
7. The system of claim 5 or 6, further comprising:
a position query module, configured to query the positions, in a document of the knowledge base, of an entity in the entity list to obtain a position list corresponding to the entity.
8. The system of claim 7, further comprising:
a format processing module, configured to apply emphasis formatting to the entity in the document of the knowledge base at the positions in the position list.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method of any of claims 1 to 4.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the method according to any one of claims 1 to 4.
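The steps recited in claims 1 to 4 can be illustrated with a minimal, non-authoritative sketch. The claims as written here do not name a similarity measure, an embedding source, concrete weights, a threshold value, or an emphasis format, so everything below is an assumption made for illustration: cosine similarity, the toy 3-dimensional vectors in WORD_VECTORS, the weights 0.6 and 0.4, the threshold 0.8, and the "**" emphasis marker. All function and variable names are hypothetical.

```python
import math

# Hypothetical toy word vectors (assumption); a real system would load trained embeddings.
WORD_VECTORS = {
    "apple":   [1.0, 0.0, 0.0],
    "fruit":   [0.9, 0.1, 0.0],
    "tree":    [0.7, 0.3, 0.0],
    "company": [0.0, 1.0, 0.0],
    "phone":   [0.0, 0.9, 0.1],
    "iphone":  [0.1, 0.9, 0.0],
}
DIM = 3

# Hypothetical candidate entities: each has its own weight and word list
# (node name and attributes) plus weighted associated nodes.
CANDIDATES = [
    {"id": "apple_fruit", "weight": 0.6, "words": ["apple", "fruit"],
     "neighbors": [(0.4, ["tree"])]},
    {"id": "apple_company", "weight": 0.6, "words": ["apple", "company"],
     "neighbors": [(0.4, ["iphone", "phone"])]},
]


def sum_vectors(words):
    """Feature information as in claim 2: the sum of word vectors (unknown words add zero)."""
    total = [0.0] * DIM
    for word in words:
        for i, x in enumerate(WORD_VECTORS.get(word, [0.0] * DIM)):
            total[i] += x
    return total


def cosine(a, b):
    """Cosine similarity (an assumed measure); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def total_similarity(text_words, candidate):
    """Weighted sum of text-to-candidate and text-to-associated-node similarities."""
    text_vec = sum_vectors(text_words)
    total = candidate["weight"] * cosine(text_vec, sum_vectors(candidate["words"]))
    for weight, words in candidate["neighbors"]:
        total += weight * cosine(text_vec, sum_vectors(words))
    return total


def link_entity(text_words, candidates, threshold=0.8):
    """Return the id of the candidate with the largest total similarity above the threshold, else None."""
    best_id, best_score = None, threshold
    for candidate in candidates:
        score = total_similarity(text_words, candidate)
        if score > best_score:
            best_id, best_score = candidate["id"], score
    return best_id


def find_positions(document, entity):
    """Position list as in claim 3: start offsets of each non-overlapping occurrence."""
    positions, start = [], document.find(entity)
    while start != -1:
        positions.append(start)
        start = document.find(entity, start + len(entity))
    return positions


def emphasize(document, entity, positions, marker="**"):
    """Emphasis as in claim 4: wrap the entity at each position, right to left so offsets stay valid."""
    for pos in sorted(positions, reverse=True):
        end = pos + len(entity)
        document = document[:pos] + marker + document[pos:end] + marker + document[end:]
    return document
```

With these toy vectors, `link_entity(["fruit", "tree"], CANDIDATES)` selects `"apple_fruit"`, and `emphasize(doc, "apple", find_positions(doc, "apple"))` turns `"An apple a day; apple pie."` into `"An **apple** a day; **apple** pie."`. Using the associated nodes in the score is what lets the surrounding graph context disambiguate two candidates that share a name.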

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110601045.4A CN113360665A (en) 2021-05-31 2021-05-31 Method and system for associating knowledge base document and knowledge graph entity

Publications (1)

Publication Number Publication Date
CN113360665A (en) 2021-09-07

Family

ID=77530391

Country Status (1)

Country Link
CN (1) CN113360665A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417845A (en) * 2022-03-30 2022-04-29 支付宝(杭州)信息技术有限公司 Identical entity identification method and system based on knowledge graph

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160283593A1 (en) * 2015-03-23 2016-09-29 Microsoft Technology Licensing, Llc Salient terms and entities for caption generation and presentation
CN110188168A (en) * 2019-05-24 2019-08-30 北京邮电大学 Semantic relation recognition methods and device
CN111159423A (en) * 2019-12-27 2020-05-15 北京明略软件系统有限公司 Entity association method, device and computer readable storage medium
CN112585596A (en) * 2018-06-25 2021-03-30 易享信息技术有限公司 System and method for investigating relationships between entities
CN112633000A (en) * 2020-12-25 2021-04-09 北京明略软件系统有限公司 Method and device for associating entities in text, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination