CN110929038A - Entity linking method, device, equipment and storage medium based on knowledge graph - Google Patents

Entity linking method, device, equipment and storage medium based on knowledge graph

Info

Publication number
CN110929038A
Authority
CN
China
Prior art keywords
entity
word segmentation
entity reference
legal
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910992304.3A
Other languages
Chinese (zh)
Other versions
CN110929038B (en)
Inventor
陈晨
雷骏峰
刘嘉伟
于修铭
李可
汪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910992304.3A priority Critical patent/CN110929038B/en
Publication of CN110929038A publication Critical patent/CN110929038A/en
Priority to PCT/CN2020/111240 priority patent/WO2021073254A1/en
Application granted granted Critical
Publication of CN110929038B publication Critical patent/CN110929038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of big data, and in particular to an entity linking method, device, equipment and storage medium based on a knowledge graph. The method comprises the following steps: performing word segmentation on a legal text to obtain word segmentation results, and searching whether an entity reference identical to a word segmentation result exists; if so, putting that entity reference into an entity reference set and putting its corresponding entities into a candidate entity set; calculating the association scores and the correlation scores respectively, and adding the association scores and the corresponding correlation scores to obtain objective functions; and, in the entity reference set, determining the entity reference with the maximum objective function value as the final entity reference, and linking the final entity reference to the corresponding entity in the legal knowledge graph. By calculating the association score of each entity reference and the correlation score between candidate entities, the method determines the final entity reference and links it, thereby resolving the ambiguity caused by the synonyms and polysemous words found in legal texts.

Description

Entity linking method, device, equipment and storage medium based on knowledge graph
Technical Field
The invention relates to the technical field of big data, in particular to an entity linking method, device, equipment and storage medium based on a knowledge graph.
Background
Knowledge graphs express the information of the internet in a form closer to the human cognitive world, and provide a better way to organize, manage and understand the massive amount of information on the internet. The knowledge graph has revitalized semantic search on the internet, has shown great power in intelligent question answering, big data analysis and decision making, and has become an infrastructure of knowledge-based intelligent services on the internet. Together with big data and deep learning, the knowledge graph is one of the core driving forces behind the development of artificial intelligence. In a knowledge graph, each node represents an entity existing in the real world and each edge represents a relation between entities, and the knowledge graph is the most effective way of representing such relations.
The construction of a legal knowledge graph plays an important role in integrating legal knowledge, mining legal hotspots, predicting legal events, building expert systems for the legal field, and so on. The knowledge system of law is complex and combines many kinds of logic. Legal documents contain a large number of entities, such as plaintiffs, defendants, dispute focuses, fact elements and sources of law, which are very important for tasks such as case information extraction and legal information retrieval. However, synonyms and polysemous words are common in Chinese, so it is very important to find the entities in legal documents with suitable natural language processing techniques and to link them to the correct entities in the legal knowledge graph.
Disclosure of Invention
In view of the above, there is a need to provide a method, an apparatus, a device and a storage medium for entity linking based on a knowledge graph, which addresses the problem of how to correctly link an entity in a complex legal document to a legal knowledge graph.
A knowledge-graph-based entity linking method comprises the following steps:
obtaining a legal text, performing word segmentation on the legal text to obtain a word segmentation result, and searching a preset mapping table for an entity reference identical to the word segmentation result; if one exists, putting the entity reference identical to the word segmentation result into an entity reference set, and putting the entities corresponding to that entity reference into a candidate entity set, wherein an entity reference is a name of an entity, and one entity reference corresponds to a plurality of entities;
calculating the association score between each entity reference in the entity reference set and each of its corresponding candidate entities, calculating the correlation score of any two candidate entities among all the candidate entities corresponding to each entity reference, and adding the association scores and the corresponding correlation scores to obtain a plurality of objective functions;
and in the entity reference set, determining the entity reference with the maximum objective function value as a final entity reference, and linking the final entity reference to a corresponding entity in the legal knowledge graph.
In one possible design, the obtaining a legal text and performing word segmentation on the legal text to obtain a word segmentation result includes:
and performing word segmentation on the obtained legal text to obtain a plurality of words as word segmentation results, wherein the minimum word segmentation sliding window is a preset minimum word segmentation threshold value when performing word segmentation, and the maximum word segmentation sliding window is the length of the legal text.
In one possible design, the mapping table is a mapping relationship table between an entity reference and an entity in a preset legal knowledge base, and includes:
acquiring legal judgment documents from preset websites through a preset crawler script;
deconstructing the content of each legal judgment document to obtain node content, wherein the node content includes but is not limited to the plaintiff, the defendant, dispute focuses and evidence;
establishing relations between entities and attributes from the node content to obtain a legal knowledge graph;
and establishing a mapping relationship between each entity in the legal knowledge graph and the entity references in a preset mapping relation table to obtain an updated mapping relation table.
In one possible design, the calculating an association score between each entity reference in the set of entity references and the corresponding candidate entity includes:
the association score is obtained by multiplying a context-independent score and a context-dependent score;
the context-independent score sim(m, e) is obtained by using the following calculation formula:
Figure BDA0002238651470000031
wherein m is an entity reference, e is one of the candidate entities in the candidate entity set corresponding to the entity reference, |m| and |e| respectively represent the string lengths of m and e, ed(m, e) is the edit distance, that is, the minimum number of editing operations required to convert one of the two character strings into the other, and $w_s$ is a preset coefficient;
vectorizing the context of the entity reference and the attributes of the candidate entity, and determining the context-dependent score by calculating the distance between the two vectors.
In one possible design, the determining the context-dependent score by calculating a distance between two vectors includes:
obtaining the context-dependent score by calculating the cosine distance of the two vectors, wherein the calculation formula of the cosine distance is as follows:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$
wherein $\vec{a}$ and $\vec{b}$ represent the two vectors, and $\lVert \vec{a} \rVert$ and $\lVert \vec{b} \rVert$ represent their vector modulus lengths.
In one possible design, the calculating a correlation score of each entity referring to any two of all the corresponding candidate entities includes:
the correlation score $\mathrm{sim}(e_1, e_2)$ between two of said candidate entities is calculated by the following formula:
Figure BDA0002238651470000035
wherein $e_1$ and $e_2$ represent two of said candidate entities, $E_1$ represents the set of entities directly connected to $e_1$, $E_2$ represents the set of entities directly connected to $e_2$, $|E_1|$ represents the number of entities in $E_1$, $|E_2|$ represents the number of entities in $E_2$, $E_1 \cap E_2$ represents the intersection of the two sets, and $|E|$ represents the number of all entities in the legal knowledge graph.
In one possible design, the adding the association scores and the corresponding correlation scores to obtain a plurality of objective functions includes:
the objective function
Figure BDA0002238651470000041
is calculated by the following formula:
Figure BDA0002238651470000042
wherein $\phi(m_i, e_i)$ is the association score, and $\mathrm{coh}(e_i, e_j)$ is the correlation score between two of the candidate entities.
A knowledge-graph based entity linking apparatus, comprising:
the word segmentation and search module is used for obtaining a legal text, performing word segmentation on the legal text to obtain a word segmentation result, and searching a preset mapping table for an entity reference identical to the word segmentation result; if one exists, putting the entity reference identical to the word segmentation result into an entity reference set, and putting the entities corresponding to that entity reference into a candidate entity set, wherein an entity reference is a name of an entity, and one entity reference corresponds to a plurality of entities;
the calculation module is used for calculating the association score between each entity reference in the entity reference set and each of its corresponding candidate entities, calculating the correlation score of any two candidate entities among all the candidate entities corresponding to each entity reference, and adding the association scores and the corresponding correlation scores to obtain a plurality of objective functions;
and the determining and linking module is used for determining the entity reference with the maximum objective function value as the final entity reference in the entity reference set and linking the final entity reference to the corresponding entity in the legal knowledge graph.
A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the above-described knowledge-graph based entity linking method.
A storage medium having stored thereon computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above-described knowledge-graph based entity linking method.
The knowledge-graph-based entity linking method, device, equipment and storage medium comprise: obtaining a legal text, performing word segmentation on the legal text to obtain a word segmentation result, and searching a preset mapping table for an entity reference identical to the word segmentation result; if one exists, putting the entity reference identical to the word segmentation result into an entity reference set, and putting the entities corresponding to that entity reference into a candidate entity set; calculating the association score between each entity reference in the entity reference set and each of its corresponding candidate entities, calculating the correlation score of any two candidate entities among all the candidate entities corresponding to each entity reference, and adding the association scores and the corresponding correlation scores to obtain a plurality of objective functions; and, in the entity reference set, determining the entity reference with the maximum objective function value as the final entity reference, and linking the final entity reference to the corresponding entity in the legal knowledge graph. By calculating the association score of each entity reference and the correlation score between candidate entities, the method determines the final entity reference and links it, thereby resolving the ambiguity caused by the synonyms and polysemous words found in legal texts. Once the entity references are linked to the legal knowledge graph, entity linking helps the machine truly understand the semantic information of the legal entities in free text and effectively supports subsequent tasks such as case retrieval, evidence guidance and intelligent question answering.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a flow diagram of a method for knowledge-graph based entity linking in one embodiment of the invention;
FIG. 2 is a flowchart of step S1 according to an embodiment of the present invention;
FIG. 3 is a block diagram of a knowledge-graph based entity linking device in accordance with an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Fig. 1 is a flowchart of an entity linking method based on a knowledge-graph in an embodiment of the present invention, and as shown in fig. 1, the entity linking method based on the knowledge-graph includes the following steps:
Step S1, word segmentation and search: obtaining a legal text, performing word segmentation on the legal text to obtain a word segmentation result, and searching a preset mapping table for an entity reference identical to the word segmentation result; if one exists, putting that entity reference into an entity reference set and putting the entities corresponding to it into a candidate entity set, wherein an entity reference is a name of an entity and one entity reference corresponds to a plurality of entities.
Everyday written text often uses abbreviations or alternative names; for example, the words "apple" or "apple company" may both be used to refer to the proper noun "apple company". Such abbreviations or alternative names are called entity references, and the proper nouns they denote are called entities. The preset mapping table records the correspondence between entity references and entities.
The legal text in this step is a sentence or a passage of text input by the user, and entity references are identified from the input legal text. To identify them, the legal text is first segmented, splitting the sentence or passage into a plurality of words; the words are then compared with the mapping table to obtain the entity references and their corresponding entities, which are put into the entity reference set and the candidate entity set respectively.
The entity reference set is written as $M = \{m_1, m_2, \dots, m_N\}$, where each m is a word segmentation result that exists as an entity reference in the mapping table. The candidate entity set is written as $E_i = \{e_{i1}, e_{i2}, \dots, e_{ik}\}$ $(i = 1, 2, \dots, N)$, where each e is an entity corresponding to the entity reference $m_i$ in the mapping table.
In one embodiment, in step S1, obtaining the legal text, and performing word segmentation on the legal text to obtain a word segmentation result, including:
and performing word segmentation on the obtained legal text to obtain a plurality of words as word segmentation results, wherein the minimum word segmentation sliding window is a preset minimum word segmentation threshold value when performing word segmentation, and the maximum word segmentation sliding window is the length of the legal text.
For example, suppose the input legal text is "apple is sold by apple company", the preset minimum word segmentation threshold is 2, and the maximum word segmentation sliding window is 10, which is the length of the text in characters. A window of size 2 then yields every two-character substring of the text, a window of size 3 yields every three-character substring, and so on until a window of size 10 yields the whole text. All of these substrings, which include words such as "apple", "company", "sold" and "apple company", together form the word segmentation result of this embodiment.
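The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration, assuming the text is treated as a plain character sequence; the function name `sliding_window_segments` is hypothetical and not taken from the patent.

```python
def sliding_window_segments(text: str, min_window: int = 2) -> list[str]:
    """Return every substring whose length runs from min_window up to len(text)."""
    segments = []
    # The largest window equals the length of the text, as in the embodiment.
    for size in range(min_window, len(text) + 1):
        # Slide the window one character at a time across the text.
        for start in range(len(text) - size + 1):
            segments.append(text[start:start + size])
    return segments


segments = sliding_window_segments("abcd", min_window=2)
print(segments)  # ['ab', 'bc', 'cd', 'abc', 'bcd', 'abcd']
```

Enumerating every window is quadratic in the text length, which is acceptable for a single sentence or short passage but may need pruning for long inputs.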
Each word in the word segmentation result is looked up in the mapping table to check whether an identical entity reference exists. When the mapping table contains the two entity references "apple" and "apple company", the entity reference set consists of "apple" and "apple company". The entities corresponding to each entity reference are then looked up in the mapping table; for "apple" these are "apple (fruit of the genus Malus, family Rosaceae)", "apple (Apple products company)", "apple (2008 Korean film)" and so on, and this set of entities is the candidate entity set. The entity reference "apple company" likewise has a corresponding candidate entity set. At this point, M = ["apple", "apple company"] and E1 = ["apple (fruit of the genus Malus, family Rosaceae)", "apple (Apple products company)", "apple (2008 Korean film)", …].
In this embodiment, the minimum word segmentation threshold is greater than or equal to 2 and less than or equal to the length of the legal text. Segmenting the obtained legal text in this way enumerates all possible words and avoids omitting any word.
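The mapping-table lookup in step S1 can be sketched as below. The dictionary layout (entity reference mapped to a list of entity names) and the names `MAPPING_TABLE` and `collect_candidates` are assumptions made for illustration; the patent does not specify how the table is stored.

```python
def collect_candidates(segments, mapping_table):
    """Split segmentation results into an entity reference set and candidate entity sets."""
    reference_set = []    # M: segmentation results that are known entity references
    candidate_sets = {}   # E_i: candidate entities for each entity reference
    for word in segments:
        if word in mapping_table and word not in candidate_sets:
            reference_set.append(word)
            candidate_sets[word] = list(mapping_table[word])
    return reference_set, candidate_sets


# Hypothetical mapping table modelled on the "apple" example in the text.
MAPPING_TABLE = {
    "apple": ["apple (fruit)", "apple (products company)", "apple (2008 Korean film)"],
    "apple company": ["apple company"],
}

refs, cands = collect_candidates(["apple", "sold", "apple company", "apple"], MAPPING_TABLE)
print(refs)  # ['apple', 'apple company']
```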
In one embodiment, in step S1, the mapping table is a mapping relationship table between an entity reference and an entity in a preset legal knowledge base, as shown in fig. 2, and includes:
Step S101, crawling data: acquiring legal judgment documents from preset websites through a preset crawler script.
This step uses crawler technology to crawl legal judgment documents from publicly accessible websites. The specific crawling procedure is as follows:
presetting a website list containing a plurality of websites that publish legal judgment documents; calling a browser kernel to send web page access requests to the websites in the list in sequence, and waiting for the feedback sent by each website that receives a request, the feedback being either acceptance or refusal of access; when access is accepted, calling a preset web crawler algorithm stored in a database to collect the content of the legal judgment documents, and then continuing to call the browser kernel to access the other websites in the list until all websites in the list have been traversed; when access is refused, continuing to call the browser kernel to access the other websites in the list until all websites in the list have been traversed; and finally aggregating the legal judgment documents collected by the web crawler algorithm.
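The traversal logic above can be sketched without committing to any particular browser kernel or crawler library. In this sketch the `fetch` callable stands in for "call the browser kernel and run the crawler", and returning `None` models a refused access; both conventions, the function names and the example URLs are assumptions for illustration only.

```python
from typing import Callable, Iterable, Optional


def crawl_documents(site_list: Iterable[str],
                    fetch: Callable[[str], Optional[str]]) -> list[str]:
    """Visit every site in order and aggregate the documents that could be collected."""
    collected = []
    for url in site_list:
        page = fetch(url)      # hypothetical: None means the site refused access
        if page is None:
            continue           # refused: move on to the next site in the list
        collected.append(page)  # accepted: keep the collected document content
    return collected


def fake_fetch(url: str) -> Optional[str]:
    """Stand-in fetcher with made-up pages, used only to exercise the loop."""
    pages = {
        "https://example.org/a": "judgment A",
        "https://example.org/c": "judgment C",
    }
    return pages.get(url)


docs = crawl_documents(
    ["https://example.org/a", "https://example.org/b", "https://example.org/c"],
    fake_fetch,
)
print(docs)  # ['judgment A', 'judgment C']
```

Injecting the fetcher keeps the traversal testable offline; a real deployment would substitute an HTTP client or headless browser and respect each site's access policy.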
Step S102, deconstructing data: deconstructing the content of each legal judgment document to obtain node content, wherein the node content includes but is not limited to the plaintiff, the defendant, dispute focuses and evidence.
Because the format of legal judgment documents is largely fixed, their content can be deconstructed with parsing approaches such as regular expressions, JSON expressions or grok expressions.
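As an illustration of the regular-expression approach, the sketch below extracts labelled fields from a document whose lines follow a "Label: value" layout. The layout, the English labels, the sample text and the names `FIELD_PATTERN` and `deconstruct` are all hypothetical; real judgment documents would need patterns written for their actual format.

```python
import re

# Hypothetical line format: "<label>: <value>", one field per line.
FIELD_PATTERN = re.compile(
    r"^(Plaintiff|Defendant|Dispute focus|Evidence):\s*(.+)$",
    re.MULTILINE,
)


def deconstruct(document: str) -> dict[str, list[str]]:
    """Group the values found in a document by node type."""
    nodes: dict[str, list[str]] = {}
    for label, value in FIELD_PATTERN.findall(document):
        nodes.setdefault(label, []).append(value.strip())
    return nodes


SAMPLE = """Plaintiff: Zhang San
Defendant: apple company
Dispute focus: whether the contract was breached
Evidence: sales contract
Evidence: payment receipt
"""

nodes = deconstruct(SAMPLE)
print(nodes["Evidence"])  # ['sales contract', 'payment receipt']
```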
Regarding entity references: when the content of a legal judgment document is deconstructed, the same entity may be expressed in several ways, that is, one entity may have several possible Chinese expressions. When the node content is determined, one node content is defined as the entity and the other expressions with the same meaning are defined as entity references; the entity references and the entity are filled into a mapping table to obtain the mapping table between entity references and entities. For example, if the obtained node contents include "apple", "apple company" and so on, and the words "apple" or "apple company" are used to refer to the proper noun "apple company", then the former are entity references and the latter is an entity.
Step S103, map construction: and establishing the relationship between the entity and the attribute according to the node content to obtain the legal knowledge graph.
The entities include, for example, the plaintiff, the defendant and the dispute focus; the relationships include, for example, proposing and requesting.
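One simple way to hold such a graph in memory is as a list of (head, relation, tail) triples plus an adjacency index of directly connected entities, which is also the information the later correlation score needs. The triple representation, the sample triples and the name `build_graph` are assumptions for illustration; the patent does not prescribe a storage format.

```python
from collections import defaultdict


def build_graph(triples):
    """Index each entity by the set of entities it is directly connected to."""
    neighbours = defaultdict(set)
    for head, _relation, tail in triples:
        # Edges are treated as undirected when collecting direct neighbours.
        neighbours[head].add(tail)
        neighbours[tail].add(head)
    return neighbours


# Hypothetical triples; "proposes" and "requests" echo the relations named above.
TRIPLES = [
    ("plaintiff", "proposes", "claim"),
    ("plaintiff", "requests", "compensation"),
    ("defendant", "disputes", "claim"),
]

graph = build_graph(TRIPLES)
print(sorted(graph["claim"]))  # ['defendant', 'plaintiff']
```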
Step S104, establishing a mapping relationship: establishing a mapping relationship between each entity in the legal knowledge graph and the entity references in a preset mapping relation table, so as to obtain an updated mapping relation table.
Before the legal knowledge graph is built, an initial mapping table between entity references and entities can be preset. After the legal knowledge graph has been constructed, mapping relationships are established between all entities in the legal knowledge graph and the entity references in the initial mapping table, which yields the updated mapping relation table. For example, if the entities in the legal knowledge graph include "apple company" and the initial mapping table contains entity references such as "apple" or "apple company", these entity references are mapped to the entity "apple company" in the legal knowledge graph, so that the candidate entity set can be determined from the updated mapping table.
In the embodiment, data for constructing the legal knowledge graph is obtained through a web crawler technology, the legal knowledge graph is finally obtained through the process of deconstructing the data and constructing the graph, and the legal knowledge graph is used as the basis of entity reference identification to determine the final entity reference.
Step S2, calculating an objective function: calculating the association score between each entity reference in the entity reference set and each of its corresponding candidate entities, calculating the correlation score of any two candidate entities among all the candidate entities corresponding to each entity reference, and adding the association scores and the corresponding correlation scores to obtain a plurality of objective functions.
Step S1 yields many entity references and candidate entities, most of which are not the entities that will finally be selected, so this step performs the disambiguation of candidate entities by calculating association scores. In the candidate entity set, each entity reference has several possible candidate entities; a correlation score is calculated between any two of these candidate entities, and all candidate entities corresponding to the entity reference are traversed to obtain the plurality of correlation scores for that entity reference. The association score obtained for the entity reference is then added to each of the correlation scores to obtain a plurality of objective functions. By adding the correlation score to the objective function, this step uses the similarity between candidate entities to perform global disambiguation.
In one embodiment, in step S2, the association score is obtained by multiplying the context-independent score and the context-dependent score.
1) The context-independent score preferably uses the Levenshtein string edit distance, that is, the text edit distance score calculated between the entity reference and the candidate entity is used as the context-independent score. The context-independent score sim(m, e) is obtained by using the following calculation formula:
Figure BDA0002238651470000101
wherein m is an entity reference, e is one of the candidate entities in the candidate entity set corresponding to the entity reference, |m| and |e| represent the string lengths of m and e respectively, ed(m, e) is the Levenshtein distance, that is, the minimum number of editing operations required to convert one of the two strings into the other, and $w_s$ is a preset coefficient.
As an example of the Levenshtein distance, take the strings "kitten" and "sitting": in the first step, kitten → sitten replaces k with s; in the second step, sitten → sittin replaces e with i; in the third step, sittin → sitting appends g. Each edit (insertion, deletion or replacement) has a cost of 1, so ed(kitten, sitting) = 3. This example is in English; the same calculation applies to Chinese.
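The edit distance ed(m, e) itself can be computed with the standard dynamic-programming recurrence, sketched below. Only the distance is shown; how the patent normalises it with |m|, |e| and the coefficient w_s is given by the formula image and is not reproduced here. The function name `levenshtein` is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Because Python strings are sequences of Unicode code points, the same function works unchanged on Chinese text, one character per edit.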
2) The context-dependent score is determined by vectorizing the context of the entity reference and the attributes of the candidate entity, and calculating the distance between the two vectors.
The attributes of a candidate entity are the attribute information related to that candidate entity in the preset legal knowledge graph. For vectorization, an existing natural language processing (NLP) model such as the word2vec word vectorization model may be used. word2vec is an NLP tool that maps words to vectors, so that the relations between words can be measured quantitatively and mined. In this step, word2vec is called directly to vectorize the context of the entity reference and the attributes of the candidate entity respectively.
When the distance between the two vectors is calculated, the context-dependent score is preferably obtained by calculating the cosine distance between the two vectors, and the calculation formula of the cosine distance is:
$$\cos(\vec{a}, \vec{b}) = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$
wherein $\vec{a}$ and $\vec{b}$ represent the two vectors obtained with the word2vec method, and $\lVert \vec{a} \rVert$ and $\lVert \vec{b} \rVert$ represent their vector modulus lengths.
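The cosine computation can be sketched in a few lines of plain Python. The vectors are given here as ordinary lists of floats; in practice they would come from a word2vec model, which is not loaded in this sketch. Returning 0.0 for a zero-length vector is a convention chosen here, not something the patent specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: no direction means no similarity
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```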
In this embodiment, the disambiguation task of the candidate entity is quickly and effectively implemented through the Levenshtein distance formula, the cosine distance formula, and the like.
In one embodiment, in step S2, calculating the correlation score of each entity referring to any two of all the corresponding candidate entities includes:
the correlation score $\mathrm{sim}(e_1, e_2)$ between two candidate entities is calculated by the following formula:
Figure BDA0002238651470000114
wherein $e_1$ and $e_2$ represent two candidate entities, $E_1$ represents the set of entities directly connected to $e_1$, $E_2$ represents the set of entities directly connected to $e_2$, $|E_1|$ represents the number of entities in $E_1$, $|E_2|$ represents the number of entities in $E_2$, $E_1 \cap E_2$ represents the intersection of the two sets, and $|E|$ represents the number of all entities in the legal knowledge graph.
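The formula image is not available in this text, but the quantities it is defined over (neighbour sets, their sizes, their intersection and the total entity count) are the same ones used by the well-known link-based relatedness measure of Milne and Witten. The sketch below implements that measure purely as an assumption about what the formula may look like; it should not be read as the patent's exact formula. The name `relatedness` and the clamping to 0 are choices made here.

```python
import math


def relatedness(neighbours_1: set, neighbours_2: set, total_entities: int) -> float:
    """Link-based relatedness of two entities from their direct-neighbour sets.

    Assumed form (Milne-Witten style):
        1 - (log max(|E1|, |E2|) - log |E1 ∩ E2|) / (log |E| - log min(|E1|, |E2|))
    """
    common = len(neighbours_1 & neighbours_2)
    if common == 0:
        return 0.0  # no shared neighbours: treat the entities as unrelated
    larger = max(len(neighbours_1), len(neighbours_2))
    smaller = min(len(neighbours_1), len(neighbours_2))
    denominator = math.log(total_entities) - math.log(smaller)
    if denominator == 0:
        return 1.0  # both neighbour sets already cover the whole graph
    score = 1.0 - (math.log(larger) - math.log(common)) / denominator
    return max(0.0, score)  # clamp, since the raw value can fall below zero


# Toy neighbour sets in a graph assumed to contain 16 entities.
print(relatedness({"a", "b"}, {"a", "b"}, 16))  # 1.0
```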
The objective function
Figure BDA0002238651470000115
is calculated as follows:
Figure BDA0002238651470000116
where $\phi(m_i, e_i)$ is the association score between entity reference $m_i$ and candidate entity $e_i$, and $coh(e_i, e_j)$ is the correlation score between two candidate entities.
In this embodiment, the correlation score between two candidate entities is obtained with the above formula. Because a legal text may contain several entity references, the correlation scores are added to the objective function so that the similarity between candidate entities is used for global disambiguation.
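A minimal sketch of this global disambiguation follows. It assumes the objective for one joint assignment is the plain sum of the mention-entity scores φ and the pairwise scores coh; the exact weighting is in a formula image not reproduced here, so this form is an assumption. The function `best_assignment`, the toy score tables, and the entity labels are all hypothetical, and the exhaustive search is only practical for small candidate sets.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Tuple


def best_assignment(candidates: Dict[str, List[str]],
                    phi: Callable[[str, str], float],
                    coh: Callable[[str, str], float]) -> Tuple[Dict[str, str], float]:
    """Try every joint assignment of one candidate per entity reference
    and keep the one with the highest objective value."""
    mentions = list(candidates)
    best: Dict[str, str] = {}
    best_score = float("-inf")
    for combo in product(*(candidates[m] for m in mentions)):
        # Assumed objective: sum of phi over references plus coh over entity pairs.
        score = sum(phi(m, e) for m, e in zip(mentions, combo))
        score += sum(coh(a, b) for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = dict(zip(mentions, combo)), score
    return best, best_score


# Toy scores (made up for illustration only).
phi_table = {("apple company", "apple company (entity)"): 0.9,
             ("apple", "apple company (entity)"): 0.4,
             ("apple", "apple (fruit)"): 0.6}
coh_table = {frozenset({"apple company (entity)", "apple (fruit)"}): 0.3}

result, value = best_assignment(
    {"apple company": ["apple company (entity)"],
     "apple": ["apple company (entity)", "apple (fruit)"]},
    phi=lambda m, e: phi_table.get((m, e), 0.0),
    coh=lambda a, b: coh_table.get(frozenset((a, b)), 0.0),
)
```

With these toy numbers the assignment that links "apple" to the fruit scores 1.8 against 1.3 for the alternative, so it is selected. If any entity reference has an empty candidate list, no assignment exists and the function returns an empty mapping with a score of negative infinity.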
Step S3, determining and linking: in the entity reference set, the entity reference with the maximum objective function value is determined as the final entity reference, and the final entity reference is linked to the corresponding entity in the legal knowledge graph.
After all objective functions have been calculated in step S2, the final goal is to maximize the objective function. The result obtained,
Figure BDA0002238651470000121
is the entity result corresponding to the entity reference set $M = \{m_1, m_2, \ldots, m_N\}$, that is, the entity reference set $M$ together with the entity set
Figure BDA0002238651470000122
For example, if the legal text input in step S1 is "apple is sold by apple company", the entity references finally obtained in this step are "apple company" and "apple"; the entity corresponding to "apple company" is "apple company", and the entity corresponding to "apple" is "apple (fruit of the genus Malus of the family Rosaceae)".
After the final entity references are obtained, each one is linked to its corresponding entity in the legal knowledge graph, which provides a retrieval basis for subsequent legal case retrieval and evidence-guidance intelligent question answering.
For example, the entity reference "apple company" is linked to the entity "apple company" in the legal knowledge graph, and the entity reference "apple" is linked to the entity "apple (fruit of the genus Malus of the family Rosaceae)".
In this knowledge graph-based entity linking method, the legal text is segmented with a segmentation algorithm that produces every possible word, which avoids missing any segment. Because this yields a large number of word segmentation results, the results are looked up in a preset mapping table to discard irrelevant words, so that keywords are screened out quickly and efficiently and added to the entity reference set and the corresponding candidate entity set, providing data support for the subsequent determination of the correct entity reference. The invention also disambiguates among multiple candidate entities by calculating association scores. Considering that the input legal text may contain several entity references, correlation scores are added to the objective function, and the similarity between candidate entities is used to achieve global disambiguation. The determined entity references are finally obtained and linked, which resolves the synonymy and polysemy present in legal text.
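The segmentation and lookup summarized above can be sketched as follows. The sliding-window bounds follow claim 2 (a preset minimum window up to the full text length); everything else is an assumption made for illustration, including the function names `full_segmentation` and `lookup_mentions`, the character-level windows, the dictionary-based mapping table, and the toy table contents.

```python
from typing import Dict, List, Tuple


def full_segmentation(text: str, min_window: int = 2) -> List[str]:
    """Enumerate every substring whose length lies between
    min_window and the length of the text."""
    n = len(text)
    return [text[i:i + size]
            for size in range(min_window, n + 1)
            for i in range(n - size + 1)]


def lookup_mentions(text: str, mapping_table: Dict[str, List[str]],
                    min_window: int = 2) -> Tuple[List[str], Dict[str, List[str]]]:
    """Keep only the segments found in the mapping table and collect
    the candidate entities of each retained entity reference."""
    mentions: List[str] = []
    candidates: Dict[str, List[str]] = {}
    for span in full_segmentation(text, min_window):
        if span in mapping_table and span not in candidates:
            mentions.append(span)
            candidates[span] = list(mapping_table[span])
    return mentions, candidates


# Hypothetical mapping table: entity reference -> entities sharing that name.
table = {"apple": ["apple (fruit)", "apple company"],
         "apple company": ["apple company"]}
mentions, candidates = lookup_mentions("apple is sold by apple company", table)
```

Enumerating all windows is quadratic in the text length, which is why the table lookup is needed to prune the results before any scoring takes place.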
In one embodiment, a knowledge graph-based entity linking apparatus is provided, as shown in fig. 3, including:
a word segmentation and search module, configured to acquire a legal text, segment the legal text to obtain word segmentation results, and search a preset mapping table for an entity reference identical to a word segmentation result; if one exists, the entity reference is put into an entity reference set and the entities corresponding to it are put into a candidate entity set, where an entity reference is a name of an entity and one entity reference corresponds to a plurality of entities;
a calculation module, configured to calculate the association score between each entity reference in the entity reference set and each of its corresponding candidate entities, calculate the correlation score between any two of the candidate entities corresponding to each entity reference, and add the association scores and the corresponding correlation scores to obtain a plurality of objective functions;
and a determining and linking module, configured to determine, in the entity reference set, the entity reference with the maximum objective function value as the final entity reference, and link the final entity reference to the corresponding entity in the legal knowledge graph.
In one embodiment, a computer device is provided, which includes a memory and a processor, the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to implement the steps of the method for linking entities based on knowledge-graph according to the above embodiments.
In one embodiment, a storage medium storing computer-readable instructions is provided, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for knowledge-graph-based entity linking of the above embodiments. The storage medium may be a nonvolatile storage medium.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only some exemplary embodiments of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. An entity linking method based on knowledge graph is characterized by comprising the following steps:
obtaining a legal text, segmenting words of the legal text to obtain a word segmentation result, searching whether entity reference identical to the word segmentation result exists in a preset mapping table, if so, putting the entity reference identical to the word segmentation result into an entity reference set, and putting an entity corresponding to the entity reference identical to the word segmentation result into a candidate entity set, wherein the entity reference is a name of the entity, and one entity reference corresponds to a plurality of entities;
calculating the association score between each entity reference in the entity reference set and each of its corresponding candidate entities, calculating the correlation score between any two of the candidate entities corresponding to each entity reference, and adding the association scores and the corresponding correlation scores to obtain a plurality of objective functions;
and in the entity reference set, determining the entity reference with the maximum objective function value as a final entity reference, and linking the final entity reference to a corresponding entity in the legal knowledge graph.
2. The method for entity linking based on knowledge graph of claim 1, wherein the obtaining legal text and performing word segmentation on the legal text to obtain word segmentation result comprises:
and performing word segmentation on the obtained legal text to obtain a plurality of words as word segmentation results, wherein the minimum word segmentation sliding window is a preset minimum word segmentation threshold value when performing word segmentation, and the maximum word segmentation sliding window is the length of the legal text.
3. The method for linking entities based on knowledge graph according to claim 1, wherein the mapping table is a mapping relation table between entity reference and entity in a preset legal knowledge graph, and comprises:
acquiring legal judgment documents from a preset website through a preset crawler script;
deconstructing the content of each legal judgment document to obtain node content, wherein the node content comprises but is not limited to plaintiffs, defendants, dispute focuses and evidence;
establishing a relation between the entity and the attribute of the node content to obtain a legal knowledge graph;
and establishing a mapping relation between each entity in the legal knowledge graph and the entity in a preset mapping relation table to obtain an updated mapping relation table.
4. The knowledge graph-based entity linking method according to claim 1, wherein the calculating of association scores between each entity reference in the entity reference set and the corresponding candidate entities comprises:
obtaining the association score by multiplying a context-independent score by a context-dependent score;
the context-independent score $sim(m, e)$ is obtained using the following formula:
Figure FDA0002238651460000021
where $m$ is an entity reference, $e$ is one of the candidate entities in the candidate entity set corresponding to the entity reference, $|m|$ and $|e|$ denote the string lengths of $m$ and $e$ respectively, $ed(m, e)$ is the edit distance, i.e., the minimum number of editing operations required to convert one of the two strings into the other, and $w_s$ is a preset coefficient;
vectorizing the context of the entity reference and the attributes of the candidate entities, and determining the context correlation score, i.e., the context-dependent score, by calculating the distance between the two vectors.
5. The method of knowledge-graph-based entity linking according to claim 4, wherein said determining the context correlation score by calculating a distance of two vectors comprises:
obtaining the context correlation score by calculating the cosine distance between the two vectors, wherein the cosine distance is calculated as:
$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$
where $\vec{a}$ and $\vec{b}$ denote the two vectors, and $\lVert \vec{a} \rVert$ and $\lVert \vec{b} \rVert$ denote their norms (lengths).
6. The method of claim 1, 4 or 5, wherein the calculating of the correlation score between any two of the candidate entities corresponding to each entity reference comprises:
calculating the correlation score $sim(e_1, e_2)$ between two of said candidate entities as follows:
Figure FDA0002238651460000025
where $e_1$ and $e_2$ denote two of said candidate entities, $E_1$ denotes the set of entities directly connected to $e_1$, $E_2$ denotes the set of entities directly connected to $e_2$, $|E_1|$ denotes the number of entities in $E_1$, $|E_2|$ denotes the number of entities in $E_2$, $E_1 \cap E_2$ denotes the intersection of the two sets, and $|E|$ denotes the number of all entities in the legal knowledge graph.
7. The method of claim 6, wherein the adding of the association scores and the corresponding correlation scores to obtain a plurality of objective functions comprises:
the objective function
Figure FDA0002238651460000031
is calculated as follows:
Figure FDA0002238651460000032
where $\phi(m_i, e_i)$ is the association score, and $coh(e_i, e_j)$ is the correlation score between two of the candidate entities.
8. An apparatus for linking entities based on knowledge-graph, comprising:
a word segmentation and search module, configured to acquire a legal text, segment the legal text to obtain a word segmentation result, search whether an entity reference identical to the word segmentation result exists in a preset mapping table, and if so, put the entity reference identical to the word segmentation result into an entity reference set and put the entities corresponding to that entity reference into a candidate entity set, wherein the entity reference is a name of an entity, and one entity reference corresponds to a plurality of entities;
a calculation module, configured to calculate the association score between each entity reference in the entity reference set and each of its corresponding candidate entities, calculate the correlation score between any two of the candidate entities corresponding to each entity reference, and add the association scores and the corresponding correlation scores to obtain a plurality of objective functions;
and the determining and linking module is used for determining the entity reference with the maximum objective function value as the final entity reference in the entity reference set and linking the final entity reference to the corresponding entity in the legal knowledge graph.
9. A computer device comprising a memory and a processor, the memory having stored therein computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the knowledge-graph based entity linking method of any one of claims 1 to 7.
10. A storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge-graph based entity linking method of any one of claims 1 to 7.
CN201910992304.3A 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium Active CN110929038B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910992304.3A CN110929038B (en) 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium
PCT/CN2020/111240 WO2021073254A1 (en) 2019-10-18 2020-08-26 Knowledge graph-based entity linking method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910992304.3A CN110929038B (en) 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110929038A true CN110929038A (en) 2020-03-27
CN110929038B CN110929038B (en) 2023-07-21

Family

ID=69849193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910992304.3A Active CN110929038B (en) 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110929038B (en)
WO (1) WO2021073254A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360605B (en) * 2021-06-23 2024-02-23 中国科学技术大学 Global entity linking method based on topic entity context iterative optimization
CN115599903A (en) * 2021-07-07 2023-01-13 腾讯科技(深圳)有限公司(Cn) Object tag obtaining method and device, electronic equipment and storage medium
CN115809311A (en) * 2022-12-22 2023-03-17 企查查科技有限公司 Data processing method and device of knowledge graph and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224648A (en) * 2015-09-29 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of entity link method and system
CN109255031A (en) * 2018-09-20 2019-01-22 苏州友教习亦教育科技有限公司 The data processing method of knowledge based map
CN109635114A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for handling information

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198026A1 (en) * 2004-02-03 2005-09-08 Dehlinger Peter J. Code, system, and method for generating concepts
CN103488724B (en) * 2013-09-16 2016-09-28 复旦大学 A kind of reading domain knowledge map construction method towards books
CN106844413B (en) * 2016-11-11 2020-12-08 南京柯基数据科技有限公司 Method and device for extracting entity relationship
CN106886516A (en) * 2017-02-27 2017-06-23 竹间智能科技(上海)有限公司 The method and device of automatic identification statement relationship and entity
CN110929038B (en) * 2019-10-18 2023-07-21 平安科技(深圳)有限公司 Knowledge graph-based entity linking method, device, equipment and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021073254A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Knowledge graph-based entity linking method and apparatus, device, and storage medium
CN111858903A (en) * 2020-06-11 2020-10-30 创新工场(北京)企业管理股份有限公司 Method and device for negative news early warning
CN111814477A (en) * 2020-07-06 2020-10-23 重庆邮电大学 Dispute focus discovery method and device based on dispute focus entity and terminal
CN111814477B (en) * 2020-07-06 2022-06-21 重庆邮电大学 Dispute focus discovery method and device based on dispute focus entity and terminal
CN112231575A (en) * 2020-10-30 2021-01-15 衢州量智科技有限公司 Knowledge recommendation method and system for complex electromechanical product design process
CN112231575B (en) * 2020-10-30 2022-05-10 衢州量智科技有限公司 Knowledge recommendation method and system for complex electromechanical product design process
CN112380865A (en) * 2020-11-10 2021-02-19 北京小米松果电子有限公司 Method, device and storage medium for identifying entity in text
CN113220835A (en) * 2021-05-08 2021-08-06 北京百度网讯科技有限公司 Text information processing method and device, electronic equipment and storage medium
CN113220835B (en) * 2021-05-08 2023-09-29 北京百度网讯科技有限公司 Text information processing method, device, electronic equipment and storage medium
CN113326697A (en) * 2021-05-31 2021-08-31 云南电网有限责任公司电力科学研究院 Knowledge graph-based electric power text entity semantic understanding method
CN114741627A (en) * 2022-04-12 2022-07-12 中国人民解放军32802部队 Internet-oriented auxiliary information searching method
CN115269879A (en) * 2022-09-05 2022-11-01 北京百度网讯科技有限公司 Knowledge structure data generation method, data search method and risk warning method

Also Published As

Publication number Publication date
CN110929038B (en) 2023-07-21
WO2021073254A1 (en) 2021-04-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant