CN110929038B - Knowledge graph-based entity linking method, device, equipment and storage medium - Google Patents


Info

Publication number
CN110929038B
Authority
CN
China
Prior art keywords
entity
word segmentation
entities
legal
score
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910992304.3A
Other languages
Chinese (zh)
Other versions
CN110929038A (en)
Inventor
陈晨
雷骏峰
刘嘉伟
于修铭
李可
汪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910992304.3A
Publication of CN110929038A
Priority to PCT/CN2020/111240 (WO2021073254A1)
Application granted
Publication of CN110929038B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present invention relates to the field of big data technologies, and in particular to a knowledge graph-based entity linking method, apparatus, device, and storage medium. The method comprises: performing word segmentation on a legal text to obtain word segmentation results; searching a preset mapping table for entity mentions identical to the segmentation results and, if any exist, putting those entity mentions into an entity mention set and their corresponding entities into a candidate entity set; calculating association scores and correlation scores, and adding each association score to the corresponding correlation scores to obtain objective functions; and, within the entity mention set, determining the entity mentions with the largest objective function value as the final entity mentions and linking them to the corresponding entities in the legal knowledge graph. Because the final entity mentions are determined from the association scores between entity mentions and candidate entities and the correlation scores between candidate entities before being linked, errors caused by synonymy and polysemy in legal texts are avoided.

Description

Knowledge graph-based entity linking method, device, equipment and storage medium
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, and a storage medium for entity linking based on a knowledge graph.
Background
A knowledge graph expresses information on the Internet in a form closer to the way humans perceive the world, and provides a better capability for organizing, managing, and understanding the massive amount of information on the Internet. Knowledge graphs have revitalized Internet semantic search, have shown great strength in intelligent question answering, big data analysis, and decision making, and have become an infrastructure for knowledge-based intelligent services on the Internet. Together with big data and deep learning, knowledge graphs are one of the core driving forces behind the development of artificial intelligence. In a knowledge graph, each node represents an entity that exists in the real world and each edge represents a relationship between entities; the knowledge graph is regarded as the most effective way of representing such relationships.
Constructing a legal knowledge graph integrates legal knowledge and mines legal hot topics, and plays an important role in legal event prediction, the construction of expert systems for the legal domain, and similar tasks. The legal knowledge system is highly complex, combining many kinds of logic. Legal documents contain a large number of entities, such as plaintiffs, points of dispute, facts, and legal sources, which are crucial to stages such as case information extraction and legal information retrieval. However, synonymy and polysemy are common in Chinese, so it is very important to use suitable natural language processing techniques to find the entities in legal documents and link them to the correct entities in a legal knowledge graph.
Disclosure of Invention
In view of this, it is necessary to provide a knowledge graph-based entity linking method, apparatus, device, and storage medium to solve the problem of how to correctly link entities in complex legal documents to a legal knowledge graph.
An entity linking method based on a knowledge graph, comprising:
obtaining a legal text, performing word segmentation on the legal text to obtain a word segmentation result, and searching a preset mapping table for entity mentions identical to the word segmentation result; if such entity mentions exist, putting them into an entity mention set and putting the entities corresponding to them into a candidate entity set, wherein an entity mention is a name of an entity, and one entity mention corresponds to a plurality of entities;
calculating an association score between each entity mention in the entity mention set and each of its corresponding candidate entities, calculating a correlation score between any two candidate entities among all the candidate entities corresponding to each entity mention, and adding the association scores to the corresponding correlation scores to obtain a plurality of objective functions;
and determining, in the entity mention set, the entity mentions with the largest objective function value as the final entity mentions, and linking the final entity mentions to the corresponding entities in the legal knowledge graph.
In one possible design, obtaining the legal text and performing word segmentation on it to obtain the word segmentation result includes:
performing word segmentation on the obtained legal text to obtain a plurality of words as the word segmentation result, wherein, during segmentation, the minimum size of the sliding window is a preset minimum segmentation threshold and the maximum size of the sliding window is the length of the legal text.
In one possible design, the mapping table is a mapping relationship table between entity mentions and the entities in a preset legal knowledge graph, and is obtained by:
acquiring court judgment documents from preset websites by means of a preset crawler script;
deconstructing the content of each court judgment document to obtain node content including, but not limited to, case matters, plaintiffs, points of dispute, and evidence;
constructing relations between entities and attributes from the node content to obtain a legal knowledge graph;
and establishing a mapping relationship between each entity in the legal knowledge graph and the entity mentions in a preset mapping relationship table to obtain an updated mapping relationship table.
In one possible design, calculating the association score between each entity mention in the entity mention set and its corresponding candidate entities includes:
the association score is obtained by multiplying a context-free score by a context-dependent score;
the context-free score $\mathrm{sim}(m, e)$ is obtained using the following calculation formula:
wherein $m$ is an entity mention, $e$ is one candidate entity in the corresponding candidate entity set, $|m|$ and $|e|$ are the string lengths of $m$ and $e$ respectively, $ed(m, e)$ is the edit distance, i.e., the minimum number of edit operations required to convert one of the two strings into the other, and $w_s$ is a preset coefficient;
and the context of the entity mention and the attributes of the candidate entity are each vectorized, and the context-dependent score is determined by calculating the distance between the two vectors.
In one possible design, determining the context-dependent score by calculating the distance between the two vectors includes:
the context-dependent score is obtained by calculating the cosine distance (cosine similarity) of the two vectors, which is calculated as:
$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|\,\|\vec{b}\|}$$
wherein $\vec{a}$ and $\vec{b}$ denote the two vectors, and $\|\vec{a}\|$ and $\|\vec{b}\|$ denote their norms (moduli).
In one possible design, calculating the correlation score between any two candidate entities among all the candidate entities corresponding to each entity mention includes:
the correlation score $\mathrm{sim}(e_1, e_2)$ is calculated using the following formula:
wherein $e_1$ and $e_2$ denote two of the candidate entities, $E_1$ denotes the set of entities directly connected to $e_1$, $E_2$ denotes the set of entities directly connected to $e_2$, $|E_1|$ and $|E_2|$ denote the numbers of entities in $E_1$ and $E_2$, $E_1 \cap E_2$ denotes the intersection of the two sets, and $|E|$ denotes the total number of entities in the legal knowledge graph.
In one possible design, adding the association scores to the corresponding correlation scores to obtain a plurality of objective functions includes:
the objective function is calculated using the following formula:
wherein $\phi(m_i, e_i)$ is the association score and $coh(e_i, e_j)$ is the correlation score between two of the candidate entities.
An entity linking device based on a knowledge graph, comprising:
a word segmentation and search module, configured to obtain a legal text, perform word segmentation on it to obtain a word segmentation result, and search a preset mapping table for entity mentions identical to the word segmentation result; if such entity mentions exist, put them into an entity mention set and put their corresponding entities into a candidate entity set, wherein an entity mention is a name of an entity, and one entity mention corresponds to a plurality of entities;
a calculation module, configured to calculate an association score between each entity mention in the entity mention set and each of its corresponding candidate entities, calculate a correlation score between any two candidate entities among all the candidate entities corresponding to each entity mention, and add the association scores to the corresponding correlation scores to obtain a plurality of objective functions;
and a determining and linking module, configured to determine, in the entity mention set, the entity mentions with the largest objective function value as the final entity mentions and link them to the corresponding entities in the legal knowledge graph.
A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the knowledge-graph based entity linking method described above.
A storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge-graph based entity linking method described above.
The knowledge graph-based entity linking method, apparatus, device, and storage medium comprise: obtaining a legal text and segmenting it to obtain a segmentation result; searching a preset mapping table for entity mentions identical to the segmentation result and, if such entity mentions exist, putting them into an entity mention set and putting their corresponding entities into a candidate entity set; calculating an association score between each entity mention in the entity mention set and each of its corresponding candidate entities, calculating a correlation score between any two candidate entities among all the candidate entities corresponding to each entity mention, and adding the association scores to the corresponding correlation scores to obtain a plurality of objective functions; and determining, in the entity mention set, the entity mentions with the largest objective function value as the final entity mentions and linking them to the corresponding entities in the legal knowledge graph. Because the final entity mentions are determined from the association scores between entity mentions and candidate entities and the correlation scores between candidate entities before being linked, errors caused by synonymy and polysemy in legal texts are avoided. Once an entity mention is linked to the legal knowledge graph, the link helps a machine genuinely understand the semantics of legal entities in free text and effectively perform downstream tasks such as case retrieval, evidence guidance, and intelligent question answering.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a flow chart of a knowledge-graph-based entity linking method in one embodiment of the invention;
FIG. 2 is a flowchart of step S1 in one embodiment of the present invention;
FIG. 3 is a block diagram of a knowledge graph-based entity linking apparatus in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As used herein, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" are intended to include the plural forms as well, as will be understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Fig. 1 is a flowchart of an entity linking method based on a knowledge graph according to an embodiment of the present invention, as shown in fig. 1, the entity linking method based on the knowledge graph includes the following steps:
Step S1, word segmentation and search: obtaining a legal text, segmenting it to obtain a segmentation result, and searching a preset mapping table for entity mentions identical to the segmentation result; if such entity mentions exist, putting them into an entity mention set and putting their corresponding entities into a candidate entity set, wherein an entity mention is a name of an entity, and one entity mention corresponds to a plurality of entities.
In everyday written text, a specific noun may be referred to by an abbreviation or another name; for example, "apple" or "apple company" may both be used to refer to the specific noun "apple company". Such abbreviations or names are called entity mentions, and the specific nouns themselves are entities. The preset mapping table between entity mentions and entities is built from these word correspondences.
The legal text in this step is a sentence or passage entered by a user, and entity mention recognition is performed on it. During recognition, the legal text is first segmented, splitting the sentence or passage into a plurality of words; these words are compared against the mapping table to obtain the entity mentions and their corresponding entities, which are then put into the entity mention set and the candidate entity set, respectively.
The entity mention set is denoted $M = \{m_1, m_2, \dots, m_N\}$, where each $m_i$ is an entity mention that matches a segmentation result in the mapping table. The candidate entity set of $m_i$ is denoted $E_i = \{e_{i1}, e_{i2}, \dots, e_{ik}\}$ $(i = 1, 2, \dots, N)$, where each $e_{ij}$ is an entity that the mapping table associates with $m_i$.
In one embodiment, in step S1, obtaining the legal text and segmenting it to obtain the segmentation result includes:
performing word segmentation on the acquired legal text to obtain a plurality of words as the segmentation result, wherein the minimum size of the segmentation sliding window is a preset minimum segmentation threshold and the maximum size is the length of the legal text.
For example, suppose the input legal text is "apple is sold by apple corporation" (a Chinese sentence of 10 characters), the preset minimum segmentation threshold is 2, and the maximum sliding window is therefore 10. Sliding a window of size 2 over the text yields every two-character substring (for example "apple", "fruit company", and "sales"); a window of size 3 yields every three-character substring (including "apple company" and "company sales"); and so on up to a window of size 10, which yields the whole text. All of the substrings produced at every window size from 2 to 10 together form the segmentation result of this embodiment.
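The sliding-window segmentation described above can be sketched in Python as follows. This is a minimal illustration that assumes the window advances one character at a time and that duplicate substrings are kept; the function name `sliding_window_segment` is hypothetical and not taken from the patent.

```python
from typing import List, Optional


def sliding_window_segment(text: str, min_window: int = 2,
                           max_window: Optional[int] = None) -> List[str]:
    """Return every contiguous substring whose length is in [min_window, max_window].

    Hypothetical helper: by default the maximum window equals the text length,
    as the description specifies. Duplicates are kept in order of appearance.
    """
    if max_window is None or max_window > len(text):
        max_window = len(text)
    segments: List[str] = []
    for size in range(min_window, max_window + 1):    # grow the window
        for start in range(len(text) - size + 1):     # slide one character
            segments.append(text[start:start + size])
    return segments


print(sliding_window_segment("苹果公司"))
# ['苹果', '果公', '公司', '苹果公', '果公司', '苹果公司']
```

The number of substrings grows roughly quadratically with text length, which is why the next step filters them against the mapping table.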
Each term in the segmentation result is looked up in the mapping table to check whether an identical entity mention exists. If the mapping table contains the two entity mentions "apple" and "apple company", the entity mention set is {"apple", "apple company"}. The entities corresponding to these entity mentions are then retrieved from the mapping table: for "apple", the corresponding entities include "apple (fruit of the genus Malus, family Rosaceae)", "Apple (the Apple products company)", and "Apple (a 2008 Korean film directed by Kang Liguan)", and this collection is its candidate entity set. The entity mention "apple company" likewise has its own candidate entity set. At this point, $M$ = ["apple", "apple company"] and $E_1$ = ["apple (fruit of the genus Malus, family Rosaceae)", "Apple (the Apple products company)", "Apple (a 2008 Korean film directed by Kang Liguan)", …].
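A minimal sketch of the lookup that builds $M$ and the candidate sets $E_i$, assuming the mapping table is a Python dict from mention string to a list of entity names. `build_mention_sets` and the sample table are hypothetical; in particular, the entity listed for "apple company" is an illustrative guess, since the text does not enumerate it.

```python
from typing import Dict, List, Tuple


def build_mention_sets(
    segments: List[str], mapping_table: Dict[str, List[str]]
) -> Tuple[List[str], List[List[str]]]:
    """Keep segments that appear as entity mentions in the mapping table.

    Returns M (distinct mentions, in first-seen order) and, for each m_i,
    its candidate entity list E_i.
    """
    mentions: List[str] = []
    candidates: List[List[str]] = []
    for seg in segments:
        if seg in mapping_table and seg not in mentions:
            mentions.append(seg)
            candidates.append(list(mapping_table[seg]))
    return mentions, candidates


# Hypothetical mapping table (mention -> entities)
table = {
    "apple": ["apple (fruit)", "Apple (company)", "Apple (2008 film)"],
    "apple company": ["Apple (company)"],
}
M, E = build_mention_sets(["apple", "sold", "apple company", "apple"], table)
print(M)  # ['apple', 'apple company']
```

Segments that are not in the table (such as "sold") are discarded, which is the filtering effect the description relies on.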
In this embodiment, the minimum segmentation threshold is at least 2 and at most the length of the legal text. Segmenting the legal text in this way produces every possible word, so no word is missed.
In one embodiment, in step S1, the mapping table is a mapping relationship table between entity mentions and the entities in a preset legal knowledge graph and, as shown in FIG. 2, is obtained through the following steps:
Step S101, crawling data: acquiring court judgment documents from preset websites by means of a preset crawler script.
In this step, court judgment documents published on public websites are crawled using web crawler technology. The crawling proceeds as follows:
A website list containing several websites that publish court judgment documents is preset. A browser kernel is invoked to send a web page access request to each website in the list in turn, and the feedback returned by the website is awaited; the feedback either accepts or refuses the access. When feedback accepting the access is received, a web crawler algorithm preset in a database is invoked to collect the content of the court judgment documents, after which the browser kernel is invoked again to access the next website in the list. When feedback refusing the access is received, the browser kernel likewise proceeds to the next website in the list. This continues until every website in the list has been visited, and the court judgment documents collected by the web crawler algorithm are then aggregated.
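The traversal of the website list can be sketched as a loop in which a refused request is simply skipped. The patent does not specify the browser kernel or the crawler algorithm, so both are modeled here as hypothetical callables: `fetch` (returning `None` for a refusal) and `collect`. A real implementation would plug in an HTTP client or browser automation tool; the URLs below are placeholders.

```python
from typing import Callable, List, Optional


def crawl_judgments(
    website_list: List[str],
    fetch: Callable[[str], Optional[str]],
    collect: Callable[[str], List[str]],
) -> List[str]:
    """Visit every website once and aggregate the collected documents.

    fetch(url) models the browser-kernel request: it returns the page text
    when access is accepted and None when access is refused.
    collect(page) models the preset crawler algorithm.
    """
    documents: List[str] = []
    for url in website_list:
        page = fetch(url)
        if page is None:
            continue                     # refused: move on to the next site
        documents.extend(collect(page))  # accepted: gather judgment documents
    return documents


# Stand-in fetcher and collector for illustration only
pages = {"https://site-a.example": "doc1||doc2", "https://site-b.example": None}
docs = crawl_judgments(list(pages), pages.get, lambda p: p.split("||"))
print(docs)  # ['doc1', 'doc2']
```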
Step S102, deconstructing data: deconstructing the content of each court judgment document to obtain node content including, but not limited to, written opinions, points of dispute, and evidence.
Because the format of court judgment documents is largely fixed, their content can be deconstructed with parsing methods such as regular expressions, JSON expressions, or hook expressions.
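Because the document layout is largely fixed, regular expressions are one of the parsing options mentioned above. The sketch assumes a hypothetical plain-text layout with one labelled field per line (`Plaintiff:`, `Dispute focus:`, `Evidence:`); real court judgment documents use Chinese headings and would need their own patterns.

```python
import re
from typing import Dict, List

# Hypothetical labels; real documents would need patterns for their Chinese headings.
FIELD_PATTERNS = {
    "plaintiff": re.compile(r"^Plaintiff:\s*(.+)$", re.MULTILINE),
    "dispute_focus": re.compile(r"^Dispute focus:\s*(.+)$", re.MULTILINE),
    "evidence": re.compile(r"^Evidence:\s*(.+)$", re.MULTILINE),
}


def deconstruct(document: str) -> Dict[str, List[str]]:
    """Extract node content for each field; a field may occur several times."""
    return {
        field: [value.strip() for value in pattern.findall(document)]
        for field, pattern in FIELD_PATTERNS.items()
    }


sample = """Plaintiff: Zhang San
Dispute focus: whether the loan contract is valid
Evidence: loan receipt
Evidence: bank transfer record"""
print(deconstruct(sample)["evidence"])  # ['loan receipt', 'bank transfer record']
```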
When the content of a court judgment document is deconstructed, the same entity may be expressed in several ways; that is, one entity may correspond to several possible Chinese expressions. When the node content is determined, one expression is defined as the entity and the other expressions with the same meaning are defined as its entity mentions; the entity mentions and the entity are then filled into a mapping table, yielding a mapping table between entity mentions and entities. For example, if the obtained node content includes "apple" and "apple company", both of which are used to refer to the specific noun "apple company", then the expressions "apple" and "apple company" as written in the text are entity mentions, and the specific noun "apple company" is the entity.
Step S103, constructing the graph: constructing relations between entities and attributes from the node content to obtain the legal knowledge graph.
Entities include, for example, plaintiffs and points of dispute; relations include, for example, "raises" and "claim established".
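One simple way to hold the graph built in step S103 is a set of (head, relation, tail) triples plus an undirected neighbour index. This representation is an assumption; the neighbour sets are what the correlation score later uses as the sets of entities directly connected to a candidate. `build_graph` and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def build_graph(triples: Iterable[Triple]) -> Tuple[Set[Triple], Dict[str, Set[str]]]:
    """Store (head, relation, tail) edges and an undirected neighbour index."""
    edges: Set[Triple] = set()
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for head, relation, tail in triples:
        edges.add((head, relation, tail))
        neighbours[head].add(tail)  # directly connected entities of head
        neighbours[tail].add(head)  # and of tail
    return edges, dict(neighbours)


edges, nbrs = build_graph([
    ("Plaintiff A", "raises", "Dispute X"),
    ("Dispute X", "supported by", "Evidence Y"),
])
print(sorted(nbrs["Dispute X"]))  # ['Evidence Y', 'Plaintiff A']
```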
Step S104, establishing the mapping relationship: establishing a mapping relationship between each entity in the legal knowledge graph and the entity mentions in the preset mapping relationship table to obtain an updated mapping relationship table.
An initial mapping table between entity mentions and entities may be preset before the legal knowledge graph is built; once the graph has been built, mapping relationships are established between all entities in the legal knowledge graph and the entity mentions in the initial mapping table, yielding the updated mapping relationship table. For example, if the legal knowledge graph contains the entity "apple company" and the initial mapping table contains entity mentions such as "apple" and "apple company", mapping relationships are established between those entity mentions and the entity "apple company" in the legal knowledge graph, so that the candidate entity set can later be determined from the updated mapping relationship table.
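The update in step S104 can be sketched as joining the initial table against the entities actually present in the graph. Here the initial table is assumed to map each mention to entity names, and `kg_nodes` is a hypothetical dict from entity name to graph node identifier. Dropping mentions whose entities are absent from the graph is one possible policy chosen for this sketch, not a rule stated in the patent.

```python
from typing import Dict, List


def update_mapping_table(
    initial_table: Dict[str, List[str]], kg_nodes: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each entity mention to the graph node IDs of its entities.

    initial_table: mention -> entity names (preset before the graph exists)
    kg_nodes: entity name -> node ID in the legal knowledge graph (hypothetical)
    Mentions none of whose entities are in the graph are dropped.
    """
    updated: Dict[str, List[str]] = {}
    for mention, entity_names in initial_table.items():
        linked = [kg_nodes[name] for name in entity_names if name in kg_nodes]
        if linked:
            updated[mention] = linked
    return updated


initial = {"apple": ["apple company"], "apple company": ["apple company"], "pear": ["pear (fruit)"]}
print(update_mapping_table(initial, {"apple company": "node-17"}))
# {'apple': ['node-17'], 'apple company': ['node-17']}
```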
In this embodiment, the data for constructing the legal knowledge graph is obtained through web crawler technology, and the legal knowledge graph is produced by deconstructing the data and constructing the graph; the legal knowledge graph then serves as the basis for entity mention recognition and for determining the final entity mentions.
Step S2, calculating objective functions: calculating an association score between each entity mention in the entity mention set and each of its corresponding candidate entities, calculating a correlation score between any two candidate entities among all the candidate entities corresponding to each entity mention, and adding the association scores to the corresponding correlation scores to obtain a plurality of objective functions.
Each entity mention obtained in step S1 usually has more than one candidate entity, and most candidates are not the entity that is finally selected, so this step disambiguates the candidate entities by calculating association scores. In the candidate entity set, each entity mention corresponds to several candidate entities; a correlation score is calculated between every two of them, and traversing all the candidate entities of an entity mention gives that mention's correlation scores. The association scores obtained for the entity mention are then added to all of these correlation scores, giving a plurality of objective functions. By adding the correlation score to the objective function, this step uses the similarity between candidate entities to perform global disambiguation.
In one embodiment, in step S2, the association score is the product of a context-free score and a context-dependent score.
1) The context-free score preferably uses the Levenshtein string edit distance: the text edit distance score between the entity mention and the candidate entity is taken as the context-free score. The context-free score $\mathrm{sim}(m, e)$ is obtained using the following formula:
wherein $m$ is an entity mention, $e$ is one candidate entity in the corresponding candidate entity set, $|m|$ and $|e|$ are the string lengths of $m$ and $e$ respectively, $ed(m, e)$ is the Levenshtein distance, i.e., the minimum number of edit operations required to convert one of the two strings into the other, and $w_s$ is a preset coefficient.
As an example of the Levenshtein distance, consider the strings kitten and sitting: first, kitten → sitten replaces k with s; second, sitten → sittin replaces e with i; third, sittin → sitting inserts g. Each edit (insertion, deletion, or substitution) costs 1, so $ed(\text{kitten}, \text{sitting}) = 3$. The example is in English, but Chinese strings are handled in the same way.
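The Levenshtein distance can be computed with the standard dynamic-programming recurrence, shown below. The patent's formula for $\mathrm{sim}(m, e)$ did not survive extraction, so `context_free_score` assumes one common normalization, $1 - w_s \cdot ed(m, e) / \max(|m|, |e|)$; treat that form as a placeholder rather than the patented formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))           # distances from "" to b[:j]
    for i, char_a in enumerate(a, start=1):
        current = [i]                            # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def context_free_score(m: str, e: str, w_s: float = 1.0) -> float:
    """ASSUMED form 1 - w_s * ed(m, e) / max(|m|, |e|); the patented formula is not recoverable."""
    longest = max(len(m), len(e))
    if longest == 0:
        return 1.0
    return 1.0 - w_s * levenshtein(m, e) / longest


print(levenshtein("kitten", "sitting"))                    # 3
print(levenshtein("苹果", "苹果公司"))                       # 2
print(round(context_free_score("kitten", "sitting"), 3))   # 0.571
```

Because the distance works on characters, it applies unchanged to Chinese strings, as the description notes.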
2) The context-dependent score is determined by vectorizing the context of the entity mention and the attributes of the candidate entity and calculating the distance between the two vectors.
The attributes of the candidate entity are the attribute information associated with the candidate entity in the preset legal knowledge graph. Vectorization uses an existing natural language processing (NLP) model such as word2vec, an NLP tool that converts words into vectors so that relationships between words can be measured quantitatively and mined. In this step, the word2vec model is called directly to vectorize the context of the entity mention and the attributes of the candidate entity separately.
When calculating the distance between the two vectors, the context-dependent score is preferably obtained as the cosine distance (cosine similarity) of the two vectors, calculated as:
$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|\,\|\vec{b}\|}$$
wherein $\vec{a}$ and $\vec{b}$ denote the two vectors obtained with word2vec, and $\|\vec{a}\|$ and $\|\vec{b}\|$ denote their norms (moduli).
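A sketch of the context-dependent score, assuming the context and attribute texts are already tokenized and that each side is represented by the average of its word vectors (the patent only says word2vec is called directly; averaging is an assumption). The toy `embeddings` dict stands in for a trained model; with gensim, the vectors would come from a trained `Word2Vec` model's `wv` attribute. `association_score` multiplies the cosine by a context-free score supplied by the caller, following the product rule stated above; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_vector(words: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Average the vectors of known words (zero vector if none are known)."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        dim = len(next(iter(embeddings.values())))
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: Vector, b: Vector) -> float:
    """a . b / (|a| |b|); returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def association_score(context_free: float, context: Sequence[str],
                      attributes: Sequence[str], embeddings: Dict[str, Vector]) -> float:
    """phi(m, e) = context-free score * context-dependent (cosine) score."""
    return context_free * cosine(average_vector(context, embeddings),
                                 average_vector(attributes, embeddings))


# Toy 2-D vectors standing in for a trained word2vec model
emb = {"orchard": [1.0, 0.0], "harvest": [0.8, 0.2], "smartphone": [0.0, 1.0]}
print(association_score(0.9, ["orchard"], ["harvest"], emb))
```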
In this embodiment, candidate entity disambiguation is achieved quickly and effectively by means of the Levenshtein distance, the cosine distance, and similar measures.
In one embodiment, in step S2, calculating the correlation score between any two candidate entities among all the candidate entities corresponding to each entity mention includes:
the correlation score $\mathrm{sim}(e_1, e_2)$ between two candidate entities is calculated as follows:
wherein $e_1$ and $e_2$ denote two candidate entities, $E_1$ denotes the set of entities directly connected to $e_1$, $E_2$ denotes the set of entities directly connected to $e_2$, $|E_1|$ and $|E_2|$ denote the numbers of entities in $E_1$ and $E_2$, $E_1 \cap E_2$ denotes the intersection of the two sets, and $|E|$ denotes the total number of entities in the legal knowledge graph.
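The variables defined for $\mathrm{sim}(e_1, e_2)$ (the neighbour sets, their intersection, and the total entity count) match the link-based relatedness measure of Milne and Witten, so the sketch below assumes that form: $1 - \frac{\log\max(|E_1|,|E_2|) - \log|E_1 \cap E_2|}{\log|E| - \log\min(|E_1|,|E_2|)}$. Since the original formula is missing, this is an assumption; returning 0 for disjoint neighbour sets and clamping the result to $[0, 1]$ are also choices made only for this sketch.

```python
import math
from typing import Set


def relatedness(E1: Set[str], E2: Set[str], total_entities: int) -> float:
    """ASSUMED Milne-Witten form of sim(e1, e2), clamped to [0, 1].

    E1, E2: entities directly connected to e1 and e2 in the legal knowledge graph.
    total_entities: |E|, the number of entities in the graph.
    """
    common = len(E1 & E2)
    if common == 0:
        return 0.0                  # no shared neighbours: treat as unrelated
    big, small = max(len(E1), len(E2)), min(len(E1), len(E2))
    denominator = math.log(total_entities) - math.log(small)
    if denominator <= 0:
        return 0.0                  # degenerate graph size; avoid dividing by zero
    score = 1.0 - (math.log(big) - math.log(common)) / denominator
    return min(1.0, max(0.0, score))


print(relatedness({"a", "b"}, {"a", "b"}, 100))                      # 1.0
print(round(relatedness({"a", "b", "c", "d"}, {"a", "b"}, 16), 4))   # 0.6667
```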
The objective function is calculated as follows:
wherein $\phi(m_i, e_i)$ is the association score and $coh(e_i, e_j)$ is the correlation score between two candidate entities.
In this embodiment, the correlation score between two candidate entities is obtained with the above formula. Because a legal text may contain several entity mentions, the correlation scores are added to the objective function so that the similarity between candidate entities is used for global disambiguation.
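Step S3 picks the assignment of one candidate per mention that maximizes the objective. Because the exact objective formula is missing, the sketch assumes it is the sum of the association scores $\phi(m_i, e_i)$ plus the correlation scores $coh(e_i, e_j)$ over all unordered pairs $i < j$, and finds the maximum by exhaustive enumeration with `itertools.product`. Enumeration grows exponentially with the number of mentions, so it only suits short texts; `best_assignment` and the toy scores are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, List, Optional, Sequence, Tuple


def best_assignment(
    mentions: Sequence[str],
    candidates: Sequence[Sequence[str]],
    phi: Callable[[str, str], float],
    coh: Callable[[str, str], float],
) -> Tuple[Optional[List[str]], float]:
    """Exhaustively pick one candidate per mention to maximize
    sum_i phi(m_i, e_i) + sum_{i<j} coh(e_i, e_j)  (assumed objective form)."""
    best: Optional[List[str]] = None
    best_score = float("-inf")
    for assignment in product(*candidates):
        local = sum(phi(m, e) for m, e in zip(mentions, assignment))
        pairwise = sum(coh(a, b) for a, b in combinations(assignment, 2))
        if local + pairwise > best_score:
            best, best_score = list(assignment), local + pairwise
    return best, best_score


# Toy scores: both "apple" candidates match the mention equally well,
# but only the fruit sense is coherent with "juice (drink)".
def toy_phi(mention: str, entity: str) -> float:
    return 1.0 if mention == "juice" else 0.5


def toy_coh(a: str, b: str) -> float:
    return 0.8 if {a, b} == {"apple (fruit)", "juice (drink)"} else 0.1


result, score = best_assignment(
    ["apple", "juice"],
    [["apple (fruit)", "Apple (company)"], ["juice (drink)"]],
    toy_phi, toy_coh,
)
print(result)  # ['apple (fruit)', 'juice (drink)']
```

The toy case shows the point of the correlation term: the association scores alone cannot separate the two senses of "apple", and the coherence with the other mention breaks the tie.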
Step S3, determining and linking: determining, in the entity mention set, the entity mentions with the largest objective function value as the final entity mentions, and linking them to the corresponding entities in the legal knowledge graph.
After all objective functions have been calculated in step S2, the final goal is to maximize the objective function, which yields the entity result for the entity mention set $M = \{m_1, m_2, \dots, m_N\}$, i.e., the mapping between the set $M$ and the entity set.
For example, if the legal text input in step 1) is "apple company's apple", the entity references finally obtained in this step are "apple company" and "apple". The entity corresponding to "apple company" is "apple company", and the entity corresponding to "apple" is "apple (fruit of the genus Malus, family Rosaceae)".
After the final entity references are obtained, each one is linked to the corresponding entity in the legal knowledge graph. This provides a retrieval basis for subsequent legal case retrieval and for evidence-guidance intelligent question answering.
For example, the entity reference "apple company" is linked to the entity "apple company" in the legal knowledge graph, and the entity reference "apple" is linked to the entity "apple (fruit of the genus Malus, family Rosaceae)" in the legal knowledge graph.
In the knowledge-graph-based entity linking method, the legal text is segmented exhaustively, so every possible word is produced and no word is missed. Because this yields a large number of segments, the segments are compared against a preset mapping table. Irrelevant words are discarded, and key words are screened out quickly and efficiently. The matches then populate the entity reference set and the corresponding candidate entity set, which supplies the data needed to determine the correct entity references later. The invention disambiguates among multiple candidate entities by calculating association scores. Because the input legal text may contain several entity references, correlation scores are also added to the objective function, and the similarity among candidate entities is used for global disambiguation. The resulting entity references are then linked, which resolves synonymy and polysemy in the legal text.
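Exhaustive segmentation followed by mapping-table lookup can be sketched in Python as follows. The sliding window grows from a preset minimum length up to the full text length, as in claim 2. The mapping table is modelled as a plain dictionary from entity reference to candidate entities. The function names, the `min_len` default and the toy table are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def segment_all(text: str, min_len: int = 2) -> List[str]:
    """Enumerate every substring whose length lies between min_len and
    len(text): a sliding window from the minimum threshold to the full text."""
    words = []
    for size in range(min_len, len(text) + 1):
        for start in range(len(text) - size + 1):
            words.append(text[start:start + size])
    return words


def lookup_mentions(text: str, mapping: Dict[str, List[str]], min_len: int = 2
                    ) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Keep only segments that appear as entity references in the mapping
    table; return the entity reference set and each reference's candidates."""
    mentions = {w for w in segment_all(text, min_len) if w in mapping}
    candidates = {m: mapping[m] for m in mentions}
    return mentions, candidates
```

Suppose `mapping` has the keys "apple company" and "apple". Then the text "apple company's apple" yields both as entity references, and the lookup discards every other window. The number of windows grows quadratically with text length, which is why the lookup step is needed to prune them.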
In one embodiment, as shown in Fig. 3, a knowledge-graph-based entity linking apparatus is provided, comprising:
the word segmentation and searching module is used for obtaining a legal text, performing word segmentation on it to obtain a word segmentation result, and searching a preset mapping table for entity references identical to the word segmentation result. If any exist, the identical entity references are put into an entity reference set and the entities they correspond to are put into a candidate entity set. An entity reference is the name of an entity, and one entity reference corresponds to a plurality of entities;
the computing module is used for calculating the association score between each entity reference in the entity reference set and its corresponding candidate entities, and for calculating the correlation score between any two candidate entities among all the candidate entities corresponding to each entity reference. It then adds the association scores to the corresponding correlation scores respectively to obtain a plurality of objective functions;
and the determining and linking module is used for determining the entity reference with the largest objective function value as the final entity reference in the entity reference set and linking the final entity reference to the corresponding entity in the legal knowledge graph.
In one embodiment, a computer device is provided, comprising a memory and a processor. The memory stores computer readable instructions that, when executed by the processor, cause the processor to implement the steps of the knowledge-graph-based entity linking method of the above embodiments.
In one embodiment, a storage medium is provided that stores computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge-graph-based entity linking method of the above embodiments. The storage medium may be a non-volatile storage medium.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these features is described. However, any combination that contains no contradiction should be considered within the scope of this specification.
The above embodiments represent only some exemplary embodiments of the invention. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. Those skilled in the art can make several variations and improvements without departing from the concept of the invention, and all of these fall within the scope of protection of the invention. Accordingly, the scope of protection of the invention is defined by the appended claims.

Claims (7)

1. A knowledge-graph-based entity linking method, characterized by comprising the following steps:
obtaining a legal text, performing word segmentation on the legal text to obtain a word segmentation result, and searching a preset mapping table for entity references identical to the word segmentation result; if any exist, putting the identical entity references into an entity reference set and putting the entities corresponding to them into a candidate entity set, wherein an entity reference is the name of an entity, and one entity reference corresponds to a plurality of entities;
calculating the association score between each entity reference in the entity reference set and its corresponding candidate entities, calculating the correlation score between any two candidate entities among all the candidate entities corresponding to each entity reference, and adding the association scores to the corresponding correlation scores respectively to obtain a plurality of objective functions;
in the entity reference set, determining the entity reference with the largest objective function value as the final entity reference, and linking the final entity reference to the corresponding entity in the legal knowledge graph;
wherein calculating the association score between each entity reference in the entity reference set and its corresponding candidate entity comprises:
obtaining the association score by multiplying the context-free score by the context correlation score;
the context-free score $\mathrm{sim}(m, e)$ is obtained using the following formula:
wherein $m$ is an entity reference, $e$ is one candidate entity in the corresponding candidate entity set, $|m|$ and $|e|$ are the string lengths of $m$ and $e$, respectively, $\mathrm{ed}(m, e)$ is the edit distance, i.e., the minimum number of edit operations required to convert one string into the other, and $w_s$ is a preset coefficient;
vectorizing the context of the entity reference and the attributes of the candidate entity, and determining the context correlation score by calculating the distance between the two vectors;
calculating the correlation score between any two candidate entities among all the corresponding candidate entities comprises:
the correlation score $\mathrm{sim}(e_1, e_2)$ between two candidate entities is calculated by the following formula:
wherein $e_1$ and $e_2$ denote two of the candidate entities, $E_1$ and $E_2$ are the sets of entities directly connected to $e_1$ and $e_2$, respectively, $|E_1|$ and $|E_2|$ are the numbers of entities in $E_1$ and $E_2$, $E_1 \cap E_2$ is the intersection of the two sets, and $|E|$ is the number of all entities in the legal knowledge graph;
adding the association scores to the corresponding correlation scores respectively to obtain a plurality of objective functions comprises:
the objective function is calculated by the following formula:
wherein $\phi(m_i, e_i)$ is the association score and $\mathrm{coh}(e_i, e_j)$ is the correlation score between two of the candidate entities.
2. The knowledge-graph-based entity linking method of claim 1, wherein obtaining the legal text and performing word segmentation on the legal text to obtain the word segmentation result comprises:
performing word segmentation on the obtained legal text to obtain a plurality of words as the word segmentation result, wherein, during segmentation, the minimum sliding window is a preset minimum segmentation threshold and the maximum sliding window is the length of the legal text.
3. The knowledge-graph-based entity linking method of claim 1, wherein the mapping table is a table of mapping relations between entity references and entities in a preset legal knowledge graph, and obtaining the mapping table comprises:
acquiring legal judgment documents from a preset website using a preset crawler script;
deconstructing the content of each legal judgment document to obtain node content including, but not limited to, textual matters, notices, disputes, and evidence;
constructing relations between entities and attributes from the node content to obtain the legal knowledge graph;
and establishing, in a preset mapping relation table, a mapping between each entity in the legal knowledge graph and its entity references, to obtain an updated mapping relation table.
4. The knowledge-graph-based entity linking method of claim 1, wherein determining the context correlation score by calculating the distance between the two vectors comprises:
obtaining the context correlation score by calculating the cosine distance between the two vectors, the cosine distance being calculated as:
$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$
wherein $\vec{a}$ and $\vec{b}$ denote the two vectors and $\lVert \cdot \rVert$ denotes the modulus (length) of a vector.
5. An entity linking device based on a knowledge graph, comprising:
the word segmentation and searching module is used for obtaining a legal text, performing word segmentation on the legal text to obtain a word segmentation result, and searching a preset mapping table for entity references identical to the word segmentation result; if any exist, the identical entity references are put into an entity reference set and the entities corresponding to them are put into a candidate entity set, wherein an entity reference is the name of an entity, and one entity reference corresponds to a plurality of entities;
the calculating module is used for calculating the association score between each entity reference in the entity reference set and its corresponding candidate entities, calculating the correlation score between any two candidate entities among all the candidate entities corresponding to each entity reference, and adding the association scores to the corresponding correlation scores respectively to obtain a plurality of objective functions;
the determining and linking module is used for determining the entity reference with the largest objective function value as the final entity reference in the entity reference set and linking the final entity reference to the corresponding entity in the legal knowledge graph;
the calculating module is specifically used for obtaining the association score by multiplying the context-free score by the context correlation score;
the context-free score $\mathrm{sim}(m, e)$ is obtained using the following formula:
wherein $m$ is an entity reference, $e$ is one candidate entity in the corresponding candidate entity set, $|m|$ and $|e|$ are the string lengths of $m$ and $e$, respectively, $\mathrm{ed}(m, e)$ is the edit distance, i.e., the minimum number of edit operations required to convert one string into the other, and $w_s$ is a preset coefficient;
vectorizing the context of the entity reference and the attributes of the candidate entity, and determining the context correlation score by calculating the distance between the two vectors;
the correlation score $\mathrm{sim}(e_1, e_2)$ between two candidate entities is calculated by the following formula:
wherein $e_1$ and $e_2$ denote two of the candidate entities, $E_1$ and $E_2$ are the sets of entities directly connected to $e_1$ and $e_2$, respectively, $|E_1|$ and $|E_2|$ are the numbers of entities in $E_1$ and $E_2$, $E_1 \cap E_2$ is the intersection of the two sets, and $|E|$ is the number of all entities in the legal knowledge graph;
the objective function is calculated by the following formula:
wherein $\phi(m_i, e_i)$ is the association score and $\mathrm{coh}(e_i, e_j)$ is the correlation score between two of the candidate entities.
6. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the knowledge-graph based entity linking method of any one of claims 1 to 4.
7. A storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge-graph based entity linking method of any one of claims 1 to 4.
CN201910992304.3A 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium Active CN110929038B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910992304.3A CN110929038B (en) 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium
PCT/CN2020/111240 WO2021073254A1 (en) 2019-10-18 2020-08-26 Knowledge graph-based entity linking method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910992304.3A CN110929038B (en) 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110929038A CN110929038A (en) 2020-03-27
CN110929038B true CN110929038B (en) 2023-07-21

Family

ID=69849193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910992304.3A Active CN110929038B (en) 2019-10-18 2019-10-18 Knowledge graph-based entity linking method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110929038B (en)
WO (1) WO2021073254A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929038B (en) * 2019-10-18 2023-07-21 平安科技(深圳)有限公司 Knowledge graph-based entity linking method, device, equipment and storage medium
CN111858903A (en) * 2020-06-11 2020-10-30 创新工场(北京)企业管理股份有限公司 Method and device for negative news early warning
CN111814477B (en) * 2020-07-06 2022-06-21 重庆邮电大学 Dispute focus discovery method and device based on dispute focus entity and terminal
CN112231575B (en) * 2020-10-30 2022-05-10 衢州量智科技有限公司 Knowledge recommendation method and system for complex electromechanical product design process
CN113220835B (en) * 2021-05-08 2023-09-29 北京百度网讯科技有限公司 Text information processing method, device, electronic equipment and storage medium
CN113326697A (en) * 2021-05-31 2021-08-31 云南电网有限责任公司电力科学研究院 Knowledge graph-based electric power text entity semantic understanding method
CN113360605B (en) * 2021-06-23 2024-02-23 中国科学技术大学 Global entity linking method based on topic entity context iterative optimization
CN115599903A (en) * 2021-07-07 2023-01-13 腾讯科技(深圳)有限公司(Cn) Object tag obtaining method and device, electronic equipment and storage medium
CN114741627B (en) * 2022-04-12 2023-03-24 中国人民解放军32802部队 Internet-oriented auxiliary information searching method
CN115269879B (en) * 2022-09-05 2023-05-05 北京百度网讯科技有限公司 Knowledge structure data generation method, data search method and risk warning method
CN115809311A (en) * 2022-12-22 2023-03-17 企查查科技有限公司 Data processing method and device of knowledge graph and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224648A (en) * 2015-09-29 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of entity link method and system
CN109255031A (en) * 2018-09-20 2019-01-22 苏州友教习亦教育科技有限公司 The data processing method of knowledge based map
CN109635114A (en) * 2018-12-17 2019-04-16 北京百度网讯科技有限公司 Method and apparatus for handling information

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050198026A1 (en) * 2004-02-03 2005-09-08 Dehlinger Peter J. Code, system, and method for generating concepts
CN103488724B (en) * 2013-09-16 2016-09-28 复旦大学 A kind of reading domain knowledge map construction method towards books
CN106844413B (en) * 2016-11-11 2020-12-08 南京柯基数据科技有限公司 Method and device for extracting entity relationship
CN106886516A (en) * 2017-02-27 2017-06-23 竹间智能科技(上海)有限公司 The method and device of automatic identification statement relationship and entity
CN110929038B (en) * 2019-10-18 2023-07-21 平安科技(深圳)有限公司 Knowledge graph-based entity linking method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110929038A (en) 2020-03-27
WO2021073254A1 (en) 2021-04-22

Similar Documents

Publication Publication Date Title
CN110929038B (en) Knowledge graph-based entity linking method, device, equipment and storage medium
US10740678B2 (en) Concept hierarchies
CN105279252B (en) Excavate method, searching method, the search system of related term
CN106874441B (en) Intelligent question-answering method and device
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN101814067B (en) System and methods for quantitative assessment of information in natural language contents
US20170132288A1 (en) Extracting and Denoising Concept Mentions Using Distributed Representations of Concepts
Sunilkumar et al. A survey on semantic similarity
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN110457708B (en) Vocabulary mining method and device based on artificial intelligence, server and storage medium
US20200372025A1 (en) Answer selection using a compare-aggregate model with language model and condensed similarity information from latent clustering
US20070136336A1 (en) Method and apparatus for constructing a compact similarity structure and for using the same in analyzing document relevance
CN109325201A (en) Generation method, device, equipment and the storage medium of entity relationship data
CN111190997A (en) Question-answering system implementation method using neural network and machine learning sequencing algorithm
CN112559684A (en) Keyword extraction and information retrieval method
CN111753167B (en) Search processing method, device, computer equipment and medium
US11461613B2 (en) Method and apparatus for multi-document question answering
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
CN110909539A (en) Word generation method, system, computer device and storage medium of corpus
KR102059743B1 (en) Method and system for providing biomedical passage retrieval using deep-learning based knowledge structure construction
AU2018226420B2 (en) Voice assisted intelligent searching in mobile documents
CN110969005B (en) Method and device for determining similarity between entity corpora
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
CN112883182A (en) Question-answer matching method and device based on machine reading
CN112015907A (en) Method and device for quickly constructing discipline knowledge graph and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant