CN110795926B - Judgment document similarity judgment method and system based on legal knowledge graph - Google Patents

Judgment document similarity judgment method and system based on legal knowledge graph

Info

Publication number
CN110795926B
CN110795926B CN202010004494.6A CN202010004494A CN110795926B CN 110795926 B CN110795926 B CN 110795926B CN 202010004494 A CN202010004494 A CN 202010004494A CN 110795926 B CN110795926 B CN 110795926B
Authority
CN
China
Prior art keywords
knowledge
legal knowledge
referee
vector
legal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010004494.6A
Other languages
Chinese (zh)
Other versions
CN110795926A (en)
Inventor
翁洋
王竹
李鑫
Other inventors requested that their names not be disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Xingyun Law Technology Co Ltd
Sichuan University
Original Assignee
Chengdu Xingyun Law Technology Co Ltd
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Xingyun Law Technology Co Ltd and Sichuan University
Priority to CN202010004494.6A
Publication of CN110795926A
Application granted
Publication of CN110795926B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Abstract

The invention discloses a judgment document similarity judgment method and system based on a legal knowledge graph. The method comprises the following steps: acquiring a judgment document A and a judgment document B; constructing a legal knowledge graph A' corresponding to the case of judgment document A and a legal knowledge graph B' corresponding to the case of judgment document B; converting the legal knowledge graph A' into a vector A'' and the legal knowledge graph B' into a vector B''; and comparing the vector A'' with the vector B'' to obtain a similarity judgment result for judgment document A and judgment document B. Using a knowledge embedding algorithm, the method and system convert a legal knowledge graph from a graph structure that is difficult to compute on into a mathematical vector, support pairwise comparison of judgment documents based on semantic similarity, and make it possible to judge the similarity between judgment documents more accurately and efficiently.

Description

Judgment document similarity judgment method and system based on legal knowledge graph
Technical Field
The invention relates to the field of natural language processing, and in particular to a judgment document similarity judgment method and system based on a legal knowledge graph.
Background
In recent years, knowledge graph applications have been a hallmark of the big data era. A knowledge graph is essentially a semantic network: a graph-based data structure consisting of nodes and edges. Each node represents an "entity" that exists in the real world, and each edge represents a "relationship" between entities. Knowledge graphs are one of the most effective ways to represent relationships. The key idea is to collect a large amount of structured or unstructured data, analyze and model it using domain expertise, discover through computation the regularities that typically govern the domain, and finally enable the machine to recognize those regularities and learn computation rules for the related data.
Many methods exist for constructing a knowledge graph. Construction typically relies on web crawling, search query logs, or Bootstrapping-based multi-class collaborative pattern learning, and these methods are currently applied mainly to optimizing existing search engines. Up to 80% of the world's data is unstructured, and most existing techniques cannot identify and analyze such data.
In the legal field, judgment documents are unstructured data, and the prior art cannot judge the similarity between judgment documents directly from their corresponding legal knowledge graphs.
Disclosure of Invention
Using a knowledge embedding method, the method and system convert a legal knowledge graph from a graph structure that is difficult to compute on into a mathematical vector representation, support pairwise comparison of judgment documents based on semantic similarity, and make it possible to judge the similarity between judgment documents more accurately and efficiently.
To achieve the above object, the invention provides a judgment document similarity judgment method based on a legal knowledge graph, comprising:
step 1: acquiring a judgment document A and a judgment document B;
step 2: constructing a legal knowledge graph A' corresponding to the case of judgment document A and a legal knowledge graph B' corresponding to the case of judgment document B;
step 3: converting the legal knowledge graph A' into a vector A'' and the legal knowledge graph B' into a vector B'';
step 4: calculating the cosine similarity of the vector A'' and the vector B'' to obtain a similarity judgment result for judgment document A and judgment document B.
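Step 4 can be illustrated with a minimal sketch of cosine similarity between two document vectors. The function name, the zero-vector convention, and the 4-dimensional sample vectors are assumptions made for illustration; the patent specifies only that cosine similarity is computed.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch; the patent does not address zero vectors.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical document vectors, for illustration only.
vec_a = [1.0, 0.0, 2.0, 0.0]
vec_b = [2.0, 0.0, 4.0, 0.0]   # same direction as vec_a
vec_c = [0.0, 3.0, 0.0, 1.0]   # orthogonal to vec_a
```

In this sketch, vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, so a higher value would indicate more similar judgment documents.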
Overall, the technical scheme of the invention is as follows. First, a legal knowledge graph based on case facts is constructed. Next, the triples (entity 1, relation, entity 2) that make up the legal knowledge graph are expressed as vectors using a knowledge embedding method (TransE, TransH, TransR, TransG, DistMult, HolE, and so on). A vector representation of the case-fact-based legal knowledge graph of each judgment document is then computed, which supports pairwise similarity comparison of judgment documents and solves the problem of comparing judgment documents based on semantic similarity.
In the invention, the legal knowledge graph corresponding to the case of a judgment document can be constructed in various ways. Many construction methods currently exist; construction typically relies on web crawling, search query logs, or Bootstrapping-based multi-class collaborative pattern learning, and the graph can also be constructed in the manner described in the published patent document CN 201710339258.8.
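As background for the knowledge embedding step, the sketch below shows the scoring idea behind TransE, the first algorithm listed: a relation is modeled as a translation, so for a plausible triple the head vector plus the relation vector should land near the tail vector. The use of the L2 norm and the 2-dimensional sample vectors are assumptions of this sketch; the patent does not specify the norm, the loss, or any training details.

```python
import math

def transe_score(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Hypothetical 2-dimensional embeddings, for illustration only.
h = [0.1, 0.2]
r = [0.3, -0.1]
t_good = [0.4, 0.1]    # close to h + r
t_bad = [-0.5, 0.9]    # far from h + r
```

Training would adjust the embeddings so that triples present in the graph score lower than corrupted ones; the other listed algorithms use different scoring functions.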
Preferably, step 3 uses a knowledge embedding algorithm to represent the triples of the legal knowledge graph as vectors.
Preferably, the knowledge embedding algorithm includes, but is not limited to: the TransE algorithm, the TransH algorithm, the TransR algorithm, the TransG algorithm, the DistMult algorithm, and the HolE algorithm.
Preferably, step 3 further includes training the knowledge embedding algorithm with a training set, validating it with a validation set, and testing it with a test set.
Preferably, the legal knowledge graph comprises a plurality of triples (h, r, t), where h represents the head entity, r represents the relation, and t represents the tail entity.
Preferably, step 2 further comprises: collecting sample data comprising a plurality of judgment documents, generalizing the triples of the legal knowledge graphs corresponding to all judgment documents in the sample data, and defining the party roles of the cases as plaintiff, defendant, and third party.
Preferably, the numbers of all entities and all relations in the sample data are counted, and the entities and relations are entered into a newly created entity dictionary and a newly created relation dictionary, with id numbers starting at 0 and ending at the number of entities; the entity dictionary and the relation dictionary each have two columns and store the entities or relations in one-to-one correspondence with id numbers, establishing a first data set and a second data set respectively;
the triples (h, r, t) of the legal knowledge graph are replaced one by one with the id numbers from the first data set and the second data set and saved in (h, t, r) order to a third data set;
the third data set is divided into a training set, a validation set, and a test set;
the knowledge embedding algorithm is trained with the training set, validated with the validation set, and tested with the test set.
Preferably, based on a knowledge embedding algorithm that has passed testing, a vector representation of each entity and relation in the sample data is obtained by setting the relevant dimension parameters.
Preferably, using the tested knowledge embedding algorithm, the entity and relation of each triple in the legal knowledge graph of each judgment document are replaced with the corresponding vectors, and the vectors are averaged to obtain a multidimensional vector representation of the legal knowledge graph of that judgment document.
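One possible reading of the generalization step is that party names are replaced by their case roles, so that triples from different cases share the same entity vocabulary. The sketch below assumes this reading; the function name, the role mapping, the party names, and the relation labels are all hypothetical.

```python
def generalize_triples(triples, party_roles):
    """Replace party names in (h, r, t) triples with their case role."""
    return [
        (party_roles.get(h, h), r, party_roles.get(t, t))
        for h, r, t in triples
    ]

# Hypothetical parties and triples, for illustration only.
party_roles = {"Zhang San": "plaintiff", "Li Si": "defendant"}
triples = [
    ("Li Si", "drives", "vehicle"),
    ("Zhang San", "lends_money_to", "Li Si"),
]
generalized = generalize_triples(triples, party_roles)
```

Entities that are not parties, such as "vehicle" here, pass through unchanged.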
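The dictionary construction, id replacement, and data split described above can be sketched as follows. The helper names and sample triples are hypothetical; the default 6:2:2 split mirrors the ratio given in the detailed description, while the shuffling and the fixed seed are assumptions of this sketch.

```python
import random

def build_id_dict(names):
    """Assign consecutive ids, starting at 0, in order of first appearance."""
    ids = {}
    for name in names:
        if name not in ids:
            ids[name] = len(ids)
    return ids

def encode_triples(triples):
    """Build both dictionaries and rewrite each (h, r, t) as ids in (h, t, r) order."""
    entity2id = build_id_dict(x for h, _, t in triples for x in (h, t))
    relation2id = build_id_dict(r for _, r, _ in triples)
    encoded = [(entity2id[h], entity2id[t], relation2id[r]) for h, r, t in triples]
    return entity2id, relation2id, encoded

def split_dataset(rows, ratios=(6, 2, 2), seed=0):
    """Shuffle and split rows into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    total = sum(ratios)
    n_train = len(rows) * ratios[0] // total
    n_valid = len(rows) * ratios[1] // total
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]

# Hypothetical generalized triples, for illustration only.
sample = [
    ("plaintiff", "lends_money_to", "defendant"),
    ("defendant", "drives", "vehicle"),
]
entity2id, relation2id, encoded = encode_triples(sample)
```

Integer arithmetic is used for the split sizes so that rounding never drops a row; any remainder goes to the test set.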
In another aspect, the invention also provides a judgment document similarity judgment system based on a legal knowledge graph, comprising:
a judgment document acquisition unit for acquiring a judgment document A and a judgment document B;
a legal knowledge graph construction unit for constructing a legal knowledge graph A' corresponding to the case of judgment document A and a legal knowledge graph B' corresponding to the case of judgment document B;
a vector conversion unit for converting the legal knowledge graph A' into a vector A'' and the legal knowledge graph B' into a vector B'';
and a judgment document similarity judgment unit for calculating the cosine similarity of the vector A'' and the vector B'' to obtain a similarity judgment result for judgment document A and judgment document B.
One or more technical schemes provided by the invention have at least the following technical effects or advantages:
The knowledge representation of a case-fact-based legal knowledge graph expresses case facts as a graph. Unlike traditional keyword retrieval, it presents the full circumstances of each case and captures the relationships among entities, including both factual relationships (for example, A drives a vehicle) and legal relationships (for example, A and B form a loan relationship). Because each case is presented in full, the shortcomings of current keyword matching (such as the inability to distinguish negated descriptions or to describe the complete case) are avoided when comparing the similarity of large numbers of cases, and case matching in a substantive sense is achieved. The method supports pairwise comparison of judgment documents based on semantic similarity and makes it possible to judge the similarity between judgment documents more accurately and efficiently.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
FIG. 1 is a schematic flow chart of the judgment document similarity judgment method based on a legal knowledge graph;
FIG. 2 is a schematic diagram of the judgment document similarity judgment system based on a legal knowledge graph;
FIG. 3 is a schematic flow chart of the method of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflicting with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
First, case-fact-based legal knowledge graphs are constructed for various case types. The legal knowledge graph built for a specific case type (road traffic, divorce, labor disputes, and so on) is then converted into a vector representation, so that the internal structure of its entities and relations is expressed as mathematical vectors, supporting pairwise comparison of judgment documents based on semantic similarity.
Referring to FIG. 1, an embodiment of the present invention provides a judgment document similarity judgment method based on a legal knowledge graph, comprising:
step 1: acquiring a judgment document A and a judgment document B;
step 2: constructing a legal knowledge graph A' corresponding to the case of judgment document A and a legal knowledge graph B' corresponding to the case of judgment document B;
step 3: converting the legal knowledge graph A' into a vector A'' and the legal knowledge graph B' into a vector B'';
step 4: comparing the vector A'' with the vector B'' to obtain a similarity judgment result for judgment document A and judgment document B.
Referring to FIG. 2, an embodiment of the present invention provides a judgment document similarity judgment system based on a legal knowledge graph, the system comprising:
a judgment document acquisition unit for acquiring a judgment document A and a judgment document B;
a legal knowledge graph construction unit for constructing a legal knowledge graph A' corresponding to the case of judgment document A and a legal knowledge graph B' corresponding to the case of judgment document B;
a vector conversion unit for converting the legal knowledge graph A' into a vector A'' and the legal knowledge graph B' into a vector B'';
and a judgment document similarity judgment unit for comparing the vector A'' with the vector B'' to obtain a similarity judgment result for judgment document A and judgment document B.
Referring to FIG. 3, the method includes the following steps:
1. The knowledge graph is represented by multiple triples (h, r, t); that is, multiple (h, r, t) groups together represent the knowledge graph of one case;
2. All triples are generalized, and the party roles are defined as plaintiff, defendant, and third party;
3. The numbers of all entities and relations are counted, and the entities and relations are added to a newly created entity dictionary and a newly created relation dictionary, with id numbers starting at 0 and ending at the number of entities. Each dictionary has two columns and stores id numbers in one-to-one correspondence with entity or relation names, giving the data sets entity2id and relation2id respectively;
4. The triples (h, r, t) of the case knowledge graph are replaced one by one with the numbers in entity2id and relation2id and saved in (h, t, r) order. For example, if a triple is finally replaced by (2, 1, 3), then 2 refers to the second entry of the entity2id dictionary, 1 refers to the first entry of the entity2id dictionary, and 3 refers to the third entry of the relation2id dictionary; the result is saved to a new data set;
5. The new data set is divided into a training set train2id, a validation set valid2id, and a test set test2id in a 6:2:2 ratio;
6. A knowledge embedding algorithm such as TransE, TransH, TransR, TransG, DistMult, or HolE is trained on the train2id data, validated on the valid2id data, and tested on the test2id data;
7. Using the knowledge embedding algorithm, a vector representation of each entity and relation is obtained by setting the relevant dimension parameter (for example, 100 or 200 dimensions); if the parameter is set to 100, each entity is represented by a 100-dimensional vector;
8. The entity and relation of each triple in the legal knowledge graph of each judgment document are replaced with the trained vectors, and the results are averaged. For example, if the algorithm above produces 100-dimensional vectors for entities and relations, the vector formed from (h, r, t) has 300 dimensions, so the legal knowledge graph of each judgment document is finally represented by a 300-dimensional vector;
9. Each judgment document is thereby converted into a vector, which supports pairwise similarity comparison of judgment documents based on their knowledge graph vector representations.
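Step 8 can be read as concatenating the vectors of h, r, and t for each triple (giving 3 × d dimensions) and then averaging those concatenated vectors over all triples of a document. The sketch below follows that reading, which is an interpretation rather than something the text states outright; the function names and the 2-dimensional sample embeddings are hypothetical, and triples are passed in (h, r, t) order, so ids stored in (h, t, r) order would need reordering first.

```python
def triple_vector(h, r, t, entity_vecs, relation_vecs):
    """Concatenate the vectors of h, r and t into one 3 * d dimensional list."""
    return entity_vecs[h] + relation_vecs[r] + entity_vecs[t]

def document_vector(triples, entity_vecs, relation_vecs):
    """Element-wise mean of the concatenated triple vectors of one document."""
    if not triples:
        raise ValueError("a knowledge graph needs at least one triple")
    rows = [triple_vector(h, r, t, entity_vecs, relation_vecs) for h, r, t in triples]
    return [sum(col) / len(rows) for col in zip(*rows)]

# Hypothetical trained embeddings with d = 2, keyed by id, for illustration only.
entity_vecs = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
relation_vecs = {0: [0.5, 0.5], 1: [-0.5, 0.5]}
doc_triples = [(0, 0, 1), (1, 1, 2)]   # (h, r, t) ids
doc_vec = document_vector(doc_triples, entity_vecs, relation_vecs)
```

With d = 100 the same code would yield the 300-dimensional document vector mentioned in step 8, and two such vectors could then be compared directly.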
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. A judgment document similarity judgment method based on a legal knowledge graph, characterized by comprising the following steps:
step 1: acquiring a judgment document A and a judgment document B;
step 2: constructing a legal knowledge graph A' corresponding to the case of judgment document A and a legal knowledge graph B' corresponding to the case of judgment document B;
step 3: converting the legal knowledge graph A' into a vector A'' and the legal knowledge graph B' into a vector B'';
step 4: comparing the vector A'' with the vector B'' to obtain a similarity judgment result for judgment document A and judgment document B;
wherein step 2 further comprises: collecting sample data comprising a plurality of judgment documents, generalizing the triples in the legal knowledge graphs corresponding to all judgment documents in the sample data, and defining the party roles of the cases as plaintiff, defendant, and third party;
counting the numbers of all entities and all relations in the sample data and entering the entities and relations into a newly created entity dictionary and a newly created relation dictionary, with id numbers starting at 0 and ending at the actual count; the entity dictionary and the relation dictionary each have two columns and store the entities or relations in one-to-one correspondence with id numbers, establishing a first data set and a second data set respectively;
replacing the triples (h, r, t) in the legal knowledge graph one by one with the id numbers in the first data set and the second data set, and saving them in (h, t, r) order to a third data set;
dividing the third data set into a training set, a validation set, and a test set;
training the knowledge embedding algorithm with the training set, validating it with the validation set, and testing it with the test set.
2. The judgment document similarity judgment method based on a legal knowledge graph according to claim 1, wherein step 3 uses a knowledge embedding algorithm to represent the triples of the legal knowledge graph as vectors.
3. The judgment document similarity judgment method based on a legal knowledge graph according to claim 2, wherein the knowledge embedding algorithm comprises: the TransE algorithm, the TransH algorithm, the TransR algorithm, the TransG algorithm, the DistMult algorithm, and the HolE algorithm.
4. The method according to claim 1, wherein step 3 further comprises training the knowledge embedding algorithm with a training set, validating it with a validation set, and testing it with a test set.
5. The method according to claim 4, wherein the legal knowledge graph comprises a plurality of triples (h, r, t), where h represents the head entity, r represents the relation, and t represents the tail entity.
6. The judgment document similarity judgment method based on a legal knowledge graph according to claim 1, wherein a vector representation of each entity and relation in the sample data is obtained by setting relevant dimension parameters based on a tested knowledge embedding algorithm.
7. The judgment document similarity judgment method based on a legal knowledge graph according to claim 6, wherein the entities and relations of each triple in the legal knowledge graph of each judgment document are converted into corresponding vectors by the tested knowledge embedding algorithm, and the vectors are averaged to obtain a multidimensional vector representation of the legal knowledge graph of that judgment document.
CN202010004494.6A 2020-01-03 2020-01-03 Judgment document similarity judgment method and system based on legal knowledge graph Active CN110795926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010004494.6A CN110795926B (en) 2020-01-03 2020-01-03 Judgment document similarity judgment method and system based on legal knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010004494.6A CN110795926B (en) 2020-01-03 2020-01-03 Judgment document similarity judgment method and system based on legal knowledge graph

Publications (2)

Publication Number Publication Date
CN110795926A CN110795926A (en) 2020-02-14
CN110795926B CN110795926B (en) 2020-04-07

Family

ID=69448489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010004494.6A Active CN110795926B (en) 2020-01-03 2020-01-03 Judgment document similarity judgment method and system based on legal knowledge graph

Country Status (1)

Country Link
CN (1) CN110795926B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459973B (en) * 2020-06-16 2020-10-23 四川大学 Case type retrieval method and system based on case situation triple information
CN111858940B (en) * 2020-07-27 2023-07-25 湘潭大学 Multi-head attention-based legal case similarity calculation method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908671A (en) * 2017-10-25 2018-04-13 南京擎盾信息科技有限公司 Knowledge graph construction method and system based on legal data
CN108073673A (en) * 2017-05-15 2018-05-25 北京华宇元典信息服务有限公司 Machine-learning-based legal knowledge graph construction method, apparatus, system, and medium
CN108733798A (en) * 2018-05-17 2018-11-02 电子科技大学 Personalized recommendation method based on a knowledge graph
CN110147450A (en) * 2019-05-06 2019-08-20 北京科技大学 Knowledge completion method and device for knowledge graphs
CN110489751A (en) * 2019-08-13 2019-11-22 腾讯科技(深圳)有限公司 Text similarity computing method and device, storage medium, electronic equipment
CN110598006A (en) * 2019-09-17 2019-12-20 南京医渡云医学技术有限公司 Model training method, triplet embedding method, apparatus, medium, and device

Also Published As

Publication number Publication date
CN110795926A (en) 2020-02-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant