CN111858940A

CN111858940A - Multi-head attention-based legal case similarity calculation method and system

Info

Publication number: CN111858940A
Application number: CN202010733019.2A
Authority: CN
Inventors: 程戈; 张冬良; 肖冬梅
Original assignee: Xiangtan University
Current assignee: Xiangtan University
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2020-10-30
Anticipated expiration: 2040-07-27
Also published as: CN111858940B

Abstract

A legal case similarity calculation method based on multi-head attention comprises the following steps: 1) inputting the case descriptions of the two cases whose similarity is to be calculated into a case legal relationship recognition model, extracting from each case the triples formed by its legal relationships, each triple consisting of a head entity, a relationship and a tail entity, and constructing a case knowledge graph from the triples; 2) converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism; 3) inputting the vectorized representations of the cases into a deep case perception model to obtain the similarity of the two cases. By calculating this similarity, the technical scheme can improve the accuracy of legal applications such as similar case retrieval and similar case recommendation.

Description

Multi-head attention-based legal case similarity calculation method and system

Technical Field

The invention relates to a legal case similarity calculation method, in particular to one based on multi-head attention, and belongs to the technical field of text recognition. The invention also relates to a legal case similarity calculation system based on multi-head attention.

Background

China is a statutory-law country: judges decide cases according to legal provisions rather than precedent. Nevertheless, the public's sense of justice requires that like cases be decided alike, and when the same court decides like cases differently within a short period, the credibility of judicial justice suffers greatly. Research on case similarity calculation, which makes it possible to accurately match similar cases, is therefore of great significance for realizing justice. In practice, judicial workers usually screen cases with full-text search, keyword matching and keyword-restricted search, and then still rely on manual judgment to decide whether the screened cases are similar to the target case, which is time-consuming and inefficient.

Methods for calculating case similarity in the research literature fall into three main categories. The first is based on word vectors: for example, a vector space is formed from the nouns in legal texts and legal text similarity is computed with TF-IDF, or a vector space is formed from legal terminology and similarity is computed with the cosine similarity formula. The second is based on word embedding models: one approach uses a hierarchical attention mechanism in a twin (Siamese) network structure to improve document representation and compresses document content to avoid data sparsity in long documents; another computes the relevance between news and cases with an asymmetric twin network, addressing text imbalance and redundancy in news texts. The third is based on event templates: medical events are extracted from medical dispute cases according to predefined event templates, and case similarity is computed from the similarity of the event tuples.

Word vector methods and word embedding models do not draw on legal domain knowledge or take the characteristics of legal texts into account, so the similarity they compute is not accurate enough. The event-template approach does use legal domain knowledge, but in real cases with the same cause of action the events are extremely similar: in private lending cases, for example, the events are essentially borrowing at some time and place and owing a debt at some time and place. Extracting events according to predefined event templates therefore represents cases only at a coarse granularity, omits key information that affects case similarity, and yields low matching accuracy.

Therefore, how to provide a multi-head attention-based legal case similarity calculation method that improves the accuracy of legal applications such as similar case retrieval and similar case recommendation is a technical problem that those skilled in the art urgently need to solve.

Disclosure of Invention

To address the shortcomings of the prior art, the invention extracts the legal relationships in case descriptions to form case knowledge graphs, obtains vectorized representations of the case knowledge graphs with a multi-head attention mechanism, and finally calculates the similarity of two cases with a deep case perception network. Training the deep case perception network with the assistance of structured legal relationships can greatly improve the accuracy of case similarity calculation. The invention provides a legal case similarity calculation method based on multi-head attention, comprising the following steps: 1) inputting the case descriptions of the two cases whose similarity is to be calculated into a case legal relationship recognition model, extracting from each case the triples formed by its legal relationships, each triple consisting of a head entity, a relationship and a tail entity, and constructing a case knowledge graph from the triples; 2) converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism; 3) inputting the vectorized representations of the cases into a deep case perception model to obtain the similarity of the two cases.

According to a first embodiment of the present invention, a legal case similarity calculation method based on multi-head attention is provided, comprising the following steps:

1) inputting the case descriptions of the two cases whose similarity is to be calculated into a case legal relationship recognition model, extracting from each case the triples formed by its legal relationships, each triple consisting of a head entity, a relationship and a tail entity, and constructing a case knowledge graph from the triples;

2) converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism;

3) inputting the vectorized representations of the cases into a deep case perception model to obtain the similarity of the two cases.
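As an illustration of step 1), the sketch below stores a case knowledge graph as a directed, relation-labelled adjacency map built from the extracted triples. This storage format is an assumption (the text does not specify one), and the `Triple` type, the `build_case_graph` function and the example entities are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A (head entity, relation, tail entity) triple extracted from a case description."""
    head: str
    relation: str
    tail: str


def build_case_graph(triples):
    """Build a directed, labelled adjacency map: head -> list of (relation, tail).

    Duplicate triples are dropped so each legal relation appears once.
    """
    graph = defaultdict(list)
    seen = set()
    for t in triples:
        if t in seen:
            continue
        seen.add(t)
        graph[t.head].append((t.relation, t.tail))
    return dict(graph)


# Hypothetical triples for a private lending case (names and relations are illustrative).
case_triples = [
    Triple("Zhang San", "role", "plaintiff"),
    Triple("Li Si", "role", "defendant"),
    Triple("Li Si", "spouse", "Wang Wu"),
    Triple("Zhao Liu", "guarantor", "Li Si"),
    Triple("Li Si", "spouse", "Wang Wu"),  # duplicate, ignored
]
graph = build_case_graph(case_triples)
print(graph["Li Si"])  # [('role', 'defendant'), ('spouse', 'Wang Wu')]
```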

Preferably, the case legal relationship recognition model in step 1) includes: an entity identification module for identifying the entities in the case description;

and a relationship extraction module for identifying the relationships between the entities in the case description.

Preferably, when the case description is structured or semi-structured text, the entity identification module identifies entities with a rule-based method.

Preferably, the construction method of the case legal relationship recognition model in step 1) comprises the following steps:

1a) determining the entities and relationships that have a significant influence on case similarity calculation, and constructing a training data set for the case legal relationship recognition model;

1b) constructing the entity identification module with a BiLSTM-CRF-based entity recognition network;

1c) constructing the relationship extraction module with a BERT-based relationship extraction network.

Preferably, step 2) specifically comprises the following steps:

2a) initializing the vectorized representations of the head entities, relationships and tail entities of the triples;

2b) combining the vectorized representations of the entities and relationship of each triple, and calculating the feature vector of the triple with a projection matrix;

2c) calculating the attention value of each triple using LeakyReLU;

2d) calculating the relative attention values of triples sharing the same head entity using a softmax function;

2e) calculating a new vectorized representation of each entity from the relative attention values and feature vectors of the triples;

2f) using a multi-head attention mechanism to encapsulate complex legal relationships and updating the vectorized representations of the entities again.
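Steps 2b) to 2f) resemble graph attention over knowledge graph triples. The following NumPy sketch shows one plausible reading, not the patent's implementation: the concatenation order, the LeakyReLU slope of 0.2, the omission of an output nonlinearity and all function names are assumptions.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """LeakyReLU: x where x > 0, slope * x elsewhere (slope 0.2 is an assumed value)."""
    return np.where(x > 0, x, slope * x)


def triple_attention_head(H, L, triples, W1, w2):
    """One attention head over the triples of a case knowledge graph (steps 2b-2e).

    H: (num_entities, d) entity vectors; L: (num_relations, d) relation vectors.
    triples: (T, 3) int array of (head, relation, tail) indices.
    W1: (d_out, 3 * d) projection matrix; w2: (d_out,) attention weight vector.
    """
    heads, rels, tails = triples[:, 0], triples[:, 1], triples[:, 2]
    # 2b) combine [h_i ; h_j ; l_k] and project each triple to a feature vector c_ijk
    c = np.concatenate([H[heads], H[tails], L[rels]], axis=1) @ W1.T  # (T, d_out)
    # 2c) unnormalised attention value of each triple
    b = leaky_relu(c @ w2)  # (T,)
    out = np.zeros((H.shape[0], W1.shape[0]))
    for e in np.unique(heads):
        mask = heads == e
        # 2d) softmax over the triples sharing head entity e (max subtracted for stability)
        weights = np.exp(b[mask] - b[mask].max())
        weights /= weights.sum()
        # 2e) new entity vector: attention-weighted sum of its triples' feature vectors
        out[e] = weights @ c[mask]
    return out  # entities that head no triple keep a zero vector in this sketch


def multi_head_triple_attention(H, L, triples, params):
    """2f) run M independent heads, one (W1, w2) pair each, and concatenate their outputs."""
    return np.concatenate(
        [triple_attention_head(H, L, triples, W1, w2) for W1, w2 in params], axis=1
    )


# Toy graph: 3 entities, 2 relations, 3 triples; d = 4, d_out = 3, M = 2 heads.
rng = np.random.default_rng(0)
d, d_out, M = 4, 3, 2
H = rng.normal(size=(3, d))
L = rng.normal(size=(2, d))
triples = np.array([[0, 0, 1], [0, 1, 2], [1, 0, 2]])
params = [(rng.normal(size=(d_out, 3 * d)), rng.normal(size=d_out)) for _ in range(M)]
H_new = multi_head_triple_attention(H, L, triples, params)
print(H_new.shape)  # (3, 6)
```

Because each head has its own projection matrix and attention vector, concatenating the M head outputs lets different heads weight the same triples differently.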

Preferably, the deep case perception model in step 3) is a trained model, and its construction method specifically comprises the following steps:

3a) collecting a number of similar case pairs under a given cause of action as positive examples, and forming negative examples by randomly replacing one case in some of the similar case pairs, so as to construct a similar case data set;

3b) inputting each case of the case pairs in the similar case data set into the case legal relationship recognition model, constructing a case knowledge graph for each case, and obtaining the vectorized representation of each case knowledge graph with the multi-head attention mechanism;

3c) concatenating the vectorized representations of the entities and relationships of each case pair, constructing a deep case perception network, and training the network with the vectorized case representations to obtain the deep case perception model.
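Step 3a) can be sketched as follows, assuming cases are identified by string ids and that the second case of a pair is the one replaced. The function name, the `neg_ratio` parameter and the sample ids are hypothetical.

```python
import random


def build_similar_case_dataset(positive_pairs, all_cases, neg_ratio=0.5, seed=0):
    """Label known similar pairs 1 and corrupt a fraction of them into negatives labelled 0.

    positive_pairs: list of (case_a, case_b) judged similar (e.g. by manual annotation).
    all_cases: pool of case ids from which random replacements are drawn.
    neg_ratio: hypothetical fraction of positive pairs turned into negatives.
    """
    rng = random.Random(seed)
    dataset = [(a, b, 1) for a, b in positive_pairs]
    known = set(positive_pairs) | {(b, a) for a, b in positive_pairs}
    for a, b in rng.sample(positive_pairs, int(len(positive_pairs) * neg_ratio)):
        # Replace the second case with a random case that is not a known similar partner of a.
        candidates = [c for c in all_cases if c != a and (a, c) not in known]
        dataset.append((a, rng.choice(candidates), 0))
    return dataset


pairs = [("c1", "c2"), ("c3", "c4"), ("c5", "c6"), ("c7", "c8")]
cases = [f"c{i}" for i in range(1, 11)]
data = build_similar_case_dataset(pairs, cases)
print(len(data))  # 6: 4 positives and 2 negatives
```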

Preferably, the case description includes case descriptions in legal documents such as complaints and decision documents, as well as case descriptions in informal legal texts that describe case information.

According to a second embodiment of the present invention, a legal case similarity calculation system based on multi-head attention is provided:

a multi-head attention-based legal case similarity calculation system, the system comprising:

a case knowledge graph construction module, used for inputting the case descriptions of the two cases whose similarity is to be calculated into the case legal relationship recognition model, extracting from each case the triples formed by its legal relationships, and constructing a case knowledge graph from the triples, wherein each triple consists of a head entity, a relationship and a tail entity;

a case knowledge graph representation module, in signal connection with the case knowledge graph construction module and used for converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism;

and a deep case perception network module, in signal connection with the case knowledge graph representation module and used for inputting the vectorized representations into the deep case perception model to obtain the similarity of the two cases.

Preferably, the case legal relationship identification model includes: an entity identification module for identifying the entities in the case description; and a relation extraction module, in signal connection with the entity identification module, for identifying the relationships between the entities. The relation extraction module finds a sentence containing two entities, a head entity and a tail entity, and extracts the relationship between them from that sentence.

Preferably, when the case description is structured or semi-structured text, the entity identification module adopts a rule-based entity recognition method.

In the first embodiment of the application, the case legal relationship recognition model first extracts the triples formed by legal relationships, namely head entities, relationships and tail entities, from the two cases whose similarity is to be calculated, and a case knowledge graph is constructed from these triples. The case knowledge graph is then converted into a vectorized representation by the multi-head attention mechanism to facilitate subsequent computation. Finally, the deep case perception model calculates the similarity of the two cases. By first extracting the triple information and then using vectorized representations, the technical scheme improves the accuracy of case similarity calculation and gives more accurate and reliable results than the prior art.

In the first embodiment of the application, the case legal relationship recognition model extracts the entities in a case, including head entities and tail entities, through the entity identification module, and extracts the relationship between two entities in the same sentence through the relationship extraction module.

It should be noted that the entity identification module performs named entity recognition (NER). Named entities in NER research are generally divided into 3 major categories (entity, time and number) and 7 minor categories (person name, place name, organization name, time, date, currency and percentage). There are three main approaches: 1) rule-based named entity recognition, 2) statistics-based named entity recognition, and 3) hybrid approaches. When the case description is structured or semi-structured text, the entity identification module identifies entities with a rule-based method.
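A minimal sketch of rule-based recognition on semi-structured text, assuming the party section lists one party per line in a fixed `Role: Name, ...` layout. The patterns, the `rule_based_entities` function and the sample text are hypothetical; real decision documents are in Chinese and would need patterns built around terms such as 原告 (plaintiff) and 被告 (defendant).

```python
import re

# Hypothetical line-level rules; real Chinese documents would key on 原告 / 被告 instead.
PARTY_PATTERNS = {
    "plaintiff": re.compile(r"^Plaintiff:\s*([^,\n]+)", re.MULTILINE),
    "defendant": re.compile(r"^Defendant:\s*([^,\n]+)", re.MULTILINE),
}


def rule_based_entities(text):
    """Return (entity_text, entity_type) pairs matched by the rules above."""
    return [
        (match.group(1).strip(), label)
        for label, pattern in PARTY_PATTERNS.items()
        for match in pattern.finditer(text)
    ]


header = (
    "Plaintiff: Zhang San, male, born 1980.\n"
    "Defendant: Li Si, female, born 1985.\n"
    "Defendant: Wang Wu, male, born 1983.\n"
)
print(rule_based_entities(header))
# [('Zhang San', 'plaintiff'), ('Li Si', 'defendant'), ('Wang Wu', 'defendant')]
```

Rules like these are cheap and precise when the layout is fixed, which is why they suit structured or semi-structured text; free-form descriptions still need the learned BiLSTM-CRF model.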

In the first embodiment of the application, the construction of the case legal relationship recognition model first determines the entities and relationships that have a significant influence on case similarity calculation and builds a training data set for the model; this training data set is then fed into the constructed entity identification module and relationship extraction module to train them, yielding the trained case legal relationship recognition model.

In the first embodiment of the application, step 2) first initializes the vectorized representations of the head entities, relationships and tail entities of the triples as vector parameters. After the vectorized representations of a triple's entities and relationship are combined, a projection matrix is used to calculate the triple's feature vector, so that all triples can be processed uniformly in subsequent computation. A LeakyReLU function and a softmax function are then used to calculate the relative attention values of triples sharing the same head entity, and a new vectorized representation of each entity is calculated from these relative attention values and the feature vectors. Finally, a multi-head attention mechanism encapsulates the complex legal relationships and updates the vectorized representations of the entities again, giving optimized entity representations.

In the first embodiment of the application, the deep case perception model must first be trained. A number of similar case pairs under a given cause of action are collected as positive examples, and negative examples are formed by randomly replacing one case in some of the pairs, thereby constructing a similar case data set. The similar case data set is then fed into the case legal relationship recognition model, which identifies the entities in each case and extracts the relationships between them. Finally, the vectorized representations of the entities and relationships of each case in the data set are concatenated to obtain the case's vectorized representation, and the network is trained on these representations to obtain the deep case perception model. This improves the accuracy of the deep case perception model.

In the second embodiment of the application, a legal case similarity calculation system based on multi-head attention, which runs a software program implementing the method of the first embodiment, comprises a case knowledge graph construction module, a case knowledge graph representation module in signal connection with the construction module, and a deep case perception network module in signal connection with the representation module. These three modules cooperate to calculate the similarity of legal cases. Compared with the prior art, this improves the accuracy of legal applications such as similar case retrieval and similar case recommendation.

In the second embodiment of the application, the case legal relationship identification model identifies the entities in the case description through the entity identification module and identifies the relationships between the entities through the relationship extraction module; specifically, the relationship extraction module finds a sentence containing two entities, a head entity and a tail entity, and extracts the relationship between them from that sentence.

Compared with the prior art, the invention has the following beneficial effects:

1. A case knowledge graph is formed by extracting the legal relationships in the case description, its vectorized representation is obtained with a multi-head attention mechanism, and the similarity of two cases is finally calculated with a deep case perception network. Training the deep case perception network with the assistance of structured legal relationships can greatly improve the accuracy of case similarity calculation.

Drawings

Fig. 1 is a schematic flowchart of a method for calculating similarity of legal cases based on multi-head attention according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a case knowledge graph according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a legal case similarity calculation system based on multi-head attention according to an embodiment of the present invention.

Detailed Description


Example 1

Example 2

Example 1 is repeated, except that the case legal relationship identification model in step 1) includes: an entity identification module for identifying the entities in the case description; and a relationship extraction module for identifying the relationships between the entities in the case description. When the case description is structured or semi-structured text, the entity identification module identifies entities with a rule-based method.

The construction method of the case legal relationship identification model in the step 1) comprises the following steps:

In this embodiment, 600 decision documents are randomly selected for annotation, and the entities and relationships in their case descriptions are labeled with the BIO tagging scheme.

It should be noted that this embodiment uses decision documents with private lending as the cause of action. A decision document is semi-structured text, and the part describing the case and the part stating the facts found upon examination are separated from it with regular expressions to serve as the case description. The entities and relationships with a significant influence on case similarity under this cause of action are then determined. The entities include plaintiff, defendant, natural-person subject, non-natural-person subject, etc., and the relationships include role, spouse, non-subsisting marital relationship, and guarantor.
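For the BIO labeling described above, a small sketch of converting annotated entity spans into character-level tags. The span format and the labels `DEF`, `PLA` and `NAT` (defendant, plaintiff, natural-person subject) are hypothetical choices, not the embodiment's tag set.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) entity spans (end exclusive) into BIO tags, one per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# "被告李四向原告张三借款": defendant Li Si borrowed money from plaintiff Zhang San.
tokens = list("被告李四向原告张三借款")  # character-level tokens
spans = [(0, 2, "DEF"), (2, 4, "NAT"), (5, 7, "PLA"), (7, 9, "NAT")]
tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, tags)))
```

Character-level tokens are a common choice for Chinese NER because they avoid word segmentation errors.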

This embodiment first uses the pre-trained language model BERT to map each word of a sentence $x = (x_1, x_2, \ldots, x_n)$ in the case description to a low-dimensional dense word vector, where n is the number of words in the sentence. A bidirectional long short-term memory network (BiLSTM) then extracts sentence features, giving the hidden state sequence $H = (h_{[CLS]}, h_1, h_2, \ldots, h_n) \in \mathbb{R}^{n \times m}$, and its last layer maps each state vector from m dimensions to k dimensions, giving $P = (p_1, p_2, \ldots, p_n) \in \mathbb{R}^{n \times k}$, where k is the number of entity types. Finally, a conditional random field (CRF) performs sentence-level sequence labeling. For the sentence x with corresponding tag sequence $y = (y_1, y_2, \ldots, y_n)$, the scoring function is defined as:

where A is the transition score matrix between output labels and $A_{ij}$ is the transition score from the i-th label to the j-th label.
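Assuming the scoring function follows the standard BiLSTM-CRF formulation, which is consistent with the emission matrix P and transition matrix A defined above, it would read as follows, with $y_0$ and $y_{n+1}$ denoting added start and end tags:

```latex
s(x, y) = \sum_{i=0}^{n} A_{y_i,\, y_{i+1}} + \sum_{i=1}^{n} P_{i,\, y_i}
```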

The probability of the tag sequence y is then predicted using the softmax function:

During training, the log-probability of the correct tag sequence is maximized:

In prediction, the optimal tag sequence is obtained with the Viterbi algorithm:
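The probability, log-likelihood and Viterbi decoding steps can be sketched in NumPy, assuming the standard linear-chain CRF formulation with extra start and end tags (the `start`/`end` indices and all function names are hypothetical). The normalizer is computed here by brute-force enumeration for clarity; practical implementations use the forward algorithm instead.

```python
import itertools

import numpy as np


def crf_score(P, A, y, start, end):
    """s(x, y): transition scores A[y_i, y_{i+1}] (with start/end tags) plus emissions P[i, y_i]."""
    path = [start, *y, end]
    transitions = sum(A[path[i], path[i + 1]] for i in range(len(path) - 1))
    emissions = sum(P[i, tag] for i, tag in enumerate(y))
    return transitions + emissions


def log_likelihood(P, A, y, start, end):
    """log p(y|x) = s(x, y) - log sum_{y'} exp s(x, y'); brute-force enumeration, tiny inputs only."""
    n, k = P.shape
    all_scores = np.array(
        [crf_score(P, A, seq, start, end) for seq in itertools.product(range(k), repeat=n)]
    )
    top = all_scores.max()  # log-sum-exp shift for numerical stability
    return crf_score(P, A, y, start, end) - (top + np.log(np.exp(all_scores - top).sum()))


def viterbi(P, A, start, end):
    """Return (best tag path, its score) maximising s(x, y) by dynamic programming."""
    n, k = P.shape
    delta = A[start, :k] + P[0]  # best score of any path ending in each tag at position 0
    backpointers = []
    for i in range(1, n):
        cand = delta[:, None] + A[:k, :k]  # cand[prev, cur]
        backpointers.append(cand.argmax(axis=0))
        delta = cand.max(axis=0) + P[i]
    delta = delta + A[:k, end]
    path = [int(delta.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1], float(delta.max())


# Toy example: n = 4 tokens, k = 3 tags; indices 3 and 4 are the extra start/end tags.
rng = np.random.default_rng(1)
n, k = 4, 3
P = rng.normal(size=(n, k))
A = rng.normal(size=(k + 2, k + 2))
START, END = k, k + 1
best_path, best_score = viterbi(P, A, START, END)
print(best_path, round(best_score, 3))
```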

In this embodiment, the BERT-based relationship extraction network shares the same embedding layer and feature extraction layer as the BiLSTM-CRF-based entity recognition network. For a sentence containing two entities $e_1$ and $e_2$, it obtains the hidden state sequence $H = (h_{[CLS]}, h_1, h_2, \ldots, h_n)$ and the vector representations $v_1$ and $v_2$ of the two entities, and uses only $h_{[CLS]}$ as the sentence feature. The sentence feature after fusing the entities is:

The relationship expressed in the sentence is calculated as:

$P(y \mid h) = \mathrm{softmax}(h \cdot w_2)$

where $w_1$, $w_2$ and $b_1$ are parameters of the neural network.
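The fusion formula that uses $w_1$ and $b_1$ is not spelled out here. One plausible form, assumed purely for illustration, is $h = \tanh([h_{[CLS]}; v_1; v_2]\, w_1 + b_1)$, followed by the softmax classifier given above. The sizes, the tanh activation and the function names below are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def relation_probabilities(h_cls, v1, v2, w1, b1, w2):
    """P(y|h) = softmax(h . w2), with an ASSUMED fusion h = tanh([h_cls; v1; v2] . w1 + b1)."""
    h = np.tanh(np.concatenate([h_cls, v1, v2]) @ w1 + b1)
    return softmax(h @ w2)


rng = np.random.default_rng(0)
m, hidden, num_relations = 8, 16, 4  # hypothetical sizes, e.g. role / spouse / ... / guarantor
# h_cls stands for the [CLS] state; v1, v2 for the two entity vectors (random stand-ins here).
h_cls, v1, v2 = rng.normal(size=(3, m))
w1 = rng.normal(size=(3 * m, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, num_relations))
probs = relation_probabilities(h_cls, v1, v2, w1, b1, w2)
print(probs.round(3), probs.argmax())  # distribution over relations and the predicted index
```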

The case knowledge graph obtained in step 1) of this example is shown in Fig. 2.

Example 3

Example 2 is repeated, except that step 2) specifically comprises the following steps:

In this embodiment, a triple can be represented as $t_{ijk} = (e_i, r_k, e_j)$. The entities and relationships are initialized as d-dimensional vectors by random initialization, and a vectorized triple can be represented as

where $e_i$ and $e_j$ denote the head entity and the tail entity, $r_k$ denotes the relationship, and $h_i$, $h_j$ and $l_k$ are their respective vectorized representations.

In this embodiment, the feature vector of a triple is calculated as:

where $W_1$ is a linear transformation matrix.

2c) calculating the attention value of the triple using LeakyReLU;

In this embodiment, the attention value of a triple is calculated as:

where $W_2$ is a weight matrix and LeakyReLU is an activation function calculated as:

where α is a fixed parameter.

In this embodiment, the relative attention value of a triple is calculated as:

where $N_i$ is the set of neighbor entities of $e_i$, $R_{ij}$ is the set of relationships from head entity $e_i$ to entity $e_j$, and $b_{inr}$ is the attention value of a relationship r in $R_{in}$.

In this embodiment, the new vectorized representation of the entity is calculated as:

where M is the number of attention heads.

In this embodiment, the calculation formula of the multi-head attention mechanism is as follows:
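Assuming the formulas of this example follow the attention-based knowledge graph embedding formulation of KBGAT (Nathani et al., 2019), to which the symbols defined above correspond, they would take roughly the following form. Here β is a hypothetical symbol for the relative attention value, chosen to avoid a clash with the LeakyReLU parameter α; $\Vert$ denotes concatenation and σ a nonlinearity:

```latex
\begin{aligned}
c_{ijk} &= W_1\,[\,h_i \,\Vert\, h_j \,\Vert\, l_k\,] \\
b_{ijk} &= \mathrm{LeakyReLU}(W_2\, c_{ijk}), \qquad
\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} \\
\beta_{ijk} &= \frac{\exp(b_{ijk})}{\sum_{n \in N_i} \sum_{r \in R_{in}} \exp(b_{inr})} \\
h_i' &= \sigma\Big(\sum_{j \in N_i} \sum_{k \in R_{ij}} \beta_{ijk}\, c_{ijk}\Big) \\
h_i' &= \big\Vert_{m=1}^{M}\, \sigma\Big(\sum_{j \in N_i} \sum_{k \in R_{ij}} \beta^{m}_{ijk}\, c^{m}_{ijk}\Big)
\end{aligned}
```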

In this embodiment, the objective function for training the multi-head attention mechanism is:

where S is the set of triples, S′ is the set obtained by randomly replacing the head entity or the tail entity of triples in S, and γ is a hyperparameter;

and the remaining term is the score function of a triple.
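A sketch of this training objective, assuming a margin-based hinge loss $\mathcal{L} = \sum_{t \in S} \sum_{t' \in S'} \max(d_t - d_{t'} + \gamma, 0)$ with a TransE-style distance $d_t = \lVert h_i + l_k - h_j \rVert_1$ as the score function. Both choices are assumptions, and the function names are hypothetical.

```python
import numpy as np


def triple_distance(H, L, triples):
    """Assumed TransE-style score d_t = ||h_i + l_k - h_j||_1 for each (head, relation, tail) row."""
    heads, rels, tails = triples[:, 0], triples[:, 1], triples[:, 2]
    return np.abs(H[heads] + L[rels] - H[tails]).sum(axis=1)


def corrupt(triples, num_entities, rng):
    """Build S' by replacing either the head or the tail of each triple with a random entity.

    This sketch does not filter out corrupted triples that happen to be valid.
    """
    corrupted = triples.copy()
    replace_head = rng.random(len(triples)) < 0.5
    random_entities = rng.integers(0, num_entities, size=len(triples))
    corrupted[replace_head, 0] = random_entities[replace_head]
    corrupted[~replace_head, 2] = random_entities[~replace_head]
    return corrupted


def margin_loss(H, L, S, S_prime, gamma=1.0):
    """Sum over all (t, t') pairs of max(d_t - d_t' + gamma, 0); valid triples should score lower."""
    d_pos = triple_distance(H, L, S)
    d_neg = triple_distance(H, L, S_prime)
    return float(np.maximum(d_pos[:, None] - d_neg[None, :] + gamma, 0.0).sum())


rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))  # 5 entities, d = 4
L = rng.normal(size=(3, 4))  # 3 relations
S = np.array([[0, 0, 1], [1, 1, 2], [3, 2, 4]])
S_prime = corrupt(S, num_entities=5, rng=rng)
loss = margin_loss(H, L, S, S_prime, gamma=1.0)
print(round(loss, 3))
```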

Example 4

Example 3 is repeated, except that the deep case perception model in step 3) is a trained model, and its construction method specifically comprises the following steps:

In this embodiment, decision documents with private lending as the cause of action are obtained, and a similar case data set is constructed by manual annotation.

In this embodiment, the case knowledge graph $g_c$ of case c is first obtained, and the vectorized representation $h_c \in \mathbb{R}^{n \times m}$ of the triples in the knowledge graph is then obtained through the multi-head attention mechanism, where n is the number of triples of case c and m is the vector dimension of a triple.

In this embodiment, suppose the two cases input to the deep case perception network are $c_a$ and $c_b$. Because different cases yield different numbers of legal relationships, the case with fewer triples is padded so that both vectorized representations have the same dimensions, finally forming a tensor of dimension 2 × n × m. This embodiment uses a 34-layer residual network to construct the deep case perception network, and each residual block can be expressed as:

$y = W_s x + F(x, W_i)$

where $W_s$ and $W_i$ are linear transformation matrices, x and y are the input and output vectors of the residual block, and $F(x, W_i)$ is the residual mapping.
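The padding step and the residual block equation can be sketched as follows. ResNet-34 actually uses convolutional residual blocks; this dense version, the two-layer residual mapping and the function names are simplifying assumptions.

```python
import numpy as np


def pad_pair(h_a, h_b):
    """Zero-pad the case with fewer triples so both are n x m, then stack into a 2 x n x m tensor."""
    n = max(h_a.shape[0], h_b.shape[0])
    padded = [np.pad(h, ((0, n - h.shape[0]), (0, 0))) for h in (h_a, h_b)]
    return np.stack(padded)


def residual_block(x, W_s, W_i1, W_i2):
    """y = W_s x + F(x, W_i), with a hypothetical two-layer residual mapping F = W_i2 relu(W_i1 x)."""
    residual = W_i2 @ np.maximum(W_i1 @ x, 0.0)
    return W_s @ x + residual


rng = np.random.default_rng(0)
h_a = rng.normal(size=(5, 8))  # case c_a: 5 triples, m = 8
h_b = rng.normal(size=(3, 8))  # case c_b: 3 triples
pair = pad_pair(h_a, h_b)  # shape (2, 5, 8)

x = pair.reshape(-1)  # flattened only for this dense illustration
dim = x.size
y = residual_block(
    x, np.eye(dim), rng.normal(size=(dim, dim)) * 0.01, rng.normal(size=(dim, dim)) * 0.01
)
print(pair.shape, y.shape)  # (2, 5, 8) (80,)
```

With $W_s$ set to the identity the shortcut passes the input through unchanged, so each block only has to learn a correction $F(x, W_i)$; a non-identity $W_s$ is needed only when input and output dimensions differ.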

Example 5

Example 4 is repeated, except that the case description includes case descriptions in legal documents such as complaints and decision documents, as well as case descriptions in informal legal texts that describe case information.

Example 6

It should be noted that the parts not described in detail in this specification belong to the prior art.

It should be noted that, although the preferred embodiments above are described in detail, they should not be construed as limiting the scope of the invention; those skilled in the art may make substitutions and modifications without departing from the scope of the invention as defined by the appended claims.

It should be understood that the multi-head attention-based legal case similarity calculation method and system disclosed by the invention can be applied to practical scenarios such as similar case recommendation and similar case retrieval. For similar case recommendation, each historical case browsed by the user is converted into a vectorized case representation using steps 1 and 2 of the invention; the representations of all these cases are concatenated and reduced in dimension with a convolutional neural network to obtain a tensor of dimension 1 × n × m; its similarity to the cases in the database is then calculated using step 3, and cases with high similarity are recommended to the user. For similar case retrieval, the case description input by the user is converted into a vectorized case representation using steps 1 and 2, its similarity to each case in the database is calculated using step 3, and the cases are displayed to the user in order of similarity. Applying the disclosed method in these scenarios without substantive modification or improvement falls within the scope of protection of the invention.
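Case retrieval as described above reduces to scoring each stored case against the query and sorting. In the sketch below, `similarity` stands in for the trained deep case perception model; the cosine stand-in, the function names and the toy vectors are hypothetical.

```python
import numpy as np


def retrieve(query_repr, database, similarity, top_k=3):
    """Return the top_k (case_id, score) pairs, most similar first."""
    scored = [(case_id, similarity(query_repr, h)) for case_id, h in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def cosine_stand_in(h_a, h_b):
    """Hypothetical placeholder for the model: cosine similarity of mean-pooled triple vectors."""
    a, b = h_a.mean(axis=0), h_b.mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


query = np.array([[1.0, 0.0]])  # toy n x m representations (n = 1, m = 2)
database = {
    "A": np.array([[1.0, 0.0]]),
    "B": np.array([[0.0, 1.0]]),
    "C": np.array([[1.0, 1.0]]),
}
print([case_id for case_id, _ in retrieve(query, database, cosine_stand_in, top_k=2)])  # ['A', 'C']
```

Because the model scores pairs, retrieval cost grows linearly with the database size; a cheaper pre-filter (for example by cause of action) would typically narrow the candidates first, though the text does not say so.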

Claims

1. A legal case similarity calculation method based on multi-head attention is characterized by comprising the following steps:

2. The method for calculating the similarity of legal cases based on multi-head attention according to claim 1, wherein the case legal relationship recognition model in step 1) comprises:

an entity identification module for identifying the entity in the case description;

a relationship extraction module for identifying the relationship between the entities in the case description;

3. The method for calculating the similarity of legal cases based on multi-head attention according to claim 2, wherein the method for constructing the case legal relationship recognition model in step 1) comprises the following steps:

4. The method for calculating the similarity of legal cases based on multi-head attention according to claim 3, wherein the step 2) comprises the following steps:

2c) calculating the attention value of the triplet using LeakyReLU;

5. The method for calculating the similarity of legal cases based on multi-head attention according to claim 4, wherein the deep case perception model in the step 3) is a trained model, and the method for constructing the deep case perception model specifically comprises:

6. The method for calculating the similarity of legal cases based on multi-head attention according to any one of claims 1-5, wherein the case description comprises case descriptions in legal documents such as complaints, appeal petitions and decision documents, and case descriptions in informal legal texts that describe case information.

7. A multi-head attention-based legal case similarity calculation system applying the multi-head attention-based legal case similarity calculation method of any one of claims 1 to 6, the system comprising:

8. The multi-head attention-based legal case similarity calculation system of claim 7, wherein the case legal relationship identification model comprises:

a relation extraction module in signal connection with the entity identification module and used for identifying the relation between the entities;

the relation extraction module extracts the relation between the two entities from the sentence by finding the sentence containing the two entities, wherein the two entities comprise a head entity and a tail entity.

9. The system of claim 8, wherein the entity identification module employs a rule-based entity recognition method when the case description is structured or semi-structured text.

10. The system for calculating legal case similarity based on multi-head attention according to any one of claims 7-9, wherein the case description comprises case descriptions in legal documents such as complaints, appeal petitions and decision documents, and case descriptions in informal legal texts that describe case information.