CN111858940B

CN111858940B - Multi-head attention-based legal case similarity calculation method and system

Info

Publication number: CN111858940B
Application number: CN202010733019.2A
Authority: CN
Inventors: 程戈; 张冬良; 肖冬梅
Original assignee: Xiangtan University
Current assignee: Xiangtan University
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2023-07-25
Anticipated expiration: 2040-07-27
Also published as: CN111858940A

Abstract

A multi-head attention-based legal case similarity calculation method, the method comprising: 1) inputting the case descriptions of two cases whose similarity is to be calculated into a case law relation recognition model, extracting from each case the triples formed by its legal relations, wherein each triplet comprises a head entity, a relation and a tail entity, and constructing a case knowledge graph from the triples; 2) converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism; 3) inputting the vectorized representation of the cases into a deep case perception model to obtain the similarity of the two cases. By calculating this similarity, the technical scheme can improve accuracy in legal application scenarios such as similar-case search and similar-case recommendation.

Description

Multi-head attention-based legal case similarity calculation method and system

Technical Field

The invention relates to a legal case similarity calculation method, in particular to a legal case similarity calculation method based on multi-head attention, and belongs to the technical field of text recognition; the invention also relates to a legal case similarity calculation system based on multi-head attention.

Background

China is a statute-law country, where court judgments follow statutory provisions rather than precedents. Nevertheless, the public's intuitive sense of justice demands that similar cases receive similar judgments; in particular, different judgments for similar cases within a short time in the same court greatly damage judicial credibility. Research on case similarity calculation that accurately matches similar cases is therefore important for achieving judicial fairness. In practice, judicial workers typically screen cases by full-text search, keyword matching and keyword-restricted search, and then still rely on manual judgment to decide whether the screened cases are similar to the target case, which is time-consuming and inefficient.

Existing research methods for calculating case similarity fall mainly into three types. The first type is based on word vectors: for example, a vector space is formed from the nouns in legal texts and the similarity of legal texts is calculated through TF-IDF, or a vector space is formed from legal professional vocabulary and the similarity is calculated with the cosine similarity formula. The second type is based on word embedding models: a hierarchical attention mechanism is used to improve document representation in a Siamese network structure and to compress document content, avoiding the data sparsity problem of long documents; alternatively, an asymmetric Siamese network is used to calculate the correlation between news and cases, addressing text imbalance and the redundancy of news text. The third type is based on event templates: medical events are extracted from medical dispute cases according to predefined event templates, and case similarity is calculated from the similarity of the event tuples.

Neither the word vector method nor the word embedding model draws on legal domain knowledge or considers the characteristics of legal text, so the similarity they produce is not accurate enough. The event-template method does consider legal domain knowledge, but in practice the events in cases under the same cause of action are very similar. Taking private lending as an example, the events are when and where the money was borrowed, when and where repayment was demanded, and so on. Events extracted according to a predefined event template can therefore represent a case only at coarse granularity and do not contain the key information that affects case similarity, so matching accuracy is low.

Therefore, how to provide a legal case similarity calculation method based on multi-head attention that, by calculating similarity, improves accuracy in legal application scenarios such as similar-case search and similar-case recommendation is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

To address the shortcomings of the prior art, the invention constructs a case knowledge graph by extracting the legal relations in the case description, obtains a vectorized representation of the case knowledge graph using a multi-head attention mechanism, and finally calculates the similarity of two cases using a deep case perception network. Training the deep case perception network with the aid of structured legal relations can greatly improve the accuracy of case similarity calculation. The invention provides a multi-head attention-based legal case similarity calculation method comprising the following steps: 1) inputting the case descriptions of two cases whose similarity is to be calculated into a case law relation recognition model, extracting from each case the triples formed by its legal relations, wherein each triplet comprises a head entity, a relation and a tail entity, and constructing a case knowledge graph from the triples; 2) converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism; 3) inputting the vectorized representation of the cases into a deep case perception model to obtain the similarity of the two cases.

According to a first embodiment of the present invention, there is provided a multi-head attention-based legal case similarity calculation method:

A multi-head attention-based legal case similarity calculation method, the method comprising:

1) Inputting case situation descriptions of two cases with similarity to be calculated into a case law relation recognition model, respectively extracting triples formed by law relations in the two cases, wherein each triplet comprises a head entity, a relation and a tail entity, and constructing a case situation knowledge graph according to the triples;

2) Converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism;

3) Inputting the vectorized representation of the cases into a deep case perception model to obtain the similarity of the two cases.

Preferably, the case law relation recognition model in step 1) includes: an entity recognition module for identifying the entities in the case description;

and a relation extraction module for identifying the relations between the entities in the case description.

Preferably, when the case description is structured or semi-structured text, the entity recognition module identifies entities using rule-based entity recognition.

Preferably, the method for constructing the case law relation recognition model in the step 1) includes the following steps:

1a) Determining the entities and relations that have an influence, or a relatively large influence, on the calculation of case similarity, and constructing a training data set for the case law relation recognition model;

1b) Constructing the entity recognition module using an entity recognition network based on BiLSTM-CRF;

1c) Constructing the relation extraction module using a relation extraction network based on BERT.

Preferably, the step 2) specifically includes the following steps:

2a) Initializing a vectorized representation of head, relationship and tail entities of the triplet;

2b) Joining the vectorized representations of the entities and relationships of the triples, computing feature vectors of the triples using a projection matrix;

2c) Calculating an attention value of the triplet using LeakyReLU;

2d) Calculating relative attention values of triples having the same head entity using a softmax function;

2e) Calculating a new vectorized representation of the entity from the relative attention values of the triples and the feature vectors;

2f) Complex legal relationships are encapsulated using a multi-headed attention mechanism and the vectorized representation of the entity is updated again.

Preferably, the deep case perception model in step 3) is a trained model, and the method for constructing the deep case perception model specifically includes:

3a) Obtaining a certain number of similar case pairs for a given cause of action as positive examples, and randomly replacing one case in some of the similar case pairs to form negative examples, so as to construct a similar case data set;

3b) Inputting the case pairs in the similar case data set into the case law relation recognition model, constructing a case knowledge graph for each case, and obtaining the vectorized representation of each case knowledge graph using the multi-head attention mechanism;

3c) Concatenating the vectorized representations of the entities and relations of each similar case pair, constructing a deep case perception network, and training the network with the vectorized representations of the cases to obtain the deep case perception model.

Preferably, the case descriptions include case descriptions found in legal documents such as indictments and judgments, as well as case descriptions in informal legal texts that record case information.

In accordance with a second embodiment of the present invention, there is provided a multi-head attention-based legal case similarity calculation system:

A multi-head attention-based legal case similarity calculation system, the system comprising:

the case knowledge graph construction module is used for inputting case descriptions of two cases with similarity to be calculated into the case law relation recognition model, further respectively extracting triples formed by law relations in the two cases, and constructing a case knowledge graph according to the triples, wherein the triples comprise a head entity, a relation and a tail entity;

the case knowledge graph representation module is in signal connection with the case knowledge graph construction module and is used for converting the case knowledge graph into vectorized representation through a multi-head attention mechanism;

the deep case perception network module is in signal connection with the case knowledge graph representation module and is used for inputting the vectorized representation into the deep case perception model to obtain the similarity of the two cases.

Preferably, the case law relation recognition model includes: an entity recognition module for identifying the entities in the case description; and a relation extraction module, in signal connection with the entity recognition module, for identifying the relations between the entities. The relation extraction module finds a sentence containing two entities, a head entity and a tail entity, and extracts the relation between the two entities from that sentence.

Preferably, when the case description is structured or semi-structured text, the entity recognition module adopts a rule-based entity recognition method.

In the first embodiment of the application, the triples formed by legal relations, each consisting of a head entity, a relation and a tail entity, are first extracted from the two cases whose similarity is to be calculated through the case law relation recognition model, and a case knowledge graph is then constructed from the triples. The case knowledge graph is converted into a vectorized representation using the multi-head attention mechanism to facilitate later calculation. Finally, the similarity of the two cases is calculated by the deep case perception model. By screening out the triplet information and then applying the vectorized representation, the technical scheme improves the accuracy of similarity calculation between cases, and its results are more accurate and reliable than those of the prior art.

In a first embodiment of the present application, a case law relationship recognition model extracts entities in a case, including a head entity and a tail entity, through an entity recognition module. The case law relation recognition model extracts the relation between two entities of the same sentence through the relation extraction module.
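As an illustration of how extracted triples can be assembled into a case knowledge graph, the following is a minimal sketch. The `Triple` type, the `build_case_graph` helper, and the entity and relation names are all hypothetical; the patent does not prescribe a storage format.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


def build_case_graph(triples):
    """Group (relation, tail) edges by head entity, dropping exact duplicates."""
    graph = defaultdict(list)
    for t in triples:
        edge = (t.relation, t.tail)
        if edge not in graph[t.head]:
            graph[t.head].append(edge)
    return dict(graph)


# Hypothetical triples for a private-lending case (names are illustrative only).
triples = [
    Triple("Zhang", "lender_of", "Loan-1"),
    Triple("Li", "borrower_of", "Loan-1"),
    Triple("Wang", "guarantor_of", "Loan-1"),
    Triple("Li", "spouse_of", "Zhao"),
    Triple("Li", "spouse_of", "Zhao"),  # duplicate extraction, ignored
]
graph = build_case_graph(triples)
for head, edges in graph.items():
    print(head, edges)
```

Grouping edges by head entity mirrors the later attention step, which normalizes attention over triples that share the same head entity.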

It should be noted that the entity recognition performed by the entity recognition module is formally known as named entity recognition (NER). NER research generally covers 3 major classes (entity, time and number) and 7 minor classes (person name, place name, organization name, time, date, currency and percentage), and its purpose is to recognize these named entities in a corpus. There are three main approaches: 1) rule-based named entity recognition, 2) statistics-based named entity recognition, and 3) hybrid approaches. When the case description is structured or semi-structured text, the entity recognition module identifies entities using the rule-based approach.
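Rule-based entity recognition on structured text can be sketched with regular expressions. The patterns, labels and English sample text below are assumptions made for illustration; real judgment documents would need patterns written for their actual layout and language.

```python
import re

# Hypothetical patterns for a semi-structured, English-rendered judgment header.
PATTERNS = {
    "PLAINTIFF": re.compile(r"Plaintiff:\s*([^,;\n]+)"),
    "DEFENDANT": re.compile(r"Defendant:\s*([^,;\n]+)"),
    "AMOUNT": re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*yuan"),
}


def extract_entities(text):
    """Return (label, surface text, start offset) tuples ordered by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group(1).strip(), match.start(1)))
    return sorted(entities, key=lambda e: e[2])


sample = (
    "Plaintiff: Zhang San, male.\n"
    "Defendant: Li Si, male.\n"
    "The defendant borrowed 50,000 yuan."
)
entities = extract_entities(sample)
for label, value, offset in entities:
    print(label, value, offset)
```

Rules of this kind work only because the document layout is predictable; free-form text is the case the BiLSTM-CRF network is meant to handle.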

In the first embodiment of the application, the case law relation recognition model is constructed as follows: the entities and relations that have an influence, or a relatively large influence, on the calculation of case similarity are determined first, and a training data set for the case law relation recognition model is constructed; the training data set is then fed into the constructed entity recognition module and relation extraction module to train them, yielding the trained case law relation recognition model.

In the first embodiment of the application, in step 2), the vectorized representations of the head entity, relation and tail entity of each triplet are first initialized to form the vector parameters. The vectorized representations of the entities and relation of each triplet are concatenated, and the feature vector of the triplet is calculated using a projection matrix to facilitate later computation. Next, the attention value of each triplet is calculated using LeakyReLU, the relative attention values of triples sharing the same head entity are calculated using the softmax function, and the new vectorized representation of the entity is calculated from the relative attention values and feature vectors of the triples. Finally, the multi-head attention mechanism is used to encapsulate complex legal relations and update the vectorized representation of the entity once more, giving the optimized entity representation.

In the first embodiment of the application, the deep case perception model needs to be trained. A similar case data set is constructed by obtaining a certain number of similar case pairs for a given cause of action as positive examples, and randomly replacing one case in some of the similar case pairs to form negative examples. The similar case data set is then fed into the case law relation recognition model, which recognizes the entities and extracts the relations between them. Finally, the vectorized representations of the entities and relations of each case in the similar case data set are concatenated to obtain the vectorized representation of that case, and the network is trained on these representations to obtain the deep case perception model, which improves the accuracy of the model.
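The construction of positive and negative case pairs can be sketched as follows. This is only one reading of the text: the sketch keeps every similar pair as a positive example and adds a corrupted copy for a fraction of them, and the `negative_ratio` parameter, the case identifiers, and the check that avoids producing a known similar pair are all assumptions.

```python
import random


def build_pair_dataset(similar_pairs, all_cases, negative_ratio, rng):
    """Return (case_a, case_b, label) rows: 1 for similar pairs, 0 for corrupted pairs."""
    similar = {frozenset(pair) for pair in similar_pairs}
    rows = [(a, b, 1) for a, b in similar_pairs]
    n_negative = int(len(similar_pairs) * negative_ratio)
    for a, _ in rng.sample(similar_pairs, n_negative):
        # Replace the second case with a random case that is not known to be similar to a.
        candidates = [
            c for c in all_cases
            if c != a and frozenset((a, c)) not in similar
        ]
        if candidates:
            rows.append((a, rng.choice(candidates), 0))
    return rows


# Hypothetical case identifiers.
all_cases = ["c1", "c2", "c3", "c4", "c5", "c6"]
similar_pairs = [("c1", "c2"), ("c3", "c4"), ("c5", "c6")]
dataset = build_pair_dataset(similar_pairs, all_cases, 0.67, random.Random(0))
for row in dataset:
    print(row)
```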

In a second embodiment of the application, a multi-head attention-based legal case similarity calculation system runs a software program that implements the method described in the first embodiment. The system comprises a case knowledge graph construction module, a case knowledge graph representation module in signal connection with the case knowledge graph construction module, and a deep case perception network module in signal connection with the case knowledge graph representation module. The three modules cooperate to calculate the similarity of legal cases, which improves accuracy in legal application scenarios such as similar-case search and similar-case recommendation compared with the prior art.

In a second embodiment of the present application, the case law relationship recognition model recognizes entities in the case description through an entity recognition module, and recognizes relationships between the entities through a relationship extraction module; specifically, the relation extracting module extracts the relation between two entities from the sentence by finding the sentence containing the two entities, wherein the two entities comprise a head entity and a tail entity.

Compared with the prior art, the invention has the following beneficial effects:

1. A case knowledge graph is formed by extracting the legal relations in the case description, a vectorized representation of the case knowledge graph is obtained using a multi-head attention mechanism, and the similarity of the two cases is finally calculated using a deep case perception network. Training the deep case perception network with the aid of structured legal relations can greatly improve the accuracy of case similarity calculation.

Drawings

FIG. 1 is a flow chart of a multi-head attention-based legal case similarity calculation method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a case knowledge graph according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a multi-head attention-based legal case similarity calculation system according to an embodiment of the present invention.

Detailed Description

Preferably, the step 2) specifically includes the following steps:

2c) Calculating an attention value of the triplet using LeakyReLU;

the deep case perception network module is in signal connection with the case knowledge graph representation module and is used for inputting the vectorized representation into the deep case perception model to obtain the similarity of the two cases.

Example 1

Example 2

Example 1 is repeated, except that the case law relation recognition model described in step 1) includes: an entity recognition module for identifying the entities in the case description; and a relation extraction module for identifying the relations between the entities in the case description. When the case description is structured or semi-structured text, the entity recognition module identifies entities using rule-based entity recognition.

The method for constructing the case law relation recognition model in the step 1) comprises the following steps:

In this embodiment, 600 judgment documents are randomly selected for labeling, and the entities and relations contained in the case descriptions are labeled using the BIO labeling scheme.
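For readers unfamiliar with BIO labeling, the sketch below converts entity spans over a token list into BIO tags: `B-` marks the first token of an entity, `I-` marks the following tokens of the same entity, and `O` marks tokens outside any entity. The tokens, spans and label names (`PER`, `AMT`) are hypothetical, and only entity labeling is illustrated.

```python
def to_bio(tokens, spans):
    """Convert (start, end_exclusive, label) token spans into a BIO tag sequence."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["Zhang", "San", "lent", "Li", "Si", "50,000", "yuan"]
spans = [(0, 2, "PER"), (3, 5, "PER"), (5, 7, "AMT")]
tags = to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```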

It should be noted that this embodiment obtains judgment documents whose cause of action is private lending. The judgment documents are structured documents, so the part describing the parties to the case and the part stating the facts ascertained at trial are separated from each document by regular expressions and used as the case description. The entities and relations that greatly influence case similarity under this cause of action are then determined. The entities include the plaintiff and defendant, natural-person subjects, non-natural-person subjects, and so on; the relations include roles, spousal relationships, relationships outside the duration of a marriage, guarantors, and so on.

This embodiment first uses the pre-trained language model BERT to map each word of a sentence $x = (x_1, x_2, \ldots, x_n)$ in the case description to a low-dimensional dense word vector, where $n$ is the number of words in the sentence. A bidirectional long short-term memory network (BiLSTM) then extracts sentence features, giving the hidden state sequence $H = (h_{[CLS]}, h_1, h_2, \ldots, h_n) \in \mathbb{R}^{n \times m}$. The last layer of the BiLSTM maps each state vector from $m$ dimensions to $k$ dimensions, giving $P = (p_1, p_2, \ldots, p_n) \in \mathbb{R}^{n \times k}$, where $k$ is the number of entity types. Finally, a conditional random field performs sentence-level sequence labeling: for the sentence $x$ with corresponding label sequence $y = (y_1, y_2, \ldots, y_n)$, a scoring function is defined as:

where $A$ is the transition score matrix between output labels, and $A_{ij}$ is the transition score from the $i$-th label to the $j$-th label.
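The scoring formula itself is not reproduced in this text. As a hedged reconstruction, the form commonly used for BiLSTM-CRF taggers, written with the symbols defined here ($P$ for the per-position label scores and $A$ for the transition scores), is shown below. The summation limits and the absence of explicit start and end labels are assumptions.

```latex
s(x, y) = \sum_{i=1}^{n} P_{i,\,y_i} + \sum_{i=1}^{n-1} A_{y_i,\,y_{i+1}}
```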

The probability of tag sequence y is then predicted using a softmax function:

Meanwhile, to maximize the prediction score, the logarithm of the probability of the predicted output tag sequence is taken:
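The probability and log-probability formulas are likewise missing from this text. A standard CRF formulation consistent with the description is sketched below, where $s(x, y)$ denotes the scoring function and $Y_x$ (an assumed symbol) denotes the set of all candidate tag sequences for sentence $x$.

```latex
p(y \mid x) = \frac{\exp\bigl(s(x, y)\bigr)}{\sum_{\tilde{y} \in Y_x} \exp\bigl(s(x, \tilde{y})\bigr)}
\qquad
\log p(y \mid x) = s(x, y) - \log \sum_{\tilde{y} \in Y_x} \exp\bigl(s(x, \tilde{y})\bigr)
```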

In the prediction process, the Viterbi algorithm is used to calculate the optimal predicted tag sequence:
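Viterbi decoding finds the tag sequence with the highest total of emission and transition scores without enumerating all sequences. The sketch below is a generic implementation under the assumption of no explicit start or end tags; the three-tag example scores are made up for illustration.

```python
def viterbi(emissions, transitions):
    """Return the best tag path and its score.

    emissions[i][t]   -- score of tag t at position i
    transitions[a][b] -- score of moving from tag a to tag b
    """
    n, k = len(emissions), len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for i in range(1, n):
        new_score, pointers = [], []
        for t in range(k):
            best_prev = max(range(k), key=lambda a: score[a] + transitions[a][t])
            new_score.append(score[best_prev] + transitions[best_prev][t] + emissions[i][t])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    last = max(range(k), key=lambda t: score[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, max(score)


# Hypothetical scores for tags 0 = O, 1 = B, 2 = I over a three-token sentence.
emissions = [[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [1.8, 0.2, 0.4]]
transitions = [[0.5, 0.2, -5.0], [0.0, -1.0, 1.0], [0.3, -1.0, 0.5]]
path, best = viterbi(emissions, transitions)
print(path, round(best, 3))
```

The strongly negative score for the transition from O to I shows how the transition matrix can discourage invalid tag sequences.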

The embedding layer and feature extraction layer of the BERT-based relation extraction network in this embodiment are the same as those of the BiLSTM-CRF-based entity recognition network. For two entities $e_1$ and $e_2$, the network obtains the hidden state sequence $H = (h_{[CLS]}, h_1, h_2, \ldots, h_n)$ and the vector representations $v_1$ and $v_2$ of the two entities, and uses only $h_{[CLS]}$ as the feature of the sentence. After the entity information is integrated, the sentence feature is:

the relation calculation formula contained in the sentence is as follows:

$P(y \mid h) = \mathrm{softmax}(h \cdot w_2)$

where $w_1$, $w_2$ and $b_1$ are neural network parameters.
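The relation classification step can be sketched in plain Python. Only the final softmax over $h \cdot w_2$ comes from the text; the way the sentence feature and entity vectors are fused (concatenation followed by a `tanh` layer using $w_1$ and $b_1$) is an assumption, because the fusion formula is missing here. All dimensions and values are illustrative.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relation_probs(h_cls, v1, v2, w1, b1, w2):
    """Fuse sentence and entity features, then classify the relation."""
    fused = h_cls + v1 + v2  # list concatenation (assumed fusion)
    h = [math.tanh(a + b) for a, b in zip(matvec(w1, fused), b1)]  # assumed activation
    return softmax(matvec(w2, h))  # P(y | h) = softmax(h . w2)


# Tiny hypothetical example: 2-dimensional features, 3 hidden units, 2 relation types.
h_cls, v1, v2 = [0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]
w1 = [[0.1, 0.2, -0.3, 0.4, 0.0, 0.1],
      [0.0, -0.2, 0.3, 0.1, 0.2, -0.1],
      [0.3, 0.1, 0.1, -0.2, 0.0, 0.2]]
b1 = [0.0, 0.1, -0.1]
w2 = [[0.5, -0.3, 0.2],
      [-0.4, 0.6, 0.1]]
probs = relation_probs(h_cls, v1, v2, w1, b1, w2)
print(probs)
```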

The knowledge graph of the case obtained in the step 1 in this embodiment is shown in fig. 2.

Example 3

Example 2 was repeated except that step 2) specifically comprises the steps of:

In this embodiment, a triplet can be represented as $t_{ijk} = (e_i, r_k, e_j)$. The entities and relations are randomly initialized as $d$-dimensional vectors, and the vectorized triplet can be represented as $(h_i, l_k, h_j)$, where $e_i$ and $e_j$ denote the head entity and tail entity respectively, $r_k$ denotes the relation, and $h_i$, $h_j$ and $l_k$ are their corresponding vectorized representations.

in this embodiment, the calculation formula of the feature vector of the triplet is:

where $W_1$ is a linear transformation matrix.

2c) Calculating an attention value of the triplet using LeakyReLU;

in this embodiment, the calculation formula of the attention value of the triplet is:

where $W_2$ is a weight matrix and LeakyReLU is an activation function, calculated as:

wherein α is a fixed parameter.

in this embodiment, the calculation formula of the relative attention value of the triplet is:

where $N_i$ is the set of neighbor entities of entity $e_i$, $R_{ij}$ is the set of relations that have entity $e_i$ as the head entity, and $b_{inr}$ is the attention value for a neighbor $n$ and a relation $r$ in that relation set.
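The formulas for steps 2b) to 2d) are not reproduced in this text. One formulation consistent with the symbols that are defined ($W_1$, $W_2$, $h_i$, $h_j$, $l_k$, $N_i$, $b_{inr}$ and the fixed LeakyReLU parameter $\alpha$) is sketched below as an assumption. The names $c_{ijk}$ for the triplet feature vector and $a_{ijk}$ for the relative attention value are introduced here only for illustration, as is the relation set $R_{in}$ between $e_i$ and a neighbor $e_n$.

```latex
\begin{aligned}
c_{ijk} &= W_1 \, [\, h_i \,\|\, h_j \,\|\, l_k \,] \\
b_{ijk} &= \mathrm{LeakyReLU}(W_2 \, c_{ijk}) \\
\mathrm{LeakyReLU}(x) &=
  \begin{cases}
    x, & x > 0 \\
    \alpha x, & x \le 0
  \end{cases} \\
a_{ijk} &= \frac{\exp(b_{ijk})}{\sum_{n \in N_i} \sum_{r \in R_{in}} \exp(b_{inr})}
\end{aligned}
```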

in this embodiment, the calculation formula of the new vectorized representation of the triplet is:

where $M$ is the number of attention heads.

In this embodiment, the calculation formula of the multi-head attention mechanism is as follows:
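Steps 2a) to 2f) can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the patented formulas: `tanh` is used as the nonlinearity, the outputs of the heads are concatenated, the LeakyReLU slope is 0.2, and all names, dimensions and triples are hypothetical.

```python
import math
import random


def leaky_relu(x, alpha=0.2):
    return x if x > 0 else alpha * x


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def attention_head(triples, ent_vec, rel_vec, w1, w2):
    """One head: a new vector for every entity that appears as a head entity."""
    by_head = {}
    for h, r, t in triples:
        c = matvec(w1, ent_vec[h] + ent_vec[t] + rel_vec[r])  # 2b) feature vector
        b = leaky_relu(sum(w * x for w, x in zip(w2, c)))      # 2c) attention value
        by_head.setdefault(h, []).append((b, c))
    new_vec = {}
    for h, items in by_head.items():
        m = max(b for b, _ in items)
        weights = [math.exp(b - m) for b, _ in items]          # 2d) softmax per head entity
        z = sum(weights)
        dim = len(items[0][1])
        agg = [sum(w / z * c[i] for w, (_, c) in zip(weights, items)) for i in range(dim)]
        new_vec[h] = [math.tanh(x) for x in agg]               # 2e) assumed nonlinearity
    return new_vec


def multi_head(triples, ent_vec, rel_vec, heads):
    """2f) run every head and concatenate the per-head entity vectors."""
    outputs = [attention_head(triples, ent_vec, rel_vec, w1, w2) for w1, w2 in heads]
    return {h: [x for out in outputs for x in out[h]] for h in outputs[0]}


rng = random.Random(0)
d, out_dim, n_heads = 4, 3, 2
triples = [("Li", "borrower_of", "Loan-1"), ("Li", "spouse_of", "Zhao"),
           ("Wang", "guarantor_of", "Loan-1")]
entities = sorted({e for h, _, t in triples for e in (h, t)})
relations = sorted({r for _, r, _ in triples})
ent_vec = {e: [rng.uniform(-1, 1) for _ in range(d)] for e in entities}   # 2a) random init
rel_vec = {r: [rng.uniform(-1, 1) for _ in range(d)] for r in relations}
heads = [(rand_matrix(out_dim, 3 * d, rng), [rng.uniform(-0.5, 0.5) for _ in range(out_dim)])
         for _ in range(n_heads)]
updated = multi_head(triples, ent_vec, rel_vec, heads)
print({h: len(v) for h, v in updated.items()})
```

In this sketch only entities that occur as a head entity receive an updated vector; how tail-only entities are treated is left open.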

in this embodiment, the objective function for training the multi-head attention mechanism is:

where $S$ is the set of triples, $S'$ is the set obtained by randomly replacing the head entity or tail entity of triples in $S$, $\gamma$ is a hyperparameter, and the function applied to each triplet is a scoring function.
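The objective formula is missing from this text, so the sketch below shows a standard margin-based ranking loss of the kind the description suggests. The translation-style L1 scoring function, the hinge form, and drawing one corrupted triplet per valid triplet are assumptions, and all vectors are illustrative.

```python
import random


def l1_score(h, l, t):
    """Assumed scoring function: ||h + l - t||_1, lower means more plausible."""
    return sum(abs(a + b - c) for a, b, c in zip(h, l, t))


def corrupt(triple, entities, rng):
    """Randomly replace the head or the tail entity to build an element of S'."""
    h, r, t = triple
    if rng.random() < 0.5:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))


def margin_loss(triples, ent_vec, rel_vec, entities, gamma, rng):
    total = 0.0
    for pos in triples:
        neg = corrupt(pos, entities, rng)
        d_pos = l1_score(ent_vec[pos[0]], rel_vec[pos[1]], ent_vec[pos[2]])
        d_neg = l1_score(ent_vec[neg[0]], rel_vec[neg[1]], ent_vec[neg[2]])
        total += max(0.0, d_pos - d_neg + gamma)  # penalize negatives scoring within the margin
    return total


# Hypothetical embeddings in which the valid triplet satisfies h + l = t exactly.
ent_vec = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [5.0, 5.0]}
rel_vec = {"r": [1.0, 0.0]}
entities = list(ent_vec)
loss = margin_loss([("A", "r", "B")], ent_vec, rel_vec, entities, 1.0, random.Random(0))
print(loss)
```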

Example 4

Example 3 is repeated, except that the deep case perception model in step 3) is a trained model, and the method for constructing the deep case perception model specifically includes:

In this embodiment, judgment documents whose cause of action is private lending are obtained, and a similar case data set is constructed by manual labeling.

In this embodiment, the case knowledge graph $g_c$ of case $c$ is obtained first, and the vectorized representation $h_c \in \mathbb{R}^{n \times m}$ of the triples in the case knowledge graph is obtained through the multi-head attention mechanism, where $n$ is the number of triples in case $c$ and $m$ is the vector dimension of a triplet.

In this embodiment, suppose the cases input to the deep case perception network are $c_a$ and $c_b$. Because the number of legal relations extracted differs from case to case, a padding operation is applied to the case with fewer triples so that the vectorized representations of the two cases have the same dimensions, finally forming a tensor of dimension $2 \times n \times m$. This embodiment uses a 34-layer residual network to construct the deep case perception network, and each residual block can be expressed as:

$y = W_s x + F(x, W_i)$

where $W_s$ and $W_i$ are linear transformation matrices, $x$ and $y$ are the input and output vectors of the residual block respectively, and $F(x, W_i)$ is the residual mapping.
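The padding step and a single residual block can be illustrated with a small sketch. Zero padding is an assumption about how the shorter case is filled, and the residual mapping $F$ is reduced here to a single linear layer with ReLU purely for illustration; an actual 34-layer residual network uses stacked convolutional layers.

```python
def pad_case(matrix, n_rows):
    """Pad an n_i x m triplet matrix with zero rows up to n_rows (assumed zero padding)."""
    m = len(matrix[0])
    padding = [[0.0] * m for _ in range(n_rows - len(matrix))]
    return [list(row) for row in matrix] + padding


def stack_pair(case_a, case_b):
    """Build the 2 x n x m input from two cases with different triplet counts."""
    n = max(len(case_a), len(case_b))
    return [pad_case(case_a, n), pad_case(case_b, n)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def residual_block(x, w_s, w_i):
    """y = W_s x + F(x, W_i), with F assumed to be ReLU(W_i x)."""
    shortcut = matvec(w_s, x)
    residual = [max(0.0, v) for v in matvec(w_i, x)]
    return [s + r for s, r in zip(shortcut, residual)]


# Hypothetical triplet vectors: case a has 3 triples, case b has 1, each of dimension 2.
case_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
case_b = [[0.7, 0.8]]
pair = stack_pair(case_a, case_b)
print(len(pair), len(pair[0]), len(pair[0][0]))

identity = [[1.0, 0.0], [0.0, 1.0]]
y = residual_block([1.0, -2.0], identity, identity)
print(y)
```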

Example 5

Example 4 is repeated, except that the case descriptions include case descriptions found in legal documents such as indictments and judgments, as well as case descriptions in informal legal texts that record case information.

Example 6

It should be noted that portions not specifically described in the present specification belong to the prior art.

It should be noted that the foregoing description of the preferred embodiments does not limit the scope of the invention; the scope of protection is defined by the appended claims, and those skilled in the art can make substitutions or modifications without departing from that scope.

It should be appreciated that the multi-head attention-based legal case similarity calculation method and system disclosed by the invention can be applied to practical scenarios such as similar-case recommendation and similar-case retrieval. For case recommendation, vectorized representations of the historical cases browsed by the user are obtained using steps 1 and 2 of the invention, the vectorized representations of all these cases are concatenated, a convolutional neural network reduces them to a tensor of dimension $1 \times n \times m$, similarity with each case in the database is then calculated using step 3, and cases with high similarity are recommended to the user. For case retrieval, the vectorized representation of the case description entered by the user is obtained using steps 1 and 2, similarity with each case in the database is calculated using step 3, and the cases are displayed to the user in order of similarity. These application scenarios do not substantially modify or improve the method disclosed by the invention, and all of them fall within the scope of protection of the invention.
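The retrieval scenario amounts to scoring every database case against the query and sorting by similarity. In the sketch below, cosine similarity is only a stand-in for the trained deep case perception model of step 3, and the case identifiers and vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Stand-in similarity; the described method would call the deep case perception model."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_cases(query_vec, database, similarity, top_k=None):
    """Return (case_id, score) pairs sorted from most to least similar."""
    scored = [(case_id, similarity(query_vec, vec)) for case_id, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


database = {"case-1": [1.0, 0.0], "case-2": [0.6, 0.8], "case-3": [0.0, 1.0]}
ranking = rank_cases([1.0, 0.0], database, cosine, top_k=2)
for case_id, score in ranking:
    print(case_id, round(score, 3))
```

Passing the similarity function as a parameter keeps the ranking logic independent of how similarity is computed.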

Claims

1. A method for calculating the similarity of legal cases based on multi-head attention, comprising:

1) Inputting case situation descriptions of two cases with similarity to be calculated into a case law relation recognition model, respectively extracting triples formed by law relations in the two cases, wherein each triplet comprises a head entity, a relation and a tail entity, and constructing a case situation knowledge graph according to the triples; the method for constructing the case law relation recognition model comprises the following steps:

1c) The relation extraction module is constructed, and the relation extraction module is constructed by adopting a relation extraction network based on BERT;

2) Converting the case knowledge graph into a vectorized representation through a multi-head attention mechanism, wherein the method comprises the following steps of:

2c) Calculating an attention value of the triplet using LeakyReLU;

2f) Packaging complex legal relationships using a multi-headed attention mechanism and updating the vectorized representation of the entity again;

3) Inputting the vectorized representation of the case into a deep case perception model to obtain the similarity of two cases; the deep case perception model is a trained model, and the deep case perception model construction method specifically comprises the following steps:

3b) Respectively inputting case pairs in the similar case data set into the case legal relation recognition model, respectively constructing case knowledge graphs, and respectively obtaining vectorization representation of the case knowledge graphs of each case by using the multi-head attention mechanism;

2. The multi-head attention-based legal case similarity calculation method of claim 1, wherein the case law relation recognition model comprises: an entity recognition module for identifying the entities in the case description and a relation extraction module for identifying the relations between the entities in the case description; and when the case description is structured or semi-structured text, the entity recognition module identifies entities using rule-based entity recognition.

3. The multi-head attention-based legal case similarity calculation method according to claim 1 or 2, wherein the case descriptions include case descriptions found in indictments, judgments and informal legal texts that record case information.

4. A multi-head attention-based legal case similarity calculation system applying the multi-head attention-based legal case similarity calculation method of any one of claims 1-3, the system comprising:

5. The multi-head attention-based legal case similarity calculation system of claim 4, wherein the case law relationship identification model comprises:

the entity identification module is used for identifying the entity in the case description;

the relation extraction module is in signal connection with the entity identification module and is used for identifying the relation between the entities;

the relation extraction module extracts the relation between the two entities from the sentence by finding the sentence containing the two entities, wherein the two entities comprise a head entity and a tail entity.

6. The multi-head attention-based legal case similarity calculation system of claim 5, wherein the entity recognition module employs a rule-based entity recognition method when the case description is structured or semi-structured text.

7. The multi-head attention-based legal case similarity calculation system of any of claims 4-6, wherein the case descriptions comprise case descriptions found in indictments, judgments and informal legal texts that record case information.