CN113868414A - Interpretable legal dispute focus summarizing method and system - Google Patents

Interpretable legal dispute focus summarizing method and system

Info

Publication number
CN113868414A
CN113868414A
Authority
CN
China
Prior art keywords
vector
bert
processing
module
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110982983.3A
Other languages
Chinese (zh)
Inventor
邓蔚
刘永聪
赵晨曦
刘新星
曹雅筠
高垒
查金豆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Weichuang Technology Co ltd
Original Assignee
Chengdu Weichuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Weichuang Technology Co ltd filed Critical Chengdu Weichuang Technology Co ltd
Priority to CN202110982983.3A priority Critical patent/CN113868414A/en
Publication of CN113868414A publication Critical patent/CN113868414A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services; Handling legal documents

Abstract

The invention relates to an interpretable legal dispute focus summarization method and system comprising the following steps: setting a slice sequence of a certain length and performing word segmentation and character conversion on the original text through the slice sequence; encoding the processed text information with a BERT prediction model; assigning weights to the encoded vectors with an attention mechanism to obtain a comprehensive vector; feeding the vector into a linear layer to obtain an output vector and applying Sigmoid processing to it to obtain a Probability vector; and predicting the dispute focus of each category and outputting the probability that each category is positive to obtain the focus summarization result. The method assigns weights to the BERT-encoded token vectors through an attention mechanism, and each weight indicates how important its token is to the prediction result, so that a degree of interpretability is provided while the dispute focus summarization performance is maintained.

Description

Interpretable legal dispute focus summarizing method and system
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an interpretable legal dispute focus summarization method and system.
Background
Legal intelligence has received increasing attention in recent years as a way to improve judicial efficiency and provide intelligent case-handling assistance. Current research and applications of legal intelligence are mainly based on traditional machine learning and deep learning techniques. Traditional machine learning methods construct models such as decision trees and random forests, obtain judicial-domain knowledge through information extraction, solve legal intelligence tasks, and provide a certain degree of interpretability. Deep learning techniques adopt representation learning, embedding legal knowledge into vectors for modeling and prediction on top of pre-trained language model embeddings. In legal intelligence applications interpretability is very important: not only must the prediction result of a model be correct, it must also offer people a certain degree of explanation.
At present, research and application of legal intelligence mainly focus on fields such as legal knowledge graphs, legal judgment prediction, legal named entity recognition, legal event extraction, legal information retrieval, legal question answering, dispute focus identification and similar-case matching, while few technical disclosures or reports exist on legal reasoning, interpretable dispute focus identification and related topics.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an interpretable legal dispute focus summarization method and system, solving the problem that the prior art cannot accurately summarize legal dispute focuses.
The invention is realized by the following technical scheme: an interpretable legal dispute focus summarization method, the method comprising:
S1, setting a slice sequence with a certain length, and performing word segmentation and character conversion on the original text through the slice sequence;
S2, encoding the processed text information by using a BERT prediction model;
S3, assigning weights to the encoded vectors by adopting an attention mechanism to obtain a comprehensive vector;
S4, inputting the vector into a fully connected Sigmoid layer, obtaining an output vector with dimension 1 × n through a linear layer, and applying Sigmoid processing to the output vector to obtain a Probability vector;
S5, predicting the dispute focus of each category, and outputting the probability that each category is positive to obtain the focus summarization result.
The setting of a slice sequence with a certain length and the word segmentation and character conversion of the original text through the slice sequence comprise the following steps:
S11, splitting the original text character by character in units of tokens and storing the characters into a list, adding the special character [CLS] at the beginning of the list, and replacing any character not contained in the dictionary with the character [UNK];
S12, setting the maximum length of the slice sequence to n, directly slicing any list longer than n, and appending the character [PAD] to the end of any list shorter than n until its length reaches n;
S13, converting each character in the sliced list into its sequence number in the dictionary.
The encoding of the processed text information using the BERT prediction model comprises: extracting features of the context information through a bidirectional Transformer, feeding the processed data into the BERT prediction model for encoding, and outputting for each token a vector T that encodes the token's context information, thereby realizing a vectorized representation of the token's meaning.
The assigning of weights to the encoded vectors by an attention mechanism to obtain a comprehensive vector comprises the following steps:
S31, applying nonlinear activation to the BERT-encoded output vectors T, excluding the leading special character [CLS], to obtain an activation matrix T′;
S32, multiplying a randomly initialized learnable matrix W with the activation matrix T′ to obtain a vector of length N−1, applying Softmax to this vector to obtain a weight vector A summing to 1, and computing the inner product of the weight vector A with each row of the activation matrix T′ to obtain a vector C′ that integrates the text content.
The summarization method further comprises constructing a BERT prediction model before processing the original text; the step of constructing the BERT prediction model comprises:
constructing a BERT prediction model consisting of a BERT encoding layer, an attention layer and a fully connected Sigmoid layer;
setting the network parameters of the model: L = 12, the number of Transformer layers; H = 768, the internal dimension of the Transformer; A = 12, the number of attention heads;
and pre-training the network parameters of the BERT model with all civil legal documents from China Judgments Online.
An interpretable legal dispute focus summarization system comprises a prediction model construction module, an original text processing module, a text information encoding module, a weight vector generation module, a fully connected Sigmoid layer processing module and a prediction module;
the prediction model construction module is used for constructing a BERT prediction model consisting of a BERT encoding layer, an attention layer and a fully connected Sigmoid layer;
the original text processing module is used for processing the original text and performing word segmentation and character conversion;
the text information encoding module is used for encoding the information output by the original text processing module through the BERT encoding layer of the BERT prediction model;
the weight vector generation module is used for assigning weights to the output vectors of the BERT encoding module by using an attention mechanism;
the fully connected Sigmoid layer processing module is used for passing the vector through a Linear layer to obtain an output vector and applying Sigmoid processing to it to obtain a Probability vector;
the prediction module is used for predicting the dispute focus of each category and outputting the probability that each category is positive to obtain the focus summarization result.
The invention has the following advantages: in the interpretable legal dispute focus summarization method and system, weights are assigned to the BERT-encoded token vectors through an attention mechanism so that the importance of each token to the prediction result can be observed, providing a degree of interpretability while maintaining performance. The BERT model with attention mechanism used by the invention is not only superior to other baseline models in legal dispute focus summarization, but also makes the model interpretable through the weight value the attention mechanism assigns to each character. Feeding a sample into the trained BERT model with attention mechanism yields not only the final dispute focus prediction but also a weight for each character from the attention layer in the model. The characters whose weights fall in the top 15%, ranked from high to low, are then highlighted, so that the highlighted content in the example's dispute focus summary can be observed from the marking result, achieving the goal of interpretability.
Drawings
FIG. 1 is a flow chart of the model construction of the interpretable legal dispute focus summarization method of the present invention, based on a BERT model with attention mechanism;
FIG. 2 is a schematic diagram of text encoding of the BERT pre-training model of the present invention;
FIG. 3 is a schematic diagram of the BERT model with attention mechanism of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
The present invention combines the pre-trained language model BERT with an attention mechanism to provide interpretable weights. The BERT model with attention mechanism comprises a BERT encoding layer, an attention layer and a fully connected Sigmoid layer. The attention layer assigns weights to the token vectors encoded by BERT so that the importance of each token to the prediction result can be observed. The BERT model with attention mechanism can therefore provide a degree of interpretability while maintaining performance.
As shown in fig. 1, one embodiment of the present invention relates to an interpretable legal dispute focus summarization method based on a BERT model with attention mechanism, which specifically includes the following:
S1, original text processing: setting the maximum length of the sequence and performing text word segmentation and character conversion;
Further, the specific implementation comprises the following steps:
S11, segmenting the text character by character (token by token) and storing the characters in a list. A [CLS] special character is then added at the head of the list, and [UNK] replaces any character not included in the dictionary. The maximum sequence length is set to 512: lists longer than 512 are sliced directly, and [PAD] is appended to lists shorter than 512 until their length reaches 512;
S12, converting the characters into ids. The tokens of natural language are converted to numeric ids for use by the model, where an id is the token's sequence number in the dictionary, as sketched below.
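For illustration, steps S11 and S12 can be written in a few lines of Python. This is a minimal sketch under stated assumptions: the names MAX_LEN, preprocess and vocab are ours, not the patent's, and vocab stands in for a BERT-style token-to-id dictionary that reserves entries for [CLS], [UNK] and [PAD].

```python
MAX_LEN = 512

def preprocess(text: str, vocab: dict) -> list:
    # S11: split the text character by character, prepend the [CLS] special
    # character, and replace characters missing from the dictionary with [UNK].
    tokens = ["[CLS]"] + [c if c in vocab else "[UNK]" for c in text]
    # Slice lists longer than 512; pad shorter lists with [PAD] up to 512.
    tokens = tokens[:MAX_LEN]
    tokens += ["[PAD]"] * (MAX_LEN - len(tokens))
    # S12: convert each token to its sequence number (id) in the dictionary.
    return [vocab[t] for t in tokens]
```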
S2, text information encoding: encoding the text by using a BERT pre-training model;
As shown in fig. 2, the BERT pre-training model uses a bidirectional Transformer to extract features from the context information:
The Transformer has strong feature extraction capability. The processed data are fed into the BERT network for encoding; each token has a corresponding output vector T, in which the token's context information is encoded and which is a vectorized representation of the token's meaning;
The parameters of this part of the BERT model are those of the BERT base model: L = 12, the number of Transformer layers; H = 768, the internal dimension of the Transformer; A = 12, the number of attention heads;
The pre-training parameters of the BERT model are the network parameters of the BERT model pre-trained in OpenCLaP on the 26.54 million civil legal documents of China Judgments Online. Because the pre-training data come from the corresponding domain, the pre-trained language model obtains better word-vector representations, and downstream tasks also perform better. A sketch of this encoding step follows.
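The encoding step can be sketched with the HuggingFace transformers library. Note the assumptions: the checkpoint name "bert-base-chinese" is a publicly available stand-in, not the patent's OpenCLaP civil-law weights (those would be loaded the same way from their own path), and text and vocab come from the preprocessing sketch above.

```python
import torch
from transformers import BertModel

# Stand-in checkpoint; the patent uses OpenCLaP civil-law pre-trained weights.
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

input_ids = torch.tensor([preprocess(text, vocab)])    # shape (1, 512), from the S1 sketch
attention_mask = (input_ids != vocab["[PAD]"]).long()  # ignore [PAD] positions

with torch.no_grad():
    T = bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
# T has shape (1, 512, 768): one context-encoding vector per token.
```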
S3, assigning weights to the output vectors of the BERT encoding module by adopting an attention mechanism;
As shown in fig. 3, the attention mechanism assigns weights to the output vectors of the BERT encoding module, which are then integrated into a comprehensive vector C′:
At each prediction step, the attention mechanism assigns weights to the Encoder's information, weights the Encoder information by the normalized weights to obtain a comprehensive vector value C, and finally outputs the prediction result Y through the Decoder part;
S31, nonlinear activation is applied to the BERT-encoded output vectors T, excluding the leading [CLS] special character; the activation function is tanh, and the activated matrix is T′;
S32, the matrix T′ is passed to the attention module: a randomly initialized learnable matrix W is multiplied with T′ to obtain a vector of length N−1, and Softmax is then applied to obtain a weight vector A summing to 1. This yields the attention weight values; the inner product of A with each row of T′ is then computed to obtain a vector C′ that integrates the text content, as in the sketch below.
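A minimal PyTorch sketch of steps S31 and S32, under the assumption that the learnable matrix W is realized as a 768 × 1 projection; the class and variable names are ours, not the patent's.

```python
import torch
import torch.nn as nn

class TokenAttention(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Randomly initialized learnable matrix W.
        self.W = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, T: torch.Tensor):
        # T: (batch, N, 768) BERT output; drop the leading [CLS] vector.
        T_act = torch.tanh(T[:, 1:, :])           # S31: activated matrix T'
        scores = self.W(T_act).squeeze(-1)        # (batch, N-1): W multiplied with T'
        A = torch.softmax(scores, dim=-1)         # S32: weight vector A, sums to 1
        # Inner product of A with each row of T' gives the comprehensive vector C'.
        C = torch.einsum("bn,bnh->bh", A, T_act)  # (batch, 768)
        return C, A                               # A is kept for interpretability
```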
S4, the vector is fed into the fully connected Sigmoid layer: an output vector of dimension 1 × n is obtained through the linear layer, and Sigmoid processing is applied to it to obtain a Probability vector.
The Sigmoid function is
δ(z_j) = 1 / (1 + e^(−z_j))
where δ(z_j) denotes the Sigmoid function applied to the number z_j, and z_j is a single raw output value. Sigmoid processing makes it easier to judge whether each dimension is 0 or 1: any dimension greater than 0.5 is taken as 1, and the rest as 0.
S41, because the dimension of the C′ vector is the same as that of the BERT-encoded output vector of each token, an output vector of dimension 1 × 4 must be obtained through a fully connected layer in order to predict the dispute focus of the 4 categories. In this layer the vector C′ is first passed into one Linear layer to obtain an output vector of dimension 1 × 4;
S42, Sigmoid processing is applied to the output vector to obtain the Probability vector, as sketched below.
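Continuing the sketch above (torch and nn already imported, C taken from the TokenAttention sketch), steps S41 and S42 reduce to one Linear layer followed by Sigmoid; the class name and 0.5 thresholding shown here follow the text.

```python
class DisputeFocusHead(nn.Module):
    def __init__(self, hidden_size: int = 768, num_classes: int = 4):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_classes)  # S41: 1 x 4 output vector

    def forward(self, C: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(C))               # S42: Probability vector

head = DisputeFocusHead()
prob = head(C)                  # C from the TokenAttention sketch above
pred = (prob > 0.5).long()      # dimensions above 0.5 are taken as positive (1)
```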
S5, predicting the dispute focus of each category: the probability that each category is positive is output to obtain the focus summarization result.
Another embodiment of the invention relates to an interpretable legal dispute focus summarization system based on a BERT model with attention mechanism, comprising:
a prediction model construction module: used for constructing a BERT prediction model consisting of a BERT encoding layer, an attention layer and a fully connected Sigmoid layer;
an original text processing module: used for processing the original text and performing word segmentation and character conversion;
a text information encoding module: used for encoding the information output by the original text processing module through the BERT encoding layer of the BERT prediction model;
a weight vector generation module: used for assigning weights to the output vectors of the BERT encoding module by using an attention mechanism;
a fully connected Sigmoid layer processing module: used for passing the vector through a Linear layer to obtain an output vector and applying Sigmoid processing to it to obtain a Probability vector;
a prediction module: used for predicting the dispute focus of each category and outputting the probability that each category is positive to obtain the focus summarization result.
The invention assigns weights to the BERT-encoded token vectors through an attention mechanism so that the importance of each token to the prediction result can be observed, providing a degree of interpretability while maintaining performance. The BERT model with attention mechanism used by the invention is not only superior to other baseline models in legal dispute focus summarization, but also makes the model interpretable through the weight value the attention mechanism assigns to each character. Feeding a sample into the trained BERT model with attention mechanism yields not only the final dispute focus prediction but also a weight for each character from the attention layer in the model. The characters whose weights fall in the top 15%, ranked from high to low, are then highlighted, so that the highlighted content in the example's dispute focus summary can be observed from the marking result, achieving the goal of interpretability, as sketched below.
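As a sketch of this last highlighting step, assuming tokens are the input characters and weights the per-character attention weights returned by the model; the function name and the ** markup are ours (any other markup would do).

```python
def highlight_top_tokens(tokens: list, weights: list, ratio: float = 0.15) -> str:
    # Take the characters whose attention weights fall in the top 15%.
    k = max(1, int(len(tokens) * ratio))
    top = set(sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)[:k])
    # Mark the highlighted characters so the focus-bearing text stands out.
    return "".join(f"**{t}**" if i in top else t for i, t in enumerate(tokens))
```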
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein; various other combinations, modifications and environments may be resorted to within the scope of the concept disclosed herein, whether through the teachings above or the skill and knowledge of the relevant art. Modifications and variations effected by those skilled in the art without departing from the spirit and scope of the invention fall within the protection of the appended claims.

Claims (6)

1. An interpretable legal dispute focus summarization method, characterized in that the summarization method comprises the following steps:
S1, setting a slice sequence with a certain length, and performing word segmentation and character conversion on the original text through the slice sequence;
S2, encoding the processed text information by using a BERT prediction model;
S3, assigning weights to the encoded vectors by adopting an attention mechanism to obtain a comprehensive vector;
S4, inputting the vector into a fully connected Sigmoid layer, obtaining an output vector with dimension 1 × n through a linear layer, and applying Sigmoid processing to the output vector to obtain a Probability vector;
S5, predicting the dispute focus of each category, and outputting the probability that each category is positive to obtain the focus summarization result.
2. The interpretable legal dispute focus summarization method according to claim 1, characterized in that the setting of a slice sequence with a certain length and the word segmentation and character conversion of the original text through the slice sequence comprise the following steps:
S11, splitting the original text character by character in units of tokens and storing the characters into a list, adding the special character [CLS] at the beginning of the list, and replacing any character not contained in the dictionary with the character [UNK];
S12, setting the maximum length of the slice sequence to n, directly slicing any list longer than n, and appending the character [PAD] to the end of any list shorter than n until its length reaches n;
S13, converting each character in the sliced list into its sequence number in the dictionary.
3. The interpretable legal dispute focus summarization method according to claim 1, characterized in that the encoding of the processed text information using the BERT prediction model comprises: extracting features of the context information through a bidirectional Transformer, feeding the processed data into the BERT prediction model for encoding, and outputting for each token a vector T that encodes the token's context information, thereby realizing a vectorized representation of the token's meaning.
4. The interpretable legal dispute focus summarization method according to claim 1, characterized in that the assigning of weights to the encoded vectors by an attention mechanism to obtain a comprehensive vector comprises the following steps:
S31, applying nonlinear activation to the BERT-encoded output vectors T, excluding the leading special character [CLS], to obtain an activation matrix T′;
S32, multiplying a randomly initialized learnable matrix W with the activation matrix T′ to obtain a vector of length N−1, applying Softmax to this vector to obtain a weight vector A summing to 1, and computing the inner product of the weight vector A with each row of the activation matrix T′ to obtain a vector C′ that integrates the text content.
5. The interpretable legal dispute focus summarization method according to claim 1, characterized in that the summarization method further comprises constructing a BERT prediction model before processing the original text; the step of constructing the BERT prediction model comprises:
constructing a BERT prediction model consisting of a BERT encoding layer, an attention layer and a fully connected Sigmoid layer;
setting the network parameters of the model: L = 12, the number of Transformer layers; H = 768, the internal dimension of the Transformer; A = 12, the number of attention heads;
and pre-training the network parameters of the BERT model with all civil legal documents from China Judgments Online.
6. An interpretable legal dispute focus summarization system, characterized by comprising a prediction model construction module, an original text processing module, a text information encoding module, a weight vector generation module, a fully connected Sigmoid layer processing module and a prediction module;
the prediction model construction module is used for constructing a BERT prediction model consisting of a BERT encoding layer, an attention layer and a fully connected Sigmoid layer;
the original text processing module is used for processing the original text and performing word segmentation and character conversion;
the text information encoding module is used for encoding the information output by the original text processing module through the BERT encoding layer of the BERT prediction model;
the weight vector generation module is used for assigning weights to the output vectors of the BERT encoding module by using an attention mechanism;
the fully connected Sigmoid layer processing module is used for passing the vector through a Linear layer to obtain an output vector and applying Sigmoid processing to it to obtain a Probability vector;
the prediction module is used for predicting the dispute focus of each category and outputting the probability that each category is positive to obtain the focus summarization result.
CN202110982983.3A 2021-08-25 2021-08-25 Interpretable legal dispute focus summarizing method and system Pending CN113868414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110982983.3A CN113868414A (en) 2021-08-25 2021-08-25 Interpretable legal dispute focus summarizing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110982983.3A CN113868414A (en) 2021-08-25 2021-08-25 Interpretable legal dispute focus summarizing method and system

Publications (1)

Publication Number Publication Date
CN113868414A true CN113868414A (en) 2021-12-31

Family

ID=78988408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110982983.3A Pending CN113868414A (en) 2021-08-25 2021-08-25 Interpretable legal dispute focus summarizing method and system

Country Status (1)

Country Link
CN (1) CN113868414A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457365A (en) * 2022-09-15 2022-12-09 北京百度网讯科技有限公司 Model interpretation method and device, electronic equipment and storage medium
CN115457365B (en) * 2022-09-15 2024-01-05 北京百度网讯科技有限公司 Model interpretation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111625641B (en) Dialog intention recognition method and system based on multi-dimensional semantic interaction representation model
CN110378334B (en) Natural scene text recognition method based on two-dimensional feature attention mechanism
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN111476023B (en) Method and device for identifying entity relationship
CN110134946B (en) Machine reading understanding method for complex data
CN109977416A (en) A kind of multi-level natural language anti-spam text method and system
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN112000801A (en) Government affair text classification and hot spot problem mining method and system based on machine learning
US20230169271A1 (en) System and methods for neural topic modeling using topic attention networks
CN111538809A (en) Voice service quality detection method, model training method and device
CN113868414A (en) Interpretable legal dispute focus summarizing method and system
CN113435208A (en) Student model training method and device and electronic equipment
CN112434512A (en) New word determining method and device in combination with context
CN112598039A (en) Method for acquiring positive sample in NLP classification field and related equipment
CN115687934A (en) Intention recognition method and device, computer equipment and storage medium
CN112818688B (en) Text processing method, device, equipment and storage medium
CN114707829A (en) Target person rescission risk prediction method based on structured data linear expansion
CN115169363A (en) Knowledge-fused incremental coding dialogue emotion recognition method
CN113704472A (en) Hate and offensive statement identification method and system based on topic memory network
CN113076424A (en) Data enhancement method and system for unbalanced text classified data
CN112395422A (en) Text information extraction method and device
CN114818644B (en) Text template generation method, device, equipment and storage medium
CN117113977B (en) Method, medium and system for identifying text generated by AI contained in test paper
Lee et al. A two-level recurrent neural network language model based on the continuous Bag-of-Words model for sentence classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination