CN114219682A

CN114219682A - Case decision prediction method, system and medium based on BERT hidden layer information

Info

Publication number: CN114219682A
Application number: CN202111386108.5A
Authority: CN
Inventors: 黄熙宇; 张月国; 齐开悦; 蒋兴浩; 姚立红
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-11-22
Filing date: 2021-11-22
Publication date: 2022-03-22

Abstract

The invention provides a case judgment prediction method, system and medium based on BERT hidden layer information, relating to the technical field of legal services. The method comprises the following steps: step S1: acquiring original data of case text, and preprocessing the original data to obtain a preprocessed case text; step S2: segmenting the preprocessed case text into words to obtain a segmented case text; step S3: encoding the segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining word vectors of the full text; step S4: constructing a case prediction model and training it to obtain a trained case prediction model; step S5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, criminal names (charges) and criminal periods (prison terms). The invention addresses the insufficient utilization of network model information and improves the prediction accuracy of each task.

Description

Case decision prediction method, system and medium based on BERT hidden layer information

Technical Field

The invention relates to the technical field of legal services, in particular to criminal case judgment prediction based on BERT hidden layer information, and more particularly to a case judgment prediction method, system and medium based on BERT hidden layer information.

Background

In the prior art, criminal case judgment prediction and supervision are mostly carried out by legal professionals case by case. This form of judgment is inefficient, so a judging mode that is fair, objective, efficient and accurate is needed.

The invention patent with publication number CN110222866A discloses an intelligent civil case prediction system and method combining spoken language description and question answering, comprising the following steps: S1, receiving a spoken language case description input by a user; S2, determining the consultation intention of the user according to the spoken language case description; S3, detecting whether the characteristic content is complete according to the consultation intention, and if so, executing step S4, otherwise prompting the user to supplement the corresponding characteristic content; and S4, calling the prediction model to output a corresponding consultation result to the user according to the complete characteristic content. Although a case can be predicted from some of its characteristics, the whole process involves manual participation, so complete fairness and objectivity cannot be achieved, and the number of characteristics that can be predicted is limited.

The invention patent with publication number CN110969276A discloses a judgment prediction method, and a method and device for obtaining a judgment prediction model, comprising: obtaining the case description text of a case to be predicted; segmenting the case description text into words to obtain a word sequence; obtaining a matrix formed by the word vectors of all words in the word sequence; and inputting the matrix into a preset judgment prediction model to obtain the judgment prediction information output by that model. Although the method can predict legal decisions, it does not capture the correlation among the multiple subtasks, the network structure used is relatively simple, and the accuracy remains to be improved.

In view of the defects in the prior art, the invention aims to provide a criminal case judgment prediction method that uses a deep learning network architecture, is fair and objective, and improves accuracy.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a case decision prediction method, a case decision prediction system and a case decision prediction medium based on BERT hidden layer information.

According to the case decision prediction method, system and medium based on BERT hidden layer information provided by the invention, the scheme is as follows:

In a first aspect, a case decision prediction method based on BERT hidden layer information is provided, where the method includes:

step S1: acquiring original data of case text, and preprocessing the original data of the case text to obtain a preprocessed case text;

step S2: segmenting words of the preprocessed case text to obtain preprocessed segmented case text;

step S3: the preprocessed word segmentation case text is coded according to a Chinese dictionary of the BERT to obtain word codes, and finally word vectors of the full text are obtained;

step S4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;

step S5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, criminal names and criminal periods.

Preferably, the step S1 includes: denoising the original data of the case text to obtain the denoised case text.
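Steps S1 to S3 can be illustrated with a minimal sketch. The denoising rules, the character-level segmentation, the `TOY_VOCAB` dictionary and the function names below are all assumptions made for illustration; the specification does not fix any of them, and a real BERT Chinese dictionary contains far more entries.

```python
import re

# Hypothetical toy dictionary standing in for the BERT Chinese dictionary.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "被": 4, "告": 5, "人": 6, "盗": 7, "窃": 8, "财": 9, "物": 10}

def denoise(raw):
    """Step S1 (assumed rules): strip markup remnants and all whitespace."""
    text = re.sub(r"<[^>]+>", "", raw)   # drop HTML-like tags
    return re.sub(r"\s+", "", text)      # drop spaces, tabs and newlines

def segment(text):
    """Step S2 (assumed): split into single characters."""
    return list(text)

def encode(tokens, vocab, max_len=16):
    """Step S3: map tokens to ids, add [CLS]/[SEP], truncate and pad to max_len."""
    body = [vocab.get(t, vocab["[UNK]"]) for t in tokens][: max_len - 2]
    ids = [vocab["[CLS]"]] + body + [vocab["[SEP]"]]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))

ids = encode(segment(denoise("<p>被告人 盗窃\n财物</p>")), TOY_VOCAB, max_len=12)
print(ids)
```

Characters missing from the dictionary map to the `[UNK]` id, so the encoder never fails on unseen text.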

Preferably, the case prediction model in step S4 includes: a deep learning network model;

the deep learning network model comprises: a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit;

the BERT information extraction unit arranges the full-text word vectors obtained in step S3 into a matrix in sequence order, passes them through the word embedding layer of the deep learning model and then through n coding layers, the output of each coding layer being used as the input of the next layer, and finally obtains the hidden layer outputs of the n layers, wherein the outputs are in the form of vectors containing the main information of the full text;

the attention mechanism information generation unit acquires respective weights of hidden layer outputs of the n layers acquired by the BERT information extraction unit through an attention mechanism, and adds the hidden layer outputs according to the weights to acquire a final information generation vector;

the criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension is that required for criminal name prediction, and the position of the maximum value of the feature vector is the predicted criminal name;

the criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension is that required for criminal period prediction, and the position of the maximum value of the feature vector is the predicted criminal period;

the related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension is that required for related legal provision prediction, and the position of the maximum value of the feature vector is the predicted related legal provision.
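The three prediction units share one pattern: a linear layer followed by taking the position of the maximum value. The sketch below illustrates that pattern only. The class counts, the hidden size and the random (untrained) weights are placeholders and not values from the specification, and `info_vector` merely stands in for the information generation vector.

```python
import random

def linear(vec, weight, bias):
    """One linear layer: returns weight @ vec + bias as a list of scores."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]

def argmax(values):
    """Position of the maximum value, i.e. the predicted class index."""
    return max(range(len(values)), key=values.__getitem__)

def make_head(num_classes, hidden, rng):
    """Randomly initialised, untrained weights (illustration only)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(num_classes)]
    return weight, [0.0] * num_classes

rng = random.Random(0)
HIDDEN = 8
# Class counts are placeholders, not taken from the specification.
heads = {"criminal_name": make_head(5, HIDDEN, rng),
         "criminal_period": make_head(4, HIDDEN, rng),
         "legal_provision": make_head(6, HIDDEN, rng)}

info_vector = [rng.uniform(-1, 1) for _ in range(HIDDEN)]  # stand-in for the attention output
predictions = {task: argmax(linear(info_vector, w, b)) for task, (w, b) in heads.items()}
print(predictions)
```

All three heads read the same shared vector, which is what lets the tasks share the information extracted by BERT.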

Preferably, the attention mechanism information generating unit specifically includes:

obtaining, according to an attention mechanism, the matrices Q, K and V relating the output of the n-th hidden layer to the outputs of the other layers obtained by the BERT information extraction unit, wherein the formulas are as follows:

Q＝YW^Q

K＝XW^K

V＝XW^V

wherein X is the output of the n-th hidden layer, Y is the output of the other hidden layers, and W^Q, W^K, W^V are initialized matrices whose dimensions correspond to the hidden layer output;

and calculating the weight of each hidden layer according to an attention mechanism based on the Q, K and V matrices, wherein the attention mechanism formula is as follows:

Attention(Q, K, V) = softmax(QK^T / √d) V

wherein K^T represents the transpose of matrix K, and d is the dimension of the matrix Q;

and finally, multiplying the hidden layer output by the corresponding weight according to the weight obtained by softmax, and summing to obtain a final information generation vector.
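The layer weighting can be sketched as follows. This is one possible reading of the description, not a definitive implementation: each hidden layer output is treated as a single vector, `Y` stacks one row per layer to be weighted, and the fused vector is the softmax-weighted sum of those rows, as the final step above describes. The matrix V is left out because that final step weights the hidden layer outputs directly. The function names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_layers(X, Y, Wq, Wk):
    """Weight the rows of Y by their scaled dot-product score against X, then sum.

    X: 1 x h matrix, the last hidden layer output.
    Y: m x h matrix, one row per hidden layer to be weighted (assumed layout).
    """
    Q = matmul(Y, Wq)                       # m x d
    K = matmul(X, Wk)                       # 1 x d
    d = len(Q[0])
    # Q K^T / sqrt(d): one score per layer
    scores = [sum(q * k for q, k in zip(row, K[0])) / math.sqrt(d) for row in Q]
    weights = softmax(scores)               # one weight per layer, summing to 1
    fused = [sum(w * row[j] for w, row in zip(weights, Y)) for j in range(len(Y[0]))]
    return weights, fused

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
h, m = 4, 3                                 # toy sizes: hidden width, number of layers
Y = rand_matrix(m, h, rng)
X = [Y[-1]]                                 # last layer output as a 1 x h matrix
weights, fused = fuse_layers(X, Y, rand_matrix(h, h, rng), rand_matrix(h, h, rng))
print(weights, fused)
```

With an all-zero W^Q every score is zero, so the weights become uniform and the fused vector reduces to the plain average of the layer outputs.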

In a second aspect, a case decision prediction system based on BERT hidden layer information is provided, the system comprising:

module M1: acquiring original data of case text, and preprocessing the original data of the case text to obtain a preprocessed case text;

module M2: segmenting words of the preprocessed case text to obtain preprocessed segmented case text;

module M3: the preprocessed word segmentation case text is coded according to a Chinese dictionary of the BERT to obtain word codes, and finally word vectors of the full text are obtained;

module M4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;

module M5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, criminal names and criminal periods.

Preferably, the module M1 includes: denoising the original data of the case text to obtain the denoised case text.

Preferably, the case prediction model in the module M4 includes: a deep learning network model;

Q＝YW^Q

K＝XW^K

V＝XW^V

In a third aspect, a computer-readable storage medium is provided, in which a computer program is stored, which computer program, when being executed by a processor, performs the steps of the method.

Compared with the prior art, the invention has the following beneficial effects:

1. the invention provides a case judgment prediction method based on BERT hidden layer information and an attention mechanism, which predicts multiple judgment tasks from the case description of a legal text; by analyzing the case description text of a case in the judicial field, it predicts the judgment of that case on the basis of cases previously decided by the courts;

2. the invention makes full use of the BERT pre-trained network structure, which is widely applicable and highly accurate in the field, and fully utilizes the output of every hidden layer, thereby acquiring both the surface-level and the deep-level key information of the text;

3. the invention adds the attention mechanism, which is widely used in the field of natural language processing, to focus on the relationships among the layers of the network, which addresses the insufficient utilization of network model information and improves the prediction accuracy of each task.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a criminal case decision prediction flow chart;

FIG. 2 is an overall framework diagram of a criminal case judgment network based on BERT hidden layer information;

FIG. 3 is a schematic diagram of a BERT network structure.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.

The embodiment of the invention provides a case judgment prediction method based on BERT hidden layer information. It concerns criminal case judgment prediction technology and aims to predict the judgment of a case in the judicial field, on the basis of cases previously decided by the courts, by analyzing the case description text of the case. Referring to FIG. 1, the method comprises the following specific steps:

step S1: acquiring original data of case text, and preprocessing the original data of the case text to obtain a preprocessed case text; the preprocessing in this step consists of denoising the original data of the case text to obtain the denoised case text.

The case prediction model in step S4 comprises a deep learning network model. The deep learning network model comprises: a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit.

The BERT information extraction unit arranges the full-text word vectors obtained in step S3 into a matrix in sequence order, passes them through the word embedding layer of the deep learning model and then through 12 coding layers, the output of each coding layer being used as the input of the next layer, and finally obtains the hidden layer outputs of the 12 layers, wherein the outputs are in the form of vectors containing the main information of the full text.
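The chaining of the coding layers, with every intermediate output retained, can be sketched as below. `toy_layer` is a deliberately trivial stand-in and not a real Transformer coding layer; only the control flow (each output becomes the next input, and all 12 outputs are collected) reflects the description.

```python
import math

def toy_layer(vec, scale):
    """Stand-in for one coding layer (NOT a real Transformer encoder layer)."""
    return [math.tanh(scale * v) for v in vec]

def run_encoder(embedded, num_layers=12):
    """Feed each layer's output into the next layer and keep every output."""
    hidden_states = []
    current = embedded
    for i in range(num_layers):
        current = toy_layer(current, scale=1.0 + 0.1 * i)
        hidden_states.append(current)   # hidden layer output of layer i + 1
    return hidden_states

states = run_encoder([0.5, -0.2, 0.1], num_layers=12)
print(len(states))
```

In practice a pre-trained model would replace the toy layers. For instance, the Hugging Face Transformers `BertModel` returns the per-layer outputs as `hidden_states` when called with `output_hidden_states=True`; whether the inventors used that library is not stated.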

The attention mechanism information generation unit obtains respective weights of the hidden layer outputs of the 12 layers obtained by the BERT information extraction unit through the attention mechanism, and adds the hidden layer outputs according to the weights to obtain a final information generation vector. Specifically, a matrix Q, K, V relationship between the hidden layer output of the 12 th layer and the output of each other layer obtained by the BERT information extraction unit is first obtained according to the attention mechanism, and the formula is as follows:

Q＝YW^Q

K＝XW^K

V＝XW^V

wherein X is the output of the n-th hidden layer, Y is the output of the other hidden layers, and W^Q, W^K, W^V are initialized matrices whose dimensions correspond to the hidden layer output.

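Based on the Q, K and V matrices, the weight of each hidden layer can be calculated according to the attention mechanism, whose formula is as follows:

Attention(Q, K, V) = softmax(QK^T / √d) V

wherein K^T represents the transpose of matrix K, and d is the dimension of the matrix Q.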

The criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension is that required for criminal name prediction, and the position of the maximum value of the feature vector is the predicted criminal name.

The criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension is that required for criminal period prediction, and the position of the maximum value of the feature vector is the predicted criminal period.

The related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension is that required for related legal provision prediction, and the position of the maximum value of the feature vector is the predicted related legal provision.

The embodiment of the invention also provides a case decision prediction system based on BERT hidden layer information, which comprises the following modules:

module M1: acquiring original data of case text, and preprocessing the original data of the case text to obtain a preprocessed case text; the preprocessing in this module consists of denoising the original data of the case text to obtain the denoised case text.

Q＝YW^Q

K＝XW^K

V＝XW^V

The criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension is that required for criminal name prediction, and the position of the maximum value of the feature vector is the predicted criminal name;

the criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension is that required for criminal period prediction, and the position of the maximum value of the feature vector is the predicted criminal period;

The present invention will be described in more detail below.

As shown in FIG. 1, the present embodiment provides a case decision prediction method based on BERT hidden layer information, including the network model structure, its inputs and outputs, and so on.

As shown in fig. 2, the coded full-text word vectors are put into the BERT network to obtain each layer of output, and then weighted and added according to the attention mechanism, and finally the final prediction result is obtained through the linear layer.

The operation of the attention mechanism is shown schematically in FIG. 3. Its purpose is to obtain the respective weights of the 12 hidden layer outputs obtained by the BERT information extraction unit, and to add the hidden layer outputs according to these weights to obtain the final information generation vector. Specifically, the Q, K, V matrices relating the hidden layer output of the 12th layer to the output of each other layer obtained by the BERT information extraction unit are first obtained according to the attention mechanism, and the formulas are as follows:

Q＝YW^Q

K＝XW^K

V＝XW^V

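wherein X is the output of the twelfth hidden layer, Y is the output of the other hidden layers, and W^Q, W^K, W^V are initialized matrices whose dimensions correspond to the hidden layer output.

Based on the Q, K and V matrices, the weight of each hidden layer is calculated according to the attention mechanism formula:

Attention(Q, K, V) = softmax(QK^T / √d) V

where K^T is the transpose of matrix K and d is the dimension of the matrix Q.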

Each hidden layer output is then multiplied by its corresponding weight obtained by softmax, and the results are summed to obtain the final information generation vector. Finally, the prediction results for the criminal name, the related legal provisions and the criminal period are obtained through the linear layers.

This embodiment uses the widely used Chinese legal text data set CAIL2018 as the training data for the network proposed in the embodiment. The experimental results show that the prediction accuracy for each task reaches the state-of-the-art (SOTA) level.
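The specification does not state the training objective. As a minimal sketch under the assumption that the three tasks are trained jointly with an unweighted sum of per-task cross-entropy losses (a common choice, not confirmed by the source), the loss for one sample could be computed as follows. The task names, scores and target indices are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class under softmax(logits)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def joint_loss(task_logits, task_targets):
    """Unweighted sum of the per-task losses (assumed, not from the specification)."""
    return sum(cross_entropy(task_logits[t], task_targets[t]) for t in task_logits)

# Made-up linear-layer outputs and gold labels for a single case.
logits = {"criminal_name": [2.0, 0.1, -1.0],
          "criminal_period": [0.0, 0.0],
          "legal_provision": [0.3, 1.2, 0.1, -0.5]}
targets = {"criminal_name": 0, "criminal_period": 0, "legal_provision": 1}
loss = joint_loss(logits, targets)
print(loss)
```

Summing the losses lets a single backward pass update the shared BERT and attention parameters for all three tasks at once; per-task weights could be added if one task dominates.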

The test results of this embodiment show that the invention has strong commercial value, since the network design is simple and no elaborate manual feature selection is required.

The embodiment of the invention provides a case judgment prediction method, system and medium based on BERT hidden layer information. Multiple judgment tasks are predicted from the case description of a legal text on the basis of BERT hidden layer information and an attention mechanism; by analyzing the case description text of a case in the judicial field, the judgment of the case is predicted on the basis of cases previously decided by the courts. The BERT pre-trained network structure, which is widely applicable and highly accurate in the field, is fully used, and the output of every hidden layer is fully utilized, so that both the surface-level and the deep-level key information of the text are acquired. The invention also adds the attention mechanism, which is widely used in the field of natural language processing, to focus on the relationships among the layers of the network, which addresses the insufficient utilization of network model information and improves the prediction accuracy of each task.

Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A case decision prediction method based on BERT hidden layer information is characterized by comprising the following steps:

2. The case decision prediction method based on BERT hidden layer information as claimed in claim 1, wherein the step S1 comprises: and denoising the original data of the case text to obtain the denoised case text.

3. The case decision prediction method based on BERT hidden layer information as claimed in claim 1, wherein the case prediction model in step S4 comprises: a deep learning network model;

the related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit, using a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension is that required for related legal provision prediction, and the position of the maximum value of the feature vector is the predicted related legal provision.

4. The case decision prediction method based on BERT hidden layer information as claimed in claim 3, wherein the attention mechanism information generating unit specifically comprises:

Q＝YW^Q

K＝XW^K

V＝XW^V

wherein X is the output of the n-th hidden layer, Y is the output of the other hidden layers, and W^Q, W^K, W^V are initialized matrices whose dimensions correspond to the hidden layer output;

5. A case decision prediction system based on BERT hidden layer information, comprising:

6. The BERT hidden layer information-based case decision prediction system according to claim 5, wherein the module M1 comprises: and denoising the original data of the case text to obtain the denoised case text.

7. The case decision prediction system based on BERT hidden layer information as claimed in claim 5, wherein the case prediction model in the module M4 comprises: a deep learning network model;

8. The case decision prediction system based on BERT hidden layer information as claimed in claim 7, wherein the attention mechanism information generating unit specifically comprises:

Q＝YW^Q

K＝XW^K

V＝XW^V

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.