CN114219682A - Case decision prediction method, system and medium based on BERT hidden layer information - Google Patents

Info

Publication number
CN114219682A
Authority
CN
China
Prior art keywords
case
prediction
text
information generation
bert
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111386108.5A
Other languages
Chinese (zh)
Inventor
黄熙宇
张月国
齐开悦
蒋兴浩
姚立红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202111386108.5A
Publication of CN114219682A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Technology Law (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a case judgment prediction method, system and medium based on BERT hidden layer information, relating to the technical field of legal services. The method comprises the following steps: step S1: acquiring original case text data, and preprocessing the original case text data to obtain a preprocessed case text; step S2: segmenting the preprocessed case text into words to obtain a segmented case text; step S3: encoding the segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining the word vectors of the full text; step S4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model; step S5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, the criminal name (charge) and the criminal period (prison term). The invention addresses the insufficient utilization of network model information and improves the prediction accuracy of each task.

Description

Case decision prediction method, system and medium based on BERT hidden layer information
Technical Field
The invention relates to the technical field of legal services, in particular to a criminal case judgment prediction method based on BERT hidden layer information, and more particularly to a case judgment prediction method, system and medium based on BERT hidden layer information.
Background
In the prior art, the prediction and supervision of criminal case judgments are mostly carried out case by case by legal professionals. This form of judgment is inefficient, so a judging approach that is fair, objective, efficient and accurate is needed.
The invention patent with publication number CN110222866A discloses an intelligent civil case prediction system and method combining spoken-language description with question answering, comprising the following steps: S1, receiving a spoken-language case description input by a user; S2, determining the user's consultation intention from the spoken-language case description; S3, checking, according to the consultation intention, whether the feature content is complete, and if so executing step S4, otherwise prompting the user to supplement the corresponding feature content; and S4, calling a prediction model to output the corresponding consultation result to the user according to the complete feature content. Although that method can predict a case from some of its features, the whole process involves manual participation, so complete fairness and objectivity cannot be achieved, and the number of features that can be predicted is limited.
The invention patent with publication number CN110969276A discloses a judgment prediction method and a method and device for obtaining a judgment prediction model, comprising: obtaining the case description text of a case to be predicted; segmenting the case description text into words to obtain a word sequence; obtaining a matrix formed by the word vectors of all words in the word sequence; and inputting the matrix into a preset judgment prediction model to obtain the judgment prediction information output by that model. Although the method can predict legal judgments, it does not capture the correlation among the multiple subtasks, the network structure used is relatively simple, and the accuracy leaves room for improvement.
In view of these defects in the prior art, the invention aims to provide a criminal case judgment prediction method, built on a deep learning network architecture, that is fair and objective and improves accuracy.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a case decision prediction method, a case decision prediction system and a case decision prediction medium based on BERT hidden layer information.
According to the case decision prediction method, system and medium based on BERT hidden layer information provided by the invention, the scheme is as follows:
In a first aspect, a case decision prediction method based on BERT hidden layer information is provided, the method comprising:
step S1: acquiring original case text data, and preprocessing the original case text data to obtain a preprocessed case text;
step S2: segmenting the preprocessed case text into words to obtain a segmented case text;
step S3: encoding the segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining the word vectors of the full text;
step S4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;
step S5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, the criminal name and the criminal period.
Preferably, step S1 includes: denoising the original case text data to obtain a denoised case text.
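Steps S1 to S3 can be illustrated with a minimal sketch. The source does not specify the denoising rules, the segmentation granularity or the dictionary contents, so everything below is an assumption made for illustration: `denoise` strips control characters and whitespace, `segment` splits the text into single characters, and `TOY_VOCAB` is a hypothetical eight-entry stand-in for the Chinese dictionary of BERT.

```python
import re
import unicodedata

# Hypothetical toy dictionary; the real Chinese BERT dictionary is far larger.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "被": 4, "告": 5, "人": 6, "盗": 7, "窃": 8}


def denoise(raw):
    """Step S1 (assumed form): normalise characters, drop control characters and whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    return re.sub(r"\s+", "", text)


def segment(text):
    """Step S2 (assumed form): character-level segmentation."""
    return list(text)


def encode(tokens, vocab, max_len=16):
    """Step S3: map tokens to dictionary ids, add [CLS]/[SEP], pad to max_len."""
    body = tokens[: max_len - 2]
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, vocab["[UNK]"]) for tok in body]
    ids.append(vocab["[SEP]"])
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))


raw = " 被告人\t盗窃\u200b财物\n"
ids = encode(segment(denoise(raw)), TOY_VOCAB, max_len=10)
print(ids)
```

Characters missing from the toy dictionary map to `[UNK]`; in a real system the resulting ids would be looked up in the embedding layer of the model to obtain the word vectors.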
Preferably, the case prediction model in step S4 includes a deep learning network model;
the deep learning network model comprises: a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit;
the BERT information extraction unit arranges the full-text word vectors obtained in step S3 into a matrix in sequence order, passes the matrix through a word embedding layer and then through n encoding layers, the output of each encoding layer being used as the input of the next layer, and finally obtains the hidden layer outputs of the n layers, wherein the outputs are in the form of vectors that contain the main information of the full text;
the attention mechanism information generation unit obtains, through an attention mechanism, a weight for each of the n hidden layer outputs obtained by the BERT information extraction unit, and sums the hidden layer outputs according to these weights to obtain a final information generation vector;
the criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal name prediction, and the position of the maximum value of the feature vector is the predicted criminal name;
the criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal period prediction, and the position of the maximum value of the feature vector is the predicted criminal period;
the related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension equals that required for related legal provision prediction, and the position of the maximum value of the feature vector is the predicted related legal provision.
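The three prediction units described above share one pattern: a linear layer maps the information generation vector to one score per class, and the index of the largest score is the prediction. The following is a minimal sketch of that pattern only; the weights, biases, class counts and the three-dimensional `info_vector` are hypothetical values chosen for illustration, not parameters from the source.

```python
def linear(vec, weight, bias):
    """Linear layer y = W v + b, with W given as a list of rows (one row per class)."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def argmax(values):
    """Position of the maximum value (first one if there are ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical information generation vector (dimension 3 for readability).
info_vector = [1.0, -1.0, 0.5]

# Hypothetical head parameters: (weight rows, bias) per task.
heads = {
    "criminal_name": ([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], [0, 0, 0, 0]),
    "criminal_period": ([[0.5, 0.5, 0], [0, -1, 0]], [0.0, 0.1]),
    "legal_provision": ([[0, 0, 1], [1, 0, 0], [0, 1, 1]], [0, 0, 0]),
}

predictions = {task: argmax(linear(info_vector, w, b))
               for task, (w, b) in heads.items()}
print(predictions)
```

Each head has its own output dimension, which corresponds to "the dimension required" by the respective task; all three heads read the same shared vector.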
Preferably, the attention mechanism information generation unit specifically operates as follows:
the Q, K and V matrices relating the output of the n-th hidden layer to the outputs of the other layers obtained by the BERT information extraction unit are computed according to the attention mechanism, using the following formulas:
$$Q = Y W_Q$$
$$K = X W_K$$
$$V = X W_V$$
where X is the output of the n-th hidden layer, Y is the output of the other hidden layers, and $W_Q$, $W_K$, $W_V$ are initialized matrices whose dimensions correspond to the hidden layer outputs;
the weight of each hidden layer is then calculated according to the attention mechanism based on the Q, K and V matrices, wherein the attention mechanism formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$$
where $K^{T}$ denotes the transpose of matrix K, and d is the dimension of matrix Q;
finally, each hidden layer output is multiplied by its corresponding weight obtained from the softmax, and the results are summed to obtain the final information generation vector.
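The weighting of hidden layers can be sketched as follows. The source leaves several details open, so this is one possible reading rather than the patented implementation: each layer is represented by a single vector, every layer (the last one included) is scored against the last layer, the softmax is taken across layers, and the weights are applied to the raw layer outputs as the final sentence above describes, so the value projection $W_V$ is not used. The function name `fuse_hidden_layers` and the identity projection matrices are hypothetical.

```python
import math


def matvec(vec, mat):
    """Row vector times matrix, i.e. vec @ mat."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec))
            for j in range(len(mat[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_hidden_layers(hidden, w_q, w_k):
    """Weight each layer output by its scaled dot-product score against the last layer."""
    d = len(w_q[0])                       # dimension of the query vectors
    key = matvec(hidden[-1], w_k)         # K = X W_K, X = last hidden layer output
    scores = []
    for layer_out in hidden:              # rows of Y (assumption: all layers)
        query = matvec(layer_out, w_q)    # Q = Y W_Q
        dot = sum(q * k for q, k in zip(query, key))
        scores.append(dot / math.sqrt(d))
    weights = softmax(scores)             # one weight per layer
    fused = [sum(w * h[j] for w, h in zip(weights, hidden))
             for j in range(len(hidden[0]))]
    return weights, fused


identity = [[1.0, 0.0], [0.0, 1.0]]       # hypothetical W_Q and W_K
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, fused = fuse_hidden_layers(hidden, identity, identity)
print(weights, fused)
```

In this toy input the last layer is most similar to itself and therefore receives the largest weight, while the two earlier layers receive equal weights.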
In a second aspect, a case decision prediction system based on BERT hidden layer information is provided, the system comprising:
module M1: acquiring original case text data, and preprocessing the original case text data to obtain a preprocessed case text;
module M2: segmenting the preprocessed case text into words to obtain a segmented case text;
module M3: encoding the segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining the word vectors of the full text;
module M4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;
module M5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, the criminal name and the criminal period.
Preferably, the module M1 includes: denoising the original case text data to obtain a denoised case text.
Preferably, the case prediction model in the module M4 includes a deep learning network model;
the deep learning network model comprises: a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit;
the BERT information extraction unit arranges the full-text word vectors obtained by the module M3 into a matrix in sequence order, passes the matrix through a word embedding layer and then through n encoding layers, the output of each encoding layer being used as the input of the next layer, and finally obtains the hidden layer outputs of the n layers, wherein the outputs are in the form of vectors that contain the main information of the full text;
the attention mechanism information generation unit obtains, through an attention mechanism, a weight for each of the n hidden layer outputs obtained by the BERT information extraction unit, and sums the hidden layer outputs according to these weights to obtain a final information generation vector;
the criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal name prediction, and the position of the maximum value of the feature vector is the predicted criminal name;
the criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal period prediction, and the position of the maximum value of the feature vector is the predicted criminal period;
the related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model: the linear layer converts the information generation vector into a new feature vector whose dimension equals that required for related legal provision prediction, and the position of the maximum value of the feature vector is the predicted related legal provision.
Preferably, the attention mechanism information generation unit specifically operates as follows:
the Q, K and V matrices relating the output of the n-th hidden layer to the outputs of the other layers obtained by the BERT information extraction unit are computed according to the attention mechanism, using the following formulas:
$$Q = Y W_Q$$
$$K = X W_K$$
$$V = X W_V$$
where X is the output of the n-th hidden layer, Y is the output of the other hidden layers, and $W_Q$, $W_K$, $W_V$ are initialized matrices whose dimensions correspond to the hidden layer outputs;
the weight of each hidden layer is then calculated according to the attention mechanism based on the Q, K and V matrices, wherein the attention mechanism formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$$
where $K^{T}$ denotes the transpose of matrix K, and d is the dimension of matrix Q;
finally, each hidden layer output is multiplied by its corresponding weight obtained from the softmax, and the results are summed to obtain the final information generation vector.
In a third aspect, a computer-readable storage medium is provided, in which a computer program is stored; when executed by a processor, the computer program performs the steps of the method described above.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention is a case judgment prediction method based on BERT hidden layer information and an attention mechanism algorithm; it predicts multiple judgment tasks from the case description of a legal text, and aims to predict cases in the judicial field, on the basis of cases already adjudicated by courts, by analyzing their case description texts;
2. the invention makes full use of the BERT pre-trained network structure, which is widely applicable and highly accurate in this field, and fully utilizes the output of every hidden layer, thereby capturing both the surface-level and the deep-level key information of the text;
3. the invention adds the attention mechanism, currently the most popular mechanism in the field of natural language processing, to focus on the relationships among the layers of the network, which addresses the insufficient utilization of network model information and greatly improves the prediction accuracy of each task.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a criminal case decision prediction flow chart;
FIG. 2 is an overall framework diagram of a criminal case judgment network based on BERT hidden layer information;
FIG. 3 is a schematic diagram of a BERT network structure.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.
The embodiment of the invention provides a case judgment prediction method based on BERT hidden layer information. It studies criminal case judgment prediction technology and aims to predict cases in the judicial field, on the basis of cases already adjudicated by courts, by analyzing their case description texts. Referring to FIG. 1, the method comprises the following specific steps:
Step S1: acquiring original case text data, and preprocessing the original case text data to obtain a preprocessed case text; the preprocessing in this step consists of denoising the original case text data to obtain a denoised case text.
Step S2: segmenting the preprocessed case text into words to obtain a segmented case text;
Step S3: encoding the segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining the word vectors of the full text;
Step S4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;
The case prediction model in this step comprises a deep learning network model. The deep learning network model comprises: a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit.
The BERT information extraction unit arranges the full-text word vectors obtained in step S3 into a matrix in sequence order, passes the matrix through a word embedding layer and then through 12 encoding layers, the output of each encoding layer being used as the input of the next layer, and finally obtains the hidden layer outputs of the 12 layers, wherein the outputs are in the form of vectors that contain the main information of the full text.
The attention mechanism information generation unit obtains, through the attention mechanism, a weight for each of the 12 hidden layer outputs obtained by the BERT information extraction unit, and sums the hidden layer outputs according to these weights to obtain a final information generation vector. Specifically, the Q, K and V matrices relating the hidden layer output of the 12th layer to the output of each other layer obtained by the BERT information extraction unit are first computed according to the attention mechanism, using the following formulas:
$$Q = Y W_Q$$
$$K = X W_K$$
$$V = X W_V$$
where X is the output of the n-th hidden layer (here the 12th), Y is the output of the other hidden layers, and $W_Q$, $W_K$, $W_V$ are initialized matrices whose dimensions correspond to the hidden layer outputs.
From the Q, K and V matrices, the weight of each hidden layer can be calculated according to the attention mechanism, wherein the attention mechanism formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$$
where $K^{T}$ denotes the transpose of matrix K, and d is the dimension of matrix Q.
Finally, each hidden layer output is multiplied by its corresponding weight obtained from the softmax, and the results are summed to obtain the final information generation vector.
The criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal name prediction, and the position of the maximum value of the feature vector is the predicted criminal name.
The criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal period prediction, and the position of the maximum value of the feature vector is the predicted criminal period.
The related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension equals that required for related legal provision prediction, and the position of the maximum value of the feature vector is the predicted related legal provision.
Step S5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, the criminal name and the criminal period.
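The source does not state the training objective used in step S4. One common choice for a model with several classification heads, shown here purely as an assumption, is to sum a cross-entropy loss over the three tasks so that all heads are trained jointly. The names `cross_entropy` and `joint_loss` and all numbers are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under a softmax over logits."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(task_logits, task_targets):
    """Assumed multi-task objective: unweighted sum of per-task cross-entropies."""
    return sum(cross_entropy(task_logits[task], task_targets[task])
               for task in task_logits)


# Hypothetical head outputs and gold labels for a single case.
logits = {
    "criminal_name": [2.0, 0.0, 0.0],
    "criminal_period": [0.0, 0.0],
    "legal_provision": [0.0, 3.0, 0.0, 0.0],
}
targets = {"criminal_name": 0, "criminal_period": 1, "legal_provision": 1}

loss = joint_loss(logits, targets)
print(loss)
```

Per-task weighting of the loss terms is another possible design; the unweighted sum is used here only to keep the sketch short.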
The embodiment of the invention also provides a case decision prediction system based on BERT hidden layer information, comprising the following modules:
Module M1: acquiring original case text data, and preprocessing the original case text data to obtain a preprocessed case text; the preprocessing in this module consists of denoising the original case text data to obtain a denoised case text.
Module M2: segmenting the preprocessed case text into words to obtain a segmented case text;
Module M3: encoding the segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining the word vectors of the full text;
Module M4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;
The case prediction model in this module comprises a deep learning network model. The deep learning network model comprises: a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit.
The BERT information extraction unit arranges the full-text word vectors obtained by module M3 into a matrix in sequence order, passes the matrix through a word embedding layer and then through 12 encoding layers, the output of each encoding layer being used as the input of the next layer, and finally obtains the hidden layer outputs of the 12 layers, wherein the outputs are in the form of vectors that contain the main information of the full text.
The attention mechanism information generation unit obtains, through the attention mechanism, a weight for each of the 12 hidden layer outputs obtained by the BERT information extraction unit, and sums the hidden layer outputs according to these weights to obtain a final information generation vector. Specifically, the Q, K and V matrices relating the hidden layer output of the 12th layer to the output of each other layer obtained by the BERT information extraction unit are first computed according to the attention mechanism, using the following formulas:
$$Q = Y W_Q$$
$$K = X W_K$$
$$V = X W_V$$
where X is the output of the n-th hidden layer (here the 12th), Y is the output of the other hidden layers, and $W_Q$, $W_K$, $W_V$ are initialized matrices whose dimensions correspond to the hidden layer outputs.
From the Q, K and V matrices, the weight of each hidden layer can be calculated according to the attention mechanism, wherein the attention mechanism formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$$
where $K^{T}$ denotes the transpose of matrix K, and d is the dimension of matrix Q.
Finally, each hidden layer output is multiplied by its corresponding weight obtained from the softmax, and the results are summed to obtain the final information generation vector.
The criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal name prediction, and the position of the maximum value of the feature vector is the predicted criminal name.
The criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension equals that required for criminal period prediction, and the position of the maximum value of the feature vector is the predicted criminal period.
The related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model. The linear layer converts the information generation vector into a new feature vector whose dimension equals that required for related legal provision prediction, and the position of the maximum value of the feature vector is the predicted related legal provision.
Module M5: inputting the full-text word vectors into the trained case prediction model to obtain prediction results for the related legal provisions, the criminal name and the criminal period.
The present invention will be described in more detail below.
As shown in FIG. 1, the present embodiment provides a case decision prediction method based on BERT hidden layer information, covering the network model structure, the inputs and outputs, and so on.
As shown in FIG. 2, the encoded full-text word vectors are fed into the BERT network to obtain the output of every layer; these outputs are then weighted and summed according to the attention mechanism, and the final prediction results are obtained through the linear layers.
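The idea of keeping every layer's output, instead of only the last one, can be sketched with a toy stack of layers. This is not a BERT implementation: `toy_layer` is a hypothetical stand-in for a real encoding layer, and the four-dimensional input is made up. The sketch only shows the data flow, in which each layer consumes the previous layer's output and all 12 outputs are retained for the later weighting step.

```python
import math


def toy_layer(vec, shift):
    """Hypothetical stand-in for one encoding layer: a fixed nonlinear map."""
    return [math.tanh(x + shift) for x in vec]


def collect_hidden_outputs(embedded, n_layers):
    """Feed each layer's output into the next layer and keep all n outputs."""
    hidden = []
    current = embedded
    for i in range(n_layers):
        current = toy_layer(current, shift=0.1 * (i + 1))
        hidden.append(current)
    return hidden


embedded = [0.5, -0.2, 0.0, 1.0]          # made-up embedded input
hidden_outputs = collect_hidden_outputs(embedded, n_layers=12)
print(len(hidden_outputs), len(hidden_outputs[0]))
```

With a real pre-trained model, the same list of per-layer outputs would come from the encoder itself rather than from a hand-written loop.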
The operation of the attention mechanism is shown schematically in FIG. 3. Its purpose is to obtain a weight for each of the 12 hidden layer outputs obtained by the BERT information extraction unit, and to sum the hidden layer outputs according to these weights to obtain the final information generation vector. Specifically, the Q, K and V matrices relating the hidden layer output of the 12th layer to the output of each other layer obtained by the BERT information extraction unit are first computed according to the attention mechanism, using the following formulas:
$$Q = Y W_Q$$
$$K = X W_K$$
$$V = X W_V$$
where X is the twelfth hidden layer output, Y is the output of the other hidden layers, and $W_Q$, $W_K$, $W_V$ are initialized matrices whose dimensions correspond to the hidden layer outputs.
From the Q, K and V matrices, the weight of each hidden layer can be calculated according to the attention mechanism, wherein the attention mechanism formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$$
where $K^{T}$ denotes the transpose of matrix K, and d is the dimension of matrix Q.
Each hidden layer output is multiplied by its corresponding weight obtained from the softmax, and the results are summed to obtain the final information generation vector. Finally, the prediction results for the criminal name, the related legal provisions and the criminal period are obtained through the linear layers.
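Written per layer, the weighting and prediction described in this embodiment might take the following form. This is a hedged reading, not a formula given in the source: $h_i$ denotes a single row vector taken from the output of layer i, every layer including the 12th is scored against $h_{12}$, the softmax runs across layers, the value projection $W_V$ is omitted because the text weights the hidden layer outputs directly, and $W_t$, $b_t$ are hypothetical parameters of the linear layer for task t.

```latex
\begin{aligned}
s_i &= \frac{(h_i W_Q)\,(h_{12} W_K)^{T}}{\sqrt{d}}, \qquad i = 1, \dots, 12,\\
\alpha_i &= \frac{\exp(s_i)}{\sum_{j=1}^{12} \exp(s_j)},\\
v &= \sum_{i=1}^{12} \alpha_i\, h_i,\\
\hat{y}_t &= \arg\max_{c}\,\bigl(v\,W_t + b_t\bigr)_c, \qquad
t \in \{\text{criminal name},\ \text{related legal provision},\ \text{criminal period}\}.
\end{aligned}
```

Here $\alpha_i$ is the weight of layer i, v is the information generation vector, and $\hat{y}_t$ is the position of the maximum value of the feature vector for task t.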
This embodiment uses CAIL2018, a widely used Chinese legal text dataset, as the training data for the proposed network. The experimental results show that the prediction accuracy on each task reaches the state-of-the-art (SOTA) level.
The test results of this embodiment show that the invention has strong commercial value while relying on a simple network design and requiring no complicated manual feature selection.
The embodiment of the invention provides a case decision prediction method, system, and medium based on BERT hidden layer information. Multiple judgment tasks are predicted from the case description of a legal text using BERT hidden layer information and an attention mechanism: by analyzing the case description text of a case in the judicial field, the method predicts its outcome according to cases previously decided by the courts. The BERT pre-trained network structure, which is widely applicable and highly accurate in this field, is used in full; the output of every hidden layer is exploited, so that both surface-level and deep-level key information of the text is captured. The invention also adds the attention mechanism, currently the most popular technique in natural language processing, to focus on the relationships among the layers of the network. This addresses the insufficient use of network model information and greatly improves the prediction accuracy of each task.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (9)

1. A case decision prediction method based on BERT hidden layer information is characterized by comprising the following steps:
step S1: acquiring original data of case text, and preprocessing the original data of the case text to obtain a preprocessed case text;
step S2: segmenting words of the preprocessed case text to obtain preprocessed segmented case text;
step S3: encoding the preprocessed, segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining the full-text word vectors;
step S4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;
step S5: inputting the full-text word vectors and the word vectors of the word pairs into the trained case prediction model to obtain the prediction results for the related legal provisions, criminal names, and criminal periods.
2. The case decision prediction method based on BERT hidden layer information as claimed in claim 1, wherein the step S1 comprises: denoising the original data of the case text to obtain the denoised case text.
3. The case decision prediction method based on BERT hidden layer information as claimed in claim 1, wherein the case prediction model in step S4 comprises: a deep learning network model;
the deep learning network model comprises: the system comprises a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit;
the BERT information extraction unit arranges the full-text word vectors obtained in the step S3 into a matrix in sequence order; after passing through the word embedding layer, the vectors pass through n coding layers, the output of each coding layer serving as the input of the next layer, and finally the hidden layer outputs of the n layers are obtained, wherein the outputs are in the form of vectors, and the vectors contain the main information of the full text;
the attention mechanism information generation unit acquires respective weights of hidden layer outputs of the n layers acquired by the BERT information extraction unit through an attention mechanism, and adds the hidden layer outputs according to the weights to acquire a final information generation vector;
the criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model; the linear layer converts the information generation vector into a new feature vector whose dimension equals that required by criminal name prediction, and the position of the maximum value of the feature vector is the predicted value of the criminal name;
the criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model; the linear layer converts the information generation vector into a new feature vector whose dimension equals that required by criminal period prediction, and the position of the maximum value of the feature vector is the predicted value of the criminal period;
the related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model; the linear layer converts the information generation vector into a new feature vector whose dimension equals that required by related legal provision prediction, and the position of the maximum value of the feature vector is the predicted value of the related legal provision.
4. The case decision prediction method based on BERT hidden layer information as claimed in claim 3, wherein the attention mechanism information generating unit specifically comprises:
and solving the matrix Q, K and V relation between the output of the n-th hidden layer and the output of other layers obtained by the BERT information extraction unit according to an attention mechanism, wherein the formula is as follows:
$$Q = Y W_Q$$
$$K = X W_K$$
$$V = X W_V$$
wherein X is the output of the n-th hidden layer, Y is the output of the other hidden layers, and $W_Q$, $W_K$, $W_V$ are initialized matrices whose dimensions correspond to the hidden layer outputs;
and calculating the weight of each hidden layer according to an attention mechanism based on the Q, K and V matrix, wherein the attention mechanism formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$$
wherein $K^{T}$ represents the transpose of matrix K; d is the dimension of the matrix Q;
and finally, multiplying the hidden layer output by the corresponding weight according to the weight obtained by softmax, and summing to obtain a final information generation vector.
5. A case decision prediction system based on BERT hidden layer information, comprising:
module M1: acquiring original data of case text, and preprocessing the original data of the case text to obtain a preprocessed case text;
module M2: segmenting words of the preprocessed case text to obtain preprocessed segmented case text;
module M3: encoding the preprocessed, segmented case text according to the Chinese dictionary of BERT to obtain word codes, and finally obtaining the full-text word vectors;
module M4: constructing a case prediction model and training the case prediction model to obtain a trained case prediction model;
module M5: inputting the full-text word vectors and the word vectors of the word pairs into the trained case prediction model to obtain the prediction results for the related legal provisions, criminal names, and criminal periods.
6. The BERT hidden layer information-based case decision prediction system according to claim 5, wherein the module M1 comprises: denoising the original data of the case text to obtain the denoised case text.
7. The case decision prediction system based on BERT hidden layer information as claimed in claim 5, wherein the case prediction model in the module M4 comprises: a deep learning network model;
the deep learning network model comprises: the system comprises a BERT information extraction unit, an attention mechanism information generation unit, a criminal name prediction unit, a criminal period prediction unit and a related legal provision prediction unit;
the BERT information extraction unit arranges the full-text word vectors obtained in the step S3 into a matrix in sequence order; after passing through the word embedding layer, the vectors pass through n coding layers, the output of each coding layer serving as the input of the next layer, and finally the hidden layer outputs of the n layers are obtained, wherein the outputs are in the form of vectors, and the vectors contain the main information of the full text;
the attention mechanism information generation unit acquires respective weights of hidden layer outputs of the n layers acquired by the BERT information extraction unit through an attention mechanism, and adds the hidden layer outputs according to the weights to acquire a final information generation vector;
the criminal name prediction unit predicts the criminal name from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model; the linear layer converts the information generation vector into a new feature vector whose dimension equals that required by criminal name prediction, and the position of the maximum value of the feature vector is the predicted value of the criminal name;
the criminal period prediction unit predicts the criminal period from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model; the linear layer converts the information generation vector into a new feature vector whose dimension equals that required by criminal period prediction, and the position of the maximum value of the feature vector is the predicted value of the criminal period;
the related legal provision prediction unit predicts the related legal provision from the information generation vector obtained by the attention mechanism information generation unit through a linear layer of the deep learning network model; the linear layer converts the information generation vector into a new feature vector whose dimension equals that required by related legal provision prediction, and the position of the maximum value of the feature vector is the predicted value of the related legal provision.
8. The case decision prediction system based on BERT hidden layer information as claimed in claim 7, wherein the attention mechanism information generating unit specifically comprises:
and solving the matrix Q, K and V relation between the output of the n-th hidden layer and the output of other layers obtained by the BERT information extraction unit according to an attention mechanism, wherein the formula is as follows:
$$Q = Y W_Q$$
$$K = X W_K$$
$$V = X W_V$$
wherein X is the output of the n-th hidden layer, Y is the output of the other hidden layers, and $W_Q$, $W_K$, $W_V$ are initialized matrices whose dimensions correspond to the hidden layer outputs;
and calculating the weight of each hidden layer according to an attention mechanism based on the Q, K and V matrix, wherein the attention mechanism formula is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$$
wherein $K^{T}$ represents the transpose of matrix K; d is the dimension of the matrix Q;
and finally, multiplying the hidden layer output by the corresponding weight according to the weight obtained by softmax, and summing to obtain a final information generation vector.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202111386108.5A 2021-11-22 2021-11-22 Case decision prediction method, system and medium based on BERT hidden layer information Pending CN114219682A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111386108.5A CN114219682A (en) 2021-11-22 2021-11-22 Case decision prediction method, system and medium based on BERT hidden layer information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111386108.5A CN114219682A (en) 2021-11-22 2021-11-22 Case decision prediction method, system and medium based on BERT hidden layer information

Publications (1)

Publication Number Publication Date
CN114219682A true CN114219682A (en) 2022-03-22

Family

ID=80697729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111386108.5A Pending CN114219682A (en) 2021-11-22 2021-11-22 Case decision prediction method, system and medium based on BERT hidden layer information

Country Status (1)

Country Link
CN (1) CN114219682A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548080A (en) * 2022-04-24 2022-05-27 长沙市智为信息技术有限公司 Chinese wrong character correction method and system based on word segmentation enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination