CN114781389B - Crime name prediction method and system based on label enhancement representation - Google Patents


Info

Publication number
CN114781389B
CN114781389B (application CN202210209170.5A)
Authority
CN
China
Prior art keywords
crime
representation
case
tag
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210209170.5A
Other languages
Chinese (zh)
Other versions
CN114781389A (en)
Inventor
但静培
胥岚林
廖晓爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202210209170.5A priority Critical patent/CN114781389B/en
Publication of CN114781389A publication Critical patent/CN114781389A/en
Application granted granted Critical
Publication of CN114781389B publication Critical patent/CN114781389B/en

Classifications

    • G — PHYSICS
    • G06F — Electric digital data processing
        • G06F 40/30 — Handling natural language data; Semantic analysis
        • G06F 16/35 — Information retrieval of unstructured textual data; Clustering; Classification
        • G06F 18/2414 — Pattern recognition; Classification techniques; Smoothing the distance, e.g. radial basis function networks [RBFN]
        • G06F 40/12 — Text processing; Use of codes for handling textual entities
    • G06N — Computing arrangements based on specific computational models
        • G06N 3/045 — Neural networks; Combinations of networks
        • G06N 3/047 — Probabilistic or stochastic networks
        • G06N 3/084 — Learning methods; Backpropagation, e.g. using gradient descent
    • G06Q — Information and communication technology specially adapted for administrative, commercial, financial, managerial or supervisory purposes
        • G06Q 50/18 — Legal services; Handling legal documents


Abstract

The invention provides a crime name prediction method and system based on label-enhanced representation. The method comprises the following steps: selecting cases as a sample set and giving the input description of each case in the sample set; giving the label input description of the crime name corresponding to each case; encoding each case description to obtain a context-dependent embedded representation of each word in the description; encoding each crime-name label to obtain an embedded representation of each label; alternately applying a self-attention mechanism and a cross-attention mechanism to the encoded crime-name labels to obtain the crime-name enhanced label representation; concatenating the case text representation with the crime-name enhanced label representation and training a classifier through a convolutional neural network model; and feeding the cases to be predicted into the trained crime-name prediction model to obtain the predicted crime name. According to the method, the semantic information contained in the crime-name enhanced label representation gives the training data better interpretability, and thus higher prediction accuracy is obtained.

Description

Crime name prediction method and system based on label enhancement representation
Technical Field
The invention relates to the technical field of machine learning, and in particular to a crime name prediction method and system based on label-enhanced representation.
Background
Legal judgment prediction completes the prediction of the crime name from the description of the case facts and can effectively assist the adjudication of criminal cases. It has attracted increasing attention in recent years: on one hand, it provides higher-quality judgment references for people without a legal background; on the other hand, it provides legal reference for legal professionals.
In recent years, many studies have addressed automatic judgment. Initially, the automatic judgment problem was treated as a simple text classification problem and handled by conventional means such as keyword matching. With the development of deep learning, more researchers began to use deep learning frameworks to extract information from text to assist automatic judgment. However, most of these methods focus only on the text content of the case descriptions — the model must learn the features of the case description — while ignoring that the crime-name labels themselves carry semantic information, so accuracy in crime-name prediction has remained unsatisfactory.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a crime name prediction method and a crime name prediction system based on label enhancement representation.
In order to achieve the above object of the present invention, the present invention provides a crime name prediction method based on tag enhancement representation, comprising the steps of:
selecting cases as a sample set, and giving the input description of each case in the sample set; giving the label input description of the crime name corresponding to each case;
encoding each case description and obtaining a context-dependent embedded representation of each word in the description, denoted the case text representation X_f;
encoding each crime-name label description and obtaining an embedded representation of each crime-name label; the label set containing the embedded representations of all crime-name labels is denoted E_T;
fusing the encoded crime-name labels with the case text representation, alternately applying a self-attention mechanism and a cross-attention mechanism, to obtain the crime-name enhanced label representation H_T;
concatenating the case text representation X_f with the crime-name enhanced label representation H_T, and training the model's classifier through a convolutional neural network model to obtain the trained crime-name prediction model;
and feeding the case to be predicted into the trained crime-name prediction model to obtain the predicted crime name.
According to this method, the crime-name labels are mapped into a latent semantic space through embedding, the important information of the case fact description is fused into the crime-name enhanced labels, and a classifier is trained on this basis to complete the crime-name prediction task for case descriptions. Even with small samples, the model achieves higher prediction accuracy and shows a certain generalization capability on low-frequency crime names.
In a preferred scheme of the crime name prediction method based on label-enhanced representation, given the input description of each case in the sample set, word-granularity processing is performed on each case input description S_d to obtain the case fact description S_d = (w_1, w_2, ..., w_m), where w_i denotes the i-th word in the case input description text, m is the number of words in the case input description text, i is a positive integer, and 1 ≤ i ≤ m;
word-granularity processing is performed on each crime-name label input description to obtain the crime-name label S_c = (v_1^c, v_2^c, ..., v_p^c), where v_j^c denotes the j-th word of the label input description, c is a positive integer not greater than L, L denotes the number of crime-name labels, and p denotes the number of words in the crime-name label;
the case fact description S_d and the crime-name labels S_c are then encoded.
This method preserves the case facts and the crime-name label descriptions in the form of text features to the greatest extent, improving the accuracy of crime-name prediction.
In a preferred scheme of the crime name prediction method based on label-enhanced representation, the case fact description S_d is encoded, and the last hidden-layer output of the encoder is used as the context-dependent embedded representation of each word in the case fact description, i.e. X_f = [x_1, x_2, ..., x_m] ∈ R^(m×d_s), where d_s denotes the dimension of the encoder's last hidden layer and x_i denotes the embedded representation corresponding to the i-th word in the case fact description.
The crime-name label S_c is encoded, and the last hidden-layer output of the encoder is used as the word-granularity embedded representation of each crime-name label, E^c = [e_1^c, e_2^c, ..., e_p^c], where e_j^c denotes the embedded representation corresponding to the j-th word in the crime-name label; the word embeddings of each label are summed to obtain e_c = Σ_j e_j^c, the embedded representation of the c-th crime-name label, yielding the label set E_T = [e_1, e_2, ..., e_c, ..., e_L] containing the embedded representations of all crime-name labels.
In this way, the case descriptions and the crime-name labels are mapped into the same semantic space, the knowledge learned by the pre-trained model is applied to both, and the semantic information of the crime-name labels is brought into the model's training process, giving the training data better interpretability and thus higher prediction accuracy.
In a preferred scheme of the crime name prediction method based on label-enhanced representation, when the self-attention and cross-attention mechanisms are alternately applied to the encoded crime-name labels, a Q-K-V attention model is adopted following the Transformer model:
let the key matrix be K ∈ R^(M×D_k), the query matrix Q ∈ R^(N×D_k), and the value matrix V ∈ R^(M×D_v), obtained with parameter matrices W_k, W_q, W_v initialized as all-zero matrices; the attention output is computed by the Transformer's scaled dot-product attention, Attention(Q, K, V) = softmax(Q·K^T / √D_k)·V, where N and M denote the lengths of the query and key-value sequences respectively, D is the word embedding dimension, D_k denotes the dimension of the key/query matrices, and D_v denotes the dimension of the value matrix;
a residual connection is applied in the feed-forward sublayer, and the final output is obtained as the crime-name enhanced label representation H_T = [h_1, h_2, ..., h_L], where h_c refers to the enhanced representation of crime-name label c.
Fusing the crime-name labels with the case text realizes the enhanced representation of the labels; the model performs a preliminary fusion of case and crime-name information before the classifier, so the training data are more interpretable to the model and the accuracy of crime-name prediction is improved.
The invention also provides a crime name prediction system comprising a processing module and a storage module communicatively connected to each other; the storage module stores at least one executable instruction that causes the processing module to perform the operations corresponding to the above crime name prediction method based on label-enhanced representation.
The processing module comprises a case description encoder, a tag characteristic enhancer and a classifier;
the case description encoder encodes each case description to obtain a case text representation;
the tag feature enhancer maps the crime name tag to a potential semantic space to obtain an embedded representation of the crime name tag, and fuses the embedded representation with the case text representation to obtain a crime name enhanced tag representation;
and the classifier fuses the case text representation and the crime name enhancement tag representation, trains a classification model for classification prediction, and obtains a prediction result.
The crime name prediction system has all the advantages of the above crime name prediction method.
According to the invention, the semantic information contained in the crime-name enhanced label representation gives the training data better interpretability, and thus higher prediction accuracy is obtained.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
fig. 1 is a functional block diagram of a crime name prediction system.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, unless otherwise specified and defined, the terms "mounted," "connected," and "coupled" are to be construed broadly: for example, a connection may be mechanical or electrical, and two elements may communicate with each other directly or indirectly through intermediaries; those skilled in the art will understand the specific meaning of these terms in context.
The invention provides a crime name prediction method based on label enhancement representation, which mainly comprises the steps of fusing important information of case fact description into label representation of corresponding subtasks, and training a classifier based on the important information to complete the crime name prediction task of case fact description. The method comprises the following specific steps:
the case is selected as the sample set. The sample set contains a large number of cases, and the types of the criminal names corresponding to the cases are as many as possible.
The sample set used in this embodiment is the CAIL2018 dataset. Each sample in the dataset is a legal case, and each case has the same structure, comprising the fact description of the case and the judgment results such as the relevant law articles, crime name, and term of penalty. The CAIL2018 dataset consists of two parts, CAIL-small and CAIL-big; details are shown in Table 1.
Table 1 data set introduction
|            | Training set size | Test set size | Number of crime names |
|------------|-------------------|---------------|-----------------------|
| CAIL-small | 154592            | 32508         | 196                   |
| CAIL-big   | 1710856           | 217016        | 196                   |
CAIL-small additionally provides 17131 samples as a validation set. In the CAIL-big dataset, a small number of fact descriptions correspond to multiple crime names. Because the goal here is only to verify whether label semantics can improve the performance of crime-name prediction, and because training samples with multiple crime-name labels are sparse, the data samples with multiple crime-name labels in CAIL-big are deleted to reduce model complexity, and only the data samples with a single crime-name label are retained.
Given each case input description in the sample set, word-granularity processing is performed on each case input description S_d to obtain the case fact description S_d = (w_1, w_2, ..., w_m), where w_i denotes the i-th word in the case input description text, m is the number of words in the case input description text, i is a positive integer, and 1 ≤ i ≤ m.
Likewise, each case corresponds to a crime name; different cases may correspond to the same crime name or to different ones. Given the label input of the crime name corresponding to each case, word-granularity processing is performed on each crime-name label input description to obtain the crime-name label S_c = (v_1^c, v_2^c, ..., v_p^c), where v_j^c denotes the j-th word of the label-c input description text, c is a positive integer not greater than L, L denotes the number of crime-name labels, and p denotes the number of words in the current crime-name label. The set of all crime-name labels is denoted S = (S_1, S_2, ..., S_c, ..., S_L).
The case fact description S_d and the crime-name labels S_c are then encoded respectively.
In this embodiment, bert is preferably but not limited to selected as the basic encoder to encode the case fact description and criminal name label. The method comprises the following steps:
description of case factsWhen encoding, word sequence describing the fact of case +.>Input to Bert pre-training model, the last hidden layer output of the pre-training model is +.>Wherein d is s The dimension of the last hidden layer of the Bert is represented, and the Bert pre-training model describes a word sequence S of the case fact d Each word of (a) is expanded into a ds-dimensional column vector, ">Representing the ith column vector in the ds dimension column vector, representing the embedded representation corresponding to the ith word in the case fact description. The Bert is subjected toThe last hidden layer output is used as a context dependent embedded representation of each word in the case fact description, i.e./i> Text representation X of a case f
When encoding the crime-name label S_c, the label is likewise encoded by BERT, and the last hidden-layer output of BERT is selected as the word-granularity embedded representation of each crime-name label, E^c = [e_1^c, e_2^c, ..., e_p^c]; the BERT pre-trained model expands each word of S_c into a d_s-dimensional column vector, and e_j^c, the j-th of these column vectors, is the embedded representation corresponding to the j-th word in crime-name label c. The word embeddings of each label are summed to obtain e_c = Σ_j e_j^c, the embedded representation of the c-th crime-name label, and finally the label set E_T = [e_1, e_2, ..., e_c, ..., e_L] containing the embedded representations of all crime-name labels is collected.
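The label-encoding step above can be sketched as follows. This is a toy illustration: random vectors stand in for BERT's last-hidden-layer outputs, and all dimensions (d_s = 8, L = 3, the per-label word counts) are illustrative values, not the patent's actual configuration.

```python
import numpy as np

# Toy sketch of the label-encoding step: random vectors stand in for BERT's
# last-hidden-layer word embeddings. Each label's (p, d_s) word-embedding
# matrix is summed into a single label embedding e_c; the L label embeddings
# are stacked into the label set E_T.
rng = np.random.default_rng(0)
d_s, L = 8, 3                                     # hidden size, number of labels
label_word_embs = [rng.normal(size=(p, d_s))      # p = words in each label
                   for p in (2, 4, 3)]

E_T = np.stack([E_c.sum(axis=0) for E_c in label_word_embs])  # shape (L, d_s)
assert E_T.shape == (L, d_s)
```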
Enhancement processing is then performed on the crime-name labels. In this embodiment, the encoded crime-name labels are fused with the case text representation by alternately applying a self-attention mechanism and a cross-attention mechanism, yielding the crime-name enhanced label representation.
Specifically, this embodiment adapts the multi-head attention of the decoder in the Transformer model and innovatively proposes a method of enhancing the label representation. The crime-name label feature enhancer is implemented by alternately applying a self-attention mechanism and a cross-attention mechanism; following the Transformer model, a Q-K-V attention model is adopted:
let the key matrix beThe query matrix is +.>The value matrix isWherein W is k 、W q 、W v The attention output is obtained by the scaled dot product of the convertors for the attention as an all-zero matrix>Namely, the criminal name tag weight is dispersed to a case description result, wherein N and M respectively represent the length of a query vector and a key value, D is a word embedding dimension, and K T Transposed matrix of K, D k Representing the dimensions of a key or query matrix, D v Representing the dimension of the value matrix.
Finally, a residual connection is applied in the feed-forward sublayer, and the final output is obtained as the crime-name enhanced label representation H_T = [h_1, h_2, ..., h_L], where h_c refers to the enhanced representation of crime-name label c. The case text representation X_f is concatenated with the crime-name enhanced label representation H_T to obtain H = [X_f; H_T], and the model's classifier is trained through the convolutional neural network model CNN to obtain the trained crime-name prediction model.
In this embodiment, a DPCNN model is used as the classifier in the convolutional neural network model CNN: H is input to the DPCNN classifier to obtain the text feature representation Z_T fused with the crime-name label features, which is connected to a ReLU activation function to obtain the crime-name prediction result. Specifically, the text feature representation Z_T fused with the label features is passed, via the ReLU activation function, to a fully connected layer for the final prediction.
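A full DPCNN is considerably deeper than this, but the core convolution, ReLU, and max-over-time pooling step that turns the variable-length fused sequence H into a fixed-size feature vector can be sketched as follows; the filter count, filter width, and input shapes are illustrative assumptions, not the patent's values.

```python
import numpy as np

def conv1d_relu_maxpool(H, Wf, b):
    # One convolution -> ReLU -> max-over-time block: slides each of the
    # num_filters filters (width k) over the sequence H and keeps the
    # maximum response per filter, giving a fixed-size feature vector.
    k = Wf.shape[1]                                  # filter width
    n = H.shape[0] - k + 1                           # sliding positions
    feats = np.stack([
        np.tensordot(H[i:i + k], Wf, axes=([0, 1], [1, 2])) + b
        for i in range(n)
    ])                                               # (n, num_filters)
    return np.maximum(feats, 0.0).max(axis=0)        # (num_filters,)

rng = np.random.default_rng(3)
H = rng.normal(size=(10, 8))       # fused case + label sequence (toy shape)
Wf = rng.normal(size=(5, 3, 8))    # 5 filters of width 3 over dim-8 inputs
Z_T = conv1d_relu_maxpool(H, Wf, np.zeros(5))
assert Z_T.shape == (5,)
```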
Predicted value: ŷ = softmax(W_0·ReLU(Z_T) + b_0), where W_0 and b_0 denote randomly initialized parameter matrices.
A loss function is defined: Loss = −Σ_{c=1}^{L} y_c·log(ŷ_c), where ŷ_c denotes the predicted value and y_c denotes the true value. The objective loss function is optimized and trained by gradient descent, and when the training-completion condition is reached, the trained crime-name prediction model is obtained.
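The prediction head and loss can be sketched as below. The patent shows these formulas only as images, so the softmax and the exact placement of the ReLU are reconstructions from the surrounding description and should be treated as assumptions; the dimensions are toy values.

```python
import numpy as np

def predict_proba(Z_T, W0, b0):
    # Prediction head: y_hat = softmax(W0 · ReLU(Z_T) + b0).
    # Softmax and ReLU placement are reconstructed assumptions, not
    # explicitly shown in the patent text.
    logits = W0 @ np.maximum(Z_T, 0.0) + b0
    e = np.exp(logits - logits.max())
    return e / e.sum()

def cross_entropy(y_true, y_hat):
    # Loss = -sum_c y_c * log(y_hat_c), as defined in the description/claim 2.
    return -float(np.sum(y_true * np.log(y_hat)))

rng = np.random.default_rng(2)
d, L = 8, 4                          # toy feature and label dimensions
Z_T = rng.normal(size=d)             # stand-in for the DPCNN feature output
W0 = rng.normal(size=(L, d))         # randomly initialized parameters W_0, b_0
b0 = np.zeros(L)
y_hat = predict_proba(Z_T, W0, b0)
y_true = np.eye(L)[1]                # one-hot ground-truth crime name
loss = cross_entropy(y_true, y_hat)
```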
In the training process of this embodiment, the word embedding dimension D is preferably, but not limited to, 128; the number of attention heads in the Transformer is preferably, but not limited to, 8; the AdamW optimizer is preferably, but not limited to, used, with a learning rate preferably of 0.001 and a regularization parameter preferably of 10^-4.
When the crime name of the case is predicted, the case to be predicted is predicted in a trained crime name prediction model, and the predicted crime name is obtained.
To demonstrate the superiority of the method, it is compared with the BiLSTM+ATT, TextCNN, DPCNN, and BERT+fine-tuning models, using the same sample set for all comparisons.
BiLSTM+ATT: a classic text classification model and an attention-based neural-network variant; it captures contextual semantics using a bidirectional LSTM with an attention mechanism, automatically selecting important features through attention during training.
TextCNN: CNN models are widely applied in the field of image processing; the TextCNN model applies them to text data, with a notable effect on text classification.
DPCNN: a commonly used text classification model is a variant of the CNN model.
BERT+fine-tuning: combines the pre-trained model with a downstream task model and fine-tunes the parameters of the pre-trained model. Fine-tuning is currently the most common way to apply pre-trained models to specific tasks; combined with various downstream task models, it can complete a variety of NLP tasks.
Accuracy (Acc), macro-precision (MP), macro-recall (MR), macro-F1 (F1), and the top-5 accuracy of the predicted ranking (Acc@5) are used as evaluation metrics.
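For reference, the macro-averaged metrics used in the evaluation can be computed as follows. This is the standard definition (per-class precision/recall/F1 averaged with equal weight over the L crime names); the toy labels are illustrative.

```python
import numpy as np

def macro_scores(y_true, y_pred, L):
    # Macro-precision, macro-recall, macro-F1: compute P/R/F1 per class,
    # then average with equal weight over all L classes.
    P, R, F = [], [], []
    for c in range(L):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        P.append(p); R.append(r)
        F.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(P)), float(np.mean(R)), float(np.mean(F))

y_true = np.array([0, 0, 1, 1, 2])   # toy ground-truth crime-name indices
y_pred = np.array([0, 1, 1, 1, 2])   # toy predictions
mp, mr, mf1 = macro_scores(y_true, y_pred, L=3)
```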
The test results are shown below:
TABLE 1 criminal name prediction on CAIL-small dataset
TABLE 2 criminal name prediction on CAIL-big dataset
|                  | Acc   | MP    | MR    | F1    | Acc@5 |
|------------------|-------|-------|-------|-------|-------|
| BiLSTM+ATT       | 0.948 | 0.811 | 0.815 | 0.810 | 0.991 |
| TextCNN          | 0.944 | 0.799 | 0.804 | 0.798 | 0.989 |
| DPCNN            | 0.961 | 0.857 | 0.859 | 0.855 | 0.993 |
| BERT+Fine-Tune   | 0.958 | 0.914 | 0.915 | 0.914 | 0.987 |
| This application | 0.960 | 0.921 | 0.924 | 0.921 | 0.993 |
TABLE 3 Performance on low-frequency crime names
As shown in Table 1, on the CAIL-small dataset, compared with BiLSTM+ATT, TextCNN, and DPCNN, the present application achieves the highest scores on all metrics, with accuracy improved by more than 6%. The difference from BERT+Fine-Tune is small, but because this method freezes the parameters of the pre-trained model during training and does not need to update them in the backward pass, the training time is significantly shorter than fine-tuning and the model converges faster.
As shown in Table 2, on the CAIL-big dataset, because the dataset is very large, both the baseline models and the proposed method approach 100%, with accuracy differences of no more than 3%.
In CAIL-small, some crime names have fewer than 100 training samples. Tests on this subset are shown in Table 3: the performance on low-frequency crime names is greatly improved over BiLSTM+ATT, TextCNN, and DPCNN, with accuracy gaps of 26.0%, 16.1%, and 25.3% respectively; the accuracy gap with BERT+Fine-Tune is slight, but the Acc@5 metric is improved by 14.8%, indicating that the proposed method can capture the semantic information of low-frequency crime names and further exploit it in the crime-name prediction task.
The invention also provides a crime name prediction system comprising a processing module and a storage module communicatively connected to each other; the storage module stores at least one executable instruction that causes the processing module to perform the operations corresponding to the above crime name prediction method based on label-enhanced representation.
The processing module is shown in fig. 1 and comprises a case description encoder, a tag characteristic enhancer and a classifier;
the case description encoder encodes each case description to obtain a case text representation;
the tag feature enhancer maps the crime name tag to a potential semantic space to obtain an embedded representation of the crime name tag, and fuses the embedded representation with the case text representation to obtain a crime name enhanced tag representation;
and the classifier fuses the case text representation and the crime name enhancement tag representation, trains a classification model for classification prediction, and obtains a prediction result.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (4)

1. A crime name prediction method based on label enhancement representation is characterized by comprising the following steps:
selecting cases as a sample set, giving each case input description in the sample set, and giving the label input description of the criminal name corresponding to each case:
performing word-granularity processing on each case input description S_d to obtain the case fact description S_d = (w_1, w_2, ..., w_m), where w_i denotes the i-th word in the case input description text, m is the number of words in the case input description text, i is a positive integer, and 1 ≤ i ≤ m;
performing word-granularity processing on each crime-name label input description to obtain the crime-name label S_c = (v_1^c, v_2^c, ..., v_p^c), where v_j^c denotes the j-th word of the label input description text, c is a positive integer not greater than L, L denotes the number of crime-name labels, and p denotes the number of words in the crime-name label;
encoding each case fact description and obtaining a contextually relevant embedded representation of each word in each case fact description, denoted as a case text representation X f
Description of case factsEncoding, using the last hidden layer output of the encoder as a context-dependent embedded representation of each word in the case fact description, i.e.Wherein d is s Representing the dimension of the last hidden layer of the encoder, of->Representing an embedded representation corresponding to an i-th word in the case fact description;
encoding each crime name tag description to obtain an embedded representation of each crime name tag, and denoting the set containing the embedded representations of all crime name tags as E_T;
encoding the crime name tag T_c and taking the last hidden-layer output of the encoder as the word-granularity embedded representation of the crime name tag, U_c = [u_1^c, u_2^c, ..., u_p^c], where u_j^c represents the embedded representation corresponding to the j-th word in the crime name tag; summing the word-level embedded representations of each crime name tag to obtain e_c = Σ_{j=1}^{p} u_j^c, where e_c represents the embedded representation of the c-th crime name tag, yielding the tag set E_T = [e_1, e_2, ..., e_c, ..., e_L];
fusing the encoded crime name tags with the case text representation, alternately applying a self-attention mechanism and a cross-attention mechanism, to obtain the crime name enhanced tag representation H; specifically:
following the Transformer model, a Q-K-V attention model is used:
let the key matrix be K ∈ R^(M×D_k), the query matrix be Q ∈ R^(N×D_k), and the value matrix be V ∈ R^(M×D_v), obtained by projecting the inputs with the matrices W_k, W_q and W_v respectively; the attention output is obtained by the scaled dot-product attention of the Transformer, Attn(Q, K, V) = softmax(QK^T / √D_k)V ∈ R^(N×D_v), where N and M represent the lengths of the query and key sequences respectively, D is the word embedding dimension, D_k represents the dimension of the key and query matrices, and D_v represents the dimension of the value matrix;
performing a residual connection with the feed-forward layer to obtain the final output as the crime name enhanced tag representation H = [h_1, h_2, ..., h_L], where h_c denotes the enhanced representation of the crime name tag c;
concatenating the case text representation X_f with the crime name enhanced tag representation H, and training the classifier of the model through a convolutional neural network model to obtain a trained crime name prediction model;
inputting the case to be predicted into the trained crime name prediction model to obtain the predicted crime name.
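The Q-K-V fusion step of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function and variable names are assumptions, the alternation of self- and cross-attention and the feed-forward sublayer are omitted, and only a single cross-attention pass with a residual connection is shown.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(D_k)) V, as in the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (N, M) similarity scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
    return w @ V                                     # (N, D_v) attention output

def fuse_labels_with_text(E_T, X_f, W_q, W_k, W_v):
    """Cross-attention fusion sketch: the crime name tag embeddings E_T (L, D)
    act as queries over the case text representation X_f (m, D); the output
    plus a residual connection gives an enhanced tag representation H (L, D)."""
    Q = E_T @ W_q                                    # queries from the tags
    K = X_f @ W_k                                    # keys from the case text
    V = X_f @ W_v                                    # values from the case text
    return scaled_dot_product_attention(Q, K, V) + Q # residual connection
```

With square projection matrices of shape (D, D), the output H has one D-dimensional row per crime name tag, matching the shape of E_T.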
2. The crime name prediction method based on label enhancement representation according to claim 1, wherein the loss function of the convolutional neural network model is the multi-label binary cross-entropy Loss = −Σ_{c=1}^{L} [y_c log ŷ_c + (1 − y_c) log(1 − ŷ_c)], where ŷ_c represents the predicted value, y_c represents the true value, and L represents the total number of crime name tags.
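The multi-label binary cross-entropy of claim 2 can be written as a short numerical sketch (the function name and the clipping epsilon are illustrative assumptions, not part of the claim):

```python
import numpy as np

def multilabel_bce_loss(y_pred, y_true, eps=1e-12):
    """Binary cross-entropy summed over the L crime name tags:
    -sum_c [ y_c * log(p_c) + (1 - y_c) * log(1 - p_c) ]."""
    p = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.sum(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))
```

For example, a maximally uncertain prediction of 0.5 on two tags gives a loss of 2·ln 2 ≈ 1.386, while a perfect prediction gives a loss of essentially zero.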
3. A crime name prediction system, comprising a processing module and a storage module communicatively connected to each other, wherein the storage module is configured to store at least one executable instruction, and the executable instruction causes the processing module to perform the operations corresponding to the crime name prediction method based on label enhancement representation according to any one of claims 1-2.
4. The crime name prediction system according to claim 3, wherein the processing module comprises a case description encoder, a tag feature enhancer and a classifier;
the case description encoder encodes each case description to obtain a case text representation;
the tag feature enhancer maps the crime name tags into a latent semantic space to obtain embedded representations of the crime name tags, and fuses them with the case text representation to obtain the crime name enhanced tag representation;
the classifier fuses the case text representation and the crime name enhanced tag representation, and trains a classification model to perform classification prediction and obtain the prediction result.
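The classifier stage of the three-module system above can be sketched end to end. As a stated simplification, a plain sigmoid scorer stands in for the convolutional classifier of claim 1, mean pooling stands in for whatever pooling the patent actually uses, and every name and shape here is an illustrative assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(X_f, H, w, b=0.0):
    """Classifier sketch: mean-pool the case text X_f (m, D), concatenate the
    pooled vector with each enhanced tag vector in H (L, D), and score every
    crime name tag independently. Returns per-tag probabilities of shape (L,)."""
    x_pooled = X_f.mean(axis=0)                               # (D,) pooled case text
    feats = np.concatenate(
        [np.tile(x_pooled, (H.shape[0], 1)), H], axis=1)      # (L, 2D) fused features
    return sigmoid(feats @ w + b)                             # (L,) tag probabilities
```

Each output entry is a probability in (0, 1) for one crime name tag, which is the form consumed by the multi-label cross-entropy loss of claim 2.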
CN202210209170.5A 2022-03-04 2022-03-04 Crime name prediction method and system based on label enhancement representation Active CN114781389B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210209170.5A CN114781389B (en) 2022-03-04 2022-03-04 Crime name prediction method and system based on label enhancement representation

Publications (2)

Publication Number Publication Date
CN114781389A CN114781389A (en) 2022-07-22
CN114781389B true CN114781389B (en) 2024-04-05

Family

ID=82423775

Country Status (1)

Country Link
CN (1) CN114781389B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115620321B (en) * 2022-10-20 2023-06-23 北京百度网讯科技有限公司 Table identification method and device, electronic equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN110119449A (en) * 2019-05-14 2019-08-13 湖南大学 A kind of criminal case charge prediction technique based on sequence enhancing capsule net network
CN110162787A (en) * 2019-05-05 2019-08-23 西安交通大学 A kind of class prediction method and device based on subject information
CN111582576A (en) * 2020-05-06 2020-08-25 西安交通大学 Prediction system and method based on multi-scale feature fusion and gate control unit
CN111768024A (en) * 2020-05-20 2020-10-13 中国地质大学(武汉) Criminal period prediction method and equipment based on attention mechanism and storage equipment
CN113065347A (en) * 2021-04-26 2021-07-02 上海交通大学 Criminal case judgment prediction method, system and medium based on multitask learning
CN113505937A (en) * 2021-07-26 2021-10-15 江西理工大学 Multi-view encoder-based legal decision prediction system and method

Non-Patent Citations (3)

Title
A Joint Label-Enhanced Representation Based on Pre-trained Model for Charge Prediction; Jingpei Dan et al.; Natural Language Processing and Chinese Computing; 2022-09-24; pp. 694-705 *
Multi-label charge prediction based on semantic differences of words; Wang Jiawei et al.; Journal of Chinese Information Processing; 2019-10-15; Vol. 33, No. 10, pp. 127-134 *
Research on sentencing prediction methods for legal documents; Tan Hongye; Zhang Bowen; Zhang Hu; Li Ru; Journal of Chinese Information Processing; 2020-03-15; No. 03, pp. 107-114 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant