CN113642756B - Criminal investigation period prediction method based on deep learning technology - Google Patents

Criminal investigation period prediction method based on deep learning technology

Info

Publication number
CN113642756B
CN113642756B (application CN202110584847.9A)
Authority
CN
China
Prior art keywords
sentence
word
criminal
vector
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110584847.9A
Other languages
Chinese (zh)
Other versions
CN113642756A (en)
Inventor
Peng Zhang (张鹏)
Yao Chi (池瑶)
Tun Lu (卢暾)
Ning Gu (顾宁)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN202110584847.9A
Publication of CN113642756A
Application granted
Publication of CN113642756B
Status: Active

Classifications

    • G06Q 10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F 18/2415 — Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 40/242 — Dictionaries (lexical tools for natural language analysis)
    • G06F 40/289 — Phrasal analysis, e.g. finite state techniques or chunking (recognition of textual entities)
    • G06F 40/30 — Semantic analysis
    • G06N 3/045 — Combinations of networks (neural network architectures)
    • G06N 3/08 — Learning methods (neural networks)
    • G06Q 50/18 — Legal services


Abstract

The invention belongs to the technical field of judicial judgment, in particular to a criminal investigation period prediction method based on deep learning technology. The invention designs a multi-document prediction model based on a hierarchical attention network for document classification: starting from the main content of the criminal investigation document, it migrates content from the associated criminal judgment document and the cited legal provisions, and thereby better predicts and identifies the criminal investigation period of sentence-reduction cases. The concrete steps comprise: collecting data and preprocessing; encoding vectors of the multiple documents by a text encoder; and determining the distribution of the criminal investigation period. The method can therefore meet the requirement of predicting the criminal investigation period more accurately.

Description

Criminal investigation period prediction method based on deep learning technology
Technical Field
The invention belongs to the technical field of judicial judgment, and particularly relates to a criminal investigation period prediction method based on a deep learning technology.
Background
The invention constructs a prediction and identification model based on a multi-document attention mechanism: according to the basic information of the criminal investigation document, it predicts the criminal investigation period by combining the information of the other documents related to the crime. This better matches the real-world logic of such decisions, so the criminal investigation period is predicted more accurately and effectively.
Disclosure of Invention
The invention aims to provide a method capable of objectively and correctly predicting the criminal investigation period.
The prediction method is based on deep learning technology: it utilizes a multi-document attention mechanism to construct a prediction model which, according to the basic information of the criminal investigation document, migrates and combines the information of other documents related to the crime to predict the criminal investigation period.
In order to enable the deep learning model to combine more information across documents and meet the requirement of predicting the criminal investigation period more accurately, the invention designs a novel cross-document prediction model. Documents are classified through a hierarchical attention network (Hierarchical Attention Networks for Document Classification) [1]; based on the main content of the criminal investigation document, a local context-vector mechanism migrates content from the criminal judgment document and the cited legal provisions, so that the criminal investigation period is better predicted and identified. The model can be trained and deployed on a server and called through an interface for prediction.
The method comprises the following specific steps:
(I) Collecting data and preprocessing
First, the criminal investigation document data, the corresponding criminal judgment document data, and the key legal bases are collected and preprocessed.
The preprocessing comprises the following steps:
(1) Regular matching: the main material submitted during the sentence-reduction process, namely the sentence reduction (parole) review form, is compared with the content of the documents, confirming that the parts covering the crime name (Crime), the original sentence change (Judge), the reform performance (Performance), the crime facts (Fact), and the legal basis (Article) are the key parts for determining the criminal investigation period. Because the document format follows a certain standard, the corresponding data can be extracted by regular matching with hand-written rules. This key information is extracted from the criminal investigation document and the corresponding criminal judgment document respectively to obtain the data required by the model.
(2) Data cleaning: words and sentences that are mentioned repeatedly in the text but are irrelevant to predicting the criminal investigation period are deleted by regular matching; Chinese numerals are converted into Arabic numerals; and records left empty after cleaning are deleted.
(3) Word segmentation: the cleaned data are processed in two steps: character- or word-level segmentation, and stop-word removal.
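A minimal sketch of this preprocessing, assuming jieba for segmentation and cn2an for numeral conversion; the regular expressions and section names are illustrative assumptions, not the rules used by the patent:

    # Requires: pip install jieba cn2an
    import re
    import jieba   # Chinese word segmentation
    import cn2an   # Chinese-numeral to Arabic-numeral conversion

    # Hypothetical patterns for the five key parts; the real review-form rules differ.
    SECTION_PATTERNS = {
        "crime":       r"犯(.{1,30}?)罪",
        "judge":       r"判处(.+?)。",
        "performance": r"服刑期间(.+?)。",
        "fact":        r"经审理查明(.+?)。",
        "article":     r"依照(.+?)之规定",
    }

    def preprocess(document: str, stopwords: set) -> dict:
        """Extract the five key sections, clean them, and segment them into words."""
        sections = {}
        for name, pattern in SECTION_PATTERNS.items():
            match = re.search(pattern, document, flags=re.S)
            text = match.group(1) if match else ""
            text = cn2an.transform(text, "cn2an")  # e.g. 三年 -> 3年
            sections[name] = [w for w in jieba.lcut(text) if w not in stopwords]
        return sections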
(II) Designing the model text encoder
The main part of the multi-document criminal investigation period prediction model is the text encoder (Text Encoder), which consists of three parts: a static text encoder (Static Text Encoder), a (crime-name) entity embedding encoder (Crime Encoder), and a dynamic text encoder (Dynamic Text Encoder). The text encoder takes the crime name, original sentence change, reform performance, crime facts, and legal basis as input, and the corresponding encoders generate the encoding vectors d_crime, d_judge, d_performance, d_fact, and d_article.
a) Static text encoder
The static text encoder (Static Text Encoder) employs the hierarchical attention network model [1]. The paragraph documents of the original sentence change (Judge) and the reform performance (Performance) are turned into representation vectors by the static text encoder; in this model, global context vectors (Context Vectors) are used to select the words and sentences rich in information, i.e. the u shown in FIG. 1.
The following describes the specific contents:
the hierarchical attention network model consists of several parts: (1) a word sequence encoder; (2) a word-level attention layer; (3) a sentence encoder; (4) a sentence-level attention layer; the specific explanation of each part is as follows:
assume that a document has L sentences in a text i Representing the ith sentence, wherein the ith sentence contains T i Individual words, word it (i∈[1,L],t∈[1,T i ]) Representing the t-th word in the i-th sentence. Hierarchical note network models can map individual document raw text into a vector representation. Wherein:
(1) Word sequence Encoder (Word Encoder)
The sentence is preprocessed, each word is converted into a word vector by an embedding matrix, and the context representation of the word is then obtained through a bidirectional GRU layer.
The encoding part can introduce BERT pre-training, so that the model transfers and learns word- and sentence-level information from other texts, improving the expressiveness of some word and sentence vectors and thus the accuracy of the model.
For the ith sentence, the word encoding process of the sentence is formulated as follows:
x_{it} = W_{we} word_{it}, t ∈ [1, T_i];
→h_{it} = GRU(x_{it}), t ∈ [1, T_i];
←h_{it} = GRU(x_{it}), t ∈ [T_i, 1];
h_{it} = [→h_{it}, ←h_{it}];
wherein word_{it} is the one-hot encoded representation of the t-th word in the i-th sentence, the i-th sentence has T_i words, and W_{we} is the embedding matrix, so that the resulting x_{it} is the pre-trained vector representation of the t-th word in the i-th sentence. →h_{it} and ←h_{it} are the outputs of the bidirectional GRU layer, and their concatenation h_{it} is the hidden vector representation of the t-th word in the i-th sentence.
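A minimal Keras sketch of this word-sequence encoder; the vocabulary size, embedding dimension, and GRU width are assumed hyper-parameters, as the description does not state them:

    import tensorflow as tf
    from tensorflow.keras import layers

    vocab_size, embed_dim, gru_units = 50000, 200, 100   # assumed hyper-parameters

    words = layers.Input(shape=(None,), dtype="int32")   # word ids of one sentence
    x = layers.Embedding(vocab_size, embed_dim)(words)   # x_it = W_we · word_it
    # h_it = [→h_it, ←h_it]: forward and backward GRU states, concatenated
    h = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(x)
    word_encoder = tf.keras.Model(words, h)

The embedding matrix could equally be initialized from pre-trained vectors, matching the BERT pre-training mentioned above.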
(2) Word level Attention layer (Word Attention)
Not all words contribute equally to the representation of a sentence's meaning. Accordingly, an attention mechanism is introduced to extract the words that are important for the semantic representation of the sentence and to aggregate their representations into a sentence vector. First, the hidden vector of the t-th word in the i-th sentence is passed through a single fully-connected layer to obtain its learned representation u_{it}; then a word-level context vector u_w is initialized, and the similarity between u_{it} and u_w is calculated and SoftMax-normalized, yielding a measure α_{it} of the importance of the t-th word within the i-th sentence. The sentence vector s_i is then obtained as the weighted sum of the word representations. The context vector u_w can be regarded as a high-level representation of the fixed question "which words are informative"; it is randomly initialized and learned jointly with the other parameters during training.
The specific process is expressed as follows:
u_{it} = tanh(W_{wa} h_{it} + b_{wa});
α_{it} = exp(u_{it}ᵀ u_w) / Σ_t exp(u_{it}ᵀ u_w);
s_i = Σ_t α_{it} h_{it};
wherein W_{wa} and b_{wa} are the weight and bias parameters of a single fully-connected layer; h_{it} is the hidden vector of the t-th word in the i-th sentence obtained from the bidirectional GRU layer in part (1); u_{it} is the representation of the t-th word in the i-th sentence after the fully-connected layer; u_w is the word-level context vector; α_{it} is the weight of the t-th word among all T_i words of the i-th sentence; and s_i is the vector representation of the i-th sentence computed from its T_i weighted word vectors.
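Continuing the sketch, the word-level attention layer can be written as a custom Keras layer (the attention width is an assumption):

    class WordAttention(layers.Layer):
        """u_it = tanh(W h_it + b); α_it = softmax(u_itᵀ u_w); s_i = Σ α_it h_it."""
        def __init__(self, units=100, **kwargs):
            super().__init__(**kwargs)
            self.units = units
            self.dense = layers.Dense(units, activation="tanh")   # W_wa, b_wa
        def build(self, input_shape):
            # u_w: randomly initialized, jointly learned context vector
            self.u_w = self.add_weight(name="u_w", shape=(self.units,),
                                       initializer="glorot_uniform", trainable=True)
        def call(self, h):                                   # h: (batch, T_i, dim)
            u = self.dense(h)                                # u_it
            alpha = tf.nn.softmax(tf.tensordot(u, self.u_w, axes=1), axis=1)
            return tf.reduce_sum(h * alpha[..., None], axis=1)   # s_i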
(3) Sentence Encoder (Sentence Encoder)
Sentence-level encoding uses the same method as word-level encoding: the calculated representation vectors of the L sentences are passed through a bidirectional GRU to obtain the context representation of each sentence. The specific process is expressed as follows:
→h_i = GRU(s_i), i ∈ [1, L];
←h_i = GRU(s_i), i ∈ [L, 1];
h_i = [→h_i, ←h_i];
wherein s_i is the vector representation of the i-th sentence calculated above; →h_i and ←h_i are the outputs of the bidirectional GRU layer, and their concatenation h_i is the hidden vector representation of the i-th sentence.
(4) Sentence-level attention layer (Sentence Attention)
Sentence-level attention proceeds like word-level attention: first a single fully-connected layer, then the weight of the i-th sentence within the document is calculated against the sentence-level context vector u_s, and finally the document vector d is obtained, aggregating the information of all sentences in the document.
the specific process is expressed as follows:
u_i = tanh(W_{sa} h_i + b_{sa});
α_i = exp(u_iᵀ u_s) / Σ_i exp(u_iᵀ u_s);
d = Σ_i α_i h_i;
wherein W_{sa} and b_{sa} are the weight and bias parameters of a single fully-connected layer; h_i is the hidden vector of the i-th sentence obtained from the bidirectional GRU layer in part (3); u_i is the representation of the i-th sentence after the fully-connected layer; u_s is the sentence-level context vector; α_i is the weight of the i-th sentence among all L sentences of the document; and d is the document vector representation obtained by integrating all L sentence vectors of the document.
d_judge and d_performance, the encoding vectors of the original sentence change and the reform performance, are generated through all the text-encoding steps (1)-(4) of the static text encoder.
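Putting steps (1)-(4) together, a sketch of the full static text encoder under the same assumptions (the document shape is illustrative):

    max_sents, max_words = 30, 50                     # assumed document shape
    # words -> s_i: the word encoder followed by word-level attention
    sent_model = tf.keras.Sequential([word_encoder, WordAttention()])

    doc_in = layers.Input(shape=(max_sents, max_words), dtype="int32")
    s = layers.TimeDistributed(sent_model)(doc_in)    # (batch, L, 2*gru_units)
    h = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(s)
    d = WordAttention(name="sentence_attention")(h)   # same mechanism with u_s
    static_text_encoder = tf.keras.Model(doc_in, d)   # yields d_judge / d_performance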
b) (Crime-name) entity embedding encoder
The crime name is limited, discrete entity data. The model regards a crime name as a word, also represented by a one-hot code, so that it can be encoded into vector form in the same way as text and fed into the model. The entity embedding encoder (Crime Encoder) for the crime name (Crime) follows the structures (1)-(2) of the hierarchical attention network model [1]; the resulting sentence encoding vector is the encoding vector d_crime of the crime name (Crime).
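A sketch of this encoder under the same assumptions, reusing steps (1)-(2) on the tokens of the crime name:

    # The crime name is treated as a short word sequence; steps (1)-(2) of the
    # static encoder (word BiGRU + word attention) pool it into d_crime.
    crime_in = layers.Input(shape=(max_words,), dtype="int32")
    d_crime = WordAttention()(word_encoder(crime_in))
    crime_encoder = tf.keras.Model(crime_in, d_crime)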
c) Dynamic text encoder (Dynamic Text Encoder)
The dynamic text encoder (Dynamic Text Encoder) for the crime facts (Fact) and the legal basis (Article) also follows the hierarchical attention network model [1]. However, for the specific task of predicting the criminal investigation period, only part of the text of the crime facts and the legal basis plays a key role. Their key information can be extracted with reference to the task-relevant texts of the crime name (Crime), the original sentence change (Judge), and the reform performance (Performance), forming the corresponding encoding vectors. In order to effectively combine the associated information of all documents and realize cross-document migration of information, the dynamic text encoder adopts a special procedure: it receives the encoding results of the crime-name, original-sentence-change, and reform-performance encoders and, by means of dynamic context vectors derived from the related document encodings, embeds them into the text encodings of the crime facts (Fact) and the legal basis (Article), obtaining text encoding vectors tailored to the task of predicting the criminal investigation period.
The word and sentence context vectors u_w and u_s in this encoder are modified: they are not randomly initialized during training but are derived from the vector representations of the related source documents through a single fully-connected layer. Abstractly, the dynamic context generation process is formulated as follows:
u_w = W_w d + b_w;
u_s = W_s d + b_s;
wherein d represents the source-document encoding vector consulted when extracting the related information; u_w and u_s are the word-level and sentence-level context vectors, respectively; W_w and b_w are the weight and bias parameters of the word-level fully-connected layer, and W_s and b_s are those of the sentence-level fully-connected layer.
Since extracting the relevant information of the crime facts (Fact) in practice generally requires the key information of the criminal's crime name (Crime), the context vectors of the fact encoder are obtained from the representation vector d_crime by a single fully-connected layer. Likewise, the relevant information of the legal basis (Article) generally requires the key information of the original sentence change (Judge) and the reform performance (Performance), so the context vectors of the article encoder are obtained from the representation vectors d_judge and d_performance by a single fully-connected layer.
Combined with the information of the other documents, the context vectors are specifically formulated as follows:
u_w^{fact} = W_w^{fact} d_{crime} + b_w^{fact};
u_s^{fact} = W_s^{fact} d_{crime} + b_s^{fact};
u_w^{article} = W_w^{article} [d_{judge}, d_{performance}] + b_w^{article};
u_s^{article} = W_s^{article} [d_{judge}, d_{performance}] + b_s^{article};
wherein d_{crime}, d_{judge}, and d_{performance} are the encoding vectors of the crime name, original sentence change, and reform performance obtained by the encoders above; u_w^{fact} and u_w^{article} are the word-level context vectors of the crime facts and the legal basis, respectively; and u_s^{fact} and u_s^{article} are their sentence-level context vectors.
The context vectors of the crime facts (Fact) and the legal basis (Article) in the dynamic text encoder (Dynamic Text Encoder) are therefore not directly randomly initialized, but are trained and learned from the key information of the crime name (Crime), the original sentence change (Judge), and the reform performance (Performance). The model can thereby extract, in a correlated way, the crime facts of the case and the related information of the cited laws, generating text encodings more relevant to the sentence-reduction information: d_fact and d_article.
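A sketch of this conditioning under the same assumptions: the layer replaces the learned free context vector with a projection of a source-document vector:

    class DynamicAttention(layers.Layer):
        """Attention whose context vector is projected from a source vector d_src."""
        def __init__(self, units=100, **kwargs):
            super().__init__(**kwargs)
            self.dense_u = layers.Dense(units, activation="tanh")  # W_a, b_a
            self.dense_ctx = layers.Dense(units)                   # u = W d_src + b
        def call(self, h, d_src):            # h: (batch, T, dim); d_src: (batch, k)
            u = self.dense_u(h)
            ctx = self.dense_ctx(d_src)                            # dynamic u_w or u_s
            alpha = tf.nn.softmax(tf.reduce_sum(u * ctx[:, None, :], axis=-1), axis=1)
            return tf.reduce_sum(h * alpha[..., None], axis=1)

    # Usage sketch: the fact encoder conditions on d_crime, and the article encoder
    # conditions on the concatenation of d_judge and d_performance, e.g.
    #   s_fact = DynamicAttention()(h_fact_words, d_crime)
    #   s_article = DynamicAttention()(h_article_words,
    #                                  layers.Concatenate()([d_judge, d_performance]))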
(III) Determining the criminal investigation period distribution
Finally, these related text encodings are concatenated into d, and the criminal investigation period distribution of the input case is predicted by a SoftMax classifier:
d = concat(d_crime, d_judge, d_performance, d_fact, d_article);
p = softmax(W_c d + b_c);
wherein d_crime, d_judge, d_performance, d_fact, and d_article are the encoding vectors obtained by the text encoders, d is the concatenated encoding vector used to predict the criminal investigation period, and W_c and b_c are the weight and bias parameters of a single fully-connected layer.
Categorical cross entropy (Categorical Cross Entropy) is used as the training loss:
L(x) = −Σ_{i=1}^{C} y_i log f_i(x);
wherein x denotes an input sample, C is the total number of criminal investigation period classes, y_i is the true label indicator of the i-th class, and f_i(x) is the model's output probability for the i-th class.
By minimizing cross entropy loss, model parameters are trained to obtain a model that can predict the distribution of the criminal investigation period.
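A sketch of this prediction head, continuing the assumptions above (the number of period classes is illustrative):

    num_classes, doc_dim = 10, 2 * gru_units          # assumed: 10 period classes
    inputs = [layers.Input(shape=(doc_dim,), name=n)
              for n in ("d_crime", "d_judge", "d_performance", "d_fact", "d_article")]
    d_cat = layers.Concatenate()(inputs)              # d = concat(...)
    p = layers.Dense(num_classes, activation="softmax")(d_cat)   # softmax(W_c d + b_c)
    head = tf.keras.Model(inputs, p)
    head.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])                # minimizes -Σ y_i log f_i(x)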
The beneficial effects of the invention are as follows:
the novel cross-document criminal investigation prediction model is provided, and based on the model, relevant criminal document information can be better combined, and the accuracy of criminal investigation prediction period is improved.
Drawings
FIG. 1 is a diagram of the multi-document prediction and identification model for sentence-reduction cases according to the invention.
FIG. 2 is a diagram illustrating an implementation of the present invention.
Detailed Description
Examples: the prediction model is implemented with TensorFlow 2.0 (Keras) for training and prediction and is deployed with Django. An interface for model management and configuration is provided to the administrator through Django Admin, and a RESTful API is provided for users to call the model for prediction.
As shown in fig. 2, the deployment is divided into a training end and a prediction end. At the training end, researchers deploy the trained model, which can then be used directly for prediction; administrators are also allowed to add their own data later for remote training.
First, the training end needs to add a configured training-model record (corresponding to the dataset data defined in Django) and set: the model name, the training procedure (a Keras-based Python function defining the training process that yields the model), and the hyper-parameters (those required by the model function). If a trained model is uploaded, the following are uploaded: the trained model weight file, the crime-name dictionary, the original judgment information dictionary, the reform-performance dictionary, and the model labels (the label list of the data the model was trained on). If a model to be trained is uploaded, the following are set: the batch size (the number of training samples taken per step), the epochs (the number of training rounds), verbose (whether the training process is visible on the server), and the training progress (the progress percentage, updated synchronously during training); data labels are then added to the model. The data labels may be uploaded in bulk or individually; bulk upload uses a csv file. After an untrained model configuration and the corresponding data set are uploaded, the model can be trained on the server.
The prediction end can directly take the original document in pdf format, or text content can be entered as required, and it returns the result of the trained model. When a pdf file is uploaded, the background converts it into a txt file, the input content required by the model is extracted through rule-based regular matching, and the model then produces the criminal investigation period prediction; when content information is typed in, the corresponding trained model is selected and the prediction is obtained.
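Purely as an illustration, a minimal Django prediction view might look as follows; the view name, the vectorize helper, and the loaded model object are assumptions, not part of the patent:

    import json
    from django.http import JsonResponse
    from django.views.decorators.csrf import csrf_exempt

    @csrf_exempt
    def predict_view(request):
        """Accept raw document text and return the predicted period distribution."""
        payload = json.loads(request.body)
        sections = preprocess(payload["document"], stopwords=set())
        probs = head.predict(vectorize(sections))   # vectorize(): hypothetical helper
        return JsonResponse({"distribution": probs[0].tolist()})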
References:
[1] Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Claims (1)

1. A criminal investigation period prediction method based on deep learning technology, characterized in that a multi-document attention mechanism is utilized to construct a prediction model which, according to the basic information of the criminal investigation document, migrates and combines the information of other documents related to the crime to predict the criminal investigation period; the method comprises the following specific steps:
(I) Collecting data and preprocessing
Firstly, collecting the criminal investigation document data, the corresponding criminal judgment document data, and the key legal bases, and preprocessing them;
the pretreatment comprises the following steps: regular matching, namely comparing the material submitted during criminal investigation-criminal investigation (false release) audit list-with the content of a document, and confirming that the criminal name, the original criminal period change, the transformation performance, the criminal fact and the legal rules in the document are key parts for determining the criminal period according to the parts; extracting the key information from the criminal document and the corresponding criminal document through regular matching to obtain data required by the model; data cleaning; word segmentation and stop word removal;
(II) Designing the model text encoder
The main part of the multi-document criminal investigation period prediction model is the text encoder, which consists of three parts: a static text encoder, a crime-name entity embedding encoder, and a dynamic text encoder, all based on the hierarchical attention network model; the text encoder takes the crime name, original sentence change, reform performance, crime facts, and legal basis as input, and the corresponding encoders generate the encoding vectors d_crime, d_judge, d_performance, d_fact, and d_article;
wherein the first component of the text encoder, the static text encoder, comprises the following parts: (1) a word sequence encoder; (2) a word-level attention layer; (3) a sentence encoder; (4) a sentence-level attention layer; specifically:
assume a document in the text has L sentences, s_i represents the i-th sentence, the i-th sentence contains T_i words, and word_{it} (i ∈ [1, L], t ∈ [1, T_i]) represents the t-th word in the i-th sentence; the hierarchical attention network model maps the raw text of each document into a vector representation; wherein:
(1) Word sequence encoder
preprocessing the sentences, converting the words into word vectors with an embedding matrix, and obtaining the context representation of each word through a bidirectional GRU;
the coding part introduces BERT pre-training, so that the model can transfer and learn the information of words and sentences of other texts, optimize the representation degree of some word and sentence vectors and further improve the accuracy of the model;
for the ith sentence, the word encoding process of the sentence is formulated as follows:
x_{it} = W_{we} word_{it}, t ∈ [1, T_i];
→h_{it} = GRU(x_{it}), t ∈ [1, T_i];
←h_{it} = GRU(x_{it}), t ∈ [T_i, 1];
h_{it} = [→h_{it}, ←h_{it}];
wherein word_{it} is the one-hot encoded representation of the t-th word in the i-th sentence, the i-th sentence has T_i words, and W_{we} is the embedding matrix, so that the resulting x_{it} is the pre-trained vector representation of the t-th word in the i-th sentence; →h_{it} and ←h_{it} are the outputs of the bidirectional GRU layer, and their concatenation h_{it} is the hidden vector representation of the t-th word in the i-th sentence;
(2) Word level attention layer
not all words contribute equally to the representation of a sentence's meaning; thus, an attention mechanism is introduced to extract the words that are important for the semantic representation of the sentence and to aggregate their representations into a sentence vector; first, the hidden vector of the t-th word in the i-th sentence is passed through a single fully-connected layer to obtain its learned representation u_{it}; then a word-level context vector u_w is initialized, and the similarity between u_{it} and u_w is calculated and SoftMax-normalized, yielding a measure α_{it} of the importance of the t-th word within the i-th sentence; the sentence vector s_i is then obtained as the weighted sum of the word representations; the context vector u_w can be regarded as a high-level representation of the fixed question "which words are informative", randomly initialized and learned jointly during training;
The specific process is expressed as follows:
u_{it} = tanh(W_{wa} h_{it} + b_{wa});
α_{it} = exp(u_{it}ᵀ u_w) / Σ_t exp(u_{it}ᵀ u_w);
s_i = Σ_t α_{it} h_{it};
wherein W_{wa} and b_{wa} are the weight and bias parameters of a single fully-connected layer; h_{it} is the hidden vector of the t-th word in the i-th sentence obtained from the bidirectional GRU layer in part (1); u_{it} is the representation of the t-th word in the i-th sentence after the fully-connected layer; u_w is the word-level context vector; α_{it} is the weight of the t-th word among all T_i words of the i-th sentence; and s_i is the vector representation of the i-th sentence computed from its T_i weighted word vectors;
(3) Sentence encoder
the sentence-level encoding uses the same method as the word-level encoding: the calculated representation vectors of the L sentences are passed through a bidirectional GRU to obtain the context representation of each sentence; the specific process is expressed as follows:
→h_i = GRU(s_i), i ∈ [1, L];
←h_i = GRU(s_i), i ∈ [L, 1];
h_i = [→h_i, ←h_i];
wherein s_i is the vector representation of the i-th sentence calculated above; →h_i and ←h_i are the outputs of the bidirectional GRU layer, and their concatenation h_i is the hidden vector representation of the i-th sentence;
(4) Sentence-level attention layer
sentence-level attention proceeds like word-level attention: first a single fully-connected layer, then the weight of the i-th sentence within the document is calculated against the sentence-level context vector u_s, and finally the document vector d is obtained, aggregating the information of all sentences in the document;
the specific process is expressed as follows:
u_i = tanh(W_{sa} h_i + b_{sa});
α_i = exp(u_iᵀ u_s) / Σ_i exp(u_iᵀ u_s);
d = Σ_i α_i h_i;
wherein W_{sa} and b_{sa} are the weight and bias parameters of a single fully-connected layer; h_i is the hidden vector of the i-th sentence obtained from the bidirectional GRU layer in part (3); u_i is the representation of the i-th sentence after the fully-connected layer; u_s is the sentence-level context vector; α_i is the weight of the i-th sentence among all L sentences of the document; and d is the document vector representation obtained by integrating all L sentence vectors of the document;
d_judge and d_performance, the encoding vectors of the original sentence change and the reform performance, are generated through all the text-encoding steps (1)-(4) of the static text encoder;
the method comprises the steps that the composition of a text encoder, namely a criminal name entity embedded encoder, refers to a static text encoding mode, a model regards a criminal name as a word, the criminal name is represented by single-hot encoding, the criminal name is encoded into a vector form, and the vector form is input into the model; the entity embedded encoder of the crime refers to the structures of (1) - (2) of the hierarchical attention network model, and the generated sentence code vector represents the code vector d of the crime crime
wherein the third component of the text encoder, the dynamic text encoder, also follows the hierarchical attention network model; because, for the specific task of predicting the criminal investigation period, only part of the text of the crime facts and the legal basis plays a key role, their key information is extracted with reference to the task-relevant texts of the crime name, the original sentence change, and the reform performance, forming the corresponding encoding vectors; in order to effectively combine the associated information of all documents and realize cross-document migration of information, the dynamic text encoder adopts a special procedure: it receives the encoding results of the crime-name, original-sentence-change, and reform-performance encoders and, by means of dynamic context vectors derived from the related document encodings, embeds them into the text encodings of the crime facts and the legal basis, obtaining text encoding vectors tailored to the task of predicting the criminal investigation period;
the word and sentence context vectors u_w and u_s in this encoder are modified: they are not randomly initialized during training but are derived from the vector representations of the related source documents through a single fully-connected layer; abstractly, the dynamic context generation process is formulated as follows:
u_w = W_w d + b_w;
u_s = W_s d + b_s;
wherein d represents the source-document encoding vector consulted when extracting the related information; u_w and u_s are the word-level and sentence-level context vectors, respectively; W_w and b_w are the weight and bias parameters of the word-level fully-connected layer, and W_s and b_s are those of the sentence-level fully-connected layer;
in the cross-document criminal investigation period prediction model, the context vectors combined with the information of the other documents are specifically formulated as follows:
u_w^{fact} = W_w^{fact} d_{crime} + b_w^{fact};
u_s^{fact} = W_s^{fact} d_{crime} + b_s^{fact};
u_w^{article} = W_w^{article} [d_{judge}, d_{performance}] + b_w^{article};
u_s^{article} = W_s^{article} [d_{judge}, d_{performance}] + b_s^{article};
wherein d_{crime}, d_{judge}, and d_{performance} are the encoding vectors of the crime name, original sentence change, and reform performance obtained by the encoders above; u_w^{fact} and u_w^{article} are the word-level context vectors of the crime facts and the legal basis, respectively; and u_s^{fact} and u_s^{article} are their sentence-level context vectors;
the context vectors of the crime facts and the legal basis in the dynamic text encoder are therefore not directly randomly initialized, but are trained and learned from the key information of the crime name, the original sentence change, and the reform performance; the model can thereby extract, in a correlated way, the crime facts of the case and the related information of the cited laws, generating text encodings more relevant to the sentence-reduction information: d_fact and d_article;
(III) Determining the criminal investigation period distribution
Finally, these related text encodings are concatenated into d, and the criminal investigation period distribution of the input case is predicted by a SoftMax classifier:
d = concat(d_crime, d_judge, d_performance, d_fact, d_article);
p = softmax(W_c d + b_c);
wherein d_crime, d_judge, d_performance, d_fact, and d_article are the encoding vectors obtained by the text encoders, d is the concatenated encoding vector used to predict the criminal investigation period, and W_c and b_c are the weight and bias parameters of a single fully-connected layer;
the cross entropy loss is used as the loss for training:
wherein x represents an input sample, C is the total number of the types of the criminal investigation period to be classified, y i For the real criminal investigation period label category corresponding to the ith category, logf i (x) Outputting a criminal investigation period category value for the corresponding model;
by minimizing cross entropy loss, model parameters are trained to obtain a model that can predict the distribution of the criminal investigation period.
CN202110584847.9A 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology Active CN113642756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584847.9A CN113642756B (en) 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110584847.9A CN113642756B (en) 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology

Publications (2)

Publication Number Publication Date
CN113642756A CN113642756A (en) 2021-11-12
CN113642756B (en) 2023-11-24

Family

ID=78415849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110584847.9A Active CN113642756B (en) 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology

Country Status (1)

Country Link
CN (1) CN113642756B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain
CN109558993A (en) * 2018-12-18 2019-04-02 华南师范大学 Prediction technique, device, storage medium and the server of theory of crime prison term
CN110610005A (en) * 2019-09-16 2019-12-24 哈尔滨工业大学 Stealing crime auxiliary criminal investigation method based on deep learning
CN111815485A (en) * 2020-06-12 2020-10-23 中国司法大数据研究院有限公司 Sentencing prediction method and device based on deep learning BERT model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018232699A1 (en) * 2017-06-22 2018-12-27 Tencent Technology (Shenzhen) Co., Ltd. Information processing method and related device
US20210103814A1 (en) * 2019-10-06 2021-04-08 Massachusetts Institute Of Technology Information Robust Dirichlet Networks for Predictive Uncertainty Estimation

Also Published As

Publication number Publication date
CN113642756A (en) 2021-11-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant