CN110119449A

CN110119449A - A criminal case charge prediction method based on a sequence-enhanced capsule network

Info

Publication number: CN110119449A
Application number: CN201910396510.8A
Authority: CN
Inventors: 彭黎; 何从庆
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2019-08-13
Anticipated expiration: 2039-05-14
Also published as: CN110119449B

Abstract

The present invention relates to the field of intelligent law, and more particularly to a criminal case charge prediction method based on a sequence-enhanced capsule network. The method comprises the following steps: S1, construct a training dataset by obtaining the fact descriptions of cases and their charge judgment results as training data; S2, build a sequence-enhanced capsule network model and train it on the training data; S3, input the fact description text of a new case into the sequence-enhanced capsule network model trained in S2, and take the charge automatically predicted by the model as the charge prediction result. The proposed model not only captures the salient features and semantic information of legal text more effectively, but is also competitive on the low-frequency charge prediction problem. Focal loss is introduced as the loss function of the sequence-enhanced capsule network model, which further alleviates the severe imbalance among charges in the low-frequency charge prediction task.

Description

A criminal case charge prediction method based on a sequence-enhanced capsule network

Technical field

The present invention relates to the field of intelligent law, and more particularly to a criminal case charge prediction method based on a sequence-enhanced capsule network.

Background

In recent years, artificial intelligence technology, represented by deep learning and natural language processing, has achieved major breakthroughs and has begun to make its mark in the field of intelligent law, attracting wide attention from academia and industry. Intelligent law gives machines the ability to understand legal texts and analyse cases, so that cases can be handled intelligently according to their circumstances.

Automatic charge prediction, one of the most representative subtasks of intelligent law, plays an important role in legal assistant systems and has wide application in real life. For example, it can give legal experts (such as lawyers and judges) a reference for the charges against a defendant, thereby assisting judges in deciding cases and improving efficiency; it can also provide legal advice to ordinary people who are unfamiliar with legal terminology and complex procedures. Automatic charge prediction uses machine learning or deep learning techniques to train a machine to determine the charges a court would find against a defendant (such as theft, robbery, or traffic accident offences). Previous research has proposed many methods for automatic charge prediction. These methods fall into three classes: (1) traditional methods; (2) machine learning methods; (3) deep learning methods.

Traditional methods usually rely on mathematical formulas or quantitative calculation. Kort [Fred Kort. Predicting Supreme Court decisions mathematically: A quantitative analysis of the "right to counsel" cases. American Political Science Review, 1957, 51(1): 1-12] attempted to use quantitative methods to predict what is usually regarded as a highly unpredictable human event, namely the decisions of the US Supreme Court. The study aimed to show that, in at least one area of judicial review, the factual elements that influence decisions can be identified from some decided cases, numerical values for these elements can be derived by formula, and the decisions of the remaining cases in that area can then be predicted correctly. Nagel [Stuart S Nagel. Applying correlation analysis to case prediction. Tex. L. Rev., 1963, 42: 1006] argued that litigation outcomes can be predicted scientifically; using reapportionment cases as an example, he showed that prediction is possible by assigning correlation coefficients to four variables appearing in the cases. Such prediction would help parties planning litigation, theorists seeking to understand the judicial process, and legislators who want to anticipate judicial reactions and seek public compliance with the law. Keown [R Keown. Mathematical models for legal prediction. Computer/LJ, 1980, 2: 829] proposed the feasibility of predicting judicial decisions mathematically. Using the linear model method of Haar, Sawyer and Cummings and the nearest neighbor method of Mackaay and Robillard, he correctly predicted 99% of the decisions in more than 1,000 cases. This success creates a real opportunity and an urgent need to develop linear models in other specialised fields, not only to verify empirically that the method is generally valid, but also to give the legal profession additional prediction models. These traditional methods are effective in certain scenarios, but they are limited to small datasets with few labels.

Following the success of machine learning in many fields, researchers began to apply machine learning methods to charge prediction. This line of work usually focuses on extracting features from the case facts and then predicting with a machine learning algorithm. Liu et al. [Chao-Lin Liu, Cheng-Tsung Chang, Jim-How Ho. Case instance generation and refinement for case-based criminal summary judgments in Chinese. 2004.; Chao-Lin Liu, Chwen-Dar Hsieh. Exploring phrase-based classification of judicial documents for criminal charges in Chinese. In: Proc of International Symposium on Methodologies for Intelligent Systems. Springer, 2006, 681-690] proposed an algorithm based on K-Nearest Neighbor (KNN) that automatically generates and refines, from real-world judgment texts, case instances for criminal summary judgments. The algorithm extracts important legal information from past indictment documents to construct case instances, and then refines them by merging similar cases and deleting relatively irrelevant information from them. Lin et al. [Wan-Chen Lin, Tsung-Ting Kuo, Tung-Jia Chang. Exploiting machine learning models for Chinese legal documents labeling, case classification, and sentencing prediction. ROCLING XXIV (2012), 2012. 140] defined 21 legal element labels for the "crime of robbery" and the "crime of blackmail", then used the legal element information to classify the two crimes and to predict the prison sentence for each. Mackaay et al. [Ejan Mackaay, Pierre Robillard. Predicting judicial decisions: The nearest neighbor rule and visual representation of case patterns. 1974] extracted features by clustering semantically similar N-grams. Sulea et al. [Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, et al. Exploring the Use of Text Classification in the Legal Domain. CoRR, 2017, abs/1710.09306] used the cases and rulings of the French Supreme Court to investigate text classification methods in the legal domain, and then proposed a support vector machine based system that uses case description, time span and ruling features to predict the law area and ruling of a case. However, these methods extract only shallow textual features or rely on manual labels, which are difficult to collect on large datasets. Their performance therefore suffers when the data volume is very large.

In recent years, following the success of deep neural networks in natural language processing (NLP), computer vision (CV) and speech, some work has begun to apply them to the automatic charge prediction task and has shown large performance gains. Luo et al. [Bingfeng Luo, Yansong Feng, Jianbo Xu, et al. Learning to Predict Charges for Criminal Cases with Legal Basis. arXiv preprint arXiv:1707.09168, 2017.] argued that the relevant law articles play a very important role in charge prediction. They therefore proposed an attention-based neural network method that jointly models the charge prediction task and the relevant article extraction task in a unified framework, so that appropriate charges can be predicted effectively for cases expressed in different ways. However, this work cannot handle low-frequency charge prediction or multiple-charge prediction. Zhong et al. [Haoxi Zhong, Guo Zhipeng, Cunchao Tu, et al. Legal Judgment Prediction via Topological Learning. In: Proc of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018, 3540-3549] took into account the topological dependencies among the subtasks of charges, law articles, fines and terms of penalty, and proposed a topological multi-task learning framework that integrates the dependencies among these subtasks into judgment prediction. Hu et al. [Zikun Hu, Xiang Li, Cunchao Tu, et al. Few-shot charge prediction with discriminative legal attributes. In: Proc of the 27th International Conference on Computational Linguistics. 2018, 487-498] addressed low-frequency charge prediction and easily confused charges by introducing several discriminative attributes of charges as an internal mapping between case fact descriptions and charges. These attributes provide additional information for low-frequency charges and effective features for distinguishing confusing charges. They then proposed an attribute-attentive charge prediction model that infers attributes and charges simultaneously. Further analysis of the research above shows that, although academia and industry have proposed a series of deep-learning-based automatic charge prediction algorithms and made considerable progress, existing methods still have shortcomings: (1) most existing work [9,10] ignores the low-frequency charge scenario of the automatic charge prediction task and considers only high-frequency charges, and therefore cannot solve the low-frequency charge prediction problem. (2) Hu et al. [11] achieved good results in the low-frequency charge scenario using manually generated auxiliary information; however, manual annotation costs a great deal of time and prevents a truly end-to-end deep learning model.

The Chinese invention patent application "A criminal case charge prediction method based on memory neural networks" (publication date: 2019.02.22) builds a training dataset using standardised case descriptions and their charges as training data, and trains the constructed memory neural network model on it. "Case description feature vector"-"charge encoding" pairs are converted into key-value pairs stored in the memory neural network model, and a multilayer perceptron classifier determines the charge of a criminal case. Although the model proposed by this method can also predict low-frequency charges, its memory module needs to compare the relationship between true and predicted charges. Low-frequency charges have little data, and some charges have only a few cases, so it is difficult to achieve good results in the low-frequency charge prediction scenario.

Summary of the invention

Based on an in-depth analysis of the research results of the domestic and foreign scholars above, and in view of the problems in the prior art described above, the present invention proposes a criminal case charge prediction method based on a sequence-enhanced capsule network, to alleviate the low-frequency charge prediction problem in criminal cases.

To achieve the above object, the technical solution adopted by the present invention is a criminal case charge prediction method based on a sequence-enhanced capsule network, comprising the following steps:

S1 construct a training dataset: obtain the fact descriptions of cases and their charge judgment results as training data;

S2 build a sequence-enhanced capsule network model and train it on the training data, comprising the following steps:

S2.1 build the sequence-enhanced capsule network model, with the following specific steps:

S2.1.1 build the initial capsule layer: segment the fact description text of the case into words and map it to a sequence of word vectors, which serves as the initial capsule layer u={ u₁,u₂,…,u_n};

S2.1.2 build the Multiple seq-caps layer: extract features from the initial capsule layer u obtained in S2.1.1 using the Multiple seq-caps layer, obtaining the main feature vector of the case fact description text; the Multiple seq-caps layer consists of two seq-caps layers;

S2.1.3 build the residual unit layer based on the attention mechanism (attention layer): apply the attention mechanism to the initial capsule layer u obtained in S2.1.1, obtaining the auxiliary feature vector c of the case fact description text:

The attention layer works as follows: each of the n initial capsules u_i (i=1,2,…,n) in the initial capsule layer u is passed through the weight matrix W to obtain a transformed vector e_i; a softmax function is then applied to e_i to obtain the importance weight α_i of each initial capsule u_i; finally, all initial capsules are summed according to their importance weights to obtain the auxiliary feature vector c of the case fact description text. The formula is as follows:

$$e_i = \tanh(W u_i + b)$$

where W is a weight matrix and b is a bias vector.

S2.1.4 build the output layer: combine the main feature vector of the case fact description text obtained in S2.1.2 with the auxiliary feature vector c of the case fact description text obtained in S2.1.3, and pass the result to a fully connected layer.

S2.2 train the sequence-enhanced capsule network model;

S3 input the fact description text of a new case into the sequence-enhanced capsule network model trained in S2, and take the charge automatically predicted by the model as the charge prediction result.
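The output layer of S2.1.4 can be illustrated with a small NumPy sketch. The text only says the main and auxiliary feature vectors are "combined" and passed to a fully connected layer, so the concatenation, the single linear layer, the softmax, and all names and shapes below (`predict_charge_probs`, `W_out`, `b_out`) are assumptions made for illustration, not the method as filed.

```python
import numpy as np

def predict_charge_probs(v, c, W_out, b_out):
    """Combine main features v and auxiliary features c, then classify.

    Assumption: "combine" is implemented as flatten-and-concatenate,
    followed by one fully connected layer and a softmax over charges.
    """
    features = np.concatenate([v.reshape(-1), c])
    logits = W_out @ features + b_out
    logits = logits - logits.max()          # numerical stability
    return np.exp(logits) / np.exp(logits).sum()

rng = np.random.default_rng(0)
v = rng.normal(size=(3, 4))                 # hypothetical main feature capsules
c = rng.normal(size=8)                      # hypothetical auxiliary feature vector
num_charges = 5                             # hypothetical number of charges
W_out = rng.normal(scale=0.1, size=(num_charges, v.size + c.size))
b_out = np.zeros(num_charges)

probs = predict_charge_probs(v, c, W_out, b_out)
predicted = int(np.argmax(probs))           # integer id of the predicted charge
```

At prediction time (S3), the index of the largest probability would be mapped back to a charge name.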

Further, the dataset in S1 comes from real criminal cases published on China Judgments Online; each case comprises two parts, the fact description of the case and the charge judgment result, which together serve as training data.

Further, word segmentation in S2.1.1 uses pkuseg, the open-source tool from Peking University, and an embedding layer maps the words to a sequence of word vectors trained with Word2vec.

Further, in S2.2 the sequence-enhanced capsule network model is trained with the focal loss function.

Compared with the prior art:

(1) The invention proposes a sequence-enhanced capsule network model that not only captures the salient features and semantic information of legal text more effectively, but is also competitive on the low-frequency charge prediction problem.

(2) Focal loss is introduced as the loss function of the sequence-enhanced capsule network model, further alleviating the severe imbalance among charges in the low-frequency charge prediction task.

(3) Compared with the current state-of-the-art methods, the proposed sequence-enhanced capsule network model improves F1 by 4.5% and 6.4% on the real datasets Criminal-S and Criminal-L respectively. The experimental results consistently demonstrate the superiority and competitiveness of the sequence-enhanced capsule network model in the low-frequency charge scenario.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the invention;

Fig. 2 is a schematic diagram of the sequence-enhanced capsule network model of the invention;

Fig. 3 is a schematic diagram of the Seq-caps layer of the invention;

Fig. 4 is a schematic diagram of the Attention layer of the invention.

Detailed description of embodiments

The present invention is further described below with reference to the accompanying drawings and specific embodiments.

A brief flow chart of the invention is shown in Fig. 1. The criminal case charge prediction method based on the sequence-enhanced capsule network model comprises the following steps:

S1 construct the training dataset. The invention is evaluated on three public real-world datasets, all drawn from criminal cases published on China Judgments Online; the fact description of each case and its charge judgment result are obtained as training data. Because the public datasets retain only the main charge of each case, it suffices to encode each charge by mapping it to a unique integer.
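The integer encoding of charges described above can be sketched as follows. The function name and the sample charge labels are hypothetical, and assigning ids in order of first appearance is just one possible convention.

```python
def build_charge_index(charges):
    """Map each distinct charge name to a unique integer id.

    Ids are assigned in order of first appearance (an arbitrary choice).
    """
    index = {}
    for charge in charges:
        if charge not in index:
            index[charge] = len(index)
    return index

# Hypothetical main charges, one per case.
labels = ["theft", "robbery", "theft", "traffic accident"]
charge_to_id = build_charge_index(labels)
encoded = [charge_to_id[c] for c in labels]
```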

S2.1 build the sequence-enhanced capsule network model. The sequence-enhanced capsule network model of the invention is shown in Fig. 2. Building the model comprises the following steps:

S2.1.1 build the initial capsule layer: segment the fact description text of the case into words, and use an embedding layer to map the words to a sequence of word vectors trained with Word2vec, which serves as the initial capsule layer u={ u₁,u₂,…,u_n}.
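The embedding lookup that produces the initial capsule layer can be sketched as below. This is a minimal illustration under stated assumptions: the tokens are taken to be already segmented (by pkuseg, for example), the vocabulary is a toy one, and a random matrix stands in for Word2vec-trained vectors; `build_initial_capsules` and the `<unk>` fallback are hypothetical.

```python
import numpy as np

def build_initial_capsules(tokens, vocab, embeddings, unk_id=0):
    """Look up one embedding row per token, giving u = {u_1, ..., u_n}."""
    ids = [vocab.get(tok, unk_id) for tok in tokens]
    return embeddings[ids]                  # shape (n, d)

# Toy vocabulary; the random table stands in for Word2vec-trained vectors.
vocab = {"<unk>": 0, "被告人": 1, "盗窃": 2, "手机": 3}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))

# Assumed output of a word segmenter on a fact description.
tokens = ["被告人", "盗窃", "手机", "一部"]
u = build_initial_capsules(tokens, vocab, embeddings)
```

Out-of-vocabulary words (here "一部") fall back to the `<unk>` row, which is a common convention rather than something the text specifies.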

S2.1.2 build the Multiple seq-caps layer: apply the Multiple seq-caps layer to the initial capsule layer u obtained in S2.1.1 to obtain the main feature vector of the case fact description text.

The Multiple seq-caps layer consists of two seq-caps layers. As shown in Fig. 3, each seq-caps layer consists of a sequence information encoder (Sequence Information Encoder) and a dynamic routing converter (Dynamic Routing). The invention uses a long short-term memory network (LSTM) as the sequence information encoder. Taking the first seq-caps layer as an example, the initial capsule layer u={ u₁,u₂,…,u_n} is passed into the seq-caps layer, and the LSTM is computed as follows:

$$f_t = \sigma(W_f u_t + U_f h_{t-1} + b_f),$$

$$i_t = \sigma(W_i u_t + U_i h_{t-1} + b_i),$$

$$o_t = \sigma(W_o u_t + U_o h_{t-1} + b_o),$$

$$\tilde{c}_t = \tanh(W_c u_t + U_c h_{t-1} + b_c),$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,$$

$$h_t = o_t \odot \tanh(c_t)$$

These formulas yield the sequence information $h_t$ at time $t$, where $f_t$, $i_t$ and $o_t$ are the forget gate, input gate and output gate of the LSTM respectively; $\tilde{c}_t$ is the candidate value at the current time; $c_t$ is the state at the current time; $h_t$ is the output at the current time; $W_f$, $W_i$, $W_o$, $W_c$ and $U_f$, $U_i$, $U_o$, $U_c$ are weight matrices; $b_f$, $b_i$, $b_o$, $b_c$ are bias vectors; $u_t$ is the current input; $c_{t-1}$ is the state at the previous time step; $h_{t-1}$ is the output at the previous time step; and $\sigma$ is the sigmoid function.
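The LSTM recurrence with the gates, candidate value and state described above can be sketched in NumPy as follows. This is the standard LSTM cell written out by hand for illustration; the dictionary layout of the parameters, the zero initial state and all sizes are assumptions, not details from the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(u_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b are dicts keyed by gate name 'f', 'i', 'o', 'c'."""
    f_t = sigmoid(W["f"] @ u_t + U["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ u_t + U["i"] @ h_prev + b["i"])      # input gate
    o_t = sigmoid(W["o"] @ u_t + U["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(W["c"] @ u_t + U["c"] @ h_prev + b["c"])  # candidate value
    c_t = f_t * c_prev + i_t * c_tilde                          # new cell state
    h_t = o_t * np.tanh(c_t)                                    # new output
    return h_t, c_t

def encode_sequence(u, W, U, b, hidden):
    """Run the LSTM over all initial capsules and collect every h_t."""
    h, c = np.zeros(hidden), np.zeros(hidden)  # assumed zero initial state
    outputs = []
    for u_t in u:
        h, c = lstm_step(u_t, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
d, hidden, n = 8, 6, 5                         # hypothetical sizes
W = {g: rng.normal(scale=0.1, size=(hidden, d)) for g in "fioc"}
U = {g: rng.normal(scale=0.1, size=(hidden, hidden)) for g in "fioc"}
b = {g: np.zeros(hidden) for g in "fioc"}
H = encode_sequence(rng.normal(size=(n, d)), W, U, b, hidden)
```

`H` holds one output vector per input position, which is what the dynamic routing converter would consume next.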

The output of the sequence information encoder is then passed to the dynamic routing converter. Each low-level capsule is first mapped through the matrix w_j to a low-level capsule copy u_j|i. The dynamic routing mechanism then aggregates the copies u_j|i into the high-level capsule layer. This step yields the output of the dynamic routing converter, v={ v₁,v₂,…,v_n}, where v is the main feature vector of the case fact description text.
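The text does not spell out the routing procedure, so the sketch below assumes the widely used routing-by-agreement scheme (softmax coupling coefficients, a squash non-linearity, and dot-product agreement updates). The number of iterations, the tensor shapes, and the names `squash`, `dynamic_routing` and `w` are all illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq = np.sum(s * s, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """u_hat[i, j] is the copy of low-level capsule i for high-level capsule j."""
    n_low, n_high, _ = u_hat.shape
    logits = np.zeros((n_low, n_high))           # routing logits, uniform start
    for _ in range(iterations):
        c = np.exp(logits - logits.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)        # coupling coefficients
        s = np.einsum("ij,ijk->jk", c, u_hat)    # weighted sum per high capsule
        v = squash(s)
        logits = logits + np.einsum("ijk,jk->ij", u_hat, v)  # agreement update
    return v

rng = np.random.default_rng(0)
n_low, n_high, d_in, d_out = 5, 3, 6, 4          # hypothetical sizes
low = rng.normal(size=(n_low, d_in))             # e.g. encoder outputs
w = rng.normal(scale=0.1, size=(n_high, d_in, d_out))  # one mapping matrix per j
u_hat = np.einsum("id,jdk->ijk", low, w)         # low-level capsule copies
v = dynamic_routing(u_hat)
```

Each row of `v` is one high-level capsule; its length stays below 1 because of the squash step.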

S2.1.3 build the residual unit layer based on the attention mechanism (attention layer): apply the attention mechanism to the initial capsule layer u={ u₁,u₂,…,u_n} to obtain the auxiliary feature vector c of the case fact description text.

The attention layer is shown in Fig. 4:

Each of the n initial capsules u_i (i=1,2,…,n) in the initial capsule layer u is passed through the weight matrix W to obtain a transformed vector e_i; a softmax function is then applied to e_i to obtain the importance weight α_i of each initial capsule u_i; finally, all initial capsule vectors are summed according to their importance weights to obtain the auxiliary feature vector c of the case fact description text. The formula is as follows:

$$e_i = \tanh(W u_i + b)$$

where W is a weight matrix and b is a bias vector.
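The attention layer can be sketched as below. One assumption is made loudly: `W` is treated as a single row of weights so that each e_i is a scalar score, which makes the softmax over the n capsules well defined; the source calls e_i a vector and does not give its shape. The function name and sizes are hypothetical.

```python
import numpy as np

def attention_pool(u, W, b):
    """u: (n, d) initial capsules; W: (d,) scoring weights; b: scalar bias.

    Assumption: W u_i + b yields one scalar score per capsule.
    """
    e = np.tanh(u @ W + b)                  # e_i = tanh(W u_i + b)
    e = e - e.max()                         # stabilise the softmax
    alpha = np.exp(e) / np.exp(e).sum()     # importance weights, sum to 1
    c = alpha @ u                           # weighted sum of the capsules
    return c, alpha

rng = np.random.default_rng(0)
u = rng.normal(size=(5, 8))                 # hypothetical initial capsules
c, alpha = attention_pool(u, rng.normal(size=8), 0.0)
```

The vector `c` has the same dimension as a single capsule and plays the role of the auxiliary feature vector.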

S2.2 train the sequence-enhanced capsule network model: the sequence-enhanced capsule network model obtained in S2.1 is trained with the focal loss function, which is given by:

$$FL(\hat{y}) = -\alpha (1-\hat{y})^{\gamma} \log(\hat{y})$$

where $\hat{y}$ is the model's estimated probability computed by the softmax function, α is the α-balanced variable of focal loss, $(1-\hat{y})^{\gamma}$ is a modulating factor, and γ (γ ≠ 0) is a tunable parameter that controls the effect of the modulating factor.
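A minimal sketch of the loss, assuming the usual α-balanced focal loss form FL = -α (1 - ŷ)^γ log(ŷ) applied to the probability the model assigns to the true charge. The default values α = 0.25 and γ = 2 are illustrative only and are not taken from the source.

```python
import math

def focal_loss(y_hat, alpha=0.25, gamma=2.0):
    """Focal loss for the softmax probability y_hat of the true charge.

    alpha and gamma defaults are illustrative assumptions.
    """
    return -alpha * (1.0 - y_hat) ** gamma * math.log(y_hat)

easy = focal_loss(0.9)   # well-classified example, heavily down-weighted
hard = focal_loss(0.1)   # poorly classified example, keeps most of its loss

# Ratio of plain cross-entropy losses for the same two examples.
ce_ratio = math.log(0.1) / math.log(0.9)
```

The modulating factor makes the hard example dominate far more than it would under plain cross-entropy, which is why the loss helps with rare, hard-to-classify charges.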

To demonstrate the effectiveness of the proposed criminal case charge prediction method based on the sequence-enhanced capsule network, it is compared on the three datasets with several classic text classification methods and two existing state-of-the-art charge prediction methods. In addition, to demonstrate the effectiveness of the model in handling low-frequency charge prediction, a set of charge prediction experiments at different frequencies was carried out.

Table 1 shows the results of the baseline models on the three datasets. Overall, the proposed criminal case charge prediction method based on the sequence-enhanced capsule network outperforms all baselines on the three datasets by a significant margin. Specifically, compared with the previous state-of-the-art charge prediction method, the model achieves absolute F1 improvements of 4.5%, 2.5% and 6.4% on the three datasets respectively, which demonstrates the effectiveness of the proposed method on the charge prediction task. This trend shows that the proposed method can capture the high-level semantic representations of legal text that are essential for charge prediction.

Table 1: Comparison of charge prediction results on the real datasets, where MP denotes macro precision, MR denotes macro recall, and F1 denotes macro F1.
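The macro-averaged metrics named in the caption can be computed as in the sketch below. It assumes macro F1 is the unweighted mean of per-charge F1 scores (another convention takes the harmonic mean of MP and MR); the function name and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1) over all charges.

    Assumption: macro F1 is the mean of the per-charge F1 scores.
    """
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k

# Toy example with three charges encoded as integers.
y_true = [0, 0, 1, 2]
y_pred = [0, 1, 1, 2]
mp, mr, f1 = macro_scores(y_true, y_pred)
```

Because every charge counts equally regardless of how often it occurs, macro averaging is sensitive to performance on low-frequency charges.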

Low-frequency charge comparison

Table 2: Comparison of low-frequency charge prediction on the real dataset

To further demonstrate the effectiveness of the proposed criminal case charge prediction method based on the sequence-enhanced capsule network in handling low-frequency charges, a set of experiments was conducted with charges split by frequency. Charges are divided into three bands by frequency (low, medium and high). A low-frequency charge is one that occurs 10 times or fewer across the dataset, a high-frequency charge is one that occurs more than 100 times, and all other charges are medium-frequency.
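The frequency split can be expressed directly in code. The thresholds (10 or fewer is low, more than 100 is high) come from the text; the function names and the sample charge counts are hypothetical.

```python
from collections import Counter

def frequency_band(count):
    """Classify a charge by how many cases carry it."""
    if count <= 10:
        return "low"
    if count > 100:
        return "high"
    return "medium"

def split_by_frequency(charge_labels):
    """Map every charge in the dataset to its frequency band."""
    counts = Counter(charge_labels)
    return {charge: frequency_band(n) for charge, n in counts.items()}

# Hypothetical dataset: one main charge per case.
labels = ["theft"] * 150 + ["fraud"] * 100 + ["smuggling"] * 10 + ["piracy"] * 11
bands = split_by_frequency(labels)
```

Note the boundary cases: exactly 10 occurrences is low-frequency, and exactly 100 is still medium-frequency.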

Table 2 shows the performance of the proposed method at different frequencies on the Criminal-S dataset, comparing the low-, medium- and high-frequency macro-F1 of the proposed model with those of the state-of-the-art charge prediction model and the state-of-the-art text classification model. As the table shows, the low-frequency macro-F1 is 53.8%, more than 65% higher than that of the LSTM-200 model and 4.1% higher than that of the state-of-the-art charge prediction model. The SECaps model not only alleviates the low-frequency charge prediction problem, but also provides an end-to-end model that reduces manual data labelling. The SECaps model has stronger vector representation and sequence representation capabilities, and focal loss performs well on class imbalance and hard-to-classify examples, which alleviates the shortcomings of low-frequency charge prediction.

Claims

1. A criminal case charge prediction method based on a sequence-enhanced capsule network, characterized in that the method comprises the following steps:

S2.1.1 build the initial capsule layer: segment the fact description text of the case into words and map it to a sequence of word vectors, which serves as the initial capsule layer u={ u₁, u₂, …, u_n};

S2.1.2 build the Multiple seq-caps layer: extract features from the initial capsule layer u obtained in S2.1.1 using the Multiple seq-caps layer, obtaining the main feature vector of the case fact description text;

S2.1.3 build the attention layer: apply the attention mechanism to the initial capsule layer u obtained in S2.1.1, obtaining the auxiliary feature vector c of the case fact description text;

S2.1.4 build the output layer: combine the main feature vector of the case fact description text obtained in S2.1.2 with the auxiliary feature vector c of the case fact description text obtained in S2.1.3, and pass the result to a fully connected layer;

S2.2 train the sequence-enhanced capsule network model;

S3 input the fact description text of a new case into the sequence-enhanced capsule network model trained in S2, and take the charge automatically predicted by the model as the charge prediction result.

2. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 1, characterized in that: the dataset in S1 comes from real criminal cases published on China Judgments Online, and each case comprises two parts, the fact description of the case and the charge judgment result, which together serve as training data.

3. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 1, characterized in that: word segmentation in S2.1.1 uses pkuseg, the open-source tool from Peking University, and an embedding layer maps the words to a sequence of word vectors trained with Word2vec.

4. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 1, characterized in that: in S2.1.2, the Multiple seq-caps layer consists of two seq-caps layers.

5. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 4, characterized in that: each seq-caps layer consists of a sequence information encoder and a dynamic routing converter.

6. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 5, characterized in that: a long short-term memory network is used as the sequence information encoder.

7. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 1, characterized in that: in S2.1.3, the attention layer works as follows: each of the n initial capsules u_i (i=1,2,…,n) in the initial capsule layer u is passed through the weight matrix W to obtain a transformed vector e_i; a softmax function is then applied to e_i to obtain the importance weight α_i of each initial capsule u_i; finally, all initial capsules are summed according to their importance weights to obtain the auxiliary feature vector c of the case fact description text; the formula is as follows:

$$e_i = \tanh(W u_i + b)$$

where W is a weight matrix and b is a bias vector.

8. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 1, characterized in that: in S2.2, the sequence-enhanced capsule network model is trained with the focal loss function.

9. The criminal case charge prediction method based on a sequence-enhanced capsule network according to claim 8, characterized in that: the focal loss function is given by:

$$FL(\hat{y}) = -\alpha (1-\hat{y})^{\gamma} \log(\hat{y})$$

where $\hat{y}$ is the model's estimated probability computed by the softmax function, α is the α-balanced variable of focal loss, $(1-\hat{y})^{\gamma}$ is a modulating factor, and γ (γ ≠ 0) is a tunable parameter that controls the effect of the modulating factor.