CN111753059A

CN111753059A - Neural Embedding-based intelligent analysis method for judicial cases

Info

Publication number: CN111753059A
Application number: CN202010626120.8A
Authority: CN
Inventors: 肖利; 蒋欣辰; 曾芳伊雯
Original assignee: Hangzhou Shufeng Technology Co ltd; Chengdu Ruima Technology Co ltd
Current assignee: Hangzhou Shufeng Technology Co ltd; Chengdu Ruima Technology Co ltd
Priority date: 2020-07-02
Filing date: 2020-07-02
Publication date: 2020-10-09

Abstract

A Neural Embedding-based intelligent analysis method for judicial cases comprises: collecting criminal legal documents and preprocessing the case fact descriptions; taking each preprocessed case fact description as a document entity, taking the law articles relevant to the case, the charge against the defendant, and the sentence length as label entities corresponding to that document entity, and training Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, using the trained Embedding vectors to compute the similarity between the document entity and each label entity, thereby supporting a judgment on the case. The invention assists judicial officers and lawyers in determining the charge, recommending relevant law articles, and deciding the sentence length, which saves the time and labor cost of case review, improves the accuracy of judicial decisions, and lets these professionals reach decisions more efficiently.

Description

Neural Embedding-based intelligent analysis method for judicial cases

Technical Field

The invention relates to the technical field of intelligent judicial judgment assistance, in particular to a Neural Embedding-based intelligent analysis method for judicial cases.

Background

Traditional case review and judicial judgment rely on professional interpretation and debate among judges, lawyers, procurators, and other judicial personnel. Complicated criminal cases require a large amount of manpower and material resources to review. Moreover, different legal practitioners often disagree sharply about the outcome of the same case, and the professionals handling a case bring strong subjectivity to it. An Embedding-based intelligent judicial judgment method is therefore designed to assist judicial officers and lawyers in determining the charge, recommending relevant law articles, and deciding the sentence length, which saves the time and labor cost of case review, improves the accuracy of judicial decisions, and lets these professionals reach decisions more efficiently.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art and to provide a Neural Embedding-based intelligent analysis method for judicial cases. The method collects criminal legal documents and preprocesses the case fact descriptions; takes each preprocessed case fact description as a document entity and takes the law articles relevant to the case, the charge against the defendant, and the sentence length as label entities corresponding to that document entity; trains Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, uses the trained Embedding vectors to compute the similarity between the document entity and each label entity, thereby supporting a judgment on the case. It assists judicial officers and lawyers in determining the charge, recommending relevant law articles, and deciding the sentence length, which saves the time and labor cost of case review, improves the accuracy of judicial decisions, and lets these professionals reach decisions more efficiently.

The technical scheme adopted by the invention for solving the problems is as follows:

A Neural Embedding-based intelligent analysis method for judicial cases comprises the following steps:

S1: collecting criminal legal documents and preprocessing the case fact descriptions;

S2: taking each preprocessed case fact description as a document entity, taking the law articles relevant to the case, the charge against the defendant, and the sentence length as label entities corresponding to that document entity, and training Embedding vectors for the document entities and label entities by stochastic gradient descent;

S3: for new data, using the Embedding vectors trained in S2 to compute the similarity between the document entity and each label entity, so that professionals such as judges and lawyers can make a judgment on the case according to the similarity.
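As a minimal sketch of step S3, assuming trained vectors are already available, candidate label entities can be ranked by their cosine similarity to a new document entity. The helper names `cosine` and `rank_labels`, the label keys, and the 2-dimensional vectors are hypothetical and chosen only for illustration; the source does not specify an implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_labels(doc_vec, label_vecs):
    """Return (label, similarity) pairs sorted from most to least similar."""
    scored = [(label, cosine(doc_vec, vec)) for label, vec in label_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical trained vectors; real ones would come from the training in S2.
label_vecs = {
    "charge:theft": [1.0, 0.0],
    "charge:dangerous_driving": [0.0, 1.0],
}
doc_vec = [0.9, 0.1]  # hypothetical vector of a new case fact description
ranking = rank_labels(doc_vec, label_vecs)
```

The top of `ranking` is then a suggestion for the professional to review, in line with S3, rather than an automatic decision.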

Further, the preprocessing of the case fact descriptions in step S1 includes the following sub-steps:

S101: filtering stop words and special symbols out of the case fact description, retaining Chinese characters, digits, and punctuation, and segmenting the text into words;

S102: discretizing the numeric values attached to sentencing-sensitive terms in the case fact description.

Further, the sentencing-sensitive terms in step S102 are of three types: monetary amount, weight, and blood alcohol concentration.
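One way the discretization in S102 might look is sketched below for monetary amounts only. The bucket boundaries, the `MONEY_k` token format, and the English unit word `yuan` are all assumptions made for illustration (real case text would carry the Chinese unit character, and the source gives no thresholds); weight and blood alcohol concentration could be handled the same way with their own patterns and bounds.

```python
import re

# Hypothetical bucket boundaries in yuan; the source does not specify any.
MONEY_BOUNDS = [1_000, 10_000, 100_000]

def bucket(value, bounds):
    """Return the index of the first bound that value is below, or len(bounds)."""
    for i, bound in enumerate(bounds):
        if value < bound:
            return i
    return len(bounds)

# Matches an Arabic-numeral amount followed by the unit word, e.g. "3500 yuan".
MONEY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*yuan")

def discretize_money(text):
    """Replace each monetary amount with a discrete token such as MONEY_1."""
    def repl(match):
        return f"MONEY_{bucket(float(match.group(1)), MONEY_BOUNDS)}"
    return MONEY_RE.sub(repl, text)

example = discretize_money("the defendant stole 3500 yuan in cash")
```

Replacing raw numbers with a small set of bucket tokens keeps the dictionary of features finite, so that amounts of similar magnitude share a feature.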

The specific method of obtaining the Embedding vectors in step S2 includes the following steps:

S201: first, obtaining the vector of document entity a by feature summation;

S202: then, mapping each label entity b to a vector with the same dimension as the document entity;

S203: finally, establishing a loss function over the vector of document entity a and the vector of label entity b to obtain the Embedding vectors.

The specific steps of S201 are as follows: bag-of-words N-Grams are used as the features of a case description document, so that each document is described by a set of discrete features drawn from a fixed-length dictionary. A d-dimensional vector is assigned to each feature, and the feature vectors are summed to give the vector of the document entity. Assuming the dictionary contains D features, it is defined as a D × d matrix F, where $F_i$ denotes the i-th feature (the i-th row of F), and $\sum_{i \in a} F_i$ is the vector of document entity a.
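The feature summation of S201 can be sketched as follows, assuming the document has already been segmented into tokens. The names `ngrams`, `document_vector`, and `feature_index`, the maximum n-gram length of 2, and the toy matrix `F` are hypothetical; in training, the rows of `F` would be learned rather than fixed.

```python
def ngrams(tokens, n_max=2):
    """All contiguous n-grams of length 1..n_max, each joined with a space."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def document_vector(tokens, feature_index, F, n_max=2):
    """Sum the rows F[i] of every dictionary feature i that occurs in the document."""
    dim = len(F[0])
    total = [0.0] * dim
    for gram in ngrams(tokens, n_max):
        i = feature_index.get(gram)
        if i is None:  # n-gram is not in the fixed dictionary: skip it
            continue
        for k in range(dim):
            total[k] += F[i][k]
    return total

# Toy dictionary with three features and d = 2 (illustrative values only).
feature_index = {"stole": 0, "cash": 1, "stole cash": 2}
F = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
vec = document_vector(["stole", "cash"], feature_index, F)
```

In this sketch a feature that occurs several times in a document is added once per occurrence; whether the patent counts repeats is not stated in the text.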

The specific steps of S202 are as follows: every discrete value of the law articles, the charge against the defendant, and the sentence length is taken as a distinct label entity b, and every label entity is mapped to its own vector with the same dimension as the document entity.
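A simple way to picture S202 is a lookup table that gives every discrete label value its own vector of dimension d. The sketch below is an assumption-laden illustration: the function name `build_label_table`, the `kind:value` key format, the placeholder label values, and the small uniform random initialization are all hypothetical, and the vectors would subsequently be updated by training.

```python
import random

def build_label_table(law_articles, charges, sentence_buckets, dim, seed=0):
    """Give each discrete label value its own randomly initialized dim-vector."""
    rng = random.Random(seed)
    table = {}
    groups = (
        ("article", law_articles),
        ("charge", charges),
        ("term", sentence_buckets),
    )
    for kind, values in groups:
        for value in values:
            # Prefix with the label kind so equal strings of different kinds stay distinct.
            table[f"{kind}:{value}"] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return table

# Placeholder label values, for illustration only.
table = build_label_table(
    law_articles=["article_A", "article_B"],
    charges=["charge_A", "charge_B"],
    sentence_buckets=["term_short", "term_long"],
    dim=4,
)
```

Because all three kinds of label share one table and one dimension, they can be compared against the same document vector.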

The specific steps of S203 are as follows: a loss function is established over document entity a and label entity b, see formula (1):

This formula consists of the following parts:

1) positive entity pairs (a, b) are generated from the data, where a is the document entity of a case fact description and b is a label entity corresponding to a, such as the law article, the charge against the defendant, or the sentence-length label of that case;

2) negative entities $b^-$ are obtained by random sampling from the labels that do not belong to document entity a, forming negative entity pairs $(a, b^-)$;

3) the similarity of each entity pair is computed with cosine similarity;

4) the loss function $L^{batch}$ computes the loss by comparing the positive and negative entity pairs; margin ranking loss is used here, see formula (2):

where B is the batch size, N is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair. The loss function is optimized by stochastic gradient descent, so that related entity vectors have higher similarity.
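The bodies of the two loss formulas referenced above do not appear in this text. As a hedged reconstruction from the parts and symbols that the text does list (positive pairs $(a, b)$, sampled negatives $b^-$, cosine similarity, the batch loss $L^{batch}$, and $B$, $N$, $p_b$, $p_n$, score), and not a quotation of the patent's own formulas, they could take the following form. The set names $E^{+}$ and $E^{-}$, the margin symbol $\mu$, and the choice to average rather than sum are assumptions.

```latex
% Possible form of formula (1): total loss over positive pairs and sampled negatives
\sum_{(a,b)\in E^{+},\; b^{-}\in E^{-}}
  L^{batch}\bigl(\mathrm{sim}(a,b),\ \mathrm{sim}(a,b^{-}_{1}),\ \ldots,\ \mathrm{sim}(a,b^{-}_{N})\bigr)

% Possible form of formula (2): margin ranking loss over a batch
L^{batch} = \frac{1}{B}\sum_{b=1}^{B}\frac{1}{N}\sum_{n=1}^{N}
  \max\bigl(0,\ \mu - \mathrm{score}(p_b) + \mathrm{score}(p_n)\bigr)
```

Here $\mathrm{sim}$ and score both stand for the cosine similarity of an entity pair, and the index $b$ in $p_b$ runs over the positive pairs of a batch (it is distinct from the label entity $b$). Under this form the loss is zero once every positive pair scores at least $\mu$ higher than each of its negatives.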

In conclusion, the beneficial effects of the invention are as follows:

The method collects criminal legal documents and preprocesses the case fact descriptions; takes each preprocessed case fact description as a document entity and takes the law articles relevant to the case, the charge against the defendant, and the sentence length as label entities corresponding to that document entity; trains Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, uses the trained Embedding vectors to compute the similarity between the document entity and each label entity, thereby supporting a judgment on the case. It assists judicial officers and lawyers in determining the charge, recommending relevant law articles, and deciding the sentence length, which saves the time and labor cost of case review, improves the accuracy of judicial decisions, and lets these professionals reach decisions more efficiently.

Drawings

FIG. 1 is a schematic flow chart of the steps of the present invention.

Detailed Description

The prior art has the following problem: traditional case review and judicial judgment rely on professional interpretation and debate among judges, lawyers, procurators, and other judicial personnel; complicated criminal cases require a large amount of manpower and material resources to review; different legal practitioners often disagree sharply about the outcome of the same case; and the professionals handling a case bring strong subjectivity to it. To solve this problem, the method collects criminal legal documents and preprocesses the case fact descriptions; takes each preprocessed case fact description as a document entity and takes the law articles relevant to the case, the charge against the defendant, and the sentence length as label entities corresponding to that document entity; trains Embedding vectors for the document entities and label entities by stochastic gradient descent; and, for new data, uses the trained Embedding vectors to compute the similarity between the document entity and each label entity, thereby supporting a judgment on the case. It assists judicial officers and lawyers in determining the charge, recommending relevant law articles, and deciding the sentence length, which saves the time and labor cost of case review, improves the accuracy of judicial decisions, and lets these professionals reach decisions more efficiently. The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but the embodiments of the invention are not limited thereto; the drawings are only one example of applying the invention and do not essentially restrict its principle.

Embodiment:

As shown in FIG. 1, a Neural Embedding-based intelligent analysis method for judicial cases includes the following steps:

S1: collecting criminal legal documents and preprocessing the case fact descriptions; each case consists of the case description and fact sections of a legal document, together with elements such as the law articles relevant to the case, the charge on which the defendant was convicted, and the sentence length;

S2: taking each preprocessed case fact description as a document entity, taking the law articles relevant to the case, the charge against the defendant, and the sentence length as label entities corresponding to that document entity, and training Embedding vectors for the document entities and label entities by stochastic gradient descent; because the law articles, the charge against the defendant, and the sentence length are correlated with one another, these three elements are trained jointly with the case fact description to obtain a better result; that is, all document entities and label entities are embedded into the same space, so that related entity vectors lie closer together;

Further in this embodiment, the preprocessing of the case fact description document in step S1 includes the following sub-steps,

Further in this embodiment, the sentencing-sensitive terms in step S102 are of three types: monetary amount, weight, and blood alcohol concentration.

Further in this embodiment, step S2 further includes the following sub-steps,

S201: bag-of-words N-Grams are used as the features of a document, so that each document is described by a set of discrete features drawn from a fixed-length dictionary; a d-dimensional vector is assigned to each feature, and the feature vectors are summed to give the vector of the document entity; assuming the dictionary contains D features, it is defined as a D × d matrix F, where $F_i$ denotes the i-th feature (the i-th row of F), and $\sum_{i \in a} F_i$ is the vector of document entity a;

S202: every discrete value of the law articles, the charge against the defendant, and the sentence length is taken as a distinct label entity b, and every label entity is mapped to its own vector with the same dimension as the document entity;

S203: a loss function is established over document entity a and label entity b, see formula (1):

This formula consists of the following parts:

where B is the batch size, N is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair. The loss function is optimized by stochastic gradient descent, so that related entity vectors have higher similarity.
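To make the loss of S203 concrete, the sketch below samples negative label entities for one document and evaluates a margin ranking loss with cosine similarity as the score function. It is an illustration under stated assumptions, not the patent's implementation: the function names, the margin of 0.2, averaging over negatives, sampling with replacement, and the toy vectors are all hypothetical, and the gradient update itself is omitted.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity, used here as the score function of an entity pair."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def sample_negatives(all_labels, positive_labels, n, rng):
    """Randomly draw n labels (with replacement) that are not the document's own."""
    candidates = [label for label in all_labels if label not in positive_labels]
    return [rng.choice(candidates) for _ in range(n)]

def margin_ranking_loss(doc_vec, pos_vec, neg_vecs, margin=0.2):
    """Mean over negatives of max(0, margin - score(positive) + score(negative))."""
    pos_score = cosine(doc_vec, pos_vec)
    hinge = [max(0.0, margin - pos_score + cosine(doc_vec, neg)) for neg in neg_vecs]
    return sum(hinge) / len(hinge)

# Toy vectors (illustrative only): the document is aligned with charge:A.
label_vecs = {"charge:A": [1.0, 0.0], "charge:B": [0.0, 1.0], "charge:C": [-1.0, 0.0]}
doc_vec = [1.0, 0.0]
rng = random.Random(0)
negatives = sample_negatives(list(label_vecs), {"charge:A"}, n=2, rng=rng)

# Correct label already far ahead of every negative: no loss.
loss_good = margin_ranking_loss(
    doc_vec, label_vecs["charge:A"], [label_vecs[k] for k in negatives]
)
# Wrong label treated as the positive: the hinge is active.
loss_bad = margin_ranking_loss(doc_vec, label_vecs["charge:B"], [label_vecs["charge:A"]])
```

A stochastic gradient descent step would then adjust the feature rows and label vectors involved so that this value decreases, which is what pulls related entities together in the shared space.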

The above is only a preferred embodiment of the present invention. The scope of protection of the invention is not limited to this embodiment; all technical solutions falling within the idea of the invention belong to its scope of protection. It should be noted that those skilled in the art may make modifications and refinements within the scope of the invention without departing from its principle.

Claims

1. A Neural Embedding-based intelligent analysis method for judicial cases, characterized by comprising the following steps,

S3: for new data, using the Embedding vectors trained in S2 to compute the similarity between the document entity and each label entity.

2. A Neural Embedding-based intelligent analysis method for judicial cases according to claim 1, wherein the preprocessing of case fact description documents in step S1 includes the following sub-steps,

3. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 2, wherein the sentencing-sensitive terms in step S102 are of three types: monetary amount, weight, and blood alcohol concentration.

4. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 1, wherein the specific method of obtaining the Embedding vectors in step S2 comprises the following steps,

5. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein the specific steps of S201 are as follows,

bag-of-words N-Grams are used as the features of a case description document, each document being described by a set of discrete features drawn from a fixed-length dictionary; a d-dimensional vector is assigned to each feature, and the feature vectors are summed to give the vector of the document entity; assuming the dictionary contains D features, it is defined as a D × d matrix F, where $F_i$ denotes the i-th feature (the i-th row of F), and $\sum_{i \in a} F_i$ is the vector of document entity a.

6. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein the specific steps of S202 are as follows,

every discrete value of the law articles, the charge against the defendant, and the sentence length is taken as a distinct label entity b, and every label entity is mapped to its own vector with the same dimension as the document entity.

7. The Neural Embedding-based intelligent analysis method for judicial cases according to claim 4, wherein the specific steps of S203 are as follows,

a loss function is established over document entity a and label entity b, see formula (1):

This formula consists of the following parts:

where B is the batch size, N is the number of negative examples, $p_b$ denotes a positive entity pair, $p_n$ denotes a negative entity pair, and the score function is the similarity function of an entity pair; the loss function is optimized by stochastic gradient descent, so that related entity vectors have higher similarity.