CN113033176B - Court case judgment prediction method - Google Patents

Court case judgment prediction method

Info

Publication number
CN113033176B
Authority
CN
China
Prior art keywords
training
model
prediction
case
word
Prior art date
Legal status
Active
Application number
CN202110548108.4A
Other languages
Chinese (zh)
Other versions
CN113033176A (en)
Inventor
姜森
谢绍韫
Current Assignee
Suzhou Black Cloud Intelligent Technology Co ltd
Original Assignee
Suzhou Black Cloud Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Black Cloud Intelligent Technology Co ltd filed Critical Suzhou Black Cloud Intelligent Technology Co ltd
Priority to CN202110548108.4A priority Critical patent/CN113033176B/en
Publication of CN113033176A publication Critical patent/CN113033176A/en
Application granted granted Critical
Publication of CN113033176B publication Critical patent/CN113033176B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a court case judgment prediction method and belongs to the field of machine learning. The method analyzes the full text of a case and extracts its features, trains a prediction model on the extracted training data, predicts the prison term of the case, and assists court staff as a sentencing reference. Specifically, the method comprises the following steps: parsing the crime name of the case with a rule-based method and a similarity model; extracting features; vectorizing the extracted key features into numerical values; training the prediction model, i.e. training on the processed features with a Support Vector Machine (SVM) algorithm and a Logistic Regression (LR) algorithm; predicting the sentence from the case facts; and calling the prediction model on the text, file and option information entered through the user interface to obtain a prediction result and displaying it on the user page. The invention effectively avoids the phenomenon of 'same case, different judgments' and, compared with the prior art, markedly improves both the accuracy and the efficiency of prison-term prediction.

Description

Court case judgment prediction method
Technical Field
The invention belongs to the field of machine learning and relates to a court case judgment prediction method.
Background
The number of court judgment documents has been growing geometrically, and the imbalance between the ever-increasing caseload and the limited number of judges is becoming more and more pronounced. The phenomenon of 'same case, different judgments' caused by inconsistent sentencing standards has long drawn criticism of judicial adjudication and is considered one of the main factors undermining trust in the judiciary. How to analyze the case facts efficiently and accurately, arrive at a reasonable prison term, and avoid 'same case, different judgments' as far as possible has therefore become an urgent problem in the smart-court field.
Existing court case judgment prediction methods are mainly a series of prediction methods based on data mining and deep learning. These methods can produce a predicted value for court staff to consult when sentencing, but there is still room to improve the accuracy of the results. Taking an existing machine-learning-based judgment prediction method as an example, its implementation includes: obtaining first candidate keywords of a judgment document; taking as keywords those first candidate keywords whose first predicted weight deviates from the actual weight by less than a first preset threshold and whose first predicted weight exceeds a second preset threshold; training on the judgment documents and their corresponding keywords to obtain a judgment model; and obtaining a suggested judgment document through the judgment model and the document to be judged, from which a judgment suggestion is derived. Such a scheme can produce a predicted value, based on the criminal-case experience contained in big data, to assist sentencing, but its prediction accuracy remains limited.
Disclosure of Invention
In view of the above, the present invention provides a court case judgment prediction method that helps court staff quickly analyze the facts of a criminal case and intelligently predicts the crime name and prison term of the case through feature extraction and a support vector machine algorithm. By combining machine learning algorithms, natural language processing techniques and Web development techniques, the method meets the need to predict the prison term of a case quickly and accurately in a variety of judicial scenarios.
To achieve the above purpose, the invention provides the following technical solution:
a court case judgment prediction method, in which the full text of a case is analyzed and its features are extracted, a prediction model is trained on the extracted training data, the prison term of the case is predicted, and court staff are assisted with a sentencing reference; the method specifically comprises the following steps:
s1: parsing the crime name of the case with a rule-based method and a similarity model;
The similarity model is an algorithm that judges how similar two articles or sentences are from the cosine similarity of their word2vec vectors. The vectors are plotted in space according to their coordinates and the cosine of the angle between them is obtained; the closer the cosine value is to 1, the smaller the angle, i.e. the more similar the two vectors are.
S2: feature extraction, in which key feature elements of the case are extracted with a rule-based method and an entity-recognition-based method;
s3: vectorizing the extracted key features into numerical values through one-hot coding to construct the training data;
the training data are processed with one-hot coding:
N states are encoded with an N-bit state register, each state having its own independent register bit of which only one is active at any time, which yields a data set containing only 0s and 1s;
s4: training on the data set obtained in S3, i.e. training on the features processed in S3 with a Support Vector Machine (SVM) algorithm and a Logistic Regression (LR) algorithm;
the SVM algorithm comprises the following steps:
s411: generating an SVM description file;
s412: reading the description file into a container;
s413: reading in the number of samples, and generating a sample matrix and a type matrix;
s414: extracting HOG features;
s415: writing the HOG features into a txt file;
s416: carrying out SVM training;
the logistic regression LR algorithm includes the following steps:
s421: processing data;
s422: initializing parameters;
s423: gradient descent;
s424: saving the model;
s5: predicting the sentence from the case facts;
s6: calling the prediction model on the text, file and option information entered through the user interface to obtain a prediction result, and displaying the prediction result on the user page.
Optionally, in S1, the rule-based method comprises:
constructing a crime-name sentence-pattern rule base from the syntactic format of court judgment documents and extracting, with regular expressions, the crime-name data that match the rule base; if this extraction fails and no crime-name data are obtained from the judgment document, the crime name is predicted with the similarity model;
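As a sketch of this rule-based pass, the single pattern below is a hypothetical stand-in for one entry of such a rule base (the real rule base and its sentence patterns are not disclosed in this description); it matches the common phrasing '犯…罪' found in Chinese judgment documents.

```python
import re

# Hypothetical rule: judgments commonly state the charge as "犯<crime name>罪"
# ("committed the crime of ..."); a real rule base would hold many such patterns.
CRIME_PATTERN = re.compile(r"犯([\u4e00-\u9fa5]{1,15}?)罪")

def extract_crime_names(judgment_text: str):
    """Return the crime names matched by the rule; an empty list means the
    rule-based pass failed and the similarity model should take over."""
    return [m + "罪" for m in CRIME_PATTERN.findall(judgment_text)]

print(extract_crime_names("本院认为，被告人张某犯故意伤害罪，判处有期徒刑三年。"))
# expected: ['故意伤害罪']
```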
the prediction analysis method comprises the following steps: based on a plurality of case-sharing documents, after word segmentation and word deactivation are carried out, one-hot vectors of corpus words are used as word2vec input, low-dimensional word vectors are trained based on the word2vec, non-computable unstructured words are converted into computable structured vectors, case-sharing crime name context corpus models are trained, crime name prediction is carried out on a section of new judgment document context by utilizing the trained models, and the defects of a rule-based crime name analysis method are overcome.
Training the low-dimensional word vectors with word2vec comprises Skip-gram processing and CBOW processing:
Skip-gram processing:
s11: determine the window size window and generate 2*window training samples for each word: (i, i-window), (i, i-window+1), ..., (i, i+window-1), (i, i+window);
s12: determine the batch size batch_size as an integer multiple of 2*window, so that each batch contains all the samples belonging to one word;
s13: choose one of the two training algorithms: Hierarchical Softmax or Negative Sampling;
s14: the neural network is trained iteratively for a given number of rounds to obtain the parameter matrix from the input layer to the hidden layer; the transpose of each row of this matrix is the word vector of the corresponding word;
CBOW processing:
s21: determine the window size window and generate 2*window training samples for each word: (i-window, i), (i-window+1, i), ..., (i+window-1, i), (i+window, i);
s22: determine the batch size batch_size as an integer multiple of 2*window, so that each batch contains all the samples belonging to one word;
s23: choose one of the two training algorithms: Hierarchical Softmax or Negative Sampling;
s24: the neural network is trained iteratively for a given number of rounds to obtain the parameter matrix from the input layer to the hidden layer; the transpose of each row of this matrix is the word vector of the corresponding word.
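Steps s11 and s21 can be sketched as follows: for each position i, at most 2*window (center, context) pairs are generated for Skip-gram, and the mirrored (context, center) pairs for CBOW; the toy sentence is invented for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """For each position i emit the pairs (i, i-window) ... (i, i+window),
    skipping offsets that fall outside the sentence -- at most 2*window samples per word."""
    pairs = []
    for i, center in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """Mirror of Skip-gram: (context, center) pairs, i.e. the context predicts the word."""
    return [(ctx, center) for center, ctx in skipgram_pairs(tokens, window)]

sentence = ["被告人", "持刀", "伤害", "被害人"]
print(skipgram_pairs(sentence, window=1))
# [('被告人', '持刀'), ('持刀', '被告人'), ('持刀', '伤害'),
#  ('伤害', '持刀'), ('伤害', '被害人'), ('被害人', '伤害')]
```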
Optionally, in S2, the features to be extracted differ for different crime names;
features in numerical form are extracted with sentence-pattern rules, the correct numerical terms being picked out by regular expressions and sentence-pattern semantics; the numerical features form a numerical sequence after the original corpus is one-hot coded;
for enumerated features, a complete word stock is built and, based on it, the feature values in the case are screened out by regular expressions and keyword sentence-pattern semantics; enumerated features are descriptive features, for example occupation: farming, freelance;
entity-item features such as the victim and the places involved in the case are extracted with an entity recognition method.
Optionally, extracting the entity-item features of the victim and of the places involved in the case with an entity recognition method specifically comprises:
(1) selecting a data set: for the part-of-speech tagging task, the People's Daily tagged corpus is used and split into a training set and a test set at a ratio of 7:3;
(2) data preprocessing: the Chinese text is preprocessed by splitting it into individual Chinese characters and tagging the part of speech of each character;
the tags follow the BIO scheme, in which "B" indicates that the character is the first character of a word (a single-character word is also tagged "B");
"I" indicates that the character is inside a word;
"O" indicates that the character does not belong to any word;
the maximum sequence length is set as required by the BERT model and the sequences are padded to that length;
(3) model training: the model is trained with parameters including the model storage path, the vocabulary, the pre-trained model configuration, the checkpoint, the maximum sequence length, num_epochs and the learning rate; when the data are split, it is ensured that every part-of-speech tag appears in the training data;
(4) entity recognition and extraction: as with the training in (1)-(3), the sentence to be predicted is split into a series of single characters and fed into the trained model, which outputs the predicted tag of each character; each "B" character is spliced together with the "I" characters that follow it, until the next "B"-tagged character is met, thereby separating out the tagged single-character words and multi-character words.
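The splicing in step (4) can be sketched as below; the B/I/O tags are the plain scheme described above and the sample sentence and tags are invented for illustration.

```python
def decode_bio(chars, tags):
    """Splice each 'B' character with the 'I' characters that follow it,
    closing the entity at the next 'B' or 'O' tag, as in step (4)."""
    entities, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "B":
            if current:
                entities.append(current)
            current = ch
        elif tag == "I" and current:
            current += ch
        else:  # 'O', or an 'I' with no open entity
            if current:
                entities.append(current)
            current = ""
    if current:
        entities.append(current)
    return entities

chars = list("被害人王某在苏州市受伤")
tags = ["O", "O", "O", "B", "I", "O", "B", "I", "I", "O", "O"]
print(decode_bio(chars, tags))  # ['王某', '苏州市']
```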
Optionally, in S3, one-hot coding is used and each enumerated feature value is mapped to an integer value;
each integer value is then represented as a binary vector in which every position is zero except the one indexed by that integer value;
the feature-vector bit corresponding to the index of the integer value is set to 1, so that every extracted character-type keyword feature value is vectorized into a form that a machine learning algorithm can readily process.
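For illustration, a minimal sketch using scikit-learn's OneHotEncoder on hypothetical enumerated features of an intentional-injury case (the actual feature lexicon comes from the word stock built in S2, which is not reproduced here).

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical enumerated features per case: [crime tool, injury severity].
samples = [["刀具", "轻伤二级"],
           ["棍棒", "重伤一级"],
           ["拳脚", "轻伤二级"]]

encoder = OneHotEncoder(handle_unknown="ignore")  # unseen values map to all-zero bits
X = encoder.fit_transform(samples).toarray()      # binary 0/1 matrix, one bit set per feature

print(encoder.categories_)  # the categories behind each group of bits
print(X)
```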
Optionally, in S4, the input X in the training data is the serialized feature vector and the output Y is the specific sentencing type and prison term; the SVM is used to predict the specific sentencing type, the output being one of 'not guilty', 'criminal detention', 'fixed-term imprisonment', 'life imprisonment' and 'death penalty', while LR is used to predict the specific prison term, the prediction being accurate to the month; for parameter tuning and optimization, cross-validation is used and the grid search in the scikit-learn package, i.e. the GridSearchCV class, is used to tune and select the penalty coefficient C, the kernel function and the kernel coefficient gamma; the coefficients are finally fixed by comparing prediction accuracies, yielding an optimal prediction model, and the trained optimal model is stored offline for subsequently predicting the prison term of new cases.
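A condensed sketch of this training and tuning step; the random stand-in data, the label encoding and the candidate grid values are placeholders, since the real training set and parameter ranges are not given in this description.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# X: serialized one-hot feature vectors; y_type: sentencing type; y_term: prison term in months.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 30))
y_type = rng.integers(0, 5, size=200)   # 0..4 standing in for the five sentencing types
y_term = rng.integers(0, 36, size=200)  # months

# Cross-validated grid search over the penalty C, the kernel and gamma, as in S4;
# the candidate values below are placeholders.
svm_grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["rbf", "linear"], "gamma": ["scale", 0.01, 0.1]},
    cv=5,
)
svm_grid.fit(X, y_type)

# LR predicting the term (here treated as one class per month value)
lr = LogisticRegression(max_iter=1000).fit(X, y_term)

print("best SVM params:", svm_grid.best_params_)
print("predicted type / term:", svm_grid.predict(X[:1]), lr.predict(X[:1]))
```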
Optionally, in S5, feature extraction is first applied to extract the relevant feature elements from the new judgment input file, which are vectorized by one-hot coding to obtain the input X of the prediction model; the SVM and LR optimal models stored offline are then called to compute a prediction for X, giving the specific sentencing type and sentencing value, and the sentencing value is converted into a standard prison-term expression for output through the mapping between sentencing values and standard prison-term expressions.
The method of calling the SVM and LR optimal models stored offline comprises the following steps:
s511: saving the model after the model training is finished;
s512: loading the saved model;
s513: analyzing the stored model to obtain a calculation process;
the computational prediction of the input X comprises the steps of:
s521: inputting the corpus to be predicted;
s522: one-hot coding the corpus;
s523: inputting the one-hot coded sequence into a model;
s524: acquiring a one-hot coding result;
s525: decoding the one-hot result to obtain the final result.
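A compact sketch of steps s511-s513 and s521-s525, assuming joblib for model persistence (the description does not name a serialization library) and a toy SVM and toy one-hot vectors in place of the real trained models and features.

```python
import joblib
import numpy as np
from sklearn.svm import SVC

# Train once and persist to disk (s511); the file name is arbitrary.
X_train = np.array([[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]])
y_train = np.array([0, 1, 0])
joblib.dump(SVC().fit(X_train, y_train), "svm_sentencing.joblib")

# Later: load the saved model (s512-s513) and predict (s521-s525).
model = joblib.load("svm_sentencing.joblib")
x_new = np.array([[1, 0, 1, 0]])  # one-hot coded features of a new case
print("predicted sentencing type:", model.predict(x_new))
```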
The invention has the following beneficial effects: it effectively avoids the phenomenon of 'same case, different judgments' and, compared with the prior art, markedly improves both the accuracy and the efficiency of prison-term prediction.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a flow chart of predictive model training;
FIG. 3 is a flow chart of calling model prediction.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings illustrate the invention only and are not intended to limit it; to better illustrate the embodiments, some parts of the drawings may be omitted, enlarged or reduced and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
FIG. 1 is a system architecture diagram of the present invention; its core is the design and implementation of the sentencing prediction system, which mainly comprises two parts: prediction model training and model-calling prediction.
FIG. 2 is a flow chart of predictive model training. The core lies in the extraction of features and the training and optimization of a prediction model.
FIG. 3 is a flow chart of calling model prediction. The core lies in the extraction and serialization of input features.
A court case judgment prediction method analyzes the full text of a case and extracts its features, trains a prediction model on a large amount of extracted training data, accurately predicts the prison term of the case, and assists court staff with a sentencing reference; it comprises:
S1: the crime name of the case is parsed with a rule-based method and a similarity-model method. The rule-based method builds a crime-name sentence-pattern rule base from the syntactic format of court judgment documents and extracts, with regular expressions, the crime-name data that match the rule base; for criminal judgment documents, the extraction success rate of this approach exceeds 99%. If this extraction fails and no crime-name data are obtained from the judgment document, the crime name is predicted with the similarity model: starting from a large number of judgment documents of the same crime, after word segmentation and stop-word removal, the one-hot vectors of the corpus words are fed to word2vec as input and low-dimensional word vectors are trained, so that unstructured words that cannot be computed on are converted into structured vectors that can; a context corpus model of the crime names of same-crime cases is trained, and the trained model can predict the crime name from the context of a new judgment document, making up for the shortcomings of the rule-based crime-name analysis.
S2: feature extraction uses a rule-based method and an entity-recognition-based method to extract the key feature elements of the case. The features to be extracted differ for different crime names. For example, for the crime of intentional injury the features to be extracted are the instrument of the crime, the means of the crime, the severity of the victim's injuries, and the aggravating and mitigating circumstances; for the crime of illegal business operation they are the items involved, the amount of money involved, and the aggravating and mitigating circumstances. Features in numerical form are extracted with sentence-pattern rules, the correct numerical terms being picked out by regular expressions and sentence-pattern semantics. For enumerated features, a complete word stock is built and, based on it, the feature values in the case are screened out by regular expressions and keyword sentence-pattern semantics. Entity-item features such as the victim and the places involved in the case are extracted with an entity recognition method, as follows:
(1) selecting a data set: for the part-of-speech tagging task, the People's Daily tagged corpus is used and split into a training set and a test set at a ratio of 7:3;
(2) data preprocessing: the Chinese text is preprocessed by splitting it into a series of Chinese characters and tagging the part of speech of each character. The tags follow the BIO scheme, in which "B" indicates that the character is the first character of a word (a single-character word is also tagged "B"), "I" indicates that the character is inside a word, and "O" indicates that the character does not belong to any word. The maximum sequence length is set as required by the BERT model and the sequences are padded to that length.
(3) Model training: the model is trained with parameters including the model storage path, the vocabulary, the pre-trained model configuration, the checkpoint, the maximum sequence length, num_epochs and the learning rate; when the data are split, it is ensured that every part-of-speech tag appears in the training data.
(4) Entity recognition and extraction: as with the training in (1)-(3) above, the sentence to be predicted is split into a series of single characters and fed into the trained model, which outputs the predicted tag of each character; with further processing, each "B" character is spliced together with the "I" characters that follow it until the next "B"-tagged character is met, thereby separating out the tagged single-character words and multi-character words.
S3: since the extracted features are a series of enumerated classification keywords rather than continuous numerical values, they must be vectorized into numerical values. One-hot coding is used: each enumerated feature value is mapped to an integer value, and each integer value is then represented as a binary vector in which every position is zero except the one indexed by that integer value, which is set to 1; in this way every extracted character-type keyword feature value is vectorized into a form that a machine learning algorithm can readily process.
S4: the prediction model is trained on the data produced by the extraction above with a Support Vector Machine (SVM) algorithm and a Logistic Regression (LR) algorithm. The input X in the training data is the serialized feature vector and the output Y is the specific sentencing type and prison term; the SVM is used to predict the specific sentencing type, the output being one of 'not guilty', 'criminal detention', 'fixed-term imprisonment', 'life imprisonment' and 'death penalty', while LR is used to predict the specific prison term, the prediction being accurate to the month. For parameter tuning and optimization, cross-validation is used and the grid search in the scikit-learn package, i.e. the GridSearchCV class, is used to tune and select the penalty coefficient C, the kernel function and the kernel coefficient gamma; the coefficients are finally fixed by comparing prediction accuracies, yielding an optimal prediction model, and the trained optimal model is stored offline for predicting the prison term of new cases.
S5: in the case sentencing prediction part, the feature extraction described above is first applied to extract the relevant feature elements from the new judgment input file, which are vectorized by one-hot coding to obtain the input X of the prediction model. The SVM and LR optimal models stored offline are then called to compute a prediction for X, giving the specific sentencing type and sentencing value, and the sentencing value is converted into a standard prison-term expression for output through the mapping between sentencing values and standard prison-term expressions.
S6: the user interface supports three input modes: text, file and options. All three kinds of input are processed into the standard input format of the prediction model by the feature extraction and feature-value vectorization described above; the prediction model is called to obtain a prediction result, and the result is displayed on the user page for the relevant staff to consult when sentencing.
Embodiment one:
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. As shown in FIG. 2, an embodiment of the present invention provides a method for training and optimizing the prediction model. The method comprises the following steps:
Specifically, for the judgment documents, the user downloads a large number of judgment documents of the same type of case from the judgment-document website and classifies them by crime name;
Further, keyword extraction: the key feature elements of the case are extracted with a rule-based method and an entity-recognition-based method. The features to be extracted differ for different crime names. For example, for the crime of intentional injury the features to be extracted are the instrument of the crime, the means of the crime, the severity of the victim's injuries, and the aggravating and mitigating circumstances; for the crime of illegal business operation they are the items involved, the amount of money involved, and the aggravating and mitigating circumstances. Features in numerical form are extracted with sentence-pattern rules, the correct numerical terms being picked out by regular expressions and sentence semantics. For enumerated features, a complete word stock is built and, based on it, the feature values in the case are screened out by regular expressions and sentence-pattern semantics.
Further, serialization into feature vectors: one-hot coding is used to map each enumerated feature value to an integer value. Each integer value is then represented as a binary vector in which every position is zero except the one indexed by that integer value, which is set to 1, so that every extracted character-type keyword feature value is vectorized into a form that a machine learning algorithm can readily process.
Further, model training: the data extracted and processed above are trained with an ensemble that combines the Support Vector Machine (SVM) algorithm and the Logistic Regression (LR) algorithm, where the input X in the training data is the serialized feature vector and the output Y is the specific sentencing type and prison term.
Further, model optimization: for parameter tuning and optimization, cross-validation is used and the grid search in the scikit-learn package, i.e. the GridSearchCV class, is used to tune and select the penalty coefficient C, the kernel function and the kernel coefficient gamma; the coefficients are finally fixed by comparing prediction accuracies, yielding an optimal prediction model, and the trained optimal model is stored offline for subsequently predicting the prison term of new cases.
Embodiment two:
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. As shown in FIG. 3, the second embodiment of the present invention provides a sentencing prediction method based on the offline models. The method comprises the following steps: specifically, the judgment document to be analyzed is the file input by the user;
Further, keyword extraction: the key feature elements of the case are extracted with a rule-based method and an entity-recognition-based method. The features to be extracted differ for different crime names. For example, for the crime of intentional injury the features to be extracted are the instrument of the crime, the means of the crime, the severity of the victim's injuries, and the aggravating and mitigating circumstances; for the crime of illegal business operation they are the items involved, the amount of money involved, and the aggravating and mitigating circumstances. Features in numerical form are extracted with sentence-pattern rules, the correct numerical terms being picked out by regular expressions and sentence semantics. For enumerated features, a complete word stock is built and, based on it, the feature values in the case are screened out by regular expressions and sentence-pattern semantics.
Further, serialization into feature vectors: one-hot coding is used to map each enumerated feature value to an integer value. Each integer value is then represented as a binary vector in which every position is zero except the one indexed by that integer value, which is set to 1, so that every extracted character-type keyword feature value is vectorized into a form that a machine learning algorithm can readily process.
Further, the offline prediction models are called. The prison term is predicted with the Support Vector Machine (SVM) and Logistic Regression (LR) models obtained from the model training in embodiment one, using the feature vector obtained from the serialization in the previous step as the model input X.
Further, the sentencing result is output. The predicted value is converted into a standard prison-term expression, a page is designed, and the predicted sentencing result is displayed to the user in a visual form.
Finally, it should be noted that: the above examples are only for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (6)

1. A court case judgment prediction method, characterized in that: the full text of a case is analyzed and its features are extracted, a prediction model is trained on the extracted training data, the prison term of the case is predicted, and court staff are assisted with a sentencing reference; the method specifically comprises the following steps:
s1: parsing the crime name of the case with a rule-based method and a similarity model;
s2: the feature extraction adopts a rule-based method and an entity identification-based method to extract key feature elements in the case;
the rule-based method comprises the following steps:
constructing a crime-name sentence-pattern rule base from the syntactic format of court judgment documents and extracting, with regular expressions, the crime-name data that match the rule base; if this extraction fails and no crime-name data are obtained from the judgment document, predicting the crime name with the similarity model;
the prediction analysis method comprises the following steps: starting from a plurality of judgment documents of the same crime, after word segmentation and stop-word removal, using the one-hot vectors of the corpus words as the input of the word-vector model word2vec, training low-dimensional word vectors with word2vec so that unstructured words that cannot be computed on are converted into structured vectors that can, training a context corpus model of the crime names of same-crime cases, and using the trained model to predict the crime name from the context of a new judgment document, making up for the shortcomings of the rule-based crime-name analysis method;
the training of the low-dimensional word vectors with word2vec comprises Skip-gram processing and continuous bag-of-words CBOW processing:
the Skip-gram processing is:
s11: determining the window size window and generating 2*window training samples for each word: (i, i-window), (i, i-window+1), ..., (i, i+window-1), (i, i+window);
s12: determining the batch size batch_size as an integer multiple of 2*window, so that each batch contains all the samples belonging to one word;
s13: choosing one of the two training algorithms: Hierarchical Softmax or Negative Sampling;
s14: training the neural network iteratively for a given number of rounds to obtain the parameter matrix from the input layer to the hidden layer, the transpose of each row of which is the word vector of the corresponding word;
the CBOW processing is:
s21: determining the window size window and generating 2*window training samples for each word: (i-window, i), (i-window+1, i), ..., (i+window-1, i), (i+window, i);
s22: determining the batch size batch_size as an integer multiple of 2*window, so that each batch contains all the samples belonging to one word;
s23: choosing one of the two training algorithms: Hierarchical Softmax or Negative Sampling;
s24: training the neural network iteratively for a given number of rounds to obtain the parameter matrix from the input layer to the hidden layer, the transpose of each row of which is the word vector of the corresponding word;
s3: vectorizing the extracted key features into numerical values through one-hot coding to construct training data;
processing the training data with one-hot coding;
encoding N states with an N-bit state register, each state having its own independent register bit of which only one is active at any time, thereby obtaining a data set containing only 0s and 1s;
s4: training on the data set obtained in S3, i.e. training on the features processed in S3 with a Support Vector Machine (SVM) algorithm and a Logistic Regression (LR) algorithm;
the SVM algorithm comprises the following steps:
s411: generating an SVM description file;
s412: reading the description file into a container;
s413: reading in the number of samples, and generating a sample matrix and a type matrix;
s414: extracting HOG (histogram of oriented gradients) features represented as multi-dimensional numerical vectors;
s415: writing the HOG features into a txt file;
s416: carrying out SVM training;
the logistic regression LR algorithm includes the following steps:
s421: processing data;
s422: initializing parameters;
s423: gradient descending;
s424: saving the model;
s5: predicting the sentence from the case facts;
s6: calling the prediction model on the text, file and option information entered through the user interface to obtain a prediction result, and displaying the prediction result on the user page.
2. The court case judgment prediction method of claim 1, wherein: in S2, the features to be extracted differ for different crime names;
features in numerical form are extracted with sentence-pattern rules, the correct numerical terms being picked out by regular expressions and sentence-pattern semantics; the numerical features form a numerical sequence after the original corpus is one-hot coded;
for enumerated features, a complete word stock is constructed and, based on it, the feature values in the case are screened out by regular expressions and keyword sentence-pattern semantics; enumerated features are descriptive features;
and entity-item features of the victim and of the places involved in the case are extracted with an entity recognition method.
3. The court case judgment prediction method of claim 2, wherein: extracting the entity-item features of the victim and of the places involved in the case with an entity recognition method specifically comprises:
(1) selecting a data set: for the part-of-speech tagging task, the People's Daily tagged corpus is used and split into a training set and a test set at a ratio of 7:3;
(2) data preprocessing: for the Chinese text, preprocessing the data by splitting the text into Chinese characters and tagging the part of speech of each character;
the tags follow the BIO scheme;
wherein "B" indicates that the character is the first character of a word; "I" indicates that the character is inside a word; "O" indicates that the character does not belong to any word;
setting the maximum sequence length as required by the pre-trained language model BERT and padding the sequences to that length;
(3) model training: training the model with parameters including the model storage path, the vocabulary, the pre-trained model configuration, the maximum sequence length, the number of training epochs num_epochs and the learning rate, and ensuring, when the data are split, that every part-of-speech tag appears in the training data;
(4) entity recognition and extraction: splitting the sentence to be predicted into a series of single characters and feeding them into the trained model, which outputs the predicted tag of each character; splicing each "B" character with the "I" characters that follow it until the next "B"-tagged character is met, thereby separating out the tagged single-character words and multi-character words.
4. The court case judgment prediction method of claim 2, wherein: in S3, one-hot coding is used and each enumerated feature value is mapped to an integer value;
each integer value is represented as a binary vector in which every position is zero except the one indexed by that integer value;
the feature-vector bit corresponding to the index of the integer value is set to 1, so that every extracted character-type keyword feature value is vectorized into a form that a machine learning algorithm can readily process.
5. The court case judgment prediction method of claim 4, wherein: in S4, the input X in the training data is the serialized feature vector and the output Y is the specific sentencing type and prison term; the SVM is used to predict the specific sentencing type, the output being one of 'not guilty', 'criminal detention', 'fixed-term imprisonment', 'life imprisonment' and 'death penalty', and the logistic regression LR is used to predict the specific prison term, the prediction being accurate to the month; for parameter tuning and optimization, cross-validation is used and the grid search in the scikit-learn algorithm library, i.e. the GridSearchCV class, is used to tune and select the penalty coefficient C, the kernel function and the kernel coefficient gamma; the coefficients are finally fixed by comparing prediction accuracies, yielding an optimal prediction model, and the trained optimal model is stored offline for subsequently predicting the prison term of new cases.
6. The court case judgment prediction method of claim 5, wherein: in S5, feature extraction is first applied to extract the relevant feature elements from the new judgment input file, which are vectorized by one-hot coding to obtain the input X of the prediction model; the SVM and LR optimal models stored offline are called to compute a prediction for X, giving the specific sentencing type and sentencing value, and the sentencing value is converted into a standard prison-term expression for output through the mapping between sentencing values and standard prison-term expressions;
the method of calling the SVM and LR optimal models stored offline comprises the following steps:
s511: saving the model after the model training is finished;
s512: loading the saved model;
s513: analyzing the stored model to obtain a calculation process;
the computational prediction of the input X comprises the steps of:
s521: inputting the corpus to be predicted;
s522: one-hot coding the corpus;
s523: inputting the one-hot coded sequence into a model;
s524: acquiring a one-hot coding result;
s525: decoding the one-hot result to obtain the final result.
CN202110548108.4A 2021-05-19 2021-05-19 Court case judgment prediction method Active CN113033176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110548108.4A CN113033176B (en) 2021-05-19 2021-05-19 Court case judgment prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110548108.4A CN113033176B (en) 2021-05-19 2021-05-19 Court case judgment prediction method

Publications (2)

Publication Number Publication Date
CN113033176A CN113033176A (en) 2021-06-25
CN113033176B true CN113033176B (en) 2021-09-17

Family

ID=76455386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110548108.4A Active CN113033176B (en) 2021-05-19 2021-05-19 Court case judgment prediction method

Country Status (1)

Country Link
CN (1) CN113033176B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536780A (en) * 2021-06-29 2021-10-22 华东师范大学 Intelligent auxiliary case judging method for enterprise bankruptcy cases based on natural language processing

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918921B (en) * 2017-11-21 2021-10-08 南京擎盾信息科技有限公司 Criminal case judgment result measuring method and system
CN108596360B (en) * 2018-03-16 2021-03-12 北京中科闻歌科技股份有限公司 Machine learning-based decision prediction method and system
CN110969276B (en) * 2018-09-30 2022-08-12 北京国双科技有限公司 Decision prediction method, decision prediction model obtaining method and device
CN111126057B (en) * 2019-12-09 2023-08-01 航天科工网络信息发展有限公司 Case scenario accurate sentencing system of hierarchical neural network
CN111753059A (en) * 2020-07-02 2020-10-09 成都睿码科技有限责任公司 Neural Embedding-based intelligent analysis method for judicial cases

Also Published As

Publication number Publication date
CN113033176A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
US11580415B2 (en) Hierarchical multi-task term embedding learning for synonym prediction
CN112784578B (en) Legal element extraction method and device and electronic equipment
CN109670177A (en) One kind realizing the semantic normalized control method of medicine and control device based on LSTM
CN110188197B (en) Active learning method and device for labeling platform
CN113239681B (en) Court case file identification method
CN114239585B (en) Biomedical nested named entity recognition method
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN114139522A (en) Key information identification method based on level attention and label guided learning
CN111061939A (en) Scientific research academic news keyword matching recommendation method based on deep learning
CN115757773A (en) Method and device for classifying problem texts with multi-value chains
CN112000778A (en) Natural language processing method, device and system based on semantic recognition
CN113987175B (en) Text multi-label classification method based on medical subject vocabulary enhancement characterization
CN113033176B (en) Court case judgment prediction method
CN114970536A (en) Combined lexical analysis method for word segmentation, part of speech tagging and named entity recognition
CN114239584A (en) Named entity identification method based on self-supervision learning
KR20230163983A (en) Similar patent extraction methods using neural network model and device for the method
CN113361252A (en) Text depression tendency detection system based on multi-modal features and emotion dictionary
Karaoglan et al. Enhancing Aspect Category Detection Through Hybridised Contextualised Neural Language Models: A Case Study In Multi-Label Text Classification
CN116562291A (en) Chinese nested named entity recognition method based on boundary detection
US20220165430A1 (en) Leveraging deep contextual representation, medical concept representation and term-occurrence statistics in precision medicine to rank clinical studies relevant to a patient
CN113836892A (en) Sample size data extraction method and device, electronic equipment and storage medium
CN114550856A (en) Electronic medical record similarity retrieval method integrating cross knowledge attention
Kuttiyapillai et al. Improved text analysis approach for predicting effects of nutrient on human health using machine learning techniques
EP2565799A1 (en) Method and device for generating a fuzzy rule base for classifying logical structure features of printed documents
KR102606352B1 (en) Similar patent extraction methods using neural network model and device for the method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant