CN112396201A - Criminal name prediction method and system


Info

Publication number
CN112396201A
CN112396201A (application CN201910695855.3A)
Authority
CN
China
Prior art keywords
model
sentence
judicial
training
sample data
Prior art date
Legal status
Pending
Application number
CN201910695855.3A
Other languages
Chinese (zh)
Inventor
戴威
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201910695855.3A
Publication of CN112396201A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 Services
    • G06Q 50/18 Legal services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract


The present invention provides a criminal name prediction method and system. The method includes: obtaining a first judicial document whose criminal name needs to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data. In this scheme, the criminal name prediction model is obtained by pre-training a language model on a large number of judicial documents; the judicial document whose criminal name is to be predicted is truncated or padded, filtering out the redundant witness-testimony portions. The truncated or padded judicial document is then used as the input of the criminal name prediction model, which returns the criminal name prediction result, thereby saving labor and time costs and improving the accuracy and efficiency of judgment.


Description

Criminal name prediction method and system
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a system for predicting a criminal name.
Background
Law is one of the products of the development of civilized society. A law is, in general, a rule of conduct enacted by the state legislature and recognized by society, binding on all members of society, guaranteed by state enforcement, and defining the rights and obligations of the parties. When disputes arise among members of society, the judicial authorities hear and adjudicate them according to law.
At present, legal judgments are usually made manually: a person reads the description of the case and, referring to the relevant laws and regulations, renders a decision. However, because the laws of each country define a large number of criminal names, combing through cases one by one manually and then judging them carries high time and labor costs. Moreover, owing to the diversity of language, the same conviction element is often described and expressed in many different ways, which affects the accuracy and efficiency of judgment.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for predicting a criminal name, so as to solve the problems of high labor cost, high time cost, low accuracy, low efficiency, and the like in the existing manual criminal name judgment.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
The first aspect of the embodiments of the invention discloses a criminal name prediction method, which comprises the following steps:
acquiring a first judicial document whose criminal name needs to be predicted;
truncating or padding the first judicial document to obtain a second judicial document; and
inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
Preferably, when the language model is a BERT model, the process of training the language model on sample data to obtain the criminal name prediction model includes:
performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents;
taking the first training data as the input of a first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges;
taking the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and
taking the sample data as the input of the second BERT model and training the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Preferably, taking the first training data as the input of the first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges includes:
taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
calculating, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculating, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and
training the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges;
wherein the actual characters and sentences are taken from the sample data.
Preferably, truncating or padding the first judicial document to obtain the second judicial document includes:
if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document;
if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and
if the word count of the first judicial document exceeds the preset word count, extracting the first x words and the last y words of the first judicial document and taking the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
Preferably, performing character replacement and sentence splicing on the sample data to obtain the first training data includes:
randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence.
The second aspect of the embodiments of the invention discloses a criminal name prediction system, which comprises:
an acquiring unit, configured to acquire a first judicial document whose criminal name needs to be predicted;
a processing unit, configured to truncate or pad the first judicial document to obtain a second judicial document; and
a prediction unit, configured to input the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
Preferably, when the language model is a BERT model, the prediction unit includes:
a processing module, configured to perform character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents;
a first training module, configured to take the first training data as the input of a first BERT model and train the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges;
a setting module, configured to take the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and
a second training module, configured to take the sample data as the input of the second BERT model and train the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Preferably, the first training module comprises:
a prediction submodule, configured to take the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
an error submodule, configured to calculate, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and to calculate, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and
a training submodule, configured to train the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges;
wherein the actual characters and sentences are taken from the sample data.
The third aspect of the embodiments of the present invention discloses a storage medium, the storage medium including a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the criminal name prediction method disclosed in the first aspect of the embodiments of the present invention.
The fourth aspect of the embodiments of the present invention discloses a criminal name prediction device, including a storage medium and a processor, wherein the storage medium stores a program and the processor is configured to run the program, the program, when running, performing the criminal name prediction method disclosed in the first aspect of the embodiments of the present invention.
The criminal name prediction method and system provided by the embodiments of the invention proceed as follows: a first judicial document whose criminal name is to be predicted is obtained; the first judicial document is truncated or padded to obtain a second judicial document; and the second judicial document is input into a pre-established criminal name prediction model to obtain the criminal name prediction result corresponding to the first judicial document, the criminal name prediction model having been obtained by training a language model on sample data. In this scheme, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name is to be predicted is truncated or padded, filtering out the redundant witness-testimony portions. The truncated or padded judicial document is then used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a Transformer according to an embodiment of the present invention;
fig. 2 is a flowchart of a criminal name prediction method according to an embodiment of the present invention;
FIG. 3 is a flow chart of obtaining a criminal name prediction model according to an embodiment of the present invention;
FIG. 4 is a flowchart of training a first BERT model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As noted in the background, legal judgments are currently made manually, according to the description of the case and with reference to the relevant laws and regulations. However, because the laws of each country define a large number of criminal names, combing through cases one by one manually and then judging them carries high time and labor costs. Moreover, owing to the diversity of language, the same conviction element is often described and expressed in many different ways, which affects the accuracy and efficiency of judgment.
Therefore, the embodiments of the invention provide a criminal name prediction method and system: a language model is pre-trained on a large number of judicial documents to obtain a criminal name prediction model, and the judicial document whose criminal name needs to be predicted is used as the input of the criminal name prediction model to obtain the criminal name prediction result, which saves labor and time costs and improves the accuracy and efficiency of judgment.
It should be noted that the Bidirectional Encoder Representations from Transformers (BERT) model involved in the embodiments of the present invention is a language model proposed by Google with strong abstraction capability for text in the field of natural language processing. The BERT model has a 12-layer Transformer structure. Concretely: the input text is split into characters and fed to an embedding layer, which maps each character to a 768-dimensional vector using the word-vector weights provided by Google; the 12-layer Transformer structure then produces the encoding vector Enc.
Referring to fig. 1, a schematic structural diagram of a Transformer layer: each layer includes multi-head attention (Multihead Attention), a residual unit (Residual Unit), layer normalization (Layer Norm), and a two-layer fully connected feed-forward network (FFN).
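As a concrete illustration of this layer structure, the following is a minimal PyTorch sketch of one encoder layer; the patent provides no code, so the module names are illustrative, and the 768-dimensional hidden size, 12 heads, and 3072-dimensional FFN are assumptions following the standard BERT-Base configuration.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer as in fig. 1: multi-head attention,
    residual units, layer normalization, and a two-layer FFN."""
    def __init__(self, hidden=768, heads=12, ffn_dim=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.ffn = nn.Sequential(            # two-layer fully connected block
            nn.Linear(hidden, ffn_dim),
            nn.GELU(),
            nn.Linear(ffn_dim, hidden),
        )
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):                    # x: (batch, seq_len, 768)
        attn_out, _ = self.attn(x, x, x)     # multi-head self-attention
        x = self.norm1(x + attn_out)         # residual unit + Layer Norm
        x = self.norm2(x + self.ffn(x))      # residual unit + Layer Norm
        return x

# Stacking 12 such layers on top of a 768-dimensional embedding layer
# yields the encoding vector Enc described above.
encoder = nn.Sequential(*[EncoderLayer() for _ in range(12)])
enc = encoder(torch.randn(1, 600, 768))      # Enc: (1, 600, 768)
```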
Referring to fig. 2, which shows the flow of a criminal name prediction method provided by an embodiment of the present invention, the method includes the following steps:
Step S201: obtain a first judicial document whose criminal name is to be predicted.
Step S202: truncate or pad the first judicial document to obtain a second judicial document.
It should be noted that, in step S202, the word count of the first judicial document may be adjusted by truncation or padding to meet the word-count requirement that the criminal name prediction model imposes on judicial documents. The fact-description paragraphs of different judicial documents vary in length: the witness-testimony passages run to tens of thousands of characters in some documents and to only a few dozen characters in others. When the testimony runs to tens of thousands of characters, most of it is redundant for criminal name prediction, so the redundant testimony must be truncated. When a passage contains only a few dozen characters, it must be padded, because the input texts of the criminal name prediction model need a consistent length; characters are added until the word count meets the model's requirement.
In the specific implementation of step S202, if the word count of the first judicial document is less than the preset word count, n blank characters are added to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document.
For example, suppose the preset word count is 600 words. If the first judicial document has fewer than 600 words, blank characters are appended to it until it reaches 600 words, yielding the second judicial document.
If the word count of the first judicial document equals the preset word count, the first judicial document is taken as the second judicial document.
If the word count of the first judicial document exceeds the preset word count, the first x words and the last y words of the first judicial document are extracted, and the resulting x words and y words are taken as the second judicial document, where the sum of x and y equals the preset word count.
For example, again supposing a preset word count of 600 words, if the first judicial document has more than 600 words, its first 550 words and last 50 words are extracted and together taken as the second judicial document.
Step S203: and inputting the second judicial literature into a pre-established criminal name prediction model to carry out criminal name prediction, so as to obtain a criminal name prediction result corresponding to the first judicial literature.
In the process of specifically implementing step S203, a training set text is acquired from the CAIL2018 data set, and the training set text is processed in the manner of intercepting the text in step S202, so as to obtain sample data. And training a language model based on the sample data to obtain the criminal name prediction model.
It should be noted that the language model is used for pre-training according to a preset number of legal texts to determine the initialization model parameters of the element analysis model. The types of language models include, but are not limited to: ELMo model, GPT model, and BERT model.
It should be noted that the CAIL2018 data set is constructed from judgment documents published on China Judgements Online and contains about 1.7 million training samples and 200,000 test samples. It is constructed as follows: the content related to the criminal name is deleted from the fact-description segment of each judgment document, and the fact-description segment is used as a training-set text; the criminal name adjudicated in the decision section of the judgment document is used as the criminal name label.
It should be noted that the content related to the criminal name is deleted from the fact-description segment because the segment serves as training data for the criminal name prediction model: if the criminal name appeared in the fact description, the model could read it off directly instead of inferring it from the facts, which would defeat the purpose of training the criminal name prediction model.
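To make this construction concrete, here is a hypothetical sketch; the record field names (fact_description, adjudicated_charge) and the sentence-level deletion heuristic are assumptions, since the patent only states that charge-related content is removed from the fact description. It reuses normalize_length from the sketch above.

```python
import re

def build_sample(judgment: dict) -> tuple[str, str]:
    """Turn one published judgment into a (fact text, charge label) pair."""
    facts = judgment["fact_description"]     # hypothetical field name
    label = judgment["adjudicated_charge"]   # taken from the decision section
    # Drop sentences that mention the charge itself, so the model must
    # infer the charge from the facts instead of reading it off the text.
    sentences = re.split(r"(?<=[。；])", facts)
    facts = "".join(s for s in sentences if label not in s)
    return normalize_length(facts), label
```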
In the embodiment of the invention, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name needs to be predicted is truncated or padded, filtering out the redundant witness-testimony portions; and the truncated or padded judicial document is used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
In the above embodiment of the present invention, when the language model is a BERT model, the process in step S203 of fig. 2 of training the language model on sample data to obtain the criminal name prediction model is shown in fig. 3, a flowchart of obtaining the criminal name prediction model provided by an embodiment of the present invention, and includes the following steps:
step S301: and performing character replacement and sentence splicing on the sample data to obtain first training data.
In the process of specifically implementing step S301, the sample data is obtained by performing interception processing on the published judicial literature, and the specific way of the interception processing is referred to the corresponding content in step S202 disclosed in fig. 2 in the above embodiment of the present invention, which is not described herein again.
When carrying out character replacement and sentence splicing processing, randomly selecting characters from the sample data, replacing the characters with preset characters, and randomly splicing a second sentence for a first sentence in the sample data, wherein the second sentence is a next sentence corresponding to the first sentence or is not the next sentence corresponding to the first sentence. Such as: randomly replacing the characters in the sample with "[ MASK ]". And selecting the sentence needing sentence splicing, wherein 50% of the probability is the next sentence corresponding to the sentence splicing, and 50% of the probability is the sentence splicing other sentences.
It should be noted that the random character replacement and sentence splicing described above are only illustrative; a technician may instead specify exactly which characters to replace and which sentences to splice. Likewise, characters could be replaced at fixed intervals of a preset number of characters, and a sentence spliced after every preset number of sentences; the embodiment of the present invention does not limit this.
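A minimal sketch of this corruption step follows; the 15% masking rate is an assumption borrowed from standard BERT practice (the patent does not fix a rate), and for simplicity the "other sentence" branch may occasionally pick the true next sentence.

```python
import random

MASK = "[MASK]"

def mask_characters(chars: list[str], rate: float = 0.15):
    """Randomly replace characters with the preset [MASK] token; return the
    corrupted sequence and {position: actual character} prediction targets."""
    out, targets = list(chars), {}
    for i, c in enumerate(chars):
        if random.random() < rate:
            targets[i] = c                   # actual character = MLM label
            out[i] = MASK
    return out, targets

def make_sentence_pair(sentences: list[str], i: int):
    """With 50% probability pair sentence i with its actual next sentence
    (label True), otherwise with a randomly chosen sentence (label False)."""
    if random.random() < 0.5 and i + 1 < len(sentences):
        return sentences[i], sentences[i + 1], True
    return sentences[i], random.choice(sentences), False
```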
Step S302: take the first training data as the input of a first BERT model and train the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges.
In the specific implementation of step S302, the first training data is fed to the first BERT model, which predicts the characters at the replaced positions and the sentences at the spliced positions; the errors between the predictions and the actual results are used to train the first BERT model's judgment of characters and sentences. For example: for a complete sentence, a character is randomly replaced with the preset character, and the first BERT model is trained to judge what the actual character at that position is; for a passage composed of several sentences, sentence splicing is applied to one of them, and the first BERT model is trained to judge what the actual sentence at the spliced position is.
Step S303: take the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model.
In the specific implementation of step S303, the converged parameters of the embedding layer and the 12-layer Transformer structure of the first BERT model are used as the initialization parameters of the embedding layer and the 12-layer Transformer structure of the second BERT model.
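This parameter hand-over can be sketched as follows, assuming first_bert and second_bert are PyTorch modules whose embedding layer and Transformer layers share parameter names; this is an illustration under those assumptions, not the patent's implementation.

```python
import torch.nn as nn

def warm_start(first_bert: nn.Module, second_bert: nn.Module) -> None:
    """Initialize the second BERT model from the converged first BERT model."""
    # strict=False copies every parameter whose name matches (embedding layer
    # + 12 Transformer layers) and leaves the second model's classification
    # head at its random initialization, giving it a legal-domain prior.
    second_bert.load_state_dict(first_bert.state_dict(), strict=False)
```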
Step S304: take the sample data as the input of the second BERT model and train the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, obtaining the criminal name prediction model.
In the specific implementation of step S304, a 768-dimensional vector is taken from the encoding vector Enc of the second BERT model and connected, through a 768-dimensional fully connected layer, to the number of criminal name categories to be predicted, and the second BERT model is trained with a weighted sigmoid cross-entropy loss function as the second loss function. The specific training process is shown in steps A1-A3. It should be noted that the dimensions of the vector and the fully connected layer mentioned above include, but are not limited to, 768.
A1: input the sample data into the second BERT model to obtain a criminal name prediction result.
A2: calculate the error between the criminal name prediction result and the criminal name label using the second loss function.
A3: if the error is smaller than a threshold, build the criminal name prediction model from the current model parameters of the second BERT model; if the error is larger than the threshold, adjust the model parameters of the second BERT model based on the error and continue training it on the sample data and the criminal name labels until the error falls below the threshold, then take the trained second BERT model as the criminal name prediction model.
It should be noted that the criminal name labels are obtained from the decision sections of the published judicial documents.
It should be noted that steps A1-A3 above are only illustrative.
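Steps A1-A3 can be sketched in PyTorch as follows. The use of the first position of Enc as the pooled vector, the charge count, the plain BCEWithLogitsLoss (the weighted variant would pass a pos_weight tensor), and the threshold value stand in for details the patent leaves open, so treat this as an illustration of the sigmoid cross-entropy fine-tuning rather than a definitive implementation.

```python
import torch
import torch.nn as nn

NUM_CHARGES = 202   # illustrative; roughly the charge inventory of CAIL2018

class ChargePredictor(nn.Module):
    """Second BERT model: warm-started encoder plus a classification head."""
    def __init__(self, encoder: nn.Module, hidden: int = 768):
        super().__init__()
        self.encoder = encoder                      # initialized from the first BERT
        self.head = nn.Linear(hidden, NUM_CHARGES)  # 768-dim fully connected layer

    def forward(self, x):
        enc = self.encoder(x)            # Enc: (batch, seq_len, 768)
        pooled = enc[:, 0]               # one 768-dim vector taken from Enc
        return self.head(pooled)         # logits over criminal name categories

# Sigmoid cross entropy; pos_weight=... would give the weighted variant.
loss_fn = nn.BCEWithLogitsLoss()

def train_step(model, optimizer, batch, charge_labels, threshold=0.05):
    logits = model(batch)                  # A1: predict criminal names
    loss = loss_fn(logits, charge_labels)  # A2: error vs. the charge labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                       # A3: adjust parameters ...
    return loss.item() < threshold        # ... until the error is below threshold
```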
It should be noted that training a neural network model requires initial parameters. A conventional neural network model generally uses random initial parameters drawn from a normal distribution with mean 0 and small variance, and this conventional initialization performs poorly for text-element prediction. In the embodiment of the invention, the first BERT model is trained in advance until convergence, and when the second BERT model is trained, its parameters are initialized from the trained first BERT model's parameters, which provides the second BERT model with ample prior information about the legal domain and effectively improves the element-prediction accuracy of the element analysis model.
In the embodiment of the invention, the first BERT model is trained on sample data until convergence, the converged model parameters of the first BERT model are used as the initialization model parameters of the second BERT model, and the second BERT model is trained on the sample data until convergence to obtain the criminal name prediction model. Using the criminal name prediction model to predict the criminal names of judicial documents removes the need to comb through cases one by one manually before judging, effectively reducing labor and time costs and improving the accuracy and efficiency of judgment.
The process of training the first BERT model, disclosed in step S302 of fig. 3 above, is shown in fig. 4, a flowchart of training the first BERT model provided by an embodiment of the present invention, and includes the following steps:
Step S401: take the first training data as the input of the first BERT model to obtain the character prediction result for each character replacement position and the sentence prediction result for each sentence splicing position.
It should be noted that, for the process of obtaining the first training data, reference is made to the content corresponding to step S301 disclosed in fig. 3 in the embodiment of the present invention, and details are not repeated here.
Step S402: calculate, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculate, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result.
In the specific implementation of step S402, the first 768-dimensional vector is taken from the encoding vector Enc and connected, through a 768-dimensional fully connected layer, to the first sub-loss function and the second sub-loss function. It should be noted that the dimensions of the vector and the fully connected layer mentioned above include, but are not limited to, 768.
It should be noted that the first sub-loss function includes, but is not limited to, a multi-class softmax cross-entropy loss function, and the second sub-loss function includes, but is not limited to, a two-class softmax cross-entropy loss function.
Step S403: train the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges.
In the specific implementation of step S403, the actual characters and actual sentences are taken from the sample data; that is, the actual character at each character replacement position and the actual sentence at each sentence splicing position can be obtained from the sample data. If the character error and the sentence error are both smaller than a threshold, the first BERT model has converged. If either error is larger than the threshold, the model parameters of the first BERT model are adjusted based on the character error and the sentence error, and training on the first training data continues until both errors are smaller than the threshold.
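The two sub-losses of this pretraining stage can be sketched as follows; combining them by simple addition is an assumption, since the patent only states that both errors drive training until both fall below the threshold.

```python
import torch.nn as nn

mlm_loss = nn.CrossEntropyLoss()   # first sub-loss: multi-class softmax CE
nsp_loss = nn.CrossEntropyLoss()   # second sub-loss: two-class softmax CE

def pretraining_errors(char_logits, char_targets, pair_logits, pair_targets):
    """char_logits: (num_masked, vocab) predictions at replaced positions;
    char_targets: (num_masked,) actual characters from the sample data;
    pair_logits: (batch, 2) is-next-sentence predictions;
    pair_targets: (batch,) actual pairing labels from the sample data."""
    text_error = mlm_loss(char_logits, char_targets)       # character error
    sentence_error = nsp_loss(pair_logits, pair_targets)   # sentence error
    # Both errors drive the parameter update; training stops once both
    # fall below the convergence threshold (step S403).
    return text_error + sentence_error
```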
In the embodiment of the invention, before the criminal name prediction model is obtained, the first BERT model is trained to convergence on the first training data using the first and second sub-loss functions; the converged model parameters of the first BERT model serve as the initialization model parameters of the second BERT model; and the second BERT model is then trained to convergence on the training data to obtain the criminal name prediction model. This improves the accuracy with which the criminal name prediction model predicts criminal names.
Corresponding to the criminal name prediction method disclosed in the embodiments of the present invention above, and referring to fig. 5, an embodiment of the present invention further provides a schematic structural diagram of a criminal name prediction system, the system including: an acquiring unit 501, a processing unit 502, and a prediction unit 503.
The acquiring unit 501 is configured to acquire a first judicial document whose criminal name is to be predicted.
The processing unit 502 is configured to truncate or pad the first judicial document to obtain a second judicial document.
In a specific implementation, the processing unit 502 is specifically configured to: if the word count of the first judicial document is less than a preset word count, add n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document;
if the word count of the first judicial document equals the preset word count, take the first judicial document as the second judicial document;
if the word count of the first judicial document exceeds the preset word count, extract the first x words and the last y words of the first judicial document and take the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
For the processing of the first judicial document, refer to the content corresponding to step S202 disclosed in fig. 2 of the embodiment of the present invention.
The prediction unit 503 is configured to input the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model. For the acquisition and processing of the sample data, refer to the content corresponding to step S203 disclosed in fig. 2 in the embodiment of the present invention.
In the embodiment of the invention, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name needs to be predicted is truncated or padded, filtering out the redundant witness-testimony portions; and the truncated or padded judicial document is used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
Referring to fig. 6, a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention, when the language model is a BERT model the prediction unit 503 includes: a processing module 5031, a first training module 5032, a setting module 5033, and a second training module 5034.
The processing module 5031 is configured to perform character replacement and sentence splicing on the sample data to obtain first training data, where the sample data is obtained by truncating published judicial documents.
In a specific implementation, the processing module 5031 is specifically configured to randomly replace characters in the sample data with a preset character and to randomly splice a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence. For details, refer to the content corresponding to step S301 disclosed in fig. 3 of the above embodiment of the present invention.
The first training module 5032 is configured to take the first training data as the input of a first BERT model and train the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges.
The setting module 5033 is configured to take the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model.
The second training module 5034 is configured to take the sample data as the input of the second BERT model and train the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, obtaining the criminal name prediction model, where the criminal name labels are obtained from the decision sections of the published judicial documents. For the specific process of training the second BERT model, refer to the content corresponding to step S304 disclosed in fig. 3 of the above embodiment of the present invention.
In the embodiment of the invention, the first BERT model is trained on sample data until convergence, the converged model parameters of the first BERT model are used as the initialization model parameters of the second BERT model, and the second BERT model is trained on the sample data until convergence to obtain the criminal name prediction model. Using the criminal name prediction model to predict the criminal names of judicial documents removes the need to comb through cases one by one manually before judging, effectively reducing labor and time costs and improving the accuracy and efficiency of judgment.
Preferably, referring to fig. 7 in conjunction with fig. 6, a schematic structural diagram of a criminal name prediction system provided by an embodiment of the present invention, the first training module 5032 includes: a prediction submodule 50321, an error submodule 50322, and a training submodule 50323.
The prediction submodule 50321 is configured to take the first training data as the input of the first BERT model to obtain the character prediction result for each character replacement position and the sentence prediction result for each sentence splicing position.
The error submodule 50322 is configured to calculate, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and to calculate, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result.
The training submodule 50323 is configured to train the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges. For the process of training the first BERT model, refer to the content corresponding to step S403 disclosed in fig. 4 of the embodiment of the present invention.
It should be noted that the actual characters and sentences are taken from the sample data.
In the embodiment of the invention, before the criminal name prediction model is obtained, the first BERT model is trained to convergence on the first training data using the first and second sub-loss functions; the converged model parameters of the first BERT model serve as the initialization model parameters of the second BERT model; and the second BERT model is then trained to convergence on the training data to obtain the criminal name prediction model. This improves the accuracy with which the criminal name prediction model predicts criminal names.
In the criminal name prediction system disclosed in the embodiments of the present invention, the above modules may be realized by a hardware device consisting of a processor and a memory. Specifically, the modules are stored in the memory as program units, and the processor executes the program units stored in the memory to realize criminal name prediction.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and criminal name prediction is realized by adjusting the kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
Further, an embodiment of the present invention provides a processor configured to run a program, where the program, when run, performs the criminal name prediction method.
Further, an embodiment of the present invention provides a criminal name prediction device, including a processor, a memory, and a program stored in the memory and executable on the processor, the processor implementing the following steps when executing the program: acquiring a first judicial document whose criminal name needs to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
When the language model is a BERT model, the process of training the language model on sample data to obtain the criminal name prediction model includes: performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents; taking the first training data as the input of a first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges; taking the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and taking the sample data as the input of the second BERT model and training the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Taking the first training data as the input of the first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges comprises: taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position; calculating, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculating, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and training the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges; wherein the actual characters and sentences are taken from the sample data.
Truncating or padding the first judicial document to obtain the second judicial document comprises: if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document; if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and if the word count of the first judicial document exceeds the preset word count, extracting the first x words and the last y words of the first judicial document and taking the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
Performing character replacement and sentence splicing on the sample data to obtain the first training data includes: randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence.
Further, an embodiment of the present invention provides a storage medium having a program stored thereon, the program, when executed by a processor, implementing criminal name prediction.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initializing the following method steps: acquiring a first judicial document whose criminal name needs to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
When the language model is a BERT model, the process of training the language model on sample data to obtain the criminal name prediction model includes: performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents; taking the first training data as the input of a first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges; taking the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and taking the sample data as the input of the second BERT model and training the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Taking the first training data as the input of the first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges comprises: taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position; calculating, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculating, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and training the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges; wherein the actual characters and sentences are taken from the sample data.
Truncating or padding the first judicial document to obtain the second judicial document comprises: if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document; if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and if the word count of the first judicial document exceeds the preset word count, extracting the first x words and the last y words of the first judicial document and taking the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
Performing character replacement and sentence splicing on the sample data to obtain the first training data includes: randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence.
In summary, the embodiments of the present invention provide a criminal name prediction method and system, the method comprising: obtaining a first judicial document whose criminal name is to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data. In this scheme, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name needs to be predicted is truncated or padded, filtering out the redundant witness-testimony portions; and the truncated or padded judicial document is used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, client, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively simply; for related points, reference may be made to the description of the method embodiments. The system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
Those skilled in the art will further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A criminal name prediction method, characterized in that the method comprises:
obtaining a first judicial document whose criminal name needs to be predicted;
intercepting or supplementing the first judicial document to obtain a second judicial document; and
inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain a criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model based on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element parsing model.

2. The method according to claim 1, characterized in that, when the language model is a BERT model, the process of obtaining the criminal name prediction model by training the language model based on sample data comprises:
performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by intercepting published judicial documents;
taking the first training data as the input of a first BERT model and, in combination with a preset first loss function and the sample data, training the first BERT model until the first BERT model converges;
taking the model parameters of the converged first BERT model as the initialization model parameters of a second BERT model; and
taking the sample data as the input of the second BERT model and, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, training the second BERT model until the second BERT model converges, to obtain the criminal name prediction model, wherein the criminal name labels are obtained from the judgment sections of the published judicial documents.

3. The method according to claim 2, characterized in that taking the first training data as the input of the first BERT model and training the first BERT model in combination with the preset first loss function and the sample data until the first BERT model converges comprises:
taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
using a first sub-loss function to calculate the character error between the actual character at each character replacement position and the corresponding character prediction result, and using a second sub-loss function to calculate the sentence error between the actual sentence at each sentence splicing position and the corresponding sentence prediction result; and
training the first BERT model based on the character errors and sentence errors in combination with the first training data until the first BERT model converges,
wherein the actual characters and actual sentences are derived from the sample data.

4. The method according to claim 1, characterized in that intercepting or supplementing the first judicial document to obtain the second judicial document comprises:
if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document;
if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and
if the word count of the first judicial document exceeds the preset word count, intercepting the first x characters and the last y characters of the first judicial document and taking the obtained x and y characters as the second judicial document, where the sum of x and y equals the preset word count.

5. The method according to claim 2, characterized in that performing character replacement and sentence splicing on the sample data to obtain the first training data comprises:
randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, wherein the second sentence either is or is not the next sentence corresponding to the first sentence.

6. A criminal name prediction system, characterized in that the system comprises:
an obtaining unit, configured to obtain a first judicial document whose criminal name needs to be predicted;
a processing unit, configured to intercept or supplement the first judicial document to obtain a second judicial document; and
a prediction unit, configured to input the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain a criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model based on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element parsing model.

7. The system according to claim 6, characterized in that, when the language model is a BERT model, the prediction unit comprises:
a processing module, configured to perform character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by intercepting published judicial documents;
a first training module, configured to take the first training data as the input of a first BERT model and, in combination with a preset first loss function and the sample data, train the first BERT model until the first BERT model converges;
a setting module, configured to take the model parameters of the converged first BERT model as the initialization model parameters of a second BERT model; and
a second training module, configured to take the sample data as the input of the second BERT model and, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, train the second BERT model until the second BERT model converges, to obtain the criminal name prediction model, wherein the criminal name labels are obtained from the judgment sections of the published judicial documents.

8. The system according to claim 7, characterized in that the first training module comprises:
a prediction sub-module, configured to take the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
an error sub-module, configured to use a first sub-loss function to calculate the character error between the actual character at each character replacement position and the corresponding character prediction result, and to use a second sub-loss function to calculate the sentence error between the actual sentence at each sentence splicing position and the corresponding sentence prediction result; and
a training sub-module, configured to train the first BERT model based on the character errors and sentence errors in combination with the first training data until the first BERT model converges,
wherein the actual characters and actual sentences are derived from the sample data.

9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the criminal name prediction method according to any one of claims 1-5.

10. A criminal name prediction device, characterized by comprising a storage medium and a processor, wherein the storage medium stores a program and the processor is configured to run the program, and wherein the program, when running, executes the criminal name prediction method according to any one of claims 1-5.
CN201910695855.3A 2019-07-30 2019-07-30 Criminal name prediction method and system Pending CN112396201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910695855.3A CN112396201A (en) 2019-07-30 2019-07-30 Criminal name prediction method and system

Publications (1)

Publication Number Publication Date
CN112396201A true CN112396201A (en) 2021-02-23

Family

ID=74601146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910695855.3A Pending CN112396201A (en) 2019-07-30 2019-07-30 Criminal name prediction method and system

Country Status (1)

Country Link
CN (1) CN112396201A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010092102A (en) * 2008-10-03 2010-04-22 Koji Ishibashi Information presentation method, information presentation program, computer readable recording medium, and information presentation device
CN106897268A (en) * 2017-02-28 2017-06-27 科大讯飞股份有限公司 Text semantic understanding method, device and system
CN109213864A (en) * 2018-08-30 2019-01-15 广州慧睿思通信息科技有限公司 Criminal case anticipation system and its building and pre-judging method based on deep learning
CN109308355A (en) * 2018-09-17 2019-02-05 清华大学 Method and device for predicting legal judgment result
CN109376227A (en) * 2018-10-29 2019-02-22 山东大学 A Sentence Prediction Method Based on Multi-task Artificial Neural Network
CN109376964A * 2018-12-10 2019-02-22 杭州世平信息科技有限公司 A criminal case charge prediction method based on memory neural networks
CN109597889A * 2018-11-19 2019-04-09 刘品新 A method and system for crime determination based on text classification and deep neural networks
CN109871451A * 2019-01-25 2019-06-11 中译语通科技股份有限公司 A relation extraction method and system incorporating dynamic word vectors
CN109933789A (en) * 2019-02-27 2019-06-25 中国地质大学(武汉) A method and system for relation extraction in judicial field based on neural network
CN109949185A (en) * 2019-03-15 2019-06-28 南京邮电大学 Judicial case discrimination system and method based on event tree analysis
CN109992782A (en) * 2019-04-02 2019-07-09 深圳市华云中盛科技有限公司 Legal documents name entity recognition method, device and computer equipment

Similar Documents

Publication Publication Date Title
CN110287477B (en) Entity emotion analysis method and related device
CN110276066B (en) Entity association relation analysis method and related device
JP2023539532A (en) Text classification model training method, text classification method, device, equipment, storage medium and computer program
US11972759B2 (en) Audio mistranscription mitigation
US11042710B2 (en) User-friendly explanation production using generative adversarial networks
CN110929524A (en) Data screening method, device, equipment and computer readable storage medium
US20230297784A1 (en) Automated decision modelling from text
CN112395861A (en) Method and device for correcting Chinese text and computer equipment
CN116324929A (en) Answer span correction
US20220358594A1 (en) Counterfactual e-net learning for contextual enhanced earnings call analysis
US20220398397A1 (en) Automatic rule prediction and generation for document classification and validation
AU2019246890B2 (en) Systems and methods for validating domain specific models
CN116702765A (en) Event extraction method and device and electronic equipment
US20210312831A1 (en) Methods and systems for assisting pronunciation correction
CN114647981B (en) Data processing method, device, storage medium and program product
US11099107B2 (en) Component testing plan considering distinguishable and undistinguishable components
CN118378299A (en) Document desensitizing method, device, computer program product and electronic equipment
CN112396201A (en) Criminal name prediction method and system
CN110969549B (en) Judicial data processing method and system
CN117764043A (en) Visual encoder training and describing method, device, equipment and medium
US12093298B2 (en) Apparatus and method for training model for document summarization
US20230147585A1 (en) Dynamically enhancing supervised learning
US20220284280A1 (en) Data labeling for synthetic data generation
CN112329436A (en) Legal document element analysis method and system
CN114254588A (en) Data tag processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210223