CN112396201A - Criminal name prediction method and system


Info

Publication number
CN112396201A
CN112396201A (application CN201910695855.3A)
Authority
CN
China
Prior art keywords
model
sentence
judicial
training
sample data
Prior art date
Legal status
Pending
Application number
CN201910695855.3A
Other languages
Chinese (zh)
Inventor
戴威
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd filed Critical Beijing Gridsum Technology Co Ltd
Priority to CN201910695855.3A
Publication of CN112396201A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 Services
    • G06Q 50/18 Legal services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract


The present invention provides a criminal name prediction method and system. The method includes: obtaining a first judicial document whose criminal name needs to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data. In this scheme, the criminal name prediction model is obtained by pre-training a language model on a large number of judicial documents; the judicial document whose criminal name is to be predicted is truncated or padded, filtering out the redundant witness-testimony portions. The truncated or padded judicial document is then used as the input of the criminal name prediction model, which returns the criminal name prediction result, thereby saving labor and time costs and improving the accuracy and efficiency of judgment.


Description

Criminal name prediction method and system
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method and a system for predicting a criminal name.
Background
Law is one of the products of the development of civilized society. A law is, in general, a rule of conduct enacted by the state legislature and recognized by society, binding on all members of society, guaranteed by state enforcement, and defining the rights and obligations of the parties. When disputes arise among members of society, the judicial authorities hear and adjudicate them according to law.
At present, legal judgments are usually made manually: a person reads the description of the case and, referring to the relevant laws and regulations, renders a decision. However, because the laws of each country define a large number of criminal names, combing through cases one by one manually and then judging them carries high time and labor costs. Moreover, owing to the diversity of language, the same conviction element is often described and expressed in many different ways, which affects the accuracy and efficiency of judgment.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for predicting a criminal name, so as to solve the problems of high labor cost, high time cost, low accuracy, low efficiency, and the like in the existing manual criminal name judgment.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
The first aspect of the embodiments of the invention discloses a criminal name prediction method, which comprises the following steps:
acquiring a first judicial document whose criminal name needs to be predicted;
truncating or padding the first judicial document to obtain a second judicial document; and
inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
Preferably, when the language model is a BERT model, the process of training the language model on sample data to obtain the criminal name prediction model includes:
performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents;
taking the first training data as the input of a first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges;
taking the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and
taking the sample data as the input of the second BERT model and training the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Preferably, taking the first training data as the input of the first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges includes:
taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
calculating, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculating, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and
training the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges;
wherein the actual characters and sentences are taken from the sample data.
Preferably, truncating or padding the first judicial document to obtain the second judicial document includes:
if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document;
if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and
if the word count of the first judicial document exceeds the preset word count, extracting the first x words and the last y words of the first judicial document and taking the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
Preferably, performing character replacement and sentence splicing on the sample data to obtain the first training data includes:
randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence.
The second aspect of the embodiments of the invention discloses a criminal name prediction system, which comprises:
an acquiring unit, configured to acquire a first judicial document whose criminal name needs to be predicted;
a processing unit, configured to truncate or pad the first judicial document to obtain a second judicial document; and
a prediction unit, configured to input the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
Preferably, when the language model is a BERT model, the prediction unit includes:
a processing module, configured to perform character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents;
a first training module, configured to take the first training data as the input of a first BERT model and train the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges;
a setting module, configured to take the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and
a second training module, configured to take the sample data as the input of the second BERT model and train the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Preferably, the first training module comprises:
a prediction submodule, configured to take the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
an error submodule, configured to calculate, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and to calculate, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and
a training submodule, configured to train the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges;
wherein the actual characters and sentences are taken from the sample data.
The third aspect of the embodiments of the present invention discloses a storage medium, the storage medium including a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the criminal name prediction method disclosed in the first aspect of the embodiments of the present invention.
The fourth aspect of the embodiments of the present invention discloses a criminal name prediction device, including a storage medium and a processor, wherein the storage medium stores a program and the processor is configured to run the program, the program, when running, performing the criminal name prediction method disclosed in the first aspect of the embodiments of the present invention.
The criminal name prediction method and system provided by the embodiments of the invention proceed as follows: a first judicial document whose criminal name is to be predicted is obtained; the first judicial document is truncated or padded to obtain a second judicial document; and the second judicial document is input into a pre-established criminal name prediction model to obtain the criminal name prediction result corresponding to the first judicial document, the criminal name prediction model having been obtained by training a language model on sample data. In this scheme, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name is to be predicted is truncated or padded, filtering out the redundant witness-testimony portions. The truncated or padded judicial document is then used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a Transformer according to an embodiment of the present invention;
fig. 2 is a flowchart of a criminal name prediction method according to an embodiment of the present invention;
FIG. 3 is a flow chart of obtaining a criminal name prediction model according to an embodiment of the present invention;
FIG. 4 is a flowchart of training a first BERT model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As noted in the background, legal judgments are currently made manually, according to the description of the case and with reference to the relevant laws and regulations. However, because the laws of each country define a large number of criminal names, combing through cases one by one manually and then judging them carries high time and labor costs. Moreover, owing to the diversity of language, the same conviction element is often described and expressed in many different ways, which affects the accuracy and efficiency of judgment.
Therefore, the embodiments of the invention provide a criminal name prediction method and system: a language model is pre-trained on a large number of judicial documents to obtain a criminal name prediction model, and the judicial document whose criminal name needs to be predicted is used as the input of the criminal name prediction model to obtain the criminal name prediction result, which saves labor and time costs and improves the accuracy and efficiency of judgment.
It should be noted that the Bidirectional Encoder Representations from Transformers (BERT) model involved in the embodiments of the present invention is a language model proposed by Google with strong abstraction capability for text in the field of natural language processing. The BERT model has a 12-layer Transformer structure. Concretely: the input text is split into characters and fed to an embedding layer, which maps each character to a 768-dimensional vector using the word-vector weights provided by Google; the 12-layer Transformer structure then produces the encoding vector Enc.
Referring to fig. 1, a schematic structural diagram of a Transformer layer: each layer includes multi-head attention (Multihead Attention), a residual unit (Residual Unit), layer normalization (Layer Norm), and a two-layer fully connected feed-forward network (FFN).
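As a concrete illustration of this layer structure, the following is a minimal PyTorch sketch of one encoder layer; the patent provides no code, so the module names are illustrative, and the 768-dimensional hidden size, 12 heads, and 3072-dimensional FFN are assumptions following the standard BERT-Base configuration.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer encoder layer as in fig. 1: multi-head attention,
    residual units, layer normalization, and a two-layer FFN."""
    def __init__(self, hidden=768, heads=12, ffn_dim=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.ffn = nn.Sequential(            # two-layer fully connected block
            nn.Linear(hidden, ffn_dim),
            nn.GELU(),
            nn.Linear(ffn_dim, hidden),
        )
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):                    # x: (batch, seq_len, 768)
        attn_out, _ = self.attn(x, x, x)     # multi-head self-attention
        x = self.norm1(x + attn_out)         # residual unit + Layer Norm
        x = self.norm2(x + self.ffn(x))      # residual unit + Layer Norm
        return x

# Stacking 12 such layers on top of a 768-dimensional embedding layer
# yields the encoding vector Enc described above.
encoder = nn.Sequential(*[EncoderLayer() for _ in range(12)])
enc = encoder(torch.randn(1, 600, 768))      # Enc: (1, 600, 768)
```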
Referring to fig. 2, which shows the flow of a criminal name prediction method provided by an embodiment of the present invention, the method includes the following steps:
Step S201: obtain a first judicial document whose criminal name is to be predicted.
Step S202: truncate or pad the first judicial document to obtain a second judicial document.
It should be noted that, in step S202, the word count of the first judicial document may be adjusted by truncation or padding to meet the word-count requirement that the criminal name prediction model imposes on judicial documents. The fact-description paragraphs of different judicial documents vary in length: the witness-testimony passages run to tens of thousands of characters in some documents and to only a few dozen characters in others. When the testimony runs to tens of thousands of characters, most of it is redundant for criminal name prediction, so the redundant testimony must be truncated. When a passage contains only a few dozen characters, it must be padded, because the input texts of the criminal name prediction model need a consistent length; characters are added until the word count meets the model's requirement.
In the specific implementation of step S202, if the word count of the first judicial document is less than the preset word count, n blank characters are added to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document.
For example, suppose the preset word count is 600 words. If the first judicial document has fewer than 600 words, blank characters are appended to it until it reaches 600 words, yielding the second judicial document.
If the word count of the first judicial document equals the preset word count, the first judicial document is taken as the second judicial document.
If the word count of the first judicial document exceeds the preset word count, the first x words and the last y words of the first judicial document are extracted, and the resulting x words and y words are taken as the second judicial document, where the sum of x and y equals the preset word count.
For example, again supposing a preset word count of 600 words, if the first judicial document has more than 600 words, its first 550 words and last 50 words are extracted and together taken as the second judicial document.
Step S203: and inputting the second judicial literature into a pre-established criminal name prediction model to carry out criminal name prediction, so as to obtain a criminal name prediction result corresponding to the first judicial literature.
In the process of specifically implementing step S203, a training set text is acquired from the CAIL2018 data set, and the training set text is processed in the manner of intercepting the text in step S202, so as to obtain sample data. And training a language model based on the sample data to obtain the criminal name prediction model.
It should be noted that the language model is used for pre-training according to a preset number of legal texts to determine the initialization model parameters of the element analysis model. The types of language models include, but are not limited to: ELMo model, GPT model, and BERT model.
It should be noted that the CAIL2018 data set is constructed from judgment documents published on China Judgements Online and contains about 1.7 million training samples and 200,000 test samples. It is constructed as follows: the content related to the criminal name is deleted from the fact-description segment of each judgment document, and the fact-description segment is used as a training-set text; the criminal name adjudicated in the decision section of the judgment document is used as the criminal name label.
It should be noted that the content related to the criminal name is deleted from the fact-description segment because the segment serves as training data for the criminal name prediction model: if the criminal name appeared in the fact description, the model could read it off directly instead of inferring it from the facts, which would defeat the purpose of training the criminal name prediction model.
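To make this construction concrete, here is a hypothetical sketch; the record field names (fact_description, adjudicated_charge) and the sentence-level deletion heuristic are assumptions, since the patent only states that charge-related content is removed from the fact description. It reuses normalize_length from the sketch above.

```python
import re

def build_sample(judgment: dict) -> tuple[str, str]:
    """Turn one published judgment into a (fact text, charge label) pair."""
    facts = judgment["fact_description"]     # hypothetical field name
    label = judgment["adjudicated_charge"]   # taken from the decision section
    # Drop sentences that mention the charge itself, so the model must
    # infer the charge from the facts instead of reading it off the text.
    sentences = re.split(r"(?<=[。；])", facts)
    facts = "".join(s for s in sentences if label not in s)
    return normalize_length(facts), label
```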
In the embodiment of the invention, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name needs to be predicted is truncated or padded, filtering out the redundant witness-testimony portions; and the truncated or padded judicial document is used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
In the above embodiment of the present invention, when the language model is a BERT model, the process in step S203 of fig. 2 of training the language model on sample data to obtain the criminal name prediction model is shown in fig. 3, a flowchart of obtaining the criminal name prediction model provided by an embodiment of the present invention, and includes the following steps:
step S301: and performing character replacement and sentence splicing on the sample data to obtain first training data.
In the process of specifically implementing step S301, the sample data is obtained by performing interception processing on the published judicial literature, and the specific way of the interception processing is referred to the corresponding content in step S202 disclosed in fig. 2 in the above embodiment of the present invention, which is not described herein again.
When carrying out character replacement and sentence splicing processing, randomly selecting characters from the sample data, replacing the characters with preset characters, and randomly splicing a second sentence for a first sentence in the sample data, wherein the second sentence is a next sentence corresponding to the first sentence or is not the next sentence corresponding to the first sentence. Such as: randomly replacing the characters in the sample with "[ MASK ]". And selecting the sentence needing sentence splicing, wherein 50% of the probability is the next sentence corresponding to the sentence splicing, and 50% of the probability is the sentence splicing other sentences.
It should be noted that the random character replacement and sentence splicing described above are only illustrative; a technician may instead specify exactly which characters to replace and which sentences to splice. Likewise, characters could be replaced at fixed intervals of a preset number of characters, and a sentence spliced after every preset number of sentences; the embodiment of the present invention does not limit this.
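A minimal sketch of this corruption step follows; the 15% masking rate is an assumption borrowed from standard BERT practice (the patent does not fix a rate), and for simplicity the "other sentence" branch may occasionally pick the true next sentence.

```python
import random

MASK = "[MASK]"

def mask_characters(chars: list[str], rate: float = 0.15):
    """Randomly replace characters with the preset [MASK] token; return the
    corrupted sequence and {position: actual character} prediction targets."""
    out, targets = list(chars), {}
    for i, c in enumerate(chars):
        if random.random() < rate:
            targets[i] = c                   # actual character = MLM label
            out[i] = MASK
    return out, targets

def make_sentence_pair(sentences: list[str], i: int):
    """With 50% probability pair sentence i with its actual next sentence
    (label True), otherwise with a randomly chosen sentence (label False)."""
    if random.random() < 0.5 and i + 1 < len(sentences):
        return sentences[i], sentences[i + 1], True
    return sentences[i], random.choice(sentences), False
```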
Step S302: take the first training data as the input of a first BERT model and train the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges.
In the specific implementation of step S302, the first training data is fed to the first BERT model, which predicts the characters at the replaced positions and the sentences at the spliced positions; the errors between the predictions and the actual results are used to train the first BERT model's judgment of characters and sentences. For example: for a complete sentence, a character is randomly replaced with the preset character, and the first BERT model is trained to judge what the actual character at that position is; for a passage composed of several sentences, sentence splicing is applied to one of them, and the first BERT model is trained to judge what the actual sentence at the spliced position is.
Step S303: take the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model.
In the specific implementation of step S303, the converged parameters of the embedding layer and the 12-layer Transformer structure of the first BERT model are used as the initialization parameters of the embedding layer and the 12-layer Transformer structure of the second BERT model.
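This parameter hand-over can be sketched as follows, assuming first_bert and second_bert are PyTorch modules whose embedding layer and Transformer layers share parameter names; this is an illustration under those assumptions, not the patent's implementation.

```python
import torch.nn as nn

def warm_start(first_bert: nn.Module, second_bert: nn.Module) -> None:
    """Initialize the second BERT model from the converged first BERT model."""
    # strict=False copies every parameter whose name matches (embedding layer
    # + 12 Transformer layers) and leaves the second model's classification
    # head at its random initialization, giving it a legal-domain prior.
    second_bert.load_state_dict(first_bert.state_dict(), strict=False)
```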
Step S304: take the sample data as the input of the second BERT model and train the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, obtaining the criminal name prediction model.
In the specific implementation of step S304, a 768-dimensional vector is taken from the encoding vector Enc of the second BERT model and connected, through a 768-dimensional fully connected layer, to the number of criminal name categories to be predicted, and the second BERT model is trained with a weighted sigmoid cross-entropy loss function as the second loss function. The specific training process is shown in steps A1-A3. It should be noted that the dimensions of the vector and the fully connected layer mentioned above include, but are not limited to, 768.
A1: input the sample data into the second BERT model to obtain a criminal name prediction result.
A2: calculate the error between the criminal name prediction result and the criminal name label using the second loss function.
A3: if the error is smaller than a threshold, build the criminal name prediction model from the current model parameters of the second BERT model; if the error is larger than the threshold, adjust the model parameters of the second BERT model based on the error and continue training it on the sample data and the criminal name labels until the error falls below the threshold, then take the trained second BERT model as the criminal name prediction model.
It should be noted that the criminal name labels are obtained from the decision sections of the published judicial documents.
It should be noted that steps A1-A3 above are only illustrative.
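Steps A1-A3 can be sketched in PyTorch as follows. The use of the first position of Enc as the pooled vector, the charge count, the plain BCEWithLogitsLoss (the weighted variant would pass a pos_weight tensor), and the threshold value stand in for details the patent leaves open, so treat this as an illustration of the sigmoid cross-entropy fine-tuning rather than a definitive implementation.

```python
import torch
import torch.nn as nn

NUM_CHARGES = 202   # illustrative; roughly the charge inventory of CAIL2018

class ChargePredictor(nn.Module):
    """Second BERT model: warm-started encoder plus a classification head."""
    def __init__(self, encoder: nn.Module, hidden: int = 768):
        super().__init__()
        self.encoder = encoder                      # initialized from the first BERT
        self.head = nn.Linear(hidden, NUM_CHARGES)  # 768-dim fully connected layer

    def forward(self, x):
        enc = self.encoder(x)            # Enc: (batch, seq_len, 768)
        pooled = enc[:, 0]               # one 768-dim vector taken from Enc
        return self.head(pooled)         # logits over criminal name categories

# Sigmoid cross entropy; pos_weight=... would give the weighted variant.
loss_fn = nn.BCEWithLogitsLoss()

def train_step(model, optimizer, batch, charge_labels, threshold=0.05):
    logits = model(batch)                  # A1: predict criminal names
    loss = loss_fn(logits, charge_labels)  # A2: error vs. the charge labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                       # A3: adjust parameters ...
    return loss.item() < threshold        # ... until the error is below threshold
```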
It should be noted that training a neural network model requires initial parameters. A conventional neural network model generally uses random initial parameters drawn from a normal distribution with mean 0 and small variance, and this conventional initialization performs poorly for text-element prediction. In the embodiment of the invention, the first BERT model is trained in advance until convergence, and when the second BERT model is trained, its parameters are initialized from the trained first BERT model's parameters, which provides the second BERT model with ample prior information about the legal domain and effectively improves the element-prediction accuracy of the element analysis model.
In the embodiment of the invention, the first BERT model is trained on sample data until convergence, the converged model parameters of the first BERT model are used as the initialization model parameters of the second BERT model, and the second BERT model is trained on the sample data until convergence to obtain the criminal name prediction model. Using the criminal name prediction model to predict the criminal names of judicial documents removes the need to comb through cases one by one manually before judging, effectively reducing labor and time costs and improving the accuracy and efficiency of judgment.
The process of training the first BERT model, disclosed in step S302 of fig. 3 above, is shown in fig. 4, a flowchart of training the first BERT model provided by an embodiment of the present invention, and includes the following steps:
Step S401: take the first training data as the input of the first BERT model to obtain the character prediction result for each character replacement position and the sentence prediction result for each sentence splicing position.
It should be noted that, for the process of obtaining the first training data, reference is made to the content corresponding to step S301 disclosed in fig. 3 in the embodiment of the present invention, and details are not repeated here.
Step S402: calculate, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculate, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result.
In the specific implementation of step S402, the first 768-dimensional vector is taken from the encoding vector Enc and connected, through a 768-dimensional fully connected layer, to the first sub-loss function and the second sub-loss function. It should be noted that the dimensions of the vector and the fully connected layer mentioned above include, but are not limited to, 768.
It should be noted that the first sub-loss function includes, but is not limited to, a multi-class softmax cross-entropy loss function, and the second sub-loss function includes, but is not limited to, a two-class softmax cross-entropy loss function.
Step S403: train the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges.
In the specific implementation of step S403, the actual characters and actual sentences are taken from the sample data; that is, the actual character at each character replacement position and the actual sentence at each sentence splicing position can be obtained from the sample data. If the character error and the sentence error are both smaller than a threshold, the first BERT model has converged. If either error is larger than the threshold, the model parameters of the first BERT model are adjusted based on the character error and the sentence error, and training on the first training data continues until both errors are smaller than the threshold.
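The two sub-losses of this pretraining stage can be sketched as follows; combining them by simple addition is an assumption, since the patent only states that both errors drive training until both fall below the threshold.

```python
import torch.nn as nn

mlm_loss = nn.CrossEntropyLoss()   # first sub-loss: multi-class softmax CE
nsp_loss = nn.CrossEntropyLoss()   # second sub-loss: two-class softmax CE

def pretraining_errors(char_logits, char_targets, pair_logits, pair_targets):
    """char_logits: (num_masked, vocab) predictions at replaced positions;
    char_targets: (num_masked,) actual characters from the sample data;
    pair_logits: (batch, 2) is-next-sentence predictions;
    pair_targets: (batch,) actual pairing labels from the sample data."""
    text_error = mlm_loss(char_logits, char_targets)       # character error
    sentence_error = nsp_loss(pair_logits, pair_targets)   # sentence error
    # Both errors drive the parameter update; training stops once both
    # fall below the convergence threshold (step S403).
    return text_error + sentence_error
```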
In the embodiment of the invention, before the criminal name prediction model is obtained, the first BERT model is trained to convergence on the first training data using the first and second sub-loss functions; the converged model parameters of the first BERT model serve as the initialization model parameters of the second BERT model; and the second BERT model is then trained to convergence on the training data to obtain the criminal name prediction model. This improves the accuracy with which the criminal name prediction model predicts criminal names.
Corresponding to the criminal name prediction method disclosed in the embodiments of the present invention above, and referring to fig. 5, an embodiment of the present invention further provides a schematic structural diagram of a criminal name prediction system, the system including: an acquiring unit 501, a processing unit 502, and a prediction unit 503.
The acquiring unit 501 is configured to acquire a first judicial document whose criminal name is to be predicted.
The processing unit 502 is configured to truncate or pad the first judicial document to obtain a second judicial document.
In a specific implementation, the processing unit 502 is specifically configured to: if the word count of the first judicial document is less than a preset word count, add n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document;
if the word count of the first judicial document equals the preset word count, take the first judicial document as the second judicial document;
if the word count of the first judicial document exceeds the preset word count, extract the first x words and the last y words of the first judicial document and take the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
For the processing of the first judicial document, refer to the content corresponding to step S202 disclosed in fig. 2 of the embodiment of the present invention.
The prediction unit 503 is configured to input the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model. For the acquisition and processing of the sample data, refer to the content corresponding to step S203 disclosed in fig. 2 in the embodiment of the present invention.
In the embodiment of the invention, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name needs to be predicted is truncated or padded, filtering out the redundant witness-testimony portions; and the truncated or padded judicial document is used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
Referring to fig. 6, a schematic structural diagram of a criminal name prediction system according to an embodiment of the present invention, when the language model is a BERT model the prediction unit 503 includes: a processing module 5031, a first training module 5032, a setting module 5033, and a second training module 5034.
The processing module 5031 is configured to perform character replacement and sentence splicing on the sample data to obtain first training data, where the sample data is obtained by truncating published judicial documents.
In a specific implementation, the processing module 5031 is specifically configured to randomly replace characters in the sample data with a preset character and to randomly splice a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence. For details, refer to the content corresponding to step S301 disclosed in fig. 3 of the above embodiment of the present invention.
The first training module 5032 is configured to take the first training data as the input of a first BERT model and train the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges.
The setting module 5033 is configured to take the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model.
The second training module 5034 is configured to take the sample data as the input of the second BERT model and train the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, obtaining the criminal name prediction model, where the criminal name labels are obtained from the decision sections of the published judicial documents. For the specific process of training the second BERT model, refer to the content corresponding to step S304 disclosed in fig. 3 of the above embodiment of the present invention.
In the embodiment of the invention, the first BERT model is trained on sample data until convergence, the converged model parameters of the first BERT model are used as the initialization model parameters of the second BERT model, and the second BERT model is trained on the sample data until convergence to obtain the criminal name prediction model. Using the criminal name prediction model to predict the criminal names of judicial documents removes the need to comb through cases one by one manually before judging, effectively reducing labor and time costs and improving the accuracy and efficiency of judgment.
Preferably, referring to fig. 7 in conjunction with fig. 6, a schematic structural diagram of a criminal name prediction system provided by an embodiment of the present invention, the first training module 5032 includes: a prediction submodule 50321, an error submodule 50322, and a training submodule 50323.
The prediction submodule 50321 is configured to take the first training data as the input of the first BERT model to obtain the character prediction result for each character replacement position and the sentence prediction result for each sentence splicing position.
The error submodule 50322 is configured to calculate, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and to calculate, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result.
The training submodule 50323 is configured to train the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges. For the process of training the first BERT model, refer to the content corresponding to step S403 disclosed in fig. 4 of the embodiment of the present invention.
It should be noted that the actual characters and sentences are taken from the sample data.
In the embodiment of the invention, before the criminal name prediction model is obtained, the first BERT model is trained to convergence on the first training data using the first and second sub-loss functions; the converged model parameters of the first BERT model serve as the initialization model parameters of the second BERT model; and the second BERT model is then trained to convergence on the training data to obtain the criminal name prediction model. This improves the accuracy with which the criminal name prediction model predicts criminal names.
In the criminal name prediction system disclosed in the embodiments of the present invention, the above modules may be realized by a hardware device consisting of a processor and a memory. Specifically, the modules are stored in the memory as program units, and the processor executes the program units stored in the memory to realize criminal name prediction.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and criminal name prediction is realized by adjusting the kernel parameters.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
Further, an embodiment of the present invention provides a processor configured to run a program, where the program, when run, performs the criminal name prediction method.
Further, an embodiment of the present invention provides a criminal name prediction device, including a processor, a memory, and a program stored in the memory and executable on the processor, the processor implementing the following steps when executing the program: acquiring a first judicial document whose criminal name needs to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
When the language model is a BERT model, the process of training the language model on sample data to obtain the criminal name prediction model includes: performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents; taking the first training data as the input of a first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges; taking the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and taking the sample data as the input of the second BERT model and training the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Taking the first training data as the input of the first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges comprises: taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position; calculating, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculating, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and training the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges; wherein the actual characters and sentences are taken from the sample data.
Truncating or padding the first judicial document to obtain the second judicial document comprises: if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document; if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and if the word count of the first judicial document exceeds the preset word count, extracting the first x words and the last y words of the first judicial document and taking the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
Performing character replacement and sentence splicing on the sample data to obtain the first training data includes: randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence.
Further, an embodiment of the present invention provides a storage medium having a program stored thereon, the program, when executed by a processor, implementing criminal name prediction.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initializing the following method steps: acquiring a first judicial document whose criminal name needs to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element analysis model.
When the language model is a BERT model, the process of training the language model on sample data to obtain the criminal name prediction model includes: performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by truncating publicly available judicial documents; taking the first training data as the input of a first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges; taking the converged model parameters of the first BERT model as the initialization model parameters of a second BERT model; and taking the sample data as the input of the second BERT model and training the second BERT model, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, until the second BERT model converges, thereby obtaining the criminal name prediction model, wherein the criminal name labels are obtained from the decision sections of the published judicial documents.
Taking the first training data as the input of the first BERT model and training the first BERT model, in combination with a preset first loss function and the sample data, until the first BERT model converges comprises: taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position; calculating, with a first sub-loss function, the character error between the actual character at the character replacement position and the character prediction result, and calculating, with a second sub-loss function, the sentence error between the actual sentence at the sentence splicing position and the sentence prediction result; and training the first BERT model on the first training data, based on the character error and the sentence error, until the first BERT model converges; wherein the actual characters and sentences are taken from the sample data.
Truncating or padding the first judicial document to obtain the second judicial document comprises: if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document; if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and if the word count of the first judicial document exceeds the preset word count, extracting the first x words and the last y words of the first judicial document and taking the resulting x words and y words as the second judicial document, where the sum of x and y equals the preset word count.
Performing character replacement and sentence splicing on the sample data to obtain the first training data includes: randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, where the second sentence either is or is not the actual next sentence of the first sentence.
In summary, the embodiments of the present invention provide a criminal name prediction method and system, the method comprising: obtaining a first judicial document whose criminal name is to be predicted; truncating or padding the first judicial document to obtain a second judicial document; and inputting the second judicial document into a pre-established criminal name prediction model to obtain the criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model on sample data. In this scheme, a language model is pre-trained on a large number of judicial documents to obtain the criminal name prediction model; the judicial document whose criminal name needs to be predicted is truncated or padded, filtering out the redundant witness-testimony portions; and the truncated or padded judicial document is used as the input of the criminal name prediction model to obtain its criminal name prediction result. Cases no longer need to be combed through one by one manually before judging, which saves labor and time costs and improves the accuracy and efficiency of judgment.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, client, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
The embodiments in this specification are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively simply; for related points, reference may be made to the description of the method embodiments. The system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
Those skilled in the art will further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A criminal name prediction method, characterized in that the method comprises:
obtaining a first judicial document whose criminal name needs to be predicted;
intercepting or supplementing the first judicial document to obtain a second judicial document; and
inputting the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain a criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model based on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element parsing model.

2. The method according to claim 1, characterized in that, when the language model is a BERT model, the process of obtaining the criminal name prediction model by training the language model based on sample data comprises:
performing character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by intercepting published judicial documents;
taking the first training data as the input of a first BERT model and, in combination with a preset first loss function and the sample data, training the first BERT model until the first BERT model converges;
taking the model parameters of the converged first BERT model as the initialization model parameters of a second BERT model; and
taking the sample data as the input of the second BERT model and, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, training the second BERT model until the second BERT model converges, to obtain the criminal name prediction model, wherein the criminal name labels are obtained from the judgment sections of the published judicial documents.

3. The method according to claim 2, characterized in that taking the first training data as the input of the first BERT model and training the first BERT model in combination with the preset first loss function and the sample data until the first BERT model converges comprises:
taking the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
using a first sub-loss function to calculate the character error between the actual character at each character replacement position and the corresponding character prediction result, and using a second sub-loss function to calculate the sentence error between the actual sentence at each sentence splicing position and the corresponding sentence prediction result; and
training the first BERT model based on the character errors and sentence errors in combination with the first training data until the first BERT model converges,
wherein the actual characters and actual sentences are derived from the sample data.

4. The method according to claim 1, characterized in that intercepting or supplementing the first judicial document to obtain the second judicial document comprises:
if the word count of the first judicial document is less than a preset word count, adding n blank characters to the first judicial document to obtain the second judicial document, where n is the preset word count minus the word count of the first judicial document;
if the word count of the first judicial document equals the preset word count, taking the first judicial document as the second judicial document; and
if the word count of the first judicial document exceeds the preset word count, intercepting the first x characters and the last y characters of the first judicial document and taking the obtained x and y characters as the second judicial document, where the sum of x and y equals the preset word count.

5. The method according to claim 2, characterized in that performing character replacement and sentence splicing on the sample data to obtain the first training data comprises:
randomly replacing characters in the sample data with a preset character, and randomly splicing a second sentence onto a first sentence in the sample data, wherein the second sentence either is or is not the next sentence corresponding to the first sentence.

6. A criminal name prediction system, characterized in that the system comprises:
an obtaining unit, configured to obtain a first judicial document whose criminal name needs to be predicted;
a processing unit, configured to intercept or supplement the first judicial document to obtain a second judicial document; and
a prediction unit, configured to input the second judicial document into a pre-established criminal name prediction model for criminal name prediction to obtain a criminal name prediction result corresponding to the first judicial document, wherein the criminal name prediction model is obtained by training a language model based on sample data, and the language model is pre-trained on a preset number of legal texts to determine the initialization model parameters of the element parsing model.

7. The system according to claim 6, characterized in that, when the language model is a BERT model, the prediction unit comprises:
a processing module, configured to perform character replacement and sentence splicing on the sample data to obtain first training data, wherein the sample data is obtained by intercepting published judicial documents;
a first training module, configured to take the first training data as the input of a first BERT model and, in combination with a preset first loss function and the sample data, train the first BERT model until the first BERT model converges;
a setting module, configured to take the model parameters of the converged first BERT model as the initialization model parameters of a second BERT model; and
a second training module, configured to take the sample data as the input of the second BERT model and, in combination with a preset second loss function and the criminal name labels corresponding to the sample data, train the second BERT model until the second BERT model converges, to obtain the criminal name prediction model, wherein the criminal name labels are obtained from the judgment sections of the published judicial documents.

8. The system according to claim 7, characterized in that the first training module comprises:
a prediction sub-module, configured to take the first training data as the input of the first BERT model to obtain a character prediction result for each character replacement position and a sentence prediction result for each sentence splicing position;
an error sub-module, configured to use a first sub-loss function to calculate the character error between the actual character at each character replacement position and the corresponding character prediction result, and to use a second sub-loss function to calculate the sentence error between the actual sentence at each sentence splicing position and the corresponding sentence prediction result; and
a training sub-module, configured to train the first BERT model based on the character errors and sentence errors in combination with the first training data until the first BERT model converges,
wherein the actual characters and actual sentences are derived from the sample data.

9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the criminal name prediction method according to any one of claims 1-5.

10. A criminal name prediction device, characterized by comprising a storage medium and a processor, wherein the storage medium stores a program and the processor is configured to run the program, and wherein the program, when running, executes the criminal name prediction method according to any one of claims 1-5.
CN201910695855.3A 2019-07-30 2019-07-30 Criminal name prediction method and system Pending CN112396201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910695855.3A CN112396201A (en) 2019-07-30 2019-07-30 Criminal name prediction method and system

Publications (1)

Publication Number Publication Date
CN112396201A true CN112396201A (en) 2021-02-23

Family

ID=74601146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910695855.3A Pending CN112396201A (en) 2019-07-30 2019-07-30 Criminal name prediction method and system

Country Status (1)

Country Link
CN (1) CN112396201A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010092102A (en) * 2008-10-03 2010-04-22 Koji Ishibashi Information presentation method, information presentation program, computer readable recording medium, and information presentation device
CN106897268A (en) * 2017-02-28 2017-06-27 科大讯飞股份有限公司 Text semantic understanding method, device and system
CN109213864A (en) * 2018-08-30 2019-01-15 广州慧睿思通信息科技有限公司 Criminal case anticipation system and its building and pre-judging method based on deep learning
CN109308355A (en) * 2018-09-17 2019-02-05 清华大学 Method and device for predicting legal judgment result
CN109376227A (en) * 2018-10-29 2019-02-22 山东大学 A Sentence Prediction Method Based on Multi-task Artificial Neural Network
CN109376964A * 2018-12-10 2019-02-22 杭州世平信息科技有限公司 A criminal case charge prediction method based on memory neural networks
CN109597889A * 2018-11-19 2019-04-09 刘品新 A method and system for crime determination based on text classification and deep neural networks
CN109871451A * 2019-01-25 2019-06-11 中译语通科技股份有限公司 A relation extraction method and system incorporating dynamic word vectors
CN109933789A (en) * 2019-02-27 2019-06-25 中国地质大学(武汉) A method and system for relation extraction in judicial field based on neural network
CN109949185A (en) * 2019-03-15 2019-06-28 南京邮电大学 Judicial case discrimination system and method based on event tree analysis
CN109992782A (en) * 2019-04-02 2019-07-09 深圳市华云中盛科技有限公司 Legal documents name entity recognition method, device and computer equipment

Similar Documents

Publication Publication Date Title
CN110287477B (en) Entity emotion analysis method and related device
CN110276066B (en) Entity association relation analysis method and related device
JP2023539532A (en) Text classification model training method, text classification method, device, equipment, storage medium and computer program
US11972759B2 (en) Audio mistranscription mitigation
US11042710B2 (en) User-friendly explanation production using generative adversarial networks
CN110929524A (en) Data screening method, device, equipment and computer readable storage medium
US20230297784A1 (en) Automated decision modelling from text
CN112395861A (en) Method and device for correcting Chinese text and computer equipment
CN116324929A (en) Answer span correction
US20220358594A1 (en) Counterfactual e-net learning for contextual enhanced earnings call analysis
US20220398397A1 (en) Automatic rule prediction and generation for document classification and validation
AU2019246890B2 (en) Systems and methods for validating domain specific models
CN116702765A (en) Event extraction method and device and electronic equipment
US20210312831A1 (en) Methods and systems for assisting pronunciation correction
CN114647981B (en) Data processing method, device, storage medium and program product
US11099107B2 (en) Component testing plan considering distinguishable and undistinguishable components
CN118378299A (en) Document desensitizing method, device, computer program product and electronic equipment
CN112396201A (en) Criminal name prediction method and system
CN110969549B (en) Judicial data processing method and system
CN117764043A (en) Visual encoder training and describing method, device, equipment and medium
US12093298B2 (en) Apparatus and method for training model for document summarization
US20230147585A1 (en) Dynamically enhancing supervised learning
US20220284280A1 (en) Data labeling for synthetic data generation
CN112329436A (en) Legal document element analysis method and system
CN114254588A (en) Data tag processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210223