CN114860900A - Sentencing prediction method and device

Sentencing prediction method and device

Info

Publication number
CN114860900A
Authority
CN
China
Prior art keywords
vector
prediction
word
chapters
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210365513.7A
Other languages
Chinese (zh)
Other versions
CN114860900B (en)
Inventor
张淯易
黄继超
陈维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd
Priority to CN202210365513.7A
Publication of CN114860900A
Application granted
Publication of CN114860900B
Legal status: Active

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30: Information retrieval of unstructured textual data
              • G06F16/33: Querying
                • G06F16/332: Query formulation
              • G06F16/35: Clustering; Classification
          • G06F40/00: Handling natural language data
            • G06F40/20: Natural language analysis
              • G06F40/279: Recognition of textual entities
                • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
            • G06F40/30: Semantic analysis
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00: Computing arrangements based on biological models
            • G06N3/02: Neural networks
              • G06N3/04: Architecture, e.g. interconnection topology
                • G06N3/045: Combinations of networks
        • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR SUCH PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
            • G06Q50/10: Services
              • G06Q50/18: Legal services


Abstract

The present application discloses a sentencing prediction method and device, which are used to address the incompleteness and inaccuracy of judgment results in the prior art. The method provided by the application comprises the following steps: acquiring case-related information and a crime fact description text; vectorizing the word segments included in a plurality of chapters of the crime fact description text to obtain a first word vector corresponding to each word segment, and vectorizing the word segments included in the case-related information to obtain a second word vector corresponding to each word segment; performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters, and determining the prediction category of each chapter according to its first feature vector; performing feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and performing law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the first feature vectors corresponding to the plurality of chapters.

Description

Sentencing prediction method and device
Technical Field
The present application relates to the field of information technology, and in particular to a sentencing prediction method and device.
Background
Currently, legal judgment prediction mainly involves three subtasks: law article prediction, charge prediction, and prison term prediction. The prison term prediction process is particularly complex: it must consider not only the basic circumstances of the defendant and the course of the crime, but also factors such as whether the defendant voluntarily surrendered or confessed and whether the offense is a first offense. The prior art has two shortcomings. First, it lacks external information and relies only on the crime fact description of the case. The crime fact description does play an irreplaceable role in charge prediction and law article prediction, but relying on it alone is impractical for prison term prediction. Second, the prior art extracts the information corresponding to each of the three subtasks from the crime fact description and performs each prediction from the information of a single subtask, without considering the associations among the subtasks, so the judgment results have certain errors in both comprehensiveness and accuracy.
Disclosure of Invention
The embodiments of the present application provide a sentencing prediction method and device, which are used to address the incompleteness and inaccuracy of judgment results in the prior art.
In a first aspect, an embodiment of the present application provides a sentencing prediction method, including:
acquiring case-related information and a crime fact description text, wherein the case-related information comprises at least one of witness testimony, material evidence, defendant information, testimonial statements, a suspect's confession, and written records; the crime fact description text comprises a plurality of chapters, each of the plurality of chapters comprises a plurality of clauses, and each of the plurality of clauses comprises a plurality of word segments;
vectorizing the word segments included in the plurality of chapters to obtain a first word vector corresponding to each word segment, and vectorizing the word segments included in the case-related information to obtain a second word vector corresponding to each word segment; performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters, and determining a prediction category of each chapter according to its first feature vector, wherein the prediction category is a law article category, a charge category, or a prison term category;
performing feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and performing law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters.
Based on this scheme, case-related information is introduced on top of the crime fact description text for sentencing prediction, which improves the prediction effect. Meanwhile, because of the complexity of the prison term, a topological structure is introduced over the multiple tasks: the law article is predicted first, then the charge, and finally the prison term on the basis of the law article and the charge, which further improves the prison term prediction effect.
In a possible implementation, performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters includes: filtering the word segments in a first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, wherein the first word vectors in the plurality of clauses included in the filtered first chapter are related to sentencing prediction, and the first chapter is any one of the plurality of chapters; and combining the plurality of clauses included in the filtered first chapter to obtain a plurality of clause combinations, wherein each of the plurality of clause combinations comprises at least two clauses;
extracting features of each clause combination through a first semantic vector encoder to obtain a word-level feature vector of each clause combination; performing feature splicing on the word-level feature vectors of the plurality of clause combinations to obtain a statement vector representation of each clause combination; performing feature extraction on the statement vector of each clause combination through a second semantic vector encoder to obtain a clause-level feature vector of each clause combination; and performing feature splicing on the clause-level feature vectors of the plurality of clause combinations to obtain the first feature vector.
Based on this scheme, feature extraction captures the contextual features among the multiple word segments and among the clauses of each chapter.
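To make the two-level encoding concrete, the following is a minimal sketch, assuming PyTorch; a stock Transformer encoder stands in for each semantic vector encoder, mean pooling stands in for the single-head attention pooling described later, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class HierarchicalChapterEncoder(nn.Module):
    """Word-level encoding per clause combination, then clause-level encoding
    over the spliced statement vectors (a sketch, not the patented model)."""
    def __init__(self, dim: int = 128, heads: int = 8, layers: int = 2):
        super().__init__()
        word_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        clause_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.word_encoder = nn.TransformerEncoder(word_layer, layers)      # stands in for the first semantic vector encoder
        self.clause_encoder = nn.TransformerEncoder(clause_layer, layers)  # stands in for the second semantic vector encoder

    def forward(self, clause_combinations):
        # clause_combinations: list of (num_word_segments, dim) tensors of fused word vectors
        statement_vectors = []
        for combo in clause_combinations:
            h = self.word_encoder(combo.unsqueeze(0))         # word-level features
            statement_vectors.append(h.mean(dim=1))           # pool into one statement vector
        s = torch.cat(statement_vectors, dim=0).unsqueeze(0)  # splice the statement vectors
        h = self.clause_encoder(s)                            # clause-level features
        return h.mean(dim=1).squeeze(0)                       # first feature vector of the chapter

# Usage: three clause combinations with 5, 7, and 4 word segments each.
encoder = HierarchicalChapterEncoder()
combos = [torch.randn(n, 128) for n in (5, 7, 4)]
chapter_vector = encoder(combos)  # shape: (128,)
```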
In a possible implementation, the method further includes: encoding position vectors corresponding to the plurality of first word vectors included in the filtered first chapter, wherein the position vector corresponding to a first word vector is used to represent the position, in the text corresponding to the first chapter, of the word segment corresponding to that first word vector; and fusing the first word vectors of the multiple word segments included in the filtered first chapter with their corresponding position vectors to obtain fused word vectors of the multiple word segments in the first chapter;
the extracting features of each clause combination through the first semantic vector encoder to obtain the word-level feature vector of each clause combination includes: performing feature extraction on a first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain a word-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
Based on this scheme, fusing the first word vectors with the position vectors makes the relative and absolute position information of each word segment in the clause available in the subsequent encoding process.
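The patent does not fix the position-encoding scheme; the sketch below uses the common sinusoidal encodings as one plausible choice and fuses them with the first word vectors by addition.

```python
import numpy as np

def sinusoidal_position_vectors(num_positions: int, dim: int) -> np.ndarray:
    """One common choice of position vector (an assumption; the patent only
    requires that position vectors encode each word segment's position)."""
    pos = np.arange(num_positions)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    enc = np.zeros((num_positions, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def fuse(word_vectors: np.ndarray) -> np.ndarray:
    """Fuse first word vectors with their position vectors by addition."""
    n, k = word_vectors.shape
    return word_vectors + sinusoidal_position_vectors(n, k)
```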
In a possible implementation, the method further includes: encoding position vectors corresponding to the plurality of first statement vectors included in the filtered first chapter, wherein the position vector corresponding to a first statement vector is used to represent the position, in the text corresponding to the first chapter, of the clause corresponding to that first statement vector; and fusing the plurality of first statement vectors included in the filtered first chapter with their corresponding position vectors to obtain fused statement vectors of the multiple clauses in the first chapter;
the extracting features of the statement vector of each clause combination through the second semantic vector encoder to obtain the clause-level feature vector of each clause combination includes: performing feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain a clause-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
Based on this scheme, fusing the first statement vectors with the position vectors makes the relative and absolute position information of each clause in the clause combination available in the subsequent encoding process.
In one possible implementation, the first semantic vector encoder includes N attention network layers, a first neural network layer, and a first single-head attention layer; each of the N attention network layers comprises a first multi-head attention layer and a first addition normalization layer; N is a positive integer;
the performing feature extraction on the first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain the word-level feature vector of the first clause combination includes the following steps:
the plurality of attention modules included in the first multi-head attention layer of the ith attention network layer respectively perform attention operations on the fused word vectors of the multiple word segments included in the first clause combination to obtain the outputs of the plurality of attention modules;
the first addition normalization layer of the ith attention network layer splices the output results of the plurality of attention modules to obtain a spliced result, and performs a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a first output result of the first multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; the first output result of the first multi-head attention layer of the ith attention network layer is normalized to obtain a second output result, and the second output result is used in the linear transformation of the (i+1)th attention network layer;
feature extraction is performed on the second output result of the Nth attention network layer through the first neural network layer to obtain a feature matrix of the first word segment of each clause combination;
and feature information in the feature matrix of the first word segment is extracted through the first single-head attention layer to obtain the word-level feature vector of the first clause combination.
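A sketch of one such encoder under stated assumptions: PyTorch's MultiheadAttention plays the role of the multi-head attention layer (its output projection covers the splicing and linear transformation of the attention modules' outputs), a residual-plus-LayerNorm plays the role of the addition normalization layer, and a mean-derived query drives the final single-head attention pooling (the pooling query is an assumption).

```python
import torch
import torch.nn as nn

class AttentionNetworkLayer(nn.Module):
    """One of the N attention network layers: a multi-head attention layer
    followed by an addition normalization layer (a sketch, assuming PyTorch)."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # MultiheadAttention internally splices the attention modules' outputs
        # and applies the output (linear) projection.
        attn_out, _ = self.mha(x, x, x)
        return self.norm(x + attn_out)  # add the previous layer's output, then normalize

class SemanticVectorEncoder(nn.Module):
    """N stacked attention network layers, a neural network layer, and a
    single-head attention layer that pools the feature matrix into one vector."""
    def __init__(self, dim: int, num_heads: int = 8, n_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList([AttentionNetworkLayer(dim, num_heads) for _ in range(n_layers)])
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())      # first/second neural network layer
        self.pool = nn.MultiheadAttention(dim, 1, batch_first=True)   # single-head attention layer

    def forward(self, x):  # x: (batch, seq_len, dim) fused word or statement vectors
        for layer in self.layers:
            x = layer(x)
        feat = self.ffn(x)                        # feature matrix
        query = feat.mean(dim=1, keepdim=True)    # assumed pooling query
        out, _ = self.pool(query, feat, feat)
        return out.squeeze(1)                     # word-level (or clause-level) feature vector
```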
In one possible implementation, the second semantic vector encoder includes N attention network layers, a second neural network layer, and a second single-head attention layer; each of the N attention network layers comprises a second multi-head attention layer and a second addition normalization layer; N is a positive integer;
the performing feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain the clause-level feature vector of the first clause combination includes the following steps:
the plurality of attention modules included in the second multi-head attention layer of the ith attention network layer respectively perform attention calculations on the fused statement vectors of the multiple clauses included in the first clause combination to obtain the outputs of the plurality of attention modules;
the second addition normalization layer of the ith attention network layer splices the output results of the plurality of attention modules to obtain a spliced result, and performs a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a third output result of the second multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; the third output result of the second multi-head attention layer of the ith attention network layer is normalized to obtain a fourth output result, and the fourth output result is used in the linear transformation of the (i+1)th attention network layer;
feature extraction is performed on the fourth output result of the Nth attention network layer through the second neural network layer to obtain a feature matrix of the first clause of each clause combination;
and feature information in the feature matrix of the first clause is extracted through the second single-head attention layer to obtain the clause-level feature vector of the first clause combination.
Based on this scheme, the N attention network layers extract features from the multiple clauses included in the first clause combination, which can capture long-distance features between clauses in the text, extract rich contextual semantic representation information, and enhance the feature extraction capability.
In a possible implementation, the performing law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters includes:
performing a nonlinear transformation on the second feature vector and the first feature vector whose prediction category is the law article category among the plurality of first feature vectors corresponding to the plurality of chapters to obtain a law article prediction vector, and performing law article prediction according to the law article prediction vector;
performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the charge category among the plurality of first feature vectors corresponding to the plurality of chapters, and the law article prediction vector to obtain a charge prediction vector, and performing charge prediction according to the charge prediction vector;
and performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the prison term category among the plurality of first feature vectors corresponding to the plurality of chapters, the law article prediction vector, and the charge prediction vector to obtain a prison term prediction vector, and performing prison term prediction according to the prison term prediction vector.
Based on this scheme, a topological structure is adopted in sentencing prediction: the law article is predicted first, then the charge, and finally the prison term on the basis of the law article and the charge, which further improves the prison term prediction effect.
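A sketch of the chained (topological) prediction heads under assumed layer sizes; the patent specifies only that each prediction vector is obtained by a nonlinear transformation of the listed inputs.

```python
import torch
import torch.nn as nn

class TopologicalPredictionHeads(nn.Module):
    """Law article first, then charge using the law article prediction vector,
    then prison term using both (Tanh heads and dimensions are assumptions)."""
    def __init__(self, dim: int, n_articles: int, n_charges: int, n_terms: int):
        super().__init__()
        self.law = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.charge = nn.Sequential(nn.Linear(3 * dim, dim), nn.Tanh())
        self.term = nn.Sequential(nn.Linear(4 * dim, dim), nn.Tanh())
        self.law_out = nn.Linear(dim, n_articles)
        self.charge_out = nn.Linear(dim, n_charges)
        self.term_out = nn.Linear(dim, n_terms)

    def forward(self, f_law, f_charge, f_term, case_vec):
        # f_*: first feature vectors whose prediction category matched each task;
        # case_vec: the second feature vector of the case-related information.
        v_law = self.law(torch.cat([f_law, case_vec], dim=-1))
        v_charge = self.charge(torch.cat([f_charge, case_vec, v_law], dim=-1))
        v_term = self.term(torch.cat([f_term, case_vec, v_law, v_charge], dim=-1))
        return self.law_out(v_law), self.charge_out(v_charge), self.term_out(v_term)
```

Each head sees the prediction vectors of the preceding tasks, which is how the topological dependency among the three subtasks is realized.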
In one possible implementation, the case-related information includes first data and second data; the first data comprises at least one of testimonial statements, a suspect's confession, and written records, and the second data comprises at least one of witness testimony, material evidence, and defendant information; the vectorizing of the word segments included in the case-related information to obtain a second word vector corresponding to each word segment includes:
vectorizing the word segments included in the first data to obtain a second word vector corresponding to each word segment in the first data;
and determining the category to which each word segment included in the second data belongs, and determining, from a data vector table, a category vector corresponding to the category to which each word segment included in the second data belongs, wherein the data vector table comprises category vectors corresponding to a plurality of categories; and determining the second word vector corresponding to each word segment according to the category vector corresponding to the category to which that word segment belongs.
Based on this scheme, case-related information is introduced on the basis of the crime fact description text and features are extracted from it, so that sentencing prediction is performed jointly from the crime fact description text and the case-related information, which improves the prediction effect.
In one possible implementation, the filtering of the word segments in the first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter includes:
filtering, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
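The claim only states that a convolutional neural network performs the filtering; the sketch below assumes a score-and-threshold design in which a 1-D convolution rates each word segment's relevance to sentencing prediction and low-scoring segments are dropped.

```python
import torch
import torch.nn as nn

class ConvFilter(nn.Module):
    """Assumed CNN filter: convolve over the first word vectors, score each
    word segment, and keep only segments whose score clears a threshold."""
    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.conv = nn.Conv1d(dim, 1, kernel_size=3, padding=1)
        self.threshold = threshold

    def forward(self, word_vectors):  # word_vectors: (seq_len, dim)
        scores = torch.sigmoid(self.conv(word_vectors.t().unsqueeze(0))).squeeze()
        return word_vectors[scores > self.threshold]  # filtered first chapter
```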
In a second aspect, an embodiment of the present application provides a sentencing prediction apparatus, including an acquisition unit and a processing unit;
the acquisition unit is configured to acquire case-related information and a crime fact description text, wherein the case-related information comprises at least one of witness testimony, material evidence, defendant information, testimonial statements, a suspect's confession, and written records; the crime fact description text comprises a plurality of chapters, each of the plurality of chapters comprises a plurality of clauses, and each of the plurality of clauses comprises a plurality of word segments;
the processing unit is configured to vectorize the word segments included in the plurality of chapters to obtain a first word vector corresponding to each word segment, and vectorize the word segments included in the case-related information to obtain a second word vector corresponding to each word segment; perform feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters; and determine a prediction category of each chapter according to its first feature vector, wherein the prediction category is a law article category, a charge category, or a prison term category;
the processing unit is further configured to perform feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and perform law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters.
In a possible implementation, when performing feature extraction on the first word vectors included in the plurality of chapters to obtain the first feature vector of each of the plurality of chapters, the processing unit is specifically configured to:
filter the word segments in a first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, wherein the first word vectors in the plurality of clauses included in the filtered first chapter are related to sentencing prediction, and the first chapter is any one of the plurality of chapters;
combine the plurality of clauses included in the filtered first chapter to obtain a plurality of clause combinations, wherein each of the plurality of clause combinations comprises at least two clauses;
extract features of each clause combination through a first semantic vector encoder to obtain a word-level feature vector of each clause combination;
perform feature splicing on the word-level feature vectors of the plurality of clause combinations to obtain a statement vector representation of each clause combination;
perform feature extraction on the statement vector of each clause combination through a second semantic vector encoder to obtain a clause-level feature vector of each clause combination;
and perform feature splicing on the clause-level feature vectors of the plurality of clause combinations to obtain the first feature vector.
In one possible implementation, the processing unit is further configured to: encode position vectors corresponding to the plurality of first word vectors included in the filtered first chapter, wherein the position vector corresponding to a first word vector is used to represent the position, in the text corresponding to the first chapter, of the word segment corresponding to that first word vector; and fuse the first word vectors of the multiple word segments included in the filtered first chapter with their corresponding position vectors to obtain fused word vectors of the multiple word segments in the first chapter;
when extracting features of each clause combination through the first semantic vector encoder to obtain the word-level feature vector of each clause combination, the processing unit is specifically configured to: perform feature extraction on a first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain a word-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
In one possible implementation, the processing unit is further configured to: encode position vectors corresponding to the plurality of first statement vectors included in the filtered first chapter, wherein the position vector corresponding to a first statement vector is used to represent the position, in the text corresponding to the first chapter, of the clause corresponding to that first statement vector; and fuse the plurality of first statement vectors included in the filtered first chapter with their corresponding position vectors to obtain fused statement vectors of the multiple clauses in the first chapter;
when performing feature extraction on the statement vector of each clause combination through the second semantic vector encoder to obtain the clause-level feature vector of each clause combination, the processing unit is specifically configured to: perform feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain a clause-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
In one possible implementation, the first semantic vector encoder includes N attention network layers, a first neural network layer, and a first single-head attention layer; each of the N attention network layers comprises a first multi-head attention layer and a first addition normalization layer; N is a positive integer;
when performing feature extraction on a first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain the word-level feature vector of the first clause combination, the processing unit is specifically configured to: have the plurality of attention modules included in the first multi-head attention layer of the ith attention network layer respectively perform attention operations on the fused word vectors of the multiple word segments included in the first clause combination to obtain the outputs of the plurality of attention modules; have the first addition normalization layer of the ith attention network layer splice the output results of the plurality of attention modules to obtain a spliced result, and perform a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a first output result of the first multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; normalize the first output result of the first multi-head attention layer of the ith attention network layer to obtain a second output result, the second output result being used in the linear transformation of the (i+1)th attention network layer; perform feature extraction on the second output result of the Nth attention network layer through the first neural network layer to obtain a feature matrix of the first word segment of each clause combination; and extract feature information in the feature matrix of the first word segment through the first single-head attention layer to obtain the word-level feature vector of the first clause combination.
In one possible implementation, the second semantic vector encoder includes N attention network layers, a second neural network layer, and a second single-head attention layer; each of the N attention network layers comprises a second multi-head attention layer and a second addition normalization layer; N is a positive integer;
when performing feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain the clause-level feature vector of the first clause combination, the processing unit is specifically configured to: have the plurality of attention modules included in the second multi-head attention layer of the ith attention network layer respectively perform attention calculations on the fused statement vectors of the multiple clauses included in the first clause combination to obtain the outputs of the plurality of attention modules; have the second addition normalization layer of the ith attention network layer splice the output results of the plurality of attention modules to obtain a spliced result, and perform a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a third output result of the second multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; normalize the third output result of the second multi-head attention layer of the ith attention network layer to obtain a fourth output result, the fourth output result being used in the linear transformation of the (i+1)th attention network layer; perform feature extraction on the fourth output result of the Nth attention network layer through the second neural network layer to obtain a feature matrix of the first clause of each clause combination; and extract feature information in the feature matrix of the first clause through the second single-head attention layer to obtain the clause-level feature vector of the first clause combination.
In a possible implementation, when performing the law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters, the processing unit is specifically configured to: perform a nonlinear transformation on the second feature vector and the first feature vector whose prediction category is the law article category among the plurality of first feature vectors corresponding to the plurality of chapters to obtain a law article prediction vector, and perform law article prediction according to the law article prediction vector; perform a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the charge category among the plurality of first feature vectors corresponding to the plurality of chapters, and the law article prediction vector to obtain a charge prediction vector, and perform charge prediction according to the charge prediction vector; and perform a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the prison term category among the plurality of first feature vectors corresponding to the plurality of chapters, the law article prediction vector, and the charge prediction vector to obtain a prison term prediction vector, and perform prison term prediction according to the prison term prediction vector.
In one possible implementation, the case-related information includes first data and second data; the first data comprises at least one of testimonial statements, a suspect's confession, and written records, and the second data comprises at least one of witness testimony, material evidence, and defendant information; when vectorizing the word segments included in the case-related information to obtain the second word vector corresponding to each word segment, the processing unit is specifically configured to: vectorize the word segments included in the first data to obtain a second word vector corresponding to each word segment in the first data; determine the category to which each word segment included in the second data belongs, and determine, from a data vector table, a category vector corresponding to the category to which each word segment included in the second data belongs, wherein the data vector table comprises category vectors corresponding to a plurality of categories; and determine the second word vector corresponding to each word segment according to the category vector corresponding to the category to which that word segment belongs.
In a possible implementation, when filtering the word segments in the first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, the processing unit is specifically configured to: filter, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
In a third aspect, an embodiment of the present application provides a sentencing prediction apparatus, including a memory and a processor;
the memory is configured to store program instructions;
the processor is configured to invoke the program instructions stored in the memory and, according to the obtained program, perform the method of the first aspect or of any possible implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to perform the method of the first aspect or of any possible implementation of the first aspect.
In addition, for the technical effects of any implementation of the second to fourth aspects, reference may be made to the technical effects of the first aspect and of its different implementations, which are not repeated here.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1A is a schematic diagram of a system architecture provided in an embodiment of the present application;
fig. 1B is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a sentencing prediction method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a network model for case-related information feature extraction according to an embodiment of the present application;
fig. 4A is a schematic diagram of a network model for crime fact description text feature extraction according to an embodiment of the present application;
fig. 4B is a schematic flowchart of crime fact description text feature extraction according to an embodiment of the present application;
fig. 5 is a schematic flowchart of obtaining word-level feature vectors according to an embodiment of the present application;
fig. 6 is a schematic diagram of an attention network layer according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a classifier provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a sentencing prediction model provided in an embodiment of the present application;
fig. 9 is a schematic diagram of a sentencing prediction apparatus provided in an embodiment of the present application;
fig. 10 is a schematic diagram of another sentencing prediction apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
To address the problem in the prior art that the prison term is predicted only from the crime fact description of the case, so that the judgment results have certain errors in comprehensiveness and accuracy, the present application provides a sentencing prediction method: case-related information is added on the basis of the crime fact description text, and sentencing prediction is realized by encoding and extracting features from both the crime fact description text and the case-related information, which can improve the accuracy of sentencing prediction.
Fig. 1A schematically shows a system architecture to which embodiments of the present application are applicable, which may comprise a sentencing prediction apparatus. In some embodiments, the sentencing prediction apparatus may comprise one or more servers 100, for example the three servers in fig. 1A. The server 100 may be implemented by a physical server or by a virtual server, and by a single server or by a server cluster composed of multiple servers; a single server or a server cluster implements the sentencing prediction method provided by the present application. Optionally, the server 100 may be connected to a terminal device, receive a sentencing prediction task sent by the terminal device, and send a sentencing prediction result to the terminal device. For example, the terminal device may be a mobile phone, a tablet computer, a personal computer, or the like.
By way of example, referring to FIG. 1B, a server may include a processor 110, a communication interface 120, and a memory 130. Of course, other components, not shown in FIG. 1B, may also be included in the server 100.
Taking the case where the server 100 is connected to multiple terminal devices as an example, the communication interface 120 is used to communicate with the different terminal devices: it receives the sentencing prediction tasks sent by the terminal devices and sends the sentencing prediction results back to them.
In the embodiments of the present application, the processor 110 may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The processor 110 is the control center of the server 100. It connects the various parts of the entire server 100 using various interfaces and lines, and performs the various functions of the server 100 and processes data by running or executing the software programs and/or modules stored in the memory 130 and calling the data stored in the memory 130. Optionally, the processor 110 may include one or more processing units. The processor 110 may be a control component such as a processor, a microprocessor, or a controller, and may be, for example, a general-purpose Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
The memory 130 may be used to store software programs and modules, and the processor 110 performs various functional applications and data processing by running the software programs and modules stored in the memory 130. The memory 130 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by at least one function, and the like, and the data storage area may store data created in the course of business processing, and the like. As a non-volatile computer-readable storage medium, the memory 130 may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 130 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, or an optical disc. The memory 130 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 130 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
It should be noted that the structures shown in fig. 1A and 1B are only examples, and the embodiments of the present application are not limited thereto.
In some scenarios, the sentencing prediction method provided by the embodiment of the present application may be implemented by one or more local terminal devices.
Fig. 2 exemplarily shows the flow of the sentencing prediction method. The method may be performed by a sentencing prediction apparatus, which may be located in the server 100 shown in fig. 1B (for example, in the processor 110) or may be the server 100 itself. The following description takes the server 100 as the executing body and, for convenience of description, does not repeat this below. The sentencing prediction apparatus may also be located in a local terminal device. The specific flow is as follows:
201. Case-related information and a crime fact description text are acquired.
In some embodiments, case-related information refers to auxiliary information related to the case, such as witness testimony, material evidence, defendant information, testimonial statements, a suspect's confession, written records, and the like. The crime fact description text may include defendant information and crime information. The defendant information may include prior conviction information, such as the number of prior convictions, prior fines, prior deprivations of rights, prior prison terms, and prior charges. The crime information may include a crime fact description, or a crime fact description together with the amount of money involved in the case, and the like.
In some embodiments, the crime fact description text is word-segmented after it is obtained. The word segmentation may be performed on the crime fact description text by a word segmenter, which is not specifically limited in the present application. For example, the word segmenter may be a text-analysis tokenizer or the Chinese language processing package HanLP (Han Language Processing).
In some embodiments, after the crime fact description text is word-segmented, it may be divided into a plurality of chapters each containing a fixed number of word segments or clauses, where each chapter includes a plurality of clauses and each clause includes a plurality of word segments. The crime fact description may also be divided into a plurality of chapters by paragraph, which is not specifically limited in the embodiments of the present application.
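A sketch of the clause splitting and chapter division under the fixed-clause-count option; the punctuation set and chapter size are assumptions, and the actual word segmentation would be delegated to a segmenter such as HanLP.

```python
import re

def split_into_chapters(text: str, clauses_per_chapter: int = 16):
    """Split a crime fact description into chapters of a fixed number of clauses
    (a sketch; the patent also allows splitting by word-segment count or paragraph)."""
    clauses = [c for c in re.split(r"[。；！？]", text) if c]  # split on Chinese sentence punctuation
    chapters = [clauses[i:i + clauses_per_chapter]
                for i in range(0, len(clauses), clauses_per_chapter)]
    # Each chapter is a list of clauses; each clause would then be
    # word-segmented, e.g. with a segmenter such as HanLP.
    return chapters
```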
In some embodiments, the crime fact description text and the case-related information may come from a terminal device connected to the server. A user may trigger a sentencing prediction task through the terminal device; when the sentencing prediction task is sent to the server, the crime fact description text and the case-related information may be sent along with it, the task being used to instruct the server to perform sentencing prediction. After receiving the sentencing prediction task, the server executes the flow of the sentencing prediction method.
202. The word segments included in the plurality of chapters are vectorized to obtain a first word vector corresponding to each word segment, and the word segments included in the case-related information are vectorized to obtain a second word vector corresponding to each word segment.
In some embodiments, the crime fact description text is formal written language with a concise and comprehensive character, so after the plurality of chapters included in the crime fact description text are determined, the word segments in those chapters are vectorized to obtain a word vector for each word segment. For convenience of description, the word vectors corresponding to the word segments included in the chapters are called first word vectors. In some scenarios, the crime fact description text is first processed at word granularity to convert the text into numerical form. Specifically, taking the first chapter as an example, each word segment included in the first chapter corresponds to a token, and the first chapter can be expressed as T = (t_1, t_2, ..., t_n), where T denotes the token sequence, t_i denotes the token of one word segment, and n is the number of word segments included in the first chapter. Further, the token sequence of the first chapter may be mapped to an integer sequence, which may be represented as D(t→d) = (d_1, d_2, ..., d_n), where n is the number of word segments included in the first chapter. The correspondence between a token and the integer value of each word segment can be determined, for example, through a dictionary. The dictionary stores the correspondence between the tokens of all word segments encountered in the current scene and integer values: for example, if 5000 word segments are obtained based on statistical features of the training data, the dictionary stores the correspondence between the tokens of those 5000 word segments and integer values, with different word segments corresponding to different integer values.
In some embodiments, after converting the crime fact description text from text form to numerical form at word granularity, the word segments may be mapped to a vector space to obtain a plurality of word vectors. The sentencing prediction apparatus stores a word vector sequence whose length is the size of the dictionary, and the plurality of first word vectors are mapped from the numerical values of the word segments according to the word vector sequence. Specifically, the word vector sequence may be denoted E = (e_1, e_2, ..., e_s), where s is the number of word segments included in the dictionary, e_i is a word vector of configurable length, e_i ∈ R^k, and k is the length of the word vector. Each word segment included in the crime fact description text corresponds to a token; a vector can be mapped through E according to the numerical value of each token, and all mapped word vectors are combined in the original token order, i.e., the order of the word segments in the crime fact description text, to obtain the first word vectors of the word segments in each chapter. Taking the first chapter as an example, if it includes s word segments, after mapping through the word vector sequence E it may be denoted X = (x_1, x_2, ..., x_s), where x_i is the first word vector corresponding to the ith word segment, x_i ∈ R^k, and k is the length of the first word vector. In some scenarios, the word vector sequence may be generated by the sentencing prediction model: the sentencing prediction apparatus can automatically generate the word vector sequence after receiving the sentencing prediction task sent by the terminal device. After the word segments of the crime fact description text are converted into numerical values, they can be mapped to the vector space according to the word vector sequence to obtain the plurality of word vectors.
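A toy illustration of the token-to-integer-to-vector pipeline just described; the dictionary contents and vector length are made up for the example.

```python
import numpy as np

# Word segments -> integer ids via a dictionary, then ids -> first word
# vectors via the word vector sequence E.
dictionary = {"盗窃": 0, "被告人": 1, "现金": 2}  # toy dictionary (assumed contents)
k = 4                                             # word vector length (assumed)
E = np.random.randn(len(dictionary), k)           # word vector sequence, one e_i per entry

tokens = ["被告人", "盗窃", "现金"]                # token sequence T of one chapter
D = [dictionary[t] for t in tokens]               # integer sequence D
X = E[D]                                          # first word vectors X = (x_1, ..., x_n), x_i in R^k
```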
In other embodiments, the acquired case-related information includes first data and second data. It is understood that the first data and the second data may be referred to by other terms; for example, the first data may be called continuous data and the second data discrete data. The first data may include at least one of testimonial statements, a suspect's confession, and written records, and the second data may include at least one of witness testimony, material evidence, and defendant information.
In some scenarios, the word segments included in the first data may be vectorized to directly obtain the second word vector corresponding to each word segment in the first data. The method of vectorizing the word segments included in the first data is the same as the above method of vectorizing the word segments included in the crime fact description text, so the details are not repeated here.
In other scenarios, the multiple word segments included in the first data may belong to different categories; of course, some word segments in the first data may belong to the same category, i.e., each category may include multiple word segments. The vectorized value ranges of the different categories may differ, and the spread across categories can be relatively large. Therefore, after the word segments included in the first data are vectorized, a normalization operation is performed on the resulting word vectors to obtain the second word vector corresponding to each word segment. The second word vector corresponding to the ith word segment thus satisfies the condition shown in the following formula (1):
c'_i = (c_i - μ_c) / σ_c; (1)
where c'_i denotes the second word vector corresponding to the ith word segment in the first data, μ_c denotes the mean of all values included in the first data, σ_c denotes the standard deviation, and c_i is the word vector obtained by vectorizing the ith word segment.
In some scenarios, the normalized vectors of the categories to which all word segments in the first data belong may be represented by a vector sequence, such as C' = (c'_1, c'_2, c'_3, ..., c'_g), where C' ∈ R^g and g is the number of categories of the first data.
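Formula (1) in code, as a sketch over the vectorized values of the first data:

```python
import numpy as np

def normalize_first_data(c: np.ndarray) -> np.ndarray:
    """Normalization of formula (1): c'_i = (c_i - mu_c) / sigma_c,
    applied elementwise to the vectorized values of the first data."""
    mu_c = c.mean()
    sigma_c = c.std()
    return (c - mu_c) / sigma_c
```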
In other embodiments, the category to which each word segment included in the second data belongs is determined, and the category vector corresponding to that category is determined from a data vector table. The number of categories in the second data is less than the number of categories in the first data. The category vector corresponding to the category to which each word segment included in the second data belongs may be determined through a pre-constructed data vector table, which includes category vectors corresponding to a plurality of categories.
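A minimal sketch of the data vector table lookup for the second data; the table keys and vectors are invented for illustration.

```python
import numpy as np

# Pre-constructed data vector table: each category maps to a category vector.
data_vector_table = {
    "material_evidence": np.array([0.1, 0.3, -0.2]),
    "witness_testimony": np.array([0.5, -0.1, 0.4]),
}

def second_word_vector(category: str) -> np.ndarray:
    """Second word vector of a word segment in the second data, determined
    from the category vector of the category it belongs to."""
    return data_vector_table[category]
```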
203. Feature extraction is performed on the first word vectors of the plurality of chapters to obtain a first feature vector of each of the plurality of chapters, and the prediction category of each chapter is determined according to its first feature vector.
And 204, extracting the features of the second word vector included in the case related information to obtain a second feature vector of the case related information.
For example, the Concat function may be used to splice second word vectors included in case-related information to obtain a second feature vector.
And 205, performing normal prediction, criminal name prediction and criminal phase prediction according to the prediction categories of the first feature vectors and the second feature vectors corresponding to the chapters.
In some embodiments, after obtaining the result of the sentencing prediction, the server may send the sentencing prediction result to the terminal device, and the user may obtain the sentencing prediction result through the terminal device.
Illustratively, a sentencing prediction model may be deployed in the sentencing prediction device, and the sentencing prediction device performs steps 202 to 205 through the sentencing prediction model.
Through this scheme, case-related information is introduced on the basis of the crime fact description text to perform sentencing prediction, which improves the prediction effect. Meanwhile, because of the complexity of the criminal period, a topological structure is introduced across the multiple tasks: the law article is predicted first, the criminal name is then predicted on the basis of the law article, and the criminal period is finally predicted on the basis of the law article and the criminal name, further improving the criminal period prediction effect.
In one possible implementation, the feature extraction of the second word vector included in the case-related information in step 204 may be performed by using a feedforward neural network, as shown in fig. 3. For example, a second word vector in the first data is input into a first feedforward neural network to obtain an output vector of the first data. For example, the first feedforward neural network includes a fully-connected layer, and the second word vector in the first data is input into the fully-connected layer to obtain an output vector of the first data. The output vector of the first data satisfies the condition shown in the following formula (2):
T_c = Relu(C′W_c + b_c);  (2)

where T_c is the output vector of the first data, T_c ∈ R^p, p is the vector dimension of the output vector of the first data, C′ is the vector sequence of the first data, Relu is the activation function, W_c is a transformation matrix, and b_c is a bias term. Through the above formula, data of different categories that are isolated from each other can be fused together.
In some embodiments, the second word vectors corresponding to the participles included in the second data may be input into the second feedforward neural network to obtain an output vector of the second data, as shown in fig. 3. For example, the second feedforward neural network includes a fully-connected layer, and the second word vector in the second data is input into the fully-connected layer to obtain an output vector for each category included in the second data. The output vector of the i-th category included in the second data satisfies the condition shown in the following formula (3):
T_di = Relu(d′_i W_di + b_di);  (3)

where T_di represents the output vector corresponding to the i-th category in the second data, d′_i represents the category vector of the i-th category, Relu represents the activation function, W_di represents a transformation matrix, and b_di represents a bias term. Further, the output vector of the second data may be represented as T_d = (T_d1, T_d2, …, T_dq), where q is the number of categories included in the second data.
Further, after the output vector of the first data and the output vector of the second data are obtained, the output vector of the first data and the output vector of the second data are spliced to obtain a second feature vector of the case related information, as shown in fig. 3. The second feature vector satisfies a condition shown in the following formula (4):
T_m = Concat(T_c, T_d);  (4)

where T_m represents the second feature vector, Concat represents the splicing function, T_c represents the output vector of the first data, and T_d represents the output vector of the second data.
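The following PyTorch sketch illustrates formulas (2) to (4): one fully-connected layer fuses the normalized first data, another produces a per-category output for the second data, and the outputs are spliced into the second feature vector T_m. All dimensions and layer shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

g, q, cat_dim, p = 6, 4, 16, 32                 # assumed sizes: categories and dims

first_ffn = nn.Sequential(nn.Linear(g, p), nn.ReLU())         # T_c = Relu(C'W_c + b_c)
second_ffn = nn.Sequential(nn.Linear(cat_dim, p), nn.ReLU())  # T_di = Relu(d'_i W_di + b_di)

C_prime = torch.randn(g)                        # normalized first data, C' in R^g
D_prime = torch.randn(q, cat_dim)               # one category vector d'_i per category

T_c = first_ffn(C_prime)                        # output vector of the first data
T_d = second_ffn(D_prime).flatten()             # T_d = (T_d1, ..., T_dq), flattened
T_m = torch.cat([T_c, T_d])                     # formula (4): T_m = Concat(T_c, T_d)
print(T_m.shape)                                # torch.Size([160])
```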
In one possible implementation, when the first feature vector of each chapter is obtained in step 203, word-level encoding may be performed by a first semantic vector encoder and sentence-level encoding may then be performed by a second semantic vector encoder. The first semantic vector encoder may also be referred to as a word encoder, and the second semantic vector encoder as a sentence encoder. In some embodiments, before the word-level and sentence-level encoding, the participles included in each chapter are filtered, and the participles unrelated to sentencing prediction are filtered out. For example, the filtering may be implemented with a neural network, such as a convolutional neural network.
As an example, referring to fig. 4A, a convolutional neural network is used for participle filtering. Taking the first chapter as an example, the participles in the first chapter may be filtered based on the first word vectors included in the first chapter to obtain a filtered first chapter, in which the first word vectors in the multiple clauses are all related to sentencing prediction. Specifically, the filtered first chapter may be obtained by filtering the participles included in the first chapter through a convolutional neural network. Compared with a fully-connected neural network, a convolutional neural network has fewer parameters and a higher calculation speed. When convolving text with a convolutional neural network, the convolution kernel can be interpreted as a filter, similar to a high-pass filter in the communications field: by convolving the first word vectors included in the first chapter, it passes meaningful word vectors (i.e., the first word vectors related to sentencing prediction) while ignoring meaningless words (e.g., "what", etc.). Using a convolutional neural network to filter the participles included in the first chapter preserves the features of each word without overfitting. The convolution operation finally yields a feature representation of the token sequence corresponding to the first chapter, which may be denoted C = (c_1, c_2, …, c_m), as shown in fig. 4A. After convolution, the first word vectors of the multiple participles included in the filtered first chapter carry n-gram context features, i.e., the participles are no longer isolated from each other. For example, when n = 2, each first word vector in the filtered first chapter carries context features from the two word vectors before and after it.
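A minimal sketch of this filtering step, assuming a single 1-D convolution over the chapter's word vectors; the kernel width of 3 (trigram context) and all sizes are illustrative choices, not values from this application:

```python
import torch
import torch.nn as nn

k, m = 128, 50                            # word-vector length, tokens in the chapter
conv = nn.Conv1d(in_channels=k, out_channels=k, kernel_size=3, padding=1)

X = torch.randn(1, k, m)                  # first word vectors as (batch, channels, tokens)
C = conv(X)                               # filtered representation C = (c_1, ..., c_m)
print(C.shape)                            # torch.Size([1, 128, 50]); each c_i now
                                          # mixes its two neighbours (n-gram context)
```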
In some embodiments, the crime fact description text is at the long-chapter level, so a long-distance dependency problem inevitably arises. To further capture the relations between participles, the clauses in the text may be recombined. Specifically, the multiple clauses included in the filtered first chapter may be combined to obtain multiple clause combinations, where each of the clause combinations includes at least two clauses. The filtered first chapter may be expressed as C″ = (c″_1, c″_2, …, c″_m). Illustratively, when clause combination is performed, the order of the clauses is not limited; it is understood that the multiple clauses may be arranged in different orders and then combined.
Referring to fig. 4A, feature extraction is then performed on each clause combination through the first semantic vector encoder to obtain a word-level feature vector of each clause combination, and feature splicing is performed on the word-level feature vectors to obtain the sentence vector representation of each clause combination. Feature extraction is then performed on the sentence vector of each clause combination through the second semantic vector encoder to obtain a sentence-level feature vector of each clause combination, and feature splicing is performed on the sentence-level feature vectors of the multiple clause combinations to obtain the first feature vector.
As an example, the first semantic vector encoder may employ a Transformer encoder, which performs feature extraction on each clause combination through an attention mechanism to obtain the word-level feature vector of each clause combination.
In some embodiments, the position of each participle may be incorporated when word-level encoding is performed by the first semantic vector encoder. For example, referring to fig. 4B, the first word vectors of the multiple participles included in the filtered first chapter are fused with the corresponding position vectors to obtain fused word vectors of the multiple participles in the first chapter. The fused word vectors satisfy the condition shown in the following formula (5):
C_p = C″ ⊕ PE;  (5)

where C_p represents the fused word vectors, PE represents the position-encoding vectors, C″ represents the first word vectors corresponding to the multiple participles included in the filtered first chapter, and ⊕ represents element-wise (corresponding-element) addition.
In the above formula, element-wise addition is chosen to fuse the position-encoding vectors instead of vector splicing, so the increase in parameters caused by vector splicing does not occur and overfitting is less likely.
Illustratively, the position vectors may be encoded by sine and cosine functions. Specifically, after the position vectors corresponding to the multiple first word vectors included in the combined first chapter are encoded, the position vector of each participle in the chapter satisfies the conditions shown in the following formulas (6) and (7):
PE(pos, 2i) = sin(pos / 10000^(2i/d_model));  (6)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model));  (7)
where pos represents the position of the current participle in the clause, i indexes the dimensions of the word vector, and d_model represents the dimension of the word vector. Thus, given a participle's position pos, a position vector of d_model dimensions can be generated according to formulas (6) and (7) above. The generated position encoding is an absolute position encoding, but since trigonometric functions are used, it also contains relative position information between the participles.
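A small sketch of formulas (5) to (7): sinusoidal absolute position encodings are generated and fused with the filtered word vectors by element-wise addition. Shapes are assumptions, and d_model is taken to be even.

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    """Formulas (6)-(7): sine on even dimensions, cosine on odd dimensions."""
    pos = np.arange(num_positions)[:, None]        # participle position pos
    i = np.arange(d_model // 2)[None, :]           # dimension index i
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

C_dd = np.random.randn(50, 128)                    # filtered word vectors C''
C_p = C_dd + positional_encoding(50, 128)          # formula (5): element-wise addition
```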
Encoding the positions of the multiple first word vectors and fusing them with those vectors to obtain the fused word vectors avoids the situation where the position information of the participles in the clauses cannot be obtained, which would affect the final sentencing prediction result. It can be understood that, in general, the Transformer has no mechanism for capturing the relative positions of the participles in a clause, and its output is unaffected when participles exchange positions. When a Transformer encoder is adopted, the word-order information of the participles is lost, and neither the relative nor the absolute position of each participle in the clause can be obtained. By encoding positions, the position of each participle in the clause or chapter is introduced into the subsequent feature extraction.
In some embodiments, the fused word vectors of the multiple participles included in the first clause combination may be input into the first semantic vector encoder for feature extraction to obtain the word-level feature vector of the first clause combination, as shown in fig. 4B. The first semantic vector encoder comprises N attention network layers, a first neural network layer, and a first single-head attention layer; each of the N attention network layers comprises a first multi-head attention layer and a first addition normalization layer; N is a positive integer. The process of extracting features from each clause combination through the first semantic vector encoder to obtain its word-level feature vector is shown in fig. 5 and includes the following steps:
Step 501: the multiple attention modules included in the first multi-head attention layer of the i-th attention network layer respectively perform attention operations on the fused word vectors of the multiple participles included in the first clause combination to obtain the outputs of the multiple attention modules.

In some embodiments, the first semantic vector encoder includes N first multi-head attention layers, each of which applies a multi-head attention mechanism and includes multiple attention modules. Through the multi-head attention mechanism, the fused word vectors of the multiple participles included in the first clause combination are input into the multiple attention modules respectively to obtain their outputs. For example, the first multi-head attention layer of the i-th layer includes h attention modules. The fused word vectors of the multiple participles are taken as the input of this first multi-head attention layer and fed into the h attention modules, each of which performs an attention operation on the input fused word vectors to produce its output. Each attention module may perform the attention operation using the following formulas (8) to (10), and the output result of the i-th attention module may be expressed by formula (10).
Q, K, V = C_p;  (8)

Q′_i = QW_i^Q,  K′_i = KW_i^K,  V′_i = VW_i^V;  (9)

Head_i = Attention(Q′_i, K′_i, V′_i);  (10)

where C_p represents the fused word vectors; W_i^Q, W_i^K, and W_i^V represent the query, key, and value weight matrices of the i-th attention module in the first multi-head attention layer; Q′_i, K′_i, and V′_i represent the query, key, and value matrices of the i-th attention module; Head_i represents the output matrix of the i-th attention module; and Attention represents the attention operation.
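The following sketch shows one attention module in the sense of formulas (8) to (10); standard scaled dot-product attention is assumed for the Attention operation, and all dimensions are illustrative.

```python
import torch

def attention_head(C_p, W_q, W_k, W_v):
    """One attention module: Q = K = V = C_p (formula (8)), projected by the
    module's weight matrices (formula (9)), then dot-product attention (10)."""
    Q, K, V = C_p @ W_q, C_p @ W_k, C_p @ W_v
    scores = Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5   # scaling is an assumption
    return torch.softmax(scores, dim=-1) @ V                # Head_i

m_tok, k, d = 50, 128, 32                 # tokens, model dim, per-head dim (assumed)
C_p = torch.randn(m_tok, k)               # fused word vectors
head = attention_head(C_p, *(torch.randn(k, d) for _ in range(3)))
print(head.shape)                         # torch.Size([50, 32])
```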
Step 502: the first addition normalization layer in the i-th attention network layer splices the output results of the multiple attention modules to obtain a splicing result; performs a linear transformation on the splicing result according to the output result of the (i−1)-th attention network layer to obtain the first output result of the first multi-head attention layer of the i-th attention network layer; and normalizes the first output result of the first multi-head attention layer of the i-th attention network layer to obtain a second output result.
In some embodiments, the data form of the output result of an attention module is a matrix, the data form of the splicing result is also a matrix, and the dimensionality of the splicing result equals the sum of the dimensionalities of the output results of the attention modules. The splicing may be transverse splicing, and the splicing process may be implemented by calling a Concat function. It should be understood that transverse splicing is merely illustrative. Optionally, the output results of the attention modules may be spliced in other ways; for example, longitudinal splicing may be used to obtain the splicing result, in which case the number of rows of the splicing result equals the sum of the numbers of rows of the output results of the attention modules.
In some embodiments, after obtaining the stitching result, the stitching result may be linearly transformed to obtain a first output result. The linear transformation may be performed by multiplying a weight matrix, multiplying the splicing result by the weight matrix, and taking the product as the first output result. Alternatively, the linear transformation may also adopt other manners besides the multiplication by the weight matrix, for example, the splicing result is multiplied by a certain constant to perform linear transformation on the splicing result, or the splicing result is added by a certain constant to perform linear transformation on the splicing result, and the manner adopted by the linear transformation in the embodiment of the present application is not limited specifically.
As an example, in the embodiment of the present application, when the output results of the multiple attention modules in the first multi-head attention layer of the i-th attention network layer are spliced, Concat splicing may be adopted to obtain the splicing result. Then, when the splicing result is linearly transformed, the first output result can be obtained by multiplying the splicing result by a weight matrix, where the weight matrix is the second output result of the (i−1)-th attention network layer. The first output result of the first multi-head attention layer of the i-th attention network layer satisfies the condition shown in the following formula (11):
MHA_i(Q, K, V) = Concat(Head_1, Head_2, …, Head_h)W_i^O;  (11)

where W_i^O represents the weight matrix of the i-th attention network layer (i.e., the second output result of the (i−1)-th attention network layer), Concat represents the splicing function, MHA_i(Q, K, V) represents the first output result of the i-th attention network layer, h represents the number of attention modules in the first multi-head attention layer of the i-th attention network layer, h is a positive integer greater than 1, and Head_1, …, Head_h represent the outputs of the h attention modules in the i-th attention network layer.
By utilizing the multi-head attention mechanism, long-distance features between participles in the text can be captured, rich contextual semantic representation information can be extracted, and the feature extraction capability is enhanced. Splicing the output results of the multiple attention modules introduces the original information when the first output result is calculated, which alleviates the problem of information loss. In addition, splicing the outputs of the multiple attention modules is equivalent to introducing a shortcut path, so that during backpropagation part of the gradient can flow directly to the original information without passing through the complex network, preventing gradient explosion or gradient vanishing.
In some embodiments, after the first output result is obtained, it is normalized to obtain the second output result. The normalization mean and variance satisfy the conditions shown in the following formulas (12) and (13):

μ_i = (1/h) Σ_{g=1..h} Head_g;  (12)

σ_i² = (1/h) Σ_{g=1..h} (Head_g − μ_i)²;  (13)

where h represents the number of attention modules in the i-th attention network layer, Head_g represents the output of the g-th attention module in the i-th attention network layer, μ_i represents the mean of the attention module outputs of the i-th layer, and σ_i² represents the variance of the attention module outputs of the i-th layer.
Each layer normalizes the first output result of the first multi-head attention layer of the i-th attention network layer with this shared mean and variance to obtain the second output result of the i-th attention network layer. For example, the second output result of the i-th attention network layer may be represented by M′ = (m′_1, m′_2, m′_3, …, m′_n), where m′_j satisfies the condition shown in the following formula (14):

m′_j = (m_j − μ_i) / σ_i;  (14)

where m′_j represents the j-th vector in the second output result of the i-th attention network layer, m′_j ∈ R^m, m represents the dimension of m′_j, and m_j represents the j-th vector in the first output result of the first multi-head attention layer of the i-th attention network layer.
After normalization, the data distributions become relatively consistent. During network propagation, shifts in the distribution often occur, which make backpropagation difficult. Performing a normalization operation at each layer makes the second output result of each layer follow a normal-like distribution. This normalization does not depend on the number or length of the input sequences and promotes the effectiveness of the neural network.
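A sketch of formulas (11) to (14) put together: the h head outputs are spliced, multiplied by the layer's weight matrix W_i^O, and normalized with the shared mean and variance computed over the head outputs. All sizes are assumptions.

```python
import torch

h, m_tok, d, k = 4, 50, 32, 128
heads = [torch.randn(m_tok, d) for _ in range(h)]   # Head_1 ... Head_h
W_o = torch.randn(h * d, k)                         # weight matrix W_i^O

M = torch.cat(heads, dim=-1) @ W_o                  # formula (11): first output result
stacked = torch.stack(heads)
mu_i, sigma_i = stacked.mean(), stacked.std()       # formulas (12)-(13), shared values
M_prime = (M - mu_i) / sigma_i                      # formula (14): second output result
print(M_prime.shape)                                # torch.Size([50, 128])
```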
In some embodiments, after the second output result of the i-th attention network layer is obtained, it may be used for the linear transformation of the (i+1)-th attention network layer. For example, after the fused word vectors of the multiple participles included in the first clause combination are input into the 1st attention network layer to obtain the second output result of the 1st attention network layer, that second output result may be used as the weight matrix of the linear transformation in the 2nd attention network layer. That is, the second output result of layer 1 may be represented as M′_1 and used as the weight matrix W_2^O of the linear transformation in the 2nd attention network layer. By analogy, the second output result M′_i of the i-th attention network layer may be used as the weight matrix W_{i+1}^O of the linear transformation in the (i+1)-th attention network layer, as shown in fig. 6. The weight matrices satisfy the conditions shown in the following formulas:

W_2^O = M′_1,  W_i^O = M′_{i−1},  W_{i+1}^O = M′_i;

where W_2^O represents the weight matrix of the 2nd attention network layer and M′_1 represents the second output result of the 1st attention network layer, MHA_2(Q, K, V) being the first output result of the 2nd attention network layer; W_i^O represents the weight matrix of the i-th attention network layer and M′_{i−1} represents the second output result of the (i−1)-th attention network layer, MHA_i(Q, K, V) being the first output result of the i-th attention network layer; and W_{i+1}^O represents the weight matrix of the (i+1)-th attention network layer and M′_i represents the second output result of the i-th attention network layer, MHA_{i+1}(Q, K, V) being the first output result of the (i+1)-th attention network layer.
Step 503: perform feature extraction on the second output result of the N-th attention network layer through the first neural network layer to obtain the feature matrix of the first participle of each clause combination.
In some embodiments, after the second output result of the N-th attention network layer is obtained, it may be linearly or nonlinearly transformed by the first neural network to obtain the feature matrix of the first participle of each clause combination. The linear transformation may include multiplication by a matrix and addition of an offset; the nonlinear transformation may be realized by a nonlinear function, for example a maximum operation such as the max function. The max function is only an exemplary implementation of the nonlinear transformation; other means, such as an activation function, may also be used. As an example, the first neural network in the embodiment of the present application may be implemented by a feedforward neural network; specifically, a two-layer fully-connected network may be used to perform feature extraction on the second output result of the N-th attention network layer, and the feature matrix of the first participle of each clause combination satisfies the condition shown in the following formula (15):
M_d = Relu((M′W_1 + b_1)W_2 + b_2);  (15)

where M′ represents the second output result of the N-th attention network layer, M_d represents the feature matrix of the first participle, W_1 and W_2 represent parameter matrices with W_1, W_2 ∈ R^{m×m}, b_1 and b_2 represent bias terms, and Relu represents the activation function.
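A minimal sketch of formula (15), assuming the two-layer fully-connected network mentioned above; the bias terms live inside the Linear layers:

```python
import torch
import torch.nn as nn

m = 128
fc1, fc2 = nn.Linear(m, m), nn.Linear(m, m)   # parameter matrices W_1, W_2 with biases

M_prime = torch.randn(50, m)                  # second output result M' of the N-th layer
M_d = torch.relu(fc2(fc1(M_prime)))           # M_d = Relu((M'W_1 + b_1)W_2 + b_2)
print(M_d.shape)                              # torch.Size([50, 128])
```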
Step 504: extract feature information from the feature matrix of the first participle through the first single-head attention layer to obtain the word-level feature vector of the first clause combination.
In some embodiments, a trainable attention vector may be used in place of the Q query vector of a conventional single-head attention layer to extract the feature information in the feature matrix of the first participle and obtain the word-level feature vector of the first clause combination. The word-level feature vector satisfies the conditions shown in the following formulas (16) to (18):
a_i = u_w · m_di;  (16)

a′_i = exp(a_i) / Σ_j exp(a_j);  (17)

s_j = Σ_i a′_i m_di;  (18)

where a_i is the attention weight produced by the attention mechanism, a′_i is the weight normalized by softmax, u_w is the initialized trainable attention vector with u_w ∈ R^m, m_di is the i-th column vector in the feature matrix of the first participle, and s_j represents the word-level feature vector, s_j ∈ R^m, with R^m denoting the m-dimensional vector space.
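A sketch of the pooling in formulas (16) to (18): a trainable vector u_w scores each entry of the feature matrix, softmax normalizes the scores, and the weighted sum yields the word-level feature vector. For convenience the feature matrix is stored row-wise here, whereas the text indexes columns m_di.

```python
import torch

m, n_tok = 128, 50
M_d = torch.randn(n_tok, m)                   # feature matrix, one row per m_di
u_w = torch.randn(m, requires_grad=True)      # initialized trainable attention vector

a = M_d @ u_w                                 # formula (16): a_i = u_w . m_di
a_prime = torch.softmax(a, dim=0)             # formula (17): softmax-normalized weights
s = a_prime @ M_d                             # formula (18): word-level feature vector
print(s.shape)                                # torch.Size([128])
```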
In some embodiments, feature splicing may be performed on the word-level feature vectors to obtain the sentence vector representation of each clause combination. For example, when the first clause combination includes n word-level feature vectors, the sentence vector of the first clause combination may be expressed as S = (s_1, s_2, s_3, …, s_n).
In some embodiments, feature extraction may be performed on the sentence vectors of each sentence combination by the second semantic vector encoder to obtain a sentence-level feature vector of each sentence combination. In some scenarios, the location vectors corresponding to the plurality of first sentence vectors included in the filtered first chapters may be encoded before feature extraction is performed on the sentence vectors. And the position vector corresponding to the first sentence vector is used for representing the position of the clause corresponding to the first sentence vector in the text corresponding to the first chapter.
In some embodiments, after obtaining the position vector corresponding to the first sentence vector, the position vector of the first sentence vector is fused with the first sentence vector to obtain a fused sentence vector. The specific method may refer to the above encoding method of the fused word vector, and is not described herein again. Further, the fused sentence vector corresponding to the multiple clauses may be input to the second semantic vector encoder and feature extraction may be performed, so as to obtain a clause-level feature vector of the first clause combination, as shown in fig. 4B.
In some embodiments, the second semantic vector encoder comprises N attention network layers, a second neural network layer, and a second single-head attention layer; each of the N attention network layers includes a second multi-head attention layer and a second addition normalization layer. Performing feature extraction on the first clause combination with the second semantic vector encoder, according to the fused sentence vectors of the multiple clauses included in the first clause combination, to obtain the sentence-level feature vector of the first clause combination includes the following steps:
Step 601: the multiple attention modules included in the second multi-head attention layer of the i-th attention network layer respectively perform attention calculations on the fused sentence vectors of the multiple clauses included in the first clause combination to obtain the outputs of the multiple attention modules.

Step 602: the second addition normalization layer in the i-th attention network layer splices the output results of the multiple attention modules to obtain a splicing result; performs a linear transformation on the splicing result according to the output result of the (i−1)-th attention network layer to obtain the third output result of the second multi-head attention layer of the i-th attention network layer, where i is a positive integer less than or equal to N; and normalizes the third output result of the second multi-head attention layer of the i-th attention network layer to obtain a fourth output result.
In some embodiments, the fourth output result may be used for a linear transformation of the (i + 1) th attention network layer.
Step 603: perform feature extraction on the fourth output result of the N-th attention network layer through the second neural network layer to obtain the feature matrix of the first clause of each clause combination.

Step 604: extract feature information from the feature matrix of the first clause through the second single-head attention layer to obtain the sentence-level feature vector of the first clause combination.
The specific method of steps 601-604 is the same as that of steps 501-504, and is not described herein again.
In some embodiments, after the sentence-level feature vectors of the multiple sentence combinations are obtained, the sentence-level feature vectors of the multiple sentence combinations may be subjected to feature concatenation to obtain a first feature vector corresponding to each chapter.
In some embodiments, after the first feature vector corresponding to each chapter is obtained, the prediction category of each chapter is determined according to the first feature vector. In this way, the information of the multiple chapters can be separated, determining which chapters' prediction categories correspond to the law article category, the criminal name category, and the criminal period category, respectively.
After the prediction category of each chapter is determined, criminal period prediction, law article prediction, and criminal name prediction can be performed by combining the second feature vector, the first feature vector of each chapter, and the prediction categories. For example, referring to fig. 7, the sentencing prediction apparatus may perform a nonlinear transformation on the second feature vector and the first feature vectors corresponding to the chapters whose prediction category is the law article category to obtain a law article prediction vector, and perform law article prediction according to the law article prediction vector. It may perform a nonlinear transformation on the second feature vector, the first feature vectors corresponding to the chapters whose prediction category is the criminal name category, and the law article prediction vector to obtain a criminal name prediction vector, and perform criminal name prediction according to the criminal name prediction vector. It may perform a nonlinear transformation on the second feature vector, the first feature vectors corresponding to the chapters whose prediction category is the criminal period category, the law article prediction vector, and the criminal name prediction vector to obtain a criminal period prediction vector, and perform criminal period prediction according to the criminal period prediction vector.
In one example, the law article prediction, criminal name prediction, and criminal period prediction may be performed by a classifier, which may include a law article prediction network, a criminal name prediction network, and a criminal period prediction network, as shown in fig. 8. The law article prediction network, the criminal name prediction network, and the criminal period prediction network may each adopt a forward propagation network. With reference to fig. 3 and 4B, after the prediction category corresponding to each chapter is obtained through the network that encodes the crime fact description text corresponding to fig. 4B, and the second feature vector is obtained through the network corresponding to fig. 3, the second feature vector is spliced with the first feature vectors corresponding to the chapters whose prediction category is the law article category (see formula (19)) to obtain a first spliced vector, and the first spliced vector is input into the classifier (see formulas (20) and (21)) to perform law article prediction. Generally, a law includes multiple articles, and each distinct article may be set as one article category. The calculation process of the law article prediction satisfies the conditions shown in the following formulas (19) to (21):
T̂_l = Concat(T_m, T_1);  (19)

T_l^1 = Relu(T̂_l W_l^1 + b_l^1);  (20)

T_l = Softmax(T_l^1);  (21)

where T̂_l is the first spliced vector, T_m is the second feature vector, T_1 denotes the first feature vectors corresponding to the multiple chapters whose prediction category is the law article category, T_l^1 is the law article prediction vector, T_l is the probability distribution over the article categories in the law article prediction, T_l ∈ R^x with x being the number of article categories in the law article prediction, W_l^1 is a weight matrix, b_l^1 is a bias term, and Relu is the activation function.
In some embodiments, the input vectors for the criminal name prediction include T_m, T_2, and the law article prediction vector T_l^1. T_m, T_2, and T_l^1 are spliced to obtain a second spliced vector (see formula (22)), and the second spliced vector is input into the classifier (see formulas (23) and (24)) to perform criminal name prediction. In general, there may be multiple criminal names, and each distinct criminal name may be set as one criminal name category. The calculation process of the criminal name prediction satisfies the conditions shown in the following formulas (22) to (24):
T̂_ch = Concat(T_m, T_2, T_l^1);  (22)

T_ch^1 = Relu(T̂_ch W_ch^1 + b_ch^1);  (23)

T_ch = Softmax(T_ch^1);  (24)

where T̂_ch represents the second spliced vector, T_m represents the second feature vector, T_l^1 represents the law article prediction vector, T_2 denotes the first feature vectors corresponding to the multiple chapters whose prediction category is the criminal name category, T_ch^1 represents the criminal name prediction vector, T_ch represents the probability distribution over the criminal name categories in the criminal name prediction, T_ch ∈ R^y with y being the number of criminal name categories, W_ch^1 is a weight matrix, b_ch^1 is a bias term, and Relu is the activation function.
In some embodiments, the input vectors for the criminal period prediction include T_m, T_3, T_l^1, and the criminal name prediction vector T_ch^1. T_m, T_3, T_l^1, and T_ch^1 are spliced to obtain a third spliced vector (see formula (25)), and the third spliced vector is input into the classifier (see formulas (26) and (27)) to perform criminal period prediction. In general, a criminal period may include a variety of terms, such as 5 years, 10 years, exemption from punishment, the death penalty, and life imprisonment. Each distinct criminal period may be set as one period category. The calculation process of the criminal period prediction satisfies the conditions shown in the following formulas (25) to (27):
T̂_p = Concat(T_m, T_3, T_l^1, T_ch^1);  (25)

T_p^1 = Relu(T̂_p W_p^1 + b_p^1);  (26)

T_p = Softmax(T_p^1);  (27)

where T̂_p represents the third spliced vector, T_m represents the second feature vector, T_l^1 represents the law article prediction vector, T_ch^1 represents the criminal name prediction vector, T_3 denotes the first feature vectors corresponding to the multiple chapters whose prediction category is the criminal period category, T_p^1 represents the criminal period prediction vector, T_p represents the probability distribution over the period categories in the criminal period prediction, T_p ∈ R^z with z being the number of period categories, W_p^1 is a weight matrix, b_p^1 is a bias term, and Relu is the activation function.
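A minimal sketch of the topological multi-task head of formulas (19) to (27): the law article head feeds the criminal name head, and both feed the criminal period head. Layer shapes, category counts, and the use of Softmax outputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

dm, d1, d2, d3, dh = 64, 128, 128, 128, 96     # assumed feature and hidden dims
x_art, y_name, z_term = 30, 20, 10             # assumed numbers of categories

law_fc = nn.Linear(dm + d1, dh)                # formulas (19)-(20)
name_fc = nn.Linear(dm + d2 + dh, dh)          # formulas (22)-(23)
term_fc = nn.Linear(dm + d3 + dh + dh, dh)     # formulas (25)-(26)
law_out, name_out, term_out = (nn.Linear(dh, n) for n in (x_art, y_name, z_term))

T_m, T_1, T_2, T_3 = (torch.randn(d) for d in (dm, d1, d2, d3))

T_l1 = torch.relu(law_fc(torch.cat([T_m, T_1])))            # law article prediction vector
T_l = torch.softmax(law_out(T_l1), dim=-1)                  # article-category distribution
T_ch1 = torch.relu(name_fc(torch.cat([T_m, T_2, T_l1])))    # criminal name prediction vector
T_ch = torch.softmax(name_out(T_ch1), dim=-1)               # criminal-name distribution
T_p1 = torch.relu(term_fc(torch.cat([T_m, T_3, T_l1, T_ch1])))
T_p = torch.softmax(term_out(T_p1), dim=-1)                 # criminal-period distribution
```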
In some embodiments, the sentencing prediction model may be trained with multiple samples in a training set. Each sample includes a judgment document and case-related information. The judgment document includes document information, informant information, the crime fact description, and court judgment information. The document information is the abstract and title of the judgment document, and the court judgment information includes fact-finding information and label information. The fact-finding information includes conclusion-type information, amount-type information, circumstance-type information, consequence-type information, and confession-attitude information. Three types of labels are extracted: the related law articles; the criminal names, which include multiple criminal name categories (e.g., including innocence); and the criminal periods (e.g., including exemption from punishment, the death penalty, and life imprisonment). When the sentencing prediction model is trained, the multiple samples may be input into it over multiple iterations, one sample at a time. The network parameters of the sentencing prediction model are adjusted by comparing the law article prediction result output by the model for the sample with the law article label in the sample labels, by comparing the criminal period prediction result with the criminal period label, and by comparing the criminal name prediction result with the criminal name label.
In one possible example, when adjusting the network parameters, the comparison result (i.e., the loss value) may be determined through a loss function, and the network parameters adjusted accordingly. The loss function may be, for example, a cross-entropy loss function.
The loss of each prediction task may be obtained using a cross-entropy loss function, whose expression satisfies the condition shown in the following formula (28):

loss_l = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)];  (28)

where y indicates whether the data belongs to the current category (for example, y may take the value 0 or 1), l denotes the task and takes the value 1, 2, or 3, ŷ denotes the prediction result, i.e., the probability of belonging to the current category, and loss_l is the loss value when the i-th sample is predicted under task l. The value 1, 2, or 3 identifies the loss value of the criminal name prediction, the law article prediction, or the criminal period prediction; for example, 1 identifies the criminal name prediction, 2 the law article prediction, and 3 the criminal period prediction.
In some embodiments, the loss values determined by the loss function for the criminal name prediction, the law article prediction, and the criminal period prediction may be accumulated to obtain a total loss value, and the network parameters of the sentencing prediction model may then be adjusted based on the total loss value. In some scenarios, the network parameters may be adjusted by an optimization algorithm, for example, the Adam optimization algorithm.
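A minimal training-step sketch of what is described above, assuming binary cross-entropy per task as in formula (28); `model` here is a hypothetical stand-in, not the actual sentencing prediction model:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 3)                        # hypothetical stand-in network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()                    # cross-entropy loss of formula (28)

logits = model(torch.randn(4, 16))              # fake batch; one output per task
targets = torch.randint(0, 2, (4, 3)).float()   # y in {0, 1} for each task

loss_name = bce(logits[:, 0], targets[:, 0])    # criminal name loss (l = 1)
loss_article = bce(logits[:, 1], targets[:, 1]) # law article loss (l = 2)
loss_term = bce(logits[:, 2], targets[:, 2])    # criminal period loss (l = 3)

total_loss = loss_name + loss_article + loss_term   # accumulate the task losses
optimizer.zero_grad()
total_loss.backward()
optimizer.step()                                # adjust parameters with Adam
```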
In some possible embodiments, different networks in the sentencing prediction model are trained separately; for example, the network of fig. 4B used to encode the crime fact description text is trained on its own. Each sample in the training set of this network may include a judgment document and a label corresponding to each sentence or paragraph in the judgment document, the label indicating whether the sentence or paragraph belongs to the criminal name category, the criminal period category, or the law article category. When this network is trained, multiple samples may be input over multiple iterations, one sample at a time; the prediction result output by the network for the sample is compared with the categories (criminal name category, criminal period category, and law article category) in the sample label, and each network parameter of the network is adjusted according to the comparison result.
In one possible example, when adjusting the network parameters, the comparison result may be determined through a loss function, and the network parameters adjusted accordingly. The loss function may be, for example, a cross-entropy loss function.
The loss of each prediction task may be obtained using a cross-entropy loss function, whose expression satisfies the condition shown in the following formula (29):

loss_l = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)];  (29)

where y indicates whether the data belongs to the current category (the criminal name category, the law article category, or the criminal period category; for example, y may take the value 0 or 1), l denotes the category type and takes the value 1, 2, or 3, ŷ denotes the prediction result, i.e., the probability of belonging to the current category, and loss_l is the loss value when the i-th sample is predicted as category type l. The value 1, 2, or 3 identifies the criminal name category, the law article category, or the criminal period category; for example, 1 identifies the criminal name category, 2 the law article category, and 3 the criminal period category.
In some embodiments, during training the data of the probability distribution of the prediction categories, which refers to the combination of the multiple chapters and the predicted classification results, may be represented by a set of numbers consisting of 1 and 0. For example, each chapter may correspond to a sequence number; when the probability-distribution data is [35273, label1], it represents that the prediction category label of the 35273rd chapter is the first category. If the prediction result is consistent with the manual annotation, it is marked as 1; otherwise, it is marked as 0.
Based on the same technical concept, an embodiment of the present application provides a sentencing prediction apparatus 800, shown in fig. 9. The apparatus 800 may perform the steps of the aforementioned sentencing prediction method, which are not described in detail here to avoid repetition. The apparatus 800 includes an acquiring unit 801 and a processing unit 802.
The acquiring unit 801 is configured to acquire case-related information and a crime fact description text, where the case-related information includes at least one of evidence, physical evidence, informant information, witness testimony, the suspect's statement, and written records; the crime fact description text comprises multiple chapters, each of the multiple chapters comprises multiple clauses, and each of the multiple clauses comprises multiple participles.
The processing unit 802 is configured to perform vectorization processing on the participles included in the multiple chapters to obtain a first word vector corresponding to each participle, and perform vectorization processing on the participles included in the case-related information to obtain a second word vector corresponding to each participle; and to perform feature extraction on the first word vectors included in the multiple chapters to obtain a first feature vector of each of the multiple chapters, and determine the prediction category of each chapter according to its first feature vector, where the prediction category is the law article category, the criminal name category, or the criminal period category.

The processing unit 802 is further configured to perform feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and to perform law article prediction, criminal name prediction, and criminal period prediction according to the prediction categories, the first feature vectors corresponding to the multiple chapters, and the second feature vector.
In some embodiments, when performing feature extraction on the first word vectors included in the multiple chapters to obtain the first feature vector of each chapter, the processing unit 802 is specifically configured to: filter the participles in a first chapter based on the first word vectors in the first chapter to obtain a filtered first chapter, where the first word vectors in the multiple clauses of the filtered first chapter are all related to sentencing prediction and the first chapter is any one of the multiple chapters; combine the multiple clauses included in the filtered first chapter to obtain multiple clause combinations, where each of the multiple clause combinations includes at least two clauses;

perform feature extraction on each clause combination through the first semantic vector encoder to obtain a word-level feature vector of each clause combination, and perform feature splicing on the word-level feature vectors to obtain the sentence vector representation of each clause combination;

perform feature extraction on the sentence vector of each clause combination through the second semantic vector encoder to obtain a sentence-level feature vector of each clause combination, and perform feature splicing on the sentence-level feature vectors of the multiple clause combinations to obtain the first feature vector.
In some embodiments, the processing unit 802 is further configured to: encode position vectors corresponding to the multiple first word vectors included in the filtered first chapter, where the position vector corresponding to a first word vector is used to represent the position of the corresponding participle in the text of the first chapter; and fuse the first word vectors of the multiple participles included in the filtered first chapter with the corresponding position vectors to obtain fused word vectors of the multiple participles in the first chapter.
the processing unit 802, when performing feature extraction on each sentence combination through the first semantic vector encoder to obtain a word-level feature vector of each sentence combination, is specifically configured to: and performing feature extraction on a first sentence combination by adopting a first semantic vector encoder according to a fused word vector of a plurality of clauses included in the first sentence combination to obtain a word-level feature vector of the first sentence combination, wherein the first sentence combination is any one of the plurality of sentence combinations.
In other embodiments, the processing unit 802 is further configured to: encode position vectors corresponding to the multiple first sentence vectors included in the filtered first chapter, where the position vector corresponding to a first sentence vector is used to represent the position of the corresponding clause in the text of the first chapter; and fuse the multiple first sentence vectors included in the filtered first chapter with the corresponding position vectors to obtain fused sentence vectors of the multiple clauses in the first chapter.
the processing unit 802, when performing feature extraction on the sentence vector of each sentence combination by using the second semantic vector encoder to obtain a sentence-level feature vector of each sentence combination, is specifically configured to: and performing feature extraction on the first sentence combination by adopting a second semantic vector encoder according to a fusion sentence vector of a plurality of sentences included in the first sentence combination to obtain a sentence-level feature vector of the first sentence combination, wherein the first sentence combination is any one of the plurality of sentence combinations.
In some embodiments, the first semantic vector encoder comprises N attention network layers, a first neural network layer, and a first single-headed attention layer; each of the N attention network layers comprises a first multi-headed attention layer and a first additive normalization layer; n is a positive integer;
the processing unit 802, when performing feature extraction on a first sentence combination by using a first semantic vector encoder according to a fused word vector of a plurality of participles included in the first sentence combination to obtain a word-level feature vector of the first sentence combination, is specifically configured to: have the plurality of attention modules included in the first multi-head attention layer in the i-th attention network layer respectively perform attention operations on the fused word vectors of the plurality of participles included in the first sentence combination to obtain outputs of the plurality of attention modules; have the first addition normalization layer in the i-th attention network layer splice the output results of the plurality of attention modules to obtain a splicing result, and perform a linear transformation on the splicing result according to the output result of the (i−1)-th attention network layer to obtain a first output result of the first multi-head attention layer of the i-th attention network layer, i being a positive integer less than or equal to N; normalize the first output result of the first multi-head attention layer of the i-th attention network layer to obtain a second output result, the second output result being used for the linear transformation of the (i+1)-th attention network layer; perform feature extraction on the second output result of the N-th attention network layer through the first neural network layer to obtain a feature matrix of the first participle of each sentence combination; and extract feature information in the feature matrix of the first participle through the first single-head attention layer to obtain the word-level feature vector of the first sentence combination.
In other embodiments, the second semantic vector encoder includes N attention network layers, a second neural network layer, and a second single-headed attention layer; each of the N attention network layers comprises a second multi-headed attention layer and a second additive normalization layer; n is a positive integer;
the processing unit 802, when performing feature extraction on a first sentence combination by using a second semantic vector encoder according to a fused sentence vector of a plurality of sentences included in the first sentence combination to obtain a sentence-level feature vector of the first sentence combination, is specifically configured to: a plurality of attention modules included in the second multi-head attention layer in the ith attention network layer respectively perform attention calculation on a fusion sentence vector of a plurality of sentences included in the first sentence combination to obtain outputs of the plurality of attention modules; the second addition normalization layer in the ith attention network layer splices the output results of the plurality of attention modules to obtain a spliced result; performing linear transformation on the splicing result according to the output result of the (i-1) th attention network layer to obtain a third output result of a second multi-head attention layer of the (i) th attention network layer; i is a positive integer less than or equal to N; normalizing the third output result of the second multi-head attention layer of the ith attention network layer to obtain a fourth output result, wherein the fourth output result is used for linear transformation of the (i + 1) th attention network layer; performing feature extraction on the fourth output result of the Nth attention network layer through the second neural network layer to obtain a feature matrix of the first clause of each clause combination; and extracting characteristic information in a characteristic matrix of the first clause through the second single-head attention layer to obtain a clause-level characteristic vector of the first clause combination.
In some embodiments, when performing the law article prediction, the criminal name prediction, and the criminal period prediction according to the prediction categories, the first feature vectors corresponding to the multiple chapters, and the second feature vector, the processing unit 802 is specifically configured to: perform a nonlinear transformation on the second feature vector and the first feature vectors, among those corresponding to the multiple chapters, whose prediction category is the law article category to obtain a law article prediction vector, and perform law article prediction according to the law article prediction vector; perform a nonlinear transformation on the second feature vector, the first feature vectors whose prediction category is the criminal name category, and the law article prediction vector to obtain a criminal name prediction vector, and perform criminal name prediction according to the criminal name prediction vector; and perform a nonlinear transformation on the second feature vector, the first feature vectors whose prediction category is the criminal period category, the law article prediction vector, and the criminal name prediction vector to obtain a criminal period prediction vector, and perform criminal period prediction according to the criminal period prediction vector.
In other embodiments, the case-related information includes first data and second data; the first data includes at least one of witness testimony, the suspect's statement, and written records, and the second data includes at least one of evidence, physical evidence, and informant information. When performing vectorization processing on the participles included in the case-related information to obtain the second word vector corresponding to each participle, the processing unit 802 is specifically configured to: perform vectorization processing on the participles included in the first data to obtain the second word vector corresponding to each participle in the first data; determine the category to which each participle included in the second data belongs, and determine, from a data vector table, the category vector corresponding to that category, the data vector table including the category vectors corresponding to multiple categories; and determine the second word vector corresponding to each participle according to the category vector corresponding to the category to which each participle included in the second data belongs.
In some embodiments, the processing unit 802 is specifically configured to, when filtering the participles in the first chapter based on the first word vectors included in the first chapter to obtain the filtered first chapter: filter, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
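The patent does not spell out the filtering criterion, so the sketch below shows one plausible reading under stated assumptions: a one-dimensional convolution scores each participle in its local context, and only the top-k highest-scoring word vectors are kept as the filtered chapter. The class name ConvTokenFilter and the score-and-keep-top-k scheme are editorial assumptions.

import torch
import torch.nn as nn

class ConvTokenFilter(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # scores each position from a window of neighboring word vectors
        self.conv = nn.Conv1d(dim, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor, k: int) -> torch.Tensor:
        # x: (batch, seq_len, dim) first word vectors of one chapter
        scores = self.conv(x.transpose(1, 2)).squeeze(1)           # (batch, seq_len)
        idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values   # keep original word order
        return x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))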
Based on the same technical concept, an embodiment of the present application provides a sentencing prediction device 1000, as shown in fig. 10. The device 1000 can perform the steps of the aforementioned sentencing prediction method; to avoid repetition, they are not described in detail here. The device 1000 includes a memory 1001 and a processor 1002.
The memory 1001 is configured to store program instructions;
the processor 1002 is configured to call the program instructions stored in the memory and execute the above sentencing prediction method according to the obtained program.
An embodiment of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the method according to the first aspect and any of its possible implementations.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A sentencing prediction method, comprising:
acquiring case-related information and a criminal fact description text, wherein the case-related information comprises at least one of documentary evidence, material evidence, informant information, witness testimony, the suspect's oral statement, and written records; the criminal fact description text comprises a plurality of chapters, each chapter of the plurality of chapters comprises a plurality of clauses, and each clause of the plurality of clauses comprises a plurality of participles;
vectorizing the participles included in the plurality of chapters to obtain a first word vector corresponding to each participle, and vectorizing the participles included in the case-related information to obtain a second word vector corresponding to each participle;
performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each chapter of the plurality of chapters, and determining a prediction category of each chapter according to the first feature vector of each chapter, wherein the prediction category is a law article category, a criminal name category, or a criminal period category;
performing feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information;
and performing law article prediction, criminal name prediction, and criminal period prediction according to the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters and the second feature vector.
2. The method of claim 1, wherein performing feature extraction on the first word vectors included in the plurality of chapters to obtain the first feature vector of each chapter of the plurality of chapters comprises:
filtering the participles in a first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, wherein the first word vectors in the plurality of clauses included in the filtered first chapter are related to sentencing prediction, and the first chapter is any one of the plurality of chapters;
combining the plurality of clauses included in the filtered first chapter to obtain a plurality of clause combinations, wherein each clause combination of the plurality of clause combinations comprises at least two clauses;
performing feature extraction on each clause combination through a first semantic vector encoder to obtain a word-level feature vector of each clause combination;
performing feature splicing on the word-level feature vectors of the plurality of clause combinations to obtain a sentence vector representation of each clause combination;
performing feature extraction on the sentence vector of each clause combination through a second semantic vector encoder to obtain a sentence-level feature vector of each clause combination;
and performing feature splicing on the sentence-level feature vectors of the plurality of clause combinations to obtain the first feature vector.
3. The method of claim 2, wherein the method further comprises:
encoding position vectors corresponding to a plurality of first word vectors included in the filtered first chapter, wherein the position vector corresponding to a first word vector is used for representing the position of the participle corresponding to that first word vector in the text corresponding to the first chapter;
fusing the first word vectors of the plurality of participles included in the filtered first chapter with the corresponding position vectors to obtain fused word vectors of the plurality of participles in the first chapter;
wherein performing feature extraction on each clause combination through the first semantic vector encoder to obtain the word-level feature vector of each clause combination comprises:
performing feature extraction on a first clause combination by using the first semantic vector encoder according to the fused word vectors of the plurality of participles included in the first clause combination to obtain a word-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
4. The method of claim 2, wherein the method further comprises:
encoding position vectors corresponding to a plurality of first sentence vectors included in the filtered first chapter, wherein the position vector corresponding to a first sentence vector is used for representing the position of the clause corresponding to that first sentence vector in the text corresponding to the first chapter;
fusing the plurality of first sentence vectors included in the filtered first chapter with the corresponding position vectors to obtain fused sentence vectors of the plurality of clauses in the first chapter;
wherein performing feature extraction on the sentence vector of each clause combination through the second semantic vector encoder to obtain the sentence-level feature vector of each clause combination comprises:
performing feature extraction on a first clause combination by using the second semantic vector encoder according to the fused sentence vectors of the plurality of clauses included in the first clause combination to obtain a sentence-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
5. The method of any one of claims 1-4, wherein performing the law article prediction, the criminal name prediction, and the criminal period prediction according to the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters and the second feature vector comprises:
performing a nonlinear transformation on the second feature vector and the first feature vector whose prediction category is the law article category among the plurality of first feature vectors corresponding to the plurality of chapters to obtain a law article prediction vector, and performing law article prediction according to the law article prediction vector;
performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the criminal name category among the plurality of first feature vectors corresponding to the plurality of chapters, and the law article prediction vector to obtain a criminal name prediction vector, and performing criminal name prediction according to the criminal name prediction vector;
and performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the criminal period category among the plurality of first feature vectors corresponding to the plurality of chapters, the law article prediction vector, and the criminal name prediction vector to obtain a criminal period prediction vector, and performing criminal period prediction according to the criminal period prediction vector.
6. The method of any one of claims 1-4, wherein the case-related information comprises first data and second data; the first data comprises at least one of witness testimony, the suspect's oral statement, and written records, and the second data comprises at least one of documentary evidence, material evidence, and informant information; and vectorizing the participles included in the case-related information to obtain the second word vector corresponding to each participle comprises:
vectorizing the participles included in the first data to obtain a second word vector corresponding to each participle in the first data;
determining the category to which each participle included in the second data belongs, and determining, from a data vector table, a category vector corresponding to the category to which each participle included in the second data belongs, wherein the data vector table comprises category vectors corresponding to a plurality of categories; and determining the second word vector corresponding to each participle according to the category vector corresponding to the category to which that participle belongs.
7. The method of any one of claims 1-4, wherein filtering the participles in the first chapter based on the first word vectors included in the first chapter to obtain the filtered first chapter comprises:
filtering, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
8. A sentencing prediction apparatus, comprising an acquisition unit and a processing unit;
wherein the acquisition unit is configured to acquire case-related information and a criminal fact description text, wherein the case-related information comprises at least one of documentary evidence, material evidence, informant information, witness testimony, the suspect's oral statement, and written records; the criminal fact description text comprises a plurality of chapters, each chapter of the plurality of chapters comprises a plurality of clauses, and each clause of the plurality of clauses comprises a plurality of participles;
the processing unit is configured to vectorize the participles included in the plurality of chapters to obtain a first word vector corresponding to each participle, and vectorize the participles included in the case-related information to obtain a second word vector corresponding to each participle; and to perform feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each chapter of the plurality of chapters, and determine a prediction category of each chapter according to the first feature vector of each chapter, wherein the prediction category is a law article category, a criminal name category, or a criminal period category;
and the processing unit is further configured to perform feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information, and to perform law article prediction, criminal name prediction, and criminal period prediction according to the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters and the second feature vector.
9. A sentencing prediction apparatus, comprising a memory and a processor;
wherein the memory is configured to store program instructions;
and the processor is configured to call the program instructions stored in the memory and execute the method of any one of claims 1-7 according to the obtained program.
10. A computer-readable storage medium having stored thereon computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-7.
CN202210365513.7A 2022-04-07 2022-04-07 Sentencing prediction method and device Active CN114860900B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210365513.7A CN114860900B (en) 2022-04-07 2022-04-07 Sentencing prediction method and device

Publications (2)

Publication Number Publication Date
CN114860900A true CN114860900A (en) 2022-08-05
CN114860900B CN114860900B (en) 2024-11-01

Family

ID=82630211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210365513.7A Active CN114860900B (en) Sentencing prediction method and device

Country Status (1)

Country Link
CN (1) CN114860900B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376227A (en) * 2018-10-29 2019-02-22 山东大学 A kind of prison term prediction technique based on multitask artificial neural network
CN109472021A (en) * 2018-10-12 2019-03-15 北京诺道认知医学科技有限公司 Critical sentence screening technique and device in medical literature based on deep learning
CN111160050A (en) * 2019-12-20 2020-05-15 沈阳雅译网络技术有限公司 Chapter-level neural machine translation method based on context memory network
CN111768024A (en) * 2020-05-20 2020-10-13 中国地质大学(武汉) Criminal period prediction method and equipment based on attention mechanism and storage equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant