CN113642756B - Criminal investigation period prediction method based on deep learning technology - Google Patents

Criminal investigation period prediction method based on deep learning technology

Info

Publication number
CN113642756B
CN113642756B (application CN202110584847.9A)
Authority
CN
China
Prior art keywords
sentence
word
criminal
vector
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110584847.9A
Other languages
Chinese (zh)
Other versions
CN113642756A (en)
Inventor
Peng Zhang (张鹏)
Yao Chi (池瑶)
Tun Lu (卢暾)
Ning Gu (顾宁)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN202110584847.9A
Publication of CN113642756A
Application granted
Publication of CN113642756B
Status: Active

Classifications

    • G06Q 10/04 — Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F 18/2415 — Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 40/242 — Dictionaries (lexical tools for natural language analysis)
    • G06F 40/289 — Phrasal analysis, e.g. finite state techniques or chunking (recognition of textual entities)
    • G06F 40/30 — Semantic analysis
    • G06N 3/045 — Combinations of networks (neural network architectures)
    • G06N 3/08 — Learning methods (neural networks)
    • G06Q 50/18 — Legal services


Abstract

The invention belongs to the technical field of judicial judgment, in particular to a criminal investigation period prediction method based on deep learning technology. The invention designs a multi-document prediction model based on a hierarchical attention network for document classification: starting from the main content of the criminal investigation document, it migrates content from the associated criminal judgment document and the cited legal provisions, and thereby better predicts and identifies the criminal investigation period of sentence-reduction cases. The concrete steps comprise: collecting data and preprocessing; encoding vectors of the multiple documents by a text encoder; and determining the distribution of the criminal investigation period. The method can therefore meet the requirement of predicting the criminal investigation period more accurately.

Description

Criminal investigation period prediction method based on deep learning technology
Technical Field
The invention belongs to the technical field of judicial judgment, and particularly relates to a criminal investigation period prediction method based on a deep learning technology.
Background
The invention constructs a prediction and identification model based on a multi-document attention mechanism: according to the basic information of the criminal investigation document, it predicts the criminal investigation period by combining the information of the other documents related to the crime. This better matches the real-world logic of such decisions, so the criminal investigation period is predicted more accurately and effectively.
Disclosure of Invention
The invention aims to provide a method capable of objectively and correctly predicting the criminal investigation period.
The prediction method is based on deep learning technology: it utilizes a multi-document attention mechanism to construct a prediction model which, according to the basic information of the criminal investigation document, migrates and combines the information of other documents related to the crime to predict the criminal investigation period.
In order to enable the deep learning model to combine more information across documents and meet the requirement of predicting the criminal investigation period more accurately, the invention designs a novel cross-document prediction model. Documents are classified through a hierarchical attention network (Hierarchical Attention Networks for Document Classification) [1]; based on the main content of the criminal investigation document, a local context-vector mechanism migrates content from the criminal judgment document and the cited legal provisions, so that the criminal investigation period is better predicted and identified. The model can be trained and deployed on a server and called through an interface for prediction.
The method comprises the following specific steps:
(I) Collecting data and preprocessing
First, the criminal investigation document data, the corresponding criminal judgment document data, and the key legal bases are collected and preprocessed.
The preprocessing comprises the following steps:
(1) Regular matching: the main material submitted during the sentence-reduction process, namely the sentence reduction (parole) review form, is compared with the content of the documents, confirming that the parts covering the crime name (Crime), the original sentence change (Judge), the reform performance (Performance), the crime facts (Fact), and the legal basis (Article) are the key parts for determining the criminal investigation period. Because the document format follows a certain standard, the corresponding data can be extracted by regular matching with hand-written rules. This key information is extracted from the criminal investigation document and the corresponding criminal judgment document respectively to obtain the data required by the model.
(2) Data cleaning: words and sentences that are mentioned repeatedly in the text but are irrelevant to predicting the criminal investigation period are deleted by regular matching; Chinese numerals are converted into Arabic numerals; and records left empty after cleaning are deleted.
(3) Word segmentation: the cleaned data are processed in two steps: character- or word-level segmentation, and stop-word removal.
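A minimal sketch of this preprocessing, assuming jieba for segmentation and cn2an for numeral conversion; the regular expressions and section names are illustrative assumptions, not the rules used by the patent:

    # Requires: pip install jieba cn2an
    import re
    import jieba   # Chinese word segmentation
    import cn2an   # Chinese-numeral to Arabic-numeral conversion

    # Hypothetical patterns for the five key parts; the real review-form rules differ.
    SECTION_PATTERNS = {
        "crime":       r"犯(.{1,30}?)罪",
        "judge":       r"判处(.+?)。",
        "performance": r"服刑期间(.+?)。",
        "fact":        r"经审理查明(.+?)。",
        "article":     r"依照(.+?)之规定",
    }

    def preprocess(document: str, stopwords: set) -> dict:
        """Extract the five key sections, clean them, and segment them into words."""
        sections = {}
        for name, pattern in SECTION_PATTERNS.items():
            match = re.search(pattern, document, flags=re.S)
            text = match.group(1) if match else ""
            text = cn2an.transform(text, "cn2an")  # e.g. 三年 -> 3年
            sections[name] = [w for w in jieba.lcut(text) if w not in stopwords]
        return sections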
(II) Designing the model text encoder
The main part of the multi-document criminal investigation period prediction model is the text encoder (Text Encoder), which consists of three parts: a static text encoder (Static Text Encoder), a (crime-name) entity embedding encoder (Crime Encoder), and a dynamic text encoder (Dynamic Text Encoder). The text encoder takes the crime name, original sentence change, reform performance, crime facts, and legal basis as input, and the corresponding encoders generate the encoding vectors d_crime, d_judge, d_performance, d_fact, and d_article.
a) Static text encoder
The static text encoder (Static Text Encoder) employs the hierarchical attention network model [1]. The paragraph documents of the original sentence change (Judge) and the reform performance (Performance) are turned into representation vectors by the static text encoder; in this model, global context vectors (Context Vectors) are used to select the words and sentences rich in information, i.e. the u shown in FIG. 1.
The following describes the specific contents:
the hierarchical attention network model consists of several parts: (1) a word sequence encoder; (2) a word-level attention layer; (3) a sentence encoder; (4) a sentence-level attention layer; the specific explanation of each part is as follows:
assume that a document has L sentences in a text i Representing the ith sentence, wherein the ith sentence contains T i Individual words, word it (i∈[1,L],t∈[1,T i ]) Representing the t-th word in the i-th sentence. Hierarchical note network models can map individual document raw text into a vector representation. Wherein:
(1) Word sequence Encoder (Word Encoder)
The sentence is preprocessed, each word is converted into a word vector by an embedding matrix, and the context representation of the word is then obtained through a bidirectional GRU layer.
The encoding part can introduce BERT pre-training, so that the model transfers and learns word- and sentence-level information from other texts, improving the expressiveness of some word and sentence vectors and thus the accuracy of the model.
For the ith sentence, the word encoding process of the sentence is formulated as follows:
x_{it} = W_{we} word_{it}, t ∈ [1, T_i];
→h_{it} = GRU(x_{it}), t ∈ [1, T_i];
←h_{it} = GRU(x_{it}), t ∈ [T_i, 1];
h_{it} = [→h_{it}, ←h_{it}];
wherein word_{it} is the one-hot encoded representation of the t-th word in the i-th sentence, the i-th sentence has T_i words, and W_{we} is the embedding matrix, so that the resulting x_{it} is the pre-trained vector representation of the t-th word in the i-th sentence. →h_{it} and ←h_{it} are the outputs of the bidirectional GRU layer, and their concatenation h_{it} is the hidden vector representation of the t-th word in the i-th sentence.
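A minimal Keras sketch of this word-sequence encoder; the vocabulary size, embedding dimension, and GRU width are assumed hyper-parameters, as the description does not state them:

    import tensorflow as tf
    from tensorflow.keras import layers

    vocab_size, embed_dim, gru_units = 50000, 200, 100   # assumed hyper-parameters

    words = layers.Input(shape=(None,), dtype="int32")   # word ids of one sentence
    x = layers.Embedding(vocab_size, embed_dim)(words)   # x_it = W_we · word_it
    # h_it = [→h_it, ←h_it]: forward and backward GRU states, concatenated
    h = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(x)
    word_encoder = tf.keras.Model(words, h)

The embedding matrix could equally be initialized from pre-trained vectors, matching the BERT pre-training mentioned above.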
(2) Word level Attention layer (Word Attention)
Not all words contribute equally to the representation of a sentence's meaning. Accordingly, an attention mechanism is introduced to extract the words that are important for the semantic representation of the sentence and to aggregate their representations into a sentence vector. First, the hidden vector of the t-th word in the i-th sentence is passed through a single fully-connected layer to obtain its learned representation u_{it}; then a word-level context vector u_w is initialized, and the similarity between u_{it} and u_w is calculated and SoftMax-normalized, yielding a measure α_{it} of the importance of the t-th word within the i-th sentence. The sentence vector s_i is then obtained as the weighted sum of the word representations. The context vector u_w can be regarded as a high-level representation of the fixed question "which words are informative"; it is randomly initialized and learned jointly with the other parameters during training.
The specific process is expressed as follows:
u_{it} = tanh(W_{wa} h_{it} + b_{wa});
α_{it} = exp(u_{it}ᵀ u_w) / Σ_t exp(u_{it}ᵀ u_w);
s_i = Σ_t α_{it} h_{it};
wherein W_{wa} and b_{wa} are the weight and bias parameters of a single fully-connected layer; h_{it} is the hidden vector of the t-th word in the i-th sentence obtained from the bidirectional GRU layer in part (1); u_{it} is the representation of the t-th word in the i-th sentence after the fully-connected layer; u_w is the word-level context vector; α_{it} is the weight of the t-th word among all T_i words of the i-th sentence; and s_i is the vector representation of the i-th sentence computed from its T_i weighted word vectors.
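Continuing the sketch, the word-level attention layer can be written as a custom Keras layer (the attention width is an assumption):

    class WordAttention(layers.Layer):
        """u_it = tanh(W h_it + b); α_it = softmax(u_itᵀ u_w); s_i = Σ α_it h_it."""
        def __init__(self, units=100, **kwargs):
            super().__init__(**kwargs)
            self.units = units
            self.dense = layers.Dense(units, activation="tanh")   # W_wa, b_wa
        def build(self, input_shape):
            # u_w: randomly initialized, jointly learned context vector
            self.u_w = self.add_weight(name="u_w", shape=(self.units,),
                                       initializer="glorot_uniform", trainable=True)
        def call(self, h):                                   # h: (batch, T_i, dim)
            u = self.dense(h)                                # u_it
            alpha = tf.nn.softmax(tf.tensordot(u, self.u_w, axes=1), axis=1)
            return tf.reduce_sum(h * alpha[..., None], axis=1)   # s_i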
(3) Sentence Encoder (Sentence Encoder)
Sentence-level encoding uses the same method as word-level encoding: the calculated representation vectors of the L sentences are passed through a bidirectional GRU to obtain the context representation of each sentence. The specific process is expressed as follows:
→h_i = GRU(s_i), i ∈ [1, L];
←h_i = GRU(s_i), i ∈ [L, 1];
h_i = [→h_i, ←h_i];
wherein s_i is the vector representation of the i-th sentence calculated above; →h_i and ←h_i are the outputs of the bidirectional GRU layer, and their concatenation h_i is the hidden vector representation of the i-th sentence.
(4) Sentence-level attention layer (Sentence Attention)
Sentence-level attention proceeds like word-level attention: first a single fully-connected layer, then the weight of the i-th sentence within the document is calculated against the sentence-level context vector u_s, and finally the document vector d is obtained, aggregating the information of all sentences in the document.
the specific process is expressed as follows:
u_i = tanh(W_{sa} h_i + b_{sa});
α_i = exp(u_iᵀ u_s) / Σ_i exp(u_iᵀ u_s);
d = Σ_i α_i h_i;
wherein W_{sa} and b_{sa} are the weight and bias parameters of a single fully-connected layer; h_i is the hidden vector of the i-th sentence obtained from the bidirectional GRU layer in part (3); u_i is the representation of the i-th sentence after the fully-connected layer; u_s is the sentence-level context vector; α_i is the weight of the i-th sentence among all L sentences of the document; and d is the document vector representation obtained by integrating all L sentence vectors of the document.
d_judge and d_performance, the encoding vectors of the original sentence change and the reform performance, are generated through all the text-encoding steps (1)-(4) of the static text encoder.
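Putting steps (1)-(4) together, a sketch of the full static text encoder under the same assumptions (the document shape is illustrative):

    max_sents, max_words = 30, 50                     # assumed document shape
    # words -> s_i: the word encoder followed by word-level attention
    sent_model = tf.keras.Sequential([word_encoder, WordAttention()])

    doc_in = layers.Input(shape=(max_sents, max_words), dtype="int32")
    s = layers.TimeDistributed(sent_model)(doc_in)    # (batch, L, 2*gru_units)
    h = layers.Bidirectional(layers.GRU(gru_units, return_sequences=True))(s)
    d = WordAttention(name="sentence_attention")(h)   # same mechanism with u_s
    static_text_encoder = tf.keras.Model(doc_in, d)   # yields d_judge / d_performance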
b) (Crime-name) entity embedding encoder
The crime name is limited, discrete entity data. The model regards a crime name as a word, also represented by a one-hot code, so that it can be encoded into vector form in the same way as text and fed into the model. The entity embedding encoder (Crime Encoder) for the crime name (Crime) follows the structures (1)-(2) of the hierarchical attention network model [1]; the resulting sentence encoding vector is the encoding vector d_crime of the crime name (Crime).
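A sketch of this encoder under the same assumptions, reusing steps (1)-(2) on the tokens of the crime name:

    # The crime name is treated as a short word sequence; steps (1)-(2) of the
    # static encoder (word BiGRU + word attention) pool it into d_crime.
    crime_in = layers.Input(shape=(max_words,), dtype="int32")
    d_crime = WordAttention()(word_encoder(crime_in))
    crime_encoder = tf.keras.Model(crime_in, d_crime)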
c) Dynamic text encoder (Dynamic Text Encoder)
The dynamic text encoder (Dynamic Text Encoder) for the crime facts (Fact) and the legal basis (Article) also follows the hierarchical attention network model [1]. However, for the specific task of predicting the criminal investigation period, only part of the text of the crime facts and the legal basis plays a key role. Their key information can be extracted with reference to the task-relevant texts of the crime name (Crime), the original sentence change (Judge), and the reform performance (Performance), forming the corresponding encoding vectors. In order to effectively combine the associated information of all documents and realize cross-document migration of information, the dynamic text encoder adopts a special procedure: it receives the encoding results of the crime-name, original-sentence-change, and reform-performance encoders and, by means of dynamic context vectors derived from the related document encodings, embeds them into the text encodings of the crime facts (Fact) and the legal basis (Article), obtaining text encoding vectors tailored to the task of predicting the criminal investigation period.
The word and sentence context vectors u_w and u_s in this encoder are modified: they are not randomly initialized during training but are derived from the vector representations of the related source documents through a single fully-connected layer. Abstractly, the dynamic context generation process is formulated as follows:
u_w = W_w d + b_w;
u_s = W_s d + b_s;
wherein d represents the source-document encoding vector consulted when extracting the related information; u_w and u_s are the word-level and sentence-level context vectors, respectively; W_w and b_w are the weight and bias parameters of the word-level fully-connected layer, and W_s and b_s are those of the sentence-level fully-connected layer.
Since extracting the relevant information of the crime facts (Fact) in practice generally requires the key information of the criminal's crime name (Crime), the context vectors of the fact encoder are obtained from the representation vector d_crime by a single fully-connected layer. Likewise, the relevant information of the legal basis (Article) generally requires the key information of the original sentence change (Judge) and the reform performance (Performance), so the context vectors of the article encoder are obtained from the representation vectors d_judge and d_performance by a single fully-connected layer.
Combined with the information of the other documents, the context vectors are specifically formulated as follows:
u_w^{fact} = W_w^{fact} d_{crime} + b_w^{fact};
u_s^{fact} = W_s^{fact} d_{crime} + b_s^{fact};
u_w^{article} = W_w^{article} [d_{judge}, d_{performance}] + b_w^{article};
u_s^{article} = W_s^{article} [d_{judge}, d_{performance}] + b_s^{article};
wherein d_{crime}, d_{judge}, and d_{performance} are the encoding vectors of the crime name, original sentence change, and reform performance obtained by the encoders above; u_w^{fact} and u_w^{article} are the word-level context vectors of the crime facts and the legal basis, respectively; and u_s^{fact} and u_s^{article} are their sentence-level context vectors.
The context vectors of the crime facts (Fact) and the legal basis (Article) in the dynamic text encoder (Dynamic Text Encoder) are therefore not directly randomly initialized, but are trained and learned from the key information of the crime name (Crime), the original sentence change (Judge), and the reform performance (Performance). The model can thereby extract, in a correlated way, the crime facts of the case and the related information of the cited laws, generating text encodings more relevant to the sentence-reduction information: d_fact and d_article.
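A sketch of this conditioning under the same assumptions: the layer replaces the learned free context vector with a projection of a source-document vector:

    class DynamicAttention(layers.Layer):
        """Attention whose context vector is projected from a source vector d_src."""
        def __init__(self, units=100, **kwargs):
            super().__init__(**kwargs)
            self.dense_u = layers.Dense(units, activation="tanh")  # W_a, b_a
            self.dense_ctx = layers.Dense(units)                   # u = W d_src + b
        def call(self, h, d_src):            # h: (batch, T, dim); d_src: (batch, k)
            u = self.dense_u(h)
            ctx = self.dense_ctx(d_src)                            # dynamic u_w or u_s
            alpha = tf.nn.softmax(tf.reduce_sum(u * ctx[:, None, :], axis=-1), axis=1)
            return tf.reduce_sum(h * alpha[..., None], axis=1)

    # Usage sketch: the fact encoder conditions on d_crime, and the article encoder
    # conditions on the concatenation of d_judge and d_performance, e.g.
    #   s_fact = DynamicAttention()(h_fact_words, d_crime)
    #   s_article = DynamicAttention()(h_article_words,
    #                                  layers.Concatenate()([d_judge, d_performance]))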
(III) Determining the criminal investigation period distribution
Finally, these related text encodings are concatenated into d, and the criminal investigation period distribution of the input case is predicted by a SoftMax classifier:
d = concat(d_crime, d_judge, d_performance, d_fact, d_article);
p = softmax(W_c d + b_c);
wherein d_crime, d_judge, d_performance, d_fact, and d_article are the encoding vectors obtained by the text encoders, d is the concatenated encoding vector used to predict the criminal investigation period, and W_c and b_c are the weight and bias parameters of a single fully-connected layer.
Categorical cross entropy (Categorical Cross Entropy) is used as the training loss:
L(x) = −Σ_{i=1}^{C} y_i log f_i(x);
wherein x denotes an input sample, C is the total number of criminal investigation period classes, y_i is the true label indicator of the i-th class, and f_i(x) is the model's output probability for the i-th class.
By minimizing cross entropy loss, model parameters are trained to obtain a model that can predict the distribution of the criminal investigation period.
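A sketch of this prediction head, continuing the assumptions above (the number of period classes is illustrative):

    num_classes, doc_dim = 10, 2 * gru_units          # assumed: 10 period classes
    inputs = [layers.Input(shape=(doc_dim,), name=n)
              for n in ("d_crime", "d_judge", "d_performance", "d_fact", "d_article")]
    d_cat = layers.Concatenate()(inputs)              # d = concat(...)
    p = layers.Dense(num_classes, activation="softmax")(d_cat)   # softmax(W_c d + b_c)
    head = tf.keras.Model(inputs, p)
    head.compile(optimizer="adam", loss="categorical_crossentropy",
                 metrics=["accuracy"])                # minimizes -Σ y_i log f_i(x)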
The beneficial effects of the invention are as follows:
the novel cross-document criminal investigation prediction model is provided, and based on the model, relevant criminal document information can be better combined, and the accuracy of criminal investigation prediction period is improved.
Drawings
FIG. 1 is a diagram of the multi-document prediction and identification model for sentence-reduction cases according to the invention.
FIG. 2 is a diagram illustrating an implementation of the present invention.
Detailed Description
Examples: the prediction model is implemented with TensorFlow 2.0 (Keras) for training and prediction and is deployed with Django. An interface for model management and configuration is provided to the administrator through Django Admin, and a RESTful API is provided for users to call the model for prediction.
As shown in fig. 2, the deployment is divided into a training end and a prediction end. At the training end, researchers deploy the trained model, which can then be used directly for prediction; administrators are also allowed to add their own data later for remote training.
First, the training end needs to add a configured training-model record (corresponding to the dataset data defined in Django) and set: the model name, the training procedure (a Keras-based Python function defining the training process that yields the model), and the hyper-parameters (those required by the model function). If a trained model is uploaded, the following are uploaded: the trained model weight file, the crime-name dictionary, the original judgment information dictionary, the reform-performance dictionary, and the model labels (the label list of the data the model was trained on). If a model to be trained is uploaded, the following are set: the batch size (the number of training samples taken per step), the epochs (the number of training rounds), verbose (whether the training process is visible on the server), and the training progress (the progress percentage, updated synchronously during training); data labels are then added to the model. The data labels may be uploaded in bulk or individually; bulk upload uses a csv file. After an untrained model configuration and the corresponding data set are uploaded, the model can be trained on the server.
The prediction end can directly take the original document in pdf format, or text content can be entered as required, and it returns the result of the trained model. When a pdf file is uploaded, the background converts it into a txt file, the input content required by the model is extracted through rule-based regular matching, and the model then produces the criminal investigation period prediction; when content information is typed in, the corresponding trained model is selected and the prediction is obtained.
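Purely as an illustration, a minimal Django prediction view might look as follows; the view name, the vectorize helper, and the loaded model object are assumptions, not part of the patent:

    import json
    from django.http import JsonResponse
    from django.views.decorators.csrf import csrf_exempt

    @csrf_exempt
    def predict_view(request):
        """Accept raw document text and return the predicted period distribution."""
        payload = json.loads(request.body)
        sections = preprocess(payload["document"], stopwords=set())
        probs = head.predict(vectorize(sections))   # vectorize(): hypothetical helper
        return JsonResponse({"distribution": probs[0].tolist()})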
References:
[1] Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Claims (1)

1. A criminal investigation period prediction method based on deep learning technology, characterized in that a multi-document attention mechanism is utilized to construct a prediction model which, according to the basic information of the criminal investigation document, migrates and combines the information of other documents related to the crime to predict the criminal investigation period; the method comprises the following specific steps:
(I) Collecting data and preprocessing
Firstly, collecting the criminal investigation document data, the corresponding criminal judgment document data, and the key legal bases, and preprocessing them;
the pretreatment comprises the following steps: regular matching, namely comparing the material submitted during criminal investigation-criminal investigation (false release) audit list-with the content of a document, and confirming that the criminal name, the original criminal period change, the transformation performance, the criminal fact and the legal rules in the document are key parts for determining the criminal period according to the parts; extracting the key information from the criminal document and the corresponding criminal document through regular matching to obtain data required by the model; data cleaning; word segmentation and stop word removal;
(II) Designing the model text encoder
The main part of the multi-document criminal investigation period prediction model is the text encoder, which consists of three parts: a static text encoder, a crime-name entity embedding encoder, and a dynamic text encoder, all based on the hierarchical attention network model; the text encoder takes the crime name, original sentence change, reform performance, crime facts, and legal basis as input, and the corresponding encoders generate the encoding vectors d_crime, d_judge, d_performance, d_fact, and d_article;
wherein the first component of the text encoder, the static text encoder, comprises the following parts: (1) a word sequence encoder; (2) a word-level attention layer; (3) a sentence encoder; (4) a sentence-level attention layer; specifically:
assume a document in the text has L sentences, s_i represents the i-th sentence, the i-th sentence contains T_i words, and word_{it} (i ∈ [1, L], t ∈ [1, T_i]) represents the t-th word in the i-th sentence; the hierarchical attention network model maps the raw text of each document into a vector representation; wherein:
(1) Word sequence encoder
preprocessing the sentences, converting the words into word vectors with an embedding matrix, and obtaining the context representation of each word through a bidirectional GRU;
the coding part introduces BERT pre-training, so that the model can transfer and learn the information of words and sentences of other texts, optimize the representation degree of some word and sentence vectors and further improve the accuracy of the model;
for the ith sentence, the word encoding process of the sentence is formulated as follows:
x_{it} = W_{we} word_{it}, t ∈ [1, T_i];
→h_{it} = GRU(x_{it}), t ∈ [1, T_i];
←h_{it} = GRU(x_{it}), t ∈ [T_i, 1];
h_{it} = [→h_{it}, ←h_{it}];
wherein word_{it} is the one-hot encoded representation of the t-th word in the i-th sentence, the i-th sentence has T_i words, and W_{we} is the embedding matrix, so that the resulting x_{it} is the pre-trained vector representation of the t-th word in the i-th sentence; →h_{it} and ←h_{it} are the outputs of the bidirectional GRU layer, and their concatenation h_{it} is the hidden vector representation of the t-th word in the i-th sentence;
(2) Word level attention layer
not all words contribute equally to the representation of a sentence's meaning; thus, an attention mechanism is introduced to extract the words that are important for the semantic representation of the sentence and to aggregate their representations into a sentence vector; first, the hidden vector of the t-th word in the i-th sentence is passed through a single fully-connected layer to obtain its learned representation u_{it}; then a word-level context vector u_w is initialized, and the similarity between u_{it} and u_w is calculated and SoftMax-normalized, yielding a measure α_{it} of the importance of the t-th word within the i-th sentence; the sentence vector s_i is then obtained as the weighted sum of the word representations; the context vector u_w can be regarded as a high-level representation of the fixed question "which words are informative", randomly initialized and learned jointly during training;
The specific process is expressed as follows:
u_{it} = tanh(W_{wa} h_{it} + b_{wa});
α_{it} = exp(u_{it}ᵀ u_w) / Σ_t exp(u_{it}ᵀ u_w);
s_i = Σ_t α_{it} h_{it};
wherein W_{wa} and b_{wa} are the weight and bias parameters of a single fully-connected layer; h_{it} is the hidden vector of the t-th word in the i-th sentence obtained from the bidirectional GRU layer in part (1); u_{it} is the representation of the t-th word in the i-th sentence after the fully-connected layer; u_w is the word-level context vector; α_{it} is the weight of the t-th word among all T_i words of the i-th sentence; and s_i is the vector representation of the i-th sentence computed from its T_i weighted word vectors;
(3) Sentence encoder
the sentence-level encoding uses the same method as the word-level encoding: the calculated representation vectors of the L sentences are passed through a bidirectional GRU to obtain the context representation of each sentence; the specific process is expressed as follows:
→h_i = GRU(s_i), i ∈ [1, L];
←h_i = GRU(s_i), i ∈ [L, 1];
h_i = [→h_i, ←h_i];
wherein s_i is the vector representation of the i-th sentence calculated above; →h_i and ←h_i are the outputs of the bidirectional GRU layer, and their concatenation h_i is the hidden vector representation of the i-th sentence;
(4) Sentence-level attention layer
sentence-level attention proceeds like word-level attention: first a single fully-connected layer, then the weight of the i-th sentence within the document is calculated against the sentence-level context vector u_s, and finally the document vector d is obtained, aggregating the information of all sentences in the document;
the specific process is expressed as follows:
u_i = tanh(W_{sa} h_i + b_{sa});
α_i = exp(u_iᵀ u_s) / Σ_i exp(u_iᵀ u_s);
d = Σ_i α_i h_i;
wherein W_{sa} and b_{sa} are the weight and bias parameters of a single fully-connected layer; h_i is the hidden vector of the i-th sentence obtained from the bidirectional GRU layer in part (3); u_i is the representation of the i-th sentence after the fully-connected layer; u_s is the sentence-level context vector; α_i is the weight of the i-th sentence among all L sentences of the document; and d is the document vector representation obtained by integrating all L sentence vectors of the document;
d_judge and d_performance, the encoding vectors of the original sentence change and the reform performance, are generated through all the text-encoding steps (1)-(4) of the static text encoder;
the method comprises the steps that the composition of a text encoder, namely a criminal name entity embedded encoder, refers to a static text encoding mode, a model regards a criminal name as a word, the criminal name is represented by single-hot encoding, the criminal name is encoded into a vector form, and the vector form is input into the model; the entity embedded encoder of the crime refers to the structures of (1) - (2) of the hierarchical attention network model, and the generated sentence code vector represents the code vector d of the crime crime
wherein the third component of the text encoder, the dynamic text encoder, also follows the hierarchical attention network model; because, for the specific task of predicting the criminal investigation period, only part of the text of the crime facts and the legal basis plays a key role, their key information is extracted with reference to the task-relevant texts of the crime name, the original sentence change, and the reform performance, forming the corresponding encoding vectors; in order to effectively combine the associated information of all documents and realize cross-document migration of information, the dynamic text encoder adopts a special procedure: it receives the encoding results of the crime-name, original-sentence-change, and reform-performance encoders and, by means of dynamic context vectors derived from the related document encodings, embeds them into the text encodings of the crime facts and the legal basis, obtaining text encoding vectors tailored to the task of predicting the criminal investigation period;
the word and sentence context vectors u_w and u_s in this encoder are modified: they are not randomly initialized during training but are derived from the vector representations of the related source documents through a single fully-connected layer; abstractly, the dynamic context generation process is formulated as follows:
u_w = W_w d + b_w;
u_s = W_s d + b_s;
wherein d represents the source-document encoding vector consulted when extracting the related information; u_w and u_s are the word-level and sentence-level context vectors, respectively; W_w and b_w are the weight and bias parameters of the word-level fully-connected layer, and W_s and b_s are those of the sentence-level fully-connected layer;
in the cross-document criminal investigation period prediction model, the context vectors combined with the information of the other documents are specifically formulated as follows:
u_w^{fact} = W_w^{fact} d_{crime} + b_w^{fact};
u_s^{fact} = W_s^{fact} d_{crime} + b_s^{fact};
u_w^{article} = W_w^{article} [d_{judge}, d_{performance}] + b_w^{article};
u_s^{article} = W_s^{article} [d_{judge}, d_{performance}] + b_s^{article};
wherein d_{crime}, d_{judge}, and d_{performance} are the encoding vectors of the crime name, original sentence change, and reform performance obtained by the encoders above; u_w^{fact} and u_w^{article} are the word-level context vectors of the crime facts and the legal basis, respectively; and u_s^{fact} and u_s^{article} are their sentence-level context vectors;
the context vectors of the crime facts and the legal basis in the dynamic text encoder are therefore not directly randomly initialized, but are trained and learned from the key information of the crime name, the original sentence change, and the reform performance; the model can thereby extract, in a correlated way, the crime facts of the case and the related information of the cited laws, generating text encodings more relevant to the sentence-reduction information: d_fact and d_article;
(III) Determining the criminal investigation period distribution
Finally, these related text encodings are concatenated into d, and the criminal investigation period distribution of the input case is predicted by a SoftMax classifier:
d = concat(d_crime, d_judge, d_performance, d_fact, d_article);
p = softmax(W_c d + b_c);
wherein d_crime, d_judge, d_performance, d_fact, and d_article are the encoding vectors obtained by the text encoders, d is the concatenated encoding vector used to predict the criminal investigation period, and W_c and b_c are the weight and bias parameters of a single fully-connected layer;
the cross entropy loss is used as the loss for training:
wherein x represents an input sample, C is the total number of the types of the criminal investigation period to be classified, y i For the real criminal investigation period label category corresponding to the ith category, logf i (x) Outputting a criminal investigation period category value for the corresponding model;
by minimizing cross entropy loss, model parameters are trained to obtain a model that can predict the distribution of the criminal investigation period.
CN202110584847.9A 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology Active CN113642756B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110584847.9A CN113642756B (en) 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110584847.9A CN113642756B (en) 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology

Publications (2)

Publication Number Publication Date
CN113642756A CN113642756A (en) 2021-11-12
CN113642756B (en) 2023-11-24

Family

ID=78415849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110584847.9A Active CN113642756B (en) 2021-05-27 2021-05-27 Criminal investigation period prediction method based on deep learning technology

Country Status (1)

Country Link
CN (1) CN113642756B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460134A (en) * 2018-03-06 2018-08-28 云南大学 The text subject disaggregated model and sorting technique of transfer learning are integrated based on multi-source domain
CN109558993A (en) * 2018-12-18 2019-04-02 华南师范大学 Prediction technique, device, storage medium and the server of theory of crime prison term
CN110610005A (en) * 2019-09-16 2019-12-24 哈尔滨工业大学 Stealing crime auxiliary criminal investigation method based on deep learning
CN111815485A (en) * 2020-06-12 2020-10-23 中国司法大数据研究院有限公司 Sentencing prediction method and device based on deep learning BERT model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018232699A1 (en) * 2017-06-22 2018-12-27 Tencent Technology (Shenzhen) Co., Ltd. Information processing method and related device
US20210103814A1 (en) * 2019-10-06 2021-04-08 Massachusetts Institute Of Technology Information Robust Dirichlet Networks for Predictive Uncertainty Estimation

Also Published As

Publication number Publication date
CN113642756A (en) 2021-11-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant