CN111460100A - Criminal legal document and criminal name recommendation method and system - Google Patents


Info

Publication number
CN111460100A
Authority
CN
China
Prior art keywords
criminal
name
text
model
text set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010236444.0A
Other languages
Chinese (zh)
Inventor
李芳芳
陈可道
张健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202010236444.0A
Publication of CN111460100A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Abstract

The invention relates to a method and a system for recommending criminal names for criminal legal documents. The method comprises the following steps: obtaining a criminal legal document, and performing word segmentation on it to obtain a text set with entries as units; obtaining a word2vec word vector model and a text separable convolutional neural network model; building a criminal name recommendation model by taking the word2vec word vector model as an embedding layer and combining it with the text separable convolutional neural network model; and using the criminal name recommendation model to obtain a criminal name label matrix from the text set, and then determining the criminal name corresponding to the criminal legal document according to the criminal name label matrix. The method and system provided by the invention feature low labor cost, efficient criminal name acquisition, and high recommendation accuracy.

Description

Criminal legal document and criminal name recommendation method and system
Technical Field
The invention relates to the technical field of text processing, and in particular to a method and a system for recommending criminal names for criminal legal documents.
Background
Predicting criminal names from the case description and facts of a criminal legal document means predicting, from a specific passage of text describing a criminal case, the criminal names under which the defendant is prosecuted for the criminal acts.
The following methods exist for determining the criminal name of a criminal legal document:
Manual determination. This is the most primitive method: professionals determine the criminal name purely by hand, based on their relevant knowledge. It requires personnel with rich criminal law knowledge and experience, its accuracy depends on their level of expertise, and it is time-consuming and inefficient.
Automatic prediction based on keyword matching. A custom dictionary is collected for each type of crime, and the text of each case is matched against these dictionaries by keyword to determine the corresponding criminal name. The method is fast but has low precision, needs a large amount of prior knowledge to construct the matching dictionaries, and has a high labor cost.
Classification algorithms based on machine learning models, such as support vector machines and random forests. Features are extracted manually after inspecting the case descriptions and factual text of legal documents, and are then used as input to a machine learning algorithm that outputs a classification result for the criminal name. On multi-label, multi-class problems these algorithms have low accuracy and unsatisfactory training time, and they need a large amount of prior knowledge to construct complex features by hand.
General deep learning algorithms based on sequence models or recurrent neural networks. These do not require manually constructed features but place high demands on the raw training data. For example, when the criminal legal document data is highly imbalanced, the deep learning model alone cannot capture the semantic information of the case descriptions and factual text well, and it usually achieves better precision only when combined with an input that can extract the semantic information of the original text. Because the network structure is complex, this approach also suffers from long training time.
Therefore, providing a criminal name recommendation method and system for criminal legal documents with low labor cost and high criminal name acquisition efficiency is a technical problem to be solved in the field.
Disclosure of Invention
The invention aims to provide a criminal name recommendation method and system for criminal legal documents with low labor cost and high criminal name acquisition efficiency.
To achieve this purpose, the invention provides the following scheme:
A criminal name recommendation method for criminal legal documents, comprising:
obtaining a criminal legal document, and performing word segmentation on the criminal legal document to obtain a text set with entries as units;
obtaining a word2vec word vector model and a text separable convolutional neural network model;
building a criminal name recommendation model by taking the word2vec word vector model as an embedding layer and combining it with the text separable convolutional neural network model;
and using the criminal name recommendation model to obtain a criminal name label matrix from the text set, and then determining the criminal name corresponding to the criminal legal document according to the criminal name label matrix.
Preferably, after obtaining the criminal legal document and performing word segmentation on it to obtain the text set with entries as units, the method further includes:
preprocessing the text set; the preprocessing comprises removing stop words and punctuation.
Preferably, the recommendation method further includes:
selecting, from the criminal legal documents, those meeting preset conditions for calibration sampling;
acquiring the text set of the calibration-sampled criminal legal documents and the criminal name label matrix corresponding to that text set as a training sample pair;
and training the criminal name recommendation model using the training sample pairs.
Preferably, training the criminal name recommendation model using the training sample pairs specifically includes:
acquiring the text set in the training sample pair;
obtaining a word vector matrix from the text set in the training sample pair by using the word2vec word vector model;
obtaining a first criminal name label matrix from the word vector matrix by using the text separable convolutional neural network model;
judging whether the first criminal name label matrix is the criminal name label matrix corresponding to the text set in the training sample pair; if so, directly outputting the first criminal name label matrix to obtain the trained criminal name recommendation model; otherwise, adjusting the filling parameters of the text separable convolutional neural network model until the first criminal name label matrix output by the text separable convolutional neural network model is that criminal name label matrix, and then stopping the adjustment of the filling parameters to obtain the trained criminal name recommendation model.
A criminal name recommendation system for criminal legal documents, comprising:
a text set determining module, configured to obtain a criminal legal document and perform word segmentation on the criminal legal document to obtain a text set with entries as units;
an obtaining module, configured to obtain a word2vec word vector model and a text separable convolutional neural network model;
a criminal name recommendation model building module, configured to build a criminal name recommendation model by taking the word2vec word vector model as an embedding layer and combining it with the text separable convolutional neural network model;
and a criminal name determining module, configured to use the criminal name recommendation model to obtain a criminal name label matrix from the text set, and then determine the criminal name corresponding to the criminal legal document according to the criminal name label matrix.
Preferably, the system further comprises:
a preprocessing module, configured to preprocess the text set; the preprocessing comprises removing stop words and punctuation.
Preferably, the recommendation system further comprises:
a calibration sampling module, configured to select, from the criminal legal documents, those meeting preset conditions for calibration sampling;
a training sample pair obtaining module, configured to acquire the text set of the calibration-sampled criminal legal documents and the criminal name label matrix corresponding to that text set as a training sample pair;
and a training module, configured to train the criminal name recommendation model using the training sample pairs.
Preferably, the training module specifically includes:
a text set obtaining unit, configured to obtain the text set in the training sample pair;
a word vector matrix determining unit, configured to obtain a word vector matrix from the text set in the training sample pair by using the word2vec word vector model;
a first criminal name label matrix determining unit, configured to obtain a first criminal name label matrix from the word vector matrix by using the text separable convolutional neural network model;
a criminal name recommendation model training unit, configured to judge whether the first criminal name label matrix is the criminal name label matrix corresponding to the text set in the training sample pair; if so, directly output the first criminal name label matrix to obtain the trained criminal name recommendation model; otherwise, adjust the filling parameters of the text separable convolutional neural network model until the first criminal name label matrix output by the text separable convolutional neural network model is that criminal name label matrix, and then stop the adjustment of the filling parameters to obtain the trained criminal name recommendation model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
With the criminal name recommendation method and system for criminal legal documents provided by the invention, the criminal name of a criminal legal document can be obtained by feeding the text set of the document into the criminal name recommendation model, which is built by taking the word2vec word vector model as an embedding layer and combining it with the text separable convolutional neural network model. This can greatly reduce labor cost and improve the efficiency of criminal name acquisition.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of the criminal name recommendation method for criminal legal documents provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the criminal name recommendation model according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of the criminal name recommendation system for criminal legal documents according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a criminal name recommendation method and system for criminal legal documents with low labor cost and high criminal name acquisition efficiency.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
FIG. 1 is a flowchart of the criminal name recommendation method for criminal legal documents according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
S1: obtaining a criminal legal document, and performing word segmentation on the criminal legal document to obtain a text set with entries as units.
S2: obtaining a word2vec word vector model and a text separable convolutional neural network model.
S3: building a criminal name recommendation model by taking the word2vec word vector model as an embedding layer and combining it with the text separable convolutional neural network model.
S4: using the criminal name recommendation model to obtain a criminal name label matrix from the text set, and then determining the criminal name corresponding to the criminal legal document according to the criminal name label matrix.
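The last step, turning a criminal name label matrix into recommended criminal names, can be illustrated with a minimal sketch. The patent does not say how the matrix is decoded; the version below assumes each row holds per-label scores in [0, 1] and that a fixed threshold of 0.5 selects the recommended names. The function name `decode_label_matrix`, the label names, and the scores are all hypothetical.

```python
from typing import List

def decode_label_matrix(label_matrix: List[List[float]],
                        names: List[str],
                        threshold: float = 0.5) -> List[List[str]]:
    """Map each row of per-label scores to the names whose score reaches the threshold."""
    return [
        [names[j] for j, score in enumerate(row) if score >= threshold]
        for row in label_matrix
    ]

# Hypothetical label vocabulary and model outputs for two documents.
names = ["fraud", "robbery", "theft"]
scores = [
    [0.10, 0.05, 0.92],  # document 0
    [0.81, 0.20, 0.64],  # document 1 (multi-label case)
]
recommended = decode_label_matrix(scores, names)
print(recommended)
```

A per-label threshold keeps the multi-label nature of the task: one document may receive several criminal names, or none if no score is high enough.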
To improve the accuracy of the criminal names recommended by the criminal name recommendation model, after step S1 obtains the criminal legal document and performs word segmentation on it to obtain a text set with entries as units, the method further comprises:
preprocessing the text set; the preprocessing comprises removing stop words and punctuation.
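As an illustration of this preprocessing, the sketch below filters an already segmented entry list. It assumes segmentation has been done by some external Chinese word segmentation tool; the stop word list `STOP_WORDS` and the helper names are hypothetical, and punctuation is detected through Unicode character categories.

```python
import unicodedata
from typing import List

# Hypothetical stop word list; a real one would be built from corpus statistics.
STOP_WORDS = {"的", "了", "在", "于"}

def is_punctuation(token: str) -> bool:
    """True when every character of the token is Unicode punctuation or a symbol."""
    return all(unicodedata.category(ch)[0] in ("P", "S") for ch in token)

def preprocess(entries: List[str]) -> List[str]:
    """Drop empty tokens, stop words, and punctuation-only tokens."""
    return [
        t for t in entries
        if t and t not in STOP_WORDS and not is_punctuation(t)
    ]

tokens = ["被告人", "在", "商场", "盗窃", "了", "手机", "。"]
cleaned = preprocess(tokens)
print(cleaned)
```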
To further improve the accuracy of criminal name acquisition, as another embodiment of the present invention, the recommendation method further includes:
Selecting, from the criminal legal documents, those meeting preset conditions for calibration sampling.
Acquiring the text set of the calibration-sampled criminal legal documents and the criminal name label matrix corresponding to that text set as a training sample pair.
Training the criminal name recommendation model using the training sample pairs. The training process specifically comprises:
Acquiring the text set in the training sample pair.
Obtaining a word vector matrix from the text set in the training sample pair by using the word2vec word vector model.
Obtaining a first criminal name label matrix from the word vector matrix by using the text separable convolutional neural network model.
Judging whether the first criminal name label matrix is the criminal name label matrix corresponding to the text set in the training sample pair; if so, directly outputting the first criminal name label matrix to obtain the trained criminal name recommendation model; otherwise, adjusting the filling parameters of the text separable convolutional neural network model until the first criminal name label matrix output by the text separable convolutional neural network model is that criminal name label matrix, and then stopping the adjustment of the filling parameters to obtain the trained criminal name recommendation model.
As another embodiment of the present invention, the training of the criminal name recommendation model may further include:
Step one: Divide the criminal legal documents into a text training set and a test set, and perform word segmentation on both to form a text set with entries as units. Specifically:
A. Extract the set of criminal case description texts from the criminal legal documents as the training set.
B. Extract the criminal name label corresponding to each criminal case description in the criminal legal documents as the training set labels.
C. Segment the text set from A with a Chinese word segmentation tool to obtain an entry set with entries as units.
D. Construct an integer criminal name label matrix from the criminal name labels in B.
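Item D can be sketched as follows. The patent only says the label matrix is an integer matrix; the sketch assumes a multi-hot encoding (one column per criminal name, 1 where the document carries that name), which is a common choice for multi-label classification. The function name and the example labels are hypothetical.

```python
from typing import List, Tuple

def build_label_matrix(labels_per_doc: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted label vocabulary and one multi-hot integer row per document."""
    classes = sorted({name for labels in labels_per_doc for name in labels})
    index = {name: i for i, name in enumerate(classes)}
    matrix = []
    for labels in labels_per_doc:
        row = [0] * len(classes)
        for name in labels:
            row[index[name]] = 1  # mark every criminal name attached to this document
        matrix.append(row)
    return classes, matrix

# Hypothetical labels for three case descriptions.
doc_labels = [["theft"], ["fraud", "theft"], ["robbery"]]
classes, label_matrix = build_label_matrix(doc_labels)
print(classes)
print(label_matrix)
```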
Step two: Construct a targeted stop word list from the statistics of the criminal legal documents, and then preprocess the entry text set from step one. The preprocessing mainly consists of removing stop words and punctuation marks according to the specific stop word list, then randomly dividing the training set to obtain the input entry set, and converting the labels of the training set into a criminal name label matrix. Specifically:
A. Inspect and compute statistics on the entry set obtained in step one. This includes obtaining the maximum and average entry length, and identifying common time, place, and person words, logically irrelevant words, and the like.
B. Design a stop word list for criminal legal documents from the words identified in A together with common stop words, and use it to remove stop words and punctuation marks.
C. Number the entry sets in their original order, and randomly divide them by number to obtain the input entry set.
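Items A and C can be sketched as below, under two assumptions not stated in the patent: "entry length" is read as the number of entries per document, and the random division keeps a hypothetical 80% of the numbered entry sets as input. The helper names and the fixed seed are illustrative only.

```python
import random
from typing import List, Tuple

def length_stats(entry_sets: List[List[str]]) -> Tuple[int, float]:
    """Maximum and average number of entries per document."""
    lengths = [len(entries) for entries in entry_sets]
    return max(lengths), sum(lengths) / len(lengths)

def shuffled_split(entry_sets: List[List[str]],
                   train_ratio: float = 0.8,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Number documents in original order, shuffle the numbers, and split them."""
    ids = list(range(len(entry_sets)))
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]

# Toy entry sets standing in for segmented case descriptions.
entry_sets = [["a", "b", "c"], ["a", "b"], ["a", "b", "c", "d"],
              ["a", "b", "c"], ["x", "y", "z"]]
max_len, avg_len = length_stats(entry_sets)
train_ids, held_out_ids = shuffled_split(entry_sets)
```

Splitting by number rather than by content keeps each entry set aligned with its row in the criminal name label matrix.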
Step three: Acquire a Chinese text corpus and train on it to generate a word2vec word vector model, then use this model to convert the input entry set obtained in step two into a word vector matrix. Specifically:
A. Collect a public Chinese short text corpus from the Internet as the text training set for word2vec.
B. Feed the external corpus training set obtained in A into a word2vec model for training, obtaining a word2vec word vector model that can convert entries into word vectors.
C. Convert the input entry set from step two into a word vector matrix through the word2vec model.
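The conversion in item C can be illustrated without a trained model. The sketch below uses a small hand-written dictionary as a stand-in for the word2vec lookup table; the zero vector for unknown entries and the padding or truncation to a fixed length `max_len` are assumptions, since the patent does not describe how out-of-vocabulary entries or variable lengths are handled.

```python
from typing import Dict, List

def to_word_vector_matrix(entries: List[str],
                          vectors: Dict[str, List[float]],
                          dim: int,
                          max_len: int) -> List[List[float]]:
    """Look up each entry, use a zero vector for unknown entries, and pad to max_len rows."""
    zero = [0.0] * dim
    rows = [list(vectors.get(token, zero)) for token in entries[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows

# Hypothetical 3-dimensional embeddings standing in for a trained word2vec model.
toy_vectors = {
    "盗窃": [0.2, 0.7, 0.1],
    "手机": [0.5, 0.1, 0.4],
}
matrix = to_word_vector_matrix(["盗窃", "手机", "未登录词"], toy_vectors, dim=3, max_len=4)
```

With a real model, the dictionary would be replaced by the trained word2vec vocabulary, and `dim` by its vector size.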
Step four: Extract the minority data items of the imbalanced classes and generate additional data items by oversampling according to the word vector similarity between samples, so as to balance the sample sizes.
Count the imbalanced label categories and collect the corresponding word vectors of each category into a set.
For each category, every word vector in the word vector matrix is initially treated as its own semantic set. The word vectors of all samples are then traversed in order, and the similarity between each word vector and the mean word vector of every existing semantic set is computed. If the similarity is high enough, the word vector is added to that set; otherwise it forms a new semantic set. Finally, the semantic sets are sorted by the number of word vectors they contain, a fixed percentage of them is kept, and data items of the label category are generated randomly from the mean word vector of each kept semantic set. The specific steps are as follows:
A. Assume that the word vector matrix of the first category, with sample size $n$, is:
$\{w_1, w_2, w_3, \dots, w_n\}$
B. Each word vector in the word vector matrix is regarded as a semantic set:
$s_1, s_2, s_3, \dots, s_n$
C. Take each word vector $w_x$ of the remaining samples in the first category in turn, and compute its similarity $I$ with the mean word vector of each semantic set $s_i$:
$I = \mathrm{Sim}(w_x, \mathrm{Avg}(s_i))$
D. If $I$ is above a certain threshold, add $w_x$ to $s_i$; otherwise, treat $w_x$ as a new semantic set $s_x$.
E. Repeat steps A to D until all word vectors contained in the samples of the category have been traversed.
F. Sort all semantic sets by the number of word vectors they contain, take the top $x$ percent (the specific value of $x$ is determined according to actual needs), and finally generate new data items of the category randomly from the mean word vectors of these semantic sets, so as to achieve data balance.
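Steps A to F can be sketched in plain Python. Several details are assumptions, because the patent leaves them open: cosine similarity is used for Sim, a word vector joins the single most similar semantic set when that similarity reaches the threshold, and a new data item is generated by picking one kept mean vector at random and adding small Gaussian noise. All function names, the threshold, and the noise level are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 when either vector has zero norm."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def mean_vector(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def build_semantic_sets(seed_vectors: List[Vector],
                        remaining: List[Vector],
                        threshold: float) -> List[List[Vector]]:
    """Steps B to E: start with one set per seed vector, then assign the remaining vectors."""
    sets = [[v] for v in seed_vectors]
    for w in remaining:
        sims = [cosine(w, mean_vector(s)) for s in sets]
        best = max(range(len(sets)), key=sims.__getitem__)
        if sims[best] >= threshold:
            sets[best].append(w)   # similar enough: join the closest semantic set
        else:
            sets.append([w])       # otherwise open a new semantic set
    return sets

def oversample(sets: List[List[Vector]], top_percent: float, n_new: int,
               noise: float = 0.01, seed: int = 0) -> List[Vector]:
    """Step F: keep the largest sets and sample noisy copies of their mean vectors."""
    ranked = sorted(sets, key=len, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_percent / 100))
    centers = [mean_vector(s) for s in ranked[:keep]]
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, noise) for x in rng.choice(centers)]
            for _ in range(n_new)]

seeds = [[1.0, 0.0], [0.0, 1.0]]
rest = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.05]]
semantic_sets = build_semantic_sets(seeds, rest, threshold=0.9)
new_items = oversample(semantic_sets, top_percent=50, n_new=4)
```

In this toy run the two seed vectors attract the nearby vectors, so the larger set (three vectors near the first axis) is the one used to generate the four new items.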
Step five: Design and implement a deep learning network based on text-separable-cnn (the text separable convolutional neural network model), fill in its parameters, and select and implement a classifier layer.
The text-separable-cnn structure implemented for criminal name prediction on criminal legal documents includes: a separable convolution layer with 128 convolution kernels and window sizes of 2, 3, 4 and 5, a BatchNorm layer, a ReLU activation function layer, a max pooling layer, and a sigmoid output layer.
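To make the separable convolution concrete, here is a minimal forward-pass sketch in plain Python for a single window size. It assumes the usual depthwise separable formulation (one filter per embedding channel along the sequence, followed by a 1x1 pointwise convolution that mixes channels), which the patent does not spell out. BatchNorm, the sigmoid output layer, and the four parallel window sizes are omitted for brevity; all names and toy weights are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

def separable_conv1d(x: Matrix, depthwise: Matrix, pointwise: Matrix) -> Matrix:
    """x: [T][C] word vector matrix; depthwise: [C][k]; pointwise: [F][C].

    Returns a [T - k + 1][F] feature map. The input must have at least k rows.
    """
    k = len(depthwise[0])
    channels = len(depthwise)
    out = []
    for t in range(len(x) - k + 1):
        # Depthwise step: each embedding channel is filtered independently over the window.
        dw = [sum(depthwise[c][j] * x[t + j][c] for j in range(k))
              for c in range(channels)]
        # Pointwise step: a 1x1 convolution mixes the channels into F output filters.
        out.append([sum(w * v for w, v in zip(row, dw)) for row in pointwise])
    return out

def relu_max_pool(feature_map: Matrix) -> List[float]:
    """Apply ReLU, then take the maximum over time for each filter."""
    return [max(max(0.0, row[f]) for row in feature_map)
            for f in range(len(feature_map[0]))]

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 entries, 2-dimensional vectors
depthwise = [[1.0, 1.0], [1.0, -1.0]]      # window size k = 2, one filter per channel
pointwise = [[1.0, 1.0]]                   # F = 1 output filter
feature_map = separable_conv1d(x, depthwise, pointwise)
pooled = relu_max_pool(feature_map)
```

For C channels, window size k and F filters, this factorization needs C*k + C*F weights instead of the C*k*F of a standard convolution, which is the usual motivation for separable convolutions.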
Step six: Train the criminal name recommendation model, which takes the word2vec word vector model as an embedding layer and combines it with the text separable convolutional neural network. Specifically:
The word2vec word vector model generated in step three is used as the word embedding layer and combined with the text-separable-cnn network implemented in step five. The word vector matrix of the training set from step three is taken as input and the criminal name label matrix from step two as the training output, and the criminal name recommendation model is trained to obtain a deep learning model for predicting the criminal names of criminal legal documents (the specific structure of the criminal name recommendation model is shown in FIG. 2).
To improve the fit of the model, a stochastic gradient descent algorithm is used during training to adjust the parameters of the criminal name recommendation model. When the loss function of the criminal name recommendation model falls below a set threshold, the criminal name label matrix output by the currently trained model is the same as or similar to the criminal name label matrix calibrated in the training sample pair.
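The stopping rule described here (stochastic gradient descent until the loss falls below a set threshold) can be sketched on a deliberately tiny stand-in model. The sketch trains a single sigmoid layer with binary cross-entropy, not the full text separable CNN; the loss function, learning rate, threshold, and toy data are all assumptions chosen for illustration.

```python
import math
import random
from typing import List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(W: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    """Per-label probabilities from one sigmoid output per criminal name."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]

def train_sgd(features: List[List[float]], labels: List[List[int]],
              lr: float = 0.5, loss_threshold: float = 0.1,
              max_epochs: int = 5000, seed: int = 0
              ) -> Tuple[List[List[float]], List[float], float]:
    """Per-sample SGD on binary cross-entropy; stops once the epoch mean loss is below the threshold."""
    rng = random.Random(seed)
    n_in, n_out = len(features[0]), len(labels[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    order = list(range(len(features)))
    mean_loss = float("inf")
    for _ in range(max_epochs):
        rng.shuffle(order)
        total = 0.0
        for i in order:
            x, y = features[i], labels[i]
            probs = predict(W, b, x)
            for k, p in enumerate(probs):
                total -= y[k] * math.log(p + 1e-12) + (1 - y[k]) * math.log(1 - p + 1e-12)
                grad = p - y[k]  # derivative of the loss w.r.t. the pre-activation
                W[k] = [w - lr * grad * v for w, v in zip(W[k], x)]
                b[k] -= lr * grad
        mean_loss = total / (len(features) * n_out)
        if mean_loss < loss_threshold:
            break
    return W, b, mean_loss

# Toy pooled features and multi-hot labels (hypothetical, linearly separable).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[1, 0], [0, 1], [1, 1]]
W, b, final_loss = train_sgd(features, labels)
predicted = [[int(p >= 0.5) for p in predict(W, b, x)] for x in features]
```

Once the mean loss is below the threshold, the thresholded outputs reproduce the calibrated label matrix on this toy data, which mirrors the "same as or similar to" condition in the text.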
In addition, corresponding to the criminal name recommendation method for criminal legal documents, the invention also provides a criminal name recommendation system for criminal legal documents, whose structure is shown in FIG. 3. The system comprises: a text set determining module 1, an obtaining module 2, a criminal name recommendation model building module 3, and a criminal name determining module 4.
The text set determining module 1 is used to obtain a criminal legal document and perform word segmentation on it to obtain a text set with entries as units; the obtaining module 2 is used to obtain a word2vec word vector model and a text separable convolutional neural network model; the criminal name recommendation model building module 3 is used to build a criminal name recommendation model by taking the word2vec word vector model as an embedding layer and combining it with the text separable convolutional neural network model; and the criminal name determining module 4 is used to obtain a criminal name label matrix from the text set using the criminal name recommendation model, and then determine the criminal name corresponding to the criminal legal document according to the criminal name label matrix.
To improve the accuracy of criminal name recommendation, the system further comprises a preprocessing module for preprocessing the text set.
As a further optimization, the recommendation system further comprises: a calibration sampling module, a training sample pair obtaining module, and a training module.
The calibration sampling module is used to select, from the criminal legal documents, those meeting preset conditions for calibration sampling; the training sample pair obtaining module is used to acquire the text set of the calibration-sampled criminal legal documents and the criminal name label matrix corresponding to that text set as a training sample pair; and the training module is used to train the criminal name recommendation model using the training sample pairs.
The training module specifically includes: a text set obtaining unit, a word vector matrix determining unit, a first criminal name label matrix determining unit, and a criminal name recommendation model training unit.
The text set obtaining unit is used to obtain the text set in the training sample pair.
The word vector matrix determining unit is used to obtain a word vector matrix from the text set in the training sample pair by using the word2vec word vector model.
The first criminal name label matrix determining unit is used to obtain a first criminal name label matrix from the word vector matrix by using the text separable convolutional neural network model.
The criminal name recommendation model training unit is used to judge whether the first criminal name label matrix is the criminal name label matrix corresponding to the text set in the training sample pair; if so, the first criminal name label matrix is directly output to obtain the trained criminal name recommendation model; otherwise, the filling parameters of the text separable convolutional neural network model are adjusted until the first criminal name label matrix output by the text separable convolutional neural network model is that criminal name label matrix, and the adjustment of the filling parameters is then stopped to obtain the trained criminal name recommendation model.
Compared with the prior art, the criminal name recommendation method and system for criminal legal documents provided by the invention reduce noise by constructing a stop word list for criminal legal documents. By using a word2vec word vector model to compute word vectors that carry semantics and suit criminal legal documents, applying similarity-based oversampling to the imbalanced categories, constructing the specific text-separable-cnn structure, and other technical means, the accuracy of criminal name recommendation can be remarkably improved. At the same time, the problems of manual criminal name determination in the prior art, namely long time spent, low efficiency, and low accuracy, can be solved.
The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A method for recommending criminal names for criminal legal documents, comprising:
obtaining a criminal legal document, and performing word segmentation on the criminal legal document to obtain a text set in units of entries;
obtaining a word2vec word vector model and a text separable convolutional neural network model;
establishing a criminal name recommendation model by using the word2vec word vector model as an embedding layer in combination with the text separable convolutional neural network model;
and obtaining a criminal name label matrix from the text set by using the criminal name recommendation model, and then determining the criminal name corresponding to the criminal legal document according to the criminal name label matrix.
2. The method for recommending criminal names for criminal legal documents according to claim 1, wherein the obtaining of the criminal legal document and the performing of word segmentation on the criminal legal document to obtain the text set in units of entries further comprises:
preprocessing the text set; the preprocessing comprises: removing stop words and punctuation.
3. The method for recommending criminal names for criminal legal documents according to claim 1, characterized in that the method further comprises:
selecting criminal legal documents that meet preset conditions from the criminal legal documents for calibration sampling;
acquiring, as a training sample pair, the text set of the calibration-sampled criminal legal documents and the criminal name label matrix corresponding to that text set;
and training the criminal name recommendation model by using the training sample pair.
4. The method for recommending criminal names for criminal legal documents according to claim 3, wherein the training of the criminal name recommendation model by using the training sample pair specifically comprises:
acquiring the text set in the training sample pair;
obtaining a word vector matrix from the text set in the training sample pair by using the word2vec word vector model;
obtaining a first criminal name label matrix from the word vector matrix by using the text separable convolutional neural network model;
judging whether the first criminal name label matrix is the criminal name label matrix corresponding to the text set in the training sample pair; if so, directly outputting the first criminal name label matrix to obtain a trained criminal name recommendation model; otherwise, adjusting the filling parameters of the text separable convolutional neural network model until the first criminal name label matrix output by the text separable convolutional neural network model is the criminal name label matrix, and then stopping the adjustment of the filling parameters to obtain the trained criminal name recommendation model.
5. A system for recommending criminal names for criminal legal documents, comprising:
a text set determining module, used for acquiring a criminal legal document and performing word segmentation on the criminal legal document to obtain a text set in units of entries;
an acquisition module, used for acquiring a word2vec word vector model and a text separable convolutional neural network model;
a criminal name recommendation model building module, used for building a criminal name recommendation model by using the word2vec word vector model as an embedding layer in combination with the text separable convolutional neural network model;
and a criminal name determining module, used for obtaining a criminal name label matrix from the text set by using the criminal name recommendation model, and then determining the criminal name corresponding to the criminal legal document according to the criminal name label matrix.
6. The system for recommending criminal names for criminal legal documents according to claim 5, characterized in that the system further comprises:
a preprocessing module, used for preprocessing the text set; the preprocessing comprises: removing stop words and punctuation.
7. The system for recommending criminal names for criminal legal documents according to claim 5, characterized in that the system further comprises:
a calibration sampling module, used for selecting criminal legal documents that meet preset conditions from the criminal legal documents for calibration sampling;
a training sample pair acquisition module, used for acquiring, as a training sample pair, the text set of the calibration-sampled criminal legal documents and the criminal name label matrix corresponding to that text set;
and a training module, used for training the criminal name recommendation model by using the training sample pair.
8. The system for recommending criminal names for criminal legal documents according to claim 7, wherein the training module specifically comprises:
a text set obtaining unit, used for obtaining the text set in the training sample pair;
a word vector matrix determining unit, used for obtaining a word vector matrix from the text set in the training sample pair by using the word2vec word vector model;
a first criminal name label matrix determining unit, used for obtaining a first criminal name label matrix from the word vector matrix by using the text separable convolutional neural network model;
a criminal name recommendation model training unit, used for judging whether the first criminal name label matrix is the criminal name label matrix corresponding to the text set in the training sample pair; if so, directly outputting the first criminal name label matrix to obtain a trained criminal name recommendation model; otherwise, adjusting the filling parameters of the text separable convolutional neural network model until the first criminal name label matrix output by the text separable convolutional neural network model is the criminal name label matrix, and then stopping the adjustment of the filling parameters to obtain the trained criminal name recommendation model.
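As an informal illustration of the preprocessing recited in claims 2 and 6 and of the final step of claims 1 and 5 (mapping a criminal name label matrix back to criminal names), a minimal sketch follows. It assumes the document has already been segmented into entries, that each row of the label matrix holds 0/1 flags, and that the stop word list and the three criminal names are hypothetical placeholders; none of these specifics come from the claims.

```python
import string
from typing import Iterable, List, Sequence

# Hypothetical stop word list and label vocabulary, for illustration only.
STOP_WORDS = {"the", "a", "of", "and"}
CRIMINAL_NAMES = ["theft", "robbery", "fraud"]

# ASCII punctuation plus full-width marks common in Chinese documents.
PUNCTUATION = set(string.punctuation) | set("，。、；：？！“”‘’（）《》")


def preprocess(entries: Iterable[str],
               stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Drop stop words and entries that consist only of punctuation."""
    stop = set(stop_words)
    return [e for e in entries
            if e not in stop and not all(ch in PUNCTUATION for ch in e)]


def decode_label_row(row: Sequence[int],
                     names: Sequence[str] = CRIMINAL_NAMES) -> List[str]:
    """Map one 0/1 row of the label matrix to the criminal names it flags."""
    if len(row) != len(names):
        raise ValueError("label row and name list must have the same length")
    return [name for flag, name in zip(row, names) if flag == 1]
```

For example, under these assumptions the entry list "the", "defendant", ",", "stole", "a", "phone", "。" is reduced to "defendant", "stole", "phone", and the label row 1, 0, 1 is decoded as "theft" and "fraud".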
CN202010236444.0A 2020-03-30 2020-03-30 Criminal legal document and criminal name recommendation method and system Pending CN111460100A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010236444.0A CN111460100A (en) 2020-03-30 2020-03-30 Criminal legal document and criminal name recommendation method and system

Publications (1)

Publication Number Publication Date
CN111460100A true CN111460100A (en) 2020-07-28

Family

ID=71681548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010236444.0A Pending CN111460100A (en) 2020-03-30 2020-03-30 Criminal legal document and criminal name recommendation method and system

Country Status (1)

Country Link
CN (1) CN111460100A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399230A (en) * 2018-02-13 2018-08-14 上海大学 A kind of Chinese financial and economic news file classification method based on convolutional neural networks
CN109213864A (en) * 2018-08-30 2019-01-15 广州慧睿思通信息科技有限公司 Criminal case anticipation system and its building and pre-judging method based on deep learning
CN109376227A (en) * 2018-10-29 2019-02-22 山东大学 A kind of prison term prediction technique based on multitask artificial neural network
CN109376963A (en) * 2018-12-10 2019-02-22 杭州世平信息科技有限公司 A kind of criminal case charge law article unified prediction neural network based
CN109840279A (en) * 2019-01-10 2019-06-04 山东亿云信息技术有限公司 File classification method based on convolution loop neural network
CN110858269A (en) * 2018-08-09 2020-03-03 清华大学 Criminal name prediction method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
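JOHNSON, RIE et al.: "Deep pyramid convolutional neural networks for text categorization", Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics *
LU Yusheng (卢誉声): 《移动平台深度神经网络实战原理、架构与优化》 (Deep Neural Networks on Mobile Platforms in Practice: Principles, Architecture and Optimization), 31 January 2020, China Machine Press (机械工业出版社) *
XU Guixian (胥桂仙): 《文本分类技术研究》 (Research on Text Classification Technology), 30 June 2010, Minzu University of China Press (中央民族大学出版社) *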

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101559A (en) * 2020-09-04 2020-12-18 中国航天科工集团第二研究院 Case and criminal name inference method based on machine learning
CN112101559B (en) * 2020-09-04 2023-08-04 中国航天科工集团第二研究院 Case crime name deducing method based on machine learning
US11567812B2 (en) 2020-10-07 2023-01-31 Dropbox, Inc. Utilizing a natural language model to determine a predicted activity event based on a series of sequential tokens
US11853817B2 (en) 2020-10-07 2023-12-26 Dropbox, Inc. Utilizing a natural language model to determine a predicted activity event based on a series of sequential tokens
US11568018B2 (en) 2020-12-22 2023-01-31 Dropbox, Inc. Utilizing machine-learning models to generate identifier embeddings and determine digital connections between digital content items
US12008065B2 (en) 2020-12-22 2024-06-11 Dropbox, Inc. Utilizing machine-learning models to generate identifier embeddings and determine digital connections between digital content items

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200728