CN112100212A - Case scenario extraction method based on machine learning and rule matching - Google Patents

Publication number
CN112100212A
Authority
CN
China
Prior art keywords
text
matching
training
word
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010920756.3A
Other languages
Chinese (zh)
Inventor
梁鸿翔
胡潇
时子威
陈放
颉明明
杨帅
张博羿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Second Research Institute Of Casic
Original Assignee
Second Research Institute Of Casic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Second Research Institute Of Casic
Priority to CN202010920756.3A
Publication of CN112100212A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Technology Law (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a case scenario extraction method based on machine learning and rule matching. The method comprises a keyword matching and regular expression matching process and a deep learning process. The keyword matching and regular expression matching process comprises: extracting, as features, descriptive sentences in the paragraphs of a judgment document that contain specified keywords or match a regular expression; and looking up the scenario corresponding to each feature in a pre-constructed scenario library. The deep learning process comprises: performing word segmentation on the text to obtain a word sequence; vectorizing the word sequence to obtain a text vector of the text to be extracted; and inputting the text vector into a pre-constructed deep learning extraction model and obtaining a result from the output of the model. The invention can extract both explicit scenarios, which are highly interpretable, and implicit scenarios, which are less interpretable. Using a different deep neural network for each charge also improves the analysis accuracy for case facts of low-frequency charges.

Description

Case scenario extraction method based on machine learning and rule matching
Technical Field
The invention relates to the electronic processing of legal documents, and in particular to a case scenario extraction method based on machine learning and rule matching.
Background
The legal scenario extraction task aims to automatically extract the most important scenarios from the case fact description in a legal document. On the one hand, this helps people without legal training understand the important scenarios; on the other hand, it provides a legal reference for legal professionals. In recent years, China has been steadily advancing the construction of intelligent justice, and legal scenario extraction is an important part of that effort. Keyword matching algorithms were applied to case scenario extraction as early as the last century, and further studies on legal scenario extraction have appeared in recent years. With the rapid development of deep learning, some researchers have used deep neural networks to extract case scenarios from legal documents, with good results.
The Chinese patent "109285094 legal document processing method and device" provides a method that extracts crime keywords from a target legal document and, according to those keywords, determines the criminal scenarios of the charge in the target legal document from a pre-constructed crime database.
The Chinese patent "110032721 a judgment document pushing method and device" provides a method and device that obtain case scenario features through keyword matching and regular expression matching, search for similar features, and push judgment documents accordingly.
The Chinese patent "110263323 keyword extraction method and system based on a fence-type long short-term memory neural network" provides a neural-network-based keyword extraction method and system: the legal text corpus from which keywords are to be extracted is input into a text encoding model of the neural network to obtain a sequence of text semantic feature vectors, and that sequence is input into a keyword recognition model to obtain the keyword extraction result.
Keyword matching and regular expression matching can simply and effectively extract explicit scenarios with high confidence, but the results are error-prone because many semantic subtleties are ignored. Recall is also low, because expressions that contain no keyword or that fail to match any regular expression are missed. Once a certain level of performance has been reached, even a small improvement requires enormous manual effort to design stricter regular expressions. Deep learning methods can learn scenarios that are difficult to match with regular expressions, but they typically require a large amount of labeled data for training. Furthermore, because of data imbalance, deep learning methods have poor analysis accuracy on cases involving low-frequency charges, and they lack interpretability.
In summary, keyword matching and regular expression matching are simple and efficient, are highly interpretable, and can extract explicit scenarios, but they suffer from low recall and high labor cost. Deep learning can extract implicit scenarios from implicit expressions, but it is poorly interpretable, needs a large amount of data, and has low analysis accuracy and coverage on the case facts of low-frequency charges.
Disclosure of Invention
The invention aims to provide a case scenario extraction method based on machine learning and rule matching that uses a different deep neural network for each charge, thereby addressing the poor analysis accuracy of deep learning methods on cases involving low-frequency charges.
The invention relates to a case scenario extraction method based on machine learning and rule matching, which comprises a keyword matching and regular expression matching process and a deep learning process. The keyword matching and regular expression matching process comprises: extracting, as features, descriptive sentences in the paragraphs of a judgment document that contain specified keywords or match a regular expression; and looking up the scenario corresponding to each feature in a pre-constructed scenario library. The deep learning process comprises: performing word segmentation on the text to obtain a word sequence; vectorizing the word sequence to obtain a text vector of the text to be extracted; and inputting the text vector into a pre-constructed deep learning extraction model and obtaining a result from the output of the model.
According to an embodiment of the case scenario extraction method based on machine learning and rule matching, the method further comprises: constructing a scenario library in advance.
According to an embodiment of the case scenario extraction method based on machine learning and rule matching, the method further comprises constructing a deep learning extraction model in advance by: collecting judgment documents for different charges; cleaning the judgment document data, and segmenting and extracting the case fact description according to keywords; manually labeling the scenarios corresponding to the case facts; and training the model.
According to an embodiment of the case scenario extraction method based on machine learning and rule matching, pre-constructing the scenario library comprises: (1) determining the general scenarios and the charge-specific scenarios of each charge; (2) formulating regular expressions and matching rules for the determined general scenarios and charge-specific scenarios; (3) testing each charge on a large number of real cases, and revising the regular expressions and matching rules according to the test results.
According to an embodiment of the case scenario extraction method based on machine learning and rule matching, constructing the deep learning extraction model further comprises: dividing the judgment documents by charge, and dividing the judgment documents of each charge into a training set, a test set and a development set in a certain proportion, wherein the training set is used to train the model, the development set is used to tune the model parameters, and the test set is used for the final evaluation of model performance.
According to an embodiment of the case scenario extraction method based on machine learning and rule matching, the model to be trained comprises an input layer, a hidden layer, and an output layer, wherein:
Input layer: the input is a two-dimensional matrix of word vectors of the training text. Let $x_i \in \mathbb{R}^k$ denote the $k$-dimensional word vector corresponding to the $i$-th word in a sentence; a sentence of length $n$ is represented as:
$x_{1:n} = x_1 \oplus x_2 \oplus \dots \oplus x_n$
where $\oplus$ is the concatenation operator.
Hidden layer: abstracts the text input vector matrix to obtain deeper text information. Scenario extraction is treated as a binary classification task for each charge, and a convolutional neural network extracts the features used for classification: convolution operations are performed on the word vectors with convolution kernels of different sizes, max pooling is applied, and the resulting features are concatenated to obtain the final features.
Output layer: the resulting feature vector is passed through one or more fully connected layers and activation function layers, and then through a Sigmoid activation function, to obtain the classification result predicted from the text features.
According to one embodiment of the case scenario extraction method based on machine learning and rule matching, in the experimental setup the jieba word segmentation component is used for Chinese word segmentation, and pre-trained word vectors from Tencent AI Lab are used. The hidden layer is a convolutional neural network that uses convolution kernels with window sizes of 1, 2, 3 and 4, with 64 kernels for each window size. The output layer consists of two linear layers. Let the feature size be $S_f$, the hidden layer size be $S_h$, and the number of labels be $S_l$. The features first pass through a linear layer of size $S_f \times S_h$, then through the Tanh function:
$\mathrm{Tanh}(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})$
and finally through a linear layer of size $S_h \times S_l$, where the hidden layer size is $S_h = 256$ and dropout with probability 0.5 is applied to the linear layer. The learning rate for training is set to 0.001; training uses the Adam optimizer, BCELoss as the loss function, and a Sigmoid activation function.
The case scenario extraction method based on machine learning and rule matching can extract both explicit scenarios, which are highly interpretable, and implicit scenarios, which are less interpretable. Using a different deep neural network for each charge also improves the analysis accuracy for case facts of low-frequency charges.
Drawings
FIG. 1 is a main flow chart of a case scenario extraction method based on machine learning and rule matching according to the present invention;
FIG. 2 is a diagram of the steps of pre-building a deep learning extraction model;
FIG. 3 is a detailed topology diagram of the deep learning extraction model.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
FIG. 1 is the main flow chart of the case scenario extraction method based on machine learning and rule matching according to the present invention. As shown in FIG. 1, after the text from which scenarios are to be extracted is input, the method processes it with two modules: the keyword matching and regular expression matching process on the left side of FIG. 1, and the deep learning process on the right side of FIG. 1.
The keyword matching and regular expression matching process comprises the following steps:
(1) In the paragraphs of the judgment document, descriptive sentences that contain specified keywords or match a regular expression are extracted as features.
(2) The scenario corresponding to each feature is looked up in a pre-constructed scenario library.
For example, a feature matching the expression "(not | not) {0,4} antecedent" corresponds to the scenario "no prior criminal record".
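The two steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `SCENARIO_LIBRARY` entries, the English patterns, and the sentence-splitting rule are all hypothetical stand-ins for the rules that legal professionals would write.

```python
import re

# Hypothetical scenario library: each entry maps a rule (regex) to a scenario label.
# The patterns are illustrative English stand-ins, not rules taken from the patent.
SCENARIO_LIBRARY = [
    (re.compile(r"no.{0,20}(prior|previous) (criminal )?record"), "no prior criminal record"),
    (re.compile(r"voluntarily surrendered"), "voluntary surrender"),
    (re.compile(r"compensat\w+ the victim"), "compensation paid to victim"),
]


def split_sentences(paragraph):
    """Split a paragraph into sentences on common end punctuation (assumed rule)."""
    parts = re.split(r"[.;!?。；！？]", paragraph)
    return [p.strip() for p in parts if p.strip()]


def extract_by_rules(paragraph):
    """Return (scenario, sentence) pairs for every sentence that matches a rule."""
    hits = []
    for sentence in split_sentences(paragraph):
        for pattern, scenario in SCENARIO_LIBRARY:
            if pattern.search(sentence):
                hits.append((scenario, sentence))
    return hits


text = "The defendant had no prior criminal record. He voluntarily surrendered to the police."
for scenario, sentence in extract_by_rules(text):
    print(scenario, "<-", sentence)
```

Because every hit carries the sentence that triggered it, the result stays traceable to the source text, which is the interpretability advantage the description attributes to rule matching.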
The deep learning process comprises the following steps:
(1) Word segmentation is performed on the text to obtain a word sequence.
(2) The word sequence is vectorized to obtain a text vector of the text to be extracted.
(3) The text vector of the text to be extracted is input into a pre-constructed deep learning extraction model, and a result is obtained from the output of the model.
To help those skilled in the art better understand the embodiments of the invention, the embodiments are described in further detail below.
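Steps (1) and (2) can be sketched as below. The whitespace tokenizer is only a stand-in for a real Chinese word segmenter, and the toy `EMBEDDINGS` table, the zero vector for unknown words, and zero-padding to a fixed length are assumptions made for illustration.

```python
def segment(text):
    """Stand-in for a Chinese word segmenter; here it simply splits on whitespace."""
    return text.split()


def vectorize(words, embeddings, dim, max_len):
    """Map words to vectors, then truncate or zero-pad to exactly max_len rows."""
    zero = [0.0] * dim
    # Unknown words map to the zero vector (an assumption, not stated in the patent).
    rows = [list(embeddings.get(w, zero)) for w in words[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows


# Toy 3-dimensional embeddings with hypothetical values.
EMBEDDINGS = {
    "defendant": [0.1, 0.2, 0.3],
    "surrendered": [0.4, 0.5, 0.6],
}

matrix = vectorize(segment("defendant voluntarily surrendered"), EMBEDDINGS, dim=3, max_len=5)
print(len(matrix), "x", len(matrix[0]))  # rows x columns of the text matrix
```

The resulting matrix has one row per word position and one column per embedding dimension, which is the shape the extraction model takes as input.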
Legal professionals construct the scenario library in advance according to the Criminal Law of the People's Republic of China, the Criminal Procedure Law of the People's Republic of China, and the sentencing guidance opinions of the various regions, as follows:
(1) Legal professionals determine the general scenarios and the charge-specific scenarios of each charge according to the Criminal Law of the People's Republic of China, the Criminal Procedure Law of the People's Republic of China, and the sentencing guidance opinions of the various regions.
(2) Under the guidance of legal professionals, regular expressions and matching rules are formulated for the determined general scenarios and charge-specific scenarios.
(3) Each charge is tested on a large number of real cases, and the regular expressions and matching rules are revised under the guidance of legal professionals.
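Step (3) amounts to measuring a rule against labeled cases and revising it when it misses. The sketch below shows one way to do that; the `evaluate_rule` helper, the sample sentences, and the use of precision and recall as the test criteria are assumptions, since the patent does not say how the test results are scored.

```python
import re


def evaluate_rule(pattern, labeled_sentences):
    """Return (precision, recall) of a regex rule on (sentence, has_scenario) pairs."""
    rule = re.compile(pattern)
    tp = fp = fn = 0
    for sentence, has_scenario in labeled_sentences:
        matched = rule.search(sentence) is not None
        if matched and has_scenario:
            tp += 1
        elif matched:
            fp += 1
        elif has_scenario:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labeled cases for the scenario "no prior criminal record".
cases = [
    ("the defendant has no prior record", True),
    ("the defendant was never convicted before", True),
    ("the defendant has a prior record", False),
]

print(evaluate_rule(r"no prior record", cases))                  # first draft of the rule
print(evaluate_rule(r"no prior record|never convicted", cases))  # revised rule
```

The first draft misses the second sentence, so its recall is lower; adding an alternative to the pattern is the kind of revision the step describes.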
FIG. 2 shows the steps for constructing the deep learning extraction model in advance. As shown in FIG. 2, the steps are as follows:
(1) A large number of judgment documents for different charges are collected. A total of 2,426,402 judgment documents issued nationwide since 2000 were collected.
(2) The judgment document data is cleaned, and the case fact description is segmented and extracted according to keywords. After cleaning, 889,690 judgment documents remained.
(3) The scenarios corresponding to the case facts are manually labeled according to the Criminal Law of the People's Republic of China, the Criminal Procedure Law of the People's Republic of China, and the sentencing guidance opinions of the various regions.
(4) The judgment documents are divided by charge, and the judgment documents of each charge are divided into a training set, a test set and a development set in a certain proportion. The training set is used to train the model, the development set is used to tune the model parameters, and the test set is used for the final evaluation of model performance.
(5) The model is trained.
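Step (4) can be sketched as a per-charge split. The patent only says "a certain proportion", so the 80/10/10 ratios, the `charge` field name, and the fixed random seed below are assumptions.

```python
import random
from collections import defaultdict


def split_by_charge(documents, ratios=(0.8, 0.1, 0.1), seed=0):
    """Group documents by charge, then split each group into train/dev/test."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["charge"]].append(doc)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {}
    for charge, docs in groups.items():
        docs = docs[:]  # copy so the caller's list is not reordered
        rng.shuffle(docs)
        n_train = int(len(docs) * ratios[0])
        n_dev = int(len(docs) * ratios[1])
        splits[charge] = {
            "train": docs[:n_train],
            "dev": docs[n_train:n_train + n_dev],
            "test": docs[n_train + n_dev:],  # remainder goes to the test set
        }
    return splits


docs = [{"id": i, "charge": "theft"} for i in range(10)]
docs += [{"id": i, "charge": "fraud"} for i in range(20)]
splits = split_by_charge(docs)
print({charge: {name: len(part) for name, part in s.items()} for charge, s in splits.items()})
```

Splitting inside each charge, rather than over the pooled corpus, keeps every charge represented in all three sets, which matters for the low-frequency charges the invention targets.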
The detailed topology of the deep learning extraction model is shown in FIG. 3. The model comprises an input layer, a hidden layer, and an output layer, wherein:
Input layer: the input is a two-dimensional matrix of word vectors of the training text. For example, if the maximum text length is defined as 500 words and the dimension of each word vector is set to 200, the input is a 500 × 200 two-dimensional matrix.
Let $x_i \in \mathbb{R}^k$ denote the $k$-dimensional word vector corresponding to the $i$-th word in a sentence. A sentence of length $n$ is represented as
$x_{1:n} = x_1 \oplus x_2 \oplus \dots \oplus x_n$
where $\oplus$ is the concatenation operator.
Hidden layer: mainly used to abstract the text input vector matrix to obtain deeper text information. A conventional convolutional neural network (CNN) or long short-term memory network (LSTM) can be used for this layer. Taking the CNN as an example: a convolutional neural network is a deep feedforward artificial neural network that has achieved remarkable results in computer vision and speech recognition. Scenario extraction is treated as a binary classification task for each charge, and the CNN extracts the features used for classification. Convolution operations are performed on the word vectors with convolution kernels of different sizes, max pooling is applied, and the resulting features are concatenated to obtain the final features.
Specifically, for the CNN, a convolution kernel $w \in \mathbb{R}^{hk}$ is applied to a window of $h$ words to generate a feature. For example, the feature $c_i$ is generated from the window of words $x_{i:i+h-1}$:
$c_i = f(w \cdot x_{i:i+h-1} + b)$
where $b \in \mathbb{R}$ is a bias term and $f$ is a nonlinear function such as the rectified linear unit, $\mathrm{ReLU}(x) = \max(0, x)$. Applying the kernel to every window of the sentence, $\{x_{1:h}, x_{2:h+1}, \dots, x_{n-h+1:n}\}$, generates the feature map
$c = [c_1, c_2, \dots, c_{n-h+1}]$
where $c \in \mathbb{R}^{n-h+1}$. Max pooling is applied to each feature map,
$\hat{c} = \max\{c\}$
which captures the most important feature of the map and allows sentences of variable length to be processed.
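The convolution and max-pooling formulas above can be checked on a toy example. The sketch below is plain Python rather than a deep learning framework, and the sentence, kernel weights, and sizes are hypothetical values chosen so the arithmetic is easy to follow.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def conv_feature_map(x, w, b, h):
    """Compute c_i = ReLU(w . x_{i:i+h-1} + b) for every window of h words.

    x is a list of n word vectors (each of length k); w is a flat kernel of length h*k.
    """
    n = len(x)
    feature_map = []
    for i in range(n - h + 1):
        # Concatenate the h word vectors in the window into one flat vector.
        window = [value for vec in x[i:i + h] for value in vec]
        feature_map.append(relu(sum(wi * xi for wi, xi in zip(w, window)) + b))
    return feature_map


def max_pool(feature_map):
    """Keep only the largest activation of the feature map."""
    return max(feature_map)


# Toy sentence: n = 4 words, k = 2 dimensions; one kernel with window size h = 2.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [1.0, -1.0, 0.5, 0.5]
c = conv_feature_map(x, w, b=0.0, h=2)
c_hat = max_pool(c)
print(c, c_hat)
```

The feature map has n − h + 1 = 3 entries, and max pooling reduces it to a single number regardless of sentence length. With the window sizes 1, 2, 3 and 4 and 64 kernels each described in the experimental setup, concatenating the pooled values would give 4 × 64 = 256 features.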
Output layer: the resulting feature vector is passed through one or more fully connected layers and activation function layers, and then through the Sigmoid activation function:
$\mathrm{Sigmoid}(x) = \sigma(x) = (1 + e^{-x})^{-1}$
to obtain the classification result predicted from the text features.
For a linear layer with input $c$ and output $y$, the layer can be expressed as $y = cA^{T} + b$, where $A$ is the weight matrix and $b$ is the bias term.
The model is lightweight, inexpensive to run, and robust.
(6) Experimental setup
The experiment uses the jieba word segmentation component for Chinese word segmentation and the 200-dimensional pre-trained word vectors from Tencent AI Lab.
The hidden layer in the experiment is a CNN that uses convolution kernels with window sizes of 1, 2, 3 and 4, with 64 kernels for each window size. The output layer consists of two linear layers. Let the feature size be $S_f$, the hidden layer size be $S_h$, and the number of labels be $S_l$. The features first pass through a linear layer of size $S_f \times S_h$, and then through the Tanh function:
$\mathrm{Tanh}(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})$
Finally, they pass through a linear layer of size $S_h \times S_l$. The hidden layer size is $S_h = 256$, and dropout with probability 0.5 is applied to the linear layer.
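The forward pass of this two-layer output head can be sketched in plain Python. The toy sizes and weights are hypothetical, and dropout is left out because it is active only during training; this is an illustration of the layer order, not the trained model.

```python
import math


def linear(c, A, b):
    """Compute y = c A^T + b, where A has one row of weights per output unit."""
    return [sum(a * x for a, x in zip(row, c)) + bias for row, bias in zip(A, b)]


def tanh(v):
    return [math.tanh(x) for x in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def output_head(features, A1, b1, A2, b2):
    """Linear (S_f -> S_h), Tanh, Linear (S_h -> S_l), then Sigmoid."""
    hidden = tanh(linear(features, A1, b1))
    return sigmoid(linear(hidden, A2, b2))


# Toy sizes: S_f = 3, S_h = 2, S_l = 2 (the experiment uses S_h = 256).
A1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
A2 = [[1.0, 1.0], [-1.0, 1.0]]
b2 = [0.0, 0.0]
probs = output_head([0.5, 1.0, -0.3], A1, b1, A2, b2)
print(probs)
```

Each output is an independent probability for one label, which is why a Sigmoid per label is used instead of a softmax over all labels.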
The training batch size is set to 100 and the maximum number of training epochs is set to 50. An early-stopping mechanism is used: training stops when the F1 score on the development set has not improved for 10 epochs, where the F1 score is:
f1score = 2 * (precision * recall) / (precision + recall)
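The F1 formula and the early-stopping rule can be sketched as below. Reading "no longer improved after 10 epochs" as a patience counter that resets on every new best score is an assumption, as are the function names.

```python
def f1_score(precision, recall):
    """f1 = 2 * (precision * recall) / (precision + recall); 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


def early_stopping_epoch(dev_f1_per_epoch, patience=10, max_epochs=50):
    """Return the 1-based epoch at which training stops.

    Training stops once the development-set F1 has failed to improve for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best, since_best = float("-inf"), 0
    for epoch, f1 in enumerate(dev_f1_per_epoch[:max_epochs], start=1):
        if f1 > best:
            best, since_best = f1, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return min(len(dev_f1_per_epoch), max_epochs)


# Hypothetical F1 curve: improves for three epochs, then plateaus.
history = [0.1, 0.2, 0.3] + [0.3] * 20
print(early_stopping_epoch(history))
```

In practice the model weights from the best epoch, not the last one, would normally be kept for the final evaluation on the test set.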
The learning rate for training is set to 0.001. Training uses the Adam optimizer and BCELoss as the loss function, which can be expressed as $l_n = -w_n [y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n)]$, where $y_n$ is the ground-truth label and $x_n$ is the network output. To ensure that $x_n$ is a number between 0 and 1, the Sigmoid activation function is used.
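The loss formula can be verified with a few lines of plain Python. Averaging the per-element losses and defaulting every weight $w_n$ to 1 are assumptions made here for illustration; the outputs must lie strictly between 0 and 1, which is what the Sigmoid guarantees.

```python
import math


def bce_loss(outputs, targets, weights=None):
    """Mean of l_n = -w_n [y_n log x_n + (1 - y_n) log(1 - x_n)]."""
    if weights is None:
        weights = [1.0] * len(outputs)  # unweighted by default (assumption)
    losses = [
        -w * (y * math.log(x) + (1 - y) * math.log(1 - x))
        for x, y, w in zip(outputs, targets, weights)
    ]
    return sum(losses) / len(losses)


# An output of 0.5 is maximally uncertain, so each element costs log(2).
print(bce_loss([0.5, 0.5], [1, 0]))
# A confident, correct prediction costs much less.
print(bce_loss([0.9], [1]))
```

The loss grows without bound as a prediction approaches the wrong extreme, so confidently wrong labels dominate the gradient during training.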
By integrating scenario extraction based on keyword matching and regular expression matching with scenario extraction based on deep learning, the invention can extract both explicit scenarios, which are highly interpretable, and implicit scenarios, which are less interpretable. The pre-constructed deep learning extraction model and scenario library help the system extract scenarios.
Compared with the prior art, the technical scheme provided by the invention combines the advantages of keyword matching and regular expression matching with those of deep learning scenario extraction. Both explicit scenarios, which are highly interpretable, and implicit scenarios, which are less interpretable, can be extracted. In addition, a different deep neural network is used for each charge, which addresses the low analysis accuracy on low-frequency charges. The experimental results show that the method extracts scenarios well and essentially meets the requirements of scenario extraction. The invention is simple to implement, extracts scenarios effectively, and meets application requirements.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make modifications and variations without departing from the technical principle of the present invention, and such modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (7)

1. A case scenario extraction method based on machine learning and rule matching, characterized by comprising:
a keyword matching and regular expression matching process, comprising:
extracting, as features, descriptive sentences in the paragraphs of a judgment document that contain specified keywords or match a regular expression;
looking up the scenario corresponding to each feature in a pre-constructed scenario library;
and a deep learning process, comprising:
performing word segmentation on the text to obtain a word sequence;
vectorizing the word sequence to obtain a text vector of the text to be extracted;
and inputting the text vector of the text to be extracted into a pre-constructed deep learning extraction model, and obtaining a result from the output of the extraction model.
2. The case scenario extraction method based on machine learning and rule matching as claimed in claim 1, further comprising: constructing a scenario library in advance.
3. The case scenario extraction method based on machine learning and rule matching as claimed in claim 1, further comprising constructing a deep learning extraction model in advance by:
collecting judgment documents for different charges;
cleaning the judgment document data, and segmenting and extracting the case fact description according to keywords;
manually labeling the scenarios corresponding to the case facts;
and training the model.
4. The case scenario extraction method based on machine learning and rule matching as claimed in claim 1, wherein pre-constructing the scenario library comprises:
(1) determining the general scenarios and the charge-specific scenarios of each charge;
(2) formulating regular expressions and matching rules for the determined general scenarios and charge-specific scenarios;
(3) testing each charge on a large number of real cases, and revising the regular expressions and matching rules according to the test results.
5. The case scenario extraction method based on machine learning and rule matching as claimed in claim 3, wherein constructing the deep learning extraction model further comprises:
dividing the judgment documents by charge, and dividing the judgment documents of each charge into a training set, a test set and a development set in a certain proportion, wherein the training set is used to train the model, the development set is used to tune the model parameters, and the test set is used for the final evaluation of model performance.
6. The case scenario extraction method based on machine learning and rule matching as claimed in claim 3, wherein the model to be trained comprises an input layer, a hidden layer, and an output layer, wherein:
input layer: the input is a two-dimensional matrix of word vectors of the training text;
let $x_i \in \mathbb{R}^k$ denote the $k$-dimensional word vector corresponding to the $i$-th word in a sentence; a sentence of length $n$ is represented as
$x_{1:n} = x_1 \oplus x_2 \oplus \dots \oplus x_n$
where $\oplus$ is the concatenation operator;
hidden layer: abstracts the text input vector matrix to obtain deeper text information, wherein scenario extraction is treated as a binary classification task for each charge and a convolutional neural network extracts the features used for classification by performing convolution operations on the word vectors with convolution kernels of different sizes, applying max pooling, and concatenating the resulting features to obtain the final features;
output layer: the resulting feature vector is passed through one or more fully connected layers and activation function layers, and then through a Sigmoid activation function, to obtain the classification result predicted from the text features.
7. The case scenario extraction method based on machine learning and rule matching as claimed in claim 3, further comprising an experimental setup, wherein the jieba word segmentation component is used for Chinese word segmentation and pre-trained word vectors from Tencent AI Lab are used;
the hidden layer is a convolutional neural network that uses convolution kernels with window sizes of 1, 2, 3 and 4, with 64 kernels for each window size; the output layer consists of two linear layers; with the feature size denoted $S_f$, the hidden layer size denoted $S_h$, and the number of labels denoted $S_l$, the features first pass through a linear layer of size $S_f \times S_h$, and then through the Tanh function:
$\mathrm{Tanh}(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})$
and finally through a linear layer of size $S_h \times S_l$, wherein the hidden layer size is $S_h = 256$ and dropout with probability 0.5 is applied to the linear layer;
the learning rate for training is set to 0.001, and training uses the Adam optimizer, BCELoss as the loss function, and a Sigmoid activation function.
CN202010920756.3A 2020-09-04 2020-09-04 Case scenario extraction method based on machine learning and rule matching Pending CN112100212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010920756.3A CN112100212A (en) 2020-09-04 2020-09-04 Case scenario extraction method based on machine learning and rule matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010920756.3A CN112100212A (en) 2020-09-04 2020-09-04 Case scenario extraction method based on machine learning and rule matching

Publications (1)

Publication Number Publication Date
CN112100212A (en) 2020-12-18

Family

ID=73757730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010920756.3A Pending CN112100212A (en) 2020-09-04 2020-09-04 Case scenario extraction method based on machine learning and rule matching

Country Status (1)

Country Link
CN (1) CN112100212A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107894981A (en) * 2017-12-13 2018-04-10 武汉烽火普天信息技术有限公司 A kind of automatic abstracting method of case semantic feature
CN107918921A (en) * 2017-11-21 2018-04-17 南京擎盾信息科技有限公司 Criminal case court verdict measure and system
CN108009284A (en) * 2017-12-22 2018-05-08 重庆邮电大学 Using the Law Text sorting technique of semi-supervised convolutional neural networks
CN110032721A (en) * 2018-01-11 2019-07-19 北京国双科技有限公司 A kind of judgement document's method for pushing and device
CN110276068A (en) * 2019-05-08 2019-09-24 清华大学 Law merit analysis method and device
CN110991694A (en) * 2019-10-30 2020-04-10 南京大学 Sentencing prediction method based on deep learning
CN111104798A (en) * 2018-10-27 2020-05-05 北京智慧正安科技有限公司 Analysis method, system and computer readable storage medium for criminal plot in legal document
CN111198953A (en) * 2018-11-16 2020-05-26 北京智慧正安科技有限公司 Case text information based method and system for recommending cases and computer readable storage medium
CN111476027A (en) * 2020-04-07 2020-07-31 南京森林警察学院 Big data based anti-smuggling case information extraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
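Lin Zhihong (林志宏) et al.: "Research on a method for extracting semantic features from public security case texts based on convolutional neural networks" *
Chen Huiwei (陈慧炜): "A survey of research on information extraction from case texts in the public security field" *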

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378563A (en) * 2021-02-05 2021-09-10 中国司法大数据研究院有限公司 Case feature extraction method and device based on genetic variation, semi-supervision and reinforcement learning
CN113378563B (en) * 2021-02-05 2022-05-17 中国司法大数据研究院有限公司 Case feature extraction method and device based on genetic variation and semi-supervision
CN112784578A (en) * 2021-03-16 2021-05-11 北京华宇元典信息服务有限公司 Legal element extraction method and device and electronic equipment
CN114611486A (en) * 2022-03-09 2022-06-10 上海弘玑信息技术有限公司 Information extraction engine generation method and device and electronic equipment
CN114611486B (en) * 2022-03-09 2022-12-16 上海弘玑信息技术有限公司 Method and device for generating information extraction engine and electronic equipment
CN115687632A (en) * 2022-08-25 2023-02-03 中国司法大数据研究院有限公司 Criminal measuring plot decomposition analysis method and system
CN115687632B (en) * 2022-08-25 2024-04-09 中国司法大数据研究院有限公司 Criminal investigation plot decomposition analysis method and system

Similar Documents

Publication Publication Date Title
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
CN112231447B (en) Method and system for extracting Chinese document events
CN110362819B (en) Text emotion analysis method based on convolutional neural network
CN111414461B (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN109902175A (en) A kind of file classification method and categorizing system based on neural network structure model
CN108595708A (en) A kind of exception information file classification method of knowledge based collection of illustrative plates
CN110287323B (en) Target-oriented emotion classification method
CN112100212A (en) Case scenario extraction method based on machine learning and rule matching
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN110598005A (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN103473327A (en) Image retrieval method and image retrieval system
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN110502742B (en) Complex entity extraction method, device, medium and system
CN111191442A (en) Similar problem generation method, device, equipment and medium
CN111581364B (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN112256939A (en) Text entity relation extraction method for chemical field
CN113220844A (en) Remote supervision relation extraction method based on entity characteristics
CN114020906A (en) Chinese medical text information matching method and system based on twin neural network
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
CN114417851A (en) Emotion analysis method based on keyword weighted information
CN110659392B (en) Retrieval method and device, and storage medium
CN112329441A (en) Legal document reading model and construction method
CN114881043A (en) Deep learning model-based legal document semantic similarity evaluation method and system
CN114925702A (en) Text similarity recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201218