CN115687632B - Sentencing scenario decomposition analysis method and system

Info

Publication number
CN115687632B
Authority
CN
China
Prior art keywords
scenario
criminal
sentencing
plot
names
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211026955.5A
Other languages
Chinese (zh)
Other versions
CN115687632A (en
Inventor
段智峰
任呈祥
梁新
郭伟登
刘贤艳
谭晓颖
孙晓锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Judicial Big Data Research Institute Co ltd
Original Assignee
China Judicial Big Data Research Institute Co ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a sentencing scenario decomposition analysis method and system. The method comprises the following steps: acquiring original scenario names from the criminal law and sentencing standardization documents, and constructing a sentencing scenario map; constructing a case scenario tree based on judgment documents and the constructed sentencing scenario map; training a model on the sentencing scenarios based on the case scenario tree, including prediction tasks among sentencing scenarios and between sentencing scenarios and judgment results; and using the parameter distribution of the trained model to assist sentencing scenario analysis. The invention extracts sentencing scenario definitions from sentencing standardization documents, extracts sentencing scenarios and judgment results from judgment documents, analyzes the two jointly, and uses the split points of decision trees to automatically identify imbalanced cut points between sentencing scenario names and their corresponding sentencing measures. This can help practitioners in the legislative and judicial fields build better judicial reform proposals and can assist the analysis of social conditions.

Description

Sentencing scenario decomposition analysis method and system
Technical Field
The invention belongs to the field of sentencing research and judicial application, and particularly relates to a method and system, based on machine learning, decision tree split decomposition and scenario concept extraction, for decomposing and analyzing sentencing scenarios and for assisting sentencing standardization legislation and intelligent rule-based sentencing.
Background
With the rapid advancement of court informatization, procuratorates and courts handle an ever-growing number of tasks involving criminal suspects during criminal proceedings. The Supreme People's Court set up a sentencing standardization research group, which has investigated sentencing standardization in depth since 2005. On the basis of repeated argumentation and broad consultation, two documents were drafted, the "People's Court Sentencing Guidance Opinions (Trial)" and the "People's Court Sentencing Procedure Guidance Opinions (Trial)", and the Supreme People's Court decided to pilot the sentencing standardization reform in courts nationwide from October 1, so that bringing sentencing recommendations into the standardized sentencing procedure makes sentencing more open and transparent. Especially as a society under the rule of law develops, the rationality of the scenarios relied on in sentencing must be explained well, and the key scenarios in the facts of a crime must be identified and judged quickly. The key scenarios include both non-quantitative and quantitative sentencing scenarios, and measuring them accurately and fairly is a very important and very difficult part of sentencing.
Traditional research on sentencing standardization is very time-consuming and labor-intensive. Driven by the sentencing standardization reform for criminal cases, every province has now issued implementation rules for the sentencing guidance on common crimes. The provincial rules draw on one another while also containing region-specific provisions, but they cover only a very small share of case types, currently 5% of the total.
Using a big-data approach, the constraining effect of the Criminal Law of the People's Republic of China is combined with the decided cases of judges at large as supporting data, so that objective sentencing patterns can be summarized from practice. Statistical methods, correlation analysis, deep learning and other scientific means are used to find reasonable decomposition points or intervals for sentencing scenarios, which better supports the reasonable explanation of sentencing scenarios and the improvement of legislation on sentencing standardization guidance opinions.
Decision tree split decomposition is an important technique for reasoning and causal analysis. It partitions the dependent variable so that contribution is optimal and global loss is lowest; because the resulting decomposition points have a sound statistical basis, they can be accepted by sentencing experts. At present there is no processing method that uses big data to intelligently decompose sentencing scenarios and uses cases to assist judicial interpretation documents such as the "handling project" and the "sentencing standardization guidance opinions", or to assist intelligent rule-based sentencing.
Disclosure of Invention
To address these problems, the invention provides a method and system, based on machine learning, decision tree split decomposition and scenario concept extraction, for decomposing and analyzing sentencing scenarios and for assisting sentencing standardization legislation and intelligent rule-based sentencing.
The invention solves the problems through the following technical scheme:
In a first aspect, the invention provides a sentencing scenario decomposition analysis method, which is based on machine learning, decision tree split decomposition and scenario concept extraction and assists sentencing standardization legislation and intelligent rule-based sentencing, comprising the following steps:
acquiring original scenario names from the criminal law and sentencing standardization documents, and constructing a sentencing scenario map;
constructing a case scenario tree based on judgment documents and the constructed sentencing scenario map;
training a model on the sentencing scenarios based on the case scenario tree, including prediction tasks among sentencing scenarios and between sentencing scenarios and judgment results;
and using the parameter distribution of the trained model to assist sentencing scenario analysis.
Further, acquiring the original scenario names from the criminal law and sentencing standardization documents and constructing the sentencing scenario map includes:
acquiring original scenario names from the criminal law and sentencing standardization documents;
normalizing the original scenario names extracted from each data source to form generalized candidate standard scenario names;
acquiring the association relations between candidate standard scenario names, and forming an undirected graph with the candidate standard scenario names as vertices and the relation categories as edges;
clustering the candidate standard scenario name undirected graph, and generating recommended candidate standard scenario names from each cluster center;
manually confirming the recommended candidate standard scenario names to form standard scenario names;
classifying standard scenario names that have hypernym-hyponym relationships into a tree, forming tree-structured standard scenario names;
and associating the original scenario names with the tree-structured standard scenario names to form the sentencing scenario map.
Further, acquiring the association relations between candidate standard scenario names includes: computing the similarity between candidate standard scenario names and treating names whose similarity exceeds a set threshold as aligned, which forms a "quasi-alignment relation" between them; and at the same time deriving association relations between candidate standard scenario names from the paragraph and sentence positions at which each name appears in the original standardization documents, where the relation types include but are not limited to sequential co-occurrence, parallel relation under the same constraint, citation, and child-parent relations.
Further, clustering the candidate standard scenario name undirected graph includes: clustering the candidate standard scenario names in the graph using algorithms from related fields such as graph clustering and community discovery.
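The similarity and clustering steps above can be pictured with a minimal Python sketch. It is not the patented implementation: `difflib.SequenceMatcher` stands in for whatever similarity measure is actually used, connected components (via union-find) stand in for a graph clustering or community discovery algorithm, the shortest name in a cluster stands in for the cluster center, and the names and the 0.6 threshold are invented for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for any similarity measure."""
    return SequenceMatcher(None, a, b).ratio()


def build_graph(names, threshold=0.6):
    """Return undirected edges (name_a, name_b, relation) for pairs above the threshold."""
    edges = []
    for a, b in combinations(names, 2):
        if similarity(a, b) >= threshold:
            edges.append((a, b, "quasi-alignment"))
    return edges


def cluster(names, edges):
    """Group names into connected components of the undirected graph (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, _relation in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


# Hypothetical candidate standard scenario names.
names = [
    "voluntary surrender",
    "voluntary surrender to police",
    "burglary",
    "burglary at night",
    "recidivist",
]
edges = build_graph(names)
clusters = cluster(names, edges)
# Placeholder for "cluster center": recommend the shortest name in each cluster.
recommended = [min(group, key=len) for group in clusters]
```

In this sketch the recommended names would then go to legal experts for manual confirmation; other relation types (co-occurrence, citation, child-parent) could be added as further edge labels.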
Further, the recommended candidate standard scenario names are confirmed with manual assistance to form standard scenario names, so that the hundreds of standard scenario names that need focused consideration in actual sentencing can be selected effectively from tens of thousands of candidates; standard scenario names confirmed by legal experts can serve as a unified terminology standard for sentencing, giving consistent standard scenario names across all provinces and regions nationwide.
Further, tree classification is performed on standard scenario names that have hypernym-hyponym relationships; for example, the standard scenario names "under fourteen", "fourteen or older but under sixteen" and "sixteen or older but under eighteen" share the parent category "minor".
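As a rough illustration of the tree classification and of linking original wording to standard names, the following Python sketch uses plain dictionaries. The data layout, the original phrasings and the helper names `parent_of` and `resolve` are assumptions made for this example, not structures described in the source.

```python
# Hypothetical tree of standard scenario names: parent category -> child names.
scenario_tree = {
    "minor": [
        "under fourteen",
        "fourteen or older but under sixteen",
        "sixteen or older but under eighteen",
    ],
}

# Hypothetical links from original wording in source documents to standard names.
original_to_standard = {
    "had not reached the age of fourteen": "under fourteen",
    "aged sixteen but not yet eighteen": "sixteen or older but under eighteen",
}


def parent_of(standard_name, tree):
    """Return the parent category of a standard scenario name, or None if absent."""
    for parent, children in tree.items():
        if standard_name in children:
            return parent
    return None


def resolve(original_name):
    """Map an original name to (standard name, parent category)."""
    standard = original_to_standard.get(original_name)
    if standard is None:
        return None, None
    return standard, parent_of(standard, scenario_tree)


resolved = resolve("aged sixteen but not yet eighteen")
```

Keeping the original-to-standard links separate from the tree means new provincial wordings can be attached without changing the confirmed standard names.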
Further, constructing a case scenario tree based on judgment documents and the constructed sentencing scenario map comprises:
preprocessing the judgment document and calling a structuring engine to identify the important scenario areas in the judgment document;
calling a scenario extraction engine, which draws on the knowledge in the sentencing scenario map, to extract the sentencing scenarios of the case from the important scenario areas;
extracting the judgment result and treating it as a special case scenario;
and combining all scenarios (sentencing scenarios) to form a case scenario tree.
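A minimal sketch of how the steps above could fit together is given below. It assumes a tiny phrase-to-standard-name lookup in place of the scenario extraction engine and a single regular expression in place of judgment result extraction; `SCENARIO_MAP`, the field names and the English sample text are all hypothetical.

```python
import re

# Hypothetical slice of a sentencing scenario map: surface wording -> standard name.
SCENARIO_MAP = {
    "voluntarily surrendered": "voluntary surrender",
    "entered the dwelling": "entering a dwelling",
    "compensated the victim": "active compensation",
}


def extract_scenarios(text):
    """Return the standard scenario names whose surface wording appears in the text."""
    return sorted({std for phrase, std in SCENARIO_MAP.items() if phrase in text})


def extract_result(text):
    """Read a prison term in months from wording like '3 years and 2 months'."""
    m = re.search(r"(\d+) years?(?: and (\d+) months?)?", text)
    if not m:
        return None
    return int(m.group(1)) * 12 + int(m.group(2) or 0)


def build_case_tree(case_id, facts, ruling):
    """Combine sentencing scenarios and the judgment result into one case scenario tree."""
    return {
        "case_id": case_id,
        "sentencing_scenarios": extract_scenarios(facts),
        "judgment_result": {"term_months": extract_result(ruling)},
    }


tree = build_case_tree(
    "demo-001",
    "The defendant entered the dwelling and later voluntarily surrendered.",
    "The defendant is sentenced to 3 years and 2 months of fixed-term imprisonment.",
)
```

Storing the judgment result as just another node of the tree mirrors the idea of treating it as a special case scenario.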
Further, the model training on the sentencing scenarios uses machine learning and deep learning methods, where the model is a common model or combination of models, including but not limited to LightGBM, XGBoost, Transformer, MLP and CNN; the training process comprises the following steps:
expanding, reshaping and converting all case scenario trees according to the input requirements of the training model;
setting different reasoning tasks and training prediction tasks among sentencing scenarios and between sentencing scenarios and case scenarios, including a multi-task joint training mode in which the sentencing scenarios, the judgment result, the facts found in the judgment document and the court opinion in the judgment document are predicted jointly.
Further, using the parameter distribution of the trained model to assist sentencing scenario analysis means taking a task model whose training effect is significant, analyzing the network of that model, and assisting sentencing scenario analysis from its parameter distribution, comprising:
extracting the Tree model of the Estimator from the LightGBM-Regress model and ranking the scenario features by relevance (Importance);
using the relevance-ranked data to analyze which scenarios are effective and which are ineffective with respect to the judgment result;
using a histogram to show, for each sentencing scenario, the distribution of decision hits across its value groups during task inference (for example, "under fourteen", "fourteen or older but under sixteen" and "sixteen or older but under eighteen" for "minor");
analyzing feature_names, threshold, cat_boundaries and cat_threshold in the model parameters of the decision tree and, for quantitative sentencing scenarios, determining the split points. For example: "women (persons)", split points [0,3,8]; "theft amount (yuan)", split points [0,3000,30000,400000]. Through split point analysis, the prevailing judgment intervals for the theft amount (yuan) in judges' actual decisions are obtained from a large volume of judgment documents, which assists the formation or revision of judicial interpretations and the formation of standardized guidance opinions for new offences.
The above method steps are not limited to machine learning models with decision tree features, such as LightGBM and XGBoost; they also cover other types of deep learning models that can produce feature value splits or groupings of candidate features.
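The split point analysis can be sketched in Python as a walk over a dumped tree model. The dictionary below is hand-built and only assumed to resemble the nested layout that a LightGBM model dump uses (`feature_names`, `tree_info`, `tree_structure`, `split_feature`, `threshold`, `left_child`, `right_child`); treat those key names, the feature names and the numbers as assumptions for illustration. Only numerical splits are handled; categorical splits (`cat_threshold`) are left out.

```python
from collections import defaultdict


def collect_split_points(model_dump):
    """Collect the sorted numerical thresholds used for each feature across all trees."""
    names = model_dump["feature_names"]
    points = defaultdict(set)

    def walk(node):
        if "split_feature" not in node:  # leaf node: nothing to record
            return
        points[names[node["split_feature"]]].add(node["threshold"])
        walk(node["left_child"])
        walk(node["right_child"])

    for tree in model_dump["tree_info"]:
        walk(tree["tree_structure"])
    return {name: sorted(values) for name, values in points.items()}


# Hand-built toy dump with two small trees (all values invented).
demo_dump = {
    "feature_names": ["theft_amount_yuan", "forgiveness"],
    "tree_info": [
        {"tree_structure": {
            "split_feature": 0, "threshold": 3000.0,
            "left_child": {"leaf_value": 6.0},
            "right_child": {
                "split_feature": 0, "threshold": 30000.0,
                "left_child": {"leaf_value": 18.0},
                "right_child": {"leaf_value": 48.0},
            },
        }},
        {"tree_structure": {
            "split_feature": 1, "threshold": 0.5,
            "left_child": {"leaf_value": 2.0},
            "right_child": {"leaf_value": -2.0},
        }},
    ],
}
split_points = collect_split_points(demo_dump)
```

Thresholds that recur across many trees are the natural candidates for the interval boundaries discussed above.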
Further, the sentencing scenarios are compared and analyzed using the following steps:
analyzing, year by year, the degree of correlation between sentencing scenarios and judgment results, and observing how the importance of each sentencing scenario fluctuates over the years, which reflects the trend of sentencing scenarios during social and economic development and their changing influence on the prison term;
analyzing the sentencing scenario "recidivist" model, using the length of the interval between offence and reoffence to analyze the effect of the punishment imposed by the judgment result, for example decomposing into reoffence within five years, within two years and within one year, to judge whether a clear deterrent effect is achieved;
analyzing the occurrence rate of sentencing scenarios over the years, and observing and summarizing the role sentencing scenarios play in social governance, for example decomposing "active compensation" into "active compensation in full", "active compensation of the majority" and "active compensation in part", where the proportion of "active compensation in full" rises year by year;
and analyzing the standardization of sentencing scenario name terminology using the sentencing scenario map.
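The year-by-year occurrence rate mentioned in the steps above can be sketched as a simple count. The case records below are toy data invented for this example, and the record layout (`year`, `scenarios`) is an assumption.

```python
from collections import Counter


def yearly_rate(cases, scenario):
    """Share of cases per year in which the given sentencing scenario occurs."""
    totals, hits = Counter(), Counter()
    for case in cases:
        totals[case["year"]] += 1
        if scenario in case["scenarios"]:
            hits[case["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy case records (invented for illustration only).
cases = [
    {"year": 2019, "scenarios": {"active compensation in part"}},
    {"year": 2019, "scenarios": {"active compensation in full"}},
    {"year": 2020, "scenarios": {"active compensation in full"}},
    {"year": 2020, "scenarios": {"active compensation in full", "voluntary surrender"}},
]
rates = yearly_rate(cases, "active compensation in full")
```

The same grouping by year could be applied to feature importance values to follow how the weight of a scenario changes over time.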
In a second aspect, the invention provides a sentencing scenario decomposition analysis system, comprising:
a sentencing scenario map construction module, used for acquiring original scenario names from the criminal law and sentencing standardization documents and constructing a sentencing scenario map;
a case scenario tree construction module, used for constructing a case scenario tree based on judgment documents and the constructed sentencing scenario map;
a joint training module, used for training a model on the sentencing scenarios based on the case scenario tree, including prediction tasks among sentencing scenarios and between sentencing scenarios and judgment results;
and a sentencing scenario analysis module, used for assisting sentencing scenario analysis with the parameter distribution of the trained model.
Compared with the prior art, the invention has the advantages that:
1) The invention effectively addresses the problem that the sentencing scenarios used in sentencing standardization documents across the country are similar in meaning but described differently. Using graph clustering and similarity techniques, differently worded sentencing scenarios can be quickly standardized into uniform professional terms, greatly reducing the difficulty and workload of legal experts when sorting them out.
2) After a model is learned with deep learning and machine learning algorithms, the invention uses the model's parameters in reverse to explain the influence of each standard sentencing scenario. This quickly verifies where sentencing scales and standards are not uniform across regions, helps avoid standardized sentencing opinions that differ in granularity and scale from place to place, and provides a basis for subsequently improving the opinions.
Drawings
FIG. 1 is a diagram of the sentencing standardization guidance opinions of the provinces nationwide.
FIG. 2 is a block diagram of the sentencing scenario analysis flow.
FIG. 3 is a flow chart of sentencing scenario map construction.
FIG. 4 is a flow chart of sentencing scenario multi-task processing.
FIG. 5 is a flow chart of sentencing scenario training and feature split point analysis.
FIG. 6 is a schematic diagram of feature split points.
FIG. 7 is a flow chart of sentencing scenario analysis.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions and specific implementations of the present invention will be further described in detail below with reference to the accompanying drawings.
As shown in fig. 2, the present invention mainly includes the following modules:
1. Sentencing scenario concept processing module: used for building the sentencing scenario map.
2. Judgment document element extraction module: used for building case scenario trees based on the sentencing scenario map.
3. Joint training module: used for multi-task joint training and prediction based on the case scenario trees.
4. Sentencing scenario analysis and reporting module: used for sentencing scenario analysis and for producing a comprehensive sentencing scenario analysis report.
The invention mainly adopts the following technology:
1. Sentencing scenario map construction technology
The sentencing scenario map is constructed through the following steps:
In step "2.1" of FIG. 3, original scenario names are obtained from the criminal law and from the sentencing standardization documents listed in FIG. 1.
In step "3.1" of FIG. 3, the original scenario names extracted from each data source are normalized to form generalized candidate standard scenario names.
The similarity between candidate standard scenario names is computed, and names whose similarity exceeds the set threshold are treated as aligned, forming a "quasi-alignment relation" between them. At the same time, relation types such as sequential co-occurrence, parallel, citation and child-parent relations are obtained for each candidate standard scenario name from the paragraphs and sentences of the original standardization documents, forming an undirected graph with the candidate standard scenario names as vertices and the relation categories as edges.
In step "4.1" of FIG. 3, the candidate standard scenario names in the undirected graph are clustered using algorithms from related fields such as graph clustering and community discovery, and recommended candidate standard scenario names are generated from each cluster center for manual confirmation.
In step "5.1" of FIG. 3, the recommended candidate standard scenario names are manually confirmed to form standard scenario names. This step effectively selects, from tens of thousands of candidates, the hundreds of standard scenario names that need focused consideration in actual sentencing. Standard scenario names confirmed by legal experts can serve as a unified terminology standard for sentencing, giving consistent standard scenario names across all provinces and regions nationwide.
In step "6.1" of FIG. 3, the standard scenario names are classified into a tree according to their hypernym-hyponym relationships, forming tree-structured sentencing scenario names. For example, the standard scenario names "under fourteen", "fourteen or older but under sixteen" and "sixteen or older but under eighteen" share the parent category "minor".
In step "7.1" of FIG. 3, the original scenario names are associated with the tree-structured sentencing scenario names to form the sentencing scenario map.
2. Sentencing scenario extraction technique
Using the sentencing scenario map, a scenario extraction engine extracts the sentencing scenarios of each judgment document through the following steps:
The scenario extraction engine is initialized and loads the knowledge in the sentencing scenario map.
In step "2.1" of FIG. 4, the judgment document is preprocessed and the structuring engine is called to identify the important scenario areas in the judgment document. The structuring engine is a comprehensive structuring technology that combines syntax, lexicons, regular expressions, structure definitions, word segmentation, part-of-speech tagging, relation extraction, hierarchical decomposition and similar techniques. It converts sequential text into a tree-shaped data structure with references; the output can be XML or JSON, and references can be linked by ID or by entity name.
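To make the idea of turning sequential text into a tree with ID references more concrete, here is a deliberately small Python sketch. It only splits a document on a few section headings with a regular expression and emits JSON; a real structuring engine as described above would combine many more techniques. The heading list, the ID scheme and the sample text are assumptions made for this example.

```python
import json
import re

# Hypothetical section headings; real judgment documents use different wording.
SECTION_HEADINGS = ["Facts found", "Court opinion", "Judgment"]


def structure_document(doc_id, text):
    """Split sequential text into a small tree of sections addressable by ID."""
    pattern = "|".join(re.escape(h) for h in SECTION_HEADINGS)
    parts = re.split(rf"^({pattern}):\s*", text, flags=re.MULTILINE)
    # parts looks like [preamble, heading1, body1, heading2, body2, ...]
    sections = [
        {"id": f"{doc_id}-s{i}", "heading": heading, "text": body.strip()}
        for i, (heading, body) in enumerate(zip(parts[1::2], parts[2::2]), start=1)
    ]
    return {"id": doc_id, "sections": sections}


raw = (
    "Facts found: The defendant stole 25000 yuan.\n"
    "Court opinion: The defendant voluntarily surrendered.\n"
    "Judgment: Fixed-term imprisonment of 3 years and 2 months.\n"
)
doc = structure_document("demo-001", raw)
as_json = json.dumps(doc, ensure_ascii=False, indent=2)
```

The section IDs give later steps a stable way to point at the area of the document from which a scenario was extracted.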
In step "3.1" of FIG. 4, the scenario extraction engine is called to extract the sentencing scenarios of the case from the important scenario areas.
In step "3.2" of FIG. 4, the judgment result is extracted and treated as a special case scenario.
In step "4.1" of FIG. 4, all scenarios (sentencing scenarios and case scenarios) are combined to form a case scenario tree.
3. Sentencing scenario and prison term joint training and prediction technology
The sentencing scenarios are used for model training with machine learning and deep learning methods, where the model is a common model or combination of models, including but not limited to LightGBM, XGBoost, Transformer, MLP and CNN.
In step "2.1" of FIG. 5, all case scenario trees are expanded, reshaped and converted according to the input requirements of the training model. The OneHot-Label in step "2.1" of FIG. 5 is the scenario feature label. For example, in Zhang San's robbery case the basic scenarios comprise three items: "sixteen or older but under eighteen", "armed with a gun" and "voluntary surrender". Of the 58 scenario factors to be considered in robbery cases, the labels for these three are 1 and the labels for all other scenarios are 0; the result is used to train the LightGBM model.
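The one-hot labelling described above can be sketched in a few lines of Python. The vocabulary here has five hypothetical entries rather than the 58 factors mentioned for robbery cases, and the function name `one_hot` is an assumption.

```python
# Hypothetical scenario vocabulary (the source mentions 58 factors for robbery cases).
SCENARIO_VOCAB = [
    "sixteen or older but under eighteen",
    "armed with a gun",
    "voluntary surrender",
    "recidivist",
    "forgiveness",
]


def one_hot(case_scenarios, vocab=SCENARIO_VOCAB):
    """Encode the scenarios present in a case as a 0/1 vector over the vocabulary."""
    present = set(case_scenarios)
    unknown = present - set(vocab)
    if unknown:
        raise ValueError(f"scenarios not in vocabulary: {sorted(unknown)}")
    return [1 if name in present else 0 for name in vocab]


row = one_hot([
    "sixteen or older but under eighteen",
    "armed with a gun",
    "voluntary surrender",
])
```

Rows built this way, together with the prison term as the target, form the kind of tabular input that gradient-boosted tree models accept.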
In step "3.1" of FIG. 5, different reasoning tasks are set and prediction tasks among sentencing scenarios and between sentencing scenarios and case scenarios are trained, for example inferring "fixed-term imprisonment of 3 years and 2 months" from the scenarios "voluntary surrender, armed, entering a dwelling, theft, 25000 yuan", or inferring "mitigation" from "voluntary surrender" and "the immediate".
The training model is automatically tuned with FLAML (an efficient, lightweight automated machine learning framework) to obtain the best model performance.
With multi-task joint training, the sentencing scenarios, the judgment result, the facts found in the judgment document and the court opinion in the judgment document are predicted jointly: the judgment result is predicted from the court opinion (seq) + the facts found in the judgment document (label), and the sentencing scenarios (label) are predicted from the court opinion (seq) + the facts found in the judgment document.
4. Model parameter feature analysis technology
For task models whose training effect is significant, the network of the task model is analyzed and sentencing scenario analysis is assisted from the parameter distribution.
In steps "4.1" and "5.1" of FIG. 5, the Tree model of the Estimator is extracted from the LightGBM-Regress model and the scenario features are ranked by relevance. Here a "scenario feature" is a scenario factor that increases or decreases the prison term, such as "whether the offender entered a dwelling" in a theft offence.
The feature ranking data can then be used to analyze which scenarios are effective and which are ineffective with respect to the judgment result.
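A small Python sketch of the ranking step follows. It assumes the importance scores have already been read from a trained tree model into a plain dictionary; the scores, the feature names and the cut-off rule (importance at or below `min_importance` counts as ineffective) are invented for illustration.

```python
def rank_scenarios(importance, min_importance=0.0):
    """Rank scenario features by importance and split them into effective/ineffective."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    effective = [name for name, score in ranked if score > min_importance]
    ineffective = [name for name, score in ranked if score <= min_importance]
    return ranked, effective, ineffective


# Made-up importance scores for a theft model.
importance = {
    "forgiveness": 30.0,
    "theft amount (yuan)": 120.0,
    "entering a dwelling": 45.0,
    "weekday of offence": 0.0,
}
ranked, effective, ineffective = rank_scenarios(importance)
```

Where the line between effective and ineffective is drawn is a judgment call; a stricter `min_importance` would move weakly used features into the ineffective group.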
In step "6.1" of FIG. 5, a histogram shows the distribution of value groups for each sentencing scenario during task inference.
In step "6.2" of FIG. 5, the split points in the decision tree are extracted: feature_names, threshold, cat_boundaries and cat_threshold of the decision tree model are analyzed and, for quantitative sentencing scenarios, the split points are determined. For example: "women (persons)" has split points [0,3,8]; "theft amount (yuan)" has split points [0,3000,30000,400000]; "illegal absorption of public deposits by individuals (yuan)" has split points [0,200000,1000000] in eastern regions and [0,200000,1000000] in western regions. Through split point analysis, the prevailing judgment intervals for the theft amount (yuan) in judges' actual decisions, and possible differences in value judgment between regions, can be obtained from a large number of judgment documents. This assists the formation or revision of judicial interpretations and the formation of standardized guidance opinions for new offences. As shown in FIG. 6, for the number of deaths in a traffic accident offence the numerical cut point is 2.5, i.e. <= 2.5; rounded up, 3 persons is the split point at which the sentencing starting point changes qualitatively. FIG. 6 shows a visualization of feature value splits in a decision tree: values less than or equal to the split point continue to be judged along the left branch, and larger values continue along the right branch. In actual traffic accident cases the number of deaths can range from 0 to more than 100, but the prison term is aggravated markedly only beyond a certain number, that is, quantitative change turns into qualitative change, and the point of that qualitative change, the split point, can be observed through visualization. The "forgiveness" at the upper right of FIG. 6 is a scenario name indicating whether the offender has obtained the forgiveness of the victim or the victim's family; if "forgiveness" is obtained, the prison term is reduced appropriately. Similarly, in economic crimes the amount involved is a continuous value, and observing qualitative changes through split points allows it to be divided into several intervals.
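The left/right routing at a split point and the division of a continuous amount into intervals can be sketched as follows. The helper names are hypothetical, and treating each interval as closed on the left and open on the right is an assumption made for this example, not something the source specifies.

```python
import math
from bisect import bisect_right


def route(value, threshold):
    """Decision-tree routing: values <= threshold go left, larger values go right."""
    return "left" if value <= threshold else "right"


def first_integer_on_right(threshold):
    """Smallest whole count that falls on the right branch of a split."""
    return math.floor(threshold) + 1


def interval_index(amount, split_points):
    """Index i of the interval [split_points[i], split_points[i + 1]) holding the amount."""
    return bisect_right(split_points, amount) - 1


deaths_threshold = 2.5                      # cut point from the traffic accident example
theft_splits = [0, 3000, 30000, 400000]     # split points from the theft amount example

branch_for_two = route(2, deaths_threshold)
branch_for_three = route(3, deaths_threshold)
qualitative_change_at = first_integer_on_right(deaths_threshold)
bucket = interval_index(25000, theft_splits)
```

With these inputs, two deaths stay on the left branch while three move to the right, and a theft amount of 25000 yuan falls in the interval that starts at 3000.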
The above method steps are not limited to machine learning models with decision tree features, such as LightGBM and XGBoost; they also cover other types of deep learning models that can produce feature value splits or groupings of candidate features.
5. Sentencing scenario analysis technique
Sentencing scenario analysis mainly examines the distribution of sentencing scenarios, sentencing scenarios and judgment results, sentencing scenarios and the basic circumstances of the parties, sentencing scenarios and social and economic conditions, changes in the historical number of cases per sentencing scenario, and so on.
The sentencing scenarios are compared and analyzed using the following steps:
In step "2.1" of FIG. 7, the degree of correlation between sentencing scenarios and judgment results is analyzed year by year, and the fluctuation in the importance of each sentencing scenario over the years is observed, which reflects the trend of sentencing scenarios during social and economic development and their changing influence on the prison term.
In step "2.2" of FIG. 7, the sentencing scenario "recidivist" model is analyzed, using the length of the interval between offence and reoffence to analyze the effect of the punishment imposed by the judgment result, for example decomposing into reoffence within five years, within two years and within one year, to judge whether a clear deterrent effect is achieved.
In steps "2.3" and "3.1" of FIG. 7, the occurrence rate of sentencing scenarios over the years is analyzed, and the role sentencing scenarios play in social governance is observed and summarized; for example, "active compensation" is decomposed into "active compensation in full", "active compensation of the majority" and "active compensation in part", where the proportion of "active compensation in full" rises year by year.
The sentencing scenario map is used to analyze the standardization of sentencing scenario name terminology. For example, "defrauding relief, rescue, flood prevention, poverty relief and medical funds" and "defrauding relief, rescue, flood prevention, poverty relief, resettlement, epidemic prevention and medical funds" can be unified as "rescue, relief, epidemic prevention, flood prevention, special care, poverty relief, resettlement and medical funds", improving the standardization of terminology across provinces.
In step "4.1" of FIG. 7, a comprehensive sentencing scenario analysis report is produced.
In the invention, candidate scenario names can be extracted using concept entity extraction methods from natural language processing.
In the invention, the joint multi-task training model may adopt other implementations, such as distributed model training.
In the invention, sentencing scenario feature analysis may use other scientific methods of correlation analysis and causal analysis.
Based on the same inventive concept, another embodiment of the invention provides a sentencing scenario decomposition analysis system, comprising:
a sentencing scenario map construction module, used for acquiring original scenario names from the criminal law and sentencing standardization documents and constructing a sentencing scenario map;
a case scenario tree construction module, used for constructing a case scenario tree based on judgment documents and the constructed sentencing scenario map;
a joint training module, used for training a model on the sentencing scenarios based on the case scenario tree, including prediction tasks among sentencing scenarios and between sentencing scenarios and judgment results;
and a sentencing scenario analysis module, used for assisting sentencing scenario analysis with the parameter distribution of the trained model.
The specific implementation of each module follows the foregoing description of the method of the invention.
Based on the same inventive concept, another embodiment of the invention provides an electronic device (computer, server, smart phone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, implements the steps of the inventive method.
Parts of the invention not described in detail are known to those skilled in the art.
The foregoing is merely a preferred embodiment of the invention. It is to be understood that the invention is not limited to the form disclosed herein, and the disclosure is not to be construed as excluding other embodiments; the invention can be used in various other combinations, modifications, and environments, and can be modified within the scope of the inventive concept described herein, through the above teachings or through the skill or knowledge of the relevant art. Modifications and variations that do not depart from the spirit and scope of the invention are intended to fall within the protection scope of the appended claims.

Claims (6)

1. A sentencing scenario decomposition analysis method, comprising the steps of:
acquiring original scenario names from the criminal law and sentencing standardization documents, and constructing a sentencing scenario map;
constructing case scenario trees based on judgment documents and the constructed sentencing scenario map;
training a sentencing scenario model based on the case scenario trees, including training prediction tasks among sentencing scenarios and between sentencing scenarios and sentencing results;
using the parameter distribution of the trained model to assist sentencing scenario analysis;
wherein constructing the sentencing scenario map comprises:
acquiring original scenario names from the criminal law and sentencing standardization documents;
normalizing the original scenario names to form candidate standard scenario names;
acquiring association relations among the candidate standard scenario names, and forming an undirected graph of candidate standard scenario names with the candidate standard scenario names as vertices and the relation categories as edges;
clustering the undirected graph of candidate standard scenario names, and generating recommended candidate standard scenario names from each cluster center;
manually confirming the recommended candidate standard scenario names to form standard scenario names;
classifying the standard scenario names that have hierarchical (superordinate-subordinate) relations into a tree, to form standard scenario names with a tree structure;
associating the original scenario names with the tree-structured standard scenario names to form the sentencing scenario map;
wherein acquiring the association relations among the candidate standard scenario names comprises:
calculating the similarity between candidate standard scenario names, and, when the similarity exceeds a set similarity threshold, treating the scenario names as to be aligned, thereby forming a quasi-alignment relation between them;
acquiring association relations among the candidate standard scenario names according to their positions in the paragraphs and sentences of the original normative documents, wherein the association relations comprise a sequential co-occurrence relation, a parallel relation under the same constraint, a reference relation, and a child-parent relation;
wherein constructing the case scenario trees based on the judgment documents and the constructed sentencing scenario map comprises:
preprocessing a judgment document, calling a structuring engine, and identifying important scenario regions in the judgment document;
calling a scenario extraction engine, which applies the knowledge of the sentencing scenario map to the important scenario regions, to extract the sentencing scenarios of the case;
extracting the judgment result, and treating the judgment result as a special case scenario;
combining all the sentencing scenarios and case scenarios to form a case scenario tree;
wherein using the parameter distribution of the trained model to assist sentencing scenario analysis comprises:
ranking the scenario features by relevance;
analyzing, from the relevance ranking data, which scenarios are effective and which are ineffective with respect to the judgment result;
using histograms to show the value-group distribution of each sentencing scenario during task inference;
determining split points in quantitative sentencing scenarios, and obtaining, through split-point analysis over a large number of judgment documents, the judgment intervals applied in actual adjudication, thereby assisting the formation or revision of judicial interpretations and the formation of guiding opinions on new offenses and sentencing standardization.
2. The method according to claim 1, wherein the sentencing scenario model is trained using machine learning and deep learning methods, the training process comprising:
expanding, transforming, and converting all case scenario trees according to the input requirements of the training model;
adopting multi-task joint training to jointly predict the sentencing scenarios, the sentencing result, the facts found in the judgment document, and the court opinion in the judgment document, comprising: predicting the sentencing result from the court opinion in the judgment document, the facts found in the judgment document, and the sentencing scenarios; and predicting the sentencing scenarios from the court opinion in the judgment document and the facts found in the judgment document.
3. The method of claim 1, wherein using the parameter distribution of the trained model to assist sentencing scenario analysis further comprises:
analyzing, year by year, the degree of correlation between sentencing scenarios and judgment results, observing and analyzing how the importance of sentencing scenarios fluctuates over the years, and reflecting the trend of sentencing scenarios during social and economic development and their changing influence on the prison term;
analyzing a sentencing scenario-recidivism model, using the length of the interval before recidivism or reoffending to analyze the effect of the punishment imposed by the sentencing result, and judging whether an obvious deterrent effect is obtained;
analyzing the occurrence rate of sentencing scenarios over the years, and observing and summarizing the role of sentencing scenarios in social governance;
and analyzing the term standardization of sentencing scenario names using the sentencing scenario map.
4. A sentencing scenario decomposition analysis system using the method of any one of claims 1-3, comprising:
a sentencing scenario map construction module, configured to acquire original scenario names from the criminal law and sentencing standardization documents and to construct a sentencing scenario map;
a case scenario tree construction module, configured to construct case scenario trees based on judgment documents and the constructed sentencing scenario map;
a joint training module, configured to train a sentencing scenario model based on the case scenario trees, including training prediction tasks among sentencing scenarios and between sentencing scenarios and sentencing results;
and a sentencing scenario analysis module, configured to assist sentencing scenario analysis using the parameter distribution of the trained model.
5. An electronic device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1-3.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1-3.
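The similarity-threshold alignment and graph clustering steps recited in claim 1 can be illustrated with the short Python sketch below. It rests on several assumptions that the claims do not specify: string similarity is computed with the standard-library `difflib.SequenceMatcher`, the threshold value 0.8 is arbitrary, clustering is approximated by connected components of the quasi-alignment graph, and the "cluster center" is taken to be the member most similar to the rest of its cluster. All function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Character-level similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def quasi_alignment_edges(names, threshold=0.8):
    """Edges between names whose similarity exceeds the threshold."""
    return [(a, b) for a, b in combinations(names, 2)
            if similarity(a, b) > threshold]


def cluster(names, edges):
    """Group names into connected components using union-find."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


def recommend(group):
    """Pick the member most similar to the others as the recommended name."""
    return max(group, key=lambda n: sum(similarity(n, m) for m in group if m != n))
```

The recommended names produced this way would still go through the manual confirmation step before becoming standard scenario names.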
CN202211026955.5A 2022-08-25 2022-08-25 Criminal investigation plot decomposition analysis method and system Active CN115687632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211026955.5A CN115687632B (en) 2022-08-25 2022-08-25 Criminal investigation plot decomposition analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211026955.5A CN115687632B (en) 2022-08-25 2022-08-25 Criminal investigation plot decomposition analysis method and system

Publications (2)

Publication Number Publication Date
CN115687632A CN115687632A (en) 2023-02-03
CN115687632B true CN115687632B (en) 2024-04-09

Family

ID=85061442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211026955.5A Active CN115687632B (en) 2022-08-25 2022-08-25 Criminal investigation plot decomposition analysis method and system

Country Status (1)

Country Link
CN (1) CN115687632B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150039357A (en) * 2013-10-02 2015-04-10 김성인 An expert system for determining penalty and operating method thereof
AU2018101652A4 (en) * 2018-11-04 2018-12-06 GU, Ming MR Criminal Case Intelligent Management and Processing System (CCIMPS)
CN111104798A (en) * 2018-10-27 2020-05-05 北京智慧正安科技有限公司 Analysis method, system and computer readable storage medium for criminal plot in legal document
CN111126057A (en) * 2019-12-09 2020-05-08 航天科工网络信息发展有限公司 Case plot accurate criminal measuring system of hierarchical neural network
CN111428466A (en) * 2018-12-24 2020-07-17 北京国双科技有限公司 Legal document analysis method and device
CN112100212A (en) * 2020-09-04 2020-12-18 中国航天科工集团第二研究院 Case scenario extraction method based on machine learning and rule matching
CN112732865A (en) * 2020-12-29 2021-04-30 长春市把手科技有限公司 Method and device for measuring and calculating criminal period influence ratio of criminal case plots
CN112948571A (en) * 2019-12-11 2021-06-11 中国司法大数据研究院有限公司 Calendar trial case association method and device based on referee document, electronic equipment and computer readable medium
CN113239130A (en) * 2021-06-18 2021-08-10 广东博维创远科技有限公司 Criminal judicial literature-based knowledge graph construction method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120191619A1 (en) * 2011-01-20 2012-07-26 John Nicholas Gross System & Method For Locating & Assessing Intellectual Property Assets
US20130297540A1 (en) * 2012-05-01 2013-11-07 Robert Hickok Systems, methods and computer-readable media for generating judicial prediction information

Also Published As

Publication number Publication date
CN115687632A (en) 2023-02-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant