CN116823541A - Criminal investigation calculation method and system based on nonlinear model - Google Patents
- Publication number
- CN116823541A (CN202311090895.8A)
- Authority
- CN
- China
- Prior art keywords
- criminal
- sentencing
- determining
- nonlinear model
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of legal text processing, and in particular to a sentencing calculation method and system based on a nonlinear model, comprising the following steps: obtaining text data from a case description and preprocessing it to obtain sentencing factor features; determining a sentencing starting point based on the obtained sentencing factor features, and determining, according to those features, the range within which sentencing circumstances adjust the benchmark sentence, to obtain the weights of the sentencing factors; estimating the noise of the nonlinear model, determining the noise distribution, generating a plurality of samples obeying the noise distribution based on the sentencing factor features, and determining error limits of the parameter estimates of the nonlinear model through multiple simulations; and, within the error limits, determining, according to the correspondence between the sentencing factor weights and the parameter estimates, the estimate and confidence limit of the unknown parameter vector corresponding to the sentencing feature factors in the nonlinear model, and obtaining the predicted prison term output by the model as a reference for judgment.
Description
Technical Field
The invention relates to the technical field of legal text processing, and in particular to a sentencing calculation method and system based on a nonlinear model.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
A sentencing system is a text data processing system that outputs predicted sentencing results to judicial workers according to the text information in case files. Such a system generally relies on features extracted from a large amount of case text data, and outputs a sentencing result after processing and calculation that account for the influence of the sentencing factors in different cases. At present, these systems fall into two types: systems built on a traditional linear regression statistical model or statistical testing method, and systems based on machine learning and natural language processing that mine the key information in a case text and the decision logic it contains.
The linear regression statistical model uses statistical methods and processes the original case text data according to results such as the law of large numbers and the central limit theorem to obtain a sentencing result. However, the model itself does not constrain the prison term to the statutory interval and cannot fully adapt to actual sentencing scenarios, which have nonlinear saturation characteristics. As a result, the sentencing mechanism contained in the text data cannot be accurately analyzed, the result obtained differs considerably from the sentencing circumstances in the actual case description, and it is difficult to help judicial staff work more efficiently. In addition, the number of publicly available case files is limited, and the statistical method must assume a priori that the data have good statistical properties (such as being independent and identically distributed). The values and precision of the estimated parameters are therefore difficult to determine, and the model cannot capture the proportion by which sentencing circumstances adjust the benchmark sentence in different cases.
Methods based on machine learning and natural language processing usually require a large amount of legal text to train a model with strong generalization capability. The number of cases (judgment texts) published at present is not sufficient for such a model to produce reliable output, so the accuracy of the parameters estimated through training is difficult to determine.
Disclosure of Invention
In order to solve the technical problems described in the background, the invention provides a sentencing calculation method and system based on a nonlinear model. Based on effective information extracted from case file text, it applies a nonlinear saturation model together with Bayesian embedding, random simulation and multi-stage calculation methods to determine the accuracy of parameter estimation when a sentencing result is obtained from limited data samples. This overcomes the requirement of existing methods for a sufficiently large data sample, and also presents how the influence of each sentencing feature changes over time.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A first aspect of the invention provides a sentencing calculation method based on a nonlinear model, comprising the following steps:
obtaining text data from a case description, and preprocessing the text data to obtain sentencing factor features;
determining a sentencing starting point based on the obtained sentencing factor features, and determining, according to those features, the range within which sentencing circumstances adjust the benchmark sentence, to obtain the weights of the sentencing factors;
estimating the noise of the nonlinear model, determining the noise distribution, generating a plurality of samples obeying the noise distribution based on the sentencing factor features, and determining error limits of the parameter estimates of the nonlinear model through multiple simulations;
and, within the error limits, determining, according to the correspondence between the sentencing factor weights and the parameter estimates, the estimate and confidence limit of the unknown parameter vector corresponding to the sentencing feature factors in the nonlinear model, and obtaining the predicted prison term output by the model as a reference for judgment.
The preprocessing comprises: extracting sentencing-related feature fields from the text data and merging them to obtain structured case text data.
The nonlinear model is a saturated nonlinear regression model, whose floating upper bound and floating lower bound are determined according to the case type.
Estimating the noise of the nonlinear model specifically comprises: obtaining the estimated noise of the nonlinear model by the least squares method, obtaining an empirical distribution curve from the estimated noise, and determining the normal density function and variance of the noise.
Determining the sentencing starting point specifically comprises: dividing the sentencing interval into several equal parts, calculating the accuracy and the estimated bias term for each division point, minimizing the bias term while ensuring that the calculation accuracy meets a set value, and thereby determining the position of the sentencing starting point within the sentencing interval.
Obtaining the weights of the sentencing factors specifically comprises: determining the magnitude of the sentencing factor weights with a multi-stage stochastic quasi-Newton adaptive learning algorithm.
Determining the error limits of the nonlinear model parameter estimates through multiple simulations specifically comprises:
acquiring a plurality of preprocessed text data as samples, and obtaining a multidimensional output observation set based on the nonlinear model;
applying the multi-stage stochastic quasi-Newton adaptive learning algorithm to the multidimensional output observation set to obtain the parameter estimate for each simulation run;
and determining, from the parameter estimation error of a given component and its empirical distribution function, the interval of the error limit to which that estimation error belongs with at least a given probability.
A second aspect of the present invention provides a system for implementing the above method, comprising:
a text preprocessing module configured to: text data in case description is obtained, and the text data are preprocessed to obtain sentencing factor characteristics;
a first parameter estimation module configured to: determine a sentencing starting point based on the obtained sentencing factor features, and determine, according to those features, the range within which sentencing circumstances adjust the benchmark sentence, to obtain the weights of the sentencing factors;
a second parameter estimation module configured to: estimate the noise of the nonlinear model, determine the noise distribution, generate a plurality of samples obeying the noise distribution based on the sentencing factor features, and determine error limits of the parameter estimates of the nonlinear model through multiple simulations;
a result output module configured to: within the error limits, determine, according to the correspondence between the sentencing factor weights and the parameter estimates, the estimate and confidence limit of the unknown parameter vector corresponding to the sentencing feature factors in the nonlinear model, and obtain the predicted prison term output by the model as a reference for judgment.
A third aspect of the present invention provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the nonlinear-model-based sentencing calculation method described above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the nonlinear-model-based sentencing calculation method described above when executing the program.
Compared with the prior art, the above technical scheme has the following beneficial effects:
1. The sentencing starting point and the sentencing factor weights are determined from effective information extracted from real case text; the error limits of the model parameter estimates are determined through multiple simulations on generated sample data; and the value of the unknown parameter vector corresponding to the sentencing feature factors in the nonlinear model is obtained through the correspondence between the sentencing factor weights and the parameter estimates. When calculating the prison term with limited data samples, the model can therefore better capture the proportion by which sentencing circumstances adjust the benchmark sentence in different cases, and is better suited to actual sentencing scenarios.
2. The sentencing circumstances in each case description are independent, and the proportion by which they adjust the benchmark sentence differs from case to case, so it is difficult to find many identical sentencing scenarios for training a model. One weight estimate is given from the text data in real case descriptions, a second is given from randomly generated data, and the correspondence between the two is used to obtain a high-probability confidence bound for the estimate, so that the model can better capture the adjustment proportion under limited data samples.
3. The nonlinear saturation model is well suited to sentencing scenarios: for cases whose computed term is above or below the corresponding statutory interval, the declared sentence is confined to that interval. This makes up for the limited applicability of the traditional linear model and meets the need to analyze small data samples.
4. The sentencing features can be estimated reliably and the trend of each feature's influence over time is presented, so that judicial staff can analyze and determine the sentencing feature factors relevant to a case, which supports the reliability of the predictions output for different cases.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a schematic diagram of the overall architecture of a sentencing calculation method provided by one or more embodiments of the present invention;
FIG. 2 is a schematic diagram showing the comparison of the accuracy of S-model and L-model calculation for severe cases according to one or more embodiments of the present invention;
FIG. 3 is a schematic diagram of the trends of selected variables of interest in the S-model provided in one or more embodiments of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The basic steps of sentencing comprise determining the sentencing starting point, determining the benchmark sentence, and determining the declared sentence. The sentencing starting point is the sentence determined from the basic facts in the case description. The benchmark sentence, which builds on the sentencing starting point, must then be adjusted according to the circumstances that actually occurred in each case to form the declared sentence. The parameters in the case description that influence this determination are the sentencing factors.
As described in the background, when a sentencing system is used to process case file text and output a reference sentencing result, the following problems exist:
First, in terms of model applicability, the traditional linear model does not constrain the prison term to the statutory interval, which limits its applicability, while deep learning models require a large amount of data.
Second, in terms of calculation method, the theory behind existing methods imposes strong statistical assumptions on the data, and the reliability of calculations on limited data samples is not guaranteed.
Therefore, the following embodiment provides a sentencing calculation method and system based on a nonlinear model. Based on effective information extracted from case file text, it applies a nonlinear saturation model together with Bayesian embedding, random simulation and multi-stage calculation methods to determine the accuracy of parameter estimation when a sentencing result is obtained from limited data samples. This overcomes the need of purely statistical methods for a sufficiently large data sample, and also presents how the influence of each sentencing feature changes over time.
Embodiment one:
As shown in FIGS. 1 to 3, a sentencing calculation method based on a nonlinear model comprises the following steps:
obtaining text data from a case description, and preprocessing the text data to obtain sentencing factor features;
determining a sentencing starting point based on the obtained sentencing factor features, and determining, according to those features, the range within which sentencing circumstances adjust the benchmark sentence, to obtain the weights of the sentencing factors;
estimating the noise of the nonlinear model, determining the noise distribution, generating a plurality of samples obeying the noise distribution based on the sentencing factor features, and determining error limits of the parameter estimates of the nonlinear model through multiple simulations;
and, within the error limits, determining, according to the correspondence between the sentencing factor weights and the parameter estimates, the estimate and confidence limit of the unknown parameter vector corresponding to the sentencing feature factors in the nonlinear model, and obtaining the predicted prison term output by the model as a reference for judgment.
Specifically:
1. Text preprocessing
Step 1-1: for judicial judgment documents and case description text, natural language processing is used, on the basis of reasonable word segmentation and manual labeling, to extract sentencing-related feature fields; the resulting natural features are used as the elements that influence sentencing.
Step 1-2: on this basis, further feature selection is carried out according to the calculation purpose, as follows:
Step 1-2-1: classify the fields according to their legal attributes (for example, combining "statutory penalty" and "criminal penalty imposed") to obtain several "natural features" for sentencing calculation, such as probation and incidental civil litigation.
Step 1-2-2: merge the natural features according to their correlation. Certain natural features are legally related or similar in nature, and such features may be merged. For example, under Article 26 of the Criminal Law, "principal offender" includes types such as "principal offender", "ordinary principal offender" and "hiring others", so these three natural features may be merged into "principal offender". Likewise, under Article 67 of the Criminal Law, "voluntary surrender" includes types such as "quasi-surrender", "active surrender" and "surrender after persuasion", so these three natural features may be merged into "voluntary surrender".
This embodiment merges the natural features extracted from the text as described above and, according to the distribution of each feature, removes features that are too sparse (for example, those occurring fewer than 200 times, which are too few to have an effect if included). To further exclude the influence of these features, the cases containing them are also deleted. The main feature factors used in modeling are thereby determined.
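The merging and sparse-feature removal described above can be sketched as follows. This is a minimal illustration under stated assumptions: the label strings, the `MERGE_MAP` table and both function names are hypothetical, and only the threshold of 200 occurrences comes from the text.

```python
from collections import Counter

# Hypothetical merge map (illustrative labels, not taken from the patent's data).
MERGE_MAP = {
    "quasi-surrender": "voluntary surrender",
    "active surrender": "voluntary surrender",
    "surrender after persuasion": "voluntary surrender",
}


def merge_features(raw_features):
    """Map fine-grained labels to merged names; duplicates within a case collapse."""
    return sorted({MERGE_MAP.get(name, name) for name in raw_features})


def drop_sparse(cases, min_count=200):
    """Drop features seen in fewer than min_count cases, and the cases containing them."""
    counts = Counter(name for features in cases for name in set(features))
    sparse = {name for name, count in counts.items() if count < min_count}
    kept = [features for features in cases if not sparse.intersection(features)]
    return kept, sparse
```

Counting each feature once per case, rather than once per mention, is a design choice of this sketch; the text does not say which count the threshold refers to.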
2. Applying the saturated nonlinear regression model
Step 2-1: the floating upper bound and floating lower bound of the saturated nonlinear regression model are determined from the common characteristics of sentencing scenarios:
;
the specific definition of the saturation function is as follows:
$$S_t(x) = \begin{cases} L_t, & x < L_t \\ x, & L_t \le x \le U_t \\ U_t, & x > U_t \end{cases};$$
In the model, y_t is the prison term (unit: months) of the t-th case. Cases are ordered by judgment date; cases decided on the same day may be ordered randomly, which does not substantially affect the calculation result.
The model also contains the sentencing starting point (unit: months) and three count variables. In intentional injury crimes, these variables are the numbers of minor injuries, serious injuries and deaths in the t-th case that determine the statutory penalty (in crimes of counterfeiting registered trademarks they are the illegal business amount, the illegal gains and the number of trademark types involved in the t-th case; in fund-raising fraud crimes they are the fraud amount in the t-th case, and so on);
b, c and d represent, in intentional injury crimes, the increase in prison term for each minor injury, serious injury and death (in crimes of counterfeiting registered trademarks and fund-raising fraud they represent the degree of harm of the corresponding factors);
the model further contains the regression vector formed by the selected sentencing feature factors, and the unknown parameter vector corresponding to those factors, each component of which represents the percentage effect of the corresponding feature factor; e is the bias term of the model, representing the combined effect of other sentencing feature factors that may not have been considered; the model also includes possible random noise; U_t is the upper limit and L_t the lower limit of the corresponding statutory sentencing interval; both vary with the nature of the case and float according to the statutory sentencing circumstances.
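As a hedged illustration of the saturation described above, the sketch below clips a raw prediction to the interval between the lower and upper limits. It assumes the saturation function is plain clipping, which matches the stated purpose of confining the declared sentence to the statutory interval; the function name `saturate` is hypothetical.

```python
def saturate(value, lower, upper):
    """Confine a raw predicted term (months) to the interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return max(lower, min(value, upper))
```

A prediction inside the interval is returned unchanged, so the model behaves linearly there and only saturates at the limits.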
3. Calculation analysis method
Step 3-1: introduce a new high-dimensional regression vector as follows:
;
accordingly, an unknown parameter vector is defined:
;
accordingly, the nonlinear stochastic model in step 2-1 may be converted into the following saturation model:
;
where the regression vector is a known input, y_t is the output, and θ is the unknown parameter to be estimated. The prior compact convex set D containing the parameter can be specified according to the actual case setting.
The practical meaning of the parameter θ is "the proportion by which a sentencing circumstance adjusts the benchmark sentence", that is, the weight of the sentencing factor. The projection interval of θ is the prior range of the true value of θ, and the range of the prior set D corresponds to the specific provisions of the legal system on how far a sentencing circumstance may adjust the benchmark sentence.
Step 3-2: the sentencing calculation requires the distribution function of the model noise. To obtain it, the noise of the sentencing model is first estimated from judicial judgment data by the least squares method.
Specifically, only samples in the unsaturated region are selected. When the t-th sample arrives, the estimate of the parameter θ is updated with the recursive least squares algorithm, and the updated estimate is then used to compute the noise estimate for that sample, as follows:
;
The noise estimates are used to give an empirical distribution curve of the noise, from which the normal density function and variance of the noise are determined and then applied in the sentencing calculation.
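The noise-estimation step can be sketched with a textbook recursive least squares update. This is an illustration under assumptions, not the patent's formula, which is missing from the text: the noise estimate is taken here to be the residual after each update, the initial covariance scale `p0` is arbitrary, and the names `rls_noise_estimates` and `fit_normal` are hypothetical.

```python
import statistics


def rls_noise_estimates(xs, ys, lower, upper, p0=1000.0):
    """Run recursive least squares on unsaturated samples and collect residuals."""
    dim = len(xs[0])
    theta = [0.0] * dim
    P = [[p0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    residuals = []
    for phi, y in zip(xs, ys):
        if not (lower < y < upper):
            continue  # skip samples lying on a saturation bound
        P_phi = [sum(P[i][k] * phi[k] for k in range(dim)) for i in range(dim)]
        denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(dim))
        error = y - sum(phi[i] * theta[i] for i in range(dim))
        theta = [theta[i] + P_phi[i] * error / denom for i in range(dim)]
        P = [[P[i][j] - P_phi[i] * P_phi[j] / denom for j in range(dim)]
             for i in range(dim)]
        # Assumed noise estimate: residual under the updated parameter estimate.
        residuals.append(y - sum(phi[i] * theta[i] for i in range(dim)))
    return theta, residuals


def fit_normal(residuals):
    """Mean and standard deviation of the residuals, for a normal noise model."""
    return statistics.fmean(residuals), statistics.pstdev(residuals)
```

Comparing a histogram of the residuals against the fitted normal density is one way to check whether the normal assumption is reasonable for a given data set.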
Step 3-3: to avoid the difficulty of solving the mathematical equations in the sentencing model, the sentencing starting point is determined by uniform division combined with a comparison of prediction accuracy. The sentencing interval is divided into six equal parts, and the starting point is set in turn to the 1/6, 1/3, 1/2, 2/3 and 5/6 points of the interval and to its lowest and highest points, giving 7 groups of calculation accuracy and bias-term estimates.
The starting point is chosen so as to make the bias term as small as possible while keeping the calculation accuracy high. In this embodiment, experiments led to selecting the starting point at 1/3 of the corresponding sentencing interval.
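The selection rule can be sketched as a small grid search. The sketch assumes that each candidate has already been evaluated to give an accuracy and a bias-term estimate; the tuple layout, the `min_accuracy` threshold and both function names are hypothetical.

```python
def candidate_starting_points(lower, upper, parts=6):
    """Division points of [lower, upper]: both end points plus the interior points."""
    step = (upper - lower) / parts
    return [lower + k * step for k in range(parts + 1)]


def choose_starting_point(results, min_accuracy):
    """Pick the candidate with the smallest absolute bias among those accurate enough.

    results: list of (starting_point, accuracy, bias_estimate) tuples.
    """
    admissible = [r for r in results if r[1] >= min_accuracy]
    if not admissible:
        raise ValueError("no candidate meets the accuracy requirement")
    return min(admissible, key=lambda r: abs(r[2]))
```

With six parts the first function returns the 7 candidates named in the text, and the 1/3 point is the third entry of the list.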
Step 3-4: the weights of the sentencing factors are calculated with the multi-stage stochastic quasi-Newton adaptive learning method (MSQN).
For the prior compact convex set D of the parameter and a positive definite matrix Q, introduce the projection operator $\Pi_Q$, defined as:
$$\Pi_Q(x) = \arg\min_{y \in D} \|x - y\|_Q;$$
where the norm $\|\cdot\|_Q$ is defined by $\|x\|_Q^2 = x^{\top} Q x$. This projection operator is used in the algorithm.
For each time t, the inputs of the algorithm are: the regression vector; the model output y_t; the regularization factors μ_{j,t} (1 ≤ j ≤ K); the number of algorithm stages K; the prior projection set D of the parameter; the noise distribution; and the algorithm initial values P_{j,0} (1 ≤ j ≤ K) together with the initial parameter estimates (1 ≤ j ≤ K).
The iterative estimation formula at time t is obtained with the MSQN adaptive learning method. The MSQN method (multi-stage stochastic quasi-Newton adaptive learning method) is prior art and is not described further in this embodiment.
Specifically, in the sentencing data analysis of this embodiment, K = 3; based on the noise estimation in step 3-2, the noise distribution is taken to be normal with mean 0 and standard deviation 5;
furthermore, the function G_t(·) can be expressed as:
$$G_t(x) = U_t + (L_t - x)\,F(L_t - x) - (U_t - x)\,F(U_t - x) + 25\,[f(L_t - x) - f(U_t - x)];$$
and its derivative G'_t(·) is:
$$G'_t(x) = F(U_t - x) - F(L_t - x);$$
where F(·) and f(·) are, respectively, the distribution function and the probability density function of the normal distribution N(0, 25);
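As a check on these expressions, the sketch below evaluates G_t with the N(0, 25) distribution function and density. Differentiating G_t term by term gives F(U_t - x) - F(L_t - x), and this closed form can be compared with a finite-difference approximation. The function names are hypothetical, and the standard deviation 5 is the value stated for this embodiment.

```python
import math

SIGMA = 5.0  # standard deviation of the N(0, 25) noise used in this embodiment


def pdf(z):
    """Density f of N(0, SIGMA^2)."""
    return math.exp(-0.5 * (z / SIGMA) ** 2) / (SIGMA * math.sqrt(2.0 * math.pi))


def cdf(z):
    """Distribution function F of N(0, SIGMA^2)."""
    return 0.5 * (1.0 + math.erf(z / (SIGMA * math.sqrt(2.0))))


def g(x, lower, upper):
    """G_t(x) as written in the text, with 25 expressed as SIGMA squared."""
    return (upper + (lower - x) * cdf(lower - x) - (upper - x) * cdf(upper - x)
            + SIGMA ** 2 * (pdf(lower - x) - pdf(upper - x)))


def g_prime(x, lower, upper):
    """Closed-form derivative: F(U_t - x) - F(L_t - x)."""
    return cdf(upper - x) - cdf(lower - x)
```

Well inside the interval `g(x, lower, upper)` is close to `x`, and far below the lower limit it approaches `lower`, which is consistent with reading G_t as the expected value of the saturated output under normal noise.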
the projection interval of the parameter corresponding to the feature factor "number of serious injuries" is [0, 40]; that of the parameter corresponding to "number of minor injuries" is [0, 10]; that of the bias term is [-1, 1]; among the remaining features, the projection interval of the parameters corresponding to aggravating feature factors (such as carrying a weapon) is [-0.1, 1], and that of the parameters corresponding to mitigating feature factors (such as voluntary surrender) is [-1, 0.1]; μ_{j,t} = 25 for 1 ≤ j ≤ 3; the initial values are P_{j,0} = I for 1 ≤ j ≤ 3; N is the sample size.
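When the prior set D is the Cartesian product of such intervals and the weighting matrix of the projection is diagonal, the weighted projection reduces to clipping each component to its own interval. The sketch below assumes exactly that simplified case (for a general positive definite matrix this shortcut does not hold), and the component order in `BOUNDS` is hypothetical.

```python
# Intervals from this embodiment; the ordering of components is an assumption.
BOUNDS = [
    (0.0, 40.0),   # number of serious injuries
    (0.0, 10.0),   # number of minor injuries
    (-1.0, 1.0),   # bias term
    (-0.1, 1.0),   # an aggravating factor
    (-1.0, 0.1),   # a mitigating factor
]


def project_box(theta, bounds=BOUNDS):
    """Clip each component of theta to its projection interval."""
    if len(theta) != len(bounds):
        raise ValueError("theta and bounds must have the same length")
    return [max(lo, min(value, hi)) for value, (lo, hi) in zip(theta, bounds)]
```

Applying the projection after each update keeps every weight estimate inside the range that the text associates with the legal provisions on adjusting the benchmark sentence.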
The multi-stage method (MSQN) of this embodiment is designed for sentencing systems with saturation properties, unlike the conventional linear recursive least squares (RLS) method. The algorithm does not require the regression vectors formed from the data to satisfy the independent and identically distributed assumption, which is traditionally imposed but hard to meet. It is therefore better suited to complex text information such as judicial judgments, and provides computational support that helps judicial staff judge the actual influence of each sentencing circumstance on the prison term.
Step 3-5: the parameter estimation accuracy (a high-probability confidence limit) under limited sentencing data samples is obtained by combining Bayesian embedding, random simulation and the multi-stage calculation method, which overcomes the need of purely statistical methods for a sufficiently large data sample. The steps are as follows:
step 3-5-1: independently and uniformly distribute and extract N samplesObeys the distribution->Where U is a uniform distribution over the parameter a priori set D and F is noise subject to a normal distribution (noise estimate distribution obtained using step 3-2, where samples are generated using this noise distribution, not extracted from the raw data).
Step 3-5-2: Using the N samples {X_1, X_2, ..., X_N}, the features […], and the saturated nonlinear regression model form, generate the n-dimensional output observation set {Y_1, Y_2, ..., Y_N}.
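Steps 3-5-1 and 3-5-2 can be sketched as a small simulation. The sketch rests on several assumptions: each sample consists of a parameter vector drawn uniformly from a box-shaped prior set together with normally distributed noise, and the saturated model clips a linear response to fixed bounds. The feature matrix, the bounds, the noise level, and all names are hypothetical.

```python
import numpy as np


def simulate_runs(Phi, box_low, box_high, noise_std, lower, upper, N, rng):
    """Generate N simulated parameter draws and their saturated observations.

    Phi: (n, m) feature matrix; each run yields an n-dimensional output.
    """
    n, m = Phi.shape
    thetas = rng.uniform(box_low, box_high, size=(N, m))  # uniform prior draws
    noise = rng.normal(0.0, noise_std, size=(N, n))       # normal noise draws
    Y = np.clip(thetas @ Phi.T + noise, lower, upper)     # saturated outputs
    return thetas, Y


rng = np.random.default_rng(0)
n, m, N = 50, 3, 200
LOWER, UPPER = 0.0, 1.0                      # hypothetical saturation bounds
box_low = np.array([0.0, 0.0, -1.0])         # hypothetical prior box
box_high = np.array([1.0, 1.0, 1.0])
Phi = rng.uniform(0.0, 1.0, size=(n, m))     # hypothetical case features

thetas, Y = simulate_runs(Phi, box_low, box_high, 0.1, LOWER, UPPER, N, rng)
```

Because the true parameters of every simulated run are known, the estimation error of any estimator applied to `Y` can be measured directly.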
Step 3-5-3: Use the MSQN method to obtain a parameter estimate for each of the N simulation runs, compute the estimation error […] of the j-th component of the parameter (j = 1, ..., m), and then compute the empirical distribution function of […]:
[…];
Step 3-5-4: For any […] satisfying […], the estimation error […] of the j-th parameter component computed by MSQN belongs, with probability at least […], to the following interval:
[…];
where […] is the […]-quantile of the empirical distribution […].
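The idea behind steps 3-5-3 and 3-5-4 can be sketched with an empirical distribution function and its quantiles. The interval in the text is not reproduced here, so this sketch uses a plain two-sided empirical quantile interval as a stand-in, which is an assumption. The simulated errors are synthetic placeholders rather than MSQN output, and the names `empirical_cdf` and `quantile_interval` are hypothetical.

```python
import numpy as np


def empirical_cdf(errors, x):
    """Fraction of simulated estimation errors that are <= x."""
    return float(np.mean(np.asarray(errors) <= x))


def quantile_interval(errors, alpha):
    """Interval between the alpha/2 and 1 - alpha/2 empirical quantiles."""
    low, high = np.quantile(errors, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)


rng = np.random.default_rng(1)
errors = rng.normal(0.0, 0.05, size=2000)  # placeholder for per-run errors

low, high = quantile_interval(errors, alpha=0.1)
coverage = float(np.mean((errors >= low) & (errors <= high)))
```

With `alpha = 0.1`, roughly 90% of the simulated errors fall inside `[low, high]`; a finite-sample guarantee of the kind described in the text would typically widen this interval by a correction term.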
Step 3-6: by using in step 3-1And->Gives the corresponding relation of +.>And estimation accuracy (high probability confidence limits).
Step 3-4 gives the weight estimate from the text data of real case descriptions. Here, weight estimates computed from randomly generated data are used to obtain the high-probability confidence limits of that estimate, so that, with a limited data sample, the model better captures the proportion by which the sentencing circumstances adjust the baseline sentence in different cases when calculating the prison term.
4. Verification
In this embodiment, prison-term prediction and factor analysis are carried out using the crime of intentional injury as an example. The samples are first-instance judgments for intentional injury published from January 2011 to June 2021, 199,590 in total.
Step 4-1: Using the nonlinear recursive identification algorithm of step 3-3, specific parameter estimates and their error bounds are obtained from the structured data extracted from actual judgment documents. Here the S-model denotes the nonlinear saturated model proposed in this embodiment, and the L-model denotes the traditional linear model.
Step 4-2: The prison term is calculated using the nonlinear saturation model (MSQN algorithm, abbreviated S-model) and the parameter estimates from step 4-1, and its sentencing calculation accuracy is compared with that of the conventional linear model (RLS algorithm, abbreviated L-model). The results are shown in Table 1:
Table 1: Comparison of S-model and L-model calculation accuracy
The sentencing calculation accuracy is defined as the average of the relative calculation errors (prediction accuracy). The S-model is more accurate than the L-model in calculating prison terms for intentional injury cases resulting in serious injury.
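The accuracy measure described above can be sketched as a mean relative error. The sentence lengths below are made-up numbers for illustration only, and the function name `mean_relative_error` is hypothetical; the exact normalization used for Table 1 is not stated in the text.

```python
import numpy as np


def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / actual over all cases."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / actual))


# Hypothetical prison terms in months (not data from the patent).
actual_months = [36, 48, 60, 42]
predicted_months = [33, 50, 57, 45]

mre = mean_relative_error(predicted_months, actual_months)
```

A lower mean relative error means the calculated terms sit closer to the terms actually imposed.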
Fig. 2 shows the trend in prison-term calculation accuracy for serious-injury cases from 2011 to 2021. The calculation accuracy of the S-model is consistently much higher than that of the L-model.
Step 4-3: In addition, the MSQN algorithm can be used to show how the influence of each sentencing characteristic factor changes over time, which helps reveal patterns of judicial change and the trajectory of rule-of-law development behind them. The analysis found that the influence of most sentencing characteristic factors in the model varies smoothly over time, but some change markedly, such as the offset term and the 'crime penalties' factor, as shown in Fig. 3.
In this embodiment, based on the sentencing circumstances extracted from legal texts, the sentencing system assists judicial decisions using reliable document information and outputs a reference sentencing result to judicial staff.
In terms of model applicability, a nonlinear saturation model that is both accurate and interpretable is applied. The model fits the sentencing scenario well: it ensures that a pronounced sentence that would exceed or fall below the corresponding normal sentencing range is limited to that range. This makes up for the limited applicability of the traditional linear model, and the model can also meet the needs of small-sample analysis.
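The range-limiting behaviour described above can be sketched by comparing a linear prediction with its saturated counterpart. The bounds, weights, and feature values are hypothetical numbers chosen so that the linear term overshoots, and the function names are illustrative assumptions.

```python
def linear_term(theta, features):
    """Unbounded linear response: sum of weight * feature."""
    return sum(t * f for t, f in zip(theta, features))


def saturated_prediction(theta, features, lower, upper):
    """Linear response limited to the range [lower, upper]."""
    return min(max(linear_term(theta, features), lower), upper)


LOWER_MONTHS, UPPER_MONTHS = 36.0, 120.0  # hypothetical sentencing range
theta = [40.0, 10.0, 30.0]                # hypothetical weights in months
features = [3, 2, 1]                      # serious injuries, minor injuries, offset

raw = linear_term(theta, features)
bounded = saturated_prediction(theta, features, LOWER_MONTHS, UPPER_MONTHS)
```

Here the linear term exceeds the upper bound while the saturated prediction stays at it, which is the behaviour a purely linear model cannot provide.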
In terms of the calculation method, since complex social data such as judicial decision texts are far from satisfying traditional data assumptions such as independence and identical distribution, an adaptive method is provided for which theoretical guarantees can be established under weaker data conditions.
In terms of accuracy guarantees, a theoretical reliability guarantee is given for parameter estimation with limited data samples, so that the actual effect of each sentencing circumstance can be quantified accurately.
In terms of computational effect, compared with the traditional linear regression model, the established nonlinear model and the corresponding new calculation method give an interpretable prison-term result for a given case description, improve the accuracy of prison-term calculation (taking intentional injury as an example), and provide judges with sentencing suggestions of greater reference value.
Embodiment two:
A system for implementing the above method comprises:
a text preprocessing module configured to: obtain the text data in a case description and preprocess the text data to obtain sentencing factor features;
a first parameter estimation module configured to: determine a sentencing starting point based on the obtained sentencing factor features, and determine, from those features, the magnitude by which each sentencing circumstance adjusts the baseline sentence, thereby obtaining the weights of the sentencing factors;
a second parameter estimation module configured to: estimate the noise of the nonlinear model, determine the noise distribution, generate a plurality of samples obeying the noise distribution based on the sentencing factor features, and determine the error bounds of the parameter estimates of the nonlinear model through several simulations;
a result output module configured to: within the error bounds, determine, from the correspondence between the sentencing factor weights and the parameter estimates, the estimate and confidence limits of the unknown parameter vector corresponding to the sentencing characteristic factors in the nonlinear model, and obtain the predicted prison term output by the model as a reference for judgment.
Embodiment III:
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the sentencing calculation method based on a nonlinear model described in the above embodiment.
Embodiment four:
This embodiment provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the sentencing calculation method based on a nonlinear model described in the above embodiment.
The steps or networks involved in embodiments two to four correspond to embodiment one; for details, see the relevant description of embodiment one. The term "computer-readable storage medium" should be understood to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium capable of storing, encoding, or carrying a set of instructions for execution by a processor that cause the processor to perform any one of the methods of the present invention.
The above describes only the preferred embodiments of the present invention and is not intended to limit it; those skilled in the art can make various modifications and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.
Claims (10)
1. A sentencing calculation method based on a nonlinear model, characterized by comprising the following steps:
obtaining the text data in a case description and preprocessing the text data to obtain sentencing factor features;
determining a sentencing starting point based on the obtained sentencing factor features, and determining, from those features, the magnitude by which each sentencing circumstance adjusts the baseline sentence, thereby obtaining the weights of the sentencing factors;
estimating the noise of the nonlinear model, determining the noise distribution, generating a plurality of samples obeying the noise distribution based on the sentencing factor features, and determining the error bounds of the parameter estimates of the nonlinear model through several simulations;
within the error bounds, determining, from the correspondence between the sentencing factor weights and the parameter estimates, the estimate and confidence limits of the unknown parameter vector corresponding to the sentencing characteristic factors in the nonlinear model, and obtaining the predicted prison term output by the model as a reference for judgment.
2. The sentencing calculation method based on a nonlinear model as claimed in claim 1, wherein the preprocessing comprises: extracting the characteristic fields related to sentencing from the text data and merging them to obtain structured case text data.
3. The sentencing calculation method based on a nonlinear model as claimed in claim 1, wherein the nonlinear model is a saturated nonlinear regression model, and the floating upper and lower bounds of the saturated nonlinear regression model are determined according to the case type.
4. The sentencing calculation method based on a nonlinear model as claimed in claim 1, wherein estimating the noise of the nonlinear model specifically comprises: obtaining the estimated noise of the nonlinear model by the least squares method, obtaining an empirical distribution curve from the estimated noise, and determining the normal density function and variance of the noise.
5. The sentencing calculation method based on a nonlinear model as claimed in claim 1, wherein determining the sentencing starting point specifically comprises: dividing the sentencing interval into a number of equal parts, computing the accuracy and the estimated offset term for each part, minimizing the offset term subject to the calculation accuracy meeting a set value, and thereby determining the position of the sentencing starting point within the sentencing interval.
6. The sentencing calculation method based on a nonlinear model as claimed in claim 1, wherein obtaining the weights of the sentencing factors specifically comprises: determining the weights of the sentencing factors using a multi-stage stochastic quasi-Newton adaptive learning algorithm.
7. The sentencing calculation method based on a nonlinear model as claimed in claim 1, wherein determining the error bounds of the parameter estimates of the nonlinear model through several simulations comprises:
acquiring a plurality of preprocessed text data as samples, and obtaining a multidimensional output observation set based on the nonlinear model.
8. The sentencing calculation method based on a nonlinear model as claimed in claim 7, wherein determining the error bounds of the parameter estimates of the nonlinear model through several simulations further comprises:
based on the multidimensional output observation set, obtaining the parameter estimate for each simulation run using the multi-stage stochastic quasi-Newton adaptive learning algorithm.
9. The sentencing calculation method based on a nonlinear model as claimed in claim 8, wherein determining the error bounds of the parameter estimates of the nonlinear model through several simulations further comprises:
determining, from the parameter estimation error of a given component and the empirical distribution function, that this error belongs to the interval defined by the error bound with at least a certain probability.
10. A sentencing calculation system based on a nonlinear model, comprising:
a text preprocessing module configured to: obtain the text data in a case description and preprocess the text data to obtain sentencing factor features;
a first parameter estimation module configured to: determine a sentencing starting point based on the obtained sentencing factor features, and determine, from those features, the magnitude by which each sentencing circumstance adjusts the baseline sentence, thereby obtaining the weights of the sentencing factors;
a second parameter estimation module configured to: estimate the noise of the nonlinear model, determine the noise distribution, generate a plurality of samples obeying the noise distribution based on the sentencing factor features, and determine the error bounds of the parameter estimates of the nonlinear model through several simulations;
a result output module configured to: within the error bounds, determine, from the correspondence between the sentencing factor weights and the parameter estimates, the estimate and confidence limits of the unknown parameter vector corresponding to the sentencing characteristic factors in the nonlinear model, and obtain the predicted prison term output by the model as a reference for judgment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311090895.8A CN116823541A (en) | 2023-08-29 | 2023-08-29 | Criminal investigation calculation method and system based on nonlinear model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311090895.8A CN116823541A (en) | 2023-08-29 | 2023-08-29 | Criminal investigation calculation method and system based on nonlinear model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116823541A (en) | 2023-09-29 |
Family
ID=88114819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311090895.8A Pending CN116823541A (en) | 2023-08-29 | 2023-08-29 | Criminal investigation calculation method and system based on nonlinear model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116823541A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101872439A (en) * | 2010-01-29 | 2010-10-27 | 秦野 | Common criminal law sentencing method and system for hundred accusations |
US20160275147A1 (en) * | 2013-07-31 | 2016-09-22 | Ubic, Inc. | Document classification system, document classification method, and document classification program |
CN109241528A (en) * | 2018-08-24 | 2019-01-18 | 讯飞智元信息科技有限公司 | A kind of measurement of penalty prediction of result method, apparatus, equipment and storage medium |
CN109376227A (en) * | 2018-10-29 | 2019-02-22 | 山东大学 | A kind of prison term prediction technique based on multitask artificial neural network |
Non-Patent Citations (3)
Title |
---|
LANTIAN ZHANG等: "Adaptive Identification with Guaranteed Performance Under Saturated-Observation and Non-Persistent Excitation", 《IEEE》, pages 6 - 7 * |
王芳等: "非线性递推辨识理论在量刑数据分析中的应用", 《中国科学:信息科学》, vol. 52, no. 10, pages 1840 - 1850 * |
甘勤涛等主编: "《MATLAB 2020智能算法从入门到精通》", vol. 1, 机械工业出版社, pages: 79 - 80 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||