CN115934971A - Litigation case screening method and device, electronic equipment and storage medium - Google Patents


Publication number
CN115934971A
Authority
CN
China
Prior art keywords
case
sample
litigation
prediction model
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211626255.XA
Other languages
Chinese (zh)
Inventor
吴炳霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202211626255.XA
Publication of CN115934971A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a litigation case screening method and device, an electronic device, and a storage medium. The method comprises: acquiring a sample case set, and dividing the cases in the sample case set into samples based on the discount yield of each case, so as to obtain a training data set with a plurality of positive samples and negative samples; acquiring a candidate label set of the training data set, and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield; training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model; and acquiring a target case, predicting the positive sample probability of the target case with the case prediction model, and then screening the target case according to that probability. The invention solves the technical problem that suitable litigation cases cannot be effectively screened out in the prior art.

Description

Litigation case screening method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of internet, in particular to a litigation case screening method, a litigation case screening device, electronic equipment and a storage medium.
Background
When a bank issues loans, it often ends up with many non-performing assets. Litigating every non-performing asset, however, inevitably wastes judicial resources and raises litigation costs. How to put limited judicial resources where they matter most and reduce the company's asset losses is therefore central to disposing of non-performing assets.
The modeling targets of the A-card, B-card and C-card scorecards commonly used in the market are clear and well defined: timely repayment is a positive sample, and a case entering the bad-asset pool is a negative sample. For prioritizing litigation cases, however, choosing positive and negative samples is much harder. Specifically, a sample that can be recovered in a short time is not necessarily a positive sample: if a case can be recovered without litigation, no litigation is needed, so it is not a positive sample. Likewise, a sample that cannot be recovered for a long time is not necessarily a positive sample: if resorting to legal means clearly cannot achieve a good discount yield, litigation only inconveniences the client's travel without achieving rapid recovery.
At present there is no effective way to distinguish positive samples from negative samples among litigation cases, so judicial resources are not used effectively and the company's judicial costs increase.
Disclosure of Invention
The invention aims to overcome the above technical defects by providing a litigation case screening method and device, an electronic device, and a storage medium that are suitable for financial technology and other related technical fields, and that solve the technical problem that suitable litigation cases cannot be effectively screened out in the prior art.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
In a first aspect, the invention provides a litigation case screening method, which comprises the following steps:
obtaining a sample case set, and carrying out sample division on each case in the sample case set based on the discount yield of each case in the sample case set so as to obtain a training data set with a plurality of positive samples and negative samples;
acquiring a candidate label set of the training data set, and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield;
training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, wherein the output of the case prediction model is a positive sample probability;
and obtaining a target case, adopting the case prediction model to predict the positive sample probability of the target case, and then screening the target case according to the positive sample probability of the target case.
In some embodiments, the obtaining a sample case set and performing sample division on each case in the sample case set based on a discount yield of each case in the sample case set to obtain a training data set with a plurality of positive samples and negative samples includes:
acquiring a sample case set, and calculating the discount yield of each case in the sample case set;
sorting the cases by their discount yields;
and dividing the sample case set according to the sorting result and the litigation status of each case to obtain a training data set with a plurality of positive samples and negative samples.
In some embodiments, dividing the sample case set according to the sorting result and the litigation status of each case comprises:
among the top N% of cases, taking litigated cases as positive samples and non-litigated cases as negative samples; and among the remaining cases outside the top N%, taking litigated cases as negative samples and non-litigated cases as positive samples.
In some embodiments, obtaining a candidate label set of the training data set and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield includes:
acquiring a candidate label set of the training data set;
rejecting unqualified labels in the candidate label set according to the data distribution of each label in the candidate label set to obtain an initially selected label set;
and performing correlation calculation on the initially selected label set with a preset correlation calculation model to screen out the labels related to the discount yield.
In some embodiments, training the pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, where the output of the case prediction model is a positive sample probability, includes:
dividing the data corresponding to each label into a plurality of data groups based on the screened labels and the training data set, and calculating the WOE (weight of evidence) value of each data group of each label;
and training the pre-constructed case prediction model with the WOE value of each data group of each label as input and the positive sample probability as output, so as to obtain a fully trained case prediction model.
In some embodiments, the case prediction model is built based on a logistic regression algorithm.
In some embodiments, the case prediction model is:
$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(w^{T}X + b)}}$$
where P(Y=1|X) represents the probability that a case is a positive sample, X represents the WOE values, w is a vector of the weight values of the respective labels, and b represents the intercept of the linear function.
In a second aspect, the present invention also provides a litigation case screening device, comprising:
the training set establishing module is used for acquiring a sample case set and dividing the cases in the sample case set into samples based on the discount yield of each case, so as to obtain a training data set with a plurality of positive samples and negative samples;
the label screening module is used for obtaining a candidate label set of the training data set and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield;
the model training module is used for training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, wherein the output of the case prediction model is a positive sample probability;
and the case screening module is used for obtaining a target case, adopting the case prediction model to predict the positive sample probability of the target case, and then screening the target case according to the positive sample probability of the target case.
In a third aspect, the present invention also provides an electronic device, including: a processor and a memory;
the memory having stored thereon a computer program executable by the processor;
the processor, when executing the computer program, implements the steps in the litigation case screening method described above.
In a fourth aspect, the present invention also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the litigation case screening method as described above.
Compared with the prior art, in the litigation case screening method and device, electronic device, and storage medium provided by the invention, a sample case set is first obtained, and its cases are divided into samples based on the discount yield of each case to obtain a training data set with a plurality of positive samples and negative samples; a candidate label set of the training data set is then obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; a pre-constructed case prediction model is then trained based on the screened labels and the training data set to obtain a fully trained case prediction model whose output is a positive sample probability; and finally a target case is obtained, its positive sample probability is predicted with the case prediction model, and the target case is screened according to that probability. The invention creatively applies a machine learning algorithm to making maximal use of limited judicial resources, so that the use of judicial resources is grounded in data, tedious manual comparative analysis is avoided, and decisions become more scientific. It also weighs the recovery amount against cost and future cash flow against present value, so that the modeling target is closely aligned with the actual target.
Drawings
FIG. 1 is a flow chart of an embodiment of a litigation case screening method provided by the present invention;
FIG. 2 is a flowchart of one embodiment of step S100 in the litigation case screening method of the present invention;
FIG. 3 is a flowchart of one embodiment of step S200 in the litigation case screening method of the present invention;
FIG. 4 is a flowchart of one embodiment of step S300 in the litigation case screening method provided by the present invention;
FIG. 5 is a functional block diagram of a litigation case screening device provided by an embodiment of the invention;
FIG. 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a litigation case screening method according to an embodiment of the present invention, which can be used in the financial field to screen out suitable litigation cases. The method can be executed by an electronic device capable of receiving and sending data; the electronic device may be any of various devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, portable computers, desktop servers and the like. As shown in FIG. 1, the method specifically includes the following steps S100 to S400.
S100, obtaining a sample case set, and carrying out sample division on each case in the sample case set based on the discount yield of each case in the sample case set so as to obtain a training data set with a plurality of positive samples and negative samples.
In this embodiment, to provide training samples for the subsequent case prediction model, the cases are divided according to the discount yield of each case after the sample case set is obtained. The discount yield is the standard for measuring the value of a case: the higher the yield, the higher the litigation value of the case. Here, discount yield = (recovery amount - recovery cost amount) × discount coefficient ^ (recovery time - pick-up time) / case principal balance. Dividing the sample case set by discount yield provides higher-quality training samples for the model, so that the recognition accuracy of the model is higher.
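The discount yield formula above can be sketched in a few lines of Python. This is only an illustration: the function name `discount_yield`, the parameter names, the use of years as the time unit, and the sample coefficient of 0.95 are assumptions that do not come from the patent text.

```python
def discount_yield(recovered, cost, principal, discount_coeff,
                   recovery_time, pickup_time):
    """(recovered - cost) * discount_coeff ** (recovery_time - pickup_time) / principal.

    Times are assumed to be expressed in years; the patent does not state a unit.
    """
    if principal <= 0:
        raise ValueError("case principal balance must be positive")
    elapsed = recovery_time - pickup_time
    return (recovered - cost) * discount_coeff ** elapsed / principal


# Hypothetical case: 80,000 recovered at a cost of 5,000 on a principal of
# 100,000, recovered 2 years after pick-up, with an assumed coefficient of 0.95.
example_yield = discount_yield(80_000, 5_000, 100_000, 0.95, 2.0, 0.0)
print(round(example_yield, 6))
```

With a coefficient below 1, a later recovery is worth less than an earlier one of the same net amount, which is how the formula combines recovery amount, cost, and timing into a single figure.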
S200, obtaining a candidate label set of a training data set, and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield.
In this embodiment, a candidate label set is obtained first. A label is the definition of one kind of sample data, such as the ID number, the overdue count, or the loan amount. The candidate label set may be drawn up through cross-department meetings that bring together professionals from legal affairs, finance, and front-line business to discuss each department's filing rules, processes, court limits, case selection criteria, and so on; or by learning the basis for actual filing decisions from the filing reasons, case progress, and other records in the business system; or by borrowing labels used to build other models in the past. Because not all data corresponding to the candidate labels are suitable for predicting positive and negative samples, the candidate label set is screened after it is obtained and the labels related to the discount yield are selected, so that once the data corresponding to these labels are input into the model, the trained model can accurately obtain the probability that a case is a positive sample.
S300, training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a well-trained case prediction model, wherein the output of the case prediction model is positive sample probability.
In this embodiment, once the screened labels are obtained, the labels and the training data set are used to train the pre-constructed case prediction model, yielding a model that predicts the positive sample probability of a case, so that whether a case should be litigated can be judged accurately.
S400, obtaining a target case, adopting the case prediction model to predict the positive sample probability of the target case, and then screening the target case according to the positive sample probability of the target case.
In this embodiment, after the case prediction model is trained, a target case can be screened using its label data: the label data of the target case are input into the case prediction model, the model outputs the positive sample probability, and whether the target case is suitable for litigation is judged from that probability, providing a reference for the user.
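A minimal sketch of step S400 follows, assuming the model has the logistic-regression form described elsewhere in this document. The names `positive_probability` and `screen_cases`, the 0.5 threshold, and the weights and inputs are all hypothetical; the patent does not specify how the probability is turned into a screening decision.

```python
import math


def positive_probability(woe_values, w, b):
    """Logistic function applied to the weighted sum of a case's inputs."""
    z = sum(wj * xj for wj, xj in zip(w, woe_values)) + b
    return 1.0 / (1.0 + math.exp(-z))


def screen_cases(cases, w, b, threshold=0.5):
    """Return (case_id, probability) pairs at or above the threshold, best first.

    The threshold is an assumed decision rule, not one stated in the patent.
    """
    scored = [(cid, positive_probability(x, w, b)) for cid, x in cases.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(cid, p) for cid, p in scored if p >= threshold]


# Hypothetical trained parameters and target cases (inputs per label).
w, b = [1.0, 0.5], 0.0
target_cases = {
    "case_1": [1.2, 0.4],
    "case_2": [-0.8, 0.1],
    "case_3": [0.3, -0.2],
}
selected = screen_cases(target_cases, w, b)
print([cid for cid, _ in selected])
```

Sorting before filtering keeps the output usable as a priority list when litigation capacity is smaller than the number of cases that pass the threshold.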
In this method, a sample case set is first obtained, and its cases are divided into samples based on the discount yield of each case to obtain a training data set with a plurality of positive samples and negative samples; a candidate label set of the training data set is then obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; a pre-constructed case prediction model is then trained based on the screened labels and the training data set to obtain a fully trained case prediction model whose output is a positive sample probability; and finally a target case is obtained, its positive sample probability is predicted with the case prediction model, and the target case is screened according to that probability. The invention creatively applies a machine learning algorithm to making maximal use of limited judicial resources, so that the use of judicial resources is grounded in data, tedious manual comparative analysis is avoided, and decisions become more scientific. It also weighs the recovery amount against cost and future cash flow against present value, so that the modeling target is closely aligned with the actual target.
In some embodiments, referring to fig. 2, the step S100 specifically includes:
S110, obtaining a sample case set, and calculating the discount yield of each case in the sample case set;
S120, sorting the cases by their discount yields;
S130, dividing the sample case set according to the sorting result and the litigation status of each case to obtain a training data set with a plurality of positive samples and negative samples.
In this embodiment, after the sample case set is obtained, the discount yield of every case is calculated first. Then, following expert opinion, the cases are divided into several customer groups by days overdue and court of jurisdiction, and the cases within each customer group are sorted in descending order of discount yield. Finally, each case is assigned to a sample class according to the sorting result, giving the positive and negative samples of the training data set and facilitating subsequent model training.
In some embodiments, the step S130 specifically includes:
among the top N% of cases, litigated cases are taken as positive samples and non-litigated cases as negative samples; among the remaining cases outside the top N%, litigated cases are taken as negative samples and non-litigated cases as positive samples.
In this embodiment, the top N% of cases are those with a higher discount yield. If such a case was not litigated, it was not used effectively to obtain revenue, so it is classified as a negative sample; if it was litigated, it was used effectively to obtain revenue, so it is classified as a positive sample. The cases outside the top N% are those with a lower discount yield. If such a case was not litigated, this reflects reasonable cost control, so it is classified as a positive sample; if it was litigated, this reflects a waste of litigation cost and judicial resources, so it is classified as a negative sample. Illustratively, N is taken to be 60.
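The division rule can be sketched as follows for a single customer group. The function name `label_cases`, the dictionary keys, and rounding the top-N% cutoff upward are assumptions made for illustration; the sample cases are invented.

```python
import math


def label_cases(cases, n_percent=60):
    """Label each case 1 (positive) or 0 (negative).

    A case is positive when the litigation decision agrees with its yield rank:
    litigated inside the top N%, or not litigated outside it.
    """
    ranked = sorted(cases, key=lambda c: c["yield"], reverse=True)
    cutoff = math.ceil(len(ranked) * n_percent / 100)  # assumption: round up
    labels = {}
    for rank, case in enumerate(ranked):
        in_top = rank < cutoff
        labels[case["id"]] = 1 if case["litigated"] == in_top else 0
    return labels


# Hypothetical cases from one customer group.
sample_cases = [
    {"id": "a", "yield": 0.70, "litigated": True},
    {"id": "b", "yield": 0.55, "litigated": False},
    {"id": "c", "yield": 0.40, "litigated": True},
    {"id": "d", "yield": 0.10, "litigated": True},
    {"id": "e", "yield": 0.05, "litigated": False},
]
labels = label_cases(sample_cases)
print(labels)
```

When cases are first split into customer groups, the same function could be applied to each group separately so that the cutoff is relative to comparable cases.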
In some embodiments, referring to FIG. 3, the step S200 specifically includes:
S210, acquiring a candidate label set of the training data set;
S220, rejecting unqualified labels in the candidate label set according to the data distribution of each label in the candidate label set to obtain an initially selected label set;
S230, performing correlation calculation on the initially selected label set with a preset correlation calculation model to screen out the labels related to the discount yield.
In this embodiment, the data distribution of each label is checked first, and labels with a high missing rate, a high share of a single value, or highly specific values (such as ID numbers) are removed. Labels are then removed according to linear correlation analysis. Linear correlation describes how the values of two labels change together: for example, if b increases by 2 to 3 units whenever a increases by 1 unit, then a and b are strongly correlated. In this embodiment, the correlation between labels is calculated with the Pearson correlation coefficient; labels whose correlation coefficient exceeds 0.75 are considered strongly correlated, and within each group of strongly correlated labels only the label most correlated with the target variable is retained.
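The two screening passes described above can be sketched in plain Python. Only the 0.75 correlation threshold comes from the text; the missing-rate and single-value thresholds, the function names, the label names, and the data are assumptions. The second pass is a greedy approximation of "keep the best label in each strongly correlated group", and it assumes the labels that survive the first pass are numeric with no missing values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def screen_labels(data, target, max_missing=0.5, max_single=0.9, max_corr=0.75):
    # Pass 1: drop labels by data distribution (thresholds are assumptions).
    kept = []
    for name, col in data.items():
        present = [v for v in col if v is not None]
        if 1 - len(present) / len(col) > max_missing:
            continue  # too many missing values
        counts = {}
        for v in present:
            counts[v] = counts.get(v, 0) + 1
        if max(counts.values()) / len(present) > max_single:
            continue  # dominated by a single value
        if all(isinstance(v, str) for v in present) and len(counts) == len(present):
            continue  # every text value is unique, e.g. an ID number
        kept.append(name)
    # Pass 2: visit labels from most to least correlated with the target and
    # keep one only if it is not strongly correlated with a label already kept.
    kept.sort(key=lambda name: abs(pearson(data[name], target)), reverse=True)
    selected = []
    for name in kept:
        if all(abs(pearson(data[name], data[s])) <= max_corr for s in selected):
            selected.append(name)
    return selected


# Hypothetical labels for six cases; target is the positive/negative sample flag.
target = [0, 0, 0, 1, 1, 1]
data = {
    "id_number": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "overdue_days": [10, 20, 30, 40, 50, 60],
    "overdue_periods": [1, 2, 4, 3, 5, 6],
    "branch_code": [7, 7, 7, 7, 7, 7],
    "email_score": [None, None, None, None, 0.2, None],
    "contact_attempts": [3, 1, 2, 3, 1, 2],
}
selected_labels = screen_labels(data, target)
print(selected_labels)
```

Here `overdue_periods` is dropped because it is strongly correlated with `overdue_days`, which has the higher correlation with the target.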
In some embodiments, referring to FIG. 4, the step S300 specifically includes:
S310, dividing the data corresponding to each label into a plurality of data groups based on the screened labels and the training data set, and then calculating the WOE value of each data group of each label;
S320, training the pre-constructed case prediction model with the WOE value of each data group of each label as input and the positive sample probability as output, so as to obtain a fully trained case prediction model.
In some embodiments, the case prediction model is built based on a logistic regression algorithm.
In some embodiments, the case prediction model is:
$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(w^{T}X + b)}}$$
where P(Y=1|X) represents the probability that a case is a positive sample, X represents the WOE values, w is a vector of the weight values of the respective labels, and b represents the intercept of the linear function.
In this embodiment, a label is generally divided into several groups by value, and the WOE (weight of evidence) measures the relative advantage of the positive sample rate over the negative sample rate within a group. It is calculated as: WOE = ln[(number of positive samples in the group / total number of positive samples) / (number of negative samples in the group / total number of negative samples)]. An IV (information value) can be calculated from the WOE value; the IV of a group measures the group's ability to predict positive samples, and is calculated as: IV = [(number of positive samples in the group / total number of positive samples) - (number of negative samples in the group / total number of negative samples)] × WOE. The data corresponding to a label are processed according to the IV value. Labels are either categorical or numerical. The values of a categorical label, such as "very satisfied" or "fairly satisfied", cannot be added, subtracted, multiplied, or divided; for a categorical label, groups whose share of samples is too small are merged in label order with adjacent groups according to the IV values of the label's data groups. The values of a numerical label can be added, subtracted, multiplied, and divided; a numerical label is binned by equal frequency, that is, the values are sorted and divided into N parts containing approximately the same number of samples (identical values always fall into the same bucket), and buckets with similar IV values are then merged in bucket order into the same group.
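The WOE and IV formulas above can be sketched as follows. The function name `woe_iv`, the 0.5 floor used to avoid taking the logarithm of zero for a group with no positives or no negatives, and the sample data are assumptions, not part of the patent.

```python
import math


def woe_iv(groups, labels):
    """Return {group: (woe, iv)} for one label.

    groups: group id of each sample; labels: 1 for positive, 0 for negative.
    Assumes the data contain at least one positive and one negative sample.
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    counts = {}
    for g, y in zip(groups, labels):
        pos, neg = counts.get(g, (0, 0))
        counts[g] = (pos + y, neg + (1 - y))
    result = {}
    for g, (pos, neg) in counts.items():
        # Assumption: floor empty counts at 0.5 so the logarithm stays finite.
        p = max(pos, 0.5) / total_pos
        q = max(neg, 0.5) / total_neg
        woe = math.log(p / q)
        result[g] = (woe, (p - q) * woe)
    return result


# Hypothetical label with two groups of four samples each.
groups = ["low"] * 4 + ["high"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
stats = woe_iv(groups, labels)
print(stats)
```

In this example the "low" group holds 3 of 4 positives and 1 of 4 negatives, so its WOE is ln(0.75 / 0.25) = ln 3, and the "high" group mirrors it with WOE = -ln 3.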
After the groups are finalized, the model is trained with a logistic regression (LR) algorithm using the WOE values of the groups: the WOE value calculated for the group each sample falls into is assigned to that training sample as input, and training outputs the parameters w and b, where w is a vector containing the weight value of each label and b is the intercept of the linear function. The probability that each sample is a positive sample is calculated from these values.
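As an illustration of this training step, the sketch below fits w and b by plain batch gradient descent on the log-loss. The patent only states that logistic regression is used; the optimizer, learning rate, epoch count, function names, and data are assumptions, and a production system would more likely rely on an established library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Positive sample probability P(Y=1|X) for one WOE-encoded sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical single-label data: each sample carries the WOE of its group.
woe_low, woe_high = math.log(3), -math.log(3)
X = [[woe_low]] * 4 + [[woe_high]] * 4
y = [1, 1, 1, 0, 1, 0, 0, 0]
w, b = train_logistic(X, y)
p_low = predict_proba([woe_low], w, b)
p_high = predict_proba([woe_high], w, b)
print(round(p_low, 3), round(p_high, 3))
```

With a single WOE-encoded label and balanced classes, the fitted probabilities approach the observed positive rates of the two groups (0.75 and 0.25), which is a quick sanity check on the encoding.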
The trained parameters w and b are then applied to the samples to calculate the positive sample probability of each sample. The samples are sorted in descending order of positive sample probability, the AUC value, KS value, accuracy, recall, precision, and so on are calculated, and the label bucketing, inputs, and outputs are adjusted according to the results to optimize the model.
After the model is obtained, it is tested on a test set: the calculated WOE values and the trained w and b are substituted into the model, and the AUC value, KS value, accuracy, recall, and precision are calculated to verify the model.
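Two of the metrics named above, AUC and KS, can be computed directly from predicted probabilities and true labels. The sketch below uses the pairwise definition of AUC and the maximum gap between cumulative positive and negative rates for KS; the function name `auc_ks` and the scores are hypothetical.

```python
def auc_ks(scores, labels):
    """Return (AUC, KS) for predicted scores and 0/1 labels."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # AUC: share of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))
    # KS: largest gap between the cumulative positive rate and the cumulative
    # negative rate as the threshold moves down through the sorted scores.
    ks = 0.0
    for t in sorted(set(scores), reverse=True):
        tpr = sum(1 for p in pos if p >= t) / len(pos)
        fpr = sum(1 for q in neg if q >= t) / len(neg)
        ks = max(ks, tpr - fpr)
    return auc, ks


# Hypothetical test-set probabilities, already in descending order.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
auc, ks = auc_ks(scores, labels)
print(auc, ks)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a small check like this but would be replaced by a rank-based computation on a large test set.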
According to the technical solution provided by the invention, a sample case set is first obtained, and its cases are divided into samples based on the discount yield of each case to obtain a training data set with a plurality of positive samples and negative samples; a candidate label set of the training data set is then obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; a pre-constructed case prediction model is then trained based on the screened labels and the training data set to obtain a fully trained case prediction model whose output is a positive sample probability; and finally a target case is obtained, its positive sample probability is predicted with the case prediction model, and the target case is screened according to that probability. The invention creatively applies a machine learning algorithm to making maximal use of limited judicial resources, so that the use of judicial resources is grounded in data, tedious manual comparative analysis is avoided, and decisions become more scientific. It also weighs the recovery amount against cost and future cash flow against present value, so that the modeling target is closely aligned with the actual target.
Another embodiment of the present invention provides a litigation case screening device. Referring to FIG. 5, the device includes a training set establishing module 11, a label screening module 12, a model training module 13, and a case screening module 14.
The training set establishing module 11 is configured to obtain a sample case set, and perform sample division on each case in the sample case set based on a discount yield of each case in the sample case set, so as to obtain a training data set with a plurality of positive samples and negative samples.
The label screening module 12 is configured to obtain a candidate label set of a training data set, and screen the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield.
The model training module 13 is configured to train a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, where the output of the case prediction model is a positive sample probability.
The case screening module 14 is configured to obtain a target case, and screen the target case according to the positive sample probability of the target case after predicting the positive sample probability of the target case by using the case prediction model.
In this embodiment, a sample case set is first obtained, and its cases are divided into samples based on the discount yield of each case to obtain a training data set with a plurality of positive samples and negative samples; a candidate label set of the training data set is then obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; a pre-constructed case prediction model is then trained based on the screened labels and the training data set to obtain a fully trained case prediction model whose output is a positive sample probability; and finally a target case is obtained, its positive sample probability is predicted with the case prediction model, and the target case is screened according to that probability. The invention creatively applies a machine learning algorithm to making maximal use of limited judicial resources, so that the use of judicial resources is grounded in data, tedious manual comparative analysis is avoided, and decisions become more scientific. It also weighs the recovery amount against cost and future cash flow against present value, so that the modeling target is closely aligned with the actual target.
It should be noted that the modules referred to in the present invention are series of computer program instruction segments capable of performing specific functions, and are more suitable than a program for describing the execution process of litigation case screening.
In some embodiments, the training set establishing module 11 specifically includes a discount yield calculation unit, a sorting unit, and a dividing unit.
The discount yield calculation unit is used for acquiring the sample case set and calculating discount yields of all cases in the sample case set.
The sorting unit is used for sorting the cases based on the level of the discount yield of each case.
The dividing unit is used for dividing the sample case set according to the sequencing result and the litigation situation of each case to obtain a training data set with a plurality of positive samples and negative samples.
In some embodiments, the dividing unit is specifically configured to:
among the top N% of cases, the litigated cases are used as positive samples and the non-litigated cases are used as negative samples; among the remaining cases outside the top N%, the litigated cases are used as negative samples and the non-litigated cases are used as positive samples.
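The division rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the `SampleCase` fields, the function name `label_samples`, and the rounding of the top-N% cut-off (rounded down) are hypothetical and not specified in the source.

```python
from typing import Dict, List, NamedTuple


class SampleCase(NamedTuple):
    case_id: str
    discount_yield: float
    litigated: bool


def label_samples(cases: List[SampleCase], top_percent: float) -> Dict[str, int]:
    """Return 1 (positive sample) or 0 (negative sample) for each case."""
    # Sort by discount yield, highest first.
    ranked = sorted(cases, key=lambda c: c.discount_yield, reverse=True)
    cutoff = int(len(ranked) * top_percent / 100)  # assumed: round down
    labels: Dict[str, int] = {}
    for rank, case in enumerate(ranked):
        in_top = rank < cutoff
        # Positive when the litigation decision matches the ranking:
        # litigated and in the top N%, or not litigated and outside it.
        labels[case.case_id] = 1 if in_top == case.litigated else 0
    return labels
```

Under this rule a positive sample is a case whose litigation decision agrees with its yield ranking, which is why litigated low-yield cases count as negative.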
In some embodiments, the tag screening module 12 is specifically configured to:
acquiring a candidate label set of a training data set;
rejecting unqualified labels in the candidate label set according to the data distribution condition of each label in the candidate label set to obtain a primary selection label set;
and adopting a preset correlation calculation model to perform correlation calculation on the primary selection label set, so as to screen out the labels related to the discount yield.
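The two-stage label screening can be sketched as follows. The source names neither the data distribution criteria nor the correlation calculation model, so this sketch assumes a missing-rate limit plus a single-value check for the first stage and a Pearson correlation against the discount yield for the second; `screen_labels`, `max_missing`, and `min_abs_corr` are hypothetical names and values.

```python
import math
from typing import Dict, List, Optional


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_labels(
    columns: Dict[str, List[Optional[float]]],
    discount_yield: List[float],
    max_missing: float = 0.5,   # assumed distribution criterion
    min_abs_corr: float = 0.3,  # assumed correlation criterion
) -> List[str]:
    kept = []
    for name, values in columns.items():
        if not values:
            continue
        present = [(v, t) for v, t in zip(values, discount_yield) if v is not None]
        missing_rate = 1 - len(present) / len(values)
        distinct = {v for v, _ in present}
        # Stage 1: reject labels that are mostly missing or carry a single value.
        if missing_rate > max_missing or len(distinct) < 2:
            continue
        # Stage 2: keep labels sufficiently correlated with the discount yield.
        xs = [v for v, _ in present]
        ys = [t for _, t in present]
        if abs(pearson(xs, ys)) >= min_abs_corr:
            kept.append(name)
    return kept
```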
In some embodiments, the model training module 13 is specifically configured for:
dividing the data corresponding to each label into a plurality of data groups based on the screened labels and the training data set, and then calculating the WOE (weight of evidence) value of each data group of each label;
training the case prediction model, which in some embodiments is built based on a logistic regression algorithm, by using the WOE value of each data group of each label as input and the positive sample probability as output.
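The WOE calculation for one label can be sketched as follows. The source does not give the formula, so this assumes the common definition WOE = ln(share of positive samples in the group / share of negative samples in the group); the smoothing constant, which avoids division by zero for groups with no positive or no negative samples, and the name `woe_by_group` are assumptions. The data groups are taken as already assigned.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def woe_by_group(
    groups: Sequence[Hashable],  # data group of each sample for one label
    labels: Sequence[int],       # 1 = positive sample, 0 = negative sample
    smoothing: float = 0.5,      # assumed; not part of the source
) -> Dict[Hashable, float]:
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    counts: Dict[Hashable, Tuple[int, int]] = {}
    for group, y in zip(groups, labels):
        pos, neg = counts.get(group, (0, 0))
        counts[group] = (pos + y, neg + (1 - y))
    woe = {}
    for group, (pos, neg) in counts.items():
        pos_share = (pos + smoothing) / (total_pos + smoothing)
        neg_share = (neg + smoothing) / (total_neg + smoothing)
        woe[group] = math.log(pos_share / neg_share)
    return woe
```

Each sample's raw label value is then replaced by the WOE of its group before being fed to the logistic regression.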
In some embodiments, the case prediction model is:
$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(w^{T}X + b)}}$$
where P(Y=1|X) represents the probability that a case is a positive sample, X represents the WOE values, w is a vector of the weight values of the respective labels, and b represents the intercept of the linear function.
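A minimal sketch of evaluating this model for one case, assuming the standard logistic regression form (a linear function of the WOE values with weights w and intercept b, passed through a sigmoid). The function name `positive_probability` is hypothetical, and the weights would in practice come from training.

```python
import math
from typing import Sequence


def positive_probability(
    woe_values: Sequence[float],
    weights: Sequence[float],
    intercept: float,
) -> float:
    """P(Y=1|X) under the assumed logistic form with linear score w.X + b."""
    if len(woe_values) != len(weights):
        raise ValueError("woe_values and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, woe_values)) + intercept
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```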
Another embodiment of the present invention provides an electronic device. As shown in fig. 6, the electronic device 10 includes:
one or more processors 110 and a memory 120. One processor 110 is illustrated in fig. 6 as an example; the processor 110 and the memory 120 may be connected by a bus or by other means, and connection by a bus is illustrated in fig. 6 as an example.
The processor 110 is used to implement various control logic for the electronic device 10, which may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single chip microcomputer, an ARM (Acorn RISC Machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. Also, the processor 110 may be any conventional processor, microprocessor, or state machine. Processor 110 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The memory 120 is used as a non-volatile computer-readable storage medium, and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions corresponding to the litigation case screening method in the embodiment of the present invention. The processor 110 executes various functional applications and data processing of the electronic device 10 by executing the nonvolatile software programs, instructions and units stored in the memory 120, so as to implement the litigation case screening method in the above method embodiment.
The memory 120 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created from use of the electronic device 10, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 120 optionally includes memory located remotely from processor 110, which may be connected to electronic device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more units are stored in the memory 120, which when executed by the one or more processors 110, perform the litigation case screening method of any of the above-described method embodiments, e.g., perform the above-described method steps S100-S400 of fig. 1.
Another embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S400 of fig. 1 described above.
By way of example, a computer-readable storage medium can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The disclosed memory components or memory of the operating environment described herein are intended to comprise one or more of these and/or any other suitable types of memory.
In summary, the litigation case screening method, the litigation case screening device, the electronic device and the storage medium provided by the present invention first obtain a sample case set, and divide each case in the sample case set into samples based on its discount yield, to obtain a training data set with a plurality of positive samples and negative samples; then a candidate label set of the training data set is obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; then a pre-constructed case prediction model is trained based on the screened labels and the training data set to obtain a fully trained case prediction model, wherein the output of the case prediction model is the positive sample probability; finally, a target case is obtained, the case prediction model is used to predict the positive sample probability of the target case, and the target case is screened according to that probability. The invention applies a machine learning algorithm to the problem of making maximal use of judicial resources, so that their use is well-founded; it eliminates tedious manual comparison and analysis, makes decisions more scientific, takes both the recovery amount and the cost into account, jointly considers future cash flow and present value, and aligns the performance target closely with the actual target.
The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention. Any other corresponding changes and modifications made according to the technical idea of the present invention should be included in the protection scope of the claims of the present invention.

Claims (10)

1. A litigation case screening method is characterized by comprising the following steps:
acquiring a sample case set, and carrying out sample division on each case in the sample case set based on the discount yield of each case in the sample case set so as to obtain a training data set with a plurality of positive samples and negative samples;
acquiring a candidate label set of the training data set, and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield;
training a case prediction model constructed in advance based on the screened labels and the training data set to obtain a fully trained case prediction model, wherein the output of the case prediction model is the positive sample probability;
and obtaining a target case, adopting the case prediction model to predict the positive sample probability of the target case, and then screening the target case according to the positive sample probability of the target case.
2. The litigation case screening method of claim 1, wherein the obtaining of the sample case set and the sample division of each case in the sample case set based on the discount yield of each case in the sample case set to obtain the training data set with a plurality of positive samples and negative samples comprises:
acquiring a sample case set, and calculating the discount yield of each case in the sample case set;
sorting the cases based on the level of the discount yield of each case;
and dividing the sample case set according to the sequencing result and the litigation situation of each case to obtain a training data set with a plurality of positive samples and negative samples.
3. The litigation case screening method of claim 2, wherein the dividing of the sample case set according to the sequencing result and the litigation situation of each case comprises:
among the top N% of cases, the litigated cases are used as positive samples and the non-litigated cases are used as negative samples; among the remaining cases outside the top N%, the litigated cases are used as negative samples and the non-litigated cases are used as positive samples.
4. The litigation case screening method of claim 1, wherein the obtaining of the candidate label set of the training data set, and the screening of the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield comprises:
acquiring a candidate label set of a training data set;
rejecting unqualified labels in the candidate label set according to the data distribution condition of each label in the candidate label set to obtain a primary selection label set;
and adopting a preset correlation calculation model to perform correlation calculation on the primary selection label set, so as to screen out the labels related to the discount yield.
5. The litigation case screening method of claim 1, wherein the training of the case prediction model constructed in advance based on the screened labels and the training data set to obtain the fully trained case prediction model, wherein the output of the case prediction model is the positive sample probability, comprises:
dividing the data corresponding to each label into a plurality of data groups based on the screened labels and the training data set, and calculating the WOE value of each data group of each label;
and training the pre-constructed case prediction model by taking the WOE value of each data group of each label as input and taking the positive sample probability as output so as to obtain a fully trained case prediction model.
6. The litigation case screening method of claim 5, wherein the case prediction model is established based on a logistic regression algorithm.
7. The litigation case screening method of claim 5, wherein the case prediction model is:
$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(w^{T}X + b)}}$$
where P(Y=1|X) represents the probability that a case is a positive sample, X represents the WOE values, w is a vector of the weight values of the respective labels, and b represents the intercept of the linear function.
8. A litigation case screening apparatus, comprising:
the training set establishing module is used for acquiring a sample case set and carrying out sample division on each case in the sample case set based on the discount yield of each case in the sample case set so as to obtain a training data set with a plurality of positive samples and negative samples;
the label screening module is used for obtaining a candidate label set of the training data set and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield;
the model training module is used for training a case prediction model which is constructed in advance based on the screened labels and the training data set so as to obtain a fully trained case prediction model, wherein the output of the case prediction model is the positive sample probability;
and the case screening module is used for obtaining a target case, adopting the case prediction model to predict the positive sample probability of the target case, and then screening the target case according to the positive sample probability of the target case.
9. An electronic device, comprising: a processor and a memory;
the memory having stored thereon a computer program executable by the processor;
the processor, when executing the computer program, performs the steps in the method for litigation case screening of any one of claims 1 to 7.
10. A computer-readable storage medium, comprising: a processor and a memory;
the memory having stored thereon a computer program executable by the processor;
the processor when executing the computer program realizes the steps in the litigation case screening method of any one of claims 1 to 7.
CN202211626255.XA 2022-12-16 2022-12-16 Litigation case screening method and device, electronic equipment and storage medium Pending CN115934971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211626255.XA CN115934971A (en) 2022-12-16 2022-12-16 Litigation case screening method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115934971A true CN115934971A (en) 2023-04-07

Family

ID=86698939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211626255.XA Pending CN115934971A (en) 2022-12-16 2022-12-16 Litigation case screening method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115934971A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination