CN115934971A

CN115934971A - Litigation case screening method and device, electronic equipment and storage medium

Info

Publication number: CN115934971A
Application number: CN202211626255.XA
Authority: CN
Inventors: 吴炳霞
Original assignee: Ping An Bank Co Ltd
Current assignee: Ping An Bank Co Ltd
Priority date: 2022-12-16
Filing date: 2022-12-16
Publication date: 2023-04-07

Abstract

The invention discloses a litigation case screening method and device, an electronic device and a storage medium. The method comprises: acquiring a sample case set and dividing each case in the set into positive or negative samples based on its discount yield, so as to obtain a training data set with a plurality of positive samples and negative samples; acquiring a candidate label set of the training data set and screening it based on the training data set to obtain a plurality of labels related to the discount yield; training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model; and acquiring a target case, predicting its positive sample probability with the case prediction model, and screening the target case according to that probability. The invention solves the technical problem in the prior art that suitable litigation cases cannot be screened out effectively.

Description

Litigation case screening method and device, electronic equipment and storage medium

Technical Field

The invention relates to the field of internet technology, and in particular to a litigation case screening method and device, an electronic device and a storage medium.

Background

When a bank issues loans, it often accumulates a large number of non-performing assets. Litigating every non-performing asset would inevitably waste judicial resources and increase litigation costs. The key to disposing of non-performing assets is therefore to concentrate effort where it matters most: using limited judicial resources effectively while reducing the company's asset losses.

The modeling targets of the A, B and C scorecards (application, behavior and collection scorecards) widely used in the market are clearly defined: timely repayment is a positive sample, and a case that enters the non-performing pool is a negative sample. For prioritizing litigation cases, however, selecting positive and negative samples is much harder. A case that can be recovered within a short time is not necessarily a positive sample: if it can be recovered without litigation, litigation is unnecessary, so the case is not a positive sample. Likewise, a case that cannot be recovered for a long time is not necessarily a positive sample: if resorting to legal action clearly cannot achieve a good discount yield, litigation only causes inconvenience to the client and still fails to achieve rapid recovery.

At present, there is no effective way to distinguish positive and negative samples for litigation cases, so judicial resources are not used effectively and the company's litigation costs increase.

Disclosure of Invention

The invention aims to overcome the above technical defects by providing a litigation case screening method and device, an electronic device and a storage medium suitable for financial technology and other related fields, thereby solving the technical problem in the prior art that suitable litigation cases cannot be screened out effectively.

In order to achieve the technical purpose, the invention adopts the following technical scheme:

In a first aspect, the invention provides a litigation case screening method, which comprises the following steps:

obtaining a sample case set, and dividing each case in the sample case set into samples based on its discount yield, so as to obtain a training data set with a plurality of positive samples and negative samples;

acquiring a candidate label set of the training data set, and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield;

training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, wherein the output of the case prediction model is the positive sample probability;

and obtaining a target case, predicting the positive sample probability of the target case with the case prediction model, and then screening the target case according to its positive sample probability.

In some embodiments, obtaining the sample case set and dividing each case into samples based on its discount yield to obtain the training data set with a plurality of positive samples and negative samples includes:

acquiring a sample case set, and calculating the discount yield of each case in the sample case set;

sorting the cases by their discount yields;

and dividing the sample case set according to the sorting result and the litigation status of each case to obtain a training data set with a plurality of positive samples and negative samples.

In some embodiments, dividing the sample case set according to the sorting result and the litigation status of each case includes:

among the top N% of cases, taking the litigated cases as positive samples and the non-litigated cases as negative samples; among the remaining cases, taking the litigated cases as negative samples and the non-litigated cases as positive samples.

In some embodiments, obtaining the candidate label set of the training data set and screening it based on the training data set to obtain a plurality of labels related to the discount yield includes:

acquiring a candidate label set of the training data set;

rejecting unqualified labels in the candidate label set according to the data distribution of each label to obtain a preliminary label set;

and performing correlation calculation on the preliminary label set with a preset correlation calculation model to screen out labels related to the discount yield.

In some embodiments, training the pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, whose output is the positive sample probability, includes:

dividing the data corresponding to each label into a plurality of data groups based on the screened labels and the training data set, and calculating the WOE value of each data group of each label;

and training the pre-constructed case prediction model with the WOE value of each data group of each label as input and the positive sample probability as output, so as to obtain a fully trained case prediction model.

In some embodiments, the case prediction model is built based on a logistic regression algorithm.

In some embodiments, the case prediction model is:

$$P(Y=1\mid X)=\frac{1}{1+e^{-(w^{\top}X+b)}}$$

where P(Y=1|X) represents the probability that a case is a positive sample, X represents the WOE values, w is a vector of the weight values of the respective labels, and b represents the intercept of the linear function.
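The model above can be evaluated for a single case as follows. This is a minimal sketch, assuming the WOE values of the screened labels are already available as a list of floats in the same order as the weights; the function name `positive_probability` and the example numbers are hypothetical and not part of the disclosure.

```python
import math
from typing import Sequence


def positive_probability(woe_values: Sequence[float],
                         weights: Sequence[float],
                         intercept: float) -> float:
    """Logistic model P(Y=1|X) = 1 / (1 + exp(-(w.X + b)))."""
    if len(woe_values) != len(weights):
        raise ValueError("expected one weight per label")
    # Linear score w.X + b over the WOE-encoded labels.
    z = sum(w * x for w, x in zip(weights, woe_values)) + intercept
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical case with two screened labels and illustrative parameters.
print(positive_probability([0.8, -0.3], [1.2, 0.5], -0.4))
```

The branch on the sign of z is a common implementation choice so that very large or very small scores saturate to 1.0 or 0.0 instead of raising an overflow error.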

In a second aspect, the present invention also provides a litigation case screening device, comprising:

the training set establishing module is used for acquiring a sample case set and dividing each case in the sample case set into samples based on its discount yield, so as to obtain a training data set with a plurality of positive samples and negative samples;

the label screening module is used for acquiring a candidate label set of the training data set and screening it based on the training data set to obtain a plurality of labels related to the discount yield;

the model training module is used for training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, wherein the output of the case prediction model is the positive sample probability;

and the case screening module is used for obtaining a target case, adopting the case prediction model to predict the positive sample probability of the target case, and then screening the target case according to the positive sample probability of the target case.

In a third aspect, the present invention also provides an electronic device, including: a processor and a memory;

the memory having stored thereon a computer program executable by the processor;

the processor, when executing the computer program, implements the steps in the litigation case screening method described above.

In a fourth aspect, the present invention also provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the litigation case screening method as described above.

Compared with the prior art, the litigation case screening method and device, electronic device and storage medium provided by the invention work as follows: first, a sample case set is obtained and each case in it is divided into positive or negative samples based on its discount yield, yielding a training data set with a plurality of positive samples and negative samples; next, a candidate label set of the training data set is obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; then, a pre-constructed case prediction model, whose output is the positive sample probability, is trained on the screened labels and the training data set until it is fully trained; finally, a target case is obtained, its positive sample probability is predicted with the case prediction model, and the target case is screened according to that probability. The invention applies a machine learning algorithm to making the best use of limited judicial resources, so that their allocation is data-driven, tedious manual comparison and analysis is avoided, and decisions are more scientific. It considers both the recovery amount and the recovery cost, combines future cash flows with present value, and closely aligns the performance target with the actual business objective.

Drawings

FIG. 1 is a flow chart of an embodiment of a litigation case screening method provided by the present invention;

FIG. 2 is a flowchart of one embodiment of step S100 in the litigation case screening method of the present invention;

FIG. 3 is a flowchart of one embodiment of step S200 in the litigation case screening method of the present invention;

FIG. 4 is a flowchart of one embodiment of step S300 in the litigation case screening method provided by the present invention;

FIG. 5 is a functional block diagram of a litigation case screening device provided by an embodiment of the invention;

FIG. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

Referring to fig. 1, fig. 1 is a schematic flowchart of a litigation case screening method according to an embodiment of the present invention. The method can be used to screen litigation cases in the financial field so as to select suitable cases for litigation. It can be executed by an electronic device capable of receiving and sending data; the electronic device can be any of various devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, portable computers, desktop computers and servers. As shown in fig. 1, the method specifically includes the following steps S100 to S400.

S100, obtaining a sample case set, and carrying out sample division on each case in the sample case set based on the discount yield of each case in the sample case set so as to obtain a training data set with a plurality of positive samples and negative samples.

In this embodiment, to provide training samples for the subsequent case prediction model, the cases in the sample case set are divided according to their discount yields after the set is obtained. The discount yield measures the value of a case: the higher the yield, the higher the litigation value of the case. It is calculated as: discount yield = (recovery amount - recovery cost) × discount factor ^ (recovery time - case handover time) / case principal balance. Dividing the sample case set by discount yield provides higher-quality training samples for the model and thus improves its recognition accuracy.
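The discount yield formula can be sketched as below. The time unit is not specified in the text, so this sketch assumes whole calendar months between handover and recovery and a per-month discount factor; the names `months_between` and `discount_yield`, the factor 0.99 and the case figures are all hypothetical.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def discount_yield(recovery_amount: float,
                   recovery_cost: float,
                   principal_balance: float,
                   discount_factor: float,
                   periods: float) -> float:
    """(recovery amount - recovery cost) * discount_factor ** periods / principal balance."""
    if principal_balance <= 0:
        raise ValueError("case principal balance must be positive")
    # Net recovery discounted back to the handover date, per unit of principal.
    return (recovery_amount - recovery_cost) * discount_factor ** periods / principal_balance


# Hypothetical case: 10,000 recovered at a cost of 2,000 on a 20,000 principal,
# 12 monthly periods after handover, with an assumed factor of 0.99 per month.
periods = months_between(date(2022, 1, 15), date(2023, 1, 10))
print(periods, round(discount_yield(10_000, 2_000, 20_000, 0.99, periods), 4))
# prints: 12 0.3546
```

A discount factor below 1 penalizes slow recoveries, which is how the formula combines the recovery amount, the cost and the time value of money.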

S200, obtaining a candidate label set of a training data set, and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield.

In this embodiment, a candidate label set is obtained first. A label is the definition of one kind of sample data, such as ID number, number of overdue payments or debit amount. The candidate label set can be drawn up at cross-department meetings, where professionals from legal affairs, finance and front-line business discuss filing rules, processes, the restrictions of individual courts and case selection standards; it can also be derived from the basis for filings in actual work, such as the filing reasons and case progress recorded in the business system, or borrowed from labels previously used to build other models. Because not all candidate labels are suitable for predicting positive and negative samples, the candidate label set is then screened to select the labels related to the discount yield, so that once the data of these labels is input into the model, the trained model can accurately estimate the probability that a case is a positive sample.

S300, training a pre-constructed case prediction model based on the screened labels and the training data set to obtain a well-trained case prediction model, wherein the output of the case prediction model is positive sample probability.

In this embodiment, once the screened labels are obtained, they and the training data set can be used to train the pre-constructed case prediction model, yielding a model that predicts the positive sample probability of a case, so that whether the case should be litigated can be judged accurately.

S400, obtaining a target case, adopting the case prediction model to predict the positive sample probability of the target case, and then screening the target case according to the positive sample probability of the target case.

In this embodiment, after the case prediction model is trained, a target case can be screened using its label data: the label data of the target case is input into the case prediction model, which outputs the positive sample probability; whether the target case is suitable for litigation can then be judged from this probability, providing a reference for the user.
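The text does not fix a decision rule for the screening step. One plausible reading, sketched below under that assumption, ranks target cases by predicted probability and keeps those above a threshold, optionally capped at a number of cases; `screen_cases`, the 0.5 threshold and `top_k` are hypothetical choices, not part of the disclosure.

```python
from typing import Dict, List, Optional


def screen_cases(probabilities: Dict[str, float],
                 threshold: float = 0.5,
                 top_k: Optional[int] = None) -> List[str]:
    """Return case IDs recommended for litigation, highest probability first."""
    # Rank all target cases by predicted positive sample probability.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    # Keep only cases at or above the (assumed) probability threshold.
    selected = [case_id for case_id, p in ranked if p >= threshold]
    # Optionally cap the list, e.g. to match available litigation capacity.
    return selected if top_k is None else selected[:top_k]


print(screen_cases({"case-001": 0.91, "case-002": 0.34, "case-003": 0.72}))
# prints: ['case-001', 'case-003']
```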

The method first obtains a sample case set and divides each case in it into positive or negative samples based on its discount yield to obtain a training data set with a plurality of positive samples and negative samples; it then obtains a candidate label set of the training data set and screens it based on the training data set to obtain a plurality of labels related to the discount yield; next, it trains a pre-constructed case prediction model, whose output is the positive sample probability, on the screened labels and the training data set to obtain a fully trained case prediction model; finally, it obtains a target case, predicts its positive sample probability with the case prediction model, and screens the target case according to that probability. By applying a machine learning algorithm to making the best use of limited judicial resources, the invention makes the allocation of those resources data-driven, avoids tedious manual comparison and analysis, and makes decisions more scientific; it considers both the recovery amount and the recovery cost, combines future cash flows with present value, and closely aligns the performance target with the actual business objective.

In some embodiments, referring to fig. 2, the step S100 specifically includes:

S110, obtaining a sample case set, and calculating the discount yield of each case in the sample case set;

S120, sorting the cases by their discount yields;

S130, dividing the sample case set according to the sorting result and the litigation status of each case to obtain a training data set with a plurality of positive samples and negative samples.

In this embodiment, after the sample case set is obtained, the discount yield of every case is calculated first. Then, based on expert opinion, the cases are divided into several customer segments by days overdue and competent court, and the cases within each segment are sorted by discount yield in descending order. Finally, each case is assigned to the positive or negative samples according to the sorting result, which provides the training data set for the subsequent model.
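The segmentation and descending sort can be sketched as follows. This assumes each case already carries a days-overdue band and a court identifier; the `SampleCase` fields and the function `rank_fraction_by_segment` are hypothetical names. The function returns each case's position within its segment as a fraction, so that "top N%" can be tested directly afterwards.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SampleCase:
    case_id: str
    overdue_bucket: str    # hypothetical days-overdue band, e.g. "91-180"
    court: str             # competent court
    discount_yield: float
    litigated: bool


def rank_fraction_by_segment(cases: List[SampleCase]) -> Dict[str, float]:
    """Map case_id -> position / segment size after sorting each segment by yield, descending."""
    segments: Dict[Tuple[str, str], List[SampleCase]] = defaultdict(list)
    for case in cases:
        segments[(case.overdue_bucket, case.court)].append(case)
    fractions: Dict[str, float] = {}
    for members in segments.values():
        members.sort(key=lambda c: c.discount_yield, reverse=True)
        for position, case in enumerate(members, start=1):
            # 1/n for the best case in the segment, 1.0 for the worst.
            fractions[case.case_id] = position / len(members)
    return fractions
```

Ranking within segments rather than globally keeps cases comparable only with peers that share the same overdue stage and court, which is the purpose of the expert-defined customer segments.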

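In some embodiments, the step S130 specifically includes: among the top N% of cases, taking the litigated cases as positive samples and the non-litigated cases as negative samples; among the remaining cases, taking the litigated cases as negative samples and the non-litigated cases as positive samples.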

In this embodiment, the cases in the top N% all have a high discount yield. If such a case was not litigated, its value was not used to obtain revenue, so it is classified as a negative sample; if it was litigated, its value was effectively used, so it is classified as a positive sample. The cases outside the top N% all have a low discount yield. If such a case was not litigated, costs were reasonably controlled, so it is classified as a positive sample; if it was litigated, litigation costs and judicial resources were wasted, so it is classified as a negative sample. Illustratively, N is 60.
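The four-way labeling rule reduces to a small function. A minimal sketch, assuming the case's position within its segment is given as a fraction (for example 0.25 means the case lies in the top 25% by discount yield); `label_case` is a hypothetical name and the default of 60 follows the illustrative value of N.

```python
def label_case(rank_fraction: float, litigated: bool, top_percent: float = 60.0) -> int:
    """Return 1 for a positive sample and 0 for a negative sample.

    rank_fraction: share of the case's segment ranked at or above it by discount yield.
    """
    in_top = rank_fraction <= top_percent / 100.0
    if in_top:
        # High-yield case: litigating captured its value, not litigating left it unused.
        return 1 if litigated else 0
    # Low-yield case: not litigating controlled cost, litigating wasted resources.
    return 0 if litigated else 1


print([label_case(f, lit) for f, lit in [(0.3, True), (0.3, False), (0.8, True), (0.8, False)]])
# prints: [1, 0, 0, 1]
```

Note that the positive label therefore means "the litigation decision matched the case's value", not simply "litigated" or "recovered".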

In some embodiments, referring to fig. 3, the step S200 specifically includes:

S210, acquiring a candidate label set of the training data set;

S220, rejecting unqualified labels in the candidate label set according to the data distribution of each label to obtain a preliminary label set;

S230, performing correlation calculation on the preliminary label set with a preset correlation calculation model to screen out labels related to the discount yield.

In this embodiment, the data distribution of each label is checked first, and labels with a high missing-data rate, a high single-value ratio, or highly unique values (such as ID numbers) are removed. Labels are then removed according to linear correlation analysis. Linear correlation describes how the values of two labels change together; for example, if b increases by 2 to 3 units whenever a increases by 1 unit, a and b are strongly correlated. In this embodiment, the correlation between labels is calculated with the Pearson correlation coefficient; labels whose correlation coefficient exceeds 0.75 are considered strongly correlated, and within each group of strongly correlated labels only the label most correlated with the target variable is retained.
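Both screening stages can be sketched in plain Python. Only the 0.75 Pearson threshold comes from the text; the missing-rate, single-value and uniqueness limits below are hypothetical placeholders, and the uniqueness check is applied only to categorical labels so that naturally distinct amounts are not discarded. The correlation step is a greedy approximation: labels are visited in order of their correlation with the target, and a label is kept only if it is not strongly correlated with any label already kept. It also assumes complete numerical columns (for example after imputation).

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def passes_distribution_checks(values: Sequence[Optional[object]],
                               categorical: bool,
                               max_missing: float = 0.5,       # hypothetical limit
                               max_top_share: float = 0.95,    # hypothetical limit
                               max_unique_share: float = 0.9   # hypothetical limit
                               ) -> bool:
    """Reject labels with too many missing values, one dominant value, or ID-like uniqueness."""
    present = [v for v in values if v is not None]
    if not present or 1 - len(present) / len(values) > max_missing:
        return False
    if Counter(present).most_common(1)[0][1] / len(present) > max_top_share:
        return False
    # Uniqueness is only checked for categorical labels; continuous amounts are naturally distinct.
    if categorical and len(set(present)) / len(present) > max_unique_share:
        return False
    return True


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def prune_correlated(columns: Dict[str, List[float]],
                     target: List[float],
                     threshold: float = 0.75) -> List[str]:
    """Greedily keep the label most correlated with the target in each strongly correlated group."""
    by_relevance = sorted(columns, key=lambda name: abs(pearson(columns[name], target)),
                          reverse=True)
    kept: List[str] = []
    for name in by_relevance:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```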

In some embodiments, referring to fig. 4, the step S300 specifically includes:

S310, dividing the data corresponding to each label into a plurality of data groups based on the screened labels and the training data set, and calculating the WOE value of each data group of each label;

S320, training the pre-constructed case prediction model with the WOE value of each data group of each label as input and the positive sample probability as output, so as to obtain a fully trained case prediction model.

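In some embodiments, the case prediction model is:

$$P(Y=1\mid X)=\frac{1}{1+e^{-(w^{\top}X+b)}}$$

where P(Y=1|X) is the probability that a case is a positive sample, X is the vector of WOE values, w is the vector of label weights and b is the intercept of the linear function.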

In this embodiment, each label is generally divided into several groups by value. WOE (weight of evidence) measures the relative proportion of positive and negative samples within a group and is calculated as: WOE = ln[(number of positive samples in the group / total number of positive samples) / (number of negative samples in the group / total number of negative samples)]. An IV (information value) can be calculated from the WOE value; the IV of a group measures that group's ability to predict positive samples and is calculated as: IV = [(number of positive samples in the group / total number of positive samples) - (number of negative samples in the group / total number of negative samples)] × WOE. The data of each label can then be grouped according to the IV values. Labels are either categorical or numerical. The values of a categorical label, such as "very satisfied" or "fairly satisfied", cannot be added, subtracted, multiplied or divided; for a categorical label, groups whose share of samples is too small are merged in order with adjacent groups according to the IV values of the label's data groups. The values of a numerical label can be added, subtracted, multiplied and divided; a numerical label is divided into equal-frequency buckets, that is, its values are sorted and split into N parts containing approximately the same number of samples (identical values are kept in the same bucket), and adjacent buckets with similar IV values are then merged into one group according to the IV value of each bucket.
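The equal-frequency bucketing and the WOE and IV formulas above can be sketched as follows. The function names are hypothetical, and the `zero_fill` substitution for empty positive or negative counts is an added assumption (not from the text) that keeps the logarithm finite; the later merging of adjacent buckets with similar IV is left out for brevity.

```python
import math
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value a bucket index so buckets hold roughly equal counts.

    Bucket membership depends only on the value, so identical values always share a bucket.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = sorted({ordered[n * k // n_bins] for k in range(1, n_bins)})
    return [bisect_right(cuts, v) for v in values]


def woe_iv(groups: Sequence[int], labels: Sequence[int],
           zero_fill: float = 0.5) -> Tuple[Dict[int, float], float]:
    """Per-group WOE and the label's total IV (sum of the per-group IV terms).

    zero_fill replaces a zero count so ln() stays finite (an assumption, not from the source).
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    pos = Counter(g for g, y in zip(groups, labels) if y == 1)
    neg = Counter(g for g, y in zip(groups, labels) if y == 0)
    woe: Dict[int, float] = {}
    iv = 0.0
    for g in sorted(set(groups)):
        pos_share = (pos[g] or zero_fill) / total_pos
        neg_share = (neg[g] or zero_fill) / total_neg
        woe[g] = math.log(pos_share / neg_share)
        iv += (pos_share - neg_share) * woe[g]
    return woe, iv


print(equal_frequency_bins(list(range(1, 11)), 5))
# prints: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Summing the per-group terms gives the label's overall IV, which is one common way to compare labels by predictive strength.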
Once the groups are finalized, the model is trained on the WOE values of all groups using a logistic regression (LR) algorithm: for each label, the WOE value of the group a training sample falls into is assigned to that sample as its input, training starts, and the trained parameters w and b are output, where w is a vector containing the weight of each label and b is the intercept of the linear function. The probability that each sample is a positive sample is then calculated from these values.
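The text names logistic regression but not a solver. As an illustrative substitute, the sketch below fits w and b by plain batch gradient descent on the mean log-loss, taking each row of `X` as the WOE values of one sample; the learning rate, epoch count and function names are hypothetical, and a production system would more likely use an established library solver.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              learning_rate: float = 0.1,
                              epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit w and b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - learning_rate * gj / n for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical single-label WOE inputs: positive WOE tends to mean positive samples.
w, b = train_logistic_regression([[1.0], [0.8], [1.2], [-1.0], [-0.9], [-1.1]],
                                  [1, 1, 1, 0, 0, 0])
print(w, b)
```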

The trained parameters w and b are substituted back into the samples to calculate the positive sample probability of each sample. The samples are sorted by this probability in descending order, the AUC value, KS value, accuracy, recall and precision are calculated, and the label binning, inputs and outputs are adjusted according to the results to optimize the model.

After the model is obtained, it is verified on a test set: the WOE values calculated for the test samples and the trained values of w and b are substituted into the model, and the AUC value, KS value, accuracy, recall and precision are calculated.
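The evaluation metrics named above can be computed from predicted probabilities and true labels as sketched here. The pairwise AUC is O(n^2) and meant only for illustration; the 0.5 cut-off used for accuracy, precision and recall is a hypothetical choice, since the text does not state one.

```python
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Maximum gap between cumulative positive and negative capture rates."""
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    best = 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        best = max(best, abs(tp / total_pos - fp / total_neg))
    return best


def threshold_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision and recall at an (assumed) probability cut-off."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    tn = len(labels) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Sorting by probability in descending order, as the text describes, is what the KS loop does implicitly by sweeping thresholds from the highest score down.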

According to the technical solution provided by the invention, a sample case set is obtained first, and each case in it is divided into positive or negative samples based on its discount yield to obtain a training data set with a plurality of positive samples and negative samples; a candidate label set of the training data set is then obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; next, a pre-constructed case prediction model, whose output is the positive sample probability, is trained on the screened labels and the training data set to obtain a fully trained case prediction model; finally, a target case is obtained, its positive sample probability is predicted with the case prediction model, and the target case is screened according to that probability. By applying a machine learning algorithm to making the best use of limited judicial resources, the invention makes their allocation data-driven, avoids tedious manual comparison and analysis, and makes decisions more scientific; it considers both the recovery amount and the recovery cost, combines future cash flows with present value, and closely aligns the performance target with the actual business objective.

Another embodiment of the present invention provides a litigation case screening device, please refer to fig. 5, which includes a training set establishing module 11, a tag screening module 12, a model training module 13, and a case screening module 14.

The training set establishing module 11 is configured to obtain a sample case set, and perform sample division on each case in the sample case set based on a discount yield of each case in the sample case set, so as to obtain a training data set with a plurality of positive samples and negative samples.

The label screening module 12 is configured to obtain a candidate label set of a training data set, and screen the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield.

The model training module 13 is configured to train a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, where the output of the case prediction model is the positive sample probability.

The case screening module 14 is configured to obtain a target case, and screen the target case according to the positive sample probability of the target case after predicting the positive sample probability of the target case by using the case prediction model.

In this embodiment, a sample case set is obtained first, and each case in it is divided into positive or negative samples based on its discount yield to obtain a training data set with a plurality of positive samples and negative samples; a candidate label set of the training data set is then obtained and screened based on the training data set to obtain a plurality of labels related to the discount yield; next, a pre-constructed case prediction model, whose output is the positive sample probability, is trained on the screened labels and the training data set to obtain a fully trained case prediction model; finally, a target case is acquired, its positive sample probability is predicted with the case prediction model, and the target case is screened according to that probability. By applying a machine learning algorithm to making the best use of limited judicial resources, the invention makes their allocation data-driven, avoids tedious manual comparison and analysis, and makes decisions more scientific; it considers both the recovery amount and the recovery cost, combines future cash flows with present value, and closely aligns the performance target with the actual business objective.

It should be noted that the modules referred to in the present invention are series of computer program instruction segments capable of performing specific functions, and are more suitable than a program for describing the execution of the litigation case screening method.

In some embodiments, the training set establishing module 11 specifically includes a discount yield calculation unit, a sorting unit, and a dividing unit.

The discount yield calculation unit is used for acquiring the sample case set and calculating discount yields of all cases in the sample case set.

The sorting unit is used for sorting the cases by their discount yields.

The dividing unit is used for dividing the sample case set according to the sorting result and the litigation status of each case to obtain a training data set with a plurality of positive samples and negative samples.

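In some embodiments, the dividing unit is specifically configured to: take the litigated cases among the top N% of cases as positive samples and the non-litigated cases as negative samples, and take the litigated cases among the remaining cases as negative samples and the non-litigated cases as positive samples.

In some embodiments, the label screening module 12 is specifically configured to:

acquire a candidate label set of the training data set;

reject unqualified labels in the candidate label set according to the data distribution of each label to obtain a preliminary label set;

and perform correlation calculation on the preliminary label set with a preset correlation calculation model to screen out labels related to the discount yield.

In some embodiments, the model training module 13 is specifically configured to:

divide the data corresponding to each label into a plurality of data groups based on the screened labels and the training data set, and calculate the WOE value of each data group of each label;

and train the pre-constructed case prediction model, which in some embodiments is built based on a logistic regression algorithm, with the WOE value of each data group of each label as input and the positive sample probability as output.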

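In some embodiments, the case prediction model is:

$$P(Y=1\mid X)=\frac{1}{1+e^{-(w^{\top}X+b)}}$$

where P(Y=1|X) is the probability that a case is a positive sample, X is the vector of WOE values, w is the vector of label weights and b is the intercept of the linear function.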

Another embodiment of the present invention provides an electronic device, as shown in fig. 6, an electronic device 10 includes:

one or more processors 110 and a memory 120, where one processor 110 is illustrated in fig. 6, the processor 110 and the memory 120 may be connected by a bus or other means, and fig. 6 illustrates a connection by a bus as an example.

The processor 110 is used to implement various control logic for the electronic device 10, which may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a single chip microcomputer, an ARM (Acorn RISC Machine) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. Also, the processor 110 may be any conventional processor, microprocessor, or state machine. Processor 110 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.

As a non-volatile computer-readable storage medium, the memory 120 can store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions corresponding to the litigation case screening method in the embodiments of the present invention. By running the non-volatile software programs, instructions and units stored in the memory 120, the processor 110 executes the various functional applications and data processing of the electronic device 10, thereby implementing the litigation case screening method of the above method embodiment.

The memory 120 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device 10, and the like. Further, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 120 optionally includes memory located remotely from the processor 110, which may be connected to the electronic device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

One or more units are stored in the memory 120 and, when executed by the one or more processors 110, perform the litigation case screening method of any of the above method embodiments, e.g., method steps S100 to S400 in fig. 1 described above.

Another embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions for execution by one or more processors, for example, to perform method steps S100-S400 of fig. 1 described above.

By way of example, a computer-readable storage medium can include volatile memory, non-volatile memory, or both. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory components of the operating environment described herein are intended to comprise one or more of these and/or any other suitable types of memory.

In summary, the litigation case screening method, device, electronic device and storage medium provided by the present invention proceed as follows. First, a sample case set is acquired, and each case in it is divided into a positive or negative sample based on its discount yield, yielding a training data set with a plurality of positive samples and negative samples. Next, a candidate label set of the training data set is acquired and screened based on the training data set, giving a plurality of labels related to the discount yield. A pre-constructed case prediction model, whose output is the positive sample probability, is then trained on the screened labels and the training data set to obtain a fully trained case prediction model. Finally, a target case is acquired, the case prediction model predicts its positive sample probability, and the target case is screened according to that probability. The invention applies a machine learning algorithm to the question of how to make the best use of limited judicial resources. As a result, the allocation of judicial resources has a data-driven basis, tedious manual comparison and analysis are avoided, and decisions become more scientific. The invention also takes both the recovery amount and the cost into account, combines future cash flows with their present value, and closely aligns the performance target with the actual business target.
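The final screening step can be sketched as follows. The sketch assumes that each target case has already been mapped to the WOE values of the screened labels and that the fitted model is a logistic regression with intercept b and weights w. The probability threshold and the optional budget (a maximum number of cases to bring to litigation) are hypothetical parameters introduced for illustration, as are the function names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

def positive_probability(b: float, w: Sequence[float], x: Sequence[float]) -> float:
    """Logistic model output for one case's WOE vector x (overflow-safe)."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def screen_cases(cases: Dict[str, Sequence[float]], b: float, w: Sequence[float],
                 threshold: float = 0.5,
                 budget: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (case_id, probability) for cases at or above `threshold`,
    highest probability first, truncated to `budget` cases if given."""
    scored = sorted(((cid, positive_probability(b, w, x)) for cid, x in cases.items()),
                    key=lambda item: item[1], reverse=True)
    selected = [(cid, p) for cid, p in scored if p >= threshold]
    return selected if budget is None else selected[:budget]
```

Ranking before truncation means that, when a budget limits how many cases can be litigated, the cases with the highest predicted positive sample probability are kept first.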

The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention. Any other corresponding changes and modifications made according to the technical idea of the present invention should be included in the protection scope of the claims of the present invention.

Claims

1. A litigation case screening method, characterized by comprising the following steps:

acquiring a sample case set, and performing sample division on each case in the sample case set based on the discount yield of each case, to obtain a training data set with a plurality of positive samples and negative samples;

acquiring a candidate label set of a training data set, and screening the candidate label set based on the training data set to obtain a plurality of labels related to the discount yield;

2. The litigation case screening method of claim 1, wherein the acquiring of the sample case set and the sample division of each case in the sample case set based on the discount yield of each case, to obtain the training data set with a plurality of positive samples and negative samples, comprises:

acquiring a sample case set, and calculating the discount yield of each case in the sample case set;

sorting the cases based on the level of the discount yield of each case;

3. The litigation case screening method of claim 2, wherein the dividing of the sample case set according to the sorting result and the litigation situation of each case comprises:

4. The litigation case screening method of claim 1, wherein the acquiring of the candidate label set of the training data set and the screening of the candidate label set based on the training data set, to obtain a plurality of labels related to the discount yield, comprises:

acquiring a candidate label set of a training data set;

5. The litigation case screening method of claim 1, wherein the training of the case prediction model constructed in advance based on the screened labels and the training data set to obtain the well-trained case prediction model, wherein the output of the case prediction model is a positive sample probability, comprises:

and training the pre-constructed case prediction model with the WOE value of each data group of each label as input and the positive sample probability as output, to obtain a fully trained case prediction model.

6. The litigation case screening method of claim 5, wherein the case prediction model is established based on a logistic regression algorithm.

7. The litigation case screening method of claim 5, wherein the case prediction model is:

8. A litigation case screening apparatus, comprising:

the model training module, configured to train a pre-constructed case prediction model based on the screened labels and the training data set to obtain a fully trained case prediction model, wherein the output of the case prediction model is the positive sample probability;

9. An electronic device, comprising: a processor and a memory;

the processor, when executing the computer program, performs the steps of the litigation case screening method of any one of claims 1 to 7.

10. A computer-readable storage medium, storing a computer program which, when executed by a processor, implements the steps of the litigation case screening method of any one of claims 1 to 7.