CN115689713A - Abnormal risk data processing method and device, computer equipment and storage medium - Google Patents

Abnormal risk data processing method and device, computer equipment and storage medium

Info

Publication number
CN115689713A
Authority
CN
China
Prior art keywords
risk
target
data
abnormal
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211399582.6A
Other languages
Chinese (zh)
Inventor
王福静
刘彦兵
林伟丰
陈敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kingdee Software China Co Ltd
Original Assignee
Kingdee Software China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kingdee Software China Co Ltd
Priority to CN202211399582.6A
Publication of CN115689713A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to an abnormal risk data processing method, an abnormal risk data processing device, a computer device, a storage medium and a computer program product. The method comprises the following steps: in response to an abnormal risk assessment event for a target object, acquiring source data of each of a plurality of target risk indexes of the target object from a resource exchange recording system and at least one public information source according to the plurality of target risk indexes used by a target scoring model for scoring, preprocessing the source data to obtain risk data of each of the plurality of target risk indexes, and storing the risk data into a model input data set file; executing a model master file to run the target scoring model, and reading the risk data of each of the plurality of target risk indexes from the model input data set file; and inputting the risk data of each target risk index into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model. By adopting the method, the efficiency of abnormal risk assessment can be improved.

Description

Abnormal risk data processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for processing abnormal risk data.
Background
With the development of computer technology, resource exchange between different participants is more and more widely applied. The resources can include physical material resources, information resources, financial resources and the like. When resources are exchanged, a resource provider among the participants provides resources to a resource demander among the participants, and the resource provider needs to recover the resources provided to the resource demander before the agreed exchange period expires. The resources to be recovered carry the risk of being unrecoverable and thus becoming abnormal resources; for example, financial resources that cannot be recovered form a bad account and cause a loss to the resource provider. In order to reduce the loss, the resource provider needs to evaluate the abnormal risk of the resources to be recovered from the resource demander. Currently, a resource provider relies on a service expert to manually observe some data and evaluate the abnormal risk by subjective experience.
However, evaluating the abnormal risk through manual observation by service experts depends on the experience of particular experts and on manual operation, so the evaluation efficiency is low.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an abnormal risk data processing method, an abnormal risk data processing apparatus, a computer device, a computer readable storage medium, and a computer program product, which can improve the efficiency of abnormal risk assessment.
In a first aspect, the present application provides a method for processing abnormal risk data. The method comprises the following steps:
in response to an abnormal risk assessment event for a target object, obtaining source data of each of a plurality of target risk indexes of the target object from a resource exchange recording system and at least one public information source according to the plurality of target risk indexes used by a target scoring model for scoring, preprocessing the source data to obtain risk data of each of the plurality of target risk indexes, and storing the risk data to a model input data set file;
executing a model master file to run the target scoring model, reading the risk data for each of the plurality of target risk indicators from the model input dataset file;
inputting the risk data of each target risk index into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model; the target scoring model screens out at least a portion of the plurality of target risk indicators from a plurality of candidate risk indicators when trained.
In a second aspect, the application further provides an abnormal risk data processing device. The device comprises:
an input module for, in response to an abnormal risk assessment event for a target object, acquiring source data of each of a plurality of target risk indexes of the target object from a resource exchange recording system and at least one public information source according to the plurality of target risk indexes used by a target scoring model for scoring, preprocessing the source data to obtain risk data of each of the plurality of target risk indexes, and storing the risk data into a model input data set file;
an execution module for executing a model master file to run the target scoring model, reading the risk data for each of the plurality of target risk indicators from the model input dataset file; inputting the risk data of the target risk index into the target scoring model for processing to obtain an abnormal risk scoring grade of the target object output by the target scoring model; the target scoring model, when trained, screens out at least a portion of the plurality of target risk indicators from a plurality of candidate risk indicators.
In a third aspect, the application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
in response to an abnormal risk assessment event for a target object, obtaining source data of each of a plurality of target risk indexes of the target object from a resource exchange recording system and at least one public information source according to the plurality of target risk indexes used by a target scoring model for scoring, preprocessing the source data to obtain risk data of each of the plurality of target risk indexes, and storing the risk data to a model input data set file;
executing a model master file to run the target scoring model, reading the risk data for each of the plurality of target risk indicators from the model input dataset file;
inputting the risk data of each target risk index into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model; the target scoring model, when trained, screens out at least a portion of the plurality of target risk indicators from a plurality of candidate risk indicators.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
in response to an abnormal risk assessment event for a target object, acquiring source data of each of a plurality of target risk indexes of the target object from a resource exchange recording system and at least one public information source according to the plurality of target risk indexes used by a target scoring model for scoring, preprocessing the source data to obtain risk data of each of the plurality of target risk indexes, and storing the risk data to a model input data set file;
executing a model master file to run the target scoring model, reading the risk data for each of the plurality of target risk indicators from the model input dataset file;
inputting the risk data of each target risk index into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model; the target scoring model screens out at least a portion of the plurality of target risk indicators from a plurality of candidate risk indicators when trained.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:
in response to an abnormal risk assessment event for a target object, obtaining source data of each of a plurality of target risk indexes of the target object from a resource exchange recording system and at least one public information source according to the plurality of target risk indexes used by a target scoring model for scoring, preprocessing the source data to obtain risk data of each of the plurality of target risk indexes, and storing the risk data to a model input data set file;
executing a model master file to run the target scoring model, reading the risk data for each of the plurality of target risk indicators from the model input dataset file;
inputting the risk data of each target risk index into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model; the target scoring model, when trained, screens out at least a portion of the plurality of target risk indicators from a plurality of candidate risk indicators.
According to the abnormal risk data processing method, the abnormal risk data processing device, the computer equipment, the storage medium and the computer program product, the source data of each of the plurality of target risk indexes of the target object, obtained from the resource exchange recording system and the at least one public information source, is preprocessed to obtain the risk data of each of the plurality of target risk indexes, which provides a sufficiently rich data basis related to abnormal risk. Because at least a part of the target risk indexes are screened out from a plurality of candidate risk indexes when the target scoring model is trained, predicting the abnormal risk scoring level based on the target scoring model and the plurality of target risk indexes is highly reliable. The risk data of each of the plurality of target risk indexes of the target object is input into the target scoring model to obtain the abnormal risk scoring level of the target object, so that an accurate abnormal risk scoring level can be obtained while the evaluation efficiency is improved.
Drawings
FIG. 1 is a diagram of an application environment of a method for processing abnormal risk data in one embodiment;
FIG. 2 is a schematic flow chart diagram of a method for processing abnormal risk data in one embodiment;
FIG. 3 is a schematic flow chart of the training steps in one embodiment;
FIG. 4 is a graphical illustration of a distribution of scoring data for each abnormal risk score in one embodiment;
FIG. 5 is a diagram illustrating the correspondence between overdue periods and days of overdue in one embodiment;
FIG. 6 is a schematic diagram of the receivables balances for each of the 7 month to 12 month overdue phases in one embodiment;
FIG. 7 is a diagram illustrating migration amounts for each overdue period of 8 months to 12 months in one embodiment;
FIG. 8 is a schematic diagram illustrating the resource migration rate during the overdue phases of 8 months to 12 months in one embodiment;
FIG. 9 is a schematic diagram of a formula for calculating the predicted overdue loss ratio for each overdue phase in one embodiment;
FIG. 10 is a schematic diagram illustrating predicted overdue loss proportions for each overdue phase in one embodiment;
FIG. 11 is a graphical illustration of a predicted bad account amount for the target object for the current month in one embodiment;
FIG. 12 is a flowchart of the steps in a method for abnormal risk data processing in an exemplary embodiment;
FIG. 13 is a list of data types that a public information source may mine in one embodiment;
FIG. 14 is a data services functionality interface that displays the structured processing results for risk indicators in one embodiment;
FIG. 15 is a partial data presentation diagram for training a model in one embodiment;
FIG. 16 is a KS curve constructed from scoring data for a plurality of sample objects in one embodiment;
FIG. 17 is a graphical illustration of respective cumulative probability distribution functions for good and bad sample objects in one embodiment;
FIG. 18 is a diagram of an abnormal risk prediction project file in accordance with an embodiment;
FIG. 19 is a schematic diagram illustrating an abnormal risk prediction project file in accordance with one embodiment;
FIG. 20 is a flow chart illustrating the transition of overdue accounts receivable to bad accounts in one embodiment;
FIG. 21 is an embodiment of an accounts receivable bad account risk prediction interface;
FIG. 22 is a block diagram showing the structure of an abnormal risk data processing apparatus in one embodiment;
FIG. 23 is a diagram of an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The abnormal risk data processing method provided by the embodiments of the application can be applied to the application environment shown in fig. 1. A data server side may run on the data server 102; the resource exchange recording system may run on the resource server 104, or may also run on the data server 102; in the case where the at least one public information source includes a plurality of public information sources, one of the public information sources may run on the information server 106 and the other public information sources may run on other information servers; the data server 102 may communicate, through the data server side, with the resource server 104, the information server 106, and the other information servers via a network. The data storage system may store the data that needs to be processed when the data server 102 performs the abnormal risk data processing method. The data storage system may be integrated on the data server 102, or may be located on the cloud or on another server.
In the application environment shown in fig. 1, the data server 102 may respond to an abnormal risk assessment event for a target object, obtain source data of each of a plurality of target risk indicators of the target object from a resource exchange recording system running on the resource server 104 and at least one public information source running on the information server 106, perform preprocessing, obtain risk data of each of the plurality of target risk indicators, execute a model master file, input the risk data of each of the plurality of target risk indicators into a target scoring model, and perform processing, thereby obtaining an abnormal risk scoring level of the target object. The data server 102, the resource server 104, and the information server 106 may be implemented by separate servers, or by a server cluster composed of a plurality of servers. In some embodiments, the data server 102 may be replaced with a terminal, or the functions of the data server 102 may be performed by the terminal and the server interactively. The terminal can be various desktop computers, notebook computers, smart phones or tablet computers.
In an embodiment, as shown in fig. 2, there is provided an abnormal risk data processing method, which is described in this embodiment by taking the method as an example applied to the data server 102 in fig. 1, and the method includes the following steps:
step 202, in response to an abnormal risk assessment event for a target object, according to a plurality of target risk indexes used by a target scoring model for scoring, obtaining source data of each of the plurality of target risk indexes of the target object from a resource exchange recording system and at least one public information source, preprocessing the source data to obtain risk data of each of the plurality of target risk indexes, and storing the risk data into a model input data set file.
Here, a resource is anything that is available for use. The resources may include physical resources such as buildings, vehicles, physical goods, or physical currency, or intangible resources such as virtual currency, electronic accounts, data services, software products, or labor services, where physical currency and virtual currency may be referred to as financial resources.
Resource exchange refers to an activity in which at least two participants exchange resources of the same type or of different types according to agreed conditions. When a resource provider among the participants provides resources, it needs to recover resources from a resource demander among the participants within the agreed exchange period to compensate for the resources provided; until the resources are completely recovered, the unrecovered resources are the resources to be recovered. When a resource to be recovered cannot be recovered, it may be referred to as an abnormal resource. The resource exchange recording system is a system for recording various information in the resource exchange process, and the information can include participant identities, the quantity of exchanged resources, the value of the exchanged resources, the resource exchange time, the agreed resource recovery time and the like.
The target object is a resource demander whose risk of forming abnormal resources needs to be evaluated, and can be an enterprise or an individual. The abnormal risk assessment event is an event for assessing the possibility that the resources to be recovered of the target object are converted into abnormal resources, and may be an automatic trigger event or a manual trigger operation. The automatic trigger event may be a periodic trigger, an automatic trigger when a target object is newly added in the system, or an automatic trigger when the source data of any of the plurality of target risk indicators of the target object is updated; the manual trigger operation may be a click on the abnormal risk assessment function key. The public information source is a source of information that is open to the public, and can be various accessible websites, newspapers, periodicals, and the like.
The target scoring model is a mathematical model having specific rules or specific mathematical formulas for assessing the abnormal risk scoring level of the target object. The target risk index is an index of data employed for evaluating an abnormal risk score level of a target subject. The source data of the target risk indicator is the original data under the target risk indicator directly obtained from the resource exchange recording system or at least one public information source.
The preprocessing at least converts the data format of the source data and performs data statistics according to the target risk index, to form risk data of the target risk index that can be directly input into the target scoring model. For example, when the target risk indicator is the number of times the target object was listed as a dishonest judgment debtor in the last year, the source data of the target risk indicator may be the records of the target object being listed as a dishonest judgment debtor within that year; the records are counted, and the resulting count is the risk data of the target risk indicator. The model input data set file is a file that records the risk data of each of the plurality of target risk indicators of a target object.
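The counting example above can be sketched as follows. This is only an illustrative sketch: the record field `listed_on`, the function name `count_recent_listings`, and the 365-day window are assumptions made for the example and are not specified by the application.

```python
from datetime import date, timedelta

def count_recent_listings(records, as_of, window_days=365):
    """Count source records whose listing date falls in the window ending at as_of.

    `records` is a hypothetical list of dicts with a `listed_on` date field.
    """
    start = as_of - timedelta(days=window_days)
    return sum(1 for r in records if start < r["listed_on"] <= as_of)

# Hypothetical source data for one target object.
source_records = [
    {"listed_on": date(2022, 3, 1)},
    {"listed_on": date(2022, 9, 15)},
    {"listed_on": date(2020, 1, 10)},  # outside the one-year window
]

# The count is the risk data for this target risk indicator.
risk_value = count_recent_listings(source_records, as_of=date(2022, 11, 1))
```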
In one embodiment, a fast lookup table records, for each of the plurality of target risk indicators, its acquisition path in the resource exchange recording system or the at least one public information source. In response to an abnormal risk assessment event for the target object, the data server may obtain the source data of each of the plurality of target risk indicators of the target object from the resource exchange recording system and the at least one public information source according to the acquisition paths. An acquisition path may be a data file path or a website address.
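One possible shape for such a lookup table is a mapping from indicator name to its source and acquisition path. The sketch below is hypothetical throughout: the indicator names, the source labels, the file path, and the URL are placeholders chosen for illustration.

```python
# Hypothetical fast lookup table: indicator -> (source, acquisition path).
LOOKUP = {
    "receivable_balance_due_30d": ("resource_exchange_system", "/data/ar/balances.csv"),
    "dishonest_listings_last_year": ("public_source", "https://example.com/judicial"),
}

def plan_fetches(target_indicators, lookup=LOOKUP):
    """Group the acquisition paths of the requested indicators by source."""
    plan = {}
    for indicator in target_indicators:
        source, path = lookup[indicator]
        plan.setdefault(source, []).append((indicator, path))
    return plan

plan = plan_fetches(["receivable_balance_due_30d", "dishonest_listings_last_year"])
```

Grouping by source lets a caller issue one batch of requests per system, though the application does not state whether fetching is batched.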
In one embodiment, the data server may store the risk data of each of the plurality of target risk indicators of the target object into the model input data set file according to a preset data format and a preset arrangement manner. The preset data format may be a plain text format or an encoded data format. The preset arrangement mode may be a matrix form formed by taking different target objects as one dimension and different target risk indexes as another dimension, and the matrix records respective corresponding risk data of the different target risk indexes of the different objects. The preset arrangement mode may also be that an independent data group exists corresponding to each target object, and the data group stores the risk data of each of the plurality of target risk indicators of the target object.
In one embodiment, the model input data set file includes a data storage area and an index area, the risk data of each of the multiple target risk indicators of the same target object is stored in the data storage area in a centralized manner, the index area records the corresponding relationship between the target object and a data start storage location, and the start storage location is a start location where the risk data of each of the multiple target risk indicators of the target object is stored in the data storage area. In this embodiment, an index area may exist in the model input data set file, and the positions of the risk data of each of the plurality of target risk indicators of the target object in the data storage area may be quickly located and stored by the index area. The data server may record, in the index area of the model input data set file, the target object and a starting storage location of the risk data for each of the plurality of target risk indicators for the target object after storing the risk data for each of the plurality of target risk indicators for the target object in the model input data set file.
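A file layout with an index area and a data storage area could look like the following sketch. The concrete encoding (a 4-byte length prefix, a JSON index, and newline-terminated JSON records) is an assumption made for illustration; the application does not specify a byte format.

```python
import json
import struct

def build_dataset(risk_by_object):
    """Serialize {object_id: {indicator: value}} as index area + data storage area."""
    data_area = bytearray()
    index = {}
    for object_id, risk_data in risk_by_object.items():
        # Record the starting storage location of this object's risk data.
        index[object_id] = len(data_area)
        data_area += json.dumps(risk_data).encode("utf-8") + b"\n"
    index_area = json.dumps(index).encode("utf-8")
    # A 4-byte big-endian length prefix tells the reader where the index area ends.
    return struct.pack(">I", len(index_area)) + index_area + bytes(data_area)

def read_risk_data(blob, object_id):
    """Locate one object's risk data through the index area, without scanning."""
    (index_len,) = struct.unpack(">I", blob[:4])
    index = json.loads(blob[4:4 + index_len])
    start = 4 + index_len + index[object_id]
    end = blob.index(b"\n", start)
    return json.loads(blob[start:end])

blob = build_dataset({
    "object_A": {"overdue_ratio": 0.1},
    "object_B": {"overdue_ratio": 0.4, "dishonest_listings_last_year": 2},
})
record_b = read_risk_data(blob, "object_B")
```

Because the index stores offsets relative to the start of the data storage area, the reader jumps directly to the target object's record instead of traversing every object.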
Step 204, executing the model master file to run the target scoring model, and reading the risk data of each of the plurality of target risk indexes from the model input data set file.
The model master file is an executable file that records the operation code implementing the target scoring model. Executing the model master file runs the target scoring model.
In one embodiment, the model input dataset file has stored therein risk data for each of a plurality of target risk indicators for a plurality of objects. The data server may traverse the plurality of objects recorded in the model input data set file, and read the respective risk data of the plurality of target risk indicators in the case of traversing to the target object.
In one embodiment, when the model input data set file includes an index area, the data server may read, from the index area of the model input data set file, the starting storage location of the risk data of each of the plurality of target risk indicators of the target object, and after the starting storage location is read, read the risk data of each of the plurality of target risk indicators of the target object from the data storage area, beginning at the starting storage location.
In one embodiment, the plurality of target risk indicators may include a model operation indicator and a condition judgment indicator. The data server may first read the risk data of the condition judgment indicator from the model input data set file and input it into the target scoring model for processing; when the processing result indicates that the risk data of the model operation indicator needs to be further processed, the data server then reads the risk data of the model operation indicator from the model input data set file.
Step 206, inputting the risk data of each of the multiple target risk indexes into a target scoring model for processing, and obtaining the abnormal risk scoring level of the target object output by the target scoring model; the target scoring model screens out at least a portion of the plurality of target risk indicators from the plurality of candidate risk indicators during training.
The abnormal risk scoring level represents the possibility that the resources to be recovered of a resource exchange participant are converted into abnormal resources. The abnormal risk scoring level may include a high risk level, a medium risk level, or a low risk level. A candidate risk index is a risk index that, before the target scoring model is trained, is expected to be correlated with the abnormal risk scoring level.
In one embodiment, the data server may input the risk data of each of the multiple target risk indicators into the target scoring model for processing to obtain the scoring data of the target object output by the target scoring model, and then determine the abnormal risk scoring level of the target object from the scoring data of the target object according to preset level division conditions.
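Mapping scoring data to a level according to preset level division conditions can be sketched as a threshold lookup. The cutoffs 80 and 60, and the convention that a higher score means lower risk, are assumptions for illustration only.

```python
def score_to_level(score, thresholds=((80, "low risk"), (60, "medium risk"))):
    """Map scoring data to an abnormal risk scoring level.

    `thresholds` is a hypothetical set of level division conditions, listed
    from the highest cutoff down; scores below every cutoff are high risk.
    """
    for cutoff, level in thresholds:
        if score >= cutoff:
            return level
    return "high risk"
```

For instance, under these assumed cutoffs a score of 85 maps to the low risk level and a score of 45 maps to the high risk level.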
In one embodiment, the data server may input the risk data of the condition determination indicator into the target scoring model for processing, and directly obtain the abnormal risk scoring level of the target object output by the target scoring model when the risk data of the condition determination indicator meets a preset condition.
In one embodiment, the plurality of target risk indicators may include a model operation indicator and a conditional judgment indicator. The data server can input the risk data of the condition judgment indexes into the target scoring model for processing, and when the risk data of the condition judgment indexes do not accord with preset conditions, the model operation indexes are input into the target scoring model for processing, so that the abnormal risk scoring level of the target object output by the target scoring model is obtained.
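The two-stage evaluation with a condition judgment indicator and model operation indicators can be sketched as below. The indicator names, the rule that any dishonest-debtor listing yields the high risk level directly, and the linear scoring formula are all hypothetical; the application does not disclose the actual rule or formula.

```python
def assess(risk_data):
    """Return an abnormal risk scoring level for one target object."""
    # Stage 1: condition judgment indicator. If the preset condition is met,
    # the level is returned directly and the other indicators are never read.
    if risk_data["dishonest_listings_last_year"] > 0:
        return "high risk"

    # Stage 2: model operation indicators feed a (hypothetical) scoring formula.
    score = 100 - 0.5 * risk_data["overdue_days_max"] - 20 * risk_data["overdue_ratio"]
    if score >= 80:
        return "low risk"
    if score >= 60:
        return "medium risk"
    return "high risk"

# Only the condition judgment indicator is needed when the condition is met.
level_direct = assess({"dishonest_listings_last_year": 1})
level_scored = assess({
    "dishonest_listings_last_year": 0,
    "overdue_days_max": 10,
    "overdue_ratio": 0.1,
})
```

Checking the condition first means the model operation indicators only have to be loaded when the condition does not decide the outcome.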
In the abnormal risk data processing method, the source data of each of the plurality of target risk indexes of the target object, obtained from the resource exchange recording system and the at least one public information source, is preprocessed to obtain the risk data of each of the plurality of target risk indexes, which provides a sufficiently rich data basis related to abnormal risk. Because at least a part of the target risk indexes are screened out from the multiple candidate risk indexes when the target scoring model is trained, predicting the abnormal risk scoring level based on the target scoring model and the multiple target risk indexes is highly reliable. The risk data of each of the multiple target risk indexes of the target object is input into the target scoring model to obtain the abnormal risk scoring level of the target object, so that an accurate abnormal risk scoring level can be obtained while the evaluation efficiency is improved.
In one embodiment, as shown in fig. 3, the target scoring model is obtained through a training step, which includes the following steps 302 to 308:
step 302, for each sample object in the plurality of sample objects, sample source data of each of the plurality of candidate risk indicators for each sample object is obtained from the resource exchange recording system and the at least one public information source.
The Resource exchange recording system is a system for recording Resource exchange activities between different parties, for example, an Enterprise Management system for recording Resource exchange between an Enterprise and other enterprises or individuals, such as an ERP (Enterprise Resource Planning) system and a CRM (Customer Relationship Management) system. The public information source is a platform for disclosing various kinds of information, and can be various authenticated high-reliability information platforms, such as an industrial and commercial information website, a judicial information website, a personal credit information service platform and the like. And the sample object is a resource demand party which needs to be observed when a target scoring model is obtained through training. The sample source data of the candidate risk indicator is raw data of the candidate risk indicator obtained directly from the resource exchange record system or at least one public information source.
Candidate risk indicators are relevant indicators that may cause abnormal resources to arise for the demander enterprise. For example, in the case that the resource is a financial resource and the resource to be recovered is the accounts receivable of the provider enterprise from the demander enterprise, the candidate risk indicator may be a relevant indicator that may cause the demander enterprise to generate bad accounts. The demander enterprise should normally pay the accounts receivable within the agreed time, and failing to pay the accounts receivable by the agreed time can be called being overdue; the abnormal resource can be a bad account, and a bad account is an account receivable that cannot be recovered.
The candidate risk indexes can be a part of the risk indexes in a bad account risk index library for demander enterprises, and the bad account risk index library contains various indexes, including enterprise basic information indexes, business risk indexes, judicial risk indexes, financial risk indexes and the like. For example, the financial risk indicators include indicators of multiple dimensions such as account age and overdue information, and a risk indicator in the account age or overdue information dimension may actually expand into multiple risk indicators that differ by number of days. For example, "receivable balances that will fall due within the next 7/30/60/120/150/180 days" actually comprises six indicators: the receivable balances that will fall due within the next 7 days, 30 days, 60 days, 120 days, 150 days, and 180 days, respectively. A receivable balance is the difference between the amount receivable and the amount collected, that is, the uncollected receivable.
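The day-window receivable balance indicators can be computed as in the following sketch. It assumes that each window covers receivables falling due between the assessment date and N days after it, which is one reading of the text; the invoice fields `receivable`, `collected`, and `due_on` are hypothetical.

```python
from datetime import date

WINDOWS = (7, 30, 60, 120, 150, 180)

def balances_due_within(invoices, as_of, windows=WINDOWS):
    """Return {days: total uncollected receivable due within that many days}."""
    result = {}
    for days in windows:
        result[days] = sum(
            inv["receivable"] - inv["collected"]  # receivable balance
            for inv in invoices
            if 0 <= (inv["due_on"] - as_of).days <= days
        )
    return result

# Hypothetical invoices for one demander enterprise.
invoices = [
    {"receivable": 1000, "collected": 400, "due_on": date(2022, 11, 5)},
    {"receivable": 500, "collected": 0, "due_on": date(2022, 12, 15)},
    {"receivable": 300, "collected": 0, "due_on": date(2023, 6, 1)},
]
balances = balances_due_within(invoices, as_of=date(2022, 11, 1))
```

Each window yields one candidate risk indicator, so a single dimension expands into six indicator values per sample object.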
In one embodiment, a sample fast lookup table records, for each of the plurality of candidate risk indicators, its acquisition path in the resource exchange recording system or the at least one public information source. The data server may obtain the sample source data of each candidate risk indicator of each sample object from the resource exchange recording system and the at least one public information source according to the recorded acquisition paths.
Step 304, for each candidate risk index of the multiple candidate risk indexes, performing sample preprocessing on the sample source data of the multiple sample objects under the candidate risk index, and obtaining sample risk data of the multiple candidate risk indexes.
Sample preprocessing is a series of data processing operations performed on the sample source data so that it meets the data requirements of model training. Sample preprocessing may include data cleaning, data integration, data transformation and the like. Data cleaning includes, for example, missing value filling and abnormal value correction; data integration includes, for example, aggregating the sample source data by candidate risk index and storing it at a unified location on the data server; data transformation includes, for example, normalization and discretization of the data through feature binning.
In an embodiment, the data server may perform data normalization on sample source data of each of the plurality of sample objects under each candidate risk indicator to obtain normalized data, classify the same normalized data in the normalized data of each of the plurality of sample objects under each candidate risk indicator to obtain classified data, perform feature binning processing on the classified data when the data amount of the classified data is greater than a threshold, and obtain sample risk data of each of the plurality of candidate risk indicators based on a result of the feature binning processing of the plurality of candidate risk indicators.
Step 306, performing iterative training based on the plurality of candidate risk indexes: in each iteration, traversing each currently remaining candidate risk index, training a training model based on the sample risk data of the traversed candidate risk index together with the sample risk data of the target risk indexes already screened out, and evaluating the model effect; screening out, as a target risk index, the candidate risk index corresponding to the training model with the best model effect; and stopping the iteration when none of the remaining candidate risk indexes contributes effectively to the corresponding training model.
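The iteration of step 306 can be read as a greedy forward selection. The following is a minimal Python sketch under that reading, not the implementation of this application: `forward_select`, `toy_effect`, the index names and the `min_gain` stopping threshold are all hypothetical, and `toy_effect` merely stands in for "train a model on these indexes and return its model effect".

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],
    min_gain: float = 1e-3,
) -> Tuple[List[str], float]:
    """Greedy forward selection: screen out at most one index per iteration."""
    selected: List[str] = []
    remaining = list(candidates)
    best_effect = float("-inf")
    while remaining:
        # Train and evaluate one model per remaining candidate, each time
        # together with the indexes that were already screened out.
        trials = [(evaluate(selected + [c]), c) for c in remaining]
        effect, winner = max(trials)
        # Stop when even the best remaining candidate no longer helps.
        if effect - best_effect < min_gain:
            break
        selected.append(winner)
        remaining.remove(winner)
        best_effect = effect
    return selected, best_effect


def toy_effect(indexes: List[str]) -> float:
    # Hypothetical stand-in for training a model and measuring, e.g., its AUC.
    gains = {"overdue_days": 0.20, "account_age": 0.08, "region": 0.0004}
    return 0.5 + sum(gains[name] for name in indexes)


chosen, effect = forward_select(["region", "account_age", "overdue_days"], toy_effect)
```

In this toy run the index `region` is never selected, because adding it improves the hypothetical model effect by less than `min_gain`.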
Training is the process of presenting the labeled sample risk data of each of the plurality of candidate risk indicators to a model so that the model learns the relationship between the candidate risk indicators and the labels. The label is a mark representing whether the sample object has abnormal resources. Iterative training is the process of repeating the training process until a training goal is reached. The training model is a mathematical model obtained by fitting a specific algorithm to the sample risk data used in an iteration. The output of the training model may be the probability that the sample object has abnormal resources. The specific algorithm can perform a binary classification task and may be, for example, a logistic regression algorithm, an XGBoost algorithm (an efficient gradient boosting decision tree algorithm), a neural network algorithm, or the like. The model effect is the accuracy with which the training model predicts whether a sample object has abnormal resources. The model effect may be determined by calculating evaluation parameters of the training model. An evaluation parameter may be a parameter characterizing the ability of the training model to distinguish sample objects with abnormal resources from sample objects without abnormal resources.
For example, the evaluation parameters of the training model may be the AUC value (Area Under the Curve, the area enclosed by the ROC (Receiver Operating Characteristic) curve and the coordinate axes) and the KS value (Kolmogorov-Smirnov, the maximum absolute difference between the true positive rate and the false positive rate). The AUC value reflects the average ability of the training model to distinguish good sample objects from bad sample objects; it typically ranges from 0.5 to 1, and a larger AUC value indicates a better model effect, although an AUC value that is too high may indicate overfitting. The KS value reflects the best-case ability of the training model to distinguish good sample objects from bad sample objects; it ranges from 0 to 1, and a larger KS value indicates a better model effect, although a KS value that is too high may likewise indicate overfitting of the training model. A good sample object may be a sample object without abnormal resources, and a bad sample object may be a sample object with abnormal resources. The true positive rate is the proportion of actually good sample objects that the training model predicts to be good sample objects, and the false positive rate is the proportion of actually bad sample objects that the training model predicts to be good sample objects.
The correspondence between the AUC and KS values and the model effect of the training model may be as follows: when the AUC value is 0.5 and/or the KS value is less than 0.2, the training model has no distinguishing ability; when the AUC value is greater than 0.5 and less than or equal to 0.7, and/or the KS value is greater than or equal to 0.2 and less than or equal to 0.3, the distinguishing ability of the training model is weak; when the AUC value is greater than 0.7 and less than or equal to 0.8, and/or the KS value is greater than 0.3 and less than or equal to 0.5, the distinguishing ability of the training model is acceptable; when the AUC value is greater than 0.8 and less than or equal to 0.9, and/or the KS value is greater than 0.5 and less than 0.75, the distinguishing ability of the training model is strong; when the AUC value is greater than 0.9 and less than or equal to 1, and/or the KS value is greater than or equal to 0.75 and less than or equal to 1, the training model may be overfitting and needs to be verified against business experience before it is used.
The ROC curve is a curve drawn with the true positive rate on the ordinate and the false positive rate on the abscissa, with one point on the curve per cutoff point, where the cutoff point of a sample object is the probability, output by the training model, that the sample object is a good sample object. When the ROC curve is drawn, the sample objects are traversed, and the cutoff point of the traversed sample object is used as the critical value of the prediction result. The cutoff points of all sample objects are compared with the critical value: a sample object whose cutoff point is greater than or equal to the critical value is regarded by the training model as a good sample object, and a sample object whose cutoff point is smaller than the critical value is regarded as a bad sample object. This yields, for each sample object, a prediction of good or bad, and the true positive rate and false positive rate for that critical value are calculated against the actual condition of each sample object. After the true positive rate and false positive rate have been obtained for the cutoff points of all sample objects, each pair forms the coordinates of one point, and the ROC curve is drawn through these points. The area enclosed by the ROC curve and the coordinate axes is the AUC value, and the maximum absolute difference between the true positive rate and the false positive rate is the KS value.
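The ROC construction described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names and the six-object toy data set are hypothetical, the positive class is "good sample object" as in the text, and the AUC is computed with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(is_good: Sequence[int], p_good: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    Assumes the data contains at least one good and one bad sample object.
    """
    n_good = sum(is_good)
    n_bad = len(is_good) - n_good
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(p_good), reverse=True):
        # An object is predicted "good" when its probability reaches the cutoff.
        tp = sum(1 for y, p in zip(is_good, p_good) if p >= cutoff and y == 1)
        fp = sum(1 for y, p in zip(is_good, p_good) if p >= cutoff and y == 0)
        points.append((fp / n_bad, tp / n_good))
    return points


def auc_and_ks(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    # Area under the curve by the trapezoidal rule over consecutive points.
    auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    # KS is the largest gap between true positive rate and false positive rate.
    ks = max(abs(tpr - fpr) for fpr, tpr in points)
    return auc, ks


# Toy data: 1 = actually good, 0 = actually bad; p_good is the model output.
actual = [1, 1, 0, 1, 0, 0]
p_good = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc, ks = auc_and_ks(roc_points(actual, p_good))
```

For this toy data the AUC equals the share of (good, bad) pairs that the model ranks correctly, which is 8 of 9 pairs.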
In one embodiment, prior to iterative training, the data server may generate a label for each sample object to determine whether an anomalous resource exists for each sample object. In this embodiment, the data server may determine, for each sample object, whether sample risk data of a preset risk index meets a preset abnormal condition; under the condition that the sample risk data of the preset risk index of the sample object meets a preset abnormal condition, endowing the label of the sample object with a value representing that abnormal resources exist; and under the condition that the sample risk data of the preset risk index of the sample object does not accord with a preset abnormal condition, endowing the label of the sample object with a value representing that no abnormal resource exists.
The preset risk index is a predetermined risk index that can directly determine whether a sample object has abnormal resources. The preset abnormal condition is a condition for determining, from the sample risk data, that the sample object has abnormal resources. For example, when the sample object is an enterprise, the preset risk index may be the operating state of the enterprise, and the preset abnormal condition may be that the operating state indicates that the enterprise has ceased operations. In this embodiment, an operating state of ceased operations indicates that the accounts receivable of the enterprise cannot be fully recovered, which results in abnormal resources.
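As a minimal sketch of this labeling rule, assuming the operating state is available as a string field, the label can be derived as follows. The field name `operating_state`, the state value `ceased_operations` and the sample records are hypothetical.

```python
from typing import Dict, List

# Hypothetical encoding of the operating states that meet the preset
# abnormal condition; the text only names a ceased-operations state.
ABNORMAL_STATES = {"ceased_operations"}


def make_label(sample_risk_data: Dict[str, str]) -> int:
    """Return 1 when abnormal resources exist for the sample object, else 0."""
    return 1 if sample_risk_data.get("operating_state") in ABNORMAL_STATES else 0


samples: List[Dict[str, str]] = [
    {"name": "enterprise_a", "operating_state": "active"},
    {"name": "enterprise_b", "operating_state": "ceased_operations"},
]
labels = [make_label(s) for s in samples]
```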
Step 308, constructing a target scoring model based on the target training model obtained after the iteration is stopped.
The target training model is the training model obtained when the iterative training stops.
In one embodiment, the data server may output a plurality of target risk indicators constituting the target training model and respective model coefficients of the plurality of target risk indicators after stopping the iteration, construct the target scoring model based on the risk data of the plurality of target risk indicators and the respective model coefficients of the plurality of target risk indicators, such that when the risk data of the plurality of target risk indicators of the target object is input, for the target object, the scoring data of each target risk indicator is obtained based on the risk data of each target risk indicator and the model coefficients of each target risk indicator, thereby obtaining the scoring data of the target object, and determine the abnormal risk scoring level of the target object according to the scoring data of the target object.
In this embodiment, sample preprocessing of the sample source data of the plurality of candidate risk indexes reduces the interference of noisy data with the training model obtained by subsequent training, so that a training model with a better model effect can be obtained. Iterative training on the sample risk data of the plurality of candidate risk indexes ensures that the candidate risk indexes screened out as target risk indexes are those that give the best model effect, and the resulting training model is used to construct the target scoring model. Target risk indexes obtained in this way, together with a target scoring model constructed from the training model, can improve the accuracy of the abnormal risk score level subsequently predicted for a target object.
In one embodiment, the screened-out target risk indicators serve as model operation indicators. The step of constructing the target scoring model based on the target training model obtained after the iteration is stopped comprises: constructing the target scoring model based on a preset condition judgment index and the target training model obtained after the iteration is stopped, so that the target scoring model performs scoring and outputs an abnormal risk score level when the input data does not meet a preset condition, and directly outputs a preset abnormal risk score level representing the existence of abnormal risk when the input data meets the preset condition.
Wherein the abnormal risk score level may include a plurality of risk levels, such as a high risk level, a medium risk level, and a low risk level. The existence of abnormal risks characterizes the high possibility that the sample object has abnormal resources. The preset abnormality risk score level may be the highest risk level of the plurality of risk levels, such as a high risk level.
The model operation index is a risk index that needs to be calculated to output the abnormal risk score level of the target object. When the target training model is obtained through the training of the logistic regression algorithm, the formula of the target training model is shown as formula (1), and the scoring data can be obtained through formula (2):
[Formula (1): equation image not available in this text]

$$\text{score} = A - B \times \ln(\text{odds}) \qquad \text{formula (2)}$$

Wherein,

$$\text{odds} = \frac{P(\text{bad sample object})}{P(\text{good sample object})}$$

odds is the ratio of the probability of a bad sample object to the probability of a good sample object; $x_1, x_2, \dots, x_n$ are the n model operation indexes, where the value of each model operation index may be the WOE value of the corresponding feature binning data of that index; and $w_1, w_2, \dots, w_n$ are the model coefficients of the model operation indexes. score is the scoring data, equal to the constant A minus the log-odds term, where the log-odds term is the product of the logarithm of the odds and the constant B.
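The relationship between the model output and the score in formula (2) can be illustrated with a short Python sketch. It assumes the usual logistic regression form, in which the logarithm of the odds is a weighted sum of the model operation index values; the `intercept` parameter, the function names and the values chosen for the constants A and B are assumptions for illustration and do not come from this application.

```python
import math
from typing import Sequence


def log_odds(woe_values: Sequence[float], coefficients: Sequence[float],
             intercept: float = 0.0) -> float:
    """ln(odds) as a weighted sum of WOE inputs (intercept is an assumption)."""
    return intercept + sum(w * x for w, x in zip(coefficients, woe_values))


def probability_bad(log_odds_value: float) -> float:
    """Probability of a bad sample object implied by the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))


def score(log_odds_value: float, a: float, b: float) -> float:
    """Formula (2): score = A - B * ln(odds)."""
    return a - b * log_odds_value


# Hypothetical WOE inputs, coefficients and constants.
z = log_odds([0.5, -0.2], [1.0, 2.0])
p = probability_bad(z)
s = score(z, a=600.0, b=50.0)
```

Because the odds are defined as bad over good, a larger `z` means a riskier object and therefore a lower score, which agrees with the statement elsewhere in the description that higher scoring data corresponds to lower abnormal risk.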
The preset condition judgment index is a predetermined risk index used to directly identify a target object with abnormal risk. The preset condition is a predetermined condition that decides whether the abnormal risk score level of the target object is output from the condition judgment index alone, or from the condition judgment index together with the model operation index.
For example, when predicting whether the target object will generate bad debts, the condition judgment indexes may be a business state index, an overdue days index, an account age index and a settlement type index. The preset condition may be that the risk data of the business state index represents any one of license revoked, deregistered, ceased operations, and in liquidation; or that the risk data of the overdue days index represents more than 180 days overdue; or that the risk data of the account age index represents an account age of more than 5 years and the risk data of the settlement type index represents that the target object is of a non-internal settlement type.
In one embodiment, the data server may obtain the model operation index and the model coefficient of the model operation index output after the iteration is stopped in the training step, and construct the target scoring model based on the preset condition judgment index, the model operation index, and the model coefficient of the model operation index.
In the embodiment, the target scoring model is built based on the condition judgment indexes and the target training model, so that when the abnormal risk scoring level of the target object is predicted through the target scoring model subsequently, the risk data of the condition judgment indexes can be processed preferentially, when the risk data of the condition judgment indexes of the target object meet the preset conditions, the fact that the target object has abnormal risk is directly known, and the data processing efficiency is high.
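The priority given to the condition judgment indexes can be sketched as a short-circuit in front of the scoring step. The sketch below uses the example conditions given above; all field names, state strings and the `score_and_grade` callback are hypothetical placeholders.

```python
from typing import Callable, Dict

# Hypothetical string values for the business states named in the example.
TERMINAL_STATES = {"revoked", "deregistered", "ceased_operations", "liquidation"}


def meets_preset_condition(d: Dict) -> bool:
    """True when any of the example preset conditions holds."""
    return (
        d["business_state"] in TERMINAL_STATES
        or d["overdue_days"] > 180
        or (d["account_age_years"] > 5 and d["settlement_type"] != "internal")
    )


def rate(d: Dict, score_and_grade: Callable[[Dict], str]) -> str:
    """Output the preset level directly, or fall back to the scoring step."""
    if meets_preset_condition(d):
        return "high"  # preset abnormal risk score level; no scoring needed
    return score_and_grade(d)


healthy = {"business_state": "active", "overdue_days": 12,
           "account_age_years": 1, "settlement_type": "external"}
overdue = dict(healthy, overdue_days=200)
```

The scoring callback is only invoked for `healthy`; for `overdue` the preset level is returned without running the model, which is where the efficiency gain described above comes from.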
In one embodiment, step 304 includes: performing data normalization on the sample source data of each candidate risk index of each sample object to obtain sample normalized data of each candidate risk index of each sample object; for each candidate risk index of the plurality of candidate risk indexes, performing feature binning on the sample normalized data of the plurality of sample objects under that candidate risk index to obtain a plurality of feature binning data of that candidate risk index; and performing feature encoding on the plurality of feature binning data of each candidate risk index to form the sample risk data of each candidate risk index.
Data normalization converts data in the sample source data that is not conducive to recognition into a regular form. Data normalization may be the processing of non-numerical data, such as converting text in the sample source data into numerical values; it may be the filling of missing values; it may be aggregating the sample source data by candidate risk index so that each sample object corresponds to one piece of sample normalized data under each candidate risk index; or it may be another way of regularizing the data.
Feature binning discretizes the sample normalized data of each of the plurality of candidate risk indicators, so that the data of the plurality of sample objects under each candidate risk indicator is distributed over discrete data intervals. Feature binning can reduce the complexity of the data of the candidate risk indicators, reduce the influence of abnormal data on the subsequently trained model, and improve the robustness of the training model.
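A minimal sketch of binning by fixed cut points is shown below. The source does not fix a binning method, so the cut points (borrowed from the 7/30/60/120/150/180-day windows mentioned earlier), the name `assign_bin` and the sample values are assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical cut points for an "overdue days" index.
OVERDUE_EDGES = [7, 30, 60, 120, 150, 180]


def assign_bin(value: float, edges: Sequence[float]) -> int:
    """Return the bin number; bin k holds values with edges[k-1] <= value < edges[k]."""
    return bisect_right(edges, value)


overdue_days: List[int] = [0, 7, 29, 30, 179, 180, 2000]
bins = [assign_bin(v, OVERDUE_EDGES) for v in overdue_days]
```

The extreme value 2000 lands in the same last bin as 180, which illustrates how binning limits the influence of abnormal data on the trained model.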
The feature binning data is the plurality of data intervals obtained by feature binning. Feature encoding maps the sample normalized data of each of the plurality of candidate risk indicators to the bin weight value of the feature binning data into which it falls; the bin weight value may be a WOE (Weight of Evidence) value. After the feature binning data is obtained, it may be evaluated through its IV value (Information Value, which measures the ability of a risk indicator or of feature binning data to predict abnormal risk), so that the feature binning data can be adjusted. For each candidate risk indicator of the plurality of candidate risk indicators, the WOE value may be calculated by the following formula (3), and the IV value may be calculated by the following formulas (4) and (5):
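$$\mathrm{WOE}_i = \ln\left(\frac{\mathrm{Bad}_i}{\mathrm{Bad}_T}\right) - \ln\left(\frac{\mathrm{Good}_i}{\mathrm{Good}_T}\right) \qquad \text{formula (3)}$$

$$\mathrm{IV}_i = \left(\frac{\mathrm{Bad}_i}{\mathrm{Bad}_T} - \frac{\mathrm{Good}_i}{\mathrm{Good}_T}\right) \times \mathrm{WOE}_i \qquad \text{formula (4)}$$

$$\mathrm{IV}' = \sum_{i} \mathrm{IV}_i \qquad \text{formula (5)}$$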
wherein $\mathrm{WOE}_i$ is the WOE value of the i-th bin, that is, the i-th feature binning data, of the candidate risk indicator concerned. $\mathrm{Bad}_i$ is the number of bad sample objects in the i-th bin of the candidate risk indicator concerned, $\mathrm{Bad}_T$ is the number of bad sample objects in all bins of that indicator, $\mathrm{Good}_i$ is the number of good sample objects in the i-th bin, and $\mathrm{Good}_T$ is the number of good sample objects in all bins. $\mathrm{IV}_i$ is the IV value of the i-th bin of the candidate risk indicator concerned, and $\mathrm{IV}'$ is the IV value over all bins of that indicator. A bad sample object may be a sample object with abnormal resources, and a good sample object may be a sample object without abnormal resources.
$\mathrm{WOE}_i$ is the difference between the logarithm of the bad sample object proportion of the i-th bin and the logarithm of the good sample object proportion of the i-th bin of the candidate risk indicator. The bad sample object proportion of a bin is the ratio of the number of bad sample objects in the bin to the number of bad sample objects in all bins of the corresponding candidate risk indicator, and the good sample object proportion of a bin is the ratio of the number of good sample objects in the bin to the number of good sample objects in all bins of the corresponding candidate risk indicator.
$\mathrm{IV}_i$ is the product of the proportion difference of the i-th bin and the WOE value of the i-th bin of the candidate risk indicator, where the proportion difference of a bin is the difference between its bad sample object proportion and its good sample object proportion, taken in the same order as in $\mathrm{WOE}_i$ so that each $\mathrm{IV}_i$ is non-negative. $\mathrm{IV}'$ is the sum of $\mathrm{IV}_i$ over all bins of the candidate risk indicator.
The IV value ranges from 0 to positive infinity, and a larger IV value indicates a better predictive ability of the candidate risk indicator. The correspondence between the IV value of a candidate risk indicator and its predictive ability may be: when the IV value is less than 0.02, the candidate risk indicator is an invalid indicator; when the IV value is greater than or equal to 0.02 and less than 0.1, the candidate risk indicator is a weak indicator; when the IV value is greater than or equal to 0.1 and less than or equal to 0.5, the candidate risk indicator is an effective indicator; when the IV value is greater than 0.5, the candidate risk indicator is a strong indicator, but its strength may be spurious and needs to be confirmed against business experience. When feature binning is performed, feature binning data may first be obtained for each candidate risk indicator under several binning schemes that use different methods or standards, the IV value of the candidate risk indicator may be calculated under each binning scheme, and the feature binning data with the best IV value is selected as the feature binning data of the candidate risk indicator.
In one embodiment, the data server may perform feature binning processing on each candidate risk indicator in the plurality of candidate risk indicators to obtain a plurality of feature binning data of the candidate risk indicator, calculate a WOE value of each feature binning data of the candidate risk indicator through formula (3) for the plurality of feature binning data of the candidate risk indicator, and use the WOE value of each feature binning data as sample risk data of the candidate risk indicator.
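The WOE and IV calculation described in words above can be sketched in Python as follows. The function name `woe_iv` and the bin counts are hypothetical, and the sketch assumes every bin contains at least one good and one bad sample object, since otherwise the logarithm is undefined.

```python
import math
from typing import List, Sequence, Tuple


def woe_iv(bad_counts: Sequence[int], good_counts: Sequence[int]) -> Tuple[List[float], float]:
    """Return the WOE value of each bin and the total IV of the indicator."""
    bad_total, good_total = sum(bad_counts), sum(good_counts)
    woe_values: List[float] = []
    iv_total = 0.0
    for bad_i, good_i in zip(bad_counts, good_counts):
        bad_share = bad_i / bad_total      # bad sample object proportion of the bin
        good_share = good_i / good_total   # good sample object proportion of the bin
        woe_i = math.log(bad_share) - math.log(good_share)
        woe_values.append(woe_i)
        # Same ordering (bad minus good) as in WOE, so each term is >= 0.
        iv_total += (bad_share - good_share) * woe_i
    return woe_values, iv_total


# Hypothetical indicator with two bins.
woe_values, iv_total = woe_iv(bad_counts=[10, 30], good_counts=[60, 40])
```

In this toy case the first bin holds relatively few bad sample objects, so its WOE value is negative, while the second bin has a positive WOE value.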
In this embodiment, feature binning of the sample normalized data of each of the plurality of candidate risk indicators discretizes the data, reduces the data volume of the sample normalized data, keeps the data stable, and helps avoid overfitting of the training model obtained by subsequent training. Feature encoding of the plurality of feature binning data of the plurality of candidate risk indicators gives each feature binning data a weight, so that the data retains a certain degree of complexity; this can improve the model effect of the training model and further improve the accuracy of predicting the abnormal risk score level of a target object through the target scoring model.
In one embodiment, the plurality of target risk indicators includes a model operation indicator and a condition judgment indicator; inputting the risk data of each target risk index into a target scoring model for processing, and acquiring the abnormal risk scoring level of the target object output by the target scoring model, wherein the step comprises the following steps of: inputting the risk data of the model operation index and the risk data of the condition judgment index into a target scoring model for processing; when the risk data of the condition judgment index do not accord with the preset condition, processing the risk data of the model operation index through the target scoring model to obtain the scoring data of the target object; and determining the abnormal risk scoring level to which the scoring data of the target object belongs according to a preset grade dividing condition.
The scoring data is the score that the target scoring model outputs for the target object. The higher the scoring data, the lower the abnormal risk of the target object. The preset grade dividing condition is a condition for mapping the scoring data to an abnormal risk score level.
The preset grade dividing condition may be that preset scoring data is used as a dividing point for assigning the abnormal risk score level of the scoring data. For example, the abnormal risk score levels comprise a high risk level and a low risk level; when the scoring data of the target object is greater than or equal to the preset scoring data, the abnormal risk score level of the target object is determined to be the low risk level; and when the scoring data of the target object is smaller than the preset scoring data, the abnormal risk score level of the target object is determined to be the high risk level.
The preset grade dividing condition may also be determined based on the KS value obtained by constructing a KS curve from the scoring data of a plurality of sample objects, together with the cumulative probability distribution function of the scoring data of the bad sample objects among the plurality of sample objects, where the scoring data of the plurality of sample objects is obtained by inputting the sample risk data of the target risk indexes of each sample object into the target scoring model. For example, in a specific application scenario, as shown in fig. 4, the abnormal risk score levels include a high risk level, a medium risk level, and a low risk level. The scoring data at which the cumulative probability of bad sample objects is 60% and the cumulative proportion of sample objects is 25% is rounded and used as the first demarcation point, namely 500; the scoring data at which the KS value constructed from the scoring data of the plurality of sample objects is largest is rounded and used as the symmetry point, namely 600; and the scoring data symmetric to the first demarcation point about the symmetry point is used as the second demarcation point, namely 700. The abnormal risk score level of the target object is the high risk level when the scoring data of the target object is less than or equal to 500, the medium risk level when the scoring data of the target object is greater than 500 and less than 700, and the low risk level when the scoring data of the target object is greater than or equal to 700.
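The mapping in this example can be written down directly. The sketch below hard-codes the demarcation points 500 and 700 from the example as default arguments; the function names are hypothetical.

```python
def mirrored_cut(first_cut: float, symmetry_point: float) -> float:
    """Demarcation point symmetric to first_cut about symmetry_point."""
    return 2 * symmetry_point - first_cut


def risk_level(score: float, first_cut: float = 500, second_cut: float = 700) -> str:
    """Map scoring data to an abnormal risk score level (higher score, lower risk)."""
    if score <= first_cut:
        return "high"
    if score < second_cut:
        return "medium"
    return "low"


second = mirrored_cut(500, 600)
levels = [risk_level(s) for s in (480, 500, 501, 699, 700, 820)]
```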
In one embodiment, when there are a plurality of condition judgment indexes and a plurality of preset conditions, the data server may compare the risk data of each of the condition judgment indexes of the target object with the preset conditions one by one, and when the risk data of any condition judgment index of the target object meets any one of the preset conditions, output preset abnormal risk scoring data and a preset abnormal risk score level, both representing that the target object has abnormal risk.
In one embodiment, when there are a plurality of model operation indexes and the risk data of each of the plurality of condition judgment indexes of the target object does not meet the preset condition, the data server may obtain, for each of the plurality of model operation indexes of the target object, the score data of the corresponding model operation index from the risk data of the corresponding model operation index to obtain the score data of each of the plurality of model operation indexes, and obtain the score data of the target object based on the score data of each of the plurality of model operation indexes.
In the embodiment, the target risk index of the target object is input into the constructed target scoring model for processing, the abnormal risk scoring level of the target object is automatically output, and the evaluation efficiency is high; and when the target scoring model is processed, the risk data of the condition judgment indexes of the target object are preferentially processed, when the risk data of the condition judgment indexes of the target object accord with preset conditions, the fact that the target object has abnormal risks is directly known, when the risk data of the condition judgment indexes of the target object do not accord with the preset conditions, the abnormal risk scoring grade of the target object is obtained through the risk data of the model operation indexes, and the evaluation efficiency is further improved.
In one embodiment, the abnormal risk score level of the target object is stored in a model output result set file. The abnormal risk data processing method further comprises a step of reading and displaying the abnormal risk score level and the abnormal resource prediction amount of a service object, the step comprising: predicting the abnormal resource prediction amount of the target object through an abnormal resource amount prediction model, and storing the abnormal resource prediction amount in an abnormal resource prediction amount file; in response to a trigger event of an abnormality prediction function triggered by a target identity, displaying a service object having a service relationship with the target identity; when the service object belongs to the target objects, reading the abnormal risk score level of the service object from the model output result set file and displaying it; and when the abnormal risk score level of the service object is the preset abnormal risk score level representing the existence of abnormal risk, reading the abnormal resource prediction amount of the service object from the abnormal resource prediction amount file and displaying it.
The model output result set file is a file recording the scoring data and the abnormal risk score level that the target scoring model outputs for each of a plurality of target objects. The abnormal resource amount prediction model is a mathematical model for obtaining the abnormal resource prediction amount of a target object. The abnormal resource prediction amount is the amount of abnormal resources that the target object is expected to produce, for example, a bad debt amount. The abnormal resource prediction amount file is a file that stores the abnormal resource prediction amounts of a plurality of target objects for reading.
The target identity is a computer identity with the authority to trigger the exception prediction function. The service relationship is that the target identity provides services for the service object, for example, help the service object apply for resources, initiate risk reminding for the service object with abnormal risk, recycle resources to the service object, and the like. The service object is an object for acquiring a service from the target identity or an organization in which the target identity is located, and the service object may be a resource demander.
The abnormal prediction function is a software function for predicting the abnormal risk of the service object, and can predict at least the grade of the abnormal risk score of the service object and obtain the abnormal resource prediction amount of the service object. The trigger event of the abnormality prediction function is an event for enabling the abnormality prediction function to predict the risk of abnormality of the service object, and may be an automatic trigger event or a manual trigger operation. The automatic trigger event may be an automatic trigger when the target identity starts an application where the abnormality prediction function is located, or an automatic trigger when the abnormality prediction function is automatically started after being turned off by mistake. The manual trigger operation may be a click operation of the target identity on identification information of the abnormality prediction function, and the identification information may be an icon or a character representing the abnormality prediction function.
In one embodiment, the terminal may display a service object having a service relationship with the target object in response to a trigger event of the abnormal prediction function triggered by the target identity, and when the service object belongs to the target object, the terminal may instruct the data server so that the data server queries the service object from an object fast lookup table in which the target object is recorded, and when the data server queries the service object, read an abnormal risk score level of the service object from the model output result set file and transmit the abnormal risk score level of the service object to the terminal for display.
In one embodiment, when the abnormal risk score level of the service object is a preset abnormal risk level representing that an abnormal risk exists, the terminal may instruct the data server, so that the data server reads the abnormal resource prediction amount of the service object from the abnormal resource prediction amount file, and the data server sends the abnormal resource prediction amount of the service object to the terminal for display.
In this embodiment, the abnormal risk score level of the target object is stored in the model output result set file. When the abnormal risk score level of a service object needs to be displayed and the service object belongs to the target objects, the abnormal risk score level is read directly from the model output result set file and displayed, so the response is fast. When the abnormal risk score level of the service object indicates that the service object has an abnormal risk, the abnormal resource prediction amount of the service object is additionally read from the abnormal resource prediction amount file, which provides more information about the abnormal risk and lets the target identity understand the degree of abnormality of the service object.
In one embodiment, the step of predicting the abnormal resource prediction amount of the target object through the abnormal resource amount prediction model includes: in response to an abnormal resource prediction event for the target object, acquiring the amount of resources to be recovered in each of a plurality of preset abnormal duration ranges, for the current time period and for the preset historical statistical period; acquiring the resource migration proportion of each of the plurality of preset abnormal duration ranges in each historical time period within the preset historical statistical period, where the resource migration proportion is determined based on the amount of resources to be recovered in each preset abnormal duration range of each historical time period and the amount of resources to be recovered in the preceding preset abnormal duration range of the previous historical time period; determining the predicted loss proportion of each preset abnormal duration range over the preset historical statistical period based on the resource migration proportion of each preset abnormal duration range in each historical time period; and determining the abnormal resource prediction amount of the target object according to the amount of resources to be recovered in each preset abnormal duration range of the current time period and the predicted loss proportion of each preset abnormal duration range over the preset historical statistical period.
The abnormal resource prediction event is an event that triggers prediction of the abnormal resource prediction amount of the target object, and may be an automatic trigger event or a manual trigger event. The automatic trigger event may be triggered on a schedule or triggered automatically when a target object is newly added to the system; the manual trigger event may be a click operation on identification information of an abnormal resource prediction function key.
The current time period is the period of time in which the abnormal resource prediction for the target object is performed, for example, the current month. The preset historical statistical period is a period of preset length immediately before the current time period. For example, if the current month is January 2021 and the preset length is 6 months, the preset historical statistical period is July 2020 to December 2020.
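The display flow of this embodiment can be sketched as follows. This is only an illustrative sketch: the in-memory set and dictionaries stand in for the object quick lookup table, the model output result set file, and the abnormal resource prediction amount file, and the function name, parameter names, and the "high" level label are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def lookup_for_display(
    service_object: str,
    quick_lookup: Set[str],                 # stand-in for the object quick lookup table
    result_set: Dict[str, str],             # stand-in for the model output result set file
    prediction_amounts: Dict[str, float],   # stand-in for the prediction amount file
    at_risk_level: str = "high",            # hypothetical label of the preset at-risk level
) -> Optional[Tuple[str, Optional[float]]]:
    """Return (risk level, predicted amount or None), or None if not a target object."""
    if service_object not in quick_lookup:
        return None  # the service object is not one of the scored target objects
    level = result_set[service_object]
    # The predicted amount is only read when the level signals an abnormal risk.
    amount = prediction_amounts.get(service_object) if level == at_risk_level else None
    return level, amount


targets = {"enterprise_a", "enterprise_b"}
levels = {"enterprise_a": "high", "enterprise_b": "low"}
amounts = {"enterprise_a": 12500.0, "enterprise_b": 300.0}

shown_a = lookup_for_display("enterprise_a", targets, levels, amounts)
shown_b = lookup_for_display("enterprise_b", targets, levels, amounts)
shown_c = lookup_for_display("enterprise_c", targets, levels, amounts)
```

Because both values are precomputed and only looked up at display time, no model is run while the terminal waits, which is the source of the fast response described above.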
The preset abnormal duration range is a preset range of time by which the resources to be recovered of the target object have exceeded the recovery deadline. The preset abnormal duration ranges are contiguous in time. For example, when the abnormal resource prediction amount is a predicted bad account amount, the resources to be recovered may be accounts receivable or receivable balances, and the preset abnormal duration ranges may be a plurality of ranges of overdue days, each of which corresponds to an overdue stage. As shown in fig. 5, there are 8 overdue stages, numbered 0 to 7: overdue stage 0 corresponds to 0 overdue days, overdue stage 1 to 1 to 30 days, overdue stage 2 to 31 to 60 days, overdue stage 3 to 61 to 90 days, overdue stage 4 to 91 to 120 days, overdue stage 5 to 121 to 150 days, overdue stage 6 to 151 to 180 days, and overdue stage 7 to 181 days or more.
A historical time period is one of the time periods obtained by dividing the preset historical statistical period by a fixed length of time. For example, if the preset historical statistical period is July 2020 to December 2020, the historical time periods may be July 2020, August 2020, September 2020, October 2020, November 2020, and December 2020. The resource migration proportion is the proportion of the resources to be recovered that migrated into a preset abnormal duration range of a historical time period from the preceding preset abnormal duration range of the previous historical time period. The resource migration proportion can be calculated by formula (6):
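The mapping from overdue days to overdue stages in this example can be sketched as a small function. The sketch assumes that overdue stage 5 starts at day 121 so that the ranges are contiguous; the function name is hypothetical.

```python
def overdue_stage(days_overdue: int) -> int:
    """Map a number of overdue days to an overdue stage 0..7 (ranges as in the example)."""
    if days_overdue < 0:
        raise ValueError("days_overdue must be non-negative")
    if days_overdue == 0:
        return 0
    # Stages 1 to 6 each span 30 days; anything beyond 180 days falls into stage 7.
    return min((days_overdue - 1) // 30 + 1, 7)


stages = [overdue_stage(d) for d in (0, 1, 30, 31, 120, 121, 150, 151, 180, 181, 400)]
```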
$$FR = \frac{M_{i \to i+1}}{M_i} \qquad \text{formula (6)}$$
where $M_{i \to i+1}$ is the amount of resources to be recovered in the (i+1)-th preset abnormal duration range of a historical time period that migrated from the i-th preset abnormal duration range of the previous historical time period, $M_i$ is the amount of resources to be recovered in the i-th preset abnormal duration range of the previous historical time period, and $FR$ is the resource migration proportion of the (i+1)-th preset abnormal duration range of that historical time period. In other words, $FR$ is the ratio of the resources that migrated from the i-th range of the previous historical time period into the (i+1)-th range to the total resources to be recovered in the i-th range of the previous historical time period.
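As an illustration of formula (6), the sketch below computes the migration proportion for every range of one historical time period. The function name and the sample amounts are hypothetical, and returning 0.0 for a range whose previous balance is zero is an assumption that the text does not address.

```python
from typing import List, Sequence


def migration_ratios(prev_balances: Sequence[float], migrated: Sequence[float]) -> List[float]:
    """prev_balances[i]: amount in range i in the previous period (M_i).
    migrated[i]: amount that moved from range i into range i+1 in this period."""
    if len(prev_balances) != len(migrated):
        raise ValueError("both sequences must cover the same ranges")
    ratios = []
    for m_i, moved in zip(prev_balances, migrated):
        # Assumption: an empty source range contributes a migration proportion of 0.
        ratios.append(moved / m_i if m_i > 0 else 0.0)
    return ratios


# Hypothetical amounts for three ranges.
ratios = migration_ratios([1000.0, 400.0, 0.0], [250.0, 100.0, 0.0])
```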
The predicted loss proportion is the proportion of the resources in each preset abnormal duration range that is predicted to be lost. The predicted loss proportion can be calculated by formula (7):
$$LR_i = \overline{FR}_i \times \overline{FR}_{i+1} \times \cdots \times \overline{FR}_n \qquad \text{formula (7)}$$
where $LR_i$ is the predicted loss proportion of the i-th preset abnormal duration range, $\overline{FR}_i$ is the average migration proportion of the i-th preset abnormal duration range, $\overline{FR}_{i+1}$ is the average migration proportion of the (i+1)-th preset abnormal duration range, and $\overline{FR}_n$ is the average migration proportion of the last of the plurality of preset abnormal duration ranges. The predicted loss proportion of the i-th preset abnormal duration range is thus the cumulative product of the average migration proportions from the i-th preset abnormal duration range through the last preset abnormal duration range.
The abnormal resource prediction amount can be calculated by the formula (8):
$$BD = \sum_i M_i' \cdot LR_i \qquad \text{formula (8)}$$
where $BD$ is the abnormal resource prediction amount of the target object in the current time period, $M_i'$ is the amount of resources to be recovered by the target object in the i-th preset abnormal duration range in the current time period, and $LR_i$ is the predicted loss proportion of the i-th preset abnormal duration range calculated by formula (7). That is, the abnormal resource prediction amount of the target object in the current time period is the sum, over all preset abnormal duration ranges, of the product of the amount of resources to be recovered in that range in the current time period and the predicted loss proportion of that range.
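Formulas (7) and (8) can be sketched together as follows. The sketch assumes that formula (7) chains the average migration proportions multiplicatively from the i-th range to the last range, as in a conventional roll-rate model; all function names and sample figures are hypothetical.

```python
from typing import List, Sequence


def average_ratios(monthly_ratios: Sequence[Sequence[float]]) -> List[float]:
    """Average the migration proportion of each range over all historical periods."""
    n_ranges = len(monthly_ratios[0])
    n_periods = len(monthly_ratios)
    return [sum(period[i] for period in monthly_ratios) / n_periods for i in range(n_ranges)]


def loss_ratios(avg: Sequence[float]) -> List[float]:
    """LR_i = avg[i] * avg[i+1] * ... * avg[-1] (assumed multiplicative chaining)."""
    out = [0.0] * len(avg)
    running = 1.0
    for i in range(len(avg) - 1, -1, -1):  # walk from the last range backwards
        running *= avg[i]
        out[i] = running
    return out


def predicted_abnormal_amount(current_balances: Sequence[float], lr: Sequence[float]) -> float:
    """BD = sum over ranges of current balance times predicted loss proportion."""
    return sum(m * r for m, r in zip(current_balances, lr))


# Two historical periods, two ranges (hypothetical figures).
avg = average_ratios([[0.5, 0.5], [0.5, 1.0]])
lr = loss_ratios(avg)
bd = predicted_abnormal_amount([1000.0, 200.0], lr)
```

With these figures the averages are 0.5 and 0.75, the predicted loss proportions are 0.375 and 0.75, and the predicted amount is 1000 × 0.375 + 200 × 0.75 = 525.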
For example, when the resources to be recovered are receivable balances and the overdue stages are as shown in fig. 5, the receivable balances in each overdue stage from July to December 2020 are as shown in fig. 6. Except for overdue stage 7, the receivable balance in each of overdue stages 1 to 6 in each month from August to December 2020 can only have migrated from the preceding overdue stage of the previous month; for example, the receivable balance in overdue stage 2 in August 2020 migrated from the receivable balance in overdue stage 1 in July 2020. The receivable balance in overdue stage 7 in each month from August to December 2020 includes both the balance that migrated from overdue stage 6 of the previous month and the balance that remained in overdue stage 7 from the previous month. For the calculation of these two portions of the overdue stage 7 receivable balance in each month from August to December 2020, reference may be made to the auxiliary calculation rows in fig. 6. The receivable balance of each overdue stage that migrated from the preceding overdue stage of the previous month, for each month from August to December 2020, is shown in fig. 7. For example, the receivable balance in overdue stage 1 in August 2020 that migrated from overdue stage 0 in July 2020 is 62600, which is recorded as the migration amount of migration stage 0-1 for August 2020.
Referring to fig. 6 and fig. 7, the resource migration ratios and the average migration ratios of overdue stages 0 to 7 for August to December 2020 can be calculated, as shown in fig. 8. The resource migration ratios of overdue stages 0 to 7 correspond to migration stages 0-1 through 7-and-above, respectively. For example, the resource migration ratio of overdue stage 0 for August 2020 (corresponding to migration stage 0-1 of August, that is, the migration from overdue stage 0 in July 2020 to overdue stage 1 in August 2020) is 62600/70000=89.4%. The average migration ratio of each overdue stage is the average of its monthly resource migration ratios; in fig. 8 it is the average over August to December 2020.
In this example, the predicted loss proportion is the predicted overdue loss ratio. The calculation formulas of the predicted overdue loss ratios of overdue stages 0 to 7 are shown in fig. 9. Applying these formulas to the average migration ratios of migration stage 0-1 to migration stage 7 shown in fig. 8 yields the predicted overdue loss ratios of overdue stages 0 to 7 shown in fig. 10, where migration stage 0-1 in fig. 8 corresponds to overdue stage 0 in fig. 9, and so on. The abnormal resource prediction amount may be a predicted bad account amount. The predicted bad account amount of each of overdue stages 0 to 7 and the predicted bad account amount of the target object, shown in fig. 11, are calculated from the predicted overdue loss ratios of overdue stages 0 to 7 shown in fig. 10 and the receivable balances of overdue stages 0 to 7 in the current month. The predicted bad account amount of the target object is the sum of the predicted bad account amounts of overdue stages 0 to 7.
In one embodiment, the data server may obtain the resource migration proportion of each preset abnormal duration range in each historical time period of the target object, average these to obtain the average migration proportion of each preset abnormal duration range, and apply formula (7) to the average migration proportions to obtain the predicted loss proportion of each preset abnormal duration range over the preset historical statistical period of the target object. The data server may then obtain the amount of resources to be recovered in each of the plurality of preset abnormal duration ranges of the target object in the current time period, and apply formula (8) to these amounts and the predicted loss proportions to obtain the abnormal resource prediction amount of the target object.
In this embodiment, the abnormal resource prediction amount of the target object is derived systematically from historical data and therefore has reference value. Specifically, the predicted loss proportion is obtained from the amount of resources to be recovered, and the amount of migrated resources to be recovered, in each of the plurality of preset abnormal duration ranges of the target object in each historical time period within the preset historical statistical period. The predicted loss proportion of each preset abnormal duration range over the preset historical statistical period is then used as the predicted loss proportion of that range in the current time period, so that the resulting abnormal resource prediction amount of the target object is relatively accurate.
In a specific application scenario, the resource may be a financial resource, the resource to be recovered may be a receivable balance, the abnormal resource may be a bad account, and predicting the abnormal risk score level of the target object may be predicting the bad account risk level of a target enterprise. The specific steps of the abnormal risk data processing method may be as shown in fig. 12 and are as follows:
The data server may mine data from resource exchange recording systems such as ERP systems and from public information sources such as judicial websites to obtain multi-source heterogeneous data; for example, fig. 13 lists the types of data that can be mined from public information sources. A plurality of sample objects may be screened from the mined data, and sample source data of each of a plurality of candidate risk indicators may be obtained for each of the plurality of sample objects. Multi-source heterogeneous data is data that comes from various sources and has complex types and forms. The plurality of candidate risk indicators may be some of the risk indicators in a bad account risk indicator library, which can be constructed after the mined data has been integrated, classified, and reconstructed. The sample objects are enterprises.
The data server may store the mined data in a dedicated data pool and structure it; structuring establishes a correspondence between each risk indicator and the field information contained in the risk data of that risk indicator. For example, fig. 14 shows a data service function interface of the ERP system in which, when a risk indicator is clicked, the field information corresponding to the clicked risk indicator is displayed. The data server may perform sample preprocessing on the sample source data of each of the plurality of candidate risk indicators, for example, by aggregating statistics per candidate risk indicator to obtain sample regularized data of each of the plurality of candidate risk indicators, and may label each of the plurality of sample objects as a bad account client or not, thereby forming 5000 pieces of training data, part of which is shown in fig. 15. The 5000 pieces of training data cover 5000 enterprises and 44 candidate risk indicators.
The data server may perform feature binning on the sample regularized data of each of the plurality of candidate risk indicators in the 5000 pieces of training data to obtain a plurality of pieces of feature binning data for each candidate risk indicator, and may perform feature encoding on the feature binning data of each candidate risk indicator to obtain the WOE (weight of evidence) value of each piece of feature binning data under each candidate risk indicator.
The data server may perform feature binning and feature encoding multiple times on the sample regularized data of each of the plurality of candidate risk indicators, calculating the IV (information value) of each candidate risk indicator each time. For each candidate risk indicator, the feature binning data for which the IV value of that indicator is optimal are taken as the feature binning data of that indicator, and the WOE values of those feature binning data are obtained, thereby forming the sample risk data of each of the plurality of candidate risk indicators.
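The WOE encoding and IV-based choice of binning described above can be sketched as follows. The sign convention used here (good share over bad share inside the logarithm), the requirement that every bin contain both good and bad samples, and the reading of "optimal" as the largest IV are assumptions; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Bins = Sequence[Tuple[int, int]]  # (good_count, bad_count) for each feature bin


def woe_iv(bins: Bins) -> Tuple[List[float], float]:
    """Return the WOE value of each bin and the IV of the indicator."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes: List[float] = []
    iv = 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            raise ValueError("each bin needs at least one good and one bad sample")
        p_good = good / total_good   # share of all good samples that fall in this bin
        p_bad = bad / total_bad      # share of all bad samples that fall in this bin
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe
    return woes, iv


def best_binning(candidates: Sequence[Bins]) -> Bins:
    """Pick the candidate binning with the largest IV (assumed meaning of optimal)."""
    return max(candidates, key=lambda bins: woe_iv(bins)[1])


separating = [(80, 20), (20, 80)]     # bins that separate good from bad samples
uninformative = [(50, 50), (50, 50)]  # bins that carry no information
woes, iv = woe_iv(separating)
chosen = best_binning([uninformative, separating])
```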
The data server may perform iterative training based on the sample risk data of each of the plurality of candidate risk indicators. In each iteration, every currently remaining candidate risk indicator is traversed; a training model is trained on the sample risk data of the traversed candidate risk indicator together with the sample risk data of the target risk indicators already screened out, and the model effect is evaluated. The candidate risk indicator corresponding to the training model with the best model effect is screened out as a target risk indicator. Iteration stops when none of the remaining candidate risk indicators improves the effect of its corresponding training model. The candidate risk indicators screened out by the time iteration stops are the target risk indicators that serve as model operation indicators, which may include, for example, the average number of overdue days over a 365-day window, the total overdue amount over 30 days, the total overdue amount, and the total number of overdue occurrences.
The data server may construct the target scoring model based on preset condition judgment indicators and the target training model obtained after iteration stops, so that when the input data do not meet the preset conditions, the target scoring model scores the data and outputs an abnormal risk score level, and when the input data meet the preset conditions, the target scoring model directly outputs a preset abnormal risk score level indicating that an abnormal risk exists.
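The iterative screening above resembles forward stepwise selection and can be sketched as follows. The `evaluate` callback, which would train a model on the given indicators and return its effect metric, is hypothetical, as are the indicator names and the toy metric; the text does not specify the model type or the metric.

```python
from typing import Callable, List, Sequence


def forward_select(
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],  # hypothetical: train on indicators, return effect
    min_gain: float = 0.0,
) -> List[str]:
    """Greedily add the indicator that most improves the model effect until none helps."""
    selected: List[str] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining:
        # Traverse every remaining candidate together with the already selected indicators.
        scored = [(evaluate(selected + [c]), c) for c in remaining]
        top_score, top_candidate = max(scored, key=lambda pair: pair[0])
        if top_score - best_score <= min_gain:
            break  # no remaining candidate improves the model, so stop iterating
        selected.append(top_candidate)
        remaining.remove(top_candidate)
        best_score = top_score
    return selected


# Toy effect metric: each indicator contributes a fixed gain (hypothetical values).
gains = {"overdue_days": 0.3, "overdue_amount": 0.2, "noise": 0.0}
chosen = forward_select(["noise", "overdue_amount", "overdue_days"],
                        lambda cols: sum(gains[c] for c in cols))
```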
When mapping the scoring data output by the target scoring model to abnormal risk score levels, the data server may construct a KS curve from the scoring data of the plurality of sample objects output by the target scoring model, as shown in fig. 16. In fig. 16, the good-bad curve is the KS curve; each point on the KS curve represents the absolute value of the difference between the cumulative probability of good sample objects and the cumulative probability of bad sample objects. The maximum of this absolute value is the KS value, which is reached at a score of 629, where the ability to distinguish good sample objects from bad sample objects is strongest. When the abnormal risk score levels include a high risk level, a medium risk level, and a low risk level, the cumulative probability distribution functions of the bad sample objects (bad users) and the good sample objects (good users) are plotted on the basis of fig. 16, as shown in fig. 17. Of the three vertical lines in fig. 17, the middle line has an abscissa of 629, the score with the strongest ability to distinguish good and bad sample objects; the leftmost line has an abscissa of 585, the score at which the cumulative probability of bad sample objects is 60% and the cumulative sample proportion is 25%; and the rightmost line has an abscissa of 673, obtained by mirroring the leftmost line about the middle line. Based on fig. 17, the scores are rounded, and 500 and 700 are set as the boundary points for dividing the abnormal risk score levels, as shown in fig. 4.
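The KS value and the score at which it is reached can be computed as in the sketch below, which compares the cumulative distributions of bad and good sample objects over sorted scores. The function name and the sample scores are hypothetical.

```python
from typing import Sequence, Tuple


def ks_statistic(scores: Sequence[float], is_bad: Sequence[bool]) -> Tuple[float, float]:
    """Return (KS value, score at which the largest gap is first reached)."""
    total_bad = sum(1 for flag in is_bad if flag)
    total_good = len(is_bad) - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("need both good and bad samples")
    pairs = sorted(zip(scores, is_bad))
    cum_bad = cum_good = 0
    best_gap, best_score = 0.0, pairs[0][0]
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume all samples that share this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1]:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        gap = abs(cum_bad / total_bad - cum_good / total_good)
        if gap > best_gap:
            best_gap, best_score = gap, score
    return best_gap, best_score


sample_scores = [400, 450, 500, 650, 600, 700, 750, 800]
sample_is_bad = [True, True, True, True, False, False, False, False]
ks_value, ks_score = ks_statistic(sample_scores, sample_is_bad)
```

In this toy sample, three of the four bad objects and none of the good objects score at or below 500, so the largest gap is 0.75 at a score of 500.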
The data server may deploy an abnormal risk prediction project file as shown in fig. 18 in order to obtain the abnormal risk level of the target enterprise through the target scoring model; an explanatory diagram of the abnormal risk prediction project file is shown in fig. 19. Specifically, in response to an abnormal risk assessment event for the target enterprise, the data server may connect to the database to pull the source data of each of the plurality of target risk indicators from the resource exchange recording system and the at least one public information source, preprocess the source data to obtain the risk data of each of the plurality of target risk indicators, and store the risk data in the model input data set file. The data server may then execute the model master file to run the target scoring model, read the risk data of each of the plurality of target risk indicators from the model input data set file, and input the risk data into the target scoring model for processing. The scoring data obtained from the risk data of the plurality of target risk indicators is stored in the rule score card file, and the scoring data obtained from the risk data of the model operation indicators among the plurality of target risk indicators is stored in the model score card file. An abnormal risk score card file is formed based on the data in the rule score card file and the model score card file, and the scoring data and the output scoring results of the target enterprise are stored in the abnormal risk score card file.
When the data server predicts whether the target enterprise has a bad account, the condition judgment indicators may be an operation state indicator, an overdue days indicator, an account age indicator, and a settlement type indicator. When the risk data of the condition judgment indicators meet any one of the following three preset conditions, the scoring data of the target enterprise is output as 300 points and the abnormal risk score level is set to the high risk level: the risk data of the operation state indicator indicates that the business license has been revoked, the enterprise has been deregistered, the enterprise has ceased business, or the enterprise is in liquidation; or the risk data of the overdue days indicator indicates that the number of overdue days exceeds 180; or the risk data of the account age indicator indicates that the account age exceeds 5 years and the risk data of the settlement type indicator indicates that the target enterprise is of a non-internal settlement type.
The data server may predict the bad account amount of the target enterprise. Fig. 20 is a schematic diagram of the flow by which receivable accounts change from overdue to bad accounts: when a seller generates receivable accounts for the target enterprise and the target enterprise does not repay within the account period, the receivable accounts become overdue and the target enterprise becomes an overdue client; an overdue client may in turn become a bad account client, and the bad account amount of such a client is what is predicted. The receivable balances of the target enterprise in the plurality of overdue stages may be stored in a receivable account age analysis table of the ERP system.
In response to an abnormal resource prediction event for the target enterprise, the data server may acquire the receivable balances of the plurality of overdue stages for the current month and for the past 6 months; acquire the migration rate of each of the plurality of overdue stages in each of the past 6 months; determine the predicted loss rate of each of the plurality of overdue stages over the past 6 months based on those migration rates; calculate the predicted bad account amount of the target enterprise from the receivable balances of the plurality of overdue stages in the current month and the predicted loss rates of the plurality of overdue stages over the past 6 months; and store the predicted bad account amount of the target enterprise in the abnormal resource prediction amount file.
The terminal may run an enterprise management system. When an enterprise management account has subscribed to the receivable bad account risk prediction function in the enterprise management system and has the authority to use it, the terminal may, in response to the enterprise management account starting the receivable bad account risk prediction function, display the service clients served by the enterprise management account. When a service client belongs to the target objects, the abnormal risk score level of the service client is read from the model output result set file and displayed on the receivable bad account risk prediction interface. When the abnormal risk score level of the service client is the high risk level, the predicted bad account amount of the service client is read from the abnormal resource prediction amount file and displayed on the receivable bad account risk prediction interface. The receivable bad account risk prediction interface is shown in fig. 21.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, the steps are not necessarily executed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
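The preset-condition check that precedes model scoring can be sketched as follows. The field names, the operating state labels, and the `model_score` callback are hypothetical; the grade boundaries of 500 and 700 follow the boundary points mentioned earlier, but which side of each boundary a score falls on, and the assumption that lower scores mean higher risk, are illustrative choices only.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical labels for the operating states named in the preset conditions.
ABNORMAL_STATES = {"revoked", "deregistered", "ceased", "liquidation"}


@dataclass
class ConditionData:
    operating_state: str
    overdue_days: int
    account_age_years: float
    internal_settlement: bool


def meets_preset_condition(d: ConditionData) -> bool:
    """True if any one of the three preset conditions is met."""
    return (
        d.operating_state in ABNORMAL_STATES
        or d.overdue_days > 180
        or (d.account_age_years > 5 and not d.internal_settlement)
    )


def score_enterprise(d: ConditionData, model_score: Callable[[], float]) -> Tuple[float, str]:
    """Return (score, risk level); the model is only consulted when no condition is met."""
    if meets_preset_condition(d):
        return 300.0, "high"
    score = model_score()
    # Assumed boundary handling: below 500 is high, below 700 is medium, otherwise low.
    if score < 500:
        return score, "high"
    if score < 700:
        return score, "medium"
    return score, "low"


rule_hit = score_enterprise(ConditionData("active", 200, 1.0, True), lambda: 800.0)
model_path = score_enterprise(ConditionData("active", 10, 1.0, True), lambda: 650.0)
```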
Based on the same inventive concept, the embodiment of the present application further provides an abnormal risk data processing apparatus for implementing the above-mentioned abnormal risk data processing method. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the method, so the specific limitations in one or more embodiments of the abnormal risk data processing apparatus provided below may refer to the limitations on the abnormal risk data processing method in the foregoing, and details are not described here.
In one embodiment, as shown in fig. 22, there is provided an abnormal risk data processing apparatus 2200 including: an input module 2210 and an execution module 2220, wherein:
an input module 2210, configured to respond to an abnormal risk assessment event for a target object, obtain source data of a plurality of target risk indicators of the target object from a resource exchange recording system and at least one public information source according to the plurality of target risk indicators used by a target scoring model, perform preprocessing to obtain risk data of the plurality of target risk indicators, and store the risk data into a model input data set file.
An executing module 2220, configured to execute the model master file to run the target scoring model, and read risk data of each of the multiple target risk indicators from the model input data set file; inputting the risk data of each of the target risk indexes into a target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model; the target scoring model screens out at least a portion of the plurality of target risk indicators from the plurality of candidate risk indicators during training.
In one embodiment, the target scoring model is obtained through a training step, and the abnormal risk data processing apparatus 2200 further includes a model training module, configured to, for each sample object in the plurality of sample objects, obtain sample source data of each of the plurality of candidate risk indicators of each sample object from the resource exchange recording system and the at least one public information source; for each candidate risk index in the candidate risk indexes, sample preprocessing is carried out on sample source data of a plurality of sample objects under the candidate risk index, and sample risk data of the candidate risk indexes are obtained; performing iterative training based on a plurality of candidate risk indexes, iterating each current remaining candidate risk index each time, training to obtain a training model based on sample risk data of the traversed candidate risk indexes and sample risk data of the screened target risk indexes, evaluating the effect of the model, screening out the candidate risk indexes corresponding to the training model with the optimal model effect as the target risk indexes, and stopping the iteration until the effect of each remaining candidate risk index on the corresponding training model is invalid; and constructing a target scoring model based on the target training model obtained after iteration is stopped.
In one embodiment, the screened candidate risk indexes are target risk indexes serving as model operation indexes, and the training module is further configured to construct a target scoring model based on preset condition judgment indexes and a target training model obtained after iteration is stopped, so that the target scoring model performs scoring processing on data which do not meet preset conditions and outputs an abnormal risk scoring level when the target training model inputs the data which do not meet the preset conditions, and directly outputs a preset abnormal risk scoring level representing that abnormal risk exists when the data which meet the preset conditions are input.
In one embodiment, the training module is further configured to perform data warping on sample source data of each of the multiple candidate risk indicators of each sample object, and obtain sample warping data of each of the multiple candidate risk indicators of each sample object; for each candidate risk index in the candidate risk indexes, carrying out characteristic binning processing on the sample regular data of each of the plurality of sample objects under the candidate risk index to obtain a plurality of characteristic binning data of each of the candidate risk indexes; and performing feature coding processing on a plurality of feature binning data of each of the plurality of candidate risk indicators to form sample risk data of each of the plurality of candidate risk indicators.
In one embodiment, the plurality of target risk indicators include a model operation indicator and a condition judgment indicator, and the executing module 2220 is further configured to: input the risk data of the model operation indicator and the risk data of the condition judgment indicator into the target scoring model for processing; when the risk data of the condition judgment indicator do not meet the preset condition, process the risk data of the model operation indicator through the target scoring model to obtain scoring data of the target object; and determine, according to a preset level division condition, the abnormal risk scoring level to which the scoring data of the target object belong.
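The branching between the condition judgment indicator and the model operation indicator may be sketched as follows. This is an illustrative sketch only: the level names, the cutoffs, the `on_public_blacklist` field, and the linear toy model are hypothetical placeholders rather than values taken from this application.

```python
from bisect import bisect_right


def score_level(
    condition_data,
    model_data,
    model,
    meets_preset_condition,
    cutoffs=(0.3, 0.7),
    levels=("low", "medium", "high"),
    preset_level="high",
):
    """Return the abnormal risk scoring level for one target object."""
    # Data meeting the preset condition bypass the model entirely.
    if meets_preset_condition(condition_data):
        return preset_level
    # Otherwise score the model operation indicators and map the score
    # onto a level using the preset level division condition (cutoffs).
    score = model(model_data)
    return levels[bisect_right(cutoffs, score)]


# Hypothetical condition and toy model used only for this example.
def on_blacklist(data):
    return data["on_public_blacklist"]


def toy_model(data):
    return 0.1 * data["overdue_count"]


flagged = score_level({"on_public_blacklist": True}, {"overdue_count": 0}, toy_model, on_blacklist)
scored = score_level({"on_public_blacklist": False}, {"overdue_count": 5}, toy_model, on_blacklist)
```

Keeping the condition check outside the trained model means the rule can be changed without retraining, which is one plausible reason for separating the two kinds of indicator.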
In one embodiment, the abnormal risk scoring level of the target object is stored in a model output result set file, and the executing module 2220 is further configured to: predict an abnormal resource prediction amount of the target object through an abnormal resource amount prediction model, and store the abnormal resource prediction amount in an abnormal resource prediction amount file; in response to a trigger event of an abnormal prediction function triggered by a target identity, display service objects having a service relationship with the target identity; when a service object belongs to the target objects, read the abnormal risk scoring level of the service object from the model output result set file and display it; and when the abnormal risk scoring level of the service object is the preset abnormal risk scoring level indicating that abnormal risk exists, read the abnormal resource prediction amount of the service object from the abnormal resource prediction amount file and display it.
In one embodiment, the executing module 2220 is further configured to: in response to an abnormal resource prediction event for the target object, acquire the amount of resources to be recovered in each of a plurality of preset abnormal duration ranges, both for the current time period and for each historical time period within a preset historical statistical period; acquire the resource migration proportion of each of the plurality of preset abnormal duration ranges in each historical time period within the preset historical statistical period, the resource migration proportion being determined based on the amount of resources to be recovered in the preset abnormal duration range in the historical time period and the amount of resources to be recovered in the preceding preset abnormal duration range in the previous historical time period; determine the predicted loss proportion of each preset abnormal duration range over the preset historical statistical period based on the resource migration proportion of the preset abnormal duration range in each historical time period; and determine the abnormal resource prediction amount of the target object according to the amount of resources to be recovered in each preset abnormal duration range of the current time period and the predicted loss proportion of each preset abnormal duration range over the preset historical statistical period.
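One possible reading of the migration-proportion calculation above is a roll-rate style estimate, sketched below. The embodiment does not state how the per-period migration proportions are combined; this sketch assumes they are averaged over the historical periods and then multiplied cumulatively toward the last duration range, and that resources in the last range are treated as fully lost. The amounts are made-up figures.

```python
def migration_ratios(history):
    """history[t][k]: amount to be recovered in duration range k in period t.

    The ratio for range k in period t is history[t][k] / history[t-1][k-1],
    i.e. the share of the preceding range that rolled into range k.
    """
    ratios = []
    for prev, cur in zip(history, history[1:]):
        ratios.append(
            [cur[k] / prev[k - 1] if prev[k - 1] else 0.0 for k in range(1, len(cur))]
        )
    return ratios


def loss_proportions(ratios):
    """Assumed combination: mean ratio per transition, then cumulative product."""
    n = len(ratios[0])  # number of transitions = number of ranges - 1
    mean = [sum(r[k] for r in ratios) / len(ratios) for k in range(n)]
    loss = [1.0] * (n + 1)  # assumption: the last range is fully lost
    for k in range(n - 1, -1, -1):
        loss[k] = mean[k] * loss[k + 1]
    return loss


def predict_abnormal_amount(current, loss):
    """Sum of current amounts weighted by each range's predicted loss proportion."""
    return sum(amount * p for amount, p in zip(current, loss))


# Three historical periods, three duration ranges (illustrative amounts).
history = [
    [1000.0, 200.0, 100.0],
    [1000.0, 100.0, 100.0],
    [1200.0, 100.0, 50.0],
]
current = [2000.0, 400.0, 100.0]

loss = loss_proportions(migration_ratios(history))
predicted = predict_abnormal_amount(current, loss)
print(loss, predicted)
```

Here the mean migration proportions are 0.1 and 0.5, so the predicted loss proportions are roughly 0.05, 0.5, and 1.0, and the predicted abnormal amount is about 400.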
The modules in the above abnormal risk data processing apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and its internal structure may be as shown in FIG. 23. The computer device includes a processor, a memory, an Input/Output interface (I/O for short), and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data that need to be stored when the abnormal risk data processing method is executed. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an abnormal risk data processing method.
Those skilled in the art will appreciate that the architecture shown in FIG. 23 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In an embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, or the like.
The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as the combination involves no contradiction.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. An abnormal risk data processing method, characterized in that the method comprises:
in response to an abnormal risk assessment event for a target object, acquiring, according to a plurality of target risk indicators used by a target scoring model for scoring, source data of each of the plurality of target risk indicators of the target object from a resource exchange recording system and at least one public information source, preprocessing the source data to obtain risk data of each of the plurality of target risk indicators, and storing the risk data in a model input data set file;
executing a model master file to run the target scoring model, and reading the risk data of each of the plurality of target risk indicators from the model input data set file;
inputting the risk data of each of the plurality of target risk indicators into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model; wherein at least some of the plurality of target risk indicators are screened out from a plurality of candidate risk indicators when the target scoring model is trained.
2. The method of claim 1, wherein the target scoring model is obtained through a training step comprising:
for each sample object in a plurality of sample objects, obtaining sample source data of each of the plurality of candidate risk indicators of the sample object from the resource exchange recording system and the at least one public information source;
for each candidate risk indicator in the plurality of candidate risk indicators, performing sample preprocessing on the sample source data of the plurality of sample objects under the candidate risk indicator to obtain sample risk data of the candidate risk indicator;
performing iterative training based on the plurality of candidate risk indicators, wherein each iteration traverses each currently remaining candidate risk indicator, trains a training model based on the sample risk data of the traversed candidate risk indicator and the sample risk data of the target risk indicators already screened out, evaluates the model effect, and screens out, as a target risk indicator, the candidate risk indicator corresponding to the training model with the best model effect, the iteration stopping when none of the remaining candidate risk indicators has a valid effect on its corresponding training model;
and constructing the target scoring model based on the target training model obtained after the iteration stops.
3. The method according to claim 2, wherein the screened-out candidate risk indicators are target risk indicators serving as model operation indicators;
wherein constructing the target scoring model based on the target training model obtained after the iteration stops comprises:
constructing the target scoring model based on a preset condition judgment indicator and the target training model obtained after the iteration stops, so that the target scoring model, when data that do not meet a preset condition are input, scores the data and outputs an abnormal risk scoring level, and, when data that meet the preset condition are input, directly outputs a preset abnormal risk scoring level indicating that abnormal risk exists.
4. The method of claim 2, wherein, for each candidate risk indicator in the plurality of candidate risk indicators, performing sample preprocessing on the sample source data of the plurality of sample objects under the candidate risk indicator to obtain sample risk data of the candidate risk indicator comprises:
performing data normalization on the sample source data of each of the plurality of candidate risk indicators of each sample object to obtain sample normalization data of each of the plurality of candidate risk indicators of each sample object;
for each candidate risk indicator in the plurality of candidate risk indicators, performing feature binning on the sample normalization data of the plurality of sample objects under the candidate risk indicator to obtain a plurality of feature binning data of the candidate risk indicator;
and performing feature encoding on the plurality of feature binning data of each of the plurality of candidate risk indicators to form the sample risk data of each of the plurality of candidate risk indicators.
5. The method according to any one of claims 1 to 4, wherein the plurality of target risk indicators include a model operation indicator and a condition judgment indicator; and inputting the risk data of each of the plurality of target risk indicators into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model comprises:
inputting the risk data of the model operation indicator and the risk data of the condition judgment indicator into the target scoring model for processing;
when the risk data of the condition judgment indicator do not meet a preset condition, processing the risk data of the model operation indicator through the target scoring model to obtain scoring data of the target object;
and determining, according to a preset level division condition, the abnormal risk scoring level to which the scoring data of the target object belong.
6. The method of any one of claims 1 to 4, wherein the abnormal risk scoring level of the target object is stored in a model output result set file; the method further comprising:
predicting an abnormal resource prediction amount of the target object through an abnormal resource prediction model, and storing the abnormal resource prediction amount in an abnormal resource prediction amount file;
in response to a trigger event of an abnormal prediction function triggered by a target identity, displaying service objects having a service relationship with the target identity;
when a service object belongs to the target objects, reading the abnormal risk scoring level of the service object from the model output result set file, and displaying the abnormal risk scoring level of the service object;
and when the abnormal risk scoring level of the service object is a preset abnormal risk scoring level indicating that abnormal risk exists, reading the abnormal resource prediction amount of the service object from the abnormal resource prediction amount file and displaying it.
7. The method of claim 6, wherein predicting the abnormal resource prediction amount of the target object by the abnormal resource prediction model comprises:
in response to an abnormal resource prediction event for the target object, acquiring the amount of resources to be recovered in each of a plurality of preset abnormal duration ranges, both for the current time period and for each historical time period within a preset historical statistical period;
acquiring the resource migration proportion of each of the plurality of preset abnormal duration ranges in each historical time period within the preset historical statistical period; the resource migration proportion being determined based on the amount of resources to be recovered in the preset abnormal duration range in the historical time period and the amount of resources to be recovered in the preceding preset abnormal duration range in the previous historical time period;
determining the predicted loss proportion of each preset abnormal duration range in the preset historical statistical period based on the resource migration proportion of each preset abnormal duration range in each historical time period;
and determining the abnormal resource prediction amount of the target object according to the resource amount to be recovered of each preset abnormal duration range of the current time period and the prediction loss proportion of each preset abnormal duration range in the preset historical statistical period.
8. An abnormal risk data processing apparatus, characterized in that the apparatus comprises:
an input module, configured to: in response to an abnormal risk assessment event for a target object, acquire, according to a plurality of target risk indicators used by a target scoring model for scoring, source data of each of the plurality of target risk indicators of the target object from a resource exchange recording system and at least one public information source, preprocess the source data to obtain risk data of each of the plurality of target risk indicators, and store the risk data in a model input data set file;
an execution module, configured to: execute a model master file to run the target scoring model, and read the risk data of each of the plurality of target risk indicators from the model input data set file; and input the risk data of each of the plurality of target risk indicators into the target scoring model for processing to obtain the abnormal risk scoring level of the target object output by the target scoring model; wherein at least some of the plurality of target risk indicators are screened out from a plurality of candidate risk indicators when the target scoring model is trained.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211399582.6A CN115689713A (en) 2022-11-09 2022-11-09 Abnormal risk data processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211399582.6A CN115689713A (en) 2022-11-09 2022-11-09 Abnormal risk data processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115689713A true CN115689713A (en) 2023-02-03

Family

ID=85049842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211399582.6A Pending CN115689713A (en) 2022-11-09 2022-11-09 Abnormal risk data processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115689713A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116596336A (en) * 2023-05-16 2023-08-15 合肥联宝信息技术有限公司 State evaluation method and device of electronic equipment, electronic equipment and storage medium
CN116596336B (en) * 2023-05-16 2023-10-31 合肥联宝信息技术有限公司 State evaluation method and device of electronic equipment, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110837931B (en) Customer churn prediction method, device and storage medium
US20200192894A1 (en) System and method for using data incident based modeling and prediction
CN110807700A (en) Unsupervised fusion model personal credit scoring method based on government data
CN110738527A (en) feature importance ranking method, device, equipment and storage medium
CN112990386B (en) User value clustering method and device, computer equipment and storage medium
CN113609193A (en) Method and device for training prediction model for predicting customer transaction behavior
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN115983900A (en) Method, apparatus, device, medium, and program product for constructing user marketing strategy
CN115689713A (en) Abnormal risk data processing method and device, computer equipment and storage medium
CN112232945B (en) Method and device for determining personal client credit
CN114626940A (en) Data analysis method and device and electronic equipment
CN111177657B (en) Demand determining method, system, electronic device and storage medium
CN114219611A (en) Loan amount calculation method and device, computer equipment and storage medium
CN113011596A (en) Method, device and system for automatically updating model and electronic equipment
Yeh et al. Predicting failure of P2P lending platforms through machine learning: The case in China
CN117036008B (en) Automatic modeling method and system for multi-source data
CN116452313B (en) Method and device for calculating customer value in bank game customer group and electronic equipment
CN116485446A (en) Service data processing method and device, processor and electronic equipment
CN117934150A (en) Personal credit assessment method and device based on improved class unbalance
CN115187353A (en) Data processing method, data processing device, computer equipment and storage medium
Ozgulbas et al. Application of Data Mining Method for Financial Profiling
CN114662824A (en) Wind control strategy switching method and device, computer equipment and storage medium
CN117788133A (en) Method for constructing retail credit risk prediction model and retail credit score model
CN117575772A (en) Abnormal user detection method and device, computer equipment and storage medium
CN115270631A (en) Material circulation degree prediction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination