CN113837863A - Business prediction model creation method and device and computer readable storage medium - Google Patents

Info

Publication number
CN113837863A
Authority
CN
China
Prior art keywords
data set
sample
prediction model
modeling
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111138614.2A
Other languages
Chinese (zh)
Other versions
CN113837863B (en
Inventor
顾凌云
谢旻旗
张涛
黄以增
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai IceKredit Inc
Original Assignee
Shanghai IceKredit Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai IceKredit Inc
Priority to CN202111138614.2A
Publication of CN113837863A
Application granted
Publication of CN113837863B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Technology Law (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The business prediction model creation method, device and computer-readable storage medium first find a plurality of auxiliary data sets similar to a target data set. Sample data is then drawn from the auxiliary data sets to obtain a sample data set, and a business state model is trained on that sample data set. The business state model is used to obtain default probabilities, and a modeling data set is determined based on those default probabilities. A weight parameter is then determined from the target data set and the modeling data set, and finally the business prediction model is created from the modeling data set and the weight parameter. Because the scheme uses auxiliary data sets similar to the target data set, screens out the modeling data set quantitatively, and adjusts the weights of the samples in the modeling data set, the samples in the modeling data set come closer to the samples of the business for which the prediction model is being created, so the created business prediction model has stronger prediction capability and stability.

Description

Business prediction model creation method and device and computer readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular to a method and an apparatus for creating a business prediction model, and a computer-readable storage medium.
Background
Model development generally requires a large amount of sample data. When a business has only just been launched, little sample data (business objects and their business state labels) is available, so either a model cannot be developed from the existing sample data at all, or the developed model is biased in its predictions and unstable in its performance.
Disclosure of Invention
To overcome at least the above deficiencies in the prior art, the present application aims to provide a method, an apparatus and a computer-readable storage medium for creating a business prediction model, which solve the above technical problems.
In a first aspect, an embodiment of the present application provides a method for creating a business prediction model, which is applied to a computer device, and the method includes:
acquiring a target data set of a business prediction model to be created;
acquiring, based on the target data set, a plurality of auxiliary data sets that satisfy a preset business similarity condition with the target data set;
extracting sample data from the plurality of auxiliary data sets to obtain a sample data set;
training according to the sample data set to obtain a business state model for predicting the business state of the business object in the sample data;
predicting the target data set and the plurality of auxiliary data sets by using the business state model to obtain default probabilities of the target data set and of each auxiliary data set;
determining a modeling data set from the sample data set based on the default probabilities of the target data set and of each auxiliary data set;
determining a weight parameter according to the target data set and the modeling data set;
creating the business prediction model based on the modeling data set and the weight parameter.
According to the scheme, a target data set of the business prediction model to be created is first acquired, and a plurality of auxiliary data sets similar to the target data set are found. Sample data is then drawn from the auxiliary data sets to obtain a sample data set, and a business state model is trained on that sample data set. The business state model is used to obtain the default probabilities of the target data set and of each auxiliary data set, and a modeling data set is determined based on those default probabilities. A weight parameter is then determined from the target data set and the modeling data set, and finally the business prediction model is created from the modeling data set and the weight parameter. Because the scheme uses auxiliary data sets similar to the target data set, screens out the modeling data set quantitatively, and adjusts the weights of the samples in the modeling data set, the samples in the modeling data set come closer to the samples of the business for which the prediction model is being created. The business prediction model can therefore be created even when the target data set contains little data, and the created model has stronger prediction capability and stability.
In a possible implementation manner, in the step of acquiring, based on the target data set, a plurality of auxiliary data sets that satisfy a preset business similarity condition with the target data set, the preset business similarity condition includes:
each auxiliary data set has the same predictor variables available for creating the business prediction model as the target data set; and
the sample data of each auxiliary data set includes a business state label of the business object.
In a possible implementation manner, the step of extracting sample data from the plurality of auxiliary data sets to obtain a sample data set includes:
extracting the same preset number of sample data from each auxiliary data set to obtain the sample data set;
wherein the step of extracting the same preset number of sample data from each auxiliary data set comprises:
detecting whether the number of sample data in each auxiliary data set is greater than or equal to the preset number;
if the number is greater than or equal to the preset number, extracting the preset number of sample data from the auxiliary data set by sampling without replacement;
and if the number is smaller than the preset number, extracting the preset number of sample data from the auxiliary data set by sampling with replacement.
In a possible implementation manner, the step of determining a modeling data set from the sample data set based on the default probability of the target data set and each auxiliary data set includes:
taking the default probability of the target data set as basic data, taking the default probabilities of the plurality of auxiliary data sets as test data, and calculating the population stability index of each auxiliary data set according to the basic data and the test data;
and taking the auxiliary data set with the smallest population stability index as the modeling data set.
In a possible implementation manner, in the step of calculating the population stability index of each auxiliary data set according to the basic data and the test data, the basic data are grouped, and the test data are grouped using the same thresholds as the basic data, so that the number of groups of the basic data is the same as the number of groups of the test data;
the population stability index psi is calculated as follows:
$$\mathrm{psi} = \sum_{i=1}^{n} (A_i - E_i)\,\ln\frac{A_i}{E_i}$$
where n is the number of groups, i is the index of a group, A_i is the proportion of the test data samples that fall in the i-th group, and E_i is the proportion of the basic data samples that fall in the i-th group.
In one possible implementation manner, in the step of determining the weight parameter according to the target data set and the modeling data set, the formula for determining the weight parameter is as follows:
[Formula image 002 of the original publication; not reproduced in this text]
wherein β is a one-dimensional weight parameter array containing the weight parameters β_1, β_2, …, β_j; m is the number of samples of the modeling data set; x'_j is the j-th sample of the modeling data set; n is the number of samples of the target data set; x_i is the i-th sample of the target data set; φ represents Euler's formula; and the constraints of the quadratic programming are that β_1, β_2, …, β_j are each greater than or equal to 0 and that they sum to 1.
In a possible implementation manner, the step of creating the business prediction model based on the modeling data set and the weight parameter includes:
and taking the sample data in the modeling data set as a modeling sample, and taking the weight parameter as the weight of the sample data in the modeling data set to perform model creation to obtain the business prediction model.
In one possible implementation, the business state model and the business prediction model are logistic regression models.
In a second aspect, an embodiment of the present application further provides a device for creating a business prediction model, which is applied to a computer device, where the device includes:
the first acquisition module is used for acquiring a target data set of a service prediction model to be created;
the second acquisition module is used for acquiring a plurality of auxiliary data sets which meet the preset service similarity conditions with the target data set based on the target data set;
the sample extraction module is used for extracting sample data from the plurality of auxiliary data sets to obtain a sample data set;
the model training module is used for training according to the sample data set to obtain a service state model used for predicting the service state of the service object in the sample data;
the default probability prediction module is used for predicting the target data set and the plurality of auxiliary data sets by adopting the business state model to obtain default probabilities of the target data set and each auxiliary data set;
a modeling data set determination module for determining a modeling data set from the sample data set based on the default probabilities of the target data set and each auxiliary data set;
the weight parameter determining module is used for determining weight parameters according to the target data set and the modeling data set;
and the model creating module is used for creating the business prediction model based on the modeling data set and the weight parameters.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed, the computer is caused to execute the method for creating the service prediction model in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer device, where the computer device includes a processor, a computer-readable storage medium, and a communication unit, where the computer-readable storage medium, the communication unit, and the processor are connected through a bus system, the communication unit is configured to be communicatively connected to at least one terminal device, the computer-readable storage medium is configured to store a program, an instruction, or a code, and the processor is configured to execute the program, the instruction, or the code in the computer-readable storage medium, so as to implement the method for creating a business prediction model in the first aspect or any possible implementation manner of the first aspect.
Based on any one of the above aspects, a target data set of the business prediction model to be created is first acquired, and a plurality of auxiliary data sets similar to the target data set are found. Sample data is then drawn from the auxiliary data sets to obtain a sample data set, and a business state model is trained on that sample data set. The business state model is used to obtain the default probabilities of the target data set and of each auxiliary data set, and a modeling data set is determined based on those default probabilities. A weight parameter is then determined from the target data set and the modeling data set, and finally the business prediction model is created from the modeling data set and the weight parameter. Because the scheme uses auxiliary data sets similar to the target data set, screens out the modeling data set quantitatively, and adjusts the weights of the samples in the modeling data set, the samples in the modeling data set come closer to the samples of the business for which the prediction model is being created. The business prediction model can therefore be created even when the target data set contains little data, and the created model has stronger prediction capability and stability.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a method for creating a business prediction model according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the functional modules of a business prediction model creation apparatus according to an embodiment of the present application;
Fig. 3 is a schematic hardware structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application will now be described in detail with reference to the drawings, and the specific operations in the method embodiments may also be applied to the apparatus embodiments or the system embodiments.
In the prior art, one possible solution to the technical problems described in the background is to model with sample data from other, relatively mature businesses. However, because business content differs, a business prediction model built directly on sample data from other relatively mature businesses suffers from poor prediction capability and poor stability.
Taking the credit business of a financial institution as an example, a business prediction model is usually used to predict the probability that a business object (customer) defaults (the business state). The business prediction model used in the credit approval stage is usually called an application scoring model, and its scoring result generally serves as a basis for approving or rejecting an application. However, developing a business prediction model generally requires a large amount of sample data. In the early stage of a newly launched credit business (such as a large loan business), the group of admitted business objects contains few samples, post-loan repayment performance is insufficient (the prediction labels of the samples are not yet clear), and usable post-loan samples are lacking, so either the model cannot be developed or the developed model is biased in its predictions and unstable in its performance.
To overcome the deficiencies of the foregoing technical solutions, the inventor provides the following solution. Referring to Fig. 1, Fig. 1 is a schematic flowchart of a business prediction model creation method provided in an embodiment of the present application. The method provided in this embodiment may be executed by a computer device. For convenience of explanation, the method is described in detail below in conjunction with one possible application scenario, a financial loan scenario. It is understood that the technical solution provided in the present application may also be applied to other scenarios, for example, product information promotion based on big data.
The flow steps of the business prediction model creation method are explained in detail with reference to fig. 1.
Step S11, acquiring a target data set of the business prediction model to be created.
In this step, the service prediction model to be created may be a model for performing service prediction on a new service, where the new service refers to a service in which the service development time is less than a preset time (e.g., 3 months), and the new service may also refer to a service in which the number of sample data generated in the service scene is less than a preset number (e.g., 1000). The target data set refers to a set of sample data generated in a new service scenario.
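The two criteria above can be sketched as a small check. This is only an illustration: the `Service` record, its field names, and the reading of the two criteria as alternatives (either one is enough) are assumptions, and the thresholds are the example values from the text.

```python
from dataclasses import dataclass

# Example presets from the text; both are configurable in practice.
PRESET_MONTHS = 3
PRESET_SAMPLE_COUNT = 1000


@dataclass
class Service:
    """Hypothetical record describing one business line."""
    months_in_operation: float
    sample_count: int


def is_new_service(service: Service) -> bool:
    """Treat the service as new if either criterion holds (assumed reading)."""
    return (service.months_in_operation < PRESET_MONTHS
            or service.sample_count < PRESET_SAMPLE_COUNT)
```

For instance, `is_new_service(Service(months_in_operation=2, sample_count=5000))` is true because the service has been running for less than three months.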
Step S12, acquiring, based on the target data set, a plurality of auxiliary data sets that satisfy a preset business similarity condition with the target data set.
In this embodiment of the present application, the preset business similarity condition may include:
each auxiliary data set has the same predictor variables available for creating the business prediction model as the target data set; and
the sample data of each auxiliary data set includes a business state label of the business object.
Taking the financial loan scenario as an example, the business similarity conditions satisfied by the auxiliary data sets S_1, S_2, …, S_n and the target data set S_0 may be as follows:
the auxiliary data sets S_1, S_2, …, S_n and the target data set S_0 share some independent variable (also called predictor variable) fields available for modeling, such as the borrower's basic information and derived fields of the credit report; and
each auxiliary data set S_1, S_2, …, S_n has good/bad customer labels for modeling, generated from post-loan repayment performance (the business state), that is, a dependent variable (also called response variable or target variable). Because the business behind the target data set S_0 has been running for only a short time and its post-loan repayment performance is insufficient, the target data set S_0 is not required to have good/bad customer labels.
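The similarity condition can be sketched as a filter over candidate data sets. The function names, the label field name `is_bad`, and the `min_shared` threshold are hypothetical; the text only requires shared predictor fields and a business state label in each auxiliary data set.

```python
def shared_predictor_fields(target_fields, aux_fields, label_field):
    """Predictor fields present in both data sets (the label is not a predictor)."""
    return (set(target_fields) & set(aux_fields)) - {label_field}


def meets_similarity_condition(target_fields, aux_fields,
                               label_field="is_bad", min_shared=1):
    """An auxiliary set qualifies if it carries a label and shares predictors."""
    has_label = label_field in aux_fields
    shared = shared_predictor_fields(target_fields, aux_fields, label_field)
    return has_label and len(shared) >= min_shared


def select_auxiliary_sets(target_fields, candidates, label_field="is_bad"):
    """candidates maps a data set name to its list of field names."""
    return [name for name, fields in candidates.items()
            if meets_similarity_condition(target_fields, fields, label_field)]
```

A candidate with matching predictor fields but no label field, or with a label but no shared predictors, is filtered out.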
Step S13, sample data is extracted from the plurality of auxiliary data sets to obtain a sample data set.
In the embodiment of the present application, the same preset number of sample data may be extracted from each auxiliary data set (S_1, S_2, …, S_n) to obtain a sample data set S.
In particular, the step of extracting the same preset number of sample data from each auxiliary data set (S_1, S_2, …, S_n) to obtain the sample data set S comprises:
detecting whether the number of sample data in each auxiliary data set (S_1, S_2, …, S_n) is greater than or equal to the preset number (e.g., 10000);
if the number is detected to be greater than or equal to the preset number, extracting the preset number of sample data from the auxiliary data set by sampling without replacement;
if the number is detected to be smaller than the preset number, extracting the preset number of sample data from the auxiliary data set by sampling with replacement.
In sampling without replacement, each unit drawn from the population is not put back after it has been examined and recorded, so the population shrinks by one unit with every draw and the probability of drawing each remaining unit changes from draw to draw. In sampling with replacement, individuals are drawn one at a time and each drawn individual is put back into the population before the next draw.
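The sampling rule described above can be sketched with the Python standard library. The function names and the seeded generator are illustrative assumptions; each auxiliary data set is taken to be a non-empty list of records.

```python
import random


def draw_samples(records, preset_count, rng):
    """Draw exactly preset_count records from one auxiliary data set."""
    if len(records) >= preset_count:
        # Enough data: sampling without replacement, no record is repeated.
        return rng.sample(records, preset_count)
    # Too little data: sampling with replacement, records may repeat.
    return rng.choices(records, k=preset_count)


def build_sample_set(aux_sets, preset_count, seed=0):
    """Concatenate equally sized draws from every auxiliary data set."""
    rng = random.Random(seed)
    sample_set = []
    for records in aux_sets:
        sample_set.extend(draw_samples(records, preset_count, rng))
    return sample_set
```

Drawing the same number of records from every auxiliary data set keeps any single large data set from dominating the sample data set S.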
Step S14, training according to the sample data set to obtain a business state model for predicting the business state of the business object in the sample data.
In the embodiment of the application, the sample data set S is used for training to obtain a business state model that predicts the repayment behavior of a business object (such as a loan customer), that is, whether the customer will be overdue in repayment.
Specifically, during model training the model parameters may be adjusted according to the difference between the label of the input sample data and the label that the model outputs for that sample data. When the two are substantially consistent, training ends and the trained business state model is obtained.
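As a minimal sketch of this training loop, the following fits a logistic regression by batch gradient descent, adjusting the parameters by the difference between the model output and the label. The learning rate, the fixed epoch count, and the function names are assumptions; a production system would more likely rely on an established library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b on feature rows X and 0/1 labels y."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Difference between the model output and the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_default_probability(w, b, x):
    """Predicted probability that the business object defaults."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

The same `predict_default_probability` call can then be applied to every sample of the target data set and of each auxiliary data set.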
Step S15, predicting the target data set and a plurality of auxiliary data sets by adopting a business state model to obtain default probabilities of the target data set and each auxiliary data set.
Specifically, the default probability of the target data set may be used as basic data, the default probabilities of the plurality of auxiliary data sets may be used as test data, and a population stability index of each auxiliary data set may be calculated according to the basic data and the test data;
the auxiliary data set with the smallest population stability index is taken as the modeling data set, where the population stability index measures how far the actual distribution (the test data) deviates from the expected distribution (the basic data).
In the embodiment of the application, the basic data are grouped, and the test data are grouped using the same thresholds as the basic data, so that the number of groups of the basic data is the same as the number of groups of the test data;
the population stability index psi is calculated as follows:
$$\mathrm{psi} = \sum_{i=1}^{n} (A_i - E_i)\,\ln\frac{A_i}{E_i}$$
where n is the number of groups, i is the index of a group, A_i is the proportion of the test data samples that fall in the i-th group, and E_i is the proportion of the basic data samples that fall in the i-th group. The population stability index of each auxiliary data set is recorded as psi_1, psi_2, …, psi_n.
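The population stability index calculation can be sketched as follows, using the standard definition, the sum over groups of (A_i - E_i) ln(A_i / E_i). The text does not say how the basic data are grouped, so equal-frequency grouping is assumed here, and the small `eps` floor that guards against empty groups is likewise an assumption.

```python
import math


def quantile_thresholds(base_scores, n_groups):
    """Equal-frequency cut points computed on the basic data (assumed scheme)."""
    ordered = sorted(base_scores)
    return [ordered[(len(ordered) * k) // n_groups] for k in range(1, n_groups)]


def group_proportions(scores, thresholds):
    """Share of scores falling in each group defined by the thresholds."""
    counts = [0] * (len(thresholds) + 1)
    for s in scores:
        counts[sum(1 for t in thresholds if s >= t)] += 1
    return [c / len(scores) for c in counts]


def population_stability_index(base_scores, test_scores, n_groups=10, eps=1e-6):
    """psi between the basic data (E_i) and the test data (A_i)."""
    thresholds = quantile_thresholds(base_scores, n_groups)
    expected = group_proportions(base_scores, thresholds)
    actual = group_proportions(test_scores, thresholds)
    psi = 0.0
    for a, e in zip(actual, expected):
        a, e = max(a, eps), max(e, eps)  # avoid log(0) for empty groups
        psi += (a - e) * math.log(a / e)
    return psi
```

Because the test data reuse the thresholds computed on the basic data, both sides always have the same number of groups, as the text requires.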
Step S16, based on the target data set and the default probability of each auxiliary data set, a modeling data set is determined from the sample data set.
The auxiliary data set corresponding to the minimum value among psi_1, psi_2, …, psi_n is taken as the modeling data set T.
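Selecting the modeling data set then reduces to taking the minimum. In this sketch the mapping from data set names to psi values is a hypothetical input, and the numbers in the usage note are made up for illustration.

```python
def select_modeling_data_set(psi_by_name):
    """Return the name of the auxiliary data set with the smallest psi."""
    return min(psi_by_name, key=psi_by_name.get)
```

With illustrative values such as `{"S1": 0.21, "S2": 0.04, "S3": 0.12}`, the function returns `"S2"`, the auxiliary data set whose default probability distribution is closest to that of the target data set.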
Step S17, determining a weight parameter according to the target data set and the modeling data set.
In the embodiment of the present application, the formula for determining the weight parameter is as follows:
[Formula image 002 of the original publication; not reproduced in this text]
wherein β is a one-dimensional weight parameter array containing the weight parameters β_1, β_2, …, β_j; m is the number of samples of the modeling data set T; x'_j is the j-th sample of the modeling data set T; n is the number of samples of the target data set S_0; x_i is the i-th sample of the target data set S_0; φ represents Euler's formula; and the constraints of the quadratic programming are that β_1, β_2, …, β_j are each greater than or equal to 0 and that they sum to 1.
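The formula itself is not reproduced in this text, so the following is only an assumed reading, not the patent's stated method: a kernel-mean-matching-style objective that chooses β on the probability simplex so that the weighted mean of φ(x'_j) over the modeling data set approaches the plain mean of φ(x_i) over the target data set. Treating φ as the feature map of a Gaussian kernel, the `gamma` parameter, and the Frank-Wolfe solver are all assumptions made for this sketch.

```python
import math


def rbf(u, v, gamma=1.0):
    """Gaussian kernel, standing in for the inner product of phi(u) and phi(v)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def match_weights(modeling, target, gamma=1.0, iters=500):
    """Minimise beta^T K beta - 2 kappa^T beta with beta >= 0 and sum(beta) = 1."""
    m, n = len(modeling), len(target)
    K = [[rbf(modeling[j], modeling[k], gamma) for k in range(m)]
         for j in range(m)]
    kappa = [sum(rbf(modeling[j], x, gamma) for x in target) / n
             for j in range(m)]
    beta = [1.0 / m] * m
    for t in range(iters):
        grad = [2.0 * sum(K[j][k] * beta[k] for k in range(m)) - 2.0 * kappa[j]
                for j in range(m)]
        # Frank-Wolfe step: move toward the simplex vertex with the lowest
        # gradient, which keeps beta non-negative and summing to 1.
        s = min(range(m), key=grad.__getitem__)
        step = 2.0 / (t + 2.0)
        beta = [(1.0 - step) * b for b in beta]
        beta[s] += step
    return beta
```

Under this reading, modeling samples that lie far from every target sample receive weights close to zero, which is the effect the text attributes to the weight parameter.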
Step S18, creating the business prediction model based on the modeling data set and the weight parameter.
In the embodiment of the application, sample data in the modeling data set is used as a modeling sample, and a weight parameter is used as the weight of the sample data in the modeling data set for model creation, so that the business prediction model is obtained.
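One way to use the weight parameter as sample weights is a weighted logistic regression, sketched below with plain gradient descent. The function names, learning rate, and epoch count are assumptions; the only idea taken from the text is that each modeling sample contributes to the fit in proportion to its weight.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(X, y, sample_weights, lr=0.2, epochs=3000):
    """Logistic regression in which each sample's error is scaled by its weight."""
    total = sum(sample_weights)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi, wi in zip(X, y, sample_weights):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = wi * (sigmoid(score) - yi)  # weighted residual
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def score_customer(w, b, x):
    """Predicted default probability from the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Samples with a weight of zero have no influence on the fitted parameters, so the model is driven by the modeling samples that most resemble the target data set.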
According to the business prediction model creation method provided by the embodiment of the application, auxiliary data sets similar to the target data set are used, the modeling data set is screened out quantitatively (the population stability index determines the modeling data set), and the sample weights in the modeling data set are adjusted (the weighted modeling samples are closer to the target customer group, which reduces sample bias and improves the model's prediction capability and stability). As a result, the samples in the modeling data set are closer to the samples of the business for which the prediction model is being created, the business prediction model can be created even when the target data set contains little data, and the created business prediction model has stronger prediction capability and stability.
Further, in the embodiment of the present application, the business state model and the business prediction model may each be a logistic regression model, a binary classification model, a random forest model, a gradient boosting decision tree model, or the like. Preferably, both are logistic regression models, which, compared with the other models, are more interpretable and can reduce the risk of overfitting.
Referring to Fig. 2, Fig. 2 is a schematic diagram of the functional modules of a business prediction model creation apparatus according to an embodiment of the present application. In this embodiment, the functional modules of the business prediction model creation apparatus 20 may be divided according to the method embodiment executed by the computer device; that is, the following functional modules of the apparatus 20 may be used to execute the method embodiments executed by the computer device. The business prediction model creation apparatus 20 may include a first obtaining module 21, a second obtaining module 22, a sample extraction module 23, a model training module 24, a default probability prediction module 25, a modeling data set determination module 26, a weight parameter determination module 27, and a model creation module 28. The functions of these modules are described in detail below.
The first obtaining module 21 is configured to obtain a target data set of a service prediction model to be created.
The service prediction model to be created may be a model for performing service prediction on a new service, where the new service refers to a service in which the service development time is less than a preset time (e.g., 3 months), and the new service may also refer to a service in which the number of sample data generated in the service scenario is less than a preset number (e.g., 1000). The target data set refers to a set of sample data generated in a new service scenario.
A second obtaining module 22, configured to obtain, based on the target data set, multiple auxiliary data sets that satisfy a preset service similarity condition with the target data set.
In this embodiment of the present application, the presetting of the service similarity condition may include:
each auxiliary data set has the same predictor variables available for creating the business prediction model as the target data set; and the combination of (a) and (b),
the sample data of each auxiliary data set comprises a business state label of the business object.
Using a financial loan scenario as an example, the service similarity conditions satisfied by the auxiliary data sets S_1, S_2, …, S_n and the target data set S_0 may be as follows:
the auxiliary data sets S_1, S_2, …, S_n and the target data set S_0 share some of the same independent-variable (also called predictor) fields available for modeling, such as the borrower's basic information and fields derived from the credit report; and
the auxiliary data sets S_1, S_2, …, S_n carry good/bad customer labels for modeling, generated from post-loan repayment performance (the business state); these labels are the dependent variable (also called the response variable or target variable). Because the business behind the target data set S_0 has been running only a short time and its post-loan repayment performance is insufficient, the target data set S_0 may lack good/bad customer labels.
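The two similarity conditions can be checked mechanically. The following is a minimal sketch, assuming each data set is described by a list of column names; the function name `satisfies_similarity`, the `min_shared` threshold, and the example column names are hypothetical and do not come from the patent.

```python
def satisfies_similarity(aux_columns, target_columns, label_column, min_shared=1):
    """Return the shared predictor fields if the auxiliary set qualifies, else None.

    Assumed conditions (hypothetical encoding of the two rules above):
    1. the auxiliary set shares at least `min_shared` predictor fields
       with the target set;
    2. the auxiliary set contains the business state label column.
    """
    predictors_aux = set(aux_columns) - {label_column}
    shared = predictors_aux & set(target_columns)
    has_label = label_column in aux_columns
    if has_label and len(shared) >= min_shared:
        return sorted(shared)
    return None


# Hypothetical column names for illustration only.
target_cols = ["age", "income", "credit_inquiries"]
aux_cols = ["age", "income", "credit_inquiries", "region", "is_bad"]
shared_fields = satisfies_similarity(aux_cols, target_cols, "is_bad")
```

An auxiliary set without the label column, or with no predictor in common with the target set, yields `None` and would be excluded.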
A sample extraction module 23, configured to extract sample data from the multiple auxiliary data sets to obtain a sample data set.
In the embodiment of the present application, the same preset number of sample data may be extracted from each auxiliary data set (S_1, S_2, …, S_n) to obtain a sample data set S.
Specifically, the step of extracting the same preset number of sample data from each auxiliary data set (S_1, S_2, …, S_n) to obtain the sample data set S includes:
detecting whether the number of sample data in each auxiliary data set (S_1, S_2, …, S_n) is greater than or equal to the preset number (e.g., 10000 pieces);
if the number is greater than or equal to the preset number, extracting the preset number of sample data from that auxiliary data set by sampling without replacement;
if the number is less than the preset number, extracting the preset number of sample data from that auxiliary data set by sampling with replacement.
In sampling without replacement, each unit drawn from the population is recorded and not returned, so the population shrinks by one with every draw and the probability of a given unit being drawn changes from draw to draw. In sampling with replacement, each drawn unit is returned to the population before the next draw, so the same unit may be drawn more than once.
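The sampling rule can be sketched with the Python standard library. This is an illustrative sketch, not the patent's implementation: the function names `draw_samples` and `build_sample_set` and the fixed seed are assumptions, and each auxiliary data set is assumed to be a non-empty list of records.

```python
import random


def draw_samples(records, preset_count, rng=None):
    """Draw exactly `preset_count` records from one auxiliary data set."""
    rng = rng or random.Random()
    if len(records) >= preset_count:
        # Enough records: sample without replacement (no record repeats).
        return rng.sample(records, preset_count)
    # Too few records: sample with replacement (records may repeat).
    return rng.choices(records, k=preset_count)


def build_sample_set(aux_sets, preset_count, seed=0):
    """Concatenate equally sized draws from every auxiliary data set."""
    rng = random.Random(seed)
    sample_set = []
    for records in aux_sets:
        sample_set.extend(draw_samples(records, preset_count, rng))
    return sample_set
```

Drawing the same number of records from every auxiliary data set keeps any single large data set from dominating the combined sample data set S.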
A model training module 24, configured to train on the sample data set to obtain a service state model for predicting the service state of a service object in the sample data.
In the embodiment of the present application, the sample data set S is used to train a business state model that can predict the repayment behavior of a business object (such as a loan customer), that is, whether the customer will be overdue on repayment.
Specifically, during model training, the model parameters may be adjusted according to the difference between the actual label of each input sample and the label the model outputs for that sample. When the two are substantially consistent, training ends and the trained business state model is obtained.
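The training loop described above can be illustrated with a small logistic regression, which the claims name as one possible model type. The sketch below is an assumption-laden illustration: the function names, the learning rate, the 0.5 decision threshold, and the accuracy-based stopping rule (`target_accuracy`) are all hypothetical choices standing in for "substantially consistent".

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, row):
    """Predicted probability of the positive (e.g. overdue) label."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_state_model(rows, labels, lr=0.5, max_epochs=2000, target_accuracy=0.95):
    """Batch gradient descent; stops once predicted labels mostly match."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(max_epochs):
        grad_w, grad_b, correct = [0.0] * len(weights), 0.0, 0
        for row, y in zip(rows, labels):
            p = predict_proba(weights, bias, row)
            err = p - y  # difference between model output and actual label
            for k, x in enumerate(row):
                grad_w[k] += err * x
            grad_b += err
            correct += int((p >= 0.5) == (y == 1))
        if correct / n >= target_accuracy:
            break  # labels are "substantially consistent": stop training
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

In practice a held-out validation set would usually decide when to stop; the training-accuracy check here only mirrors the wording of the paragraph above.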
A default probability prediction module 25, configured to predict the target data set and the multiple auxiliary data sets by using the service state model, so as to obtain default probabilities of the target data set and each auxiliary data set.
Specifically, the default probabilities of the target data set may be used as basic data and the default probabilities of each auxiliary data set as test data, and a population stability indicator may be calculated for each auxiliary data set from the basic data and the test data;
the auxiliary data set with the smallest population stability indicator is taken as the modeling data set, where the population stability indicator measures how far the distribution of the test data deviates from the distribution of the basic data.
In the embodiment of the application, the basic data are divided into groups, and the test data are divided into groups using the same thresholds as the basic data, so that the number of groups is the same for both;
the population stability indicator psi is calculated as follows:
$$\mathrm{psi} = \sum_{i=1}^{n} (A_i - E_i)\,\ln\left(\frac{A_i}{E_i}\right)$$
where n is the number of groups, i is the index of a group, A_i is the proportion of test-data samples falling in the ith group, and E_i is the proportion of basic-data samples falling in the ith group. The population stability indicators of the auxiliary data sets are recorded as psi_1, psi_2, …, psi_n.
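A population stability indicator can be computed in a few lines. The sketch below assumes the commonly used definition, psi = Σ (A_i − E_i) · ln(A_i / E_i), with group thresholds taken from the basic data only. The equal-frequency grouping, the default of 10 groups, and the small floor `eps` that avoids taking the logarithm of zero are assumptions not stated in the text.

```python
import math


def bin_edges(base_scores, n_bins):
    """Equal-frequency cut points derived from the basic data only."""
    ordered = sorted(base_scores)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def proportions(scores, edges, eps=1e-6):
    """Share of scores per group, floored at eps so the logarithm is defined."""
    counts = [0] * (len(edges) + 1)
    for s in scores:
        idx = sum(1 for e in edges if s >= e)  # index of the group containing s
        counts[idx] += 1
    total = len(scores)
    return [max(c / total, eps) for c in counts]


def psi(base_scores, test_scores, n_bins=10):
    """Population stability indicator of test_scores relative to base_scores."""
    edges = bin_edges(base_scores, n_bins)
    expected = proportions(base_scores, edges)  # E_i, from the basic data
    actual = proportions(test_scores, edges)    # A_i, from the test data
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Identical score distributions give a value of zero, and the value grows as the test distribution drifts away from the basic distribution.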
A modeling data set determining module 26, configured to determine a modeling data set from the sample data set based on the default probability of the target data set and each auxiliary data set.
The auxiliary data set corresponding to the minimum value among psi_1, psi_2, …, psi_n is taken as the modeling data set T.
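Selecting the modeling data set then reduces to taking a minimum. In this short sketch the data set names and indicator values are made up for illustration, and `select_modeling_set` is a hypothetical helper name.

```python
def select_modeling_set(psi_by_dataset):
    """Return the name and indicator of the data set with the smallest value."""
    name = min(psi_by_dataset, key=psi_by_dataset.get)
    return name, psi_by_dataset[name]


# Hypothetical indicator values for three auxiliary data sets.
psi_values = {"S1": 0.21, "S2": 0.04, "S3": 0.12}
chosen = select_modeling_set(psi_values)  # ('S2', 0.04)
```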
A weight parameter determining module 27, configured to determine a weight parameter according to the target data set and the modeling data set.
In this embodiment, the formula used by the weight parameter determining module 27 to determine the weight parameter may be as follows:
[Formula image 375505DEST_PATH_IMAGE002: quadratic-programming objective for the weight parameter β; not reproduced in this text]
wherein β is a one-dimensional weight parameter array comprising the weight parameters β_1, β_2, …, β_j; m is the number of samples in the modeling data set T; x'_j is the jth sample of the modeling data set T; n is the number of samples in the target data set S_0; x_i is the ith sample of the target data set S_0; φ represents the Euler formula; and the quadratic-programming constraints are that β_1, β_2, …, β_j are each greater than or equal to 0 and that they sum to 1.
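The objective itself appears only as a formula image, so the following sketch rests on stated assumptions rather than on the patent's exact formula. It assumes a mean-matching objective, minimizing ‖Σ_j β_j x'_j − (1/n) Σ_i x_i‖², takes φ to be the identity map, and replaces a quadratic-programming solver with projected gradient descent onto the constraint set (β_j ≥ 0, Σ β_j = 1). All function names, the learning rate, and the step count are hypothetical, and features are assumed to be scaled to a small range.

```python
def project_to_simplex(v):
    """Euclidean projection onto {beta : beta_j >= 0, sum(beta) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            theta = candidate  # keep the threshold of the largest valid k
    return [max(x - theta, 0.0) for x in v]


def fit_weights(modeling_rows, target_rows, lr=0.1, steps=500):
    """Weights that pull the weighted modeling mean toward the target mean."""
    m, d = len(modeling_rows), len(target_rows[0])
    target_mean = [sum(r[k] for r in target_rows) / len(target_rows) for k in range(d)]
    beta = [1.0 / m] * m  # start from uniform weights
    for _ in range(steps):
        mix = [sum(b * r[k] for b, r in zip(beta, modeling_rows)) for k in range(d)]
        residual = [mix[k] - target_mean[k] for k in range(d)]
        # Gradient of the squared distance with respect to each beta_j.
        grad = [2.0 * sum(r[k] * residual[k] for k in range(d)) for r in modeling_rows]
        beta = project_to_simplex([b - lr * g for b, g in zip(beta, grad)])
    return beta
```

With modeling samples at 0 and 1 and a target mean of 0.9, the weights settle near 0.1 and 0.9, so modeling samples that resemble the target data receive more weight. A dedicated quadratic-programming solver would be the more usual choice for larger data sets.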
A model creation module 28, configured to create the business prediction model based on the modeling dataset and the weight parameter.
In the embodiment of the application, sample data in the modeling data set is used as a modeling sample, and a weight parameter is used as the weight of the sample data in the modeling data set for model creation, so that the business prediction model is obtained.
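Using the weight parameter as a per-sample weight means each modeling sample contributes to the loss in proportion to its weight. The sketch below assumes a logistic regression model (one of the model types named in the claims) fitted by gradient descent on a weighted log-loss; the function name `fit_weighted_logistic`, the learning rate, and the epoch count are hypothetical.

```python
import math


def fit_weighted_logistic(rows, labels, sample_weights, lr=0.5, epochs=500):
    """Logistic regression where sample j contributes with weight sample_weights[j]."""
    coef, intercept = [0.0] * len(rows[0]), 0.0
    total = sum(sample_weights)
    for _ in range(epochs):
        grad_c, grad_i = [0.0] * len(coef), 0.0
        for row, y, w in zip(rows, labels, sample_weights):
            z = intercept + sum(c * x for c, x in zip(coef, row))
            # Clip z to keep math.exp within a safe range.
            p = 1.0 / (1.0 + math.exp(-max(min(z, 30.0), -30.0)))
            for k, x in enumerate(row):
                grad_c[k] += w * (p - y) * x
            grad_i += w * (p - y)
        coef = [c - lr * g / total for c, g in zip(coef, grad_c)]
        intercept -= lr * grad_i / total
    return coef, intercept
```

For example, two otherwise identical samples with labels 1 and 0 and weights 0.9 and 0.1 lead to a fitted probability close to 0.9, whereas equal weights leave it at 0.5.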
It should be noted that the division of the modules in the above apparatus or system is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity or may be physically separate. The modules may all be implemented as software (e.g., open source software) invoked by a processor, or all in hardware, or some as software invoked by a processor and the rest in hardware. For example, the model creation module 28 may be implemented by a single processor; for example, it may be stored as program code in a memory of the apparatus or system, and a processor of the apparatus or system calls the code and executes the functions of the model creation module 28. The other modules are implemented similarly and are not described again here. In addition, the modules may be wholly or partially integrated together or implemented independently. The processor described herein may be an integrated circuit with signal processing capability, and in the implementation process each step or module in the above technical solutions may be carried out by an integrated logic circuit in the processor or by an executed software program.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a hardware structure of a computer device 10 for implementing the business prediction model creating method according to the embodiment of the present disclosure, where the computer device 10 may be implemented on a cloud server. As shown in fig. 3, computer device 10 may include a processor 11, a computer-readable storage medium 12, a bus 13, and a communication unit 14.
In a specific implementation, at least one processor 11 executes the computer-executable instructions stored in the computer-readable storage medium 12 (for example, the modules included in the business prediction model creation apparatus 20 shown in fig. 2), so that the processor 11 can execute the business prediction model creation method of the above method embodiments. The processor 11, the computer-readable storage medium 12, and the communication unit 14 are connected through the bus 13, and the processor 11 may be used to control data reception and transmission by the communication unit 14.
For the specific implementation process of the processor 11, reference may be made to the above method embodiments executed by the computer device 10; the implementation principles and technical effects are similar and are not repeated here.
Computer-readable storage medium 12 may include random access memory and may also include non-volatile storage, such as at least one disk storage.
The bus 13 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single element in the figures of the present application, but this does not mean that there is only one bus or one type of bus.
In addition, an embodiment of the present application further provides a readable storage medium, where the readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the method for creating a business prediction model as above is implemented.
To sum up, in the business prediction model creation method, apparatus, and computer-readable storage medium provided in the embodiments of the present application, a target data set of the business prediction model to be created is first obtained, and a plurality of auxiliary data sets similar to the target data set are found. Sample data are then drawn from the auxiliary data sets to form a sample data set, and a business state model is trained on the sample data set. Next, the business state model is used to obtain the default probabilities of the target data set and of each auxiliary data set, and a modeling data set is determined based on these default probabilities. A weight parameter is then determined based on the target data set and the modeling data set, and finally the business prediction model is created from the modeling data set and the weight parameter. By using auxiliary data sets similar to the target data set, selecting the modeling data set quantitatively, and adjusting the weights of the samples in the modeling data set, the scheme brings the modeling samples closer to the samples of the business for which the prediction model is to be created. The business prediction model can therefore be created even when the target data set is small, and the resulting model has stronger prediction capability and stability.
The embodiments described above are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the scope of the application, but is merely representative of selected embodiments of the application. Based on this, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A business prediction model creation method applied to a computer device, the method comprising:
acquiring a target data set of a service prediction model to be created;
acquiring a plurality of auxiliary data sets meeting preset service similar conditions with the target data set based on the target data set;
extracting sample data from the plurality of auxiliary data sets to obtain a sample data set;
training according to the sample data set to obtain a service state model for predicting the service state of the service object in the sample data;
predicting the target data set and a plurality of auxiliary data sets by adopting the service state model to obtain default probabilities of the target data set and each auxiliary data set;
determining a modeling data set from the sample data set based on the default probabilities of the target data set and each auxiliary data set;
determining a weight parameter according to the target data set and the modeling data set;
creating the business prediction model based on the modeling dataset and the weight parameters.
2. The business prediction model creation method according to claim 1, wherein in the step of obtaining a plurality of auxiliary data sets satisfying a preset service similarity condition with the target data set based on the target data set, the preset service similarity condition includes:
each auxiliary data set has the same predictor variables available for creating the business prediction model as the target data set; and
the sample data of each auxiliary data set comprises a business state label of the business object.
3. The method for creating a business prediction model according to claim 1, wherein the step of extracting sample data from the plurality of auxiliary data sets to obtain a sample data set comprises:
extracting the same preset number of sample data from each auxiliary data set to obtain the sample data set;
wherein the step of extracting the same preset number of sample data from each auxiliary data set comprises:
detecting whether the number of sample data in each auxiliary data set is greater than or equal to the preset number;
if the number is greater than or equal to the preset number, extracting the preset number of sample data from that auxiliary data set by sampling without replacement;
and if the number is smaller than the preset number, extracting the preset number of sample data from that auxiliary data set by sampling with replacement.
4. The method for creating a business prediction model according to claim 1, wherein the step of determining a modeling data set from the sample data set based on the default probabilities of the target data set and each auxiliary data set comprises:
taking the default probabilities of the target data set as basic data, taking the default probabilities of the multiple auxiliary data sets as test data, and calculating the population stability indicator of each auxiliary data set according to the basic data and the test data;
and taking the auxiliary data set with the smallest population stability indicator as the modeling data set.
5. The business prediction model creation method according to claim 4, wherein in the step of calculating the population stability indicator for each auxiliary data set based on the basic data and the test data, the basic data are grouped, and the test data are grouped according to the threshold standard used for grouping the basic data, wherein the number of groups of the basic data is the same as the number of groups of the test data;
the population stability indicator psi is calculated as follows:
$$\mathrm{psi} = \sum_{i=1}^{n} (A_i - E_i)\,\ln\left(\frac{A_i}{E_i}\right)$$
where n is the number of groups, i is the index of a group, A_i is the proportion of test-data samples falling in the ith group, and E_i is the proportion of basic-data samples falling in the ith group.
6. The business prediction model creation method according to claim 5, wherein in the step of determining a weight parameter from the target data set and the modeling data set, the formula for determining the weight parameter is as follows:
[Formula image 258292DEST_PATH_IMAGE002: quadratic-programming objective for the weight parameter β; not reproduced in this text]
wherein β is a one-dimensional weight parameter array comprising the weight parameters β_1, β_2, …, β_j; m is the number of samples in the modeling data set; x'_j is the jth sample of the modeling data set; n is the number of samples in the target data set; x_i is the ith sample of the target data set; φ represents the Euler formula; and the quadratic-programming constraints are that β_1, β_2, …, β_j are each greater than or equal to 0 and that they sum to 1.
7. The business prediction model creation method of claim 6, wherein the step of creating the business prediction model based on the modeling dataset and the weight parameter comprises:
and taking the sample data in the modeling data set as a modeling sample, and taking the weight parameter as the weight of the sample data in the modeling data set to perform model creation to obtain the business prediction model.
8. The method of creating a business prediction model of claim 7, wherein the business state model and the business prediction model are logistic regression models.
9. An apparatus for creating a business prediction model, applied to a computer device, the apparatus comprising:
the first acquisition module is used for acquiring a target data set of a service prediction model to be created;
the second acquisition module is used for acquiring a plurality of auxiliary data sets which meet the preset service similarity conditions with the target data set based on the target data set;
the sample extraction module is used for extracting sample data from the plurality of auxiliary data sets to obtain a sample data set;
the model training module is used for training according to the sample data set to obtain a service state model used for predicting the service state of the service object in the sample data;
the default probability prediction module is used for predicting the target data set and the plurality of auxiliary data sets by adopting the business state model to obtain default probabilities of the target data set and each auxiliary data set;
a modeling data set determination module for determining a modeling data set from the sample data set based on the default probabilities of the target data set and each auxiliary data set;
the weight parameter determining module is used for determining weight parameters according to the target data set and the modeling data set;
and the model creating module is used for creating the business prediction model based on the modeling data set and the weight parameters.
10. A computer-readable storage medium having stored therein instructions that, when executed, cause a computer device to perform the business prediction model creation method of any of claims 1-8.
CN202111138614.2A 2021-09-27 2021-09-27 Business prediction model creation method and device and computer readable storage medium Active CN113837863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111138614.2A CN113837863B (en) 2021-09-27 2021-09-27 Business prediction model creation method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111138614.2A CN113837863B (en) 2021-09-27 2021-09-27 Business prediction model creation method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113837863A true CN113837863A (en) 2021-12-24
CN113837863B CN113837863B (en) 2023-12-29

Family

ID=78970723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111138614.2A Active CN113837863B (en) 2021-09-27 2021-09-27 Business prediction model creation method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113837863B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015168250A2 (en) * 2014-04-30 2015-11-05 Battelle Memorial Institute Decision support system for hospital quality assessment
CN109636243A (en) * 2019-01-03 2019-04-16 深圳壹账通智能科技有限公司 Model fault detection method, device, computer equipment and storage medium
CN110349012A (en) * 2019-07-12 2019-10-18 腾讯科技(深圳)有限公司 Data predication method and computer readable storage medium
CN110689427A (en) * 2019-10-12 2020-01-14 杭州绿度信息技术有限公司 Consumption stage default probability model based on survival analysis
CN110837931A (en) * 2019-11-08 2020-02-25 中国农业银行股份有限公司 Customer churn prediction method, device and storage medium
AU2020100709A4 (en) * 2020-05-05 2020-06-11 Bao, Yuhang Mr A method of prediction model based on random forest algorithm
CN112200667A (en) * 2020-11-30 2021-01-08 上海冰鉴信息科技有限公司 Data processing method and device and computer equipment
CN112241916A (en) * 2020-10-22 2021-01-19 北京大学 Personal credit risk default early warning method, device, equipment and storage medium
CN112288572A (en) * 2020-12-24 2021-01-29 上海冰鉴信息科技有限公司 Service data processing method and computer equipment
CN112488817A (en) * 2020-10-21 2021-03-12 上海旻浦科技有限公司 Financial default risk assessment method and system based on refusal inference
CN112785005A (en) * 2021-01-22 2021-05-11 中国平安人寿保险股份有限公司 Multi-target task assistant decision-making method and device, computer equipment and medium
CN112884092A (en) * 2021-04-28 2021-06-01 深圳索信达数据技术有限公司 AI model generation method, electronic device, and storage medium
CN113052512A (en) * 2021-05-12 2021-06-29 中国工商银行股份有限公司 Risk prediction method and device and electronic equipment
CN113051317A (en) * 2021-04-09 2021-06-29 上海云从企业发展有限公司 Data exploration method and system and data mining model updating method and system
CN113139687A (en) * 2021-04-25 2021-07-20 中国工商银行股份有限公司 Method and device for predicting default of credit card user

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
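周翔;张文宇;江业峰: "Research on a prediction model for personal credit default", Journal of University of Science and Technology Liaoning, no. 03 *
张涛: "Comparative verification of online credit default identification under different classification models", China Master's Theses Full-text Database (Basic Sciences), no. 7 *
张涛: "Credit evaluation method for small and micro enterprises based on a sample-dependent cost matrix", Journal of Tongji University (Natural Science), vol. 48, no. 1 *
童佳庆: "Research on a machine-learning-based prediction model for consumer credit default probability", China Master's Theses Full-text Database (Basic Sciences), no. 2 *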

Also Published As

Publication number Publication date
CN113837863B (en) 2023-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant