CN114139595A - Scoring card model training method and device - Google Patents

Scoring card model training method and device

Info

Publication number
CN114139595A
CN114139595A (application CN202111163339.XA)
Authority
CN
China
Prior art keywords
preset number
variables
training
card model
scoring card
Prior art date
Legal status
Pending
Application number
CN202111163339.XA
Other languages
Chinese (zh)
Inventor
李琨
郑方兰
田江
向小佳
丁永建
李璠
Current Assignee
Everbright Technology Co ltd
Original Assignee
Everbright Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Everbright Technology Co ltd filed Critical Everbright Technology Co ltd
Priority to CN202111163339.XA
Publication of CN114139595A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof


Abstract

The invention provides a scoring card model training method and device. The method includes: acquiring a preset number of training samples; performing feature selection on the preset number of samples to obtain a preset number of feature variables; and, in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model while simultaneously adjusting the sample weights of the feature variables, to obtain the trained scoring card model. This addresses the problem in the related art that scoring card modeling methods incorporating transfer learning severely limit the applicable scope of the model: by adjusting the model parameters and the training-sample weights together during training, the resulting scoring card model is more stable and applicable to a wider range of scenarios.

Description

Scoring card model training method and device
Technical Field
The invention relates to the field of data processing, and in particular to a scoring card model training method and device.
Background
The standard scoring card model is an interpretable model long established in financial practice: it has a long history of application in credit risk control and is also widely used in operations and marketing prediction scenarios. Its modeling process is a complete pipeline comprising variable discretization (such as binning of continuous variables), weight-of-evidence encoding preprocessing, feature selection, logistic regression, and interpretability verification. The resulting model has good interpretability: for all samples, the same feature value has the same interpretation with respect to the model's predictions.
However, the key step of the standard, well-interpretable scoring card model, logistic regression, essentially looks for correlations between the input features and the output, and correlated features alone are not a reliable basis for decisions. In fact, correlations caused by sample selection bias are not guaranteed to persist; only correlations arising from causality are guaranteed to hold stably across environments and to be explainable.
The related art proposes a scoring card modeling method that incorporates transfer learning, which can use prior knowledge from an application-scenario dataset to reduce, to some extent, the correlation effects caused by sample selection bias. However, it provides no mechanism for resolving correlations caused by confounding variables, and it must rely on the availability of a scenario dataset, which in practice severely limits the applicable scope of the modeling method.
For the problem in the related art that scoring card modeling methods incorporating transfer learning severely limit the applicable scope of the model, no solution has yet been proposed.
Disclosure of Invention
Embodiments of the invention provide a scoring card model training method and device, which at least solve the problem in the related art that scoring card modeling methods incorporating transfer learning severely limit the applicable scope of the model.
According to an embodiment of the present invention, there is provided a score card model training method, including:
acquiring a preset number of training samples;
respectively selecting the characteristics of the samples with the preset number to obtain the characteristic variables with the preset number;
and in the process of training the scoring card model according to the preset number of characteristic variables, adjusting the parameters of the scoring card model, and simultaneously adjusting the sample weights of the preset number of characteristic variables to obtain the trained scoring card model.
Optionally, in the process of training the scoring card model according to the preset number of feature variables, adjusting parameters of the scoring card model, and adjusting sample weights of the preset number of feature variables at the same time to obtain the trained scoring card model includes:
in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model iteratively while simultaneously adjusting the sample weights of the preset number of feature variables, and stopping the iteration when the following conditions are met, to obtain the trained scoring card model:
$$\sum_{j}\left\|\frac{X_{-j}^{T}(W\odot X_{j})}{W^{T}X_{j}}-\frac{X_{-j}^{T}\left(W\odot(1-X_{j})\right)}{W^{T}(1-X_{j})}\right\|_{2}^{2}\leq\gamma_{1}$$

$$\|W\|_{2}^{2}\leq\gamma_{2}$$

$$W\succeq 0,\qquad\left(\sum_{i=1}^{n}W_{i}-1\right)^{2}\leq\gamma_{5}$$

$$\|\beta\|_{1}\leq\gamma_{4}$$
where X_{-j} = X \ X_j denotes the feature matrix X with its j-th column set to zero, X_j is the j-th column of X, ⊙ is the Hadamard (element-wise) product, β is the parameter vector of the scoring card model, W is the sample-weight vector, and γ_1, γ_2, γ_3, γ_4, γ_5 are preset constants.
Optionally, before the feature selection is performed on the preset number of samples respectively to obtain the preset number of feature variables, the method further includes:
performing variable discretization on the continuous variables in the preset number of training samples by binning, to obtain binned discrete variables;
and leaving the discrete/categorical variables in the preset number of training samples unprocessed.
Optionally, before the feature selection is performed on the preset number of samples respectively to obtain the preset number of feature variables, the method further includes:
and respectively carrying out evidence weight coding on the training samples with the preset number.
Optionally, the performing evidence weight coding on the preset number of training samples respectively includes:
determining, according to the binary classification labels of the samples of the target variable in each interval or at each distinct value, a first sample-count ratio between the label-0 and label-1 samples under a single value of the feature, and a second sample-count ratio between the label-0 and label-1 samples over all the samples;
and determining the encoded value of the weight-of-evidence encoding according to the ratio of the first sample-count ratio to the second sample-count ratio.
Optionally, respectively performing feature selection on the preset number of samples, and obtaining the preset number of feature variables includes:
carrying out binarization processing on the preset number of characteristic variables to obtain the preset number of binary variables;
and respectively carrying out feature selection on the preset number of the binary variables to obtain the preset number of the feature variables.
According to another embodiment of the present invention, there is also provided a score card model training apparatus including:
the acquisition module is used for acquiring a preset number of training samples;
the characteristic selection module is used for respectively carrying out characteristic selection on the samples with the preset number to obtain the characteristic variables with the preset number;
and the training module is used for adjusting the parameters of the scoring card model and simultaneously adjusting the sample weights of the preset number of characteristic variables to obtain the trained scoring card model in the process of training the scoring card model according to the preset number of characteristic variables.
Optionally, the training module is further configured to:
in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model iteratively while simultaneously adjusting the sample weights of the preset number of feature variables, and stopping the iteration when the following conditions are met, to obtain the trained scoring card model:
$$\sum_{j}\left\|\frac{X_{-j}^{T}(W\odot X_{j})}{W^{T}X_{j}}-\frac{X_{-j}^{T}\left(W\odot(1-X_{j})\right)}{W^{T}(1-X_{j})}\right\|_{2}^{2}\leq\gamma_{1}$$

$$\|W\|_{2}^{2}\leq\gamma_{2}$$

$$W\succeq 0,\qquad\left(\sum_{i=1}^{n}W_{i}-1\right)^{2}\leq\gamma_{5}$$

$$\|\beta\|_{1}\leq\gamma_{4}$$
where X_{-j} = X \ X_j denotes the feature matrix X with its j-th column set to zero, X_j is the j-th column of X, ⊙ is the Hadamard (element-wise) product, β is the parameter vector of the scoring card model, W is the sample-weight vector, and γ_1, γ_2, γ_3, γ_4, γ_5 are preset constants.
Optionally, the apparatus further comprises:
and the discretization module is used for performing variable discretization on the continuous variables in the preset number of training samples by binning to obtain binned discrete variables, and for leaving the discrete/categorical variables in the preset number of training samples unprocessed.
Optionally, the apparatus further comprises:
and the evidence weight encoding module is used for performing weight-of-evidence encoding on the preset number of training samples respectively.
Optionally, the evidence weight encoding module is further configured to:
determining, according to the binary classification labels of the samples of the target variable in each interval or at each distinct value, a first sample-count ratio between the label-0 and label-1 samples under a single value of the feature, and a second sample-count ratio between the label-0 and label-1 samples over all the samples;
and determining the encoded value of the weight-of-evidence encoding according to the ratio of the first sample-count ratio to the second sample-count ratio.
Optionally, the feature selection module is further configured to:
carrying out binarization processing on the preset number of characteristic variables to obtain the preset number of binary variables;
and respectively carrying out feature selection on the preset number of the binary variables to obtain the preset number of the feature variables.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
According to the invention, a preset number of training samples are obtained; feature selection is performed on the preset number of samples to obtain a preset number of feature variables; and, in the process of training the scoring card model according to the preset number of feature variables, the parameters of the scoring card model are adjusted while the sample weights of the feature variables are adjusted simultaneously, to obtain the trained scoring card model. This solves the problem in the related art that scoring card modeling methods incorporating transfer learning severely limit the applicable scope of the model: by adjusting the model parameters and the training-sample weights together during training, the resulting scoring card model is more stable and applicable to a wider range of scenarios.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention without improperly limiting it. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal of a scorecard model training method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a scorecard model training method according to an embodiment of the invention;
FIG. 3 is a flow diagram of stationary learning modeling based on causal inference according to an embodiment of the present invention;
fig. 4 is a block diagram of a scorecard model training device according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Example 1
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of the mobile terminal of the scorecard model training method according to the embodiment of the present invention, and as shown in fig. 1, the mobile terminal may include one or more processors 102 (only one is shown in fig. 1) (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a memory 104 for storing data, and optionally, the mobile terminal may further include a transmission device 106 for communication function and an input/output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the scorecard model training method in the embodiment of the present invention, and the processor 102 executes the computer programs stored in the memory 104 to execute various functional applications and data processing, so as to implement the above-mentioned method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a scoring card model training method operating in the mobile terminal or the network architecture is provided, and fig. 2 is a flowchart of the scoring card model training method according to the embodiment of the present invention, as shown in fig. 2, the flowchart includes the following steps:
step S202, obtaining a preset number of training samples;
step S204, respectively selecting the characteristics of the samples with the preset number to obtain the characteristic variables with the preset number;
in an embodiment of the present invention, the step S204 may specifically include: carrying out binarization processing on the preset number of characteristic variables to obtain the preset number of binarization variables; and respectively carrying out feature selection on the preset number of the binary variables to obtain the preset number of the feature variables.
And step S206, in the process of training the scoring card model according to the preset number of characteristic variables, adjusting the parameters of the scoring card model, and simultaneously adjusting the sample weights of the preset number of characteristic variables to obtain the trained scoring card model.
In an embodiment of the present invention, the step S206 may specifically include:
in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model iteratively while simultaneously adjusting the sample weights of the preset number of feature variables, and stopping the iteration when the following conditions are met, to obtain the trained scoring card model:
$$\sum_{j}\left\|\frac{X_{-j}^{T}(W\odot X_{j})}{W^{T}X_{j}}-\frac{X_{-j}^{T}\left(W\odot(1-X_{j})\right)}{W^{T}(1-X_{j})}\right\|_{2}^{2}\leq\gamma_{1}$$

$$\|W\|_{2}^{2}\leq\gamma_{2}$$

$$W\succeq 0,\qquad\left(\sum_{i=1}^{n}W_{i}-1\right)^{2}\leq\gamma_{5}$$

$$\|\beta\|_{1}\leq\gamma_{4}$$
where X_{-j} = X \ X_j denotes the feature matrix X with its j-th column set to zero, X_j is the j-th column of X, ⊙ is the Hadamard (element-wise) product, β is the parameter vector of the scoring card model, W is the sample-weight vector, and γ_1, γ_2, γ_3, γ_4, γ_5 are preset constants.
Through steps S202 to S206, the problem in the related art that scoring card modeling methods incorporating transfer learning severely limit the applicable scope of the model is solved: because the weights of the training samples are adjusted together with the model parameters during training, the resulting scoring card model is more stable and applicable to a wider range of scenarios.
In an optional embodiment, before step S204, variable discretization is performed on the continuous variables in the preset number of training samples by binning, to obtain binned discrete variables; the discrete/categorical variables in the preset number of training samples are left unprocessed.
In another optional embodiment, before step S204, weight-of-evidence encoding is performed on the preset number of training samples. Specifically, according to the binary classification labels of the samples of the target variable in each interval or at each distinct value, a first sample-count ratio between the label-0 and label-1 samples under a single value of the feature and a second sample-count ratio between the label-0 and label-1 samples over all the samples are determined; the encoded value of the weight-of-evidence encoding is then determined according to the ratio of the first sample-count ratio to the second.
Embodiments of the invention adopt a stable scoring card modeling method based on causal inference: the input variables that are causally linked to the output variable are the features on which the model's decisions should depend. According to the theory of causal inference in statistics, a variable A has a causal relationship with a variable B if and only if, with all other variables held unchanged, changing the value of A changes the value of B. The key logistic regression step of the scoring card process is therefore adjusted to solve a new stable-learning optimization problem. Besides the model parameters of the original logistic regression problem, the adjustable variables in this problem include newly introduced sample weights. The optimization problem adds a new constraint: by adjusting the sample weights, the other features are expected to be distributed as if at random under both the intervention and non-intervention conditions corresponding to each binary feature, thereby eliminating, as far as possible, correlations caused by non-causal relationships. FIG. 3 is a flowchart of stable learning modeling based on causal inference according to an embodiment of the invention; as shown in FIG. 3, the flow includes:
step S301, performing box separation processing on the continuous variable to obtain a discrete variable, wherein the box separation process is to set a range- ∞ ≦ x for the continuous variable x1≤x2≤...≤xmAnd (5) less than or equal to + ∞, and unifying values in the interval. By the box separation processing, the influence of factors such as abnormal values, noise and the like on the model can be avoided, and the reliability of the model is improved. Here, the interval (x)j,xj+1]The selection of the upper and lower boundaries is a key in the box separation process, and various different selection methods can be used.
Step S302, weight-of-evidence (WOE) encoding of the discrete variables. WOE encoding is an important step in the standard scoring card model: using the binary classification labels of the samples of the target variable y in each interval (after binning of continuous variables) or at each distinct value (for discrete/categorical variables), the encoded value is obtained by taking the logarithm of the ratio between the label-1-to-label-0 sample-count ratio under a single feature value and the label-1-to-label-0 sample-count ratio over all samples. It directly captures how well a single feature value characterizes the target variable. When the value of feature x is the i-th value of a categorical/discrete variable, or falls in the i-th bin of a continuous variable, the specific calculation formula is:

$$WOE_{i}=\ln\left(\frac{bad_{i}/good_{i}}{bad/good}\right)$$

where bad is the number of samples with label 1 over the whole population, good is the number of samples with label 0 over the whole population, bad_i is the number of label-1 samples taking the i-th value (or falling in the i-th bin), and good_i is the number of label-0 samples taking the i-th value (or falling in the i-th bin).
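The WOE formula above can be sketched as follows. The smoothing constant `eps` is an added assumption (to guard against empty cells), not part of the patent text:

```python
import numpy as np

def woe_values(bin_idx: np.ndarray, y: np.ndarray, eps: float = 0.5):
    """WOE_i = ln((bad_i / bad) / (good_i / good)) per bin or distinct value,
    where label 1 is 'bad' and label 0 is 'good'. The additive smoothing eps
    for empty cells is an implementation choice."""
    bad = float((y == 1).sum())        # label-1 count over all samples
    good = float((y == 0).sum())       # label-0 count over all samples
    woe = {}
    for v in np.unique(bin_idx):
        mask = bin_idx == v
        bad_i = float((y[mask] == 1).sum()) + eps
        good_i = float((y[mask] == 0).sum()) + eps
        woe[int(v)] = float(np.log((bad_i / bad) / (good_i / good)))
    return woe

bins = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
codes = woe_values(bins, y)
```

Bins dominated by label-1 samples receive positive codes and bins dominated by label-0 samples negative ones, which is the single-value characterization capability the text describes.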
Step S303, binarizing the WOE-recoded variables to obtain binary variables;
Step S304, performing feature selection on the binary variables;
Step S305, solving optimization problem II on the feature-selected variables by stable learning: specifically, in the process of training the scoring card model on these variables, adjusting both the parameters of the scoring card model and the sample weights of the variables;
Step S306, determining whether the iteration-end condition is met (i.e., whether training of the scoring card model is finished, which may be decided from the loss function, specifically whether the loss function meets a preset condition); if yes, executing step S307, otherwise returning to step S304;
Step S307, outputting the scoring card model.
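The loop of steps S301 to S307 might be skeletonized as below. This is a heavily simplified, illustrative sketch: binning, WOE encoding, and feature selection (S301 to S304) are assumed already done, so `X_bin` is the final 0/1 feature matrix, and the stable-learning solve of S305 is reduced to plain logistic regression with fixed sample weights; the function name and learning-rate choice are hypothetical:

```python
import numpy as np

def train_scorecard_pipeline(X_bin, y, n_iter=50, lr=0.5, tol=1e-6):
    """Simplified skeleton of S305-S307: gradient descent on a weighted
    logistic loss, with an S306-style stop test on the loss. Labels y are
    0/1; sample weights W are held fixed at 1 in this sketch."""
    n, d = X_bin.shape
    W = np.ones(n)                 # sample weights; S305 would re-estimate these
    beta = np.zeros(d)
    prev_loss = np.inf
    for _ in range(n_iter):
        z = X_bin @ beta
        p = 1.0 / (1.0 + np.exp(-z))
        # numerically stable weighted cross-entropy
        loss = float(np.sum(W * (np.log1p(np.exp(-np.abs(z)))
                                 + np.maximum(z, 0) - y * z)))
        if abs(prev_loss - loss) < tol:   # S306: iteration-end condition
            break
        beta -= lr * (X_bin.T @ (W * (p - y)) / n)
        prev_loss = loss
    return beta                           # S307: output the trained parameters

X_demo = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
y_demo = np.array([0., 0., 1., 1.])
beta = train_scorecard_pipeline(X_demo, y_demo)
```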
For a dataset D = {X, Y}, X is the sample feature matrix and Y is the vector of sample labels corresponding to the target variable. The logistic regression step of the standard scoring card solves the following optimization problem I:
$$\min_{\beta}\ \sum_{i=1}^{n}\log\big(1+\exp(-y_{i}x_{i}\beta)\big)+\gamma_{3}\|\beta\|_{2}^{2}\qquad\text{s.t.}\quad\|\beta\|_{1}\leq\gamma_{4}$$

where β is the model parameter vector to be solved, x_i and y_i ∈ {−1, +1} are the features and label of sample i, and γ_3 and γ_4 are predefined hyper-parameters. That is, subject to β satisfying

$$\|\beta\|_{1}\leq\gamma_{4}$$

the objective

$$\sum_{i=1}^{n}\log\big(1+\exp(-y_{i}x_{i}\beta)\big)+\gamma_{3}\|\beta\|_{2}^{2}$$

is minimized.
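To make the penalty-function solution mentioned next concrete, here is a hedged gradient-descent sketch of optimization problem I. The hyper-parameter values and the quadratic form of the penalty are illustrative assumptions:

```python
import numpy as np

def solve_problem_I(X, y, gamma3=0.1, gamma4=5.0, mu=1.0, lr=0.1, n_iter=500):
    """Penalty-function sketch of problem I: minimize
    sum_i log(1 + exp(-y_i * x_i @ beta)) + gamma3 * ||beta||_2^2
    subject to ||beta||_1 <= gamma4, with the constraint moved into the
    objective as mu * max(0, ||beta||_1 - gamma4)^2 and the result solved
    by plain gradient descent. Labels y must be in {-1, +1}."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        m = y * (X @ beta)                        # margins y_i * x_i @ beta
        s = -y / (1.0 + np.exp(m))                # gradient of log(1+exp(-m_i))
        grad = X.T @ s + 2.0 * gamma3 * beta
        viol = max(0.0, np.abs(beta).sum() - gamma4)
        grad += 2.0 * mu * viol * np.sign(beta)   # penalty (sub)gradient
        beta -= lr * grad
    return beta

X = np.array([[1., 0.], [1., 1.], [-1., 0.], [-1., 1.]])
y = np.array([1., 1., -1., -1.])
beta_I = solve_problem_I(X, y)
```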
The optimization problem can be transformed by a standard penalty-function method and then solved iteratively by gradient descent, the standard method for unconstrained optimization problems. After the sample weight W is introduced, the following optimization problem II is solved:

$$\sum_{j}\left\|\frac{X_{-j}^{T}(W\odot X_{j})}{W^{T}X_{j}}-\frac{X_{-j}^{T}\left(W\odot(1-X_{j})\right)}{W^{T}(1-X_{j})}\right\|_{2}^{2}\leq\gamma_{1}$$

$$\|W\|_{2}^{2}\leq\gamma_{2}$$

$$W\succeq 0,\qquad\left(\sum_{i=1}^{n}W_{i}-1\right)^{2}\leq\gamma_{5}$$

$$\|\beta\|_{1}\leq\gamma_{4}$$

where X_j denotes the j-th column of X, X_{-j} = X \ X_j denotes X with its j-th column set to 0 and all other elements unchanged, and ⊙ denotes the Hadamard product of matrices, i.e., element-wise multiplication. Here W and β are the model parameters to be solved, and γ_1, γ_2, γ_3, γ_4, and γ_5 are predefined hyper-parameters. That is, subject to W and β satisfying the above conditions, the objective function

$$\sum_{i=1}^{n}W_{i}\log\big(1+\exp(-y_{i}x_{i}\beta)\big)+\gamma_{3}\|\beta\|_{2}^{2}$$

is minimized.
The solution of optimization problem II can likewise be transformed by a standard penalty-function method and solved iteratively by gradient descent, the standard method for unconstrained optimization problems. Because the definition of causality requires measuring the difference between the intervention and non-intervention environments, which is only possible for binarized variables, optimization problem II is effective only when the feature set X consists of binary variables; the overall flow of the stable scoring card modeling method based on causal inference is therefore adjusted to the flow of steps S301 to S307 above.
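A toy alternating-optimization sketch of how optimization problem II could be approached in penalty form: a β step on the W-weighted logistic loss, then a W step on the balancing term via a finite-difference gradient. This is illustrative only and far simpler than a production solver; the fixed penalty weight `lam`, the clipping, and the renormalization stand in for the γ_1/γ_2/γ_4/γ_5 constraints:

```python
import numpy as np

def balance_loss(W, X):
    """Balancing term: for each binary feature j, squared difference between
    the W-weighted means of the remaining features X_{-j} in the X_j = 1
    and X_j = 0 groups (the quantity bounded by gamma_1)."""
    total = 0.0
    for j in range(X.shape[1]):
        Xj = X[:, j]
        X_minus = X.copy()
        X_minus[:, j] = 0.0                    # X_{-j}: j-th column zeroed
        t, c = W * Xj, W * (1.0 - Xj)
        if t.sum() == 0.0 or c.sum() == 0.0:
            continue                           # one group empty: skip feature j
        diff = X_minus.T @ t / t.sum() - X_minus.T @ c / c.sum()
        total += float(diff @ diff)
    return total

def solve_problem_II(X, y, gamma3=0.1, lam=1.0, lr=0.05, n_iter=100):
    """Toy alternating scheme for problem II. Labels y must be in {-1, +1}."""
    n, d = X.shape
    W = np.ones(n)
    beta = np.zeros(d)
    for _ in range(n_iter):
        m = y * (X @ beta)                     # beta step: weighted logistic loss
        s = -(W * y) / (1.0 + np.exp(m))
        beta -= lr * (X.T @ s + 2.0 * gamma3 * beta)
        base, h, g = balance_loss(W, X), 1e-4, np.zeros(n)
        for i in range(n):                     # finite-difference gradient in W
            Wp = W.copy()
            Wp[i] += h
            g[i] = (balance_loss(Wp, X) - base) / h
        W = np.clip(W - lr * lam * g, 0.0, None)   # W >= 0
        W *= n / W.sum()                           # keep total weight fixed
    return beta, W

X2 = np.array([[1., 0., 1.], [1., 1., 0.], [1., 0., 0.],
               [0., 1., 1.], [0., 0., 1.], [0., 1., 0.]])
y2 = np.array([1., 1., 1., -1., -1., -1.])
beta_II, W2 = solve_problem_II(X2, y2)
```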
Example 2
According to another embodiment of the present invention, there is also provided a scorecard model training device, and fig. 4 is a block diagram of the scorecard model training device according to the embodiment of the present invention, as shown in fig. 4, including:
an obtaining module 42, configured to obtain a preset number of training samples;
a feature selection module 44, configured to perform feature selection on the preset number of samples respectively to obtain the preset number of feature variables;
and the training module 46 is configured to, during the process of training the scoring card model according to the preset number of feature variables, adjust parameters of the scoring card model, and adjust sample weights of the preset number of feature variables to obtain the trained scoring card model.
Optionally, the training module 46 is further configured to:
in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model iteratively while simultaneously adjusting the sample weights of the preset number of feature variables, and stopping the iteration when the following conditions are met, to obtain the trained scoring card model:
$$\sum_{j}\left\|\frac{X_{-j}^{T}(W\odot X_{j})}{W^{T}X_{j}}-\frac{X_{-j}^{T}\left(W\odot(1-X_{j})\right)}{W^{T}(1-X_{j})}\right\|_{2}^{2}\leq\gamma_{1}$$

$$\|W\|_{2}^{2}\leq\gamma_{2}$$

$$W\succeq 0,\qquad\left(\sum_{i=1}^{n}W_{i}-1\right)^{2}\leq\gamma_{5}$$

$$\|\beta\|_{1}\leq\gamma_{4}$$
where X_{-j} = X \ X_j denotes the feature matrix X with its j-th column set to zero, X_j is the j-th column of X, ⊙ is the Hadamard (element-wise) product, β is the parameter vector of the scoring card model, W is the sample-weight vector, and γ_1, γ_2, γ_3, γ_4, γ_5 are preset constants.
Optionally, the apparatus further comprises:
and the discretization module is used for performing variable discretization on the continuous variables in the preset number of training samples by binning to obtain binned discrete variables, and for leaving the discrete/categorical variables in the preset number of training samples unprocessed.
Optionally, the apparatus further comprises:
and the evidence weight encoding module is used for performing weight-of-evidence encoding on the preset number of training samples respectively.
Optionally, the evidence weight encoding module is further configured to:
according to the binary classification labels of the samples corresponding to the target variable in different intervals or at different values, determining a first sample-count proportion of labels 0 and 1 within a single value (bin) of the feature, and a second sample-count proportion of labels 0 and 1 over all samples;
and determining the encoded value of the evidence weight encoding according to the ratio of the first proportion to the second proportion.
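The evidence weight (WOE) encoding described above, i.e. comparing the per-bin label proportions with the whole-sample label proportions and taking the log of their ratio, can be sketched as follows; the smoothing constant `eps` is an assumption added to guard against empty bins.

```python
import numpy as np

def woe_encode(bin_idx, y):
    """Return a {bin index: WOE value} mapping for binary labels y."""
    eps = 0.5                               # assumed smoothing constant
    total_pos = (y == 1).sum()
    total_neg = (y == 0).sum()
    woe = {}
    for b in np.unique(bin_idx):
        mask = bin_idx == b
        pos = (y[mask] == 1).sum()          # first proportion: within-bin counts
        neg = (y[mask] == 0).sum()
        # log of (bin-level ratio) / (whole-sample ratio)
        woe[int(b)] = float(np.log(((pos + eps) / (total_pos + eps)) /
                                   ((neg + eps) / (total_neg + eps))))
    return woe
```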
Optionally, the feature selection module is further configured to:
carrying out binarization processing on the preset number of feature variables to obtain the preset number of binarized variables;
and performing feature selection on the preset number of binarized variables respectively to obtain the preset number of feature variables.
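A minimal sketch of the binarization and feature-selection steps, assuming one-hot indicator columns for the binarization and a simple variance threshold as the selection rule (the patent does not name a specific selection criterion):

```python
import numpy as np

def one_hot(bin_idx, n_bins):
    """Binarize a binned variable into 0/1 indicator columns."""
    out = np.zeros((len(bin_idx), n_bins))
    out[np.arange(len(bin_idx)), bin_idx] = 1.0
    return out

def select_by_variance(B, min_var=0.01):
    """Stand-in selection rule: drop near-constant indicator columns."""
    keep = B.var(axis=0) > min_var
    return B[:, keep], keep
```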
It should be noted that the above modules may be implemented by software or by hardware; in the latter case they may be implemented, for example but not exclusively, as follows: the modules are all located in the same processor, or the modules are located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, acquiring a preset number of training samples;
S2, performing feature selection on the preset number of samples respectively to obtain a preset number of feature variables;
and S3, in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model while simultaneously adjusting the sample weights of the preset number of feature variables, to obtain the trained scoring card model.
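Steps S1 to S3 can be strung together as below, under simplifying assumptions made for illustration only: synthetic samples, a correlation-threshold feature-selection rule, and a weighted logistic model standing in for the scoring card.

```python
import numpy as np

rng = np.random.default_rng(0)
# S1: obtain a preset number of training samples (synthetic here)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
# S2: feature selection (assumed rule: keep columns correlated with the label)
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
Xs = X[:, corr > 0.1]
# S3: adjust parameters while re-weighting samples each iteration
n, d = Xs.shape
beta, W = np.zeros(d), np.ones(n) / n
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-Xs @ beta))
    beta -= 0.1 * (Xs.T @ (W * (p - y)))    # weighted gradient step
    err = np.abs(p - y)
    W = err / err.sum()                     # assumed re-weighting rule
```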
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing a computer program.
Example 4
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a preset number of training samples;
S2, performing feature selection on the preset number of samples respectively to obtain a preset number of feature variables;
and S3, in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model while simultaneously adjusting the sample weights of the preset number of feature variables, to obtain the trained scoring card model.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device, and may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented in program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases the steps shown or described may be performed in an order different from that described herein. Alternatively, they may be fabricated separately as individual integrated circuit modules, or multiple of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description presents only preferred embodiments of the present invention and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the principle of the present invention shall fall within its protection scope.

Claims (10)

1. A scoring card model training method is characterized by comprising the following steps:
acquiring a preset number of training samples;
respectively selecting the characteristics of the samples with the preset number to obtain the characteristic variables with the preset number;
and in the process of training the scoring card model according to the preset number of characteristic variables, adjusting the parameters of the scoring card model, and simultaneously adjusting the sample weights of the preset number of characteristic variables to obtain the trained scoring card model.
2. The method of claim 1, wherein, in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model while simultaneously adjusting the sample weights of the preset number of feature variables to obtain the trained scoring card model comprises:
in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model iteratively while simultaneously adjusting the sample weights of the preset number of feature variables, and stopping the iteration when the following conditions are met, to obtain the trained scoring card model:
(The four stopping-condition formulas are reproduced only as images in the original publication, numbered FDA0003290589660000011 through FDA0003290589660000014; they express conditions on β and W in terms of X₋ⱼ, Xⱼ, Iⱼ and the preset constants γ₁ to γ₅.)
wherein X₋ⱼ = X \ Xⱼ (X with its j-th column removed), X is the matrix of feature variables, Xⱼ is the j-th column of X, Iⱼ is the j-th column of the identity matrix, β is the parameter vector of the scoring card model, W is the sample weight, and γ₁, γ₂, γ₃, γ₄, γ₅ are preset constants.
3. The method of claim 1, wherein before the feature selection is performed on the preset number of samples respectively to obtain the preset number of feature variables, the method further comprises:
carrying out variable discretization processing on the continuous variables in the preset number of training samples in a box dividing mode to obtain discrete variables after box dividing;
and not processing the discrete/classified variables in the preset number of training samples.
4. The method of claim 3, wherein before the feature selection is performed on the preset number of samples respectively to obtain the preset number of feature variables, the method further comprises:
and respectively carrying out evidence weight coding on the training samples with the preset number.
5. The method of claim 4, wherein the evidence weight encoding the preset number of training samples respectively comprises:
according to the binary classification labels of the samples corresponding to the target variable in different intervals or at different values, determining a first sample-count proportion of labels 0 and 1 within a single value (bin) of the feature, and a second sample-count proportion of labels 0 and 1 over all samples;
and determining the encoded value of the evidence weight encoding according to the ratio of the first proportion to the second proportion.
6. The method according to any one of claims 1 to 5, wherein performing feature selection on the preset number of samples to obtain the preset number of feature variables comprises:
carrying out binarization processing on the preset number of feature variables to obtain the preset number of binarized variables;
and performing feature selection on the preset number of binarized variables respectively to obtain the preset number of feature variables.
7. A scoring card model training device, comprising:
the acquisition module is used for acquiring a preset number of training samples;
the characteristic selection module is used for respectively carrying out characteristic selection on the samples with the preset number to obtain the characteristic variables with the preset number;
and the training module is configured to, in the process of training the scoring card model according to the preset number of feature variables, adjust the parameters of the scoring card model while simultaneously adjusting the sample weights of the preset number of feature variables, to obtain the trained scoring card model.
8. The apparatus of claim 7, wherein the training module is further configured to:
in the process of training the scoring card model according to the preset number of feature variables, adjusting the parameters of the scoring card model iteratively while simultaneously adjusting the sample weights of the preset number of feature variables, and stopping the iteration when the following conditions are met, to obtain the trained scoring card model:
(The four stopping-condition formulas are reproduced only as images in the original publication, numbered FDA0003290589660000031, FDA0003290589660000032, FDA0003290589660000041 and FDA0003290589660000042; they express conditions on β and W in terms of X₋ⱼ, Xⱼ, Iⱼ and the preset constants γ₁ to γ₅.)
wherein X₋ⱼ = X \ Xⱼ (X with its j-th column removed), X is the matrix of feature variables, Xⱼ is the j-th column of X, Iⱼ is the j-th column of the identity matrix, β is the parameter vector of the scoring card model, W is the sample weight, and γ₁, γ₂, γ₃, γ₄, γ₅ are preset constants.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 6 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.
CN202111163339.XA 2021-09-30 2021-09-30 Grading card model training method and device Pending CN114139595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111163339.XA CN114139595A (en) 2021-09-30 2021-09-30 Grading card model training method and device


Publications (1)

Publication Number Publication Date
CN114139595A true CN114139595A (en) 2022-03-04

Family

ID=80394128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111163339.XA Pending CN114139595A (en) 2021-09-30 2021-09-30 Grading card model training method and device

Country Status (1)

Country Link
CN (1) CN114139595A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018014786A1 (en) * 2016-07-21 2018-01-25 阿里巴巴集团控股有限公司 Modeling method and device for evaluation model
CN113159461A (en) * 2021-05-24 2021-07-23 天道金科股份有限公司 Small and medium-sized micro-enterprise credit evaluation method based on sample transfer learning
CN113393320A (en) * 2021-06-22 2021-09-14 中国工商银行股份有限公司 Enterprise financial service risk prediction method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018014786A1 (en) * 2016-07-21 2018-01-25 阿里巴巴集团控股有限公司 Modeling method and device for evaluation model
CN113159461A (en) * 2021-05-24 2021-07-23 天道金科股份有限公司 Small and medium-sized micro-enterprise credit evaluation method based on sample transfer learning
CN113393320A (en) * 2021-06-22 2021-09-14 中国工商银行股份有限公司 Enterprise financial service risk prediction method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SUN Quan; ZHAO Jintao: "A merchant risk scoring method and system based on data mining", Software Industry and Engineering, no. 01, 10 January 2016 (2016-01-10) *

Similar Documents

Publication Publication Date Title
US10360517B2 (en) Distributed hyperparameter tuning system for machine learning
CN109544197B (en) User loss prediction method and device
US11151480B1 (en) Hyperparameter tuning system results viewer
CN105159910A (en) Information recommendation method and device
CN111611488B (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN109214374A (en) Video classification methods, device, server and computer readable storage medium
US10963802B1 (en) Distributed decision variable tuning system for machine learning
CN112910690A (en) Network traffic prediction method, device and equipment based on neural network model
CN115577152B (en) Online book borrowing management system based on data analysis
CN113869521A (en) Method, device, computing equipment and storage medium for constructing prediction model
Rajan et al. Crowdcontrol: An online learning approach for optimal task scheduling in a dynamic crowd platform
CN110610346A (en) Intelligent office automation system workflow instance time prediction analysis
CN111797320A (en) Data processing method, device, equipment and storage medium
EP3745317A1 (en) Apparatus and method for analyzing time series data based on machine learning
CN113962160A (en) Internet card user loss prediction method and system based on user portrait
AU2020103207A4 (en) A novel method of introducing basic elementary disturbances for testing machine learning models
CN112231299A (en) Method and device for dynamically adjusting feature library
US20210356920A1 (en) Information processing apparatus, information processing method, and program
CN114139595A (en) Grading card model training method and device
CN116915710A (en) Traffic early warning method, device, equipment and readable storage medium
CN115660101A (en) Data service providing method and device based on service node information
CN110766231A (en) Crime prediction method and system based on multi-head neural network
CN113837481A (en) Financial big data management system based on block chain
CN116432776A (en) Training method, device, equipment and storage medium of target model
CN117332923B (en) Weighting method and system for netlike index system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination