CN113538131A - Method and device for modeling modular scoring card, storage medium and electronic equipment - Google Patents

Method and device for modeling modular scoring card, storage medium and electronic equipment

Info

Publication number
CN113538131A
Authority
CN
China
Prior art keywords
model
score
modeling
variables
card
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110833580.2A
Other languages
Chinese (zh)
Inventor
徐建华
石坤豪
朱珊珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd
Priority to CN202110833580.2A
Publication of CN113538131A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Finance (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Medical Informatics (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for modeling a modular score card, a storage medium and electronic equipment. The method comprises the following steps: establishing a sub-model for each data source, each sub-model being built only on the samples that the data source covers; because the single-variable IV (information value) of some weak data sources is very low and a scorecard model is hard to build on them, a machine learning model can generally be used; combining the sub-model scores into a wide table, to which basic information variables such as gender, province and age can be added in addition to the sub-model scores; since each data source produces only one sub-model score, the number of model-entry features is greatly compressed; and establishing a master scorecard model, in which samples with missing scores are handled with WOE (weight of evidence) coding and may be placed in a bin of their own, the choice being made according to the risk performance of the missing data. The master scorecard is then built according to the standard scorecard modeling process; the master scorecard model is generally built with the simpler and more direct logistic regression algorithm, which gives the scores stronger interpretability.

Description

Method and device for modeling modular scoring card, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of data analysis, in particular to a method and a device for modeling a modular score card, a storage medium and electronic equipment.
Background
Credit scoring techniques originated in the field of risk management in the United States. Risk management was first advocated in 1930 by the insurance division of the American Management Association and then spread rapidly through the banking industry. A bank collects credit data from the credit reporting system together with various data about the user on the Internet, including interpersonal relationships, historical consumption behavior, identity characteristics and the like, and predicts the customer's risk score with a logistic regression algorithm. An individual's total score equals the sum of the scores of the modeled characteristic variables; the score of each characteristic variable is determined by the scores of its characteristic items, which have different risk characteristics; and the score of a characteristic item is generally determined by its risk performance.
Scoring is a method of calculating a risk score for an applicant or an existing customer with a statistical model; the statistical model used is called a scorecard. The theoretical basis of the scorecard is that, provided the risk characteristics of historical customers are essentially the same as those of future applicants, the law of large numbers allows the probability that a future customer is good or bad to be predicted by analyzing the information on historical customers and the relation between their attributes and the good/bad event rate; this probability is then converted into a score for convenient use in the business.
A credit scorecard is built on the statistical results of a large amount of data and has high accuracy and reliability. To characterize customer risk from all angles, modeling generally uses multidimensional data sources. The common method is to take the sample table as the main table, join it with the feature table of each data source, and synthesize one large wide feature table. Because the coverage of each data source differs, however, the actual modeling sample consists of the samples covered by at least one data source.
Some samples will still not be associated with any feature, and these samples need to be removed before the data is fed to the model. Missing values can be filled with special values to distinguish them from normal values. This modeling method is simple and direct: once the feature table of each data source is prepared, a large wide table can be formed quickly and fed to the model for training. Feature screening generally looks only at indicators such as stability and feature importance, and complex steps such as WOE (weight of evidence) binning, correlation analysis and collinearity analysis need not be performed as for a traditional scorecard. The technical defect is that, if the features of all data domains are fed directly into a machine learning model and then coarsely screened by feature importance, a common problem appears: variables from data domains with relatively strong financial attributes firmly occupy the Top N positions, so that variables from some other data domains cannot enter the model at all. Moreover, if monitoring shows that a variable is no longer valid, or a certain type of information is suddenly lost, the model must be rebuilt with all variables.
On this basis, the present method builds a sub-model for each data source and then fuses the sub-models into a comprehensive model. Different modules use different algorithms according to how their data behaves, so the model algorithms are more diverse and the model effect is more accurate; the sub-models can be assembled flexibly and modeled collaboratively through division of labor, which improves modeling efficiency.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
In view of the above problems, it is necessary to provide a method, an apparatus, a storage medium and an electronic device for modeling a modular score card, in which a sub-model is built for each data source and the sub-models are then fused into a comprehensive model; different modules use different algorithms according to how their data behaves, so that the model algorithms are more diverse and the model effect is more accurate, and the sub-models are assembled flexibly and modeled collaboratively through division of labor, which improves modeling efficiency.
In order to solve the technical problems, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for modeling a modular score card, comprising the steps of:
establishing a sub-model for each data source: each sub-model is built only on the samples that the data source covers; because the single-variable IV (information value) of some weak data sources is very low and a scorecard model is therefore hard to build on them, a machine learning model can generally be used instead, so that the sub-model serves for information extraction and feature enhancement;
the sub-model is further established by the following steps:
binning and preliminary screening of the characteristic variables: WOE (weight of evidence) is generally used to measure how well the bins of a characteristic variable separate good customers from bad customers and how each bin performs in terms of risk, and IV (information value) is used to measure the predictive power of the characteristic variable after binning; characteristic variables whose IV is below 0.02 are generally not considered for entry into the model;
after the modeling data set is prepared, the full range of values of each characteristic variable is first divided into intervals (characteristic items) according to actual risk performance and the way the variable is applied in the business; this is called binning (also called characteristic variable grouping or column division); in determining the binning and the preliminary screening of the characteristic variables, the following four factors need to be balanced:
predictive statistical index test: the WOE of each bin should show a clear difference in risk between the groups of the characteristic variable, the trend of WOE across the groups should be consistent with business experience (taking account age as an example, risk decreases as the customer's account age increases), and the IV should not be lower than 0.02;
grouping stability test: the number of bins of a characteristic variable is between 2 and 8; if the binning is too fine, the population distribution of each group is too sensitive to change, which harms the stability of the model; if the binning is too coarse, useful information about changes in the characteristic variable is lost; grouping stability can be checked with the PSI (population stability index);
characteristic variable correlation and multicollinearity test: through univariate analysis, the 1 to 3 most predictive characteristic variables among similar variables are preferentially selected to enter the model, and combinations of characteristic variables that enter the model efficiently are screened with reference to the MC (marginal contribution) value in Model Builder;
business application test: bin breakpoints, the merging of characteristic values into groups, and the screening of model-entry variables must satisfy business practice, policy rules and application requirements; for example, a customer's account age must take the basic term of the product design into account: although the WOE may show that risk performance shortly after admission is relatively good, policy effects may require that this bin be forcibly merged with the bin having the worst WOE when groups are merged;
establishing a WOE initialization model: the WOE model is built on the basis of the binning and preliminary screening of the characteristic variables;
score fitting and calibration: after the relation between the logarithm of the good-to-bad odds and the total score is fitted, the total score is converted through a calibration process into the value range that the user is accustomed to. Generally, the score calculated directly by the model is called the uncalibrated score, and the score after calibration is called the calibrated score;
the user may specify requirements for the calibration, such as: defining the standard good-to-bad odds corresponding to a standard score (for example, a score of 600 corresponds to good-to-bad odds of 20:1);
defining a standard PDO (points to double the odds, that is, the number of points by which the score must increase for the good-to-bad odds to double; with a PDO of 40, if a score of 600 corresponds to odds of 20:1, a score of 640 corresponds to odds of 40:1);
requiring that the calibrated score of every bin of the characteristic variables be positive;
variable model-entry criteria: the IV of a variable is greater than 0.02, and its marginal contribution (MC) exceeds a certain threshold, generally 0.03;
the characteristic variables are grouped into 2 to 8 bins, and the risk trend (WOE) across the groups reflects business experience;
different types of variables are selected to reflect different risk traits;
the number of characteristic variables selected to enter each module is generally between 3 and 15.
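The WOE and IV screening described in the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the common convention WOE = ln(share of goods / share of bads), the `smoothing` value for empty cells is an arbitrary choice, and the account-age counts are hypothetical.

```python
import math

def woe_iv(bins, smoothing=0.5):
    """Return (woe_per_bin, iv) for a list of (good_count, bad_count) bins.

    Assumed convention: WOE = ln(share of goods / share of bads), so a
    higher WOE means a lower-risk bin. `smoothing` replaces empty cells
    to avoid log(0); it is an illustrative choice.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        p_good = max(good, smoothing) / total_good
        p_bad = max(bad, smoothing) / total_bad
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe
    return woes, iv

# Hypothetical account-age bins, shortest to longest: (good, bad) counts
account_age_bins = [(100, 40), (300, 40), (600, 20)]
woes, iv = woe_iv(account_age_bins)

# Trend check: risk should fall (WOE should rise) as account age grows
trend_is_monotonic = all(a < b for a, b in zip(woes, woes[1:]))
# Screening check from the text: IV not lower than 0.02
passes_iv_screen = iv >= 0.02
print([round(w, 3) for w in woes], round(iv, 3), trend_is_monotonic, passes_iv_screen)
```

A variable whose bins fail either check would be re-binned or dropped before the WOE initialization model is built.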
Combining the sub-model scores into a wide table: in addition to the sub-model scores, basic information variables such as gender, province and age can be added; since each data source produces only one sub-model score, the number of model-entry features is greatly compressed.
Establishing the master scorecard model: samples with missing scores are handled with WOE coding, and one option is to place them in a bin of their own, the choice being made according to the risk performance of the missing data. The master scorecard is then built according to the standard scorecard modeling process; the master scorecard model is generally built with the simpler and more direct logistic regression algorithm, which gives the scores stronger interpretability.
According to another aspect of the present invention, there is provided an apparatus for modeling a modular score card, the apparatus including:
a per-data-source sub-model establishing module, configured to build a sub-model for each data source only on the samples that the data source covers; because the single-variable IV (information value) of some weak data sources is very low and a scorecard model is therefore hard to build on them, a machine learning model can generally be used instead, so that the sub-model serves for information extraction and feature enhancement;
the sub-model is further established by the following steps:
binning and preliminary screening of the characteristic variables: WOE (weight of evidence) is generally used to measure how well the bins of a characteristic variable separate good customers from bad customers and how each bin performs in terms of risk, and IV (information value) is used to measure the predictive power of the characteristic variable after binning; characteristic variables whose IV is below 0.02 are generally not considered for entry into the model;
after the modeling data set is prepared, the full range of values of each characteristic variable is first divided into intervals (characteristic items) according to actual risk performance and the way the variable is applied in the business; this is called binning (also called characteristic variable grouping or column division); in determining the binning and the preliminary screening of the characteristic variables, the following four factors need to be balanced:
predictive statistical index test: the WOE of each bin should show a clear difference in risk between the groups of the characteristic variable, the trend of WOE across the groups should be consistent with business experience (taking account age as an example, risk decreases as the customer's account age increases), and the IV should not be lower than 0.02;
grouping stability test: the number of bins of a characteristic variable is between 2 and 8; if the binning is too fine, the population distribution of each group is too sensitive to change, which harms the stability of the model; if the binning is too coarse, useful information about changes in the characteristic variable is lost; grouping stability can be checked with the PSI (population stability index);
characteristic variable correlation and multicollinearity test: through univariate analysis, the 1 to 3 most predictive characteristic variables among similar variables are preferentially selected to enter the model, and combinations of characteristic variables that enter the model efficiently are screened with reference to the MC (marginal contribution) value in Model Builder;
business application test: bin breakpoints, the merging of characteristic values into groups, and the screening of model-entry variables must satisfy business practice, policy rules and application requirements; for example, a customer's account age must take the basic term of the product design into account: although the WOE may show that risk performance shortly after admission is relatively good, policy effects may require that this bin be forcibly merged with the bin having the worst WOE when groups are merged;
establishing a WOE initialization model: the WOE model is built on the basis of the binning and preliminary screening of the characteristic variables;
score fitting and calibration: after the relation between the logarithm of the good-to-bad odds and the total score is fitted, the total score is converted through a calibration process into the value range that the user is accustomed to. Generally, the score calculated directly by the model is called the uncalibrated score, and the score after calibration is called the calibrated score;
the user may specify requirements for the calibration, such as: defining the standard good-to-bad odds corresponding to a standard score (for example, a score of 600 corresponds to good-to-bad odds of 20:1);
defining a standard PDO (points to double the odds, that is, the number of points by which the score must increase for the good-to-bad odds to double; with a PDO of 40, if a score of 600 corresponds to odds of 20:1, a score of 640 corresponds to odds of 40:1);
requiring that the calibrated score of every bin of the characteristic variables be positive;
variable model-entry criteria: the IV of a variable is greater than 0.02, and its marginal contribution (MC) exceeds a certain threshold, generally 0.03;
the characteristic variables are grouped into 2 to 8 bins, and the risk trend (WOE) across the groups reflects business experience;
different types of variables are selected to reflect different risk traits;
the number of characteristic variables selected to enter each module is generally between 3 and 15.
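The grouping stability check mentioned above can be illustrated with a small PSI calculation. The sketch below uses the usual PSI definition, the sum over bins of (current share - baseline share) × ln(current share / baseline share); the bin counts are hypothetical and the `eps` floor for empty bins is an assumption of this example.

```python
import math

def psi(baseline_counts, current_counts, eps=1e-6):
    """Population stability index between two samples binned identically."""
    if len(baseline_counts) != len(current_counts):
        raise ValueError("both samples must use the same bins")
    base_total = sum(baseline_counts)
    cur_total = sum(current_counts)
    value = 0.0
    for base, cur in zip(baseline_counts, current_counts):
        # Floor each share at eps so an empty bin does not produce log(0)
        p_base = max(base / base_total, eps)
        p_cur = max(cur / cur_total, eps)
        value += (p_cur - p_base) * math.log(p_cur / p_base)
    return value

# Hypothetical bin counts of one characteristic variable
development = [200, 300, 300, 200]   # modeling sample
recent = [180, 290, 310, 220]        # later monitoring sample

psi_same = psi(development, development)
psi_shift = psi(development, recent)
print(psi_same, round(psi_shift, 4))
```

A PSI close to zero indicates that the population is distributed across the bins much as it was at development time; finer binning tends to make this value more volatile, which is the sensitivity the text warns about.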
a sub-model score wide-table synthesis module, configured to combine the sub-model scores into a wide table, to which basic information variables such as gender, province and age can be added in addition to the sub-model scores; since each data source produces only one sub-model score, the number of model-entry features is greatly compressed;
a master scorecard model establishing module, configured to handle samples with missing scores with WOE coding, one option being to place them in a bin of their own, the choice being made according to the risk performance of the missing data. The master scorecard is then built according to the standard scorecard modeling process; the master scorecard model is generally built with the simpler and more direct logistic regression algorithm, which gives the scores stronger interpretability.
According to still another aspect of the present invention, there is provided an electronic apparatus including a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above modular scorecard modeling method.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, where the executable instruction causes a processor to perform operations corresponding to the above modular scorecard modeling method.
The invention has the beneficial effects that:
According to the method, a sub-model is built for each data source and the sub-models are then fused into a comprehensive model. The method provided by the embodiment of the invention allows features from multiple data domains to enter the model, so the information dimensions are richer and more comprehensive. Different modules use different algorithms according to how their data behaves, so the model algorithms are more diverse and the model effect is more accurate. The sub-models are assembled flexibly and modeled collaboratively through division of labor, which improves modeling efficiency. When a sub-model changes, only that sub-model needs to be rebuilt; the sub-models are independent and do not interfere with one another. Priorities can be set for the data sources, which serves to control data source cost. If a data source is adjusted, a standby data source can replace it, allowing plug-in style rapid deployment.
The above is only an overview of the technical solutions of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention are more readily apparent, embodiments of the invention are described below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a flow chart diagram illustrating a method for modeling a modular scorecard according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an apparatus for modeling a modular score card according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Fig. 1 is a schematic flow chart illustrating a method for modeling a modular score card according to an embodiment of the present invention, and referring to fig. 1, the method includes:
Step S1, establishing a sub-model for each data source: each sub-model is built only on the samples that the data source covers; because the single-variable IV (information value) of some weak data sources is very low and a scorecard model is therefore hard to build on them, a machine learning model can generally be used instead, so that the sub-model serves for information extraction and feature enhancement;
the sub-model is further established by the following steps:
binning and preliminary screening of the characteristic variables: WOE (weight of evidence) is generally used to measure how well the bins of a characteristic variable separate good customers from bad customers and how each bin performs in terms of risk, and IV (information value) is used to measure the predictive power of the characteristic variable after binning; characteristic variables whose IV is below 0.02 are generally not considered for entry into the model;
after the modeling data set is prepared, the full range of values of each characteristic variable is first divided into intervals (characteristic items) according to actual risk performance and the way the variable is applied in the business; this is called binning (also called characteristic variable grouping or column division); in determining the binning and the preliminary screening of the characteristic variables, the following four factors need to be balanced:
predictive statistical index test: the WOE of each bin should show a clear difference in risk between the groups of the characteristic variable, the trend of WOE across the groups should be consistent with business experience (taking account age as an example, risk decreases as the customer's account age increases), and the IV should not be lower than 0.02;
grouping stability test: the number of bins of a characteristic variable is between 2 and 8; if the binning is too fine, the population distribution of each group is too sensitive to change, which harms the stability of the model; if the binning is too coarse, useful information about changes in the characteristic variable is lost; grouping stability can be checked with the PSI (population stability index);
characteristic variable correlation and multicollinearity test: through univariate analysis, the 1 to 3 most predictive characteristic variables among similar variables are preferentially selected to enter the model, and combinations of characteristic variables that enter the model efficiently are screened with reference to the MC (marginal contribution) value in Model Builder;
business application test: bin breakpoints, the merging of characteristic values into groups, and the screening of model-entry variables must satisfy business practice, policy rules and application requirements; for example, a customer's account age must take the basic term of the product design into account: although the WOE may show that risk performance shortly after admission is relatively good, policy effects may require that this bin be forcibly merged with the bin having the worst WOE when groups are merged;
establishing a WOE initialization model, and establishing the WOE model on the basis of characteristic variable field division and variable primary screening;
score fitting and calibration, after the relation between the goodness-to-goodness ratio logarithm and the total score is fitted, the total score is converted into a value range which is used by a user through a calibration process. Generally, the score calculated by the model is called an uncalibrated score, and the calibrated score is called a calibrated score;
the user may specify the calibration requirements, for example: defining the standard good-to-bad ratio corresponding to the standard score (for example, a score of 600 corresponds to a good-to-bad ratio of 20:1);
defining the standard PDO (points to double the odds, i.e., the score increase required to double the good-to-bad ratio; with a PDO of 40, if 600 points correspond to a good-to-bad ratio of 20:1, then 640 points correspond to 40:1);
requiring that the calibrated score of every bin of every characteristic variable be positive;
the criteria for a variable to enter the model are that its IV is greater than 0.02 and that its marginal contribution (MC) exceeds a certain threshold, generally 0.03;
each characteristic variable is grouped into 2 to 8 bins, and the risk trend across the bins (WOE) reflects business experience;
variables of different types are selected to reflect different risk traits;
the number of characteristic variables selected to enter the model in each module is generally between 3 and 15.
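Two of the checks listed above can be illustrated numerically. The following is a minimal Python sketch, not the implementation described in this embodiment: it assumes the common scaling form score = offset + factor × ln(good-to-bad ratio) for calibration, and the usual PSI formula summed over bins. The function names, the `eps` floor, and the default parameters (600 points, 20:1, PDO 40, taken from the example above) are illustrative assumptions.

```python
import math


def calibration_params(base_score=600.0, base_odds=20.0, pdo=40.0):
    """Return (offset, factor) such that score = offset + factor * ln(odds).

    Assumption: odds means the good-to-bad ratio, so a higher score means lower risk.
    """
    factor = pdo / math.log(2)  # points added each time the odds double
    offset = base_score - factor * math.log(base_odds)
    return offset, factor


def calibrated_score(good_bad_odds, base_score=600.0, base_odds=20.0, pdo=40.0):
    """Map a good-to-bad ratio onto the calibrated score scale."""
    offset, factor = calibration_params(base_score, base_odds, pdo)
    return offset + factor * math.log(good_bad_odds)


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned populations.

    Both arguments are per-bin sample counts in the same bin order.
    eps is a hypothetical floor that avoids log(0) for empty bins.
    """
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_share = max(expected / expected_total, eps)
        actual_share = max(actual / actual_total, eps)
        total += (actual_share - expected_share) * math.log(actual_share / expected_share)
    return total


# With the example requirements, odds of 20:1 map to about 600 points
# and odds of 40:1 map to about 640 points.
score_at_base = calibrated_score(20.0)
score_at_double = calibrated_score(40.0)

# Identical bin distributions give a PSI of zero; any shift gives a positive value.
psi_unchanged = psi([10, 20, 30], [10, 20, 30])
psi_shifted = psi([50, 50], [60, 40])
```

Under these assumptions, the requirement that every bin's calibrated score be positive would have to be checked separately after the offset is distributed across the characteristic variables; that allocation is not shown here.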
Step S2: the sub-model scores are combined into a wide table. In addition to the sub-model scores, basic information variables such as gender, province, and age can be added. Since each data source produces only one sub-model score, the number of model-entry features is greatly compressed.
Step S3: the scorecard master model is built. Samples with missing scores are handled with WOE coding; placing them in a separate bin can be tried, with the final treatment determined by the risk performance of the missing data. The master scorecard is then built following the standard scorecard modeling process. The master scorecard model is generally built with a simple, direct logistic regression algorithm, which gives stronger score interpretability.
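The handling of missing sub-model scores in the master model can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: WOE is taken as ln(share of good samples / share of bad samples), so a higher WOE means lower risk; the cut points, the `smoothing` constant, the function names, and the toy data are all hypothetical. Fitting the logistic regression on the WOE-encoded wide table is not shown.

```python
import math
from bisect import bisect_right


def assign_bin(score, cut_points):
    """Return 'missing' for an absent score, otherwise the index of its interval."""
    if score is None:
        return "missing"
    return bisect_right(cut_points, score)


def woe_table(scores, labels, cut_points, smoothing=0.5):
    """Compute per-bin WOE and the total IV for one sub-model score column.

    labels: 1 = bad sample, 0 = good sample.
    smoothing is a hypothetical additive constant that keeps WOE finite
    when a bin contains no good or no bad samples.
    """
    counts = {}  # bin -> [good_count, bad_count]
    for score, label in zip(scores, labels):
        counts.setdefault(assign_bin(score, cut_points), [0, 0])[label] += 1

    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())
    n_bins = len(counts)

    woe, iv = {}, 0.0
    for bin_key, (good, bad) in counts.items():
        good_share = (good + smoothing) / (total_good + smoothing * n_bins)
        bad_share = (bad + smoothing) / (total_bad + smoothing * n_bins)
        woe[bin_key] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[bin_key]
    return woe, iv


# Toy wide-table column: one sub-model score per sample, None where the
# data source does not cover the sample.
scores = [620, 640, 700, 710, None, None, 580, 590, 720, None]
labels = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
cut_points = [600, 680]

woe, iv = woe_table(scores, labels, cut_points)

# WOE-encoded column that would feed the master logistic regression.
encoded = [woe[assign_bin(s, cut_points)] for s in scores]
```

In this toy data the missing bin has a lower WOE than the highest-score bin, which is the kind of risk evidence that would justify keeping it as a separate bin rather than merging it with another.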
Fig. 2 shows a schematic structural diagram of an apparatus 20 for modeling a modular score card according to an embodiment of the present invention, including:
201, a module for building sub-models per data source: modeling is carried out only on the covered samples. Considering that the single-variable IV of some weak data sources is very low, so that a scorecard model is difficult to build, a machine learning model can generally be adopted, so that the model serves for information extraction and feature enhancement;
202, a module for combining the sub-model scores into a wide table: in addition to the sub-model scores, basic information variables such as gender, province, and age can be added. Since each data source produces only one sub-model score, the number of model-entry features is greatly compressed;
203, a module for building the scorecard master model: samples with missing scores are handled with WOE coding; placing them in a separate bin can be tried, with the final treatment determined by the risk performance of the missing data. The master scorecard is then built following the standard scorecard modeling process. The master scorecard model is generally built with a simple, direct logistic regression algorithm, which gives stronger score interpretability.
Fig. 3 is a schematic structural diagram of an electronic device for the modular scorecard modeling method according to an embodiment of the present invention. The electronic device 1100 may be a host server with computing capabilities, a personal computer (PC), a portable computer or terminal, or the like. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.
The electronic device 1100 includes at least one processor 1110, a communication interface 1120, a memory 1130, and a bus 1140. The processor 1110, the communication interface 1120, and the memory 1130 communicate with each other via the bus 1140.
The communication interface 1120 is used for communicating with network elements including, for example, virtual machine management centers, shared storage, etc.
The processor 1110 is configured to execute programs. The processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.
The memory 1130 is used for storing executable instructions. The memory 1130 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory. The memory 1130 may also be a memory array. The memory 1130 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules. The instructions stored in the memory 1130 are executable by the processor 1110 to enable the processor 1110 to perform the method of any of the above-described method embodiments.
An embodiment of the present invention further provides a storage medium storing computer-executable instructions, which include a program for executing the above-mentioned method; the computer-executable instructions can execute the method in any of the above-mentioned method embodiments.
The storage medium may be any available medium or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, nonvolatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of modular scorecard modeling, the method comprising:
building a sub-model for each data source, wherein modeling is carried out on the covered samples, and, considering that the single-variable IV of some weak data sources is very low so that a scorecard model is difficult to build, a machine learning model can generally be adopted;
combining the sub-model scores into a wide table, wherein basic information variables such as gender, province, and age can be added in addition to the sub-model scores; since each data source produces only one sub-model score, the number of model-entry features is greatly compressed;
building a scorecard master model, wherein samples with missing scores are handled with WOE coding and placing them in a separate bin can be tried, the treatment being determined by the risk performance of the missing data; the master scorecard is then built following the standard scorecard modeling process; the master scorecard model is generally built with a simple, direct logistic regression algorithm, which gives stronger score interpretability.
2. The modular scorecard modeling method of claim 1, wherein, in the field division and preliminary screening of feature variables, WOE (weight of evidence) is used as the statistic measuring how well the different fields of a feature variable discriminate good customers from bad customers and how they perform in terms of risk, the predictive power of a feature variable after field division is measured by IV (information value), and feature variables with an IV below 0.02 are generally not considered for modeling.
3. The modular scorecard modeling method of claim 1, wherein, after the modeling data set is prepared, the full value range of each feature variable is first examined, and the feature variable is divided into different attribute segments (feature items) according to actual risk performance and the business application mode; this segmentation is called field division (also called feature variable grouping or binning); and the field division of the feature variables and the preliminary screening of variables are thereby determined.
4. The modular scorecard modeling method of claim 1, wherein a WOE initial model is built on the basis of the feature variable field division and the preliminary screening of variables.
5. The modular scorecard modeling method of claim 1, wherein score fitting and calibration are performed: after the relationship between the logarithm of the good-to-bad ratio and the total score is fitted, the total score is converted through calibration into the value range the user is accustomed to; generally, the score calculated by the model is called the uncalibrated score, and the score after calibration is the calibrated score.
6. The modular scorecard modeling method of claim 1, wherein a user may specify calibration related requirements.
7. The modular scorecard modeling method of claim 6, wherein the number of feature variables selected to enter the model in each module is generally between 3 and 15.
8. An apparatus for modular scorecard modeling, comprising:
a module for building sub-models per data source, wherein modeling is carried out only on the covered samples, and, considering that the single-variable IV of some weak data sources is very low so that a scorecard model is difficult to build, a machine learning model can generally be adopted, so that the model serves for information extraction and feature enhancement;
a module for combining the sub-model scores into a wide table, wherein basic information variables such as gender, province, and age can be added in addition to the sub-model scores; since each data source produces only one sub-model score, the number of model-entry features is greatly compressed;
a module for building the scorecard master model, wherein samples with missing scores are handled with WOE coding and placing them in a separate bin can be tried, the treatment being determined by the risk performance of the missing data; the master scorecard is then built following the standard scorecard modeling process; the master scorecard model is generally built with a simple, direct logistic regression algorithm, which gives stronger score interpretability.
9. An electronic device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the modular scorecard modeling method of any one of claims 1-7.
10. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform operations corresponding to the modular scorecard modeling method of any of claims 1-7.
CN202110833580.2A 2021-07-23 2021-07-23 Method and device for modeling modular scoring card, storage medium and electronic equipment Pending CN113538131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110833580.2A CN113538131A (en) 2021-07-23 2021-07-23 Method and device for modeling modular scoring card, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113538131A (en) 2021-10-22

Family

ID=78088740

Country Status (1)

Country Link
CN (1) CN113538131A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114997419A (en) * 2022-07-18 2022-09-02 北京芯盾时代科技有限公司 Updating method and device of rating card model, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424018A (en) * 2013-08-23 2015-03-18 阿里巴巴集团控股有限公司 Distributed calculating transaction processing method and device
CN109583651A (en) * 2018-12-03 2019-04-05 焦点科技股份有限公司 A kind of method and apparatus for insuring electric business platform user attrition prediction
CN110322335A (en) * 2019-04-15 2019-10-11 梵界信息技术(上海)股份有限公司 A kind of credit customer qualification classification method passing through machine learning based on WOE conversion
CN110956273A (en) * 2019-11-07 2020-04-03 中信银行股份有限公司 Credit scoring method and system integrating multiple machine learning models
CN111583031A (en) * 2020-05-15 2020-08-25 上海海事大学 Application scoring card model building method based on ensemble learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination