CN116308722A - Cold-start modeling method and device for a risk control model, and storage medium - Google Patents

Cold-start modeling method and device for a risk control model, and storage medium

Info

Publication number
CN116308722A
Authority
CN
China
Prior art keywords
modeling
data
target
screening
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211093009.2A
Other languages
Chinese (zh)
Inventor
周波
任咪咪
林敏�
陈蓓珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Huifu Network Technology Co ltd
Original Assignee
Zhejiang Huifu Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Huifu Network Technology Co ltd filed Critical Zhejiang Huifu Network Technology Co ltd
Priority to CN202211093009.2A priority Critical patent/CN116308722A/en
Publication of CN116308722A publication Critical patent/CN116308722A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Mathematics (AREA)
  • Marketing (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Educational Administration (AREA)
  • Mathematical Analysis (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)

Abstract

Embodiments of the application disclose a cold-start modeling method and device for a risk control model, and a storage medium. The method comprises: acquiring first domain data and second domain data; obtaining, through binning, the data distribution of the first domain data and of the second domain data with respect to first features; selecting, from the first features, second features for which the similarity between the data distribution proportions of the two domains meets a preset requirement; screening the second domain data based on the second features; merging the first domain data with the screened second domain data to obtain target modeling sample data; acquiring initial modeling features and screening them based on the target modeling sample data to obtain target modeling features; and building a target model from the target modeling features by a scorecard modeling method, the target model being used to score input customer data and to screen out bad customers by the score.

Description

Cold-start modeling method and device for a risk control model, and storage medium
Technical Field
The application relates to the technical field of computer information processing, and in particular to a cold-start modeling method and device for a risk control model, and a storage medium.
Background
In 2020 the home decoration industry reached 2.61 trillion yuan (about 4 trillion yuan in home decoration), and rapidly growing home decoration consumption is about to release trillions of yuan of financing demand. However, the financial penetration of the home decoration industry is currently low: not enough user data has been accumulated at this stage, and there is no product of the same kind to refer to. Existing risk control modeling algorithms on the market mainly target the cold-start stage of a newly launched business that lacks labeled samples. They suit cold-start scenarios with no labeled samples, but there is no method for scenarios that have labels yet only a small volume of data. Existing methods therefore cannot build a credit application admission model directly from home decoration data that is labeled but small in volume, and so cannot better reduce customer credit risk, raise the level of customer credit risk management, or effectively increase the financial penetration of home decoration.
Disclosure of Invention
Embodiments of the application aim to provide a cold-start modeling method and device for a risk control model, and a storage medium, to solve the problem that the prior art cannot build a credit application admission model directly from home decoration data that is labeled but small in volume, and therefore cannot better reduce customer credit risk, raise the level of customer credit risk management, or effectively increase the financial penetration of home decoration.
To achieve the above objective, an embodiment of the present application provides a cold-start modeling method for a risk control model, comprising the steps of: acquiring first domain data and second domain data; obtaining, through binning, the data distribution of the first domain data and of the second domain data with respect to first features; selecting, from the first features, second features for which the similarity between the data distribution proportions of the first domain data and the second domain data meets a preset requirement; screening the second domain data based on the second features; and merging the first domain data with the screened second domain data to obtain target modeling sample data;
acquiring initial modeling features, and screening the initial modeling features based on the target modeling sample data to obtain target modeling features;
and building a target model from the target modeling features by a scorecard modeling method, wherein the target model is used to score input customer data and to screen out bad customers by the score.
Optionally, the method for screening the second domain data based on the second features includes:
performing equal-frequency binning on the second features, and combining the different bins to obtain a plurality of categories of third features;
and partitioning the first domain data by the third features to obtain the proportion of the first domain data that falls under each third feature, and screening the second domain data on the same third features at the same proportions.
Optionally, the method for screening the initial modeling features based on the target modeling sample data to obtain the target modeling features includes:
performing equal-frequency binning on the initial modeling features, and performing a first screening of the initial modeling features by information value (IV), population stability index (PSI) and/or correlation;
performing chi-square binning on the initial modeling features remaining after the first screening, and performing a second screening of them by information value, population stability index and/or correlation;
and performing a third screening on the initial modeling features remaining after the second screening, including: handling missing values and unique values, converting categorical features into numerical features, and removing redundant features to obtain the target modeling features.
Optionally, the method for building the target model from the target modeling features by the scorecard modeling method includes:
screening the target modeling features by the bad rate, and converting the screened target modeling features into target binned data that serves as the modeling input features;
preprocessing the target binned data, determining the definitions of good samples and bad samples, and training a logistic regression model on the target modeling sample data;
after the logistic regression model is evaluated as meeting the standard, converting its prediction result into the score;
and determining a score threshold for judging bad customers and screening out bad customers based on the score, finally obtaining the target model.
Optionally, after the target model is obtained, the method further includes:
evaluating the effectiveness and stability of the target model with model evaluation metrics, validating the target model on the target modeling sample data to obtain a validation result, and adjusting the target model based on the validation result.
Optionally, the first features include: age, credit score, current total credit amount, mortgage amount, number of mortgage installments and/or whether the customer owns a home;
the second features include: age, credit score and/or whether the customer owns a home;
and the initial modeling features are selected from features derived from the second-generation People's Bank of China (PBoC) credit report.
Optionally, the method for determining the definitions of good samples and bad samples includes:
analyzing post-loan data from the transaction flow table and the billing table, and determining the definitions of good samples and bad samples through vintage (account age) analysis, roll rate and flow rate.
Optionally, the method for handling missing values and unique values includes: deleting initial modeling features whose missing proportion exceeds a preset value, filling the remaining initial modeling features with the mode or the mean according to the meaning of each feature, and deleting initial modeling feature columns that take only one value;
the method for removing redundant features includes: removing redundant features from the initial modeling features through a correlation check and a multicollinearity calculation.
To achieve the above objective, the present application further provides a cold-start modeling device for a risk control model, including: a memory; and
a processor coupled to the memory, the processor configured to:
acquire first domain data and second domain data; obtain, through binning, the data distribution of the first domain data and of the second domain data with respect to first features; select, from the first features, second features for which the similarity between the data distribution proportions of the first domain data and the second domain data meets a preset requirement; screen the second domain data based on the second features; and merge the first domain data with the screened second domain data to obtain target modeling sample data;
acquire initial modeling features, and screen the initial modeling features based on the target modeling sample data to obtain target modeling features;
and build a target model from the target modeling features by a scorecard modeling method, wherein the target model is used to score input customer data and to screen out bad customers by the score.
To achieve the above objective, the present application also provides a computer storage medium storing a computer program which, when executed by a machine, implements the steps of the method described above.
Embodiments of the application have the following advantages:
1. An embodiment of the application provides a cold-start modeling method for a risk control model, comprising: acquiring first domain data and second domain data; obtaining, through binning, the data distribution of the first domain data and of the second domain data with respect to first features; selecting, from the first features, second features for which the similarity between the data distribution proportions of the first domain data and the second domain data meets a preset requirement; screening the second domain data based on the second features; merging the first domain data with the screened second domain data to obtain target modeling sample data; acquiring initial modeling features, and screening them based on the target modeling sample data to obtain target modeling features; and building a target model from the target modeling features by a scorecard modeling method, wherein the target model is used to score input customer data and to screen out bad customers by the score.
By means of data analysis, the method migrates modeling samples from the second domain so that enough modeling samples are available, which compensates for the small data volume of the first domain. Modeling is performed on the target modeling sample data obtained after migration and merging, and the model is applied to the home decoration domain, which has labels but little data, so as to better reduce customer credit risk, raise the level of customer credit risk management, and effectively increase the financial penetration of home decoration. The method solves the cold-start problem for home decoration data: with the help of data from other financial domains, the home decoration domain has enough data to analyze user risk at the admission stage, which further opens the home decoration market and raises the financial penetration of the industry. The method adopts a scorecard model, which gives the subsequent admission model better business interpretability, helps show intuitively the effect of each indicator, provides a basis for later tuning of the model, and offers an intelligent, digital direction for building credit application admission models in the home decoration domain.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It will be apparent to those of ordinary skill in the art that the drawings described below are exemplary only, and that other implementation drawings can be derived from them without inventive effort.
FIG. 1 is a flowchart of a cold-start modeling method for a risk control model according to an embodiment of the present application;
FIG. 2 is a block diagram of a cold-start modeling device for a risk control model according to an embodiment of the present application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following description of specific embodiments. The embodiments described are some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from the present disclosure without creative effort fall within the scope of the present disclosure.
In addition, the technical features described below in different embodiments of the present application may be combined with each other as long as they do not conflict.
An embodiment of the present application provides a cold-start modeling method for a risk control model. Referring to FIG. 1, which is a flowchart of the method provided in an embodiment of the present application, it should be understood that the method may include additional blocks not shown and/or may omit blocks shown; the scope of the present application is not limited in this respect.
This embodiment is described with the home decoration domain as the first domain and the automobile domain as the second domain. It should be understood that the method and device of the present application can also be applied to other financial domains, where they solve the same technical problem and achieve the same technical effect.
At step 101, first domain data and second domain data are acquired; the data distribution of the first domain data and of the second domain data with respect to first features is obtained through binning; second features, for which the similarity between the data distribution proportions of the first domain data and the second domain data meets a preset requirement, are selected from the first features; the second domain data is screened based on the second features; and the first domain data is merged with the screened second domain data to obtain target modeling sample data.
In some embodiments, the first features include: age, credit score, current total credit amount, mortgage amount, number of mortgage installments and/or whether the customer owns a home; the second features include: age, credit score and/or whether the customer owns a home.
In some embodiments, the method for screening the second domain data based on the second features includes: performing equal-frequency binning on the second features, and combining the different bins to obtain a plurality of categories of third features; and partitioning the first domain data by the third features to obtain the proportion of the first domain data that falls under each third feature, and screening the second domain data on the same third features at the same proportions.
Specifically, data migration is performed as follows. The data of the home decoration domain and the automobile domain (the first domain data and the second domain data) are each binned on first features such as age, credit score, current total credit amount, mortgage amount, number of mortgage installments and home ownership, and their data distributions are analyzed. The second features, on which the home decoration and automobile distributions are consistent (the similarity of the data distribution proportions meets the preset requirement), are selected, and the automobile samples are then screened using these second features.
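The text does not say how distribution similarity is measured. As a minimal sketch, the selection of second features could look like the following, assuming the population stability index (PSI) as the similarity measure and a cutoff of 0.1. The measure, the cutoff, the bin count and all function names are illustrative assumptions, not taken from the application.

```python
import math
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Cut points that split `values` into roughly equal-sized bins."""
    ordered = sorted(values)
    edges = []
    for k in range(1, n_bins):
        cut = ordered[len(ordered) * k // n_bins]
        if not edges or cut > edges[-1]:  # skip duplicate cut points
            edges.append(cut)
    return edges


def bin_proportions(values, edges):
    """Share of `values` falling into each bin defined by `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(p, q, eps=1e-6):
    """Population stability index between two lists of bin proportions."""
    return sum((a - b) * math.log((a + eps) / (b + eps)) for a, b in zip(p, q))


def select_similar_features(first, second, features, n_bins=5, max_psi=0.1):
    """Keep features whose binned distributions are close in both domains.

    `first` and `second` map feature names to lists of values. The PSI
    measure and the 0.1 cutoff are assumptions made for this sketch.
    """
    selected = []
    for name in features:
        edges = equal_frequency_edges(first[name], n_bins)  # bins from first domain
        p = bin_proportions(first[name], edges)
        q = bin_proportions(second[name], edges)
        if psi(p, q) <= max_psi:
            selected.append(name)
    return selected
```

The bin edges are derived from the first (home decoration) domain and reused for the second, so both distributions are compared over the same bins.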
The second features finally selected include age, credit score and home ownership. Based on these second features, the second domain data is screened as follows. First, the second features are binned, using equal-frequency binning. Then the different bins are combined to form n (18 in this embodiment) categories of third features, and the home decoration data is partitioned by them. Finally, according to the share of each of the n categories in the home decoration data, the automobile data is sampled on the same third features at the same shares.
After screening, the home decoration data and the screened automobile data are merged to form the final target modeling sample.
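The proportional screening described above amounts to stratified sampling. The following is a rough sketch under stated assumptions: records are plain dictionaries, the bin edges are supplied by the caller, and the total sample size is the largest one that every category can still supply. The names `stratum` and `sample_to_match` are hypothetical.

```python
import random
from bisect import bisect_right
from collections import Counter, defaultdict


def stratum(record, edges_by_feature):
    """Combine the bin index of every second feature into one category key."""
    return tuple(bisect_right(edges, record[name])
                 for name, edges in edges_by_feature.items())


def sample_to_match(first, second, edges_by_feature, seed=0):
    """Sample `second` so that its category shares equal those of `first`."""
    rng = random.Random(seed)
    counts = Counter(stratum(r, edges_by_feature) for r in first)
    shares = {key: c / len(first) for key, c in counts.items()}
    pools = defaultdict(list)
    for r in second:
        pools[stratum(r, edges_by_feature)].append(r)
    # Largest total size for which every category can supply its share.
    n_total = min(len(pools[key]) / share for key, share in shares.items())
    sampled = []
    for key, share in shares.items():
        take = min(int(n_total * share), len(pools[key]))
        sampled.extend(rng.sample(pools[key], take))
    return sampled
```

With three bins each for age and credit score and two for home ownership, the combined keys would give 3 × 3 × 2 = 18 categories, which matches the count in the embodiment, although the application does not state how the 18 arise. A 0/1 feature such as home ownership can be binned with a single edge of 0.5.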
At step 102, initial modeling features are acquired, and the initial modeling features are screened based on the target modeling sample data to obtain target modeling features.
In some embodiments, the initial modeling features are selected from features derived from the second-generation People's Bank of China (PBoC) credit report.
In some embodiments, the method for screening the initial modeling features based on the target modeling sample data to obtain the target modeling features includes: performing equal-frequency binning on the initial modeling features, and performing a first screening of the initial modeling features by information value (IV), population stability index (PSI) and/or correlation; performing chi-square binning on the initial modeling features remaining after the first screening, and performing a second screening of them by information value, population stability index and/or correlation; and performing a third screening on the initial modeling features remaining after the second screening, including: handling missing values and unique values, converting categorical features into numerical features, and removing redundant features to obtain the target modeling features.
In some embodiments, the method for handling missing values and unique values includes: deleting initial modeling features whose missing proportion exceeds a preset value, filling the remaining initial modeling features with the mode or the mean according to the meaning of each feature, and deleting initial modeling feature columns that take only one value; the method for removing redundant features includes: removing redundant features from the initial modeling features through a correlation check and a multicollinearity calculation.
Specifically, initial modeling features are constructed and screened. The initial modeling features are mainly credit report features: more than 25,000 features derived from the second-generation PBoC credit report are used for modeling. These features span dimensions such as basic personal information, information summary, credit transaction details, non-credit transaction details, public information details, other annotations and statements, and inquiry records. They present the user's repayment ability and repayment willingness from multiple dimensions, laying the foundation for a credit application admission model that identifies user risk and reduces customer credit risk.
The initial modeling features are screened on the existing labeled home decoration and automobile data (the target modeling sample data). Because the feature dimensions are rich, the features are screened several times: first, equal-frequency binning is applied and the features are screened by IV (information value), PSI (population stability index) and correlation; then chi-square binning is applied to the remaining features, which are screened again by IV, PSI and correlation.
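For reference, IV is usually computed per feature from the per-bin shares of good and bad samples. The sketch below uses the common convention WOE = ln(good share / bad share); the sign convention, the optional `eps` smoothing and the function name are assumptions, since the application does not spell out its formula.

```python
import math


def woe_and_iv(bin_ids, labels, eps=0.5):
    """Return ({bin: WOE}, IV) for one binned feature; label 1 = bad, 0 = good.

    `eps` is added to every bin count so that empty cells do not produce
    log(0). Pass eps=0 for the unsmoothed textbook value when every bin
    holds both good and bad samples.
    """
    good, bad = {}, {}
    for b, y in zip(bin_ids, labels):
        target = bad if y == 1 else good
        target[b] = target.get(b, 0) + 1
    bins = sorted(set(good) | set(bad))
    n_good = sum(good.values()) + eps * len(bins)
    n_bad = sum(bad.values()) + eps * len(bins)
    woe, iv = {}, 0.0
    for b in bins:
        p_good = (good.get(b, 0) + eps) / n_good
        p_bad = (bad.get(b, 0) + eps) / n_bad
        woe[b] = math.log(p_good / p_bad)
        iv += (p_good - p_bad) * woe[b]  # each term is non-negative
    return woe, iv
```

A feature whose bins all have the same bad rate gets an IV of zero and would be dropped by any IV threshold; the threshold itself is not given in the text.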
Missing values, unique values and categorical features are then processed. Initial modeling features whose missing proportion exceeds a preset value (90% in this embodiment) are deleted, and the remaining initial modeling features are filled with the mode or the mean according to the meaning of each feature. Next, initial modeling feature columns that take only one value are deleted. Finally, the categorical features are converted into numerical values so that they can take part in the final modeling process.
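A small sketch of this cleaning pass, with columns held as plain lists and `None` marking a missing value. Two choices here are assumptions rather than the application's rule: numeric columns are filled with the mean and all others with the mode, and categories are converted to numbers by simple ordinal codes.

```python
from collections import Counter
from statistics import mean


def clean_features(columns, numeric, max_missing=0.9):
    """Drop sparse and constant columns, fill gaps, encode categories.

    `columns` maps a feature name to a list of values (None = missing);
    `numeric` is the set of names to treat as numeric.
    """
    cleaned = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present or 1 - len(present) / len(values) > max_missing:
            continue  # missing proportion above the preset value
        if name in numeric:
            fill = mean(present)
        else:
            fill = Counter(present).most_common(1)[0][0]  # mode
        filled = [fill if v is None else v for v in values]
        if len(set(filled)) <= 1:
            continue  # column with a single value carries no information
        if name not in numeric:
            codes = {cat: i for i, cat in enumerate(sorted(set(filled)))}
            filled = [codes[v] for v in filled]
        cleaned[name] = filled
    return cleaned
```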
Among the initial modeling features that have high information value and good stability, redundant features are removed through a correlation check and a multicollinearity calculation, and the target modeling features are finally selected.
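The correlation part of this step can be sketched as a greedy filter: visit features from highest to lowest IV and keep one only if it is not strongly correlated with anything already kept. The 0.8 cutoff and the IV-based tie-break are assumptions; the multicollinearity calculation (for example a variance inflation factor) is not covered by this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def drop_correlated(columns, iv, max_corr=0.8):
    """Return the names kept after removing highly correlated features.

    `columns` maps names to value lists and `iv` maps names to their
    information value. Constant columns are assumed to be gone already.
    """
    kept = []
    for name in sorted(columns, key=lambda n: iv[n], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept
```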
At step 103, a target model is built from the target modeling features by a scorecard modeling method; the target model is used to score input customer data and to screen out bad customers by the score.
In some embodiments, the method for building the target model from the target modeling features by the scorecard modeling method includes: screening the target modeling features by the bad rate, and converting the screened target modeling features into target binned data that serves as the modeling input features; preprocessing the target binned data, determining the definitions of good samples and bad samples, and training a logistic regression model on the target modeling sample data; after the logistic regression model is evaluated as meeting the standard, converting its prediction result into the score; and determining a score threshold for judging bad customers and screening out bad customers based on the score, finally obtaining the target model.
Specifically, the scorecard modeling process is divided into the following steps:
1. Variable analysis and binning: use the bad rate to analyze which target modeling features are related to whether a customer is good or bad, and convert those features into binned data as the modeling input features.
2. Modeling
(1) Data preprocessing: apply the WOE (weight of evidence) transformation to the binned data and standardize it.
(2) Use stepwise regression to select as few features as possible while maintaining modeling performance.
(3) Train a logistic regression model.
3. Model evaluation: check whether the AUC (Area Under the Curve, the area under the ROC curve) meets the standard, and check whether the coefficients are all positive.
4. Convert the prediction result of the logistic regression model into a score.
5. Determine the score threshold below which a customer is finally judged to be bad, and use it to screen out bad customers.
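Steps 4 and 5 do not give a conversion formula. A widely used scorecard convention, assumed here, makes the score linear in the log-odds of being a good customer, anchored by a base score at given base odds and a fixed number of points to double the odds (PDO). The defaults of 600 points at 50:1 odds, a PDO of 20 and the 560 cutoff are illustrative values only.

```python
import math


def probability_to_score(p_bad, base_score=600, base_odds=50, pdo=20):
    """Map a predicted bad probability to a score (higher means safer).

    score = offset + factor * ln(good:bad odds), where factor = pdo / ln 2
    and offset is chosen so that `base_odds` maps to `base_score`.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1 - p_bad) / p_bad
    return offset + factor * math.log(odds_good)


def is_bad_customer(p_bad, threshold=560):
    """Flag a customer whose score falls below the (hypothetical) threshold."""
    return probability_to_score(p_bad) < threshold
```

Under these assumed defaults a customer with good:bad odds of 50:1 scores 600, and each doubling of the odds adds 20 points.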
In some embodiments, the method for determining the definitions of good samples and bad samples includes: analyzing post-loan data from the transaction flow table and the billing table, and determining the definitions of good samples and bad samples through vintage (account age) analysis, roll rate and flow rate.
Specifically, post-loan data is analyzed from the transaction flow table and the billing table, and the bad sample definition is determined through vintage analysis, roll rate and flow rate.
Vintage (account age analysis): used to analyze the maturity period and change pattern of accounts. It observes customer delinquency by months on book since disbursement, that is, the trend of the cumulative overdue rate as account age increases. Overdue status here uses the "ever" definition.
Flow rate (migration rate): used to define account quality. It shows the trajectory of a customer's loan account over its whole life cycle.
Roll rate: used to analyze the transition rate between overdue states, from the worst state in a period before an observation point (the observation period) to the worst state in a period after the observation point (the performance period).
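As an illustration of the roll rate definition, the transition rates can be tabulated from one (worst state in observation period, worst state in performance period) pair per account. The state labels M0, M1, M2 and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def roll_rate_matrix(pairs):
    """Row-normalized transition rates between worst delinquency states.

    `pairs` holds one (state_before, state_after) tuple per account.
    The result maps state_before -> {state_after: share of accounts}.
    """
    counts = Counter(pairs)
    totals = Counter(before for before, _ in pairs)
    matrix = defaultdict(dict)
    for (before, after), c in counts.items():
        matrix[before][after] = c / totals[before]
    return dict(matrix)
```

A state from which most accounts roll to a worse state rather than cure is a natural candidate for the bad definition.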
Vintage analysis is used to determine an appropriate performance period, and roll rate analysis is used to define good and bad customers. Here, a bad sample is a sample with a performance period of 6 months that is more than 15 days overdue; a good sample is a sample with a performance period of 6 months that is no more than 3 days overdue; and a gray sample is a sample with a performance period of 6 months that is more than 3 but no more than 15 days overdue.
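The labeling rule can be written as a small function. The 15-day and 3-day cutoffs come from the paragraph above; treating exactly 15 days as gray and exactly 3 days as good is an assumption about the boundaries, and the function name is hypothetical.

```python
def label_sample(max_days_overdue, bad_over=15, good_up_to=3):
    """Label one account by its worst overdue days within the 6-month
    performance period: 'bad', 'good' or 'gray'."""
    if max_days_overdue > bad_over:
        return "bad"
    if max_days_overdue <= good_up_to:
        return "good"
    return "gray"
```

In common scorecard practice gray samples are left out of training so that the good and bad classes stay clearly separated; the application does not say how it treats them.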
In some embodiments, after the target model is obtained, the method further includes: evaluating the effectiveness and stability of the target model with model evaluation metrics, validating the target model on the target modeling sample data to obtain a validation result, and adjusting the target model based on the validation result.
Specifically, using the selected target modeling features, a scorecard model is built on the target modeling samples, and bad samples are screened out according to the final output score. The effectiveness and stability of the model are evaluated with the ROC (Receiver Operating Characteristic) curve, the KS (Kolmogorov-Smirnov) statistic and the model PSI (population stability index). The model is validated on the training-set samples and the out-of-time samples within the target modeling samples and adjusted accordingly, and finally validated on the home decoration data. The method thus builds an admission model for home decoration data through data migration from automobile-domain samples, and achieves a certain risk control capability.
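For orientation, AUC and KS can both be computed directly from predicted bad probabilities and true labels. The sketch below uses the pairwise definition of AUC and the maximum gap between the cumulative bad and good distributions for KS; it is a plain-Python illustration, not the evaluation code of the application.

```python
def auc(scores, labels):
    """Probability that a random bad sample (label 1) gets a higher score
    than a random good sample (label 0); ties count as one half."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))


def ks_statistic(scores, labels):
    """Largest gap between the cumulative bad and good score distributions."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    ordered = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    for i, (score, y) in enumerate(ordered):
        if y == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # Compare only once a run of tied scores has been fully consumed.
        if i + 1 == len(ordered) or ordered[i + 1][0] != score:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best
```

Here `scores` are predicted bad probabilities, so a higher value should go with label 1. The pairwise AUC is quadratic in sample size, which is fine for a sketch but slow on large samples.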
In this method, modeling samples are migrated from the second domain by means of the data analysis method described above, so that enough modeling samples are available and the small data volume of the first domain is compensated for. Modeling is performed on the target modeling sample data obtained after migration and merging, and the model is applied to the home decoration domain, which has labels but little data; this helps reduce customer credit risk, improves the level of customer credit risk management, and effectively increases financial penetration in home decoration. The method addresses the cold-start problem of home decoration data: with the help of data from other financial domains, the home decoration domain has enough data to analyze user risk at the admission stage, which further opens up the home decoration market and increases financial penetration in the home decoration industry. Because the method uses a scorecard model, it offers better business interpretability for the subsequent admission model, makes the contribution of each indicator easy to see, provides a basis for later adjustment of the model, and points toward an intelligent, digital way of building credit application admission models in the home decoration domain.
Fig. 2 is a block diagram of a cold start modeling device for a wind control model according to an embodiment of the present application. The device comprises:
a memory 201; and a processor 202 connected to the memory 201, the processor 202 configured to perform: acquiring first domain data and second domain data; obtaining, through binning, the data distribution of the first domain data and of the second domain data with respect to first features; screening, from the first features, second features for which the similarity of the data distribution proportions between the first domain data and the second domain data meets a preset requirement; screening the second domain data based on the second features; and merging the first domain data with the screened second domain data to obtain target modeling sample data;
acquiring initial modeling features, and screening the initial modeling features based on the target modeling sample data to obtain target modeling features;
and modeling by a scorecard modeling method based on the target modeling features to obtain a target model, wherein the target model is used to score input client data and to screen out bad clients by the score.
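The first step, choosing the features whose binned distributions are similar in both domains, might be sketched as follows. The application only requires that the similarity of the distribution proportions "meets a preset requirement"; the total variation distance used here, the threshold `max_distance`, the fixed bin edges, and all names (`bin_proportions`, `select_similar_features`) are assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import Counter


def bin_proportions(values, edges):
    """Share of values falling into each bin defined by sorted cut points `edges`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(bisect_right(edges, v) for v in values)
    return [counts.get(i, 0) / len(values) for i in range(len(edges) + 1)]


def select_similar_features(first_domain, second_domain, bin_edges, max_distance=0.1):
    """Keep features whose binned distributions are close in both domains.

    `first_domain` and `second_domain` map feature name -> list of values.
    Closeness is measured by total variation distance (an assumed metric).
    """
    selected = []
    for feature, edges in bin_edges.items():
        p = bin_proportions(first_domain[feature], edges)
        q = bin_proportions(second_domain[feature], edges)
        distance = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        if distance <= max_distance:
            selected.append(feature)
    return selected


if __name__ == "__main__":
    home = {"age": [25, 35, 45, 55], "credit_amount": [1, 2, 3, 4]}
    auto = {"age": [26, 36, 46, 56], "credit_amount": [10, 20, 30, 40]}
    edges = {"age": [30, 40, 50], "credit_amount": [2.5]}
    print(select_similar_features(home, auto, edges))
```

In this toy data the age distributions match bin for bin, while the credit amounts in the second domain all fall in the top bin, so only `age` is retained.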
In some embodiments, the processor 202 is further configured such that screening the second domain data based on the second features comprises:
performing equal-frequency binning on the second features, and combining the different bins to obtain a plurality of third features of different categories;
and partitioning the first domain data by the third features to obtain the proportion of the first domain data corresponding to each third feature, and screening the second domain data by the third features according to the same proportions.
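The proportional screening described above resembles stratified sampling: each combination of bins acts as a stratum, and the second domain is sampled so that its strata follow the proportions observed in the first domain. The sketch below is one possible reading under stated assumptions. The function `stratified_match`, the `stratum_of` callback (which stands in for the combined bins, i.e. the third features), the `target_size` parameter, and the use of random sampling within each stratum are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_match(first_rows, second_rows, stratum_of, target_size, seed=0):
    """Sample second-domain rows so strata follow the first domain's proportions."""
    if not first_rows:
        raise ValueError("first_rows must not be empty")
    rng = random.Random(seed)
    first_counts = Counter(stratum_of(row) for row in first_rows)
    total = sum(first_counts.values())

    # Group second-domain rows by the same stratum definition.
    pools = defaultdict(list)
    for row in second_rows:
        pools[stratum_of(row)].append(row)

    selected = []
    for stratum, count in first_counts.items():
        quota = round(target_size * count / total)
        pool = pools.get(stratum, [])
        # Never request more rows than the stratum actually contains.
        selected.extend(rng.sample(pool, min(quota, len(pool))))
    return selected


if __name__ == "__main__":
    def stratum(row):
        return "young" if row["age"] < 40 else "old"

    first = [{"age": 25}, {"age": 30}, {"age": 35}, {"age": 50}]
    second = [{"age": 20 + i} for i in range(10)] + [{"age": 45 + i} for i in range(10)]
    picked = stratified_match(first, second, stratum, target_size=8)
    print(Counter(stratum(r) for r in picked))
```

With three quarters of the first domain in the "young" stratum, a target of 8 rows yields 6 young and 2 old rows from the second domain.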
In some embodiments, the processor 202 is further configured such that screening the initial modeling features based on the target modeling sample data to obtain the target modeling features comprises:
performing equal-frequency binning on the initial modeling features, and performing a first screening of the initial modeling features by information value (IV), population stability index (PSI) and/or correlation indexes;
performing chi-square binning on the initial modeling features remaining after the first screening, and performing a second screening of the initial modeling features by information value, population stability index and/or correlation indexes;
performing a third screening on the initial modeling features remaining after the second screening, including: deleting missing values and single-valued features, converting categorical features into numerical features, and removing redundant features, to obtain the target modeling features.
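The first screening pass can be illustrated with equal-frequency binning followed by an information value (IV) filter. The sketch below is a simplified stand-in: it covers only the IV part (no PSI, correlation or chi-square binning), and the helper names, the additive smoothing of 0.5 per bin, and the 0.02 cut-off (a common rule of thumb) are assumptions rather than values from the application.

```python
import math
from bisect import bisect_right


def equal_frequency_edges(values, n_bins):
    """Cut points that split the sorted values into roughly equal-sized bins."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    edges = {ordered[(k * len(ordered)) // n_bins] for k in range(1, n_bins)}
    return sorted(edges)


def information_value(values, labels, edges, smoothing=0.5):
    """IV = sum over bins of (good% - bad%) * ln(good% / bad%); labels: 1 = bad, 0 = good."""
    n_bins = len(edges) + 1
    good = [0] * n_bins
    bad = [0] * n_bins
    for v, y in zip(values, labels):
        b = bisect_right(edges, v)
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1
    total_good = sum(good) + smoothing * n_bins
    total_bad = sum(bad) + smoothing * n_bins
    iv = 0.0
    for g, b in zip(good, bad):
        pg = (g + smoothing) / total_good  # smoothing avoids log(0) on empty cells
        pb = (b + smoothing) / total_bad
        iv += (pg - pb) * math.log(pg / pb)
    return iv


def screen_by_iv(columns, labels, n_bins=2, min_iv=0.02):
    """Keep the features whose IV reaches the (assumed) minimum threshold."""
    kept = []
    for name, values in columns.items():
        edges = equal_frequency_edges(values, n_bins)
        if information_value(values, labels, edges) >= min_iv:
            kept.append(name)
    return kept


if __name__ == "__main__":
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    cols = {"x": [1, 2, 3, 4, 5, 6, 7, 8], "noise": [1, 5, 2, 6, 3, 7, 4, 8]}
    print(screen_by_iv(cols, y))
```

In the toy data, `x` separates bad from good accounts perfectly while `noise` has the same good/bad mix in every bin, so only `x` survives the filter.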
In some embodiments, the processor 202 is further configured such that modeling by the scorecard modeling method based on the target modeling features to obtain the target model comprises:
screening the target modeling features by a bad rate index, and converting the screened target modeling features into target binned data serving as modeling input features;
preprocessing the target binned data, determining the definitions of good samples and bad samples using the target modeling sample data, and training a logistic regression model;
after the logistic regression model is evaluated as meeting the standard, converting the prediction result of the logistic regression model into the score;
and determining a score threshold for identifying bad clients, and screening bad clients based on the score, thereby obtaining the target model.
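The conversion from a logistic regression output to a score is not spelled out in the text. A widely used convention, shown here only as an assumed example, maps the good-to-bad odds onto a linear score scale: score = offset + factor * ln(odds), with factor = PDO / ln 2 and offset = base_score - factor * ln(base_odds). The parameter values (600 points at odds of 50:1, 20 points to double the odds), the cut-off of 580, and the function names are illustrative assumptions.

```python
import math


def probability_to_score(p_bad, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a predicted bad probability to a scorecard score.

    base_score is the score assigned at good:bad odds of base_odds;
    pdo is the number of points that doubles the odds. All three are assumed values.
    """
    if not 0.0 < p_bad < 1.0:
        raise ValueError("p_bad must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - p_bad) / p_bad  # good:bad odds
    return offset + factor * math.log(odds)


def is_bad_client(score, threshold=580.0):
    """Flag a client as bad when the score falls below the (assumed) cut-off."""
    return score < threshold


if __name__ == "__main__":
    for p in (1 / 101, 1 / 51, 0.1):
        s = probability_to_score(p)
        print(round(s, 1), is_bad_client(s))
```

Under this convention a higher score means lower risk, so the threshold screens out clients whose score falls below it.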
In some embodiments, the processor 202 is further configured to perform the following after obtaining the target model:
evaluating the effect and stability of the target model through model evaluation indexes, verifying the effect of the target model using the target modeling sample data to obtain a verification result, and adjusting the target model based on the verification result.
In some embodiments of the device, the first features include: age, credit rating, current credit amount, amount of credits, number of credits and/or whether the customer owns a home;
the second features include: age, credit rating and/or whether the customer owns a home;
and the initial modeling features are selected from features derived from second-generation People's Bank of China (PBOC) credit reports.
In some embodiments, the processor 202 is further configured such that determining the definitions of good samples and bad samples comprises:
analyzing post-loan data from the transaction (flow) table and the bill table, and determining the definitions of good samples and bad samples through Vintage (account age) analysis, roll rate and flow rate.
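A roll rate analysis of the kind referred to above can be sketched as a transition table between the worst overdue state before an observation point and the worst overdue state after it. The code below is a minimal illustration; the state labels ("M0", "M1", "M2"), the input format of (before, after) pairs, and the function name `roll_rate_matrix` are assumptions, not details from the application.

```python
from collections import Counter, defaultdict


def roll_rate_matrix(transitions):
    """Row-normalised transition shares between overdue states.

    `transitions` is an iterable of (worst_state_before, worst_state_after) pairs,
    one pair per account.
    """
    counts = defaultdict(Counter)
    for before, after in transitions:
        counts[before][after] += 1
    matrix = {}
    for before, row in counts.items():
        row_total = sum(row.values())
        matrix[before] = {after: n / row_total for after, n in row.items()}
    return matrix


if __name__ == "__main__":
    pairs = [
        ("M0", "M0"), ("M0", "M0"), ("M0", "M1"),
        ("M1", "M0"), ("M1", "M2"), ("M1", "M2"), ("M1", "M2"),
    ]
    for state, row in roll_rate_matrix(pairs).items():
        print(state, row)
```

A state from which most accounts roll forward into deeper delinquency is a natural candidate for the bad definition; in the toy data, three quarters of "M1" accounts roll to "M2".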
In some embodiments, the processor 202 is further configured such that deleting missing values and single-valued features comprises: deleting initial modeling features whose missing-value proportion is greater than a preset value, filling the remaining initial modeling features with the mode or the mean according to the meaning of each feature, and deleting any column of initial modeling features that has only one value;
and removing the redundant features comprises: removing the redundant features from the initial modeling features through correlation checking and multicollinearity calculation.
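The clean-up steps above can be sketched for numeric columns as follows. This is a simplified illustration under stated assumptions: missing values are represented by `None`, filling uses the mean only (the text also allows the mode, depending on the feature's meaning), redundancy is detected with pairwise Pearson correlation only (the multicollinearity calculation is omitted), and the thresholds and names (`clean_features`, `pearson`, `max_missing_ratio`, `max_abs_corr`) are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists with non-zero variance."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def clean_features(columns, max_missing_ratio=0.5, max_abs_corr=0.8):
    """Drop sparse, constant and highly correlated numeric columns; mean-fill the rest."""
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Step 1: drop columns with too many missing values.
        if not present or 1 - len(present) / len(values) > max_missing_ratio:
            continue
        # Step 2: fill remaining gaps with the mean (assumption: numeric feature).
        fill = sum(present) / len(present)
        filled = [fill if v is None else v for v in values]
        # Step 3: drop columns that contain a single value only.
        if len(set(filled)) <= 1:
            continue
        # Step 4: drop columns highly correlated with one already kept.
        if any(abs(pearson(filled, other)) > max_abs_corr for other in kept.values()):
            continue
        kept[name] = filled
    return kept


if __name__ == "__main__":
    cols = {
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8],           # perfectly correlated with "a"
        "c": [None, None, None, 1],  # 75% missing
        "d": [5, 5, 5, 5],           # single value
        "e": [3, None, 1, 2],
    }
    print(clean_features(cols))
```

Because columns are processed in order, the first of two correlated features is the one retained; a production pipeline might instead keep the feature with the higher information value.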
Reference is made to the foregoing method embodiments for specific implementation methods, and details are not repeated here.
The present application may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing the various aspects of the present application.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include the following: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including object oriented programming languages such as Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) is personalized with state information of the computer readable program instructions, and the electronic circuitry may execute the computer readable program instructions, thereby implementing aspects of the present application.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Note that all features disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is only one example of a generic series of equivalent or similar features. Where the terms "further", "preferably", "still further" or "more preferably" are used, they introduce a brief description of another embodiment built on the foregoing embodiment; the content following such a term, combined with the foregoing embodiment, constitutes the complete other embodiment. Several such "further", "preferably", "still further" or "more preferably" arrangements following the same embodiment may be combined arbitrarily to form yet another embodiment.
While the application has been described in detail with respect to the general description and specific embodiments thereof, it will be apparent to those skilled in the art that certain modifications and improvements may be made thereto based upon the application. Accordingly, such modifications or improvements may be made without departing from the spirit of the application and are intended to be within the scope of the invention as claimed.

Claims (10)

1. A wind control model cold start modeling method, characterized by comprising the following steps:
acquiring first domain data and second domain data; obtaining, through binning, the data distribution of the first domain data and of the second domain data with respect to first features; screening, from the first features, second features for which the similarity of the data distribution proportions between the first domain data and the second domain data meets a preset requirement; screening the second domain data based on the second features; and merging the first domain data with the screened second domain data to obtain target modeling sample data;
acquiring initial modeling features, and screening the initial modeling features based on the target modeling sample data to obtain target modeling features;
and modeling by a scorecard modeling method based on the target modeling features to obtain a target model, wherein the target model is used to score input client data and to screen out bad clients by the score.
2. The wind control model cold start modeling method according to claim 1, wherein screening the second domain data based on the second features comprises:
performing equal-frequency binning on the second features, and combining the different bins to obtain a plurality of third features of different categories;
and partitioning the first domain data by the third features to obtain the proportion of the first domain data corresponding to each third feature, and screening the second domain data by the third features according to the same proportions.
3. The wind control model cold start modeling method according to claim 1, wherein screening the initial modeling features based on the target modeling sample data to obtain the target modeling features comprises:
performing equal-frequency binning on the initial modeling features, and performing a first screening of the initial modeling features by information value (IV), population stability index (PSI) and/or correlation indexes;
performing chi-square binning on the initial modeling features remaining after the first screening, and performing a second screening of the initial modeling features by information value, population stability index and/or correlation indexes;
performing a third screening on the initial modeling features remaining after the second screening, including: deleting missing values and single-valued features, converting categorical features into numerical features, and removing redundant features, to obtain the target modeling features.
4. The wind control model cold start modeling method according to claim 1, wherein modeling by the scorecard modeling method based on the target modeling features to obtain the target model comprises:
screening the target modeling features by a bad rate index, and converting the screened target modeling features into target binned data serving as modeling input features;
preprocessing the target binned data, determining the definitions of good samples and bad samples using the target modeling sample data, and training a logistic regression model;
after the logistic regression model is evaluated as meeting the standard, converting the prediction result of the logistic regression model into the score;
and determining a score threshold for identifying bad clients, and screening bad clients based on the score, thereby obtaining the target model.
5. The wind control model cold start modeling method according to claim 1, further comprising, after obtaining the target model:
evaluating the effect and stability of the target model through model evaluation indexes, verifying the effect of the target model using the target modeling sample data to obtain a verification result, and adjusting the target model based on the verification result.
6. The wind control model cold start modeling method according to claim 1, wherein
the first features include: age, credit rating, current credit amount, amount of credits, number of credits and/or whether the customer owns a home;
the second features include: age, credit rating and/or whether the customer owns a home;
and the initial modeling features are selected from features derived from second-generation People's Bank of China (PBOC) credit reports.
7. The wind control model cold start modeling method according to claim 4, wherein determining the definitions of good samples and bad samples comprises:
analyzing post-loan data from the transaction (flow) table and the bill table, and determining the definitions of good samples and bad samples through Vintage (account age) analysis, roll rate and flow rate.
8. The wind control model cold start modeling method according to claim 3, wherein
deleting missing values and single-valued features comprises: deleting initial modeling features whose missing-value proportion is greater than a preset value, filling the remaining initial modeling features with the mode or the mean according to the meaning of each feature, and deleting any column of initial modeling features that has only one value;
and removing the redundant features comprises: removing the redundant features from the initial modeling features through correlation checking and multicollinearity calculation.
9. A wind control model cold start modeling device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform:
acquiring first domain data and second domain data; obtaining, through binning, the data distribution of the first domain data and of the second domain data with respect to first features; screening, from the first features, second features for which the similarity of the data distribution proportions between the first domain data and the second domain data meets a preset requirement; screening the second domain data based on the second features; and merging the first domain data with the screened second domain data to obtain target modeling sample data;
acquiring initial modeling features, and screening the initial modeling features based on the target modeling sample data to obtain target modeling features;
and modeling by a scorecard modeling method based on the target modeling features to obtain a target model, wherein the target model is used to score input client data and to screen out bad clients by the score.
10. A computer storage medium having stored thereon a computer program, which when executed by a machine performs the steps of the method according to any of claims 1 to 8.
CN202211093009.2A 2022-09-08 2022-09-08 Cold start modeling method and device for wind control model and storage medium Pending CN116308722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211093009.2A CN116308722A (en) 2022-09-08 2022-09-08 Cold start modeling method and device for wind control model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211093009.2A CN116308722A (en) 2022-09-08 2022-09-08 Cold start modeling method and device for wind control model and storage medium

Publications (1)

Publication Number Publication Date
CN116308722A true CN116308722A (en) 2023-06-23

Family

ID=86822692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211093009.2A Pending CN116308722A (en) 2022-09-08 2022-09-08 Cold start modeling method and device for wind control model and storage medium

Country Status (1)

Country Link
CN (1) CN116308722A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination