CN117422544A - Method, device, equipment and storage medium for predicting credit card user default probability - Google Patents
- Publication number
- CN117422544A
- Authority
- CN
- China
- Prior art keywords
- credit card
- target
- information
- characteristic information
- card account
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Probability & Statistics with Applications (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The embodiment of the invention discloses a method, a device, equipment and a storage medium for predicting the default probability of a credit card user. The method comprises the following steps: collecting a credit card account information set within a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set, wherein the credit card account information includes a plurality of pieces of feature information; screening the plurality of pieces of feature information to obtain target feature information; training a logistic regression model based on the target feature information to obtain a target default probability prediction model; and predicting the default probability of a target credit card user based on the target default probability prediction model. Because the logistic regression model is trained on the screened feature information and the resulting target default probability prediction model is used for prediction, the accuracy of predicting the default probability of a credit card user can be improved.
Description
Technical Field
The embodiment of the invention relates to the technical field of financial data processing, in particular to a method, a device, equipment and a storage medium for predicting default probability of a credit card user.
Background
The credit card is a hallmark of the modernization of the financial industry. Flexible, innovative and with broad prospects, it has become an important field of business and technical innovation in banking and one of the business fields in which internet finance is most widely applied. The credit card business occupies an important position in the retail business of commercial banks: it expands the customer base, creates intermediary business, stabilizes deposits and increases bank income. In recent years, with rapid economic development, the number of credit card holders and the transaction volume have continued to grow, making credit cards a key business of every major bank. As the credit card market expands, however, credit card default risk also rises rapidly, and bad debts from defaults have gradually become a major hidden risk for large commercial banks and financial institutions. Strengthening risk management of the credit card business is therefore particularly important, which has prompted banks and lending institutions to invest substantial resources in researching and optimizing default probability prediction algorithms. In general, a credit card account information data set has a large number of variables whose distributions are extremely unbalanced, and the traditional manual credit risk assessment model relies on expert rules, so its predictions lag behind and cannot reflect the situation of new users under new circumstances.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for predicting the default probability of a credit card user, which can improve the accuracy of predicting the default probability of the credit card user.
In a first aspect, an embodiment of the present invention provides a method for predicting a probability of default for a credit card user, including:
collecting a credit card account information set within a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of pieces of feature information;
screening the plurality of characteristic information to obtain target characteristic information;
training a logistic regression model based on the target feature information to obtain a target default probability prediction model;
and predicting the default probability of the target credit card user based on the target default probability prediction model.
In a second aspect, an embodiment of the present invention further provides a device for predicting a probability of default of a credit card user, including:
the credit card account information set dividing module is used for collecting a credit card account information set within a set history period and dividing the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of pieces of feature information;
the characteristic information screening module is used for screening the plurality of characteristic information to obtain target characteristic information;
the logistic regression model training module is used for training the logistic regression model based on the target characteristic information to obtain a target default probability prediction model;
and the default probability prediction module is used for predicting the default probability of the target credit card user based on the target default probability prediction model.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for predicting the default probability of a credit card user according to the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer instructions, where the computer instructions are configured to cause a processor to implement the method for predicting a probability of default of a credit card user according to the embodiment of the present invention.
The embodiment of the invention discloses a method, a device, equipment and a storage medium for predicting the default probability of a credit card user. A credit card account information set within a set history period is collected and divided into a positive sample set and a negative sample set, wherein the credit card account information includes a plurality of pieces of feature information; the plurality of pieces of feature information are screened to obtain target feature information; the logistic regression model is trained based on the target feature information to obtain a target default probability prediction model; and the default probability of the target credit card user is predicted based on the target default probability prediction model. Because the logistic regression model is trained on the screened feature information and the resulting target default probability prediction model is used for prediction, the accuracy of predicting the default probability of a credit card user can be improved.
Drawings
FIG. 1 is a flowchart of a method for predicting the default probability of a credit card user according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a device for predicting the default probability of a credit card user according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
FIG. 1 is a flowchart of a method for predicting the default probability of a credit card user according to a first embodiment of the present invention. The method may be executed by a device for predicting the default probability of a credit card user; the device may be implemented in software and/or hardware and, optionally, by an electronic device such as a mobile terminal, a PC or a server. The method specifically comprises the following steps:
S110, collecting a credit card account information set within a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set.
The credit card account information includes a plurality of pieces of feature information. The set history period may be a most recent set period, for example the last half year or the last year. The credit card account information set is a set of multiple pieces of credit card account information, and the credit card account information may include: credit balance, installment type, product type, installment principal, number of installments, repayment amount per installment, fee per installment, number of remaining installments, accumulated installment balance, etc. The credit card account information may be monthly credit card data, i.e., the credit card account information for each month, with one month as the period.
In this embodiment, the division of the credit card account information set into the positive sample set and the negative sample set can be understood as: the non-default credit card account information is divided into positive samples and the default credit card account information is divided into negative samples.
Optionally, the credit card account information set may be divided into a positive sample set and a negative sample set as follows: for each piece of credit card account information, if the account defaults in the next month, the credit card account information is determined to be a negative sample; if the account does not default in the next month, the credit card account information is determined to be a positive sample.
Here, a default can be understood as the account failing to repay the amount due on schedule, and a non-default as the account repaying the amount due on schedule. In this embodiment, if an account defaults in a given month, the account is judged to have been about to default in the preceding month, and the preceding month's credit card data for the account is marked as a negative sample; if the account does not default in that month, the account is judged to have been in a non-default state in the preceding month, and the preceding month's credit card data for the account is marked as a positive sample. In practice, defaulting credit card accounts are far fewer than non-defaulting ones, so the proportion of positive and negative samples is extremely unbalanced. A large difference between the proportions of positive and negative samples should be avoided when the data set is constructed; in the embodiment of the invention, the ratio of positive to negative samples can be set to 1:3.
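The labeling rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the record layout, the integer month index, the function names `label_records` and `rebalance`, and the reading of the 1:3 ratio as a cap of three majority-class samples per minority-class sample are all hypothetical.

```python
import random

def label_records(records, default_months):
    """Label each monthly record by whether the account defaults in the NEXT month.

    records: list of (account_id, month_index, features) tuples (assumed layout).
    default_months: set of (account_id, month_index) pairs in which a default occurred.
    """
    labeled = []
    for account_id, month, features in records:
        is_default = (account_id, month + 1) in default_months
        labeled.append({
            "account": account_id,
            "month": month,
            "features": features,
            # True -> negative sample (default), False -> positive sample
            "default_next_month": is_default,
        })
    return labeled

def rebalance(labeled, max_ratio=3, seed=0):
    """Downsample the larger class to at most max_ratio times the smaller class."""
    negatives = [r for r in labeled if r["default_next_month"]]
    positives = [r for r in labeled if not r["default_next_month"]]
    if len(negatives) <= len(positives):
        minority, majority = negatives, positives
    else:
        minority, majority = positives, negatives
    keep = min(len(majority), max_ratio * len(minority))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return minority + rng.sample(majority, keep)
```

Using a running month index rather than a calendar month keeps the "next month" lookup correct across year boundaries.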
S120, screening the plurality of feature information to obtain target feature information.
In this embodiment, screening the plurality of pieces of feature information may be understood as selecting, from the feature information in the credit card account information, the feature information that contributes to predicting the default probability, or deleting the feature information that does not contribute to predicting the default probability.
Specifically, the plurality of pieces of feature information may be screened as follows: the feature information is screened according to at least one of the following indexes, in sequence, to obtain the target feature information: a repeatability index, a correlation index, a significance index and an importance index.
The repeatability index can be understood as the repetition rate of feature information in the credit card account information set. The correlation index can be understood as the correlation coefficient between every two pieces of feature information. The significance index can be understood as the significance test result of each piece of feature information. The importance index can be understood as the importance of each piece of feature information with respect to the predicted default probability.
In this embodiment, the manner of screening the feature information according to the repeatability index may be: determining the repetition rate of each characteristic information in the credit card account information set; and deleting the characteristic information with the repetition rate exceeding the first set threshold value.
The repetition rate of a piece of feature information is understood as the proportion of records in the credit card account information set in which that feature takes the same value. The first set threshold may be preset, for example to any value between 80% and 95%. Specifically, when the repetition rate of a piece of feature information in the credit card account information set exceeds the first set threshold, that feature information is deleted. For example, for feature information such as daily credit balance, installment type and product type, the repetition rate of a single value in the credit card account information set exceeds 90%; such feature information cannot reflect differences between samples, so it is deleted, which reduces the amount of data to process and speeds up subsequent data processing.
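A minimal Python sketch of the repetition-rate screen follows. It assumes each sample is a dictionary mapping a feature name to its value and takes the repetition rate to be the share of the most frequent value; the function names and the 0.9 default threshold are illustrative choices, not values fixed by the embodiment.

```python
from collections import Counter

def repetition_rate(values):
    """Share of samples that carry the single most frequent value."""
    counts = Counter(values)
    return max(counts.values()) / len(values)

def drop_repetitive_features(samples, threshold=0.9):
    """Return the feature names whose repetition rate does not exceed the threshold.

    samples: non-empty list of dicts, feature name -> value (assumed layout).
    """
    names = list(samples[0].keys())
    return [
        name for name in names
        if repetition_rate([s[name] for s in samples]) <= threshold
    ]
```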
In this embodiment, the manner of screening the feature information according to the correlation index may be: determining a correlation coefficient between every two feature information in the credit card account information set; and deleting one piece of characteristic information in every two pieces of characteristic information with the correlation coefficient exceeding a second set threshold value.
The correlation coefficient between two pieces of feature information can be expressed as:

$$R = \frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Y_i-\bar{Y}\right)^{2}}}$$

where $R$ is the correlation coefficient, $n$ is the number of samples, $X_i$ is feature information X of the i-th sample, $Y_i$ is feature information Y of the i-th sample, $\bar{X}$ is the mean of feature information X over all samples, and $\bar{Y}$ is the mean of feature information Y over all samples. Specifically, when the correlation coefficient exceeds the second set threshold, it indicates multicollinearity between the two pieces of feature information, and to reduce the amount of calculation one of the two may be deleted.
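The correlation screen can be illustrated with a short Python sketch. It assumes the coefficient is the Pearson correlation coefficient (the definitions of R, n, X_i, Y_i and the two means suggest this), compares its absolute value with the threshold, and keeps the first feature of each highly correlated pair; those choices, the function names and the 0.8 default are assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.8):
    """Greedily keep features whose |R| with every kept feature is <= threshold.

    columns: dict of feature name -> list of values, one entry per sample.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Which feature of a correlated pair is deleted is not specified in the text; the greedy rule here simply favors the feature that appears first.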
In this embodiment, the feature information may be screened according to the significance index as follows: performing a first significance test and a second significance test, in turn, on each piece of feature information in the credit card account information set; and retaining the feature information that passes both the first significance test and the second significance test.
The first significance test may be a t-test and the second significance test may be an F-test. The first significance test checks the significance of an individual piece of feature information and is used to determine whether a significant linear relationship exists between that feature information and the predicted default probability. The second significance test checks the significance of the feature information as a whole and is used to determine whether a significant linear relationship exists between the feature information as a whole and the predicted default probability. In this embodiment, if a piece of feature information fails the first significance test and/or the second significance test, the feature information is deleted.
The feature information may be screened according to the importance index as follows: determining the importance index of each piece of feature information in the credit card account information set based on a set algorithm; and sorting the feature information by importance index and extracting the top-ranked feature information for each of several set counts, to obtain multiple groups of target feature information.
The set algorithm may be LightGBM, a gradient boosting decision tree (GBDT) framework. The importance index of each piece of feature information in the credit card account information set may be determined as follows: decision trees are constructed using a histogram algorithm, an iteration tree is constructed, and the importance index of each piece of feature information is calculated.
The histogram algorithm converts each column of feature values into a histogram. For each histogram, k data bins are generated according to the integer intervals in which the data lie, and the continuous floating-point feature values are placed into the corresponding bins according to those intervals. Converting all values of every feature in this way yields the histogram of the original data.
The bins of each histogram are then traversed. Taking the current bin as the split point, the gradients and sample counts of all bins from the leftmost bin up to the current bin are accumulated to give the left-side gradient sum $S_L$ and the left-side sample count $n_L$.
These are subtracted from the total gradient $S_P$ and the total sample count $n_P$ of the parent node to give the gradient sum $S_R$ and the sample count $n_R$ of all bins on the right: $S_R = S_P - S_L$ and $n_R = n_P - n_L$. The gain of the split is then calculated from these quantities.
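The histogram traversal can be sketched in Python as follows. Two points are assumptions and not taken from the text: the bins are equal-width, and the gain is computed as S_L²/n_L + S_R²/n_R − S_P²/n_P, a form commonly used to describe histogram-based split finding. The function name `best_histogram_split` is hypothetical.

```python
def best_histogram_split(values, gradients, k):
    """Bin one feature into k equal-width bins and scan for the best split.

    Returns (index of the last bin on the left side, gain of that split).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # guard against a constant feature
    grad_sum = [0.0] * k
    count = [0] * k
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), k - 1)  # the maximum value falls in the last bin
        grad_sum[b] += g
        count[b] += 1

    s_p, n_p = sum(grad_sum), len(values)  # parent totals
    s_l, n_l = 0.0, 0                      # running left-side totals
    best_bin, best_gain = None, float("-inf")
    for b in range(k - 1):
        s_l += grad_sum[b]
        n_l += count[b]
        s_r, n_r = s_p - s_l, n_p - n_l    # right side obtained by subtraction
        if n_l == 0 or n_r == 0:
            continue
        # Assumed gain formula (not given in the source text)
        gain = s_l ** 2 / n_l + s_r ** 2 / n_r - s_p ** 2 / n_p
        if gain > best_gain:
            best_bin, best_gain = b, gain
    return best_bin, best_gain
```

Obtaining the right-hand totals by subtraction means each bin is visited only once per feature, which is the point of the accumulation step described above.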
The iteration tree may be constructed as follows: based on the training data, multiple iterations are performed; in each iteration, a new tree is fitted using gradient information and added to the trees from the previous iterations. In function space, the iteration tree can be regarded as a continually updated linear combination of trees.
For any given tree structure, the total number of times each feature is used for splitting in the iteration tree, t_split, and the sum of the gains obtained by splitting on that feature across all decision trees, t_gain, are used as metrics of the importance of the feature.
The importance index of each piece of feature information is characterized by t_gain.
Specifically, after the importance index of each piece of feature information is obtained, the feature information is sorted by importance index, and the top-ranked feature information is extracted for each of several set counts to obtain multiple groups of target feature information. For example, the Top5, Top8, Top10 and Top15 feature information are selected respectively, giving four groups of target feature information.
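Forming the Top-k groups from the importance scores can be sketched in a few lines of Python. The dictionary layout and the function name `top_k_groups` are assumptions; the group sizes follow the example in the text.

```python
def top_k_groups(importance, ks=(5, 8, 10, 15)):
    """Rank features by importance (e.g. t_gain) and return one feature list per k.

    importance: dict of feature name -> importance score.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    return {k: ranked[:k] for k in ks}
```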
S130, training the logistic regression model based on the target feature information to obtain a target default probability prediction model.
The method for training the logistic regression model based on the target feature information to obtain the target default probability prediction model may be as follows:
training and testing the logistic regression model based on each group of target feature information respectively to obtain a plurality of candidate default probability prediction models; and determining the target default probability prediction model from the plurality of candidate default probability prediction models.
The target default probability prediction model may be determined from the plurality of candidate default probability prediction models as follows: the candidate default probability prediction model with the highest accuracy is determined to be the target default probability prediction model. For example, if the candidate model trained on the Top8 target feature information has the highest accuracy, that candidate model is taken as the target default probability prediction model.
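A self-contained Python sketch of training one logistic regression per feature group and keeping the most accurate candidate is given below. It is an illustration under assumptions: plain batch gradient descent stands in for whatever solver is actually used, label 1 denotes a default (the negative sample of S110), accuracy is measured at a 0.5 threshold on a held-out test split, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    """Predicted default probability for one feature vector."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

def train_logistic(rows, labels, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(rows[0]), len(rows)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba((weights, bias), x) - y
            grad_b += err
            for j in range(dim):
                grad_w[j] += err * x[j]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def accuracy(model, rows, labels):
    """Share of samples classified correctly at a 0.5 probability threshold."""
    hits = sum((predict_proba(model, x) >= 0.5) == (y == 1)
               for x, y in zip(rows, labels))
    return hits / len(rows)

def project(samples, features):
    """Turn dict samples into feature vectors in a fixed feature order."""
    return [[s[f] for f in features] for s in samples]

def select_best_model(train, test, feature_groups):
    """Train one candidate per feature group; return the most accurate one."""
    train_x, train_y = train
    test_x, test_y = test
    best = None
    for name, features in feature_groups.items():
        model = train_logistic(project(train_x, features), train_y)
        acc = accuracy(model, project(test_x, features), test_y)
        if best is None or acc > best["accuracy"]:
            best = {"group": name, "features": features,
                    "model": model, "accuracy": acc}
    return best
```

Once a candidate is selected, `predict_proba(best["model"], vector)` gives the predicted default probability for a vector built from the same features in the same order.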
S140, predicting the default probability of the target credit card user based on the target default probability prediction model.
Specifically, the feature information required by the target default probability prediction model (for example, the Top8 feature information in this embodiment) is first extracted from the target credit card user's credit card account information for the current month, and is then input into the target default probability prediction model to obtain the default probability of the target credit card user.
In this embodiment, as time passes, when the interval between the period covered by the sample data used by the target default probability prediction model and the current time exceeds a set duration, the credit card account information set of the most recent period needs to be collected again and the logistic regression model retrained, so as to ensure prediction accuracy.
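The retraining trigger can be expressed as a simple date comparison. In the sketch below, the function name `needs_retraining` and the 180-day default are assumptions; the text only says that a set duration is used.

```python
from datetime import date

def needs_retraining(sample_period_end, today, max_age_days=180):
    """True when the training samples are older than the set duration."""
    return (today - sample_period_end).days > max_age_days
```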
According to the technical scheme of this embodiment, a credit card account information set within a set history period is collected and divided into a positive sample set and a negative sample set, wherein the credit card account information includes a plurality of pieces of feature information; the plurality of pieces of feature information are screened to obtain target feature information; the logistic regression model is trained based on the target feature information to obtain a target default probability prediction model; and the default probability of the target credit card user is predicted based on the target default probability prediction model. Because the logistic regression model is trained on the screened feature information and the resulting target default probability prediction model is used for prediction, the accuracy of predicting the default probability of a credit card user can be improved.
Example 2
Fig. 2 is a schematic structural diagram of a credit card user default probability prediction apparatus according to a second embodiment of the present invention, where, as shown in fig. 2, the apparatus includes:
the credit card account information set dividing module 210 is configured to collect a credit card account information set within a set history period, and divide the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of pieces of feature information;
the feature information screening module 220 is configured to screen a plurality of feature information to obtain target feature information;
the logistic regression model training module 230 is configured to train the logistic regression model based on the target feature information to obtain a target default probability prediction model;
the default probability prediction module 240 is configured to predict the default probability of the target credit card user based on the target default probability prediction model.
Optionally, the credit card account information is credit card month information; the credit card account information set dividing module 210 is further configured to:
for each piece of credit card account information, if the account defaults in the next month, determining the credit card account information to be a negative sample; if the account does not default in the next month, determining the credit card account information to be a positive sample.
Optionally, the feature information filtering module 220 is further configured to:
screening the feature information according to at least one of the following indexes, in sequence, to obtain the target feature information: a repeatability index, a correlation index, a significance index and an importance index.
Optionally, the feature information filtering module 220 is further configured to:
determining the repetition rate of each characteristic information in the credit card account information set;
and deleting the characteristic information with the repetition rate exceeding the first set threshold value.
Optionally, the feature information filtering module 220 is further configured to:
determining a correlation coefficient between every two feature information in the credit card account information set;
and deleting one piece of characteristic information in every two pieces of characteristic information with the correlation coefficient exceeding a second set threshold value.
Optionally, the feature information filtering module 220 is further configured to:
performing a first significance test and a second significance test, in turn, on each piece of feature information in the credit card account information set;
and retaining the feature information that passes both the first significance test and the second significance test.
Optionally, the feature information filtering module 220 is further configured to:
determining importance indexes of the characteristic information in the credit card account information set based on a setting algorithm;
sorting the feature information by importance index and extracting the top-ranked feature information for each of several set counts, to obtain multiple groups of target feature information.
Optionally, the logistic regression model training module 230 is further configured to:
training the logistic regression model based on each group of target characteristic information to obtain a plurality of candidate default probability prediction models;
a target breach probability prediction model is determined from a plurality of candidate breach probability prediction models.
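The importance ranking and candidate-model selection can be sketched as follows, under several stated assumptions. The "setting algorithm" is not specified here, so the absolute Pearson correlation with the default flag is used as a stand-in importance index. The logistic regression is fitted with plain batch gradient descent, and the target model is chosen by validation AUC, with ties going to the smaller feature group; both are illustrative choices. All names and data are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def predict(w, x):
    """Default probability from weights [bias, w1, ..., wd]."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, epochs=300, lr=0.1):
    """Batch gradient descent on the logistic loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            grad[0] += err
            for k, xi in enumerate(x, start=1):
                grad[k] += err * xi
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def auc(scores, labels):
    """Chance that a defaulting record outscores a non-defaulting one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: label 1 means the account defaulted in the next month.
n = 40
labels = [i % 2 for i in range(n)]
table = {
    "noise": [float((i // 2) % 4) for i in range(n)],
    "utilization": [0.5 * (i % 2) + 0.3 * (i % 5) for i in range(n)],
    "overdue_days": [2.0 * (i % 2) + 0.5 * (i % 3) for i in range(n)],
}
train = [i for i in range(n) if i % 4 < 2]
valid = [i for i in range(n) if i % 4 >= 2]
y_train = [labels[i] for i in train]
y_valid = [labels[i] for i in valid]

# Rank features by the stand-in importance index, computed on training records.
ranking = sorted(
    table,
    key=lambda f: abs(pearson([table[f][i] for i in train], y_train)),
    reverse=True,
)

# One candidate model per feature group (top 1, top 2, top 3); keep the best.
best_auc, best_features = -1.0, None
for k in (1, 2, 3):
    group = ranking[:k]
    w = fit_logistic([[table[f][i] for f in group] for i in train], y_train)
    scores = [predict(w, [table[f][i] for f in group]) for i in valid]
    score = auc(scores, y_valid)
    if score > best_auc:
        best_auc, best_features = score, group

print(ranking, best_features, best_auc)
```

Evaluating each candidate on records held out from training keeps the comparison between feature groups from simply rewarding the group with the most features.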
The device can execute the method provided by all the embodiments of the invention, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment can be found in the methods provided in all the foregoing embodiments of the invention.
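As an illustration of the sample division that precedes the screening, the sketch below labels each monthly account record by whether the same account defaults in the following month. It follows the convention stated in the claims, where a record followed by a default is a negative sample and a record not followed by a default is a positive sample. The record layout, the `YYYY-MM` month format, and the decision to skip records whose next month is not observed are assumptions.

```python
from collections import namedtuple

# Hypothetical record layout: one row per account per month.
MonthlyRecord = namedtuple("MonthlyRecord", "account_id month defaulted")


def month_index(month):
    """Convert 'YYYY-MM' into a running month count so that +1 means next month."""
    year, mon = month.split("-")
    return int(year) * 12 + int(mon) - 1


def split_samples(records):
    """Return (positive, negative) sample lists based on next-month default."""
    status = {(r.account_id, month_index(r.month)): r.defaulted for r in records}
    positive, negative = [], []
    for r in records:
        next_status = status.get((r.account_id, month_index(r.month) + 1))
        if next_status is None:
            continue  # next month not observed: the label is unknown, so skip
        (negative if next_status else positive).append(r)
    return positive, negative


records = [
    MonthlyRecord("A", "2023-10", False),
    MonthlyRecord("A", "2023-11", False),
    MonthlyRecord("A", "2023-12", True),
    MonthlyRecord("B", "2023-11", False),
    MonthlyRecord("B", "2023-12", False),
]
positive, negative = split_samples(records)
print([(r.account_id, r.month) for r in positive])
print([(r.account_id, r.month) for r in negative])
```

Because the label comes from the month after the record, the features of a record never include the outcome they are meant to predict.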
Example III
Fig. 3 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in Fig. 3, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12 and a Random Access Memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the method for predicting the default probability of a credit card user.
In some embodiments, the method for predicting the default probability of a credit card user may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the method for predicting the default probability of a credit card user described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the method for predicting the default probability of a credit card user in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and poor service scalability found in traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method for predicting the default probability of a credit card user, comprising:
collecting a credit card account information set in a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of feature information;
screening the plurality of characteristic information to obtain target characteristic information;
training a logistic regression model based on the target feature information to obtain a target default probability prediction model;
and predicting the default probability of the target credit card user based on the target default probability prediction model.
2. The method of claim 1, wherein the credit card account information is monthly credit card information; dividing the credit card account information set into a positive sample set and a negative sample set, comprising:
for each piece of credit card account information, if the account of the credit card account information defaults in the next month, determining the credit card account information as a negative sample; and if the account of the credit card account information does not default in the next month, determining the credit card account information as a positive sample.
3. The method of claim 1, wherein screening the plurality of feature information to obtain target feature information comprises:
screening the characteristic information according to at least one of the following indexes to obtain target characteristic information: the repeatability index, the correlation index, the significance index, and the importance index.
4. A method according to claim 3, wherein screening the characteristic information according to the repeatability index comprises:
determining the repetition rate of each characteristic information in the credit card account information set;
and deleting the characteristic information with the repetition rate exceeding a first set threshold value.
5. A method according to claim 3, wherein screening the characteristic information according to the correlation index comprises:
determining a correlation coefficient between every two feature information in the credit card account information set;
and deleting one piece of characteristic information in every two pieces of characteristic information with the correlation coefficient exceeding a second set threshold value.
6. A method according to claim 3, wherein screening the characteristic information according to the significance index comprises:
performing a first significance test and a second significance test in turn on each piece of characteristic information in the credit card account information set;
and retaining the characteristic information that passes both the first significance test and the second significance test.
7. A method according to claim 3, wherein screening the characteristic information according to the importance index comprises:
determining importance indexes of the characteristic information in the credit card account information set based on a setting algorithm;
sorting the characteristic information based on the importance index, and extracting the top-ranked characteristic information in a plurality of different quantities to obtain a plurality of groups of target characteristic information;
correspondingly, training the logistic regression model based on the target feature information to obtain a target default probability prediction model comprises:
training the logistic regression model based on each group of target characteristic information to obtain a plurality of candidate default probability prediction models;
and determining a target default probability prediction model from the plurality of candidate default probability prediction models.
8. An apparatus for predicting the default probability of a credit card user, comprising:
the credit card account information set dividing module is used for collecting a credit card account information set in a set history period and dividing the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of feature information;
the characteristic information screening module is used for screening the plurality of characteristic information to obtain target characteristic information;
the logistic regression model training module is used for training the logistic regression model based on the target characteristic information to obtain a target default probability prediction model;
and the default probability prediction module is used for predicting the default probability of the target credit card user based on the target default probability prediction model.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for predicting the default probability of a credit card user of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the method for predicting the default probability of a credit card user of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311605497.5A CN117422544A (en) | 2023-11-28 | 2023-11-28 | Method, device, equipment and storage medium for predicting credit card user default probability |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117422544A true CN117422544A (en) | 2024-01-19 |
Family
ID=89526715
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311605497.5A Pending CN117422544A (en) | 2023-11-28 | 2023-11-28 | Method, device, equipment and storage medium for predicting credit card user default probability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117422544A (en) |
- 2023-11-28: CN application CN202311605497.5A, published as CN117422544A, status active (Pending)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||