CN117422544A - Method, device, equipment and storage medium for predicting credit card user default probability - Google Patents

Info

Publication number
CN117422544A
Authority
CN
China
Prior art keywords
credit card
target
information
characteristic information
card account
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311605497.5A
Other languages
Chinese (zh)
Inventor
刘腾腾
高翀
白伟仝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311605497.5A
Publication of CN117422544A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for predicting the default probability of a credit card user. The method comprises the following steps: collecting a credit card account information set in a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set, wherein the credit card account information includes a plurality of feature information; screening the plurality of feature information to obtain target feature information; training a logistic regression model based on the target feature information to obtain a target default probability prediction model; and predicting the default probability of a target credit card user based on the target default probability prediction model. Because the logistic regression model is trained on the screened feature information, and the default probability of the target credit card user is then predicted with the resulting target default probability prediction model, the accuracy of predicting the default probability of the credit card user can be improved.

Description

Method, device, equipment and storage medium for predicting credit card user default probability
Technical Field
The embodiment of the invention relates to the technical field of financial data processing, in particular to a method, a device, equipment and a storage medium for predicting default probability of a credit card user.
Background
Credit cards are a hallmark of the modernization of the financial industry. Flexible, innovative and with broad prospects, they have become an important field of business and technical innovation in banking and one of the most widely applied areas of internet finance. Credit card business occupies an important position in the retail segment of commercial banks: it expands the customer base, generates intermediary business, stabilizes deposits and increases bank income. In recent years, with rapid economic development, the number of credit cards held per capita and the transaction volume have grown continuously, making credit card business a key line of business for every major bank. However, credit card default risk has risen rapidly as the credit card market has expanded, and bad debts from defaults have gradually become a major hidden risk for large commercial banks and financial institutions. Strengthening risk management of credit card business is therefore particularly important, which has prompted banks and lending institutions to invest substantial resources in studying and optimizing default probability prediction algorithms. In general, a credit card account information dataset has a large number of variables whose distributions are extremely imbalanced, and traditional manual credit risk assessment models rely on expert rules, so their predictions lag behind and cannot reflect the circumstances of new users in new situations.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for predicting the default probability of a credit card user, which can improve the accuracy of predicting the default probability of the credit card user.
In a first aspect, an embodiment of the present invention provides a method for predicting a probability of default for a credit card user, including:
collecting a credit card account information set in a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of feature information;
screening the plurality of characteristic information to obtain target characteristic information;
training a logistic regression model based on the target feature information to obtain a target default probability prediction model;
and predicting the default probability of the target credit card user based on the target default probability prediction model.
In a second aspect, an embodiment of the present invention further provides a device for predicting a probability of default of a credit card user, including:
the credit card account information set dividing module is used for collecting a credit card account information set in a set history period and dividing the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of feature information;
the feature information screening module is used for screening the plurality of feature information to obtain target feature information;
the logistic regression model training module is used for training the logistic regression model based on the target feature information to obtain a target default probability prediction model;
and the default probability prediction module is used for predicting the default probability of the target credit card user based on the target default probability prediction model.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for predicting the default probability of a credit card user according to the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores computer instructions, where the computer instructions are configured to cause a processor to implement the method for predicting the default probability of a credit card user according to the embodiments of the present invention.
The embodiment of the invention discloses a method, a device, equipment and a storage medium for predicting the default probability of a credit card user. A credit card account information set in a set history period is collected and divided into a positive sample set and a negative sample set, wherein the credit card account information includes a plurality of feature information; the plurality of feature information is screened to obtain target feature information; the logistic regression model is trained based on the target feature information to obtain a target default probability prediction model; and the default probability of the target credit card user is predicted based on the target default probability prediction model. Because the logistic regression model is trained on the screened feature information, and the default probability of the target credit card user is then predicted with the resulting target default probability prediction model, the accuracy of predicting the default probability of the credit card user can be improved.
Drawings
FIG. 1 is a flowchart of a method for predicting the default probability of a credit card user according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a credit card user default probability prediction apparatus according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a method for predicting the default probability of a credit card user according to a first embodiment of the present invention. The method may be performed by a device for predicting the default probability of a credit card user; the device may be implemented in software and/or hardware and may optionally be carried by an electronic device such as a mobile terminal, a PC, or a server. The method specifically comprises the following steps:
S110, collecting a credit card account information set in a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set.
Here, the credit card account information includes a plurality of feature information. The set history period may be a most recent period of set length, for example the last half year or the last year. The credit card account information set is a set of multiple pieces of credit card account information, and the credit card account information may include: credit balance, installment type, product type, installment principal, number of installments, repayment amount per installment, fee per installment, number of remaining installments, cumulative count of installment balances, and so on. The credit card account information may be credit card monthly data, that is, the credit card account information for each month, taking one month as the period.
In this embodiment, dividing the credit card account information set into a positive sample set and a negative sample set can be understood as follows: credit card account information without default is assigned to the positive samples, and credit card account information with default is assigned to the negative samples.
Optionally, the credit card account information set may be divided into a positive sample set and a negative sample set as follows: for each piece of credit card account information, if the corresponding account defaults in the next month, the credit card account information is determined to be a negative sample; if the corresponding account does not default in the next month, the credit card account information is determined to be a positive sample.
Here, non-default can be understood as the account repaying the amount due on schedule, and default as the account failing to repay the amount due on schedule. In this embodiment, if an account is in an about-to-default state in a given month, the credit card monthly data of that account for that month is labeled as a negative sample; if an account does not default in a given month, it is judged to have been in a no-default state in the previous month, and its credit card monthly data for the previous month is labeled as a positive sample. In practice, far fewer credit card accounts default than do not, so the ratio of positive to negative samples is extremely imbalanced. A large gap between the two should be avoided when constructing the data set; in the embodiment of the invention, the ratio of positive to negative samples may be set to 1:3.
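The next-month labeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record keys `account`, `month` and `defaulted`, the integer month index, and the encoding 1 = positive sample, 0 = negative sample are all hypothetical and do not come from the source.

```python
from collections import defaultdict

def label_monthly_records(records):
    """Label each monthly record by whether the account defaults in the next month.

    records: list of dicts with hypothetical keys 'account', 'month' (integer
    month index) and 'defaulted' (bool, whether the account defaulted that month).
    Returns (record, label) pairs: 1 = positive sample (no default next month),
    0 = negative sample (default next month). Records whose next month is not
    in the data are skipped because their label is unknown.
    """
    by_account = defaultdict(dict)
    for rec in records:
        by_account[rec["account"]][rec["month"]] = rec

    labeled = []
    for months in by_account.values():
        for month, rec in sorted(months.items()):
            next_rec = months.get(month + 1)
            if next_rec is None:
                continue
            labeled.append((rec, 0 if next_rec["defaulted"] else 1))
    return labeled

records = [
    {"account": "A", "month": 1, "defaulted": False},
    {"account": "A", "month": 2, "defaulted": False},
    {"account": "A", "month": 3, "defaulted": True},
    {"account": "B", "month": 1, "defaulted": False},
    {"account": "B", "month": 2, "defaulted": False},
]
labels = [(r["account"], r["month"], y) for r, y in label_monthly_records(records)]
print(labels)
```

Account A's month-2 record becomes a negative sample because the account defaults in month 3; the last available month of each account receives no label.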
S120, screening the plurality of feature information to obtain target feature information.
In this embodiment, screening the plurality of feature information can be understood as selecting, from the plurality of feature information in the credit card account information, the feature information that contributes to predicting the default probability, or equivalently deleting the feature information that does not contribute to predicting the default probability.
Specifically, the plurality of feature information may be screened to obtain the target feature information as follows: the feature information is screened in sequence according to at least one of the following indexes to obtain the target feature information: a repeatability index, a correlation index, a significance index and an importance index.
The repeatability index can be understood as the repetition rate of a piece of feature information in the credit card account information set. The correlation index can be understood as the correlation coefficient between every two pieces of feature information. The significance index can be understood as the significance test result of each piece of feature information. The importance index can be understood as the importance of each piece of feature information for predicting the default probability.
In this embodiment, the manner of screening the feature information according to the repeatability index may be: determining the repetition rate of each characteristic information in the credit card account information set; and deleting the characteristic information with the repetition rate exceeding the first set threshold value.
The repetition rate of a piece of feature information is the proportion of the credit card account information set in which that feature takes the same value. The first set threshold may be preset, for example to any value between 80% and 95%. Specifically, when the repetition rate of a piece of feature information in the credit card account information set exceeds the first set threshold, that feature information is deleted. For example, if feature information such as daily credit balance, installment type or product type takes the same value in more than 90% of the credit card account information set, it cannot reflect differences between samples; deleting it reduces the amount of data to be processed and speeds up subsequent data processing.
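As a rough illustration of the repeatability screen, the sketch below computes the share of samples taken by the most frequent value of each feature and drops features above a threshold. The column names, the sample values and the 0.9 threshold are hypothetical choices for the example, not values fixed by the source.

```python
from collections import Counter

def repetition_rate(values):
    """Share of samples taken by the single most frequent value of a feature."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)

def drop_repetitive_features(columns, threshold=0.9):
    """Keep only features whose repetition rate does not exceed the threshold."""
    return {name: vals for name, vals in columns.items()
            if repetition_rate(vals) <= threshold}

# Hypothetical columns: one nearly constant, one varied.
columns = {
    "product_type": ["std"] * 19 + ["gold"],            # 95% identical values
    "remaining_installments": [i % 12 for i in range(20)],
}
kept = drop_repetitive_features(columns, threshold=0.9)
print(sorted(kept))
```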
In this embodiment, the manner of screening the feature information according to the correlation index may be: determining a correlation coefficient between every two feature information in the credit card account information set; and deleting one piece of characteristic information in every two pieces of characteristic information with the correlation coefficient exceeding a second set threshold value.
The correlation coefficient between every two pieces of feature information can be expressed as:
$$R = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$
where $R$ is the correlation coefficient, $n$ is the number of samples, $X_i$ is feature information X of the i-th sample, $Y_i$ is feature information Y of the i-th sample, $\bar{X}$ is the mean of feature information X over all samples, and $\bar{Y}$ is the mean of feature information Y over all samples. Specifically, when the correlation coefficient exceeds the second set threshold, the two pieces of feature information exhibit multicollinearity, and one of the two may be deleted to reduce the amount of computation.
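A minimal sketch of the correlation screen follows, assuming the correlation coefficient is the Pearson coefficient implied by the symbol definitions above. Comparing the absolute value against the threshold, keeping the earlier feature of a correlated pair, the 0.9 threshold and the column names are all assumptions made for illustration; constant columns are assumed to have been removed already, since they would make the denominator zero.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated_features(columns, threshold=0.9):
    """Greedily keep a feature only if it is not highly correlated
    with any feature that has already been kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical columns: the second is an exact multiple of the first.
columns = {
    "installment_principal": [1000, 2000, 3000, 4000, 5000],
    "repayment_per_installment": [100, 200, 300, 400, 500],
    "remaining_installments": [3, 1, 4, 1, 5],
}
kept = drop_correlated_features(columns, threshold=0.9)
print(kept)
```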
In this embodiment, the feature information may be screened according to the significance index as follows: a first significance test and a second significance test are performed in turn on each piece of feature information in the credit card account information set, and the feature information that passes both the first significance test and the second significance test is retained.
The first significance test may be a t-test and the second significance test may be an F-test. The first significance test checks the significance of an individual piece of feature information and is used to judge whether a significant linear relationship exists between that feature information and the predicted default probability. The second significance test checks the significance of the feature information as a whole and is used to judge whether a significant linear relationship exists between the feature information as a whole and the predicted default probability. In this embodiment, if a piece of feature information fails the first significance test and/or the second significance test, that feature information is deleted.
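The source does not give the test statistics. Assuming the two tests are the usual ones for a linear regression of the default label on $k$ features over $n$ samples (an assumption, not something the source states), they take the standard textbook form:

```latex
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}, \qquad
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)}
```

Here $\hat{\beta}_j$ is the estimated coefficient of feature $j$, $\mathrm{SE}(\hat{\beta}_j)$ its standard error, $\mathrm{SSR}$ the regression sum of squares and $\mathrm{SSE}$ the residual sum of squares. Under these assumptions, feature $j$ passes the t-test when $|t_j|$ exceeds the critical value of the t distribution with $n-k-1$ degrees of freedom at the chosen significance level, and the F-test is passed when $F$ exceeds the critical value of the $F(k,\,n-k-1)$ distribution.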
The feature information may be screened according to the importance index as follows: the importance index of each piece of feature information in the credit card account information set is determined based on a set algorithm; the feature information is then sorted by importance index, and several different numbers of top-ranked feature information are extracted to obtain multiple groups of target feature information.
The set algorithm may be LightGBM, a gradient boosting decision tree (GBDT) framework. The importance index of each piece of feature information in the credit card account information set may be determined based on the set algorithm as follows: decision trees are constructed using a histogram algorithm, an iteration tree is built, and the importance index of each piece of feature information is calculated.
The histogram algorithm converts each column of feature values into a histogram. For each histogram, k data blocks (bins) are generated according to the integer intervals in which the data fall, and the continuous floating-point feature values are placed into the corresponding bins. Converting all values of every feature in this way yields the histogram of the original data.
Each bin in a histogram is then traversed and taken in turn as the split point, and the gradients and sample counts of all bins from the leftmost bin up to the current bin are accumulated into the left-side gradient $S_L$ and the left-side sample count $n_L$.
These are then subtracted from the total gradient $S_P$ and the total sample count $n_P$ of the parent node to obtain the gradient $S_R$ and the sample count $n_R$ of all bins on the right, $S_R = S_P - S_L$ and $n_R = n_P - n_L$, from which the gain of the split is calculated.
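The bin traversal above can be sketched as follows. The gain expression used here, $S_L^2/n_L + S_R^2/n_R - S_P^2/n_P$, is the variance-gain form commonly used to describe histogram-based gradient boosting; it is an assumption for this illustration, since the source's own gain formula is not reproduced in the text. The function name and the sample bins are hypothetical.

```python
def best_histogram_split(bin_gradients, bin_counts):
    """Scan the bins of one feature histogram and return (best_bin, best_gain).

    bin_gradients[i] and bin_counts[i] are the gradient sum and the sample
    count of bin i. A split at bin i sends bins 0..i to the left child.
    """
    s_p, n_p = sum(bin_gradients), sum(bin_counts)   # parent totals S_P, n_P
    s_l, n_l = 0.0, 0                                # running left sums S_L, n_L
    best_bin, best_gain = None, float("-inf")
    # The last bin is excluded: splitting there would leave the right side empty.
    for i in range(len(bin_counts) - 1):
        s_l += bin_gradients[i]
        n_l += bin_counts[i]
        s_r, n_r = s_p - s_l, n_p - n_l              # right side by subtraction
        if n_l == 0 or n_r == 0:
            continue
        # Assumed variance-gain form (not taken from the source).
        gain = s_l ** 2 / n_l + s_r ** 2 / n_r - s_p ** 2 / n_p
        if gain > best_gain:
            best_bin, best_gain = i, gain
    return best_bin, best_gain

# Hypothetical histogram with four bins.
best_bin, best_gain = best_histogram_split([-4.0, -2.0, 3.0, 3.0], [4, 2, 3, 3])
print(best_bin, best_gain)
```

Obtaining the right-hand statistics by subtraction means each candidate split costs constant time once the histogram is built, which is the point of the accumulation step.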
The iteration tree may be constructed as follows: multiple iterations are performed on the training data, and in each iteration a new tree is fitted using gradient information and added to the trees from the previous iterations. The iteration tree can thus be regarded as a continuously updated linear combination in function space.
For any given tree structure, two quantities serve as measures of feature importance: t_split, the total number of times each feature is used as a split in the iteration tree, and t_gain, the sum of the gains obtained when the feature is used as a split across all decision trees.
In this embodiment, the importance index of each piece of feature information is characterized by t_gain.
Specifically, after the importance index of each piece of feature information is obtained, the feature information is sorted by importance index, and several different numbers of top-ranked feature information are extracted to obtain multiple groups of target feature information. For example, the Top5, Top8, Top10 and Top15 feature information are selected respectively, giving four groups of target feature information.
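Forming the candidate groups is a simple ranking step. The sketch below assumes the importance indexes are already available as a mapping from feature name to a gain-based score; the feature names, the scores and the small values of k are hypothetical and chosen only to keep the example short.

```python
def top_k_groups(importance, ks=(5, 8, 10, 15)):
    """Rank features by importance (descending) and return one group per k."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return {k: ranked[:k] for k in ks}

# Hypothetical gain-based importance scores.
importance = {"f_a": 0.5, "f_b": 3.2, "f_c": 1.1, "f_d": 0.0}
groups = top_k_groups(importance, ks=(2, 3))
print(groups)
```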
S130, training the logistic regression model based on the target feature information to obtain a target default probability prediction model.
The logistic regression model may be trained based on the target feature information to obtain the target default probability prediction model as follows:
the logistic regression model is trained and tested on each group of target feature information separately to obtain a plurality of candidate default probability prediction models, and the target default probability prediction model is determined from the plurality of candidate default probability prediction models.
The target default probability prediction model may be determined from the plurality of candidate default probability prediction models by taking the candidate with the highest accuracy. For example, if the candidate default probability prediction model trained on the Top8 target feature information has the highest accuracy, that candidate is taken as the target default probability prediction model.
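The train-test-select loop can be illustrated with a small pure-Python logistic regression fitted by batch gradient descent. Everything here is an assumption made for the sketch: the learning rate and epoch count, the 0.5 decision threshold, the toy data, the group names, and the encoding y = 1 for default (so that the model output reads directly as a default probability, independently of the positive/negative sample naming in the text). A production system would more likely use an established library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, xi):
    """Predicted default probability for one feature vector."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)

def accuracy(model, X, y):
    hits = sum((predict_proba(model, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)

def select_columns(X, cols):
    return [[row[c] for c in cols] for row in X]

def select_best(groups, train, test):
    """Train one candidate per feature group; return (name, model, accuracy) of the best."""
    best = None
    for name, cols in groups.items():
        model = train_logistic(select_columns(train[0], cols), train[1])
        acc = accuracy(model, select_columns(test[0], cols), test[1])
        if best is None or acc > best[2]:
            best = (name, model, acc)
    return best

# Toy data: column 0 separates the classes, column 1 is uninformative.
train_X = [[-2.0, 1], [-1.5, -1], [-1.0, 1], [-0.5, -1],
           [0.5, 1], [1.0, -1], [1.5, 1], [2.0, -1]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_X = [[-1.2, 1], [-0.8, -1], [0.7, 1], [1.3, -1]]
test_y = [0, 0, 1, 1]
groups = {"noise_only": [1], "informative": [0]}   # hypothetical feature groups

best_name, best_model, best_acc = select_best(groups, (train_X, train_y), (test_X, test_y))
print(best_name, best_acc)
```

Once a candidate is selected, `predict_proba(best_model, features)` gives the predicted default probability for a new account, provided `features` contains the same columns, in the same order, as the winning group.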
S140, predicting the default probability of the target credit card user based on the target default probability prediction model.
Specifically, the feature information required by the target default probability prediction model (for example, the Top8 feature information in this embodiment) is first extracted from the credit card account information of the target credit card user for the current month. The extracted feature information is then input into the target default probability prediction model for prediction, yielding the default probability of the target credit card user.
In this embodiment, as time passes, once the interval between the current time and the period covered by the sample data used to train the target default probability prediction model exceeds a set duration, the credit card account information set for the most recent period must be collected again and the logistic regression model retrained, so as to maintain prediction accuracy.
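The retraining trigger amounts to a date comparison. In this sketch the function name, the 180-day limit and the dates are hypothetical; the source only says that a set duration is used.

```python
from datetime import date

def needs_retraining(sample_period_end, today, max_age_days=180):
    """True when the training sample period ended more than max_age_days ago."""
    return (today - sample_period_end).days > max_age_days

stale = needs_retraining(date(2023, 1, 31), date(2023, 11, 28))
fresh = needs_retraining(date(2023, 9, 30), date(2023, 11, 28))
print(stale, fresh)
```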
According to the technical scheme of this embodiment, a credit card account information set in a set history period is collected and divided into a positive sample set and a negative sample set, wherein the credit card account information includes a plurality of feature information; the plurality of feature information is screened to obtain target feature information; the logistic regression model is trained based on the target feature information to obtain a target default probability prediction model; and the default probability of the target credit card user is predicted based on the target default probability prediction model. Because the logistic regression model is trained on the screened feature information, and the default probability of the target credit card user is then predicted with the resulting target default probability prediction model, the accuracy of predicting the default probability of the credit card user can be improved.
Example two
Fig. 2 is a schematic structural diagram of a credit card user default probability prediction apparatus according to a second embodiment of the present invention, where, as shown in fig. 2, the apparatus includes:
the credit card account information set dividing module 210 is configured to collect a credit card account information set in a set history period, and divide the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of feature information;
the feature information screening module 220 is configured to screen the plurality of feature information to obtain target feature information;
the logistic regression model training module 230 is configured to train the logistic regression model based on the target feature information to obtain a target default probability prediction model;
the default probability prediction module 240 is configured to predict the default probability of the target credit card user based on the target default probability prediction model.
Optionally, the credit card account information is credit card monthly information; the credit card account information set dividing module 210 is further configured to:
for each piece of credit card account information, if the corresponding account defaults in the next month, determine the credit card account information to be a negative sample; if the corresponding account does not default in the next month, determine the credit card account information to be a positive sample.
Optionally, the feature information screening module 220 is further configured to:
screen the feature information in sequence according to at least one of the following indexes to obtain the target feature information: a repeatability index, a correlation index, a significance index and an importance index.
Optionally, the feature information screening module 220 is further configured to:
determining the repetition rate of each characteristic information in the credit card account information set;
and deleting the characteristic information with the repetition rate exceeding the first set threshold value.
Optionally, the feature information screening module 220 is further configured to:
determining a correlation coefficient between every two feature information in the credit card account information set;
and deleting one piece of characteristic information in every two pieces of characteristic information with the correlation coefficient exceeding a second set threshold value.
Optionally, the feature information screening module 220 is further configured to:
perform a first significance test and a second significance test in turn on each piece of feature information in the credit card account information set;
and retain the feature information that passes both the first significance test and the second significance test.
Optionally, the feature information screening module 220 is further configured to:
determine the importance index of each piece of feature information in the credit card account information set based on a set algorithm;
sort the feature information by importance index, and extract several different numbers of top-ranked feature information to obtain multiple groups of target feature information.
Optionally, the logistic regression model training module 230 is further configured to:
training the logistic regression model based on each group of target characteristic information to obtain a plurality of candidate default probability prediction models;
a target breach probability prediction model is determined from a plurality of candidate breach probability prediction models.
The device can execute the method provided by all the embodiments of the invention, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment can be found in the methods provided in all the foregoing embodiments of the invention.
Example III
Fig. 3 shows a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the method for predicting the default probability of a credit card user.
In some embodiments, the method for predicting the default probability of a credit card user may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the credit card user default probability prediction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the method for predicting the default probability of a credit card user in any other suitable manner (e.g., by means of firmware).
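As one illustration of what such a computer program might contain, the sketch below trains a logistic regression model by batch gradient descent and uses it to output a default probability. It is a minimal sketch under stated assumptions rather than the disclosed implementation: the function names, learning rate, and epoch count are hypothetical, and the label encoding (1 for an account that defaulted, 0 otherwise, so that the model output reads directly as a default probability) is an assumption made for illustration only.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_default_probability(w: Sequence[float], row: Sequence[float]) -> float:
    """Return P(default) for one feature row; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))


def train_logistic_regression(X: Sequence[Sequence[float]],
                              y: Sequence[int],
                              lr: float = 0.5,
                              epochs: int = 2000) -> List[float]:
    """Fit weights (bias first) by batch gradient descent on the log loss.

    Assumption: y[i] == 1 marks a defaulted account, 0 a non-defaulted one.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, label in zip(X, y):
            err = predict_default_probability(w, row) - label
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        # Average the gradient over the samples and take one descent step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w
```

A production system would more likely rely on an established statistics or machine learning library; the pure-Python loop is used here only to keep the example self-contained.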
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for predicting the default probability of a credit card user, comprising:
collecting a credit card account information set in a set history period, and dividing the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of feature information;
screening the plurality of characteristic information to obtain target characteristic information;
training a logistic regression model based on the target feature information to obtain a target default probability prediction model;
and predicting the default probability of the target credit card user based on the target default probability prediction model.
2. The method of claim 1, wherein the credit card account information is monthly credit card information; dividing the credit card account information set into a positive sample set and a negative sample set comprises:
for each piece of credit card account information, if the account of the credit card account information defaults in the next month, determining the credit card account information as a negative sample; and if the account of the credit card account information does not default in the next month, determining the credit card account information as a positive sample.
3. The method of claim 1, wherein screening the plurality of feature information to obtain target feature information comprises:
screening the characteristic information according to at least one of the following indexes to obtain the target characteristic information: a repeatability index, a correlation index, a significance index and an importance index.
4. A method according to claim 3, wherein screening the characteristic information according to the repeatability index comprises:
determining the repetition rate of each characteristic information in the credit card account information set;
and deleting the characteristic information with the repetition rate exceeding a first set threshold value.
5. A method according to claim 3, wherein screening the characteristic information according to the correlation index comprises:
determining a correlation coefficient between every two feature information in the credit card account information set;
and deleting one piece of characteristic information in every two pieces of characteristic information with the correlation coefficient exceeding a second set threshold value.
6. A method according to claim 3, wherein screening the characteristic information according to the significance index comprises:
performing a first significance test and a second significance test in turn on each characteristic information in the credit card account information set;
and retaining the characteristic information that passes both the first significance test and the second significance test.
7. A method according to claim 3, wherein screening the characteristic information according to the importance index comprises:
determining importance indexes of the characteristic information in the credit card account information set based on a setting algorithm;
sorting the characteristic information based on the importance index, and extracting a plurality of top-ranked subsets of the characteristic information to obtain a plurality of groups of target characteristic information;
correspondingly, training the logistic regression model based on the target feature information to obtain a target default probability prediction model comprises:
training the logistic regression model based on each group of target characteristic information to obtain a plurality of candidate default probability prediction models;
determining a target default probability prediction model from the plurality of candidate default probability prediction models.
8. A credit card user default probability prediction apparatus, comprising:
a credit card account information set dividing module, configured to collect a credit card account information set in a set history period and divide the credit card account information set into a positive sample set and a negative sample set; wherein the credit card account information includes a plurality of feature information;
a characteristic information screening module, configured to screen the plurality of characteristic information to obtain target characteristic information;
a logistic regression model training module, configured to train the logistic regression model based on the target characteristic information to obtain a target default probability prediction model;
and a default probability prediction module, configured to predict the default probability of the target credit card user based on the target default probability prediction model.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for predicting the default probability of a credit card user of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the method for predicting the default probability of a credit card user of any one of claims 1-7.
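The repeatability and correlation screening recited in claims 4 and 5 can be sketched as follows. This is a hedged illustration, not the claimed implementation: the claims do not define the repetition rate or the correlation coefficient, so the sketch assumes the repetition rate is the share of records taken by the most frequent value and assumes the Pearson coefficient compared by absolute value. The function names, the threshold values 0.95 and 0.8, and the rule of dropping the later feature of a correlated pair are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def repetition_rate(values: Sequence[float]) -> float:
    """Share of records taken by the most frequent value (assumed definition)."""
    return Counter(values).most_common(1)[0][1] / len(values)


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def screen_features(columns: Dict[str, List[float]],
                    max_repetition: float = 0.95,
                    max_abs_corr: float = 0.8) -> List[str]:
    """Return the feature names that survive both screening steps."""
    # Step 1: delete features whose repetition rate exceeds the first threshold.
    kept = [name for name, vals in columns.items()
            if repetition_rate(vals) <= max_repetition]
    # Step 2: for each pair whose correlation exceeds the second threshold,
    # keep the feature seen first and delete the later one (an assumption).
    selected: List[str] = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) <= max_abs_corr
               for s in selected):
            selected.append(name)
    return selected
```

Which feature of a correlated pair to delete is a design choice the claims leave open; an alternative would be to keep the feature with the higher importance index.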
CN202311605497.5A 2023-11-28 2023-11-28 Method, device, equipment and storage medium for predicting credit card user default probability Pending CN117422544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311605497.5A CN117422544A (en) 2023-11-28 2023-11-28 Method, device, equipment and storage medium for predicting credit card user default probability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311605497.5A CN117422544A (en) 2023-11-28 2023-11-28 Method, device, equipment and storage medium for predicting credit card user default probability

Publications (1)

Publication Number Publication Date
CN117422544A true CN117422544A (en) 2024-01-19

Family

ID=89526715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311605497.5A Pending CN117422544A (en) 2023-11-28 2023-11-28 Method, device, equipment and storage medium for predicting credit card user default probability

Country Status (1)

Country Link
CN (1) CN117422544A (en)

Similar Documents

Publication Publication Date Title
CN114580916A (en) Enterprise risk assessment method and device, electronic equipment and storage medium
CN116739742A (en) Monitoring method, device, equipment and storage medium of credit wind control model
CN114090601B (en) Data screening method, device, equipment and storage medium
CN113642727B (en) Training method of neural network model and processing method and device of multimedia information
CN110930242A (en) Credibility prediction method, device, equipment and storage medium
CN117593115A (en) Feature value determining method, device, equipment and medium of credit risk assessment model
CN117422544A (en) Method, device, equipment and storage medium for predicting credit card user default probability
CN115545909A (en) Approval method, device, equipment and storage medium
CN114999665A (en) Data processing method and device, electronic equipment and storage medium
CN114610953A (en) Data classification method, device, equipment and storage medium
CN114861800A (en) Model training method, probability determination method, device, equipment, medium and product
CN114722941A (en) Credit default identification method, apparatus, device and medium
CN116644372B (en) Account type determining method and device, electronic equipment and storage medium
CN114037058B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN114510584B (en) Document identification method, document identification device, electronic device, and computer-readable storage medium
EP4134834A1 (en) Method and apparatus of processing feature information, electronic device, and storage medium
CN117635342A (en) Investment portfolio optimization method, device, equipment and storage medium
CN114818892A (en) Credit grade determining method, device, equipment and storage medium
CN118045366A (en) User hierarchy dividing method, device, equipment and medium based on game viscosity
CN115600129A (en) Information identification method and device, electronic equipment and storage medium
CN117611324A (en) Credit rating method, apparatus, electronic device and storage medium
CN115017145A (en) Data expansion method, device and storage medium
CN117668596A (en) Clustering method, device, equipment and storage medium
CN115456077A (en) Feature set determination method and device and electronic equipment
CN116188063A (en) Guest group creation method, apparatus, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination