CN117011025A

CN117011025A - Credit risk prediction method, apparatus, device, storage medium and program product

Info

Publication number: CN117011025A
Application number: CN202310996118.3A
Authority: CN
Inventors: 阮逸松
Original assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Current assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Priority date: 2023-08-08
Filing date: 2023-08-08
Publication date: 2023-11-07

Abstract

The application discloses a credit risk prediction method, apparatus, device, storage medium and program product, and relates to the technical field of big data processing. The method comprises the following steps: extracting initial risk index features from target credit business data; screening the initial risk index features based on correlation information to obtain candidate risk index features; screening the candidate risk index features based on a random forest regression model to obtain target risk index features; and processing the target risk index features based on a plurality of heterogeneous regression models to obtain target risk return information of the target credit business data. The credit risk prediction method provided by the embodiment of the application can improve the accuracy of risk return prediction, thereby improving the security of the target credit business.

Description

Credit risk prediction method, apparatus, device, storage medium and program product

Technical Field

The embodiments of the application relate to the technical field of data processing, and in particular to a credit risk prediction method, apparatus, device, storage medium and program product.

Background

Risk capital yield is an important indicator of the level of return on risk capital. Raising that level requires properly handling the relationship among return, capital and risk, which is also where the value of risk capital management lies. In the corporate credit business of a financial institution (e.g., a bank), risk is closely related to the credit structure of customers and businesses. Balancing the relationship among return, capital and credit structure is an important means of improving an institution's risk value management capability for the credit business and of raising its level of capital return. Therefore, risk prediction for the credit business is particularly important.

Disclosure of Invention

The embodiments of the application provide a credit risk prediction method, apparatus, device, storage medium and program product, which can predict the risk return information of a credit business and improve the security of the credit business.

In a first aspect, an embodiment of the present application provides a method for predicting credit risk, including:

extracting initial risk index features from the target credit business data;

screening the initial risk index features based on the correlation information to obtain candidate risk index features;

screening the candidate risk index features based on a random forest regression model to obtain target risk index features;

and processing the target risk index features based on a plurality of heterogeneous regression models to obtain target risk return information of the target credit business data.

In a second aspect, an embodiment of the present application further provides a credit risk prediction apparatus, including:

the initial risk index feature acquisition module is used for extracting initial risk index features from the target credit business data;

the candidate risk index feature acquisition module is used for screening the initial risk index features based on the correlation information to obtain candidate risk index features;

the target risk index feature acquisition module is used for screening the candidate risk index features based on a random forest regression model to obtain target risk index features;

and the target risk return information acquisition module is used for processing the target risk index features based on a plurality of heterogeneous regression models to acquire target risk return information of the target credit business data.

In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for predicting credit risk according to the embodiment of the present application when executing the program.

In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a credit risk prediction method according to embodiments of the present application.

In a fifth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements a credit risk prediction method according to embodiments of the present application.

The embodiments of the application disclose a credit risk prediction method, apparatus, device, storage medium and program product, in which initial risk index features are extracted from target credit business data; the initial risk index features are screened based on correlation information to obtain candidate risk index features; the candidate risk index features are screened based on a random forest regression model to obtain target risk index features; and the target risk index features are processed based on a plurality of heterogeneous regression models to obtain target risk return information of the target credit business data. Because the screened target risk index features are processed by a plurality of heterogeneous regression models, the accuracy of risk return prediction can be improved, thereby improving the security of the target credit business.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a credit risk prediction method provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a credit risk prediction device according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings.

It should be noted that like reference numerals and letters denote like items in the following figures, so once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish between descriptions, and are not to be construed as indicating or implying relative importance. In the technical scheme of the application, the acquisition, storage, use and processing of data all comply with the relevant provisions of national laws and regulations.

Fig. 1 is a flowchart of a credit risk prediction method provided by an embodiment of the present application. The method is applicable to predicting credit risk returns and may be performed by a credit risk prediction apparatus, which may be implemented in software and/or hardware and, optionally, by an electronic device such as a mobile terminal, a PC, or a server. As shown in Fig. 1, the method specifically includes the following steps:

S110, extracting initial risk index features from the target credit business data.

The target credit business data may be data of a corporate credit business for which credit is yet to be granted, and may include credit customer data, corporate debt item data and other related data. The initial risk index features may be the feature data obtained by feature-coding the initial risk indexes. The initial risk indexes may cover two dimensions: a credit customer dimension and a debt item dimension. The initial risk indexes of the credit customer dimension may include: industry category, strategic emerging industry classification, enterprise scale level, probability of default, customer credit policy classification, industry credit policy classification, and customer default correlation classification. The initial risk indexes of the debt item dimension may include: whether the business is inclusive finance, whether it is green credit, credit variety classification, credit balance, exposure at default (EAD), whether a guarantor with a high credit level is substituted, whether it is supply chain financing, ratio of margin deposit to credit balance, ratio of mortgage value to credit balance, debt term classification, economic capital occupation, risk cost occupation, etc. In this embodiment, the target credit business data includes data of a plurality of fields, with each initial risk index corresponding to one field.

Optionally, the initial risk index features may be extracted from the target credit business data as follows: preprocessing the target credit business data by performing at least one of removing redundant fields, processing missing field data, and processing abnormal field data; extracting a plurality of risk indexes from the preprocessed target credit business data; and feature-coding the plurality of risk indexes to obtain the initial risk index features.

In this embodiment, redundant fields of the target credit business data may be removed as follows: the target credit business data is analyzed, and only the fields useful for predicting the risk return are retained. The retained fields may include: average annual loan balance, average annual credit balance, average annual deposit balance, loan interest income, internal funds transfer expenditure, loan allocation cost, internal funds transfer income, deposit interest expenditure, deposit allocation cost, economic capital occupation estimation, strategic emerging industry classification, inclusive finance, and the like. Missing field data of the target credit business data may be processed as follows: for continuous data, mean filling or zero filling may be used; for discrete data, missing values are filled with the most frequently occurring value; a field whose number of missing values exceeds a certain number may be deleted directly. Abnormal field data of the target credit business data may be detected as follows: a normal range is determined for the data, and any value that falls outside the normal range is treated as abnormal data.
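The preprocessing rules described above can be sketched as follows. This is a minimal illustration only, not the implementation of the application: the function names, the representation of a field as a Python list with `None` marking a missing value, and the 0.5 cutoff for dropping a sparse field are all assumptions made for the example.

```python
from collections import Counter

def fill_missing(values, continuous=True):
    """Fill None entries: mean for continuous data, most frequent value for discrete data."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to infer a fill value from
    if continuous:
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

def drop_sparse_fields(table, max_missing_ratio=0.5):
    """Keep only fields whose share of missing values is at most max_missing_ratio (assumed cutoff)."""
    kept = {}
    for name, column in table.items():
        missing = sum(v is None for v in column)
        if column and missing / len(column) <= max_missing_ratio:
            kept[name] = column
    return kept

def flag_abnormal(values, low, high):
    """Return the positions of values that fall outside the normal range [low, high]."""
    return [i for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]
```

For instance, `flag_abnormal([0.2, 1.5, -0.1], 0.0, 1.0)` marks the second and third entries of a hypothetical probability-of-default field as abnormal.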

In this embodiment, after preprocessing of eliminating redundant fields, processing missing field data, and processing abnormal field data is performed on the target credit service data, specific values of each field are extracted, and a plurality of risk indexes are obtained.

Optionally, the multiple risk indexes may be feature-coded using discrete feature coding. The discrete feature coding may be a Label Encoder, an Ordered Encoder, or one-hot coding. With a Label Encoder or Ordered Encoder, the data are encoded as integers in [0, n-1], for example: term classification 0-3 and customer default correlation classification 0-4. One-hot coding uses an N-bit status register to encode N states; each state has its own register bit, and only one bit is active at any time. For example, a feature containing M classes becomes M binary features after one-hot coding (e.g., enterprise scale has three classes, large, medium and small, whose one-hot codes are 100, 010 and 001). These binary features are mutually exclusive, with only one active at a time.
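The two coding schemes can be illustrated with a short sketch. The function names and the category order are assumptions chosen for the example; only the enterprise scale classes (large, medium, small) and their codes 100, 010, 001 come from the description above.

```python
def label_encode(values, categories):
    """Map each category to an integer in [0, n-1], following the order of `categories`."""
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values]

def one_hot_encode(values, categories):
    """Turn a feature with M classes into M mutually exclusive binary features."""
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values]

scales = ["large", "medium", "small"]
print(label_encode(["small", "large"], scales))              # [2, 0]
print(one_hot_encode(["large", "medium", "small"], scales))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```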

S120, screening the initial risk index features based on the correlation information to obtain candidate risk index features.

Wherein the correlation information may be correlation information between the initial risk indicator feature and the risk return information. In this embodiment, the risk benefit information may be characterized by one or more of the following: risk cost estimation, capital occupation estimation, loan net benefit, deposit net benefit, intermediate business net benefit, net benefit aggregate, risk capital return rate, economic capital occupation rate, risk cost occupation rate, daily loan income interest rate estimation, daily deposit indication interest rate estimation, annual daily deposit credit ratio, daily deposit internal funds transfer interest rate, daily loan internal funds transfer interest rate.

Optionally, the method for screening the initial risk indicator features based on the feature correlation information to obtain candidate risk indicator features may be: acquiring correlation information between initial risk index features and risk return information; and screening out the initial risk index features of which the correlation information meets the set conditions, and determining the initial risk index features as candidate risk index features.

Wherein correlation information between the initial risk indicator features and the risk return information may be determined based on historical credit business data. Specifically, the manner of obtaining the correlation information between the initial risk indicator feature and the risk benefit information may be: acquiring initial risk index characteristics and risk return information of historical credit business data; and carrying out correlation analysis on the initial risk index characteristics and the risk return information of the historical credit business data to obtain correlation information between the initial risk index characteristics and the risk return information.

The correlation information is characterized by a correlation evaluation value. In this embodiment, the correlation analysis of the initial risk index features and the risk return information of the historical credit business data may be implemented with any existing correlation analysis method, which is not limited herein. Accordingly, after the correlation information between the initial risk index features and the risk return information is obtained, the initial risk index features whose correlation information satisfies the set condition may be screened out and determined as candidate risk index features as follows: the initial risk index features whose correlation evaluation value is greater than a set threshold are screened out and determined as the candidate risk index features.

The set threshold may be set by a user, and is not limited herein. Illustratively, Table 1 shows the initial risk index features that rank highest by correlation evaluation value with the risk return information; these initial risk index features are determined as candidate risk index features.

TABLE 1

As shown in Table 1, indexes such as credit balance, customer default correlation, small and micro, medium and large, customer credit policy, term, high-credit-level guarantor substitution, supply chain financing and deposit are strongly correlated with the risk return, i.e., their correlation evaluation values exceed 0.1, so these indexes are determined as candidate risk index features. In this embodiment, screening the initial risk index features based on the correlation information reduces the interference of irrelevant initial risk index features with the prediction of the target risk return information, thereby improving prediction accuracy.
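The threshold-based screening of S120 can be sketched as below. The description leaves the correlation analysis method open, so the use of the absolute Pearson coefficient as the correlation evaluation value is an assumption made for this example, as are the function names and the toy data; only the 0.1 threshold is taken from the example above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)

def screen_by_correlation(features, target, threshold=0.1):
    """Keep the features whose correlation evaluation value exceeds the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) > threshold]

# Hypothetical historical data: one informative feature, one uncorrelated feature.
history = {"credit_balance": [1, 2, 3, 4], "noise": [1, -1, -1, 1]}
risk_return = [2, 4, 6, 8]
print(screen_by_correlation(history, risk_return))  # ['credit_balance']
```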

S130, screening the candidate risk index features based on a random forest regression model to obtain target risk index features.

The random forest regression model may be a pre-trained regression model used to predict the degree of relevance between a risk index and the target risk return information, where the degree of relevance may be represented by a relevance index value.

Specifically, the candidate risk index features may be screened based on the random forest regression model as follows: inputting the candidate risk index features into the random forest regression model to output relevance index values of the candidate risk index features; and screening the target risk index features from the candidate risk index features based on the relevance index values.

The relevance index value is used for representing the relevance between the candidate risk index and the target risk gain information. In this embodiment, candidate risk indicator features are respectively input into a random forest regression model, and relevance index values of the candidate risk indicator features are obtained.

Specifically, the target risk index features may be screened from the candidate risk index features based on the relevance index values as follows: sorting the relevance index values; and determining the set number of candidate risk index features with the largest relevance index values as the target risk index features.

The relevance index values may be sorted in ascending or descending order. If they are sorted in ascending order, the last set number of candidate risk index features in the ranking are determined as the target risk index features. If they are sorted in descending order, the first set number of candidate risk index features in the ranking are determined as the target risk index features. In this embodiment, screening out a certain number of candidate risk index features with the largest relevance index values can improve the accuracy of predicting the target risk return information.
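The selection step can be sketched as follows, assuming the relevance index values have already been produced by the trained random forest regression model. The function name, the feature names and the numeric values are hypothetical and serve only to illustrate the sorting rule.

```python
def select_top_features(relevance, k):
    """Return the k features with the largest relevance index values (descending sort)."""
    ranked = sorted(relevance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical relevance index values for four candidate features.
relevance = {
    "credit_balance": 0.42,
    "term": 0.08,
    "customer_credit_policy": 0.27,
    "supply_chain_financing": 0.15,
}
print(select_top_features(relevance, 2))  # ['credit_balance', 'customer_credit_policy']

# Sorting in ascending order and taking the last k entries selects the same features.
ascending = sorted(relevance, key=relevance.get)
assert set(ascending[-2:]) == set(select_top_features(relevance, 2))
```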

S140, processing the target risk index features based on a plurality of heterogeneous regression models to obtain target risk return information of the target credit business data.

The target risk return information is characterized by a capital benefit rate. The multiple heterogeneous regression models may be pre-trained regression models including at least two of: tree-structured regression models, linear regression models, nonlinear regression models, and neural network regression models. The tree-structured regression model may be a gradient boosting tree (XGBoost) regression model, and the neural network regression model may be a convolutional neural network regression model, a recurrent neural network regression model, or a perceptron. In this embodiment, the loss function used in training the heterogeneous regression models may be determined based on the mean square error, mean absolute error, root mean square error, mean absolute percentage error, median absolute error, or coefficient of determination (determined by the mean square error and the variance).
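For reference, the error measures listed above can be computed as in the following sketch. It uses the standard textbook definitions; the function name and the dictionary keys are assumptions made for the example, and the code assumes that no true value is zero (for the percentage error) and that the true values are not all equal (for the coefficient of determination).

```python
from math import sqrt
from statistics import median

def regression_metrics(y_true, y_pred):
    """Compute common regression error measures for paired true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    variance = sum((t - mean_true) ** 2 for t in y_true) / n
    return {
        "mse": mse,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(mse),
        # Assumes every true value is non-zero.
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "medae": median([abs(e) for e in errors]),
        # Coefficient of determination from the mean square error and the variance.
        "r2": 1 - mse / variance,
    }
```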

Optionally, the target risk index features may be processed based on the multiple heterogeneous regression models to obtain the target risk return information of the target credit business data as follows: inputting the target risk index features into each of the heterogeneous regression models to output a plurality of pieces of initial risk return information; and fusing the plurality of pieces of initial risk return information to obtain the target risk return information of the target credit business data.

In this embodiment, the target risk index features are respectively input into multiple heterogeneous regression models, each regression model processes the target risk index features, and initial risk benefit information is output, i.e. multiple initial risk benefit information is obtained. The manner of fusing the plurality of initial risk benefit information may be: the plurality of initial risk return information is weighted summed.

Specifically, the method for fusing the multiple initial risk gain information to obtain the target risk gain information of the target credit business data may be: obtaining weights of multiple heterogeneous regression models; and carrying out weighted summation on the plurality of initial risk gain information based on the weights to obtain target risk gain information of the target credit business data.

The weight of each heterogeneous regression model may be determined by the accuracy of that regression model and is directly proportional to it, i.e., the higher the accuracy, the larger the weight. After the weight of each regression model is obtained, the plurality of pieces of initial risk return information are weighted and summed based on the weights to obtain the target risk return information of the target credit business data. In this embodiment, fusing the prediction results of the multiple heterogeneous regression models can improve the accuracy of risk return prediction.
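The weighted fusion can be sketched as follows. The description only states that each weight is proportional to the model's accuracy; normalizing the weights so that they sum to 1 is an assumption made here, and the function name, model names and numbers are hypothetical.

```python
def fuse_predictions(predictions, accuracies):
    """Weighted sum of per-model predictions, with weights proportional to model accuracy."""
    total = sum(accuracies[name] for name in predictions)
    # Assumption: weights are normalized so that they sum to 1.
    weights = {name: accuracies[name] / total for name in predictions}
    return sum(weights[name] * predictions[name] for name in predictions)

# Hypothetical initial risk return predictions and accuracies of two heterogeneous models.
initial = {"xgboost": 0.10, "linear": 0.20}
accuracy = {"xgboost": 0.75, "linear": 0.25}
target_return = fuse_predictions(initial, accuracy)  # 0.75 * 0.10 + 0.25 * 0.20
```

With these inputs the more accurate model dominates the fused value, which lies closer to 0.10 than to 0.20.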

In this embodiment, after the target risk return information of the target credit business data is obtained, it can be used to guide the financial institution in granting the loan.

According to the technical scheme of this embodiment, initial risk index features are extracted from the target credit business data; the initial risk index features are screened based on the correlation information to obtain candidate risk index features; the candidate risk index features are screened based on a random forest regression model to obtain target risk index features; and the target risk index features are processed based on the multiple heterogeneous regression models to obtain the target risk return information of the target credit business data. Because the screened target risk index features are processed by multiple heterogeneous regression models, the accuracy of risk return prediction can be improved, thereby improving the security of the target credit business.

Fig. 2 is a schematic structural diagram of a credit risk prediction device according to an embodiment of the present application, where, as shown in fig. 2, the device includes:

an initial risk indicator feature acquisition module 210 for extracting initial risk indicator features from the target credit business data;

a candidate risk indicator feature obtaining module 220, configured to screen the initial risk indicator feature based on the correlation information, to obtain a candidate risk indicator feature;

the target risk indicator feature obtaining module 230 is configured to screen the candidate risk indicator features based on a random forest regression model to obtain target risk indicator features;

the target risk benefit information obtaining module 240 is configured to process the target risk index feature based on multiple heterogeneous regression models, and obtain target risk benefit information of the target credit business data.

Optionally, the initial risk indicator feature acquiring module 210 is further configured to:

preprocessing the target credit business data, wherein at least one of the following steps is performed: removing redundant field, processing missing field data and detecting abnormal field data;

extracting a plurality of risk indexes in the preprocessed target credit business data;

and carrying out feature coding on the multiple risk indexes to obtain initial risk index features.

Optionally, the candidate risk indicator feature obtaining module 220 is further configured to:

acquiring correlation information between the initial risk index features and the risk benefit information;

and screening out the initial risk index features of which the correlation information meets the set conditions, and determining the initial risk index features as candidate risk index features.

acquiring initial risk index characteristics and risk return information of historical credit business data;

and carrying out correlation analysis on the initial risk index features and the risk return information of the historical credit business data to obtain correlation information between the initial risk index features and the risk return information, wherein the correlation information is characterized by a correlation evaluation value.

and screening out the initial risk index features with the correlation evaluation value larger than a set threshold value, and determining the initial risk index features as candidate risk index features.

Optionally, the target risk indicator feature obtaining module 230 is further configured to:

inputting the candidate risk index features into a random forest regression model, and outputting relevance index values of the candidate risk index features; the association index value is used for representing the association between the candidate risk index and the target risk gain information;

and screening target risk index features from the candidate risk index features based on the relevance index values.

sorting the relevance index values;

and determining the candidate risk index features with the maximum association index value and the set quantity as target risk index features.

Optionally, the target risk benefit information obtaining module 240 is further configured to:

respectively inputting the target risk index features into the heterogeneous regression models, and outputting a plurality of initial risk return information;

and fusing the plurality of initial risk gain information to obtain target risk gain information of the target credit business data.

acquiring weights of the multiple heterogeneous regression models;

and carrying out weighted summation on the plurality of initial risk gain information based on the weight to obtain target risk gain information of the target credit business data.

Optionally, the multiple heterogeneous regression models include at least two of: tree-structured regression models, linear regression models, nonlinear regression models, and neural network regression models.

Optionally, the target risk benefit information is characterized by a capital benefit rate.

The device can execute the method provided by all the embodiments of the application, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment can be found in the methods provided in all the foregoing embodiments of the application.

Fig. 3 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.

As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the credit risk prediction method.

In some embodiments, the credit risk prediction method may be implemented as a computer program, which is tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the credit risk prediction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the credit risk prediction method in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for carrying out methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.

The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs that run on the respective computers and have a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.

Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements a method of predicting credit risk as provided by any of the embodiments of the present application.
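
As a rough illustration only, two of the steps recited for the method (screening features by their correlation with the risk return information, and fusing the outputs of several heterogeneous regression models by weight) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the function names, the use of the Pearson coefficient, the absolute-value threshold as the "set condition", and the weighted-average fusion rule are all hypothetical choices made for the example. The random forest screening step and the training of the regression models are omitted.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def screen_by_correlation(
    features: Dict[str, List[float]],
    target: List[float],
    threshold: float,
) -> List[str]:
    """Keep features whose absolute correlation with the target reaches the
    threshold. The threshold rule is an assumed form of the 'set condition'."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    ]


def fuse_predictions(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the per-model predictions (assumed fusion rule)."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per model prediction")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Hypothetical toy data: two initial features and an observed return rate.
initial_features = {
    "loan_amount": [1.0, 2.0, 3.0, 4.0],
    "term_months": [2.0, 1.0, 2.0, 1.0],
}
observed_return = [2.0, 4.0, 6.0, 8.0]

candidates = screen_by_correlation(initial_features, observed_return, threshold=0.5)

# Hypothetical outputs of two heterogeneous regression models and their weights.
fused_return = fuse_predictions([0.10, 0.20], [1.0, 3.0])
```

With the toy data above, only `loan_amount` passes the screening, because `term_months` correlates too weakly with the observed return. In practice the weights could, for example, be derived from each model's validation error, but the text of this section does not specify how they are obtained.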

In implementing the computer program product, the computer program code for carrying out operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above describes only preferred embodiments of the present application and the technical principles applied. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made by those skilled in the art without departing from the scope of the application. Therefore, while the application has been described in connection with the above embodiments, the application is not limited to those embodiments and may be embodied in many other equivalent forms without departing from its spirit, the scope of the application being set forth in the following claims.

Claims

1. A method of predicting credit risk, comprising:

extracting initial risk index features from the target credit business data;

2. The method according to claim 1, wherein extracting initial risk indicator features from the target credit business data comprises:

3. The method of claim 1, wherein screening the initial risk indicator features based on feature correlation information to obtain candidate risk indicator features comprises:

4. The method of claim 3, wherein obtaining correlation information between the initial risk indicator feature and the risk return information comprises:

5. The method of claim 4, wherein screening out initial risk indicator features for which the correlation information satisfies a set condition, and determining the initial risk indicator features as candidate risk indicator features, comprises:

6. The method of claim 1, wherein screening the candidate risk indicator features based on a random forest regression model to obtain target risk indicator features comprises:

7. The method of claim 6, wherein screening target risk indicator features from the candidate risk indicator features based on the relevance index value comprises:

sorting the relevance index values;

8. The method of claim 1, wherein processing the target risk indicator features based on a plurality of heterogeneous regression models to obtain risk benefit information for the target credit business data comprises:

9. The method of claim 8, wherein fusing the plurality of initial risk revenue information to obtain target risk revenue information for the target credit business data comprises:

acquiring weights of the multiple heterogeneous regression models;

10. The method of claim 1, wherein the multiple heterogeneous regression models comprise at least two of: tree-structured regression models, linear regression models, nonlinear regression models, and neural network regression models.

11. The method of claim 1, wherein the target risk benefit information is characterized by a capital benefit rate.

12. A credit risk prediction apparatus, comprising:

13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the credit risk prediction method according to any of claims 1-11 when executing the computer program.

14. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements a credit risk prediction method as claimed in any one of claims 1-11.

15. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the credit risk prediction method according to any of claims 1-11.