CN114707762A - Credit risk prediction method, device, equipment and medium - Google Patents

Credit risk prediction method, device, equipment and medium

Info

Publication number
CN114707762A
Authority
CN
China
Prior art keywords
matrix
sub
personal information
vector
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210477902.9A
Other languages
Chinese (zh)
Inventor
乔媛
朱道彬
闫冬梅
汪婕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210477902.9A
Publication of CN114707762A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 - Complex mathematical operations
    • G06F 17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 - Administration; Management
    • G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 - Operations research, analysis or management
    • G06Q 10/0635 - Risk analysis of enterprise or organisation activities
    • G06Q 40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/02 - Banking, e.g. interest calculation or account maintenance


Abstract

The disclosure provides a credit risk prediction method, which can be used in the technical field of big data. The method comprises the following steps: acquiring personal information of n clients, wherein the personal information of each client comprises N information items; quantizing the personal information of the n clients to obtain a personal information matrix; aiming at the personal information matrix, calculating a graph Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method; reducing the dimension of the graph Laplacian matrix by adopting a local optimal block conjugate gradient method to obtain a characteristic matrix corresponding to the graph Laplacian matrix; classifying the n customers by adopting a clustering method based on the characteristic matrix; and predicting credit risks of the n customers according to the classified n customers. According to the prediction method in the embodiment of the disclosure, the feature space of the graph Laplacian matrix can be rapidly obtained through the local optimal block conjugate gradient method, the calculation speed is high, and the occupied memory is small.

Description

Credit risk prediction method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of big data, in particular to a method, a device, equipment and a medium for predicting credit risk.
Background
At present, big data analysis of credit risk generally uses the information of each client as a sample and applies a machine learning method to classify the clients and obtain the credit risk level of each client. However, for a bank with tens of millions of customer records, when a new customer or a new information item is added, the machine needs to recalculate and reclassify all the data, which results in a huge amount of calculation and consumes considerable time and resources.
Disclosure of Invention
In view of the above, the present disclosure provides a method, apparatus, device, medium, and program product for credit risk prediction.
According to a first aspect of the present disclosure, there is provided a method for predicting credit risk, comprising the steps of:
obtaining the authorization of a client to obtain personal information;
under the condition that the authorization of the client for obtaining the personal information is obtained, obtaining the personal information of n clients, wherein the personal information of each client comprises N information items, the N information items are all related to credit risk, n is an integer greater than or equal to 1, and N is an integer greater than or equal to 2;
quantizing the personal information of the n clients to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents quantized personal information of one client;
aiming at the personal information matrix, calculating a graph Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;
reducing the dimension of the graph Laplace matrix by adopting a local optimal block conjugate gradient method to obtain a characteristic matrix corresponding to the graph Laplace matrix, wherein the characteristic matrix is a matrix with n rows and b columns, b is a positive integer and b is more than or equal to 1 and less than n;
classifying the n customers by adopting a clustering method based on the characteristic matrix; and
predicting credit risks of the n clients according to the classified n clients.
According to the prediction method in the embodiment of the disclosure, the optimal gradient direction can be quickly searched by using the local optimal block conjugate gradient method, the dimension of the N information items of each client is reduced, and then the characteristic space of the graph Laplace matrix is quickly obtained, the calculation speed is high, the occupied memory is small, and the calculation amount and the training speed of the neural network are greatly reduced.
According to some exemplary embodiments, the reducing the dimension of the graph laplacian matrix by using the locally optimal block conjugate gradient method to obtain the feature matrix corresponding to the graph laplacian matrix specifically includes:
determining a search direction by adopting an iteration method, wherein the search direction is gradually consistent with the direction of a vector of each column in a characteristic matrix to be determined; and
and determining a characteristic matrix corresponding to the graph Laplacian matrix according to the determined search direction.
According to some exemplary embodiments, the determining the search direction by using the iterative method so that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined specifically includes:
obtaining an intermediate matrix based on the graph Laplace matrix, wherein the intermediate matrix is a matrix with n rows and b columns;
calculating eigenvalues and eigenvectors of the intermediate matrix; and
and generating a first sub-matrix according to the feature vector of the intermediate matrix, wherein the vector of each column in the first sub-matrix represents a search direction, the search direction represented by the vector of each column in the first sub-matrix corresponds to the direction of the vector of each column in the feature matrix to be determined, and the first sub-matrix is a matrix with n rows and b columns.
According to some exemplary embodiments, the determining the search direction by using the iterative method so that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined further specifically includes:
generating a second sub-matrix according to the eigenvalue and the eigenvector of the intermediate matrix and the first sub-matrix, wherein the second sub-matrix is a matrix with n rows and b columns;
wherein the vector of each column in the second sub-matrix represents a residual vector between the graph laplacian matrix and the first sub-matrix.
According to some exemplary embodiments, the determining the search direction by using the iterative method so that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined further specifically includes:
generating a third sub-matrix according to the feature vector of the intermediate matrix, the first sub-matrix and the second sub-matrix, wherein the third sub-matrix is a matrix with n rows and b columns;
wherein the first, second, and third sub-matrices constitute a search matrix representing a search subspace.
According to some exemplary embodiments, the determining the search direction by using the iterative method so that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined further specifically includes:
updating the first sub-matrix using an iterative method based on the feature vectors of the intermediate matrix and the search matrix,
and in the iteration process, updating the first sub-matrix according to the feature vector of the intermediate matrix in the previous iteration process and the search matrix in the previous iteration process to generate the first sub-matrix in the current iteration process.
According to some exemplary embodiments, the determining the search direction by using the iterative method so that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined further specifically includes:
in the iteration process, updating the second sub-matrix according to the eigenvalue and the eigenvector of the intermediate matrix in the current iteration process and the first sub-matrix in the current iteration process to generate the second sub-matrix in the current iteration process.
According to some exemplary embodiments, the vectors of each column in the third sub-matrix represent a difference between bases of the search subspace during two adjacent iterations.
According to some exemplary embodiments, the determining the search direction by using an iterative method so that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined further includes:
in the iteration process, updating the third sub-matrix according to the feature vector of the intermediate matrix in the previous iteration process and the first sub-matrix and the second sub-matrix in the previous iteration process so as to generate the third sub-matrix in the current iteration process.
According to some exemplary embodiments, determining a feature matrix corresponding to the graph laplacian matrix according to the determined search direction specifically includes:
in the iteration process, when the vector of the ith column in the updated second sub-matrix meets a first specified condition, determining the vector of the ith column in the updated first sub-matrix as one column of the feature matrix, wherein i is a positive integer and is more than or equal to 1 and less than b.
According to some exemplary embodiments, the condition that the vector of the ith column in the updated second sub-matrix satisfies the first specified condition includes:
the norm of the vector of the ith column in the updated second sub-matrix is smaller than a specified threshold value.
According to some exemplary embodiments, obtaining an intermediate matrix based on the graph laplacian matrix specifically includes:
generating the intermediate matrix based on the graph laplacian matrix, the search matrix, and a transpose of the search matrix.
According to some exemplary embodiments, the first sub-matrix used for the first time in the iterative process is a random matrix with n rows and b columns.
According to some exemplary embodiments, the method further comprises: updating the feature matrix when at least one information item of at least one of the n customers changes; and/or
updating the feature matrix after the personal information of the (n + 1)th client is acquired.
According to some exemplary embodiments, in the process of updating the feature matrix, a first sub-matrix used for the first time in an iterative process is a feature matrix before updating.
A second aspect of the present disclosure provides a credit risk prediction apparatus, including:
the client authorization acquisition module is used for acquiring the authorization of the client for acquiring the personal information;
a personal information acquisition module to: under the condition that the authorization of the client for obtaining the personal information is obtained, obtaining the personal information of n clients, wherein the personal information of each client comprises N information items, the N information items are all related to credit risk, n is an integer which is greater than or equal to 1, and N is an integer which is greater than or equal to 2;
a personal information matrix acquisition module to: quantizing the personal information of the n clients to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents quantized personal information of one client;
a graph laplacian matrix calculation module to: aiming at the personal information matrix, calculating a graph Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;
a feature matrix acquisition module to: reducing the dimension of the graph Laplace matrix by adopting a local optimal block conjugate gradient method to obtain a characteristic matrix corresponding to the graph Laplace matrix, wherein the characteristic matrix is a matrix with n rows and b columns, b is a positive integer and b is more than or equal to 1 and less than n;
a classification module to: classifying the n customers by adopting a clustering method based on the characteristic matrix; and
a credit risk prediction module to: predicting credit risks of the n clients according to the classified n clients.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described method.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method.
A fifth aspect of the disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the method described above.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, taken in conjunction with the accompanying drawings of which:
fig. 1 schematically illustrates an application scenario of a credit risk prediction method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of prediction of credit risk according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow diagram for dimensionality reduction of an illustrative Laplace matrix using a locally optimal block conjugate gradient method, in accordance with an embodiment of the present disclosure;
fig. 4 schematically shows a block diagram of a structure of a credit risk prediction apparatus according to an embodiment of the present disclosure; and
fig. 5 schematically shows a block diagram of an electronic device adapted to implement a prediction method of credit risk according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
Currently, large banks usually predict the credit risk of customers either by manual judgment or by machine prediction. Manual classification is strongly influenced by subjective factors, and objective evaluation criteria are difficult to form. When machine learning is applied to classification and prediction, the information of each customer is usually used as a sample, and the credit risk level of each customer is then obtained by using algorithms such as Support Vector Machines (SVM), neural networks or decision trees.
However, for a bank with tens of millions of customer records, when a new customer or a new information item is added, the machine needs to recalculate and reclassify all the data, which results in a huge amount of calculation, consumes time and resources, and may fail to respond to the changed data in time.
Because manual judgment is subjective, the prediction method provided by the present application improves on machine learning. Among the many information items of a client there is often irrelevant or unimportant information, which interferes with the evaluation of credit risk and adds unnecessary calculation burden; the method effectively filters out such information, quickly finds the useful information items, and then classifies the credit of financial clients or enterprises. Unlike the dimension reduction in the prior art, the dimension reduction here is performed on the graph Laplacian matrix by using a local optimal block conjugate gradient method, and the feature space of the graph Laplacian matrix can be quickly approximated, so that the prediction method occupies little memory and is quick to calculate.
To facilitate understanding of the technical solutions of the present application, technical terms related to the present application will be described below.
Spectral clustering algorithm: an unsupervised machine learning method based on spectral graph theory. Its essence is to transform the clustering problem into an optimal graph partitioning problem. Compared with traditional clustering algorithms, it can cluster sample spaces of arbitrary shape and has the advantage of converging to a globally optimal solution.
Dimensionality reduction: a key step in the spectral clustering algorithm, in which the input data is reduced so that the amount of calculation is reduced; for example, the data amount is reduced from n × n to n × b, where b must be smaller than n.
Eigenvectors and eigenvalues: eigenvectors of a matrix are one of the important concepts in matrix theory. An eigenvector of a linear transformation is a non-degenerate vector whose direction does not change under the transformation, and the factor by which it is scaled under the transformation is its eigenvalue. Mathematically, if a vector v and a transformation A satisfy Av = λv, then v is said to be an eigenvector of the transformation A, and λ is the corresponding eigenvalue.
Characteristic space: each feature corresponds to a unique coordinate in the feature space.
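As an illustration of the relation Av = λv defined above, the following short Python sketch (using NumPy; the 2 × 2 matrix is purely illustrative and not part of the patent) checks the relation numerically:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh: eigen-decomposition of a symmetric matrix

    v = eigenvectors[:, 0]    # first eigenvector (one column of the result)
    lam = eigenvalues[0]      # its eigenvalue
    print(np.allclose(A @ v, lam * v))  # True: A v equals lambda v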
Embodiments of the present application will be described below with reference to the accompanying drawings. It is to be understood that such description is merely exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth and a thorough explanation of embodiments of the present application is provided. However, one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the description.
It should be noted that, in the technical solution of the present application, the acquisition, storage, application, and the like of the personal information of the related clients all conform to the relevant laws and regulations, and necessary security measures are taken, without violating public order and good customs.
Fig. 1 schematically illustrates an application scenario of a credit risk prediction method according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the credit risk prediction method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the credit risk prediction device provided by the embodiment of the present disclosure may be generally disposed in the server 105. The method for predicting credit risk provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the credit risk prediction apparatus provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
Fig. 2 schematically shows a flow chart of a method of predicting credit risk according to an embodiment of the present disclosure.
As shown in fig. 2, the prediction method of this embodiment includes operations S110 to S170.
In operation S110, a client's authorization to acquire personal information is acquired.
In embodiments of the present disclosure, the consent or authorization of the customer needs to be obtained before the information of the customer is obtained. For example, prior to operation S120, a request may be issued to the customer to obtain his or her personal information and associated other information. In case the customer agrees to or authorizes the acquisition of the personal information, operation S120 is performed.
In operation S120, in a case where the authorization for acquiring the personal information is obtained from the clients, the personal information of n clients is acquired, where the personal information of each client includes N information items, the N information items are all related to credit risk, n is an integer greater than or equal to 1, and N is an integer greater than or equal to 2.
Before credit risk prediction is performed on a client, personal information related to the client, namely client information, needs to be obtained. From the customer data, information such as the customer's business ability, profitability, repayment ability, development ability, qualifications and credit status is generally known, and the customer's credit is judged according to this information.
The channels from which a customer's personal information is obtained may vary. For example, in a specific example, the following fields may be included: "company, established in 2015, turnover, earnings per share, net profit growth rate, performance, ……". Information about the attributes and operating condition of the organization can be obtained from such data. In some examples, the profile information may be further expanded; for example, the information of the organization may include: enterprise registration supplementary information, enterprise registration change information, stockholder registration information, unit participation condition information, enterprise legal person information, enterprise financial data, enterprise tax payment information, enterprise tax payment registration information, enterprise public deposit payment information and the like; and other information such as: organization name, business income of the enterprise, basic information of the enterprise, etc.
After the personal information of the clients is obtained, it is arranged into N information items, that is, each client corresponds to N information items. However, some of the N information items may be correlated; for example, when the income of a company or an individual is high, the tax payment also increases accordingly, so the income information and the tax payment amount stand in a certain proportion. Such related information items can be merged, thereby achieving the purpose of reducing the dimension. Since the present application relates to dimensionality reduction, in this operation the number of acquired information items N is an integer equal to or greater than 2.
It is understood that information items that are negatively correlated with one another can likewise be processed by merging and dimension reduction.
In operation S130, the personal information of the n customers is quantized to obtain a personal information matrix, where the personal information matrix is a matrix of n rows and N columns, and each row of the personal information matrix represents the quantized personal information of one customer.
Before dimension reduction, the personal information of n customers is subjected to mathematical information processing.
The specific processing method may be adapted to the specific content of each information item; for example, profitability (net profit, gross profit, etc.) may be represented directly as a number, and performance may be represented by the number of defaults or by the ratio of the number of defaults to the total number of credit transactions.
Using the quantized client data, the element X_ij is obtained, which represents the jth information item of the ith client. Thus, i and j can both be represented as nodes in an undirected graph.
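As a hedged illustration of operation S130, the following Python sketch quantizes raw customer records into an n × N personal information matrix X, where X_ij is the jth information item of the ith customer; the field names and quantization rules here are assumptions for illustration, not a scheme prescribed by the patent:

    import numpy as np

    # Illustrative raw records; the fields and values are assumptions.
    raw_customers = [
        {"net_profit": 1.2e6, "defaults": 0, "total_credits": 10},
        {"net_profit": 3.5e5, "defaults": 2, "total_credits": 8},
        {"net_profit": 8.0e5, "defaults": 1, "total_credits": 12},
    ]

    def quantize(record):
        # e.g. profitability as a plain number, performance as the default ratio
        return [record["net_profit"],
                record["defaults"] / record["total_credits"]]

    X = np.array([quantize(r) for r in raw_customers])  # shape (n, N)
    print(X.shape)  # (3, 2): n = 3 customers, N = 2 information items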
In operation S140, a graph laplacian matrix corresponding to the personal information matrix is calculated for the personal information matrix by using a spectral clustering method, wherein the graph laplacian matrix is a matrix of n rows and n columns.
When calculating the graph Laplacian, the adjacency matrix W must first be generated from the quantized personal information. The adjacency matrix is a matrix representation of a graph; with it the structure of the graph can be stored conveniently and problems on the graph can be studied with linear-algebra methods. In the present application the solution is carried out over n customers, so the adjacency matrix W is an n × n matrix whose elements are built from the elements X_ij obtained in operation S130; the element W_ij is denoted as the weight of the edge (i, j), and if no edge connects two nodes the corresponding entry of the adjacency matrix is 0. In particular, using the KNN algorithm, all sample points are traversed by the K-nearest-neighbour method and only the k points nearest to each sample are kept as neighbours, i.e. W_ij > 0 only between a sample and its k nearest points, and the remaining entries are set to 0.
The calculation formula is as follows:
W_ij = W_ji = exp(−‖X_i − X_j‖² / (2σ²)), if X_i ∈ KNN(X_j) or X_j ∈ KNN(X_i); otherwise W_ij = W_ji = 0    (1)
W_ij = W_ji = exp(−‖X_i − X_j‖² / (2σ²)), if X_i ∈ KNN(X_j) and X_j ∈ KNN(X_i); otherwise W_ij = W_ji = 0    (2)
In the above formulas, X_i is the data of the ith row, i.e. the personal information of the ith client, and X_j is the data of the jth row, i.e. the personal information of the jth client.
It is understood that formula (1) means that W_ij is retained as long as X_i is in the K-neighbour set of X_j, while formula (2) means that X_i and X_j must be in each other's k-neighbour sets for W_ij to be retained.
For W_ij, the Euclidean distance can be used to measure the distance between any two points X_i and X_j. Among preset kernel functions, the polynomial kernel, the Gaussian kernel and the Sigmoid kernel are commonly used.
As a specific embodiment of the present application, the most common Gaussian kernel function is used for the calculation:
W_ij = exp(−‖X_i − X_j‖² / (2σ²))
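A minimal Python sketch of the adjacency-matrix construction described above, assuming scikit-learn for the nearest-neighbour search; k and σ are tuning parameters chosen only for illustration, and the two symmetrization choices correspond to formulas (1) and (2):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_gaussian_adjacency(X, k=10, sigma=1.0, mutual=False):
        """Adjacency matrix W: Gaussian-kernel weights kept only between k nearest neighbours."""
        n = X.shape[0]
        nn = NearestNeighbors(n_neighbors=min(k + 1, n)).fit(X)
        dist, idx = nn.kneighbors(X)                  # each row includes the point itself
        W = np.zeros((n, n))
        for i in range(n):
            for d, j in zip(dist[i, 1:], idx[i, 1:]): # skip the self-neighbour
                W[i, j] = np.exp(-d ** 2 / (2.0 * sigma ** 2))
        # formula (2): keep W_ij only for mutual neighbours; formula (1): either direction
        return np.minimum(W, W.T) if mutual else np.maximum(W, W.T)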
After the adjacency matrix W is obtained by the above method, the diagonal matrix D, i.e. the degree matrix, is further calculated.
Since the present application uses an undirected graph, the weighted degree of a node is the sum of the weights of all edges connected to that node; for the adjacency matrix W of the undirected graph, the weighted degree of node i is the sum of the elements of the ith row of the adjacency matrix.
The corresponding diagonal matrix D is calculated from the adjacency matrix W by the following formula:
d_i = Σ_{j=1}^{n} W_ij
the diagonal matrix D is finally obtained as:
D = diag(d_1, d_2, …, d_n), i.e. an n × n matrix with d_1, d_2, …, d_n on the diagonal and zeros elsewhere.
Given the n nodes (n clients) of the undirected graph, the adjacency matrix W and the diagonal matrix D, the graph Laplacian matrix can be defined on their basis, i.e. the graph Laplacian matrix A is defined as the difference between the diagonal matrix D and the adjacency matrix W:
A=D-W
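A short sketch of the degree matrix D and the graph Laplacian A = D − W from the adjacency matrix built above (plain NumPy, dense matrices for simplicity):

    import numpy as np

    def graph_laplacian(W):
        d = W.sum(axis=1)      # weighted degree of each node: sum of the ith row of W
        D = np.diag(d)         # diagonal degree matrix
        return D - W           # graph Laplacian A = D - W

    # e.g. A = graph_laplacian(knn_gaussian_adjacency(X, k=10, sigma=1.0))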
in operation S150, a local optimal block conjugate gradient method is used to perform dimensionality reduction on the graph laplacian matrix to obtain a feature matrix corresponding to the graph laplacian matrix, where the feature matrix is a matrix with n rows and b columns, b is a positive integer and b is greater than or equal to 1 and less than n.
The obtained graph Laplace matrix A is a matrix of n x n dimensions, and in the operation, the graph Laplace matrix is subjected to dimensionality reduction by adopting a local optimal block conjugate gradient method, namely the graph Laplace matrix of n x n dimensions is changed into a characteristic matrix of n x b dimensions.
The local optimal block conjugate gradient method can explore the optimal gradient direction, so that the feature space of the graph Laplacian matrix A can be quickly approximated during machine training and use.
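As one possible realization of operation S150, SciPy ships an implementation of the locally optimal block (preconditioned) conjugate gradient method; the sketch below assumes the b smallest eigenpairs of the graph Laplacian are wanted (hence largest=False), with tol and maxiter chosen only for illustration:

    import numpy as np
    from scipy.sparse.linalg import lobpcg

    def reduce_dimension(A, b, seed=0):
        """Return the n x b feature matrix Q approximating the bottom eigenspace of A."""
        rng = np.random.default_rng(seed)
        X0 = rng.standard_normal((A.shape[0], b))           # random n x b starting block
        eigenvalues, Q = lobpcg(A, X0, largest=False, tol=1e-6, maxiter=200)
        return Q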
In operation S160, n customers are classified by using a clustering method based on the feature matrix.
For the n × b feature matrix Q, the data of each row is used as a sample and K-means is used to classify the customers. It should be clear that the feature matrix Q is a matrix with n rows and b columns, and in this operation the b-dimensional dimension-reduced personal information corresponding to each of the n customers is used as a sample for classification.
In the embodiment of the present disclosure, after the n × b-dimensional feature matrix is obtained by using the above-mentioned dimension reduction method, the customers are classified by using the data of each row as a sample; the embodiment of the present disclosure is not limited to the above-mentioned K-means clustering method, and all customers may also be classified by using a trained neural network or a decision tree (including the classical decision tree method and derived methods such as random forests).
It should be noted that, in practical applications, experience shows that the number of columns b should equal the number of categories into which the customers are classified, otherwise the error becomes large. That is, if the customers are classified into 7 classes, the feature matrix Q should contain n × 7 data, i.e. 7 eigenvectors of the graph Laplacian A, and the feature matrix Q after dimensionality reduction is a matrix with n rows and 7 columns.
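A minimal sketch of operation S160 with scikit-learn's K-means, treating each row of the n × b feature matrix Q as one customer's dimension-reduced sample and, as noted above, taking the number of clusters equal to b:

    from sklearn.cluster import KMeans

    def classify_customers(Q, b):
        kmeans = KMeans(n_clusters=b, n_init=10, random_state=0)
        return kmeans.fit_predict(Q)   # one label in {0, ..., b-1} per customer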
In operation S170, credit risks of the n customers are predicted according to the classified n customers.
Each category is set to a different credit rating, i.e. n customers can be divided into b categories, corresponding to b credit risks.
In one embodiment, after the classification of the clients is completed in operation S160, the performance of all clients is retrieved, the performance of the clients in each group is summed in units of the classified groups, and the groups are then sorted according to the obtained values. The highest score, i.e. the best performance within a group, may correspond to level one, the second highest to level two, and so on.
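A hedged sketch of the embodiment above: performance scores are summed within each cluster, the clusters are ranked, and the best-performing cluster is mapped to level 1, the next to level 2, and so on; the per-customer performance scores are assumed inputs, not defined by the patent:

    import numpy as np

    def credit_levels(labels, performance, b):
        group_score = np.array([performance[labels == g].sum() for g in range(b)])
        order = np.argsort(-group_score)             # clusters sorted from best to worst
        level_of_group = np.empty(b, dtype=int)
        level_of_group[order] = np.arange(1, b + 1)  # level 1 = best performance
        return level_of_group[labels]                # credit level for each customer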
According to the prediction method in the embodiment of the disclosure, the optimal gradient direction can be quickly searched by using a local optimal block conjugate gradient method, and the dimension reduction is performed on the N information items of each client, so that the feature space of the graph laplacian matrix is quickly obtained, the calculation speed is high, the occupied memory is small, and the calculation amount and the training speed of the neural network are greatly reduced.
Fig. 3 schematically illustrates a flow diagram for dimensionality reduction of an illustrative laplacian matrix using a locally optimal block conjugate gradient method, according to an embodiment of the present disclosure.
As shown in fig. 3, the dimension reduction process of this embodiment includes operations S210 to S220.
In operation S210, a search direction is determined using an iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined.
For operation S210, first, based on the graph laplacian matrix, an intermediate matrix is obtained, where the intermediate matrix is a matrix of n rows and b columns.
It can be understood that, in the intermediate matrix B, the dimension of the graph Laplacian matrix A is changed from n × n to n × b, i.e. dimension reduction is realized.
The machine learning process in this step involves the solution of partial differential equations. In practical applications, the finite element method can simplify the solving process and obtain an approximate solution of the differential equation.
Then, eigenvalues and eigenvectors of the intermediate matrix are calculated.
And finally, generating a first sub-matrix according to the feature vector of the intermediate matrix, wherein the vector of each column in the first sub-matrix represents a search direction, the search direction represented by the vector of each column in the first sub-matrix corresponds to the direction of the vector of each column in the feature matrix to be determined, and the first sub-matrix is a matrix with n rows and b columns.
The algorithm applied by the present application for the above solving process is the Rayleigh-Ritz method, which starts directly from a functional and finds a function that minimizes it.
For example, the Rayleigh-Ritz algorithm may be applied as follows:
inputting: a is an element of Rn*nA matrix;
S∈Rn*bmatrix, (b is more than or equal to 0 and less than n);
and (3) outputting: θ: diagonal matrix of order b
Y: b-order matrix
RR:
Calculating matrix B ═ STAS
Finding all eigenvectors y of B1,y2,...,ybWith all characteristic values theta1,θ2,...,θb
Let Y be ═ Y1 y2…yb],
Figure BDA0003623341580000141
The matrix B is the intermediate matrix, the input matrix A is the graph Laplacian matrix, S is the search matrix, the output θ is the eigenvalue matrix, the output Y is the eigenvector matrix, and RR is an abbreviation of Rayleigh-Ritz, representing the Rayleigh-Ritz algorithm.
The above algorithm can be interpreted as follows: an orthogonal basis is calculated in S ∈ R^(n×b), approximating the eigenspace corresponding to b eigenvectors (b is the number of desired categories, as explained in operations S160 and S170); the intermediate matrix B is then calculated, and the eigenvectors Y and eigenvalues θ of the intermediate matrix B are solved.
In calculating the intermediate matrix B, the intermediate matrix is generated based on the graph Laplacian matrix, the search matrix, and the transpose of the search matrix, i.e. B = S^T A S.
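A small NumPy sketch of the Rayleigh-Ritz step just described: form B = S^T A S and return its eigenvalues and eigenvectors (S is assumed to have orthonormal columns, as in the listing above):

    import numpy as np

    def rayleigh_ritz(A, S):
        B = S.T @ A @ S               # intermediate matrix B = S^T A S
        theta, Y = np.linalg.eigh(B)  # eigenvalues theta and eigenvectors Y of B
        return theta, Y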
The following dimensionality reduction is performed on the graph laplacian matrix by using the eigenvalue theta and the eigenvector Y obtained by the Rayleigh-Ritz algorithm.
Generating a second sub-matrix according to the eigenvalue and the eigenvector of the intermediate matrix and the first sub-matrix, wherein the second sub-matrix is a matrix with n rows and b columns; and the vector of each column in the second sub-matrix represents a residual vector between the graph Laplace matrix and the first sub-matrix.
A second sub-matrix R_0 is solved according to the intermediate matrix B, the eigenvector Y and the eigenvalue θ; the residual vector characterizes the desired accuracy.
In this process, the Rayleigh-Ritz algorithm can be reused:
(Y, θ_0) = RR(A, X_0)
where X_0 is a random matrix with n rows and b columns, i.e. X_0 ∈ R^(n×b). Then, by
R_0 = A X_0 − X_0 θ_0
the second sub-matrix R_0 is obtained; the second sub-matrix R_0 can be understood as the direction of the residual of the search subspace.
It should be noted that the first sub-matrix used for the first time in the iteration process is a random matrix with n rows and b columns, that is, the first sub-matrix is X0
Generating a third sub-matrix according to the feature vector of the intermediate matrix, the first sub-matrix and the second sub-matrix, wherein the third sub-matrix is a matrix with n rows and b columns; wherein the first, second and third sub-matrices constitute a search matrix representing a search subspace.
According to the eigenvector Y of the intermediate matrix, the first sub-matrix X_0 and the second sub-matrix R_0, the third sub-matrix P_0 is solved. Finally, the first sub-matrix X_0, the second sub-matrix R_0 and the third sub-matrix P_0 constitute a search matrix S_0 representing the search subspace, i.e. it can be represented as
S_0 = [X_0, R_0, P_0]
And updating the first sub-matrix by adopting an iteration method according to the characteristic vector of the intermediate matrix and the search matrix, wherein in the iteration process, the first sub-matrix is updated according to the characteristic vector of the intermediate matrix in the previous iteration process and the search matrix in the previous iteration process so as to generate the first sub-matrix in the current iteration process.
And in the iteration process, updating the second sub-matrix according to the eigenvalue and the eigenvector of the intermediate matrix in the current iteration process and the first sub-matrix in the current iteration process to generate the second sub-matrix in the current iteration process.
And in the iteration process, updating the third sub-matrix according to the feature vector of the intermediate matrix in the previous iteration process and the first sub-matrix and the second sub-matrix in the previous iteration process so as to generate the third sub-matrix in the current iteration process.
In computing, k may be used to denote the number of iterations, usually written as a subscript. Each time the iteration is performed, the subscripts of the eigenvector of the intermediate matrix, the first sub-matrix, the second sub-matrix and the third sub-matrix are incremented by 1, and based on the relationship among them, the iterative process can be expressed as follows:
X_{k+1} = S_k Y_k
P_{k+1} = [0, R_k, P_k] Y_k
R_{k+1} = A X_{k+1} − X_{k+1} θ_{k+1}
k = k + 1
further, the vector of each column in the third sub-matrix represents the difference between the bases of the search sub-spaces in the two adjacent iterations.
In operation S220, a feature matrix corresponding to the graph laplacian matrix is determined according to the determined search direction.
For operation S220, in the iterative process, when the vector of the ith column in the updated second sub-matrix satisfies a first specified condition, determining the vector of the ith column in the updated first sub-matrix as a column of the feature matrix, where i is a positive integer and 1 ≦ i < b.
Further, the condition that the vector of the ith column in the updated second sub-matrix satisfies the first specified condition includes: the norm of the vector of the ith column in the updated second sub-matrix is smaller than a specified threshold value.
For example, in an embodiment of the present disclosure, the feature matrix Q may be determined as follows in an iterative process.
(1) Generate a random matrix X_0 ∈ R^(n×b), (0 ≤ b < n);
(2) Feature vectors and feature values are calculated using the Rayleigh-Ritz algorithm:
(Y, θ_0) := RR(A, X_0);
(3) Give the iteration initial values: R_0 := A X_0 − X_0 θ_0, k := 0, Q := [ ], P_0 := [ ];
(4) When the number of columns of the feature matrix Q is less than b, the following iterative process is performed:
let Q and RkNormalizing and orthogonalizing;
order Sk:=[Xk,Rk,Pk],
Figure BDA0003623341580000171
X_{k+1} := S_k Y_k;
P_{k+1} := [0, R_k, P_k] Y_k;
R_{k+1} := A X_{k+1} − X_{k+1} θ_{k+1};
k:=k+1;
If the norm of some column of the matrix R_{k+1} is smaller than a prescribed threshold value ε, the corresponding column of X_{k+1} is put into the feature matrix Q, that column of X_{k+1} is set to a random vector, and then X_0 := X_{k+1} and k := 0. The above iterative process is performed until the number of columns of the feature matrix Q equals b.
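The listing above can be turned into the following simplified Python sketch. It follows steps (1)-(4) directly, deflating converged columns into Q as described, and deliberately omits the preconditioning and robustness safeguards of production LOBPCG implementations such as scipy.sparse.linalg.lobpcg; it is an illustration of the procedure, not the patent's reference code:

    import numpy as np

    def lobpcg_feature_matrix(A, b, eps=1e-6, max_iter=500, seed=0):
        """Simplified sketch of the listed iteration: returns an n x b feature matrix Q."""
        rng = np.random.default_rng(seed)
        n = A.shape[0]

        def rayleigh_ritz(S):
            B = S.T @ A @ S                    # intermediate matrix B = S^T A S
            theta, Y = np.linalg.eigh(B)
            return theta[:b], Y[:, :b]         # keep the b smallest Ritz pairs

        def orthonormalize(M):
            Q_, _ = np.linalg.qr(M)            # orthonormal basis of the columns of M
            return Q_

        Q = np.zeros((n, 0))                                  # converged columns of the feature matrix
        X = orthonormalize(rng.standard_normal((n, b)))       # step (1): random X_0
        theta, Y = rayleigh_ritz(X)                           # step (2): Rayleigh-Ritz on X_0
        X = X @ Y
        R = A @ X - X * theta                                 # step (3): R_0 = A X_0 - X_0 theta_0
        P = np.zeros((n, 0))                                  # P_0 := []

        for _ in range(max_iter):                             # step (4)
            if Q.shape[1] >= b:
                break
            R = orthonormalize(np.hstack([Q, R]))[:, Q.shape[1]:]   # orthogonalize R_k against Q
            S = np.hstack([X, R, P])                                # S_k = [X_k, R_k, P_k]
            theta, Y = rayleigh_ritz(S)
            X_new = S @ Y                                           # X_{k+1} = S_k Y_k
            P = np.hstack([np.zeros((n, b)), R, P]) @ Y             # P_{k+1} = [0, R_k, P_k] Y_k
            R = A @ X_new - X_new * theta                           # R_{k+1}
            X = X_new

            converged = np.linalg.norm(R, axis=0) < eps
            if converged.any():
                Q = np.hstack([Q, X[:, converged]])                 # move converged columns into Q
                X[:, converged] = rng.standard_normal((n, int(converged.sum())))
                X = orthonormalize(np.hstack([Q, X]))[:, Q.shape[1]:]   # restart: orthogonalize X against Q
                theta, Y = rayleigh_ritz(X)
                X = X @ Y
                R = A @ X - X * theta
                P = np.zeros((n, 0))                                # reset P, k := 0
        return Q[:, :b]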
According to an embodiment of the application, the prediction method further comprises: updating the feature matrix when at least one information item of at least one of the n clients changes; and/or updating the feature matrix after acquiring the personal information of the (n + 1) th customer.
When a new information item is added, when a field value corresponding to an information item of one of the clients changes, or when a new client is added, the graph Laplacian matrix is necessarily updated, and the feature matrix is updated accordingly.
In one embodiment, in the process of updating the feature matrix, the first sub-matrix used for the first time in the iterative process is the feature matrix before updating.
In the prior art, when a new information item is added, when a field value corresponding to an information item of one client changes, or when a new client is added, all the data needs to be recalculated.
Considering that even if a new information item or a new client is added, the various feature values do not change much, the newly calculated feature space is necessarily close to the original feature space, so the new feature space can be obtained quickly. Based on this idea, the prediction method of the present application calculates on the original feature matrix: when the information of the clients changes, the new feature space of the graph Laplacian matrix can be built on the feature space of the existing graph Laplacian matrix, that is, X_0 in operation S150 is directly replaced by setting X_0 = Q.
In one experiment, the second calculation was hundreds to thousands of times faster than the first calculation. It can be concluded that the prediction method of the application reduces the amount of machine calculation, thereby saving resources and reducing calculation time.
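A hedged sketch of the warm start described above, again using scipy.sparse.linalg.lobpcg as the locally optimal block conjugate gradient solver: the previous feature matrix Q is passed as the initial block X_0 instead of a random matrix, and rows for any newly added customers are padded with small random values (the padding scheme is an assumption; the patent only states that X_0 is replaced by Q):

    import numpy as np
    from scipy.sparse.linalg import lobpcg

    def update_feature_matrix(A_new, Q_prev, seed=0):
        """Recompute the feature matrix after a data change, warm-starting from the previous Q."""
        rng = np.random.default_rng(seed)
        n_new = A_new.shape[0]
        n_old, b = Q_prev.shape
        X0 = Q_prev
        if n_new > n_old:
            # new customers added: pad the warm-start block with rows for them (assumption)
            X0 = np.vstack([Q_prev, 0.01 * rng.standard_normal((n_new - n_old, b))])
        _, Q_new = lobpcg(A_new, X0, largest=False, tol=1e-6, maxiter=200)
        return Q_new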
Based on the credit risk prediction method, the disclosure also provides a credit risk prediction device. The apparatus will be described in detail below with reference to fig. 4. Fig. 4 schematically shows a block diagram of a prediction apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the prediction apparatus 800 of this embodiment includes a client authorization acquisition module 810, a personal information acquisition module 820, a personal information matrix acquisition module 830, a graph laplacian matrix calculation module 840, a feature matrix acquisition module 850, a classification module 860, and a credit risk prediction module 870.
The client authorization acquisition module 810 is used for acquiring the authorization of the client to acquire the personal information. In one embodiment, the client authorization obtaining module 810 may be configured to perform the operation S110 described above, and will not be described herein again.
The personal information acquisition module 820 is configured to: and under the condition of obtaining the authorization of the client for obtaining the personal information, obtaining the personal information of N clients, wherein the personal information of each client comprises N information items, the N information items are all related to credit risk, N is an integer which is greater than or equal to 1, and N is an integer which is greater than or equal to 2. In an embodiment, the personal information obtaining module 820 may be configured to perform the operation S120 described above, and will not be described herein again.
The personal information matrix obtaining module 830 is configured to: and quantizing the personal information of the N clients to obtain a personal information matrix, wherein the personal information matrix is a matrix with N rows and N columns, and each row of the personal information matrix represents the quantized personal information of one client. In an embodiment, the personal information matrix obtaining module may be configured to perform operation S130 described above, and is not described herein again.
The graph laplacian matrix calculation module 840 is configured to: and aiming at the personal information matrix, calculating an image Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method, wherein the image Laplacian matrix is a matrix with n rows and n columns. In an embodiment, the graph laplacian matrix calculation module may be configured to perform operation S140 described above, which is not described herein again.
The feature matrix acquisition module 850 is configured to: and reducing the dimension of the graph Laplace matrix by adopting a local optimal block conjugate gradient method to obtain a characteristic matrix corresponding to the graph Laplace matrix, wherein the characteristic matrix is a matrix with n rows and b columns, b is a positive integer and b is more than or equal to 1 and less than n. In an embodiment, the feature matrix obtaining module may be configured to perform operation S150 described above, which is not described herein again.
The classification module 860 is configured to: and classifying the n customers by adopting a clustering method based on the characteristic matrix. In an embodiment, the classification module may be configured to perform the operation S160 described above, which is not described herein again.
Credit risk prediction module 870 is to: predict the credit risks of the n clients according to the classified n clients. In an embodiment, the credit risk prediction module 870 may be configured to perform the operation S170 described above, and will not be described herein again.
According to the prediction device in the embodiment of the disclosure, the prediction method can be executed, the optimal gradient direction can be quickly searched by using the local optimal block conjugate gradient method, the dimension of the N information items of each client is reduced, the feature space of the graph laplacian matrix is quickly obtained, the calculation speed is high, the occupied memory is small, and the calculation amount and the training speed of the neural network are greatly reduced.
According to an embodiment of the present disclosure, any of the client authorization acquisition module 810, the personal information acquisition module 820, the personal information matrix acquisition module 830, the graph laplacian matrix calculation module 840, the feature matrix acquisition module 850, the classification module 860, and the credit risk prediction module 870 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the client authorization acquisition module 810, the personal information acquisition module 820, the personal information matrix acquisition module 830, the graph laplace matrix calculation module 840, the feature matrix acquisition module 850, the classification module 860, and the credit risk prediction module 870 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-a-chip, a system-on-a-substrate, a system-on-a-package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the client authorization acquisition module 810, the personal information acquisition module 820, the personal information matrix acquisition module 830, the graph laplacian matrix calculation module 840, the feature matrix acquisition module 850, the classification module 860, and the credit risk prediction module 870 may be at least partially implemented as a computer program module that, when executed, may perform corresponding functions.
Fig. 5 schematically shows a block diagram of an electronic device adapted to implement a prediction method of credit risk according to an embodiment of the present disclosure.
As shown in Fig. 5, an electronic device 900 according to an embodiment of the present disclosure includes a processor 901, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage portion 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or the RAM 903 described above and/or one or more memories other than the ROM 902 and the RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated by the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the credit risk prediction method provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed through the communication section 909, and/or installed from the removable medium 911. The computer program may include program code that may be transmitted over any suitable network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, the program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the client computing device, partly on the client device, partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the client computing devices over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (19)

1. A method for predicting credit risk, comprising the steps of:
obtaining the authorization of a client to obtain personal information;
under the condition that the authorization of the client for obtaining the personal information is obtained, obtaining the personal information of n clients, wherein the personal information of each client comprises N information items, the N information items are all related to credit risk, N is an integer greater than or equal to 1, and n is an integer greater than or equal to 2;
quantizing the personal information of the n clients to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents the quantized personal information of one client;
for the personal information matrix, calculating a graph Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;
reducing the dimension of the graph Laplacian matrix by adopting a locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix, wherein the feature matrix is a matrix with n rows and b columns, b is a positive integer, and b is greater than or equal to 1 and less than n;
classifying the n clients by adopting a clustering method based on the feature matrix; and
predicting credit risks of the n clients according to the classified n clients.
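Illustrative sketch for claim 1 (not part of the claim): one possible end-to-end realization of the claimed pipeline in Python. The RBF similarity kernel, the normalized Laplacian, k-means as the clustering method, and the per-class risk score are assumptions chosen for illustration; the claim itself does not fix these choices.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import lobpcg
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, N, b, k = 1000, 12, 4, 5            # n clients, N information items, b features, k classes

# Hypothetical quantized personal information matrix: n rows, N columns.
M = StandardScaler().fit_transform(rng.standard_normal((n, N)))

W = rbf_kernel(M)                      # pairwise similarity between clients
np.fill_diagonal(W, 0.0)
L = laplacian(W, normed=True)          # graph Laplacian, n rows and n columns

X0 = rng.standard_normal((n, b))       # initial block for the eigen-solver
_, F = lobpcg(L, X0, largest=False, tol=1e-5, maxiter=200)   # feature matrix, n x b

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(F)

# Illustrative risk prediction: score each class by an assumed historical default flag.
default_flag = rng.random(n) < 0.05
class_risk = {c: default_flag[labels == c].mean() for c in range(k)}
predicted_risk = np.array([class_risk[c] for c in labels])
```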
2. The method according to claim 1, wherein the reducing the dimension of the graph Laplacian matrix by adopting a locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix specifically includes:
determining a search direction by adopting an iterative method, wherein the search direction gradually coincides with the direction of the vector of each column in a feature matrix to be determined; and
determining the feature matrix corresponding to the graph Laplacian matrix according to the determined search direction.
3. The method according to claim 2, wherein the determining the search direction using an iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined comprises:
obtaining an intermediate matrix based on the graph Laplacian matrix, wherein the intermediate matrix is a matrix with n rows and b columns;
calculating eigenvalues and eigenvectors of the intermediate matrix; and
generating a first sub-matrix according to the eigenvectors of the intermediate matrix, wherein the vector of each column in the first sub-matrix represents a search direction, the search direction represented by the vector of each column in the first sub-matrix corresponds to the direction of the vector of each column in the feature matrix to be determined, and the first sub-matrix is a matrix with n rows and b columns.
4. The method according to claim 3, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:
generating a second sub-matrix according to the eigenvalues and eigenvectors of the intermediate matrix and the first sub-matrix, wherein the second sub-matrix is a matrix with n rows and b columns;
wherein the vector of each column in the second sub-matrix represents a residual vector between the graph Laplacian matrix and the first sub-matrix.
5. The method according to claim 4, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:
generating a third sub-matrix according to the eigenvectors of the intermediate matrix, the first sub-matrix, and the second sub-matrix, wherein the third sub-matrix is a matrix with n rows and b columns;
wherein the first, second, and third sub-matrices constitute a search matrix representing a search subspace.
6. The method according to any one of claims 2 to 5, wherein the determining the search direction using an iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:
updating the first sub-matrix by using an iterative method based on the eigenvectors of the intermediate matrix and the search matrix,
wherein, in the iteration process, the first sub-matrix is updated according to the eigenvectors of the intermediate matrix in the previous iteration process and the search matrix in the previous iteration process, so as to generate the first sub-matrix in the current iteration process.
7. The method according to claim 6, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:
in the iteration process, updating the second sub-matrix according to the eigenvalues and eigenvectors of the intermediate matrix in the current iteration process and the first sub-matrix in the current iteration process to generate the second sub-matrix in the current iteration process.
8. The method of claim 7, wherein the vector of each column in the third sub-matrix represents a difference between bases of the search subspace in two adjacent iterations.
9. The method according to claim 8, wherein the determining the search direction by using the iterative method such that the search direction gradually coincides with a direction of a vector of each column in the feature matrix to be determined, further comprises:
in the iteration process, updating the third sub-matrix according to the eigenvectors of the intermediate matrix in the previous iteration process and the first sub-matrix and the second sub-matrix in the previous iteration process, so as to generate the third sub-matrix in the current iteration process.
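Illustrative sketch for claims 3 to 9 (not part of the claims): a textbook-style locally optimal block conjugate gradient iteration, written so that X plays the role of the first sub-matrix (search directions / approximate eigenvectors), R the second sub-matrix (residuals), P the third sub-matrix (the part of the update outside the previous subspace), and S = [X, R, P] the search matrix. The small projected matrix SᵀLS stands in for the intermediate matrix; its exact shape and construction in the claims may differ, and production code would add preconditioning, deflation, and more careful orthogonalization.

```python
import numpy as np

def lobpcg_smallest(L, X0, tol=1e-6, max_iter=200):
    """Minimal LOBPCG-style iteration for the b smallest eigenpairs of a
    symmetric matrix L (here standing in for a graph Laplacian)."""
    X, _ = np.linalg.qr(X0)                      # first sub-matrix, orthonormalized
    w, V = np.linalg.eigh(X.T @ L @ X)           # initial Rayleigh-Ritz within span(X)
    X, theta = X @ V, w
    P = None                                     # third sub-matrix (none on the first pass)
    b = X.shape[1]
    for _ in range(max_iter):
        R = L @ X - X * theta                    # second sub-matrix: residuals
        if np.all(np.linalg.norm(R, axis=0) < tol):
            break                                # all columns converged
        blocks = [X, R] if P is None else [X, R, P]
        S = np.hstack(blocks)                    # search matrix spanning the search subspace
        S, _ = np.linalg.qr(S)                   # orthonormal basis for numerical stability
        A_small = S.T @ L @ S                    # projected ("intermediate") matrix
        w, C = np.linalg.eigh(A_small)
        theta, C = w[:b], C[:, :b]               # Ritz pairs for the b smallest eigenvalues
        X_new = S @ C
        P = X_new - X @ (X.T @ X_new)            # update component outside span(X)
        X = X_new
    return theta, X                              # X is the n-by-b feature matrix

# Usage check on a small symmetric matrix with a known spectrum (1, 2, ..., 100).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((100, 100)))
L = Q @ np.diag(np.arange(1.0, 101.0)) @ Q.T
theta, F = lobpcg_smallest(L, rng.standard_normal((100, 3)))
print(np.round(np.sort(theta), 6))               # expected to be close to [1. 2. 3.]
```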
10. The method according to claim 9, wherein the determining the feature matrix corresponding to the graph Laplacian matrix according to the determined search direction specifically includes:
in the iteration process, when the vector of the ith column in the updated second sub-matrix satisfies a first specified condition, determining the vector of the ith column in the updated first sub-matrix as one column of the feature matrix, wherein i is a positive integer, and i is greater than or equal to 1 and less than b.
11. The method of claim 10, wherein the updated vector of the ith column in the second sub-matrix satisfying a first specified condition comprises:
the norm of the vector of the ith column in the updated second sub-matrix being smaller than a specified threshold value.
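Illustrative sketch for claims 10 and 11 (not part of the claims): the convergence test amounts to a per-column norm check on the residual block. The threshold value and the matrix shapes below are assumptions chosen for the example.

```python
import numpy as np

def converged_columns(R, threshold=1e-6):
    """Indices i whose residual column (second sub-matrix, column i) has a norm
    below the threshold; the corresponding columns of the first sub-matrix can
    then be taken over as columns of the feature matrix."""
    return np.flatnonzero(np.linalg.norm(R, axis=0) < threshold)

# Example: column 0 has (almost) converged, column 1 has not.
R = np.array([[1e-8, 0.3],
              [2e-8, 0.1]])
print(converged_columns(R))      # -> [0]
```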
12. The method according to claim 3, wherein the obtaining an intermediate matrix based on the graph Laplacian matrix specifically includes:
generating the intermediate matrix based on the graph Laplacian matrix, the search matrix, and a transpose of the search matrix.
13. The method according to claim 6, wherein the first sub-matrix used for the first time in the iterative process is a random matrix with n rows and b columns.
14. The method of claim 13, further comprising: updating the feature matrix when at least one information item of at least one of the n clients changes; and/or
updating the feature matrix after the personal information of an (n + 1)th client is acquired.
15. The method of claim 14, wherein in the updating of the feature matrix, the first sub-matrix used for the first time in the iterative process is the feature matrix before updating.
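Illustrative sketch for claims 13 to 15 (not part of the claims): the initial block is random on the first run, and the previously computed feature matrix is reused as a warm start when the data change or an (n + 1)th client is added. The way the extra row for the new client is seeded (a small random row) is an assumption, as is the RBF/normalized-Laplacian construction.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import lobpcg
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
n, N, b = 500, 10, 4
M = rng.standard_normal((n, N))                  # quantized personal information

def embed(M, X0):
    W = rbf_kernel(M)                            # similarity graph between clients
    np.fill_diagonal(W, 0.0)
    L = laplacian(W, normed=True)                # graph Laplacian
    _, F = lobpcg(L, X0, largest=False, tol=1e-5, maxiter=200)
    return F

# First run: the initial first sub-matrix is a random n-by-b block (claim 13).
F = embed(M, rng.standard_normal((n, b)))

# An (n + 1)th client arrives: warm-start from the previous feature matrix,
# padding one extra row for the new client (claims 14 and 15).
M_new = np.vstack([M, rng.standard_normal((1, N))])
X0_warm = np.vstack([F, 1e-3 * rng.standard_normal((1, b))])
F_new = embed(M_new, X0_warm)
print(F.shape, F_new.shape)                      # (500, 4) (501, 4)
```

Warm-starting typically saves most of the iterations because the previous feature matrix is already close to the new invariant subspace when only a small part of the data has changed.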
16. An apparatus for predicting a credit risk, comprising:
a client authorization acquisition module for acquiring the authorization of the client for obtaining the personal information;
a personal information acquisition module to: under the condition that the authorization of the client for obtaining the personal information is obtained, obtaining the personal information of n clients, wherein the personal information of each client comprises N information items, the N information items are all related to credit risk, N is an integer greater than or equal to 1, and n is an integer greater than or equal to 2;
a personal information matrix acquisition module to: quantizing the personal information of the n clients to obtain a personal information matrix, wherein the personal information matrix is a matrix with n rows and N columns, and each row of the personal information matrix represents the quantized personal information of one client;
a graph Laplacian matrix calculation module to: for the personal information matrix, calculating a graph Laplacian matrix corresponding to the personal information matrix by using a spectral clustering method, wherein the graph Laplacian matrix is a matrix with n rows and n columns;
a feature matrix acquisition module to: reducing the dimension of the graph Laplacian matrix by adopting a locally optimal block conjugate gradient method to obtain a feature matrix corresponding to the graph Laplacian matrix, wherein the feature matrix is a matrix with n rows and b columns, b is a positive integer, and b is greater than or equal to 1 and less than n;
a classification module to: classifying the n clients by adopting a clustering method based on the feature matrix; and
a credit risk prediction module to: predicting credit risks of the n clients according to the classified n clients.
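Illustrative sketch for claim 16 (not part of the claims): a hypothetical class layout showing how the modules might map onto code, under the same assumptions as the earlier sketches (RBF similarity, normalized Laplacian, scipy's LOBPCG, k-means). Client authorization handling is reduced to the caller passing in already-authorized records.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import lobpcg
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

class CreditRiskPredictor:
    """Hypothetical module layout mirroring the apparatus of claim 16."""

    def __init__(self, b=4, k=5, seed=0):
        self.b, self.k = b, k
        self.rng = np.random.default_rng(seed)

    def personal_information_matrix(self, records):
        # personal information matrix acquisition: one quantized row per client
        return np.asarray(records, dtype=float)

    def graph_laplacian(self, M):
        # graph Laplacian calculation via a similarity graph
        W = rbf_kernel(M)
        np.fill_diagonal(W, 0.0)
        return laplacian(W, normed=True)

    def feature_matrix(self, L):
        # feature matrix acquisition: LOBPCG dimensionality reduction to b columns
        X0 = self.rng.standard_normal((L.shape[0], self.b))
        _, F = lobpcg(L, X0, largest=False, tol=1e-5, maxiter=200)
        return F

    def classify(self, F):
        # classification of the n clients based on the feature matrix
        return KMeans(n_clusters=self.k, n_init=10, random_state=0).fit_predict(F)

    def predict(self, records):
        # credit risk prediction: here simply the class label of each client
        M = self.personal_information_matrix(records)
        return self.classify(self.feature_matrix(self.graph_laplacian(M)))

# Example with random records standing in for quantized personal information.
labels = CreditRiskPredictor().predict(np.random.default_rng(1).standard_normal((300, 12)))
```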
17. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-15.
18. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 15.
19. A computer program product comprising a computer program which, when executed by a processor, carries out the method according to any one of claims 1 to 15.
CN202210477902.9A 2022-04-29 2022-04-29 Credit risk prediction method, device, equipment and medium Pending CN114707762A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210477902.9A CN114707762A (en) 2022-04-29 2022-04-29 Credit risk prediction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210477902.9A CN114707762A (en) 2022-04-29 2022-04-29 Credit risk prediction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114707762A true CN114707762A (en) 2022-07-05

Family

ID=82177133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210477902.9A Pending CN114707762A (en) 2022-04-29 2022-04-29 Credit risk prediction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114707762A (en)

Similar Documents

Publication Publication Date Title
US11868891B2 (en) Machine-learning techniques for monotonic neural networks
AU2020233739B2 (en) Machine-Learning Techniques For Monotonic Neural Networks
US11315032B2 (en) Method and system for recommending content items to a user based on tensor factorization
US20180240041A1 (en) Distributed hyperparameter tuning system for machine learning
US9536201B2 (en) Identifying associations in data and performing data analysis using a normalized highest mutual information score
US11288540B2 (en) Integrated clustering and outlier detection using optimization solver machine
US20210303970A1 (en) Processing data using multiple neural networks
US20230297847A1 (en) Machine-learning techniques for factor-level monotonic neural networks
US20200065292A1 (en) Systems and methods for improved anomaly detection in attributed networks
US20110231348A1 (en) Regularized Dual Averaging Method for Stochastic and Online Learning
CN113674087A (en) Enterprise credit rating method, apparatus, electronic device and medium
CN114707762A (en) Credit risk prediction method, device, equipment and medium
Pelegrina et al. A novel multi-objective-based approach to analyze trade-offs in Fair Principal Component Analysis
CN112884028A (en) System resource adjusting method, device and equipment
CN114818944A (en) Customer satisfaction prediction method, apparatus, device and medium
US20220383204A1 (en) Ascertaining and/or mitigating extent of effective reconstruction, of predictions, from model updates transmitted in federated learning
CN115239501A (en) Method and device for determining transaction deposit and electronic equipment
CN116881659A (en) Product classification method, device, equipment and medium
Sosa et al. Illustrating advantages and challenges of Bayesian statistical modelling: An empirical perspective
Roa Ballén Machine Learning Models and Alternative Data in Credit Scoring: Statistical and Financial impact
CN115409635A (en) Information prediction method, device, equipment and medium
WO2022250732A1 (en) Ascertaining and/or mitigating extent of effective reconstruction, of predictions, from model updates transmitted in federated learning
CN116862645A (en) Credit risk assessment method, device, equipment and medium
Tanwar et al. Classification of Web Services for Efficient Performability
CN116452296A (en) Product recommendation method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination