CN113469807B

CN113469807B - Credit risk determination and data processing method, apparatus, medium, and program product

Info

Publication number: CN113469807B
Application number: CN202111016266.1A
Authority: CN
Inventors: 张阿飞
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2022-03-01
Anticipated expiration: 2041-08-31
Also published as: CN113469807A

Abstract

Embodiments of the present application provide a credit risk determination and data processing method, apparatus, medium, and program product. In the embodiments, feature extraction is performed separately on the credit application data of a user to be tested and the credit application data of a sample user to determine the credit features of each; the similarity between the user to be tested and the sample user is calculated from these credit features; and the risk attribute of the user to be tested is determined from that similarity and the risk attribute of the sample user. Because the similarity between the user to be tested and the sample user is introduced into the credit risk prediction, the reason the user to be tested was assigned the current risk attribute can be explained in terms of that similarity, so the prediction result is interpretable and meets the interpretability requirement of credit risk control scenarios.

Description

Credit risk determination and data processing method, apparatus, medium, and program product

Technical Field

The present application relates to the field of data processing technologies, and in particular to a credit risk determination and data processing method, apparatus, medium, and program product.

Background

Deep learning models have strong representational capability and achieve good results in many scenarios. However, on the one hand, deep learning models require large amounts of sample data, and the sample volume available in credit risk control scenarios generally struggles to meet the data demand of a deep network; on the other hand, deep learning models have poor interpretability and cannot meet the interpretability requirement of credit risk control scenarios.

Disclosure of Invention

Aspects of the present application provide a credit risk determination and data processing method, apparatus, storage medium, and program product to satisfy the interpretability requirement of credit risk control scenarios.

The embodiment of the application provides a credit risk determination method, which comprises the following steps:

acquiring credit application data of a user to be tested and credit application data of a first sample user with known risk attributes;

respectively performing feature extraction on the credit application data of the user to be tested and the credit application data of the first sample user to determine the credit features of the user to be tested and the credit features of the first sample user;

calculating the similarity between the user to be tested and the first sample user according to the credit features of the user to be tested and the credit features of the first sample user;

and determining the risk attribute of the user to be tested according to the similarity between the user to be tested and the first sample user and the risk attribute of the first sample user.

An embodiment of the present application further provides a data processing method, including:

acquiring production data of an object to be detected and production data of a sample object with known quality attributes;

respectively extracting characteristics of the production data of the object to be detected and the production data of the sample object to determine the production characteristics of the object to be detected and the sample object;

calculating the similarity between the object to be detected and the sample object according to the production characteristics of the object to be detected and the production characteristics of the sample object;

and determining the quality attribute of the object to be detected according to the similarity between the object to be detected and the sample object and the quality attribute of the sample object.

An embodiment of the present application further provides a computer device, including: a memory and a processor; wherein the memory is used for storing a computer program;

the processor is coupled to the memory for executing the computer program for performing the steps in the above credit risk determination method and/or data processing method.

Embodiments of the present application also provide a computer-readable storage medium storing computer instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the above credit risk determination method and/or data processing method.

An embodiment of the present application further provides a computer program product, including a computer program which, when executed by a processor, causes the processor to perform the steps of the credit risk determination method and/or the data processing method described above.

In the embodiments of the application, feature extraction may be performed separately on the credit application data of the user to be tested and the credit application data of the sample user to determine the credit features of each; the similarity between the user to be tested and the sample user is calculated from these credit features; and the risk attribute of the user to be tested is then determined from that similarity and the risk attribute of the sample user. Because the credit risk determination method introduces the similarity between the user to be tested and the sample user into the credit risk prediction, the reason the user to be tested was assigned the current risk attribute can be explained in terms of that similarity, so the credit risk prediction is interpretable and meets the interpretability requirement of credit risk control scenarios.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flow chart of a credit risk determination method provided in an embodiment of the present application;

FIG. 2 is a schematic flow chart of another credit risk determination method provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a model training process provided in an embodiment of the present application;

FIG. 4 is a schematic process diagram of a sample obtaining manner according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of a data processing method according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In a credit risk control scenario, the current mainstream risk prediction model is the scorecard model. However, the scorecard model is a linear model and cannot capture non-linear relationships between features, so its credit risk prediction accuracy is low. Deep learning models can capture complex feature relationships but have poor interpretability. How to meet the interpretability requirement of credit risk control scenarios while maintaining the accuracy of credit risk prediction has become a technical problem that those skilled in the art urgently need to solve.

To meet the interpretability requirement of credit risk control scenarios, in some embodiments of the application, feature extraction may be performed separately on the credit application data of the user to be tested and the credit application data of the sample user to determine the credit features of each; the similarity between the user to be tested and the sample user is calculated from these credit features; and the risk attribute of the user to be tested is then determined from that similarity and the risk attribute of the sample user. Because the similarity between the user to be tested and the sample user is introduced into the credit risk prediction, the reason the user to be tested was assigned the current risk attribute can be explained in terms of that similarity, so the credit risk prediction is interpretable and meets the interpretability requirement of credit risk control scenarios.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.

It should be noted that: like reference numerals refer to like objects in the following figures and embodiments, and thus, once an object is defined in one figure or embodiment, further discussion thereof is not required in subsequent figures and embodiments.

FIG. 1 is a flowchart illustrating a credit risk determination method according to an embodiment of the present application. As shown in FIG. 1, the method includes:

101. Acquire credit application data of a user to be tested and credit application data of a sample user with known risk attributes.

102. Perform feature extraction separately on the credit application data of the user to be tested and the credit application data of the sample user to determine the credit features of the user to be tested and the credit features of the sample user.

103. Calculate the similarity between the user to be tested and the sample user according to the credit features of the user to be tested and the credit features of the sample user.

104. Determine the risk attribute of the user to be tested according to the similarity between the user to be tested and the sample user and the risk attribute of the sample user.

In this embodiment, credit refers to a form of value movement conditioned on repayment and the payment of interest. It generally includes credit activities such as bank deposits and loans, and may also include other forms of lending such as credit card consumption, IOU-style consumption, or other consume-first, pay-later behavior. Credit risk prediction mainly refers to predicting whether a user has an overdue risk. Accordingly, the risk attributes may include: overdue risk, no overdue risk, and the like.

The user to be tested may be any credit application user, either a user whose credit application is in progress or a user whose credit application has succeeded. The user may be an organization, a collective, an individual, or the like. Credit application data refers to any data related to credit risk prediction and may include user information, information on the loan the user has applied for, the user's historical behavior information, and the like. The user's historical behavior information may include historical consumption behavior data, historical credit behavior data, and the like. To some extent, such credit application data reflects the credit features of the user.

Based on this, in step 101, the credit application data of the user to be tested may be obtained. To predict the credit risk of the user to be tested, the credit application data of sample users with known risk attributes may also be obtained in step 101. The risk attributes of the sample users may all be the same or only partially the same. Preferably, they are partially the same and partially different, i.e., the sample users include both users with overdue risk and users without overdue risk.

Further, in step 102, feature extraction may be performed separately on the credit application data of the user to be tested and the credit application data of the sample user to obtain the credit features of each. The embodiments of the application do not limit how this feature extraction is implemented. In some embodiments, the same feature extraction model is used for both the user to be tested and the sample user, so that the similarity between their credit features, calculated in step 103, is quantifiable. If different feature extraction models were used, the resulting credit features of the user to be tested and of the sample user might characterize different properties and might also differ in dimension, so that the similarity between the user to be tested and the sample user could not be measured in step 103.

The embodiments of the application do not limit the specific form of the feature extraction model. In some embodiments, it may be implemented as a twin (Siamese) network model. The sub-networks of the twin network model have the same structure and share parameters such as weights. The activation function in the twin network model may be a ReLU, softmax, tanh, or sigmoid function, among others. Accordingly, as shown in FIG. 2, the credit application data xt of the user to be tested and the credit application data of the sample users (x1, x2, …, xn in FIG. 2) may be input into the twin network model, where n is the total number of sample users and is a positive integer. In the twin network model, feature extraction is performed separately on the credit application data of the user to be tested and the credit application data of the sample users to determine the credit features of each. In FIG. 2, yi denotes the risk attribute of sample user xi, i = 1, 2, …, n. Optionally, yi = 1 if the risk attribute is overdue risk, and yi = 0 if the risk attribute is no overdue risk.
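The weight sharing described above can be illustrated with a minimal sketch. It assumes a single dense layer with a ReLU activation; the class `SharedEncoder`, the function `twin_forward`, and the toy weights are hypothetical and are not taken from the application, which does not fix the network structure.

```python
from typing import List, Sequence, Tuple


def relu(value: float) -> float:
    """ReLU activation, one of the activation functions the text lists."""
    return value if value > 0.0 else 0.0


class SharedEncoder:
    """Hypothetical one-layer encoder whose weights are reused by both sub-networks."""

    def __init__(self, weights: Sequence[Sequence[float]], bias: Sequence[float]) -> None:
        self.weights = [list(row) for row in weights]
        self.bias = list(bias)

    def encode(self, x: Sequence[float]) -> List[float]:
        # Each output unit is relu(w . x + b).
        return [
            relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


def twin_forward(
    encoder: SharedEncoder, x_sample: Sequence[float], x_test: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Run both inputs through the same encoder, so both branches share weights."""
    return encoder.encode(x_sample), encoder.encode(x_test)


# Toy weights and inputs, chosen only for illustration.
encoder = SharedEncoder(weights=[[1.0, -1.0], [0.5, 0.5]], bias=[0.0, 0.0])
credit_feature_sample, credit_feature_test = twin_forward(encoder, [2.0, 1.0], [1.0, 3.0])
```

Because a single encoder object serves both branches, the two credit feature vectors have the same dimension and live in the same feature space, which is what makes a distance between them meaningful.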

There are multiple sample users, where multiple means 2 or more. To improve the accuracy of credit risk prediction, the number of sample users may be made as large as possible: the more sample users there are, the more accurate the credit risk prediction for the user to be tested. With multiple sample users, as shown in FIG. 2, each sample user may be paired with the user to be tested to form a pair to be tested (e.g., (x1, xt), (x2, xt), …, (xn, xt) in FIG. 2). The credit application data of the sample user and of the user to be tested in each pair is input into the two sub-networks of the twin network model, respectively, to obtain the credit features of that sample user and of the user to be tested.

Further, in step 103, the similarity between the user to be tested and the sample user may be calculated according to the credit characteristics of the user to be tested and the credit characteristics of the sample user.

Optionally, as shown in FIG. 2, the credit features of the user to be tested and of the sample user may be input into a similarity calculation model, which calculates the distance between them, such as dist-x1, dist-x2, …, dist-xn in FIG. 2, where dist-xi denotes the distance between the credit features of the user to be tested xt and those of sample user xi. This distance is used as the measure of similarity between the user to be tested and the sample user: the smaller the distance between their credit features, the greater their similarity. The distance may be a Euclidean distance, a cosine distance, or a Mahalanobis distance, but is not limited thereto.
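As an illustration of two of the distance measures named above, the following sketch computes the Euclidean and cosine distances between two credit feature vectors. The function names are hypothetical, and plain Python lists stand in for whatever feature representation an implementation actually uses.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

In both cases a smaller value means the two users' credit features are closer, which the text treats as greater similarity.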

In the embodiment of the present application, a specific implementation form of the similarity calculation model is not limited. In some embodiments, the similarity calculation model may be implemented as a neural network model, such as a Convolutional Neural Network (CNN) model, a Recurrent Neural Network (RNN) model, or a Deep Neural Network (DNN) model, among others.

Before the twin network model is used for feature extraction and the similarity calculation model is used for similarity calculation, both models need to be trained. The training process of the twin network model and the similarity calculation model is described below by way of example.

As shown in FIG. 3, credit application data of sample users with known risk attributes may be obtained. To distinguish the sample users that participate in the credit risk prediction of the user to be tested from the sample users that participate in model training, the former (the sample users in FIG. 1) are defined as first sample users, and the latter, used in this embodiment for model training, are defined as second sample users. The second sample users may or may not include the first sample users. The risk attributes of the second sample users include both overdue risk and no overdue risk.

The second sample users may be randomly selected sample users. However, imbalance between positive and negative samples affects the accuracy of the trained model. In this embodiment, to reduce this imbalance, credit application data of third sample users with known risk attributes may be acquired, where the number of third sample users is greater than the number of second sample users and the third sample users include the second sample users. The risk attributes of the third sample users include both overdue risk and no overdue risk.

When training a neural network model, insufficient training samples readily lead to overfitting. To address this, the embodiments of the application expand the original sample size by combining original samples into sample pairs, and provide a method for rapidly generating sample pairs in batches. The main principle is as follows:

Assuming that the total number of third sample users is M, the maximum number of different sample pairs that can be formed is $\binom{M}{2} = \frac{M(M-1)}{2}$.

Each training iteration (step) is equivalent to training on one batch of Q sample pairs drawn at random from these $\binom{M}{2}$ different sample pairs, where Q < N. M is usually in the tens of thousands, so the number of sample pairs that can be generated is very large, which helps solve the overfitting problem caused by insufficient training samples when training a deep neural network model. One training iteration is described below as an example.

As shown in FIG. 4, in each training iteration (step), [2N × a] positive sample users whose risk attribute is overdue risk may be drawn with replacement from the third sample users, and [2N × (1 − a)] negative sample users whose risk attribute is no overdue risk may likewise be drawn with replacement from the third sample users. The credit application data of these [2N × a] positive sample users and [2N × (1 − a)] negative sample users is taken as the credit application data of the second sample users. N and a are hyperparameters. N is a positive integer whose value is determined by the number of samples actually required and plays the role of the batch size in neural network training. a is a number in the interval (0, 1) whose value depends mainly on the degree of imbalance between positive and negative samples and is determined by parameter tuning in practice.
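The per-iteration draw described above might be sketched as follows. The function `draw_second_samples` is hypothetical, and rounding 2N × a to the nearest integer is an assumption, since the text does not say how a fractional count is handled.

```python
import random
from typing import List, Sequence, Tuple


def draw_second_samples(
    positives: Sequence[str],
    negatives: Sequence[str],
    n: int,
    a: float,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Draw about 2N*a positive and 2N*(1-a) negative users with replacement."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie strictly between 0 and 1")
    num_positive = round(2 * n * a)   # assumption: round to the nearest integer
    num_negative = 2 * n - num_positive
    # random.Random.choices samples with replacement.
    drawn_positive = rng.choices(list(positives), k=num_positive)
    drawn_negative = rng.choices(list(negatives), k=num_negative)
    return drawn_positive, drawn_negative


rng = random.Random(0)
pos, neg = draw_second_samples(["p1", "p2"], ["n1", "n2", "n3"], n=4, a=0.25, rng=rng)
```

Drawing with replacement lets the positive share a be set independently of how rare positive users are among the third sample users, which is how the draw reduces class imbalance.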

After the 2N second sample users are obtained, a plurality of sample pairs may be determined at random from them. Each sample pair includes 2 second sample users, and any two different sample pairs differ in at least one sample user. From 2N second sample users, at most $\binom{2N}{2} = N(2N-1)$ sample pairs can be formed.

The value of N is generally large, and in any case greater than 2, so pairing expands the sample size relative to the original 2N samples. This compensates for the shortage of samples for the deep neural network model and mitigates the overfitting that insufficient training samples would otherwise cause during training.
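A short sketch of the pairing step follows, under the assumption that a sample pair is an unordered pair of two distinct positions among the 2N drawn users, so that at most C(2N, 2) = N(2N − 1) pairs exist. The function names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Tuple


def max_pair_count(n: int) -> int:
    """Number of unordered pairs that 2N users can form, i.e. C(2N, 2)."""
    return math.comb(2 * n, 2)


def draw_pairs(num_users: int, q: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Randomly pick q distinct pairs of user positions (indices into the batch)."""
    all_pairs = list(combinations(range(num_users), 2))
    if q > len(all_pairs):
        raise ValueError("q exceeds the number of available pairs")
    return rng.sample(all_pairs, q)


pairs = draw_pairs(num_users=6, q=4, rng=random.Random(0))
```

With N = 3, six users already yield 15 candidate pairs, and the count grows quadratically in N, which is the sample expansion the text relies on. Pairing by position rather than by user identity keeps users that were drawn more than once as separate entries.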

After obtaining a plurality of sample pairs, the sample label of each sample pair can be calculated according to the risk attribute of the second sample user contained in the sample pair. In some embodiments, a sample label of a sample pair containing two second sample users with the same risk attribute may be determined to be 1, and a sample label of a sample pair containing two second sample users with different risk attributes may be determined to be 0. Correspondingly, the risk attributes of the two second sample users included in each sample pair can be subjected to exclusive-or calculation to obtain the exclusive-or result of the sample pair corresponding to the risk attributes; and further negating the XOR result of the risk attributes corresponding to each sample pair to obtain a sample label of the sample pair.
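The label rule above (exclusive-or followed by negation) amounts to an XNOR of the two binary risk attributes. A minimal sketch, with `pair_label` as a hypothetical name:

```python
def pair_label(risk_a: int, risk_b: int) -> int:
    """Return 1 when both users share the same risk attribute, else 0."""
    if risk_a not in (0, 1) or risk_b not in (0, 1):
        raise ValueError("risk attributes must be 0 or 1")
    xor_result = risk_a ^ risk_b  # 1 when the attributes differ
    return 1 - xor_result         # negate: 1 when the attributes match
```

For example, `pair_label(1, 1)` and `pair_label(0, 0)` both give 1, while `pair_label(1, 0)` gives 0.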

Further, the twin network model and the similarity calculation model may be trained on the plurality of sample pairs, with minimization of a loss function as the training objective. The loss function is determined by the difference between the similarity probability output by the similarity calculation model and the sample label. For example, the cross entropy between the similarity probability output by the similarity calculation model and the sample label may be used as the loss function.
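The cross-entropy example mentioned above can be written out for binary sample labels as follows. This is a sketch only: the clipping constant `eps` and the averaging over the batch are assumptions for numerical safety and convenience, not details given in the text.

```python
import math
from typing import Sequence


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """Cross entropy between a predicted similarity probability and a 0/1 label."""
    clipped = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(clipped) + (1 - label) * math.log(1.0 - clipped))


def batch_loss(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean cross entropy over one batch of sample pairs."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / len(probs)
```

The loss is small when the predicted similarity probability agrees with the pair label and grows as they diverge, so minimizing it pushes same-attribute pairs toward high similarity and different-attribute pairs toward low similarity.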

The similarity calculation model may be regarded as learning a distance model: a deep neural network is used to generate a high-dimensional nonlinear space in which samples with closer credit risk attributes lie closer together and samples with more different credit risk attributes lie farther apart.

Optionally, the supervised learning process may be decomposed into two stages: a learning stage for the similarity calculation model and a learning stage for the risk prediction model. The former may use a process similar to the pre-training of a deep neural network, and the latter may use a fine-tuning process.

After the twin network model and the similarity calculation model are obtained, the twin network model may be used to perform feature extraction on the credit application data of the user to be tested and the credit application data of the first sample user to obtain the credit features of each. Further, the similarity calculation model may be applied to the credit features of the user to be tested and of the first sample user to calculate the similarity between them.

Further, in step 104, the risk attribute of the user to be tested may be determined according to the similarity between the user to be tested and the first sample user and the risk attribute of the first sample user.

In this embodiment, feature extraction may be performed separately on the credit application data of the user to be tested and the credit application data of the sample user to determine the credit features of each; the similarity between the user to be tested and the sample user is calculated from these credit features; and the risk attribute of the user to be tested is then determined from that similarity and the risk attribute of the sample user. Because the similarity between the user to be tested and the sample user is introduced into the credit risk prediction, the reason the user to be tested was assigned the current risk attribute can be explained in terms of that similarity, so the credit risk prediction is interpretable and meets the interpretability requirement of credit risk control scenarios.

The embodiments of the application do not limit how the risk attribute of the user to be tested is determined from the similarity between the user to be tested and the first sample user and the risk attribute of the first sample user. In some embodiments, as shown in FIG. 2, the similarities between the user to be tested and the first sample users may be normalized to obtain weights for the risk attributes of the first sample users, such as w1, w2, …, wn in FIG. 2, where wi denotes the weight of the risk attribute of sample user xi. Optionally, the similarities may be normalized with an activation function; FIG. 2 uses the softmax function as an example, but the activation function is not limited thereto. Further, the risk attributes of the first sample users may be weighted and summed using these weights to obtain the risk value of the user to be tested. For example, in FIG. 2, the risk value of the user to be tested = (w1 × y1 + w2 × y2 + … + wn × yn)/n. The risk attribute of the user to be tested is then determined according to this risk value.
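The normalization and weighted sum can be sketched as below. The sketch follows the risk value formula as given for FIG. 2, including the division by n, and assumes that each similarity score is larger for more similar users (for instance, a negated distance); that scoring convention and the function names are assumptions.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def risk_value(similarities: Sequence[float], risk_labels: Sequence[int]) -> float:
    """Weighted sum of sample risk attributes, divided by n as in the FIG. 2 formula."""
    if len(similarities) != len(risk_labels) or not risk_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = softmax(similarities)
    n = len(risk_labels)
    return sum(w * y for w, y in zip(weights, risk_labels)) / n
```

The weights w1, …, wn double as an explanation: the sample users with the largest weights are the ones that contributed most to the risk value of the user to be tested.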

Optionally, if the risk value of the user to be tested is greater than or equal to a set risk threshold, the risk attribute of the user to be tested is determined to be overdue risk; correspondingly, if the risk value is smaller than the set risk threshold, the risk attribute is determined to be no overdue risk. In FIG. 2, yt denotes the risk attribute of the user to be tested xt. Optionally, yt = 1 indicates that the risk attribute of the user to be tested is overdue risk, and yt = 0 indicates that it is no overdue risk.
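The threshold rule can be stated in a few lines. The function name is hypothetical, and the threshold value itself is left to the caller, since the text only says that it is set in advance.

```python
def risk_attribute(risk_value: float, risk_threshold: float) -> int:
    """Return 1 (overdue risk) at or above the threshold, else 0 (no overdue risk)."""
    return 1 if risk_value >= risk_threshold else 0
```

Note that a risk value exactly equal to the threshold is classified as overdue risk, matching the "greater than or equal to" wording.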

Further, after the risk attribute of the user to be tested is determined, it may be output. The credit risk determination method described above may be performed by any computer device. In some embodiments, the computer device may be a terminal device or a server device of a credit institution, in which case the risk attribute of the user to be tested may be displayed and/or announced by voice. In other embodiments, the credit risk prediction method provided in the embodiments of the present application may be deployed in the cloud and implemented as a software as a service (SaaS) product. After the risk attribute of the user to be tested is determined, it may be provided to the terminal that initiated the credit risk prediction request. The terminal receives the risk attribute of the user to be tested and displays it and/or announces it by voice.

Because the similarity between the user to be tested and the sample user is introduced into the credit risk prediction, the reason the user to be tested was assigned the current risk attribute can be explained in terms of that similarity. To determine this reason, target sample users whose similarity to the user to be tested meets a set similarity requirement may be selected from the first sample users. For example, the first sample users whose similarity to the user to be tested is greater than or equal to a set similarity threshold may be selected as target sample users. Alternatively, the first sample users may be sorted in descending order of similarity to the user to be tested, and a set number of them taken from the top as target sample users.
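Both selection strategies can be sketched as follows, assuming each first sample user has a similarity score where larger means more similar. The function names are hypothetical, and the functions return positions in the score list rather than user records.

```python
from typing import List, Sequence


def select_by_threshold(similarities: Sequence[float], threshold: float) -> List[int]:
    """Indices of sample users whose similarity is at least the threshold."""
    return [i for i, s in enumerate(similarities) if s >= threshold]


def select_top_k(similarities: Sequence[float], k: int) -> List[int]:
    """Indices of the k most similar sample users, most similar first."""
    ranked = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    return ranked[:k]
```

The threshold variant returns a variable number of target sample users, while the top-k variant always returns a fixed number, which may be easier to present as an explanation.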

Because the similarity between the target sample user and the user to be detected is high, the characteristic information of the target sample user can reflect the characteristics of the user to be detected to a certain extent. Therefore, influence factors of the risk attribute of the user to be tested can be analyzed according to the credit application data of the target sample user.

Further, after determining the influence factors of the risk attributes of the users to be tested, the influence factors of the risk attributes of the users to be tested can be output. The credit risk determination method described above may be performed by any computer device. In some embodiments, the computer device may be a terminal device or a server device of a credit agency. Accordingly, the influence factors of the risk attribute of the user to be tested can be displayed, and/or the influence factors of the risk attribute of the user to be tested can be played in a voice mode. In other embodiments, the credit risk prediction method provided in the embodiments of the present application may be deployed in a cloud and implemented as a software as a service (SaaS) product. After determining the influence factors of the risk attribute of the user to be tested, the influence factors of the risk attribute of the user to be tested can be provided to the terminal initiating the credit risk prediction request. The terminal can receive the influence factors of the risk attributes of the user to be tested, display the influence factors of the risk attributes of the user to be tested, and/or play the influence factors of the risk attributes of the user to be tested in a voice mode.

It should be noted that, in addition to the credit risk control scenario, the credit risk determination method provided by the embodiment of the present application is applicable to any application scenario that requires the prediction result to be interpretable. The following provides an exemplary description of a data processing method provided in an embodiment of the present application.

Fig. 5 is a schematic flowchart of a data processing method according to an embodiment of the present application. As shown in fig. 5, the data processing method includes:

501. Acquire the production data of the object to be detected and the production data of a sample object with a known quality attribute.

502. Perform feature extraction on the production data of the object to be detected and on the production data of the sample object, respectively, to determine the production characteristics of the object to be detected and of the sample object.

503. Calculate the similarity between the object to be detected and the sample object according to the production characteristics of the object to be detected and the production characteristics of the sample object.

504. Determine the quality attribute of the object to be detected according to the similarity between the object to be detected and the sample object and the quality attribute of the sample object.

The data processing method provided by the embodiment can be suitable for any application scenes requiring interpretability of processing results. The application scenes are different, the objects to be detected are different, and the production data are different. For example, in a medical rationality analysis scenario, the object under test may be a patient under test and the quality attribute may be implemented as medical rationality (reasonable or unreasonable). The sample object may be a historic patient or the like for which the medical rationality is known. Correspondingly, the production data are medical record data, medical insurance data and the like of the patient. For another example, in the operation state monitoring scene of the industrial object, the object to be measured is the industrial object to be measured, and the quality attribute may be implemented as an operation state attribute (good or abnormal). Accordingly, the production data may be operating parameters of the industrial object, and the like. For another example, in a quality monitoring scenario of an industrial product, the object to be measured is an industrial product to be measured (such as a tire, a glass product, and the like), and the quality attribute may be implemented as good or abnormal. Accordingly, the production data may be image data or other characteristic data of the object to be measured, or the like.

In this embodiment, in order to predict the quality attribute of the object to be tested, the production data of the object to be tested may be acquired in step 501, together with the production data of sample objects with known quality attributes. The quality attributes of the sample objects may all be the same or may be only partially the same. Preferably, the quality attributes of the sample objects are partly identical and partly different, that is, the quality attributes of the sample objects include both good quality and abnormal quality.

Further, in step 502, feature extraction may be performed on the production data of the object to be tested and the production data of the sample object, respectively, to obtain the production features of the object to be tested and the production features of the sample object. Alternatively, the same feature extraction model may be used for feature extraction on the production data of the object to be measured and the production data of the sample object.

Alternatively, the production data of the object to be tested and the production data of the sample object may be input into the twin network model. In the twin network model, feature extraction may be performed on the production data of the object to be measured and the production data of the sample object, respectively, to determine the production features of the object to be measured and the production features of the sample object.
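
The defining property of a twin network is that both inputs pass through the same feature extractor with the same parameters. The following is a minimal sketch of that property only, assuming a single dense layer with a tanh activation; the class name `SharedEncoder`, the layer shape, and the weights are hypothetical, and the embodiment does not specify the network architecture.

```python
import math
from typing import List, Sequence, Tuple


class SharedEncoder:
    """One dense layer whose weights are reused for both branches (assumed form)."""

    def __init__(self, weights: Sequence[Sequence[float]], bias: Sequence[float]):
        self.weights = [list(row) for row in weights]
        self.bias = list(bias)

    def encode(self, x: Sequence[float]) -> List[float]:
        # feature_i = tanh(sum_j w_ij * x_j + b_i)
        return [
            math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


def twin_forward(
    encoder: SharedEncoder, x_test: Sequence[float], x_sample: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Extract features of both inputs with the same encoder parameters."""
    return encoder.encode(x_test), encoder.encode(x_sample)


# Hypothetical 2-dimensional production data and identity weights.
encoder = SharedEncoder(weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
feat_test, feat_sample = twin_forward(encoder, [0.2, -0.4], [0.2, -0.4])
```

Because the parameters are shared, identical production data always map to identical production features, which is what makes the distance between features meaningful as a similarity measure.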

Further, in step 503, the similarity between the object to be measured and the sample object may be calculated according to the production characteristics of the object to be measured and the production characteristics of the sample object.

Optionally, the production characteristics of the object to be measured and the production characteristics of the sample object may be input into the similarity calculation model, in which the distance between the production characteristics of the object to be measured and the production characteristics of the sample object is calculated. This distance is used as the measure of similarity between the object to be measured and the sample object: the smaller the distance between the production characteristics of the object to be measured and those of the sample object, the greater the similarity between the object to be measured and the sample object.
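
The inverse relationship between distance and similarity can be sketched as below. The embodiment does not name a distance metric or a distance-to-similarity mapping; Euclidean distance and the mapping 1 / (1 + d) are assumptions chosen only because the mapping decreases monotonically as the distance grows.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_similarity(distance: float) -> float:
    """Map a non-negative distance to a similarity in (0, 1] (assumed mapping)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Smaller distance gives larger similarity; distance 0 gives similarity 1.
    return 1.0 / (1.0 + distance)


# Hypothetical production features of an object to be measured and a sample object.
d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
s = distance_to_similarity(d)
```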

To train the twin network model and the similarity calculation model, production data of sample objects with known quality attributes may be obtained. In the embodiment of the present application, in order to distinguish the sample objects, the sample object participating in the quality prediction of the object to be measured in fig. 5 is defined as a first sample object, and the sample object participating in model training in this embodiment is defined as a second sample object. The second sample object may or may not include the first sample object. The quality attributes of the second sample objects include good quality and abnormal quality. In each training iteration, the second sample objects may be obtained in the same way as the second sample users described above, which is not described herein again.

After 2N second sample objects are obtained in each training iteration, a plurality of sample pairs may be randomly determined from them. Each sample pair includes two second sample objects, and different sample pairs differ in at least one sample object. For the above embodiment of 2N second sample objects, at most $\binom{2N}{2} = N(2N-1)$ sample pairs can be formed.
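
As a sketch of the pairing step, 2N distinct objects yield at most N(2N − 1) unordered pairs, from which a subset can be drawn at random. The function name `build_sample_pairs`, the object identifiers, and the fixed seed are hypothetical; the sketch also assumes the 2N identifiers are distinct.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def build_sample_pairs(
    sample_ids: Sequence[str], num_pairs: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Randomly pick distinct unordered pairs of second sample objects."""
    # Enumerate every unordered pair; any two pairs differ in at least one object.
    all_pairs = list(itertools.combinations(sample_ids, 2))
    if num_pairs > len(all_pairs):
        raise ValueError("requested more pairs than can be formed")
    rng = random.Random(seed)
    # random.sample draws without repetition, so no pair is returned twice.
    return rng.sample(all_pairs, num_pairs)


ids = ["o1", "o2", "o3", "o4"]  # 2N objects with N = 2
max_pairs = len(list(itertools.combinations(ids, 2)))  # N * (2N - 1)
pairs = build_sample_pairs(ids, num_pairs=4)
```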

After obtaining a plurality of sample pairs, the sample label of each sample pair may also be calculated according to the quality attribute of the second sample object included in the sample pair. In some embodiments, the quality attributes of the two second sample objects included in each sample pair may be subjected to an exclusive or calculation, so as to obtain an exclusive or result of the quality attributes corresponding to the sample pair; and further negating the exclusive or result of the quality attribute corresponding to each sample pair to obtain a sample label of the sample pair.
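
The exclusive-or followed by negation reduces to a simple rule: the label is 1 when the two objects share the same quality attribute and 0 otherwise. A minimal sketch, assuming the quality attribute is encoded as 0 (good) or 1 (abnormal); this encoding and the function name `pair_label` are assumptions, not part of the embodiment.

```python
def pair_label(attr_a: int, attr_b: int) -> int:
    """Negated XOR of two binary quality attributes: 1 if equal, 0 if different."""
    if attr_a not in (0, 1) or attr_b not in (0, 1):
        raise ValueError("quality attributes must be encoded as 0 or 1")
    xor_result = attr_a ^ attr_b  # 1 when the attributes differ
    return 1 - xor_result         # negation: 1 when the attributes match


# All four combinations of the assumed encoding.
labels = {(a, b): pair_label(a, b) for a in (0, 1) for b in (0, 1)}
```

Under this labeling, a pair label of 1 tells the model that the two objects should be close in feature space, and a label of 0 tells it that they should be far apart.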

After the twin network model and the similarity calculation model are obtained, the twin network model can be used to perform feature extraction on the production data of the object to be detected and the production data of the first sample object, to obtain the production features of the object to be detected and the production features of the first sample object; further, the similarity calculation model can be applied to these production features to calculate the similarity between the object to be detected and the first sample object.

Further, in step 504, the quality attribute of the object to be tested may be determined according to the similarity between the object to be tested and the first sample object and the quality attribute of the first sample object.

In this embodiment, feature extraction may be performed on the production data of the object to be measured and the production data of the sample object, respectively, to determine the production features of the object to be measured and the production features of the sample object; the similarity between the object to be measured and the sample object is calculated according to these production features; then, the quality attribute of the object to be measured can be determined according to the similarity between the object to be measured and the sample object and the quality attribute of the sample object. In the data processing method provided by the embodiment of the application, because the similarity between the object to be measured and the sample object is introduced when the quality attribute of the object to be measured is predicted, the reason why the object to be measured is assigned its current quality attribute can be explained according to that similarity, so that the quality prediction result of the object to be measured has interpretability.

In the embodiment of the present application, the similarity between the object to be measured and the first sample object may be normalized to obtain the weight of the quality attribute of the first sample object. Further, the weight of the quality attribute of the first sample object can be utilized to perform weighted summation on the quality attribute of the first sample object, so as to obtain a risk value of the object to be measured. And then, determining the quality attribute of the object to be detected according to the risk value of the object to be detected.

Optionally, if the risk value of the object to be detected is greater than or equal to the set risk threshold, the quality attribute of the object to be detected is determined to be abnormal quality; correspondingly, if the risk value of the object to be detected is smaller than the set risk threshold, the quality attribute of the object to be detected is determined to be good quality. Further, after the quality attribute of the object to be detected is determined, the quality attribute of the object to be detected can be output. For the implementation of outputting the quality attribute of the object to be detected, reference may be made to the above description of outputting the risk attribute of the user to be tested, which is not described herein again.
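
The normalization, weighted summation, and threshold comparison can be sketched together as follows. Several points are assumptions made only for illustration: the similarities are normalized by dividing by their sum, the quality attribute of each first sample object is encoded as 1 (abnormal quality) or 0 (good quality), and the risk threshold is 0.5. The function names are hypothetical.

```python
from typing import Sequence


def risk_value(similarities: Sequence[float], attributes: Sequence[int]) -> float:
    """Similarity-weighted sum of the sample objects' quality attributes."""
    if not similarities or len(similarities) != len(attributes):
        raise ValueError("need one attribute per similarity")
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    # Normalize similarities so that the weights sum to 1 (assumed scheme).
    weights = [s / total for s in similarities]
    return sum(w * a for w, a in zip(weights, attributes))


def classify(value: float, threshold: float = 0.5) -> str:
    """Compare the risk value with the set risk threshold."""
    return "abnormal" if value >= threshold else "good"


# Hypothetical similarities to three first sample objects and their attributes.
value = risk_value([0.5, 0.25, 0.25], [1, 0, 1])
label = classify(value)
```

Because each weight is traceable to one sample object, the contribution of every known case to the final risk value can be reported directly, which is the source of the interpretability described in this embodiment.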

Because the similarity between the object to be detected and the sample object is introduced when the quality of the object to be detected is predicted, the reason that the object to be detected is determined as the current quality attribute can be explained according to the similarity between the object to be detected and the sample object. In the embodiment of the present application, in order to determine the reason why the object to be measured is determined as the current quality attribute, a target sample object whose similarity with the object to be measured meets the set similarity requirement may be selected from the first sample object according to the similarity between the object to be measured and the first sample object. For example, a sample object with a similarity greater than or equal to a set similarity threshold with the object to be measured is selected from the first sample objects as a target sample object. Alternatively, a set number of target sample objects may be selected from the first sample objects in descending order of similarity to the object to be measured.

Because the similarity between the target sample object and the object to be measured is high, the characteristic information of the target sample object can reflect the characteristics of the object to be measured to a certain extent. Therefore, the influence factors of the quality attribute of the object to be measured can be analyzed according to the production data of the target sample object.

Further, after the influence factors of the quality attribute of the object to be detected are determined, they can be output. For the implementation of outputting the influence factors of the quality attribute of the object to be detected, reference may be made to the above description of outputting the influence factors of the risk attribute of the user to be tested, which is not described herein again.

It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may be used as the execution subjects of the methods. For example, the execution subject of steps 101 and 102 may be device A; for another example, the execution subject of step 101 may be device A, and the execution subject of step 102 may be device B; and so on.

In addition, in some of the flows described in the above embodiments and the drawings, a plurality of operations are included in a specific order, but it should be clearly understood that the operations may be executed out of the order presented herein or in parallel, and the sequence numbers of the operations, such as 101, 102, etc., are merely used for distinguishing different operations, and the sequence numbers do not represent any execution order per se. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel.

Accordingly, embodiments of the present application also provide a computer-readable storage medium storing computer instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the credit risk determination method and/or the data processing method described above.

An embodiment of the present application further provides a computer program product, including: a computer program. The computer program, when executed by one or more processors, may perform the steps in the credit risk determination method and/or the data processing method described above. In the embodiments of the present application, a specific implementation form of the computer program product is not limited. In some embodiments, the computer program product may be implemented as application software for credit risk prediction by a credit agency, as Web application software, or as SaaS software, among others.

Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 6, the computer device includes: a memory 60a and a processor 60b. The memory 60a is used for storing computer programs.

The processor 60b is coupled to the memory 60a for executing a computer program for performing: acquiring credit application data of a user to be tested and credit application data of a first sample user with known risk attributes; respectively extracting the characteristics of the credit application data of the user to be tested and the credit application data of the first sample user to determine the credit characteristics of the user to be tested and the credit characteristics of the first sample user; calculating the similarity between the user to be detected and the first sample user according to the credit characteristics of the user to be detected and the credit characteristics of the first sample user; and determining the risk attribute of the user to be tested according to the similarity between the user to be tested and the first sample user and the risk attribute of the first sample user.

In some embodiments, the processor 60b, when determining the risk attribute of the user to be tested, is specifically configured to: carrying out normalization processing on the similarity between the user to be detected and the first sample user to obtain the weight of the risk attribute of the first sample user; weighting and summing the risk attributes of the first sample user by using the weight of the risk attributes of the first sample user to obtain a risk value of the user to be tested; and determining the risk attribute of the user to be tested according to the risk value of the user to be tested.

Further, when determining the risk attribute of the user to be tested, the processor 60b is specifically configured to: and if the risk value of the user to be detected is greater than or equal to the set risk threshold, determining that the risk attribute of the user to be detected is overdue risk.

Optionally, the processor 60b is further configured to: select, from the first sample users and according to the similarity between the user to be tested and each first sample user, a target sample user whose similarity with the user to be tested meets the set similarity requirement; and analyze the influence factors of the risk attribute of the user to be tested according to the credit application data of the target sample user.

In other embodiments, the processor 60b is specifically configured to, when performing feature extraction on the credit application data of the user to be tested and the credit application data of the first sample user respectively: inputting the credit application data of the user to be tested and the credit application data of the first sample user into the twin network model; in the twin network model, feature extraction is respectively carried out on the credit application data of the user to be tested and the credit application data of the first sample user so as to determine the credit features of the user to be tested and the credit features of the first sample user.

Optionally, when calculating the similarity between the user to be tested and the first sample user, the processor 60b is specifically configured to: input the credit features of the user to be tested and the credit features of the first sample user into the similarity calculation model; calculate, in the similarity calculation model, the distance between the credit features of the user to be tested and the credit features of the first sample user; and use this distance as the measure of similarity between the user to be tested and the first sample user, where the smaller the distance between the credit features of the user to be tested and the credit features of the first sample user, the greater the similarity between the user to be tested and the first sample user.

In some embodiments, the processor 60b is further configured to: obtain credit application data of second sample users with known risk attributes, the risk attributes of the second sample users including both overdue risk and no overdue risk; randomly determine a plurality of sample pairs from the second sample users, each sample pair including two second sample users and different sample pairs differing in at least one sample user; calculate a sample label of each sample pair according to the risk attributes of the second sample users contained in the sample pair; and train the twin network model and the similarity calculation model with the plurality of sample pairs, taking minimization of the loss function as the training target, where the loss function is determined according to the difference between the similarity probability output by the similarity calculation model and the sample label.
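
The embodiment states only that the loss is determined from the difference between the similarity probability and the sample label. One common choice that fits this description is binary cross-entropy, sketched below purely as an assumption; the function name `pair_loss` and the clipping constant `eps` are hypothetical.

```python
import math
from typing import Sequence


def pair_loss(
    probabilities: Sequence[float], labels: Sequence[int], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy between similarity probabilities and pair labels."""
    if not labels or len(probabilities) != len(labels):
        raise ValueError("need one label per probability")
    total = 0.0
    for p, y in zip(probabilities, labels):
        # Clip the probability away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical model outputs for two sample pairs labeled 1 (same) and 0 (different).
good_fit = pair_loss([0.9, 0.1], [1, 0])
poor_fit = pair_loss([0.1, 0.9], [1, 0])
```

A lower value indicates that the predicted similarity probabilities agree more closely with the sample labels, so minimizing it drives same-attribute pairs toward high similarity and different-attribute pairs toward low similarity.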

Optionally, when obtaining the credit application data of the second sample users with known risk attributes, the processor 60b is specifically configured to: obtain credit application data of third sample users with known risk attributes, the third sample users including the second sample users; select, with replacement, (2N × a) positive sample users whose risk attribute is overdue risk from the third sample users; select [2N × (1 − a)] negative sample users whose risk attribute is no overdue risk from the third sample users; and determine the credit application data corresponding to the (2N × a) positive sample users and the [2N × (1 − a)] negative sample users as the credit application data of the second sample users; where N is a positive integer and a is a hyper-parameter.
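
The class-balanced selection of 2N second sample users can be sketched as follows. Two points are assumptions: the users are drawn with replacement, and (2N × a) is rounded to the nearest integer when it is not whole. The function name `draw_second_samples` and the user identifiers are hypothetical.

```python
import random
from typing import List, Sequence


def draw_second_samples(
    positives: Sequence[str],
    negatives: Sequence[str],
    n: int,
    a: float,
    seed: int = 0,
) -> List[str]:
    """Draw 2N users: about 2N*a with overdue risk, the rest without."""
    if n <= 0 or not 0.0 < a < 1.0:
        raise ValueError("n must be positive and a must lie in (0, 1)")
    if not positives or not negatives:
        raise ValueError("both classes must be non-empty")
    num_pos = round(2 * n * a)   # assumed rounding of 2N * a
    num_neg = 2 * n - num_pos    # remaining 2N * (1 - a) users
    rng = random.Random(seed)
    # random.choices samples with replacement (assumed sampling mode).
    return rng.choices(positives, k=num_pos) + rng.choices(negatives, k=num_neg)


# Hypothetical third sample users, split by known risk attribute.
batch = draw_second_samples(["p1", "p2"], ["n1", "n2", "n3"], n=5, a=0.3)
```

The hyper-parameter a fixes the share of overdue-risk users in every training iteration, which keeps the rarer class represented even when it is scarce among the third sample users.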

Optionally, the processor 60b, when calculating the sample label of the sample pair, is specifically configured to: performing XOR calculation on the risk attributes of the second sample user included in each sample pair to obtain the XOR result of the corresponding risk attribute of the sample pair; and negating the exclusive or result of the risk attribute of each sample pair to obtain a sample label of the sample pair.

In the computer device provided by this embodiment, feature extraction can be performed on the credit application data of the user to be tested and the credit application data of the sample user, respectively, to determine the credit features of the user to be tested and the credit features of the sample user; the similarity between the user to be tested and the sample user is calculated according to these credit features; and then the risk attribute of the user to be tested is determined according to the similarity between the user to be tested and the sample user and the risk attribute of the sample user. In the credit risk determination method provided by the embodiment of the application, because the similarity between the user to be tested and the sample user is introduced when credit risk prediction is performed on the user to be tested, the reason why the user to be tested is assigned the current risk attribute can be explained according to that similarity, so that the credit risk prediction has interpretability and the interpretability requirement of the credit risk control scenario can be met.

In some embodiments of the present application, the processor 60b is further configured to: acquiring production data of an object to be detected and production data of a sample object with known quality attributes; respectively extracting characteristics of the production data of the object to be detected and the production data of the sample object to determine the production characteristics of the object to be detected and the production characteristics of the sample object; calculating the similarity between the object to be detected and the sample object according to the production characteristics of the object to be detected and the production characteristics of the sample object; and determining the quality attribute of the object to be detected according to the similarity between the object to be detected and the sample object and the quality attribute of the sample object.

The computer device provided by this embodiment may further perform feature extraction on the production data of the object to be detected and the production data of the sample object, respectively, to determine the production features of the object to be detected and the production features of the sample object; calculate the similarity between the object to be detected and the sample object according to these production features; and then determine the quality attribute of the object to be detected according to the similarity between the object to be detected and the sample object and the quality attribute of the sample object. In the data processing method provided by the embodiment of the application, because the similarity between the object to be detected and the sample object is introduced when the quality attribute of the object to be detected is predicted, the reason why the object to be detected is assigned its current quality attribute can be explained according to that similarity, so that the quality prediction result of the object to be detected has interpretability.

In some optional embodiments, as shown in fig. 6, the computer device further comprises: communication component 60c, power component 60d, etc. In some embodiments, the computer device may be implemented as a computer, a mobile phone, or other terminal device. Accordingly, the computer device may further include: a display component 60e, an audio component 60f, etc. Only some of the components shown in fig. 6 are schematically shown, and it is not meant that the computer device must include all of the components shown in fig. 6, nor that the computer device only includes the components shown in fig. 6.

In embodiments of the present application, the memory is used to store computer programs and may be configured to store other various data to support operations on the device on which it is located. Wherein the processor may execute a computer program stored in the memory to implement the corresponding control logic. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

In the embodiments of the present application, the processor may be any hardware processing device that can execute the above-described method logic. Optionally, the processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Microcontroller Unit (MCU); a programmable device such as a Field-Programmable Gate Array (FPGA), a Programmable Array Logic device (PAL), a Generic Array Logic device (GAL), or a Complex Programmable Logic Device (CPLD); or an Advanced RISC Machine (ARM) processor or a System on Chip (SoC), but is not limited thereto.

In embodiments of the present application, the communication component is configured to facilitate wired or wireless communication between the device in which it is located and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G, 5G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component may also be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.

In the embodiment of the present application, the display assembly may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display assembly includes a touch panel, the display assembly may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.

In embodiments of the present application, a power supply component is configured to provide power to various components of the device in which it is located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.

In embodiments of the present application, the audio component may be configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio component further includes a speaker for outputting audio signals. For example, for devices with voice interaction functionality, voice interaction with a user may be enabled through the audio component, and so forth.

It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A credit risk determination method, comprising:

determining the risk attribute of the user to be tested according to the similarity between the user to be tested and the first sample user and the risk attribute of the first sample user;

wherein performing feature extraction on the credit application data of the user to be tested and on the credit application data of the first sample user, respectively, to determine the credit features of the user to be tested and the credit features of the first sample user comprises:

inputting the credit application data of the user to be tested and the credit application data of the first sample user into a twin network model;

in the twin network model, feature extraction is respectively carried out on the credit application data of the user to be tested and the credit application data of the first sample user so as to determine the credit features of the user to be tested and the credit features of the first sample user.
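
The defining property of a twin (Siamese) network is that both inputs pass through one encoder with shared weights. The following is a minimal sketch of that idea only, not the architecture of the application: the single `tanh` layer, the function names `make_encoder` and `twin_forward`, the dimensions, and the numeric input vectors are all illustrative assumptions.

```python
import math
import random

def make_encoder(input_dim, feature_dim, seed=0):
    """Build one small dense layer; the SAME weights serve both branches."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(input_dim)]
               for _ in range(feature_dim)]
    bias = [0.0] * feature_dim

    def encode(x):
        # tanh(W x + b): maps numeric application data to a feature vector
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(weights, bias)]

    return encode

def twin_forward(encode, data_under_test, data_sample):
    """Run both inputs through the shared encoder, one branch each."""
    return encode(data_under_test), encode(data_sample)

# Hypothetical, already-numeric credit application data (assumption).
encode = make_encoder(input_dim=4, feature_dim=3)
applicant = [0.2, 0.5, 0.1, 0.9]
sample = [0.3, 0.4, 0.0, 0.8]
features_test, features_sample = twin_forward(encode, applicant, sample)
```

Because the weights are shared, identical inputs always yield identical features, which is what makes the distance between the two feature vectors meaningful as a similarity measure.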

2. The method according to claim 1, wherein the determining the risk attribute of the user to be tested according to the similarity between the user to be tested and the first sample user and the risk attribute of the first sample user comprises:

normalizing the similarity between the user to be tested and the first sample user to obtain the weight of the risk attribute of the first sample user;

weighting and summing the risk attributes of the first sample user by using the weight of the risk attributes of the first sample user to obtain a risk value of the user to be tested;

and determining the risk attribute of the user to be tested according to the risk value of the user to be tested.

3. The method according to claim 2, wherein the determining the risk attribute of the user to be tested according to the risk value of the user to be tested comprises:

if the risk value of the user to be tested is greater than or equal to a set risk threshold, determining that the risk attribute of the user to be tested is having overdue risk.
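
The normalization, weighted summation, and threshold comparison of claims 2 and 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: risk attributes are encoded as 1 (overdue risk) and 0 (no overdue risk), normalization divides each similarity by the sum of all similarities, the threshold 0.5 is arbitrary, and the "no overdue risk" branch below the threshold is an assumption, since the claims state only the case at or above the threshold.

```python
def risk_value(similarities, sample_risks):
    """Normalize similarities into weights and take the weighted sum of risks."""
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must have a positive sum")
    weights = [s / total for s in similarities]  # weights sum to 1
    value = sum(w * r for w, r in zip(weights, sample_risks))
    return value, weights

def risk_attribute(value, threshold=0.5):
    """Compare the risk value with a set threshold (0.5 is an assumption)."""
    return "overdue risk" if value >= threshold else "no overdue risk"

# Hypothetical similarities to four sample users and their known risks.
similarities = [0.9, 0.6, 0.3, 0.2]
sample_risks = [1, 1, 0, 0]  # 1 = overdue risk, 0 = no overdue risk (assumed encoding)
value, weights = risk_value(similarities, sample_risks)
label = risk_attribute(value)
```

The weights also serve as an explanation: each one states how much a given sample user contributed to the final risk value.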

4. The method of claim 1, further comprising:

selecting, from the first sample user according to the similarity between the user to be tested and the first sample user, a target sample user whose similarity with the user to be tested meets a set similarity requirement;

and analyzing the influence factors of the risk attribute of the user to be tested according to the credit application data of the target sample user.
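
A minimal sketch of the selection step in claim 4 follows. The claim does not specify the "set similarity requirement"; here it is assumed to be a minimum similarity of 0.8, and the function name `select_target_samples` and the user identifiers are hypothetical.

```python
def select_target_samples(similarities, min_similarity=0.8):
    """Return (user_id, similarity) pairs meeting the requirement, most similar first."""
    hits = [(uid, s) for uid, s in similarities.items() if s >= min_similarity]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical similarities between the user to be tested and sample users.
similarities = {"sample_a": 0.92, "sample_b": 0.41, "sample_c": 0.85}
targets = select_target_samples(similarities)
```

The credit application data of the returned users can then be inspected to see which fields they share with the user to be tested.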

5. The method according to claim 1, wherein the calculating the similarity between the user to be tested and the first sample user according to the credit characteristics of the user to be tested and the credit characteristics of the first sample user comprises:

inputting the credit characteristics of the user to be tested and the credit characteristics of the first sample user into a similarity calculation model;

calculating the distance between the credit features of the user to be tested and the credit features of the first sample user in the similarity calculation model;

determining the distance between the credit features of the user to be tested and the credit features of the first sample user as the similarity between the user to be tested and the first sample user;

wherein the smaller the distance between the credit features of the user to be tested and the credit features of the first sample user, the greater the similarity between the user to be tested and the first sample user.
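
The inverse relation between distance and similarity in claim 5 can be illustrated as follows. The claim does not name a distance metric; Euclidean distance is assumed here, and the mapping `1 / (1 + d)` is only one hypothetical way to turn a distance into a score that grows as the distance shrinks.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two credit feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_from_distance(distance):
    """Monotonically decreasing mapping: distance 0 gives similarity 1 (assumption)."""
    return 1.0 / (1.0 + distance)

near = similarity_from_distance(euclidean_distance([0.1, 0.2], [0.1, 0.3]))
far = similarity_from_distance(euclidean_distance([0.1, 0.2], [0.9, 0.9]))
```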

6. The method of claim 5, further comprising:

obtaining credit application data of a second sample user with known risk attributes; the risk attributes of the second sample user include: having overdue risk and having no overdue risk;

randomly determining a plurality of sample pairs using the second sample user; each sample pair comprises two second sample users; any two different sample pairs differ in at least one sample user;

calculating a sample label of each sample pair according to the risk attribute of the second sample user contained in the sample pair;

training a twin network model and a similarity calculation model by using the plurality of sample pairs, with minimizing a loss function as the training target;

wherein the loss function is determined according to a difference between the similarity probability output by the similarity calculation model and the sample label.
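
The pair construction and loss of claim 6 can be sketched as follows. Several points are assumptions, not taken from the claim: a pair is labeled 1 when both users share the same risk attribute and 0 otherwise, the loss is binary cross-entropy between the predicted similarity probability and that label, and the names `make_pairs`, `pair_label`, and `bce_loss` are hypothetical.

```python
import math
import random

def make_pairs(risks, num_pairs, seed=0):
    """Randomly draw distinct unordered pairs of sample users."""
    rng = random.Random(seed)
    ids = sorted(risks)
    max_pairs = len(ids) * (len(ids) - 1) // 2
    target = min(num_pairs, max_pairs)
    pairs = set()
    while len(pairs) < target:
        a, b = rng.sample(ids, 2)         # two different users
        pairs.add(tuple(sorted((a, b))))  # unordered, so no duplicate pairs
    return sorted(pairs)

def pair_label(risks, pair):
    """1 if both users have the same risk attribute, else 0 (assumed rule)."""
    a, b = pair
    return 1 if risks[a] == risks[b] else 0

def bce_loss(prob, label, eps=1e-12):
    """Binary cross-entropy between a similarity probability and a pair label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# Hypothetical second sample users: 1 = overdue risk, 0 = no overdue risk.
risks = {"u1": 1, "u2": 0, "u3": 1, "u4": 0}
pairs = make_pairs(risks, num_pairs=4)
labels = [pair_label(risks, p) for p in pairs]
```

Training would then adjust the shared encoder and the similarity model so that the average of `bce_loss` over all pairs decreases.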

7. The method according to claim 6, wherein the obtaining credit application data of a second sample user with known risk attributes comprises:

obtaining credit application data of a third sample user with known risk attributes; the third sample user includes: a second sample user;

selecting, with replacement, [formula missing] positive sample users whose risk attribute is having overdue risk from the third sample user;

selecting, with replacement, [formula missing] negative sample users whose risk attribute is having no overdue risk from the third sample user;

determining the credit application data corresponding to the selected positive sample users and negative sample users as the credit application data of the second sample user;

wherein N is a positive integer, and [symbol missing] is a hyper-parameter.
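
Sampling with replacement from the positive and negative sample users, as described in claim 7, can be sketched as follows. The expressions giving the two sample counts are not available in this text, so the counts are passed in as plain parameters `n_pos` and `n_neg`; these parameters, the function name, and the 1/0 risk encoding are assumptions made for illustration.

```python
import random

def bootstrap_second_samples(third_samples, n_pos, n_neg, seed=0):
    """Draw positive and negative sample users with replacement."""
    rng = random.Random(seed)
    positives = [u for u, r in third_samples.items() if r == 1]  # overdue risk
    negatives = [u for u, r in third_samples.items() if r == 0]  # no overdue risk
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample user")
    # random.choices samples with replacement, so a user may appear repeatedly
    return rng.choices(positives, k=n_pos), rng.choices(negatives, k=n_neg)

# Hypothetical third sample users with known risk attributes.
third_samples = {"u1": 1, "u2": 0, "u3": 0, "u4": 0, "u5": 1}
pos_users, neg_users = bootstrap_second_samples(third_samples, n_pos=3, n_neg=3)
```

Drawing with replacement lets the two classes be sized independently of how many positive or negative users the original data contains.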

8. A data processing method, comprising:

determining the quality attribute of the object to be detected according to the similarity between the object to be detected and the sample object and the quality attribute of the sample object;

wherein performing feature extraction on the production data of the object to be detected and on the production data of the sample object, respectively, to determine the production features of the object to be detected and the production features of the sample object comprises:

inputting the production data of the object to be detected and the production data of the sample object into a twin network model;

in the twin network model, feature extraction is respectively carried out on the production data of the object to be detected and the production data of the sample object so as to determine the production features of the object to be detected and the sample object.

9. A computer device, comprising: a memory and a processor; wherein the memory is used for storing a computer program;

the processor is coupled to the memory for executing the computer program for performing the steps of the method of any of claims 1-8.

10. A computer-readable storage medium having stored thereon computer instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of any one of claims 1-8.