CN116205726B

CN116205726B - Loan risk prediction method and device, electronic equipment and storage medium

Info

Publication number: CN116205726B
Application number: CN202310474449.0A
Authority: CN
Inventors: 甘元笛; 刘洪江; 任晓东; 陈昱任; 吕文勇; 周智杰
Original assignee: Chengdu New Hope Finance Information Co Ltd
Current assignee: Chengdu New Hope Finance Information Co Ltd
Priority date: 2023-04-28
Filing date: 2023-04-28
Publication date: 2023-08-01
Anticipated expiration: 2043-04-28
Also published as: CN116205726A

Abstract

The application provides a loan risk prediction method, a loan risk prediction device, electronic equipment and a storage medium, wherein the loan risk prediction method comprises the following steps: acquiring user data, wherein the user data comprises unstructured user data and structured user data; inputting user data into a preset risk prediction model to obtain a risk prediction result; the risk prediction model comprises a first sub-model and a second sub-model; the first sub-model is obtained by training unstructured training data; the second sub-model is obtained by obtaining data features through the first sub-model and training the data features and the structured training data. The first sub-model is used for extracting information in unstructured data, and the risk prediction result is obtained by inputting the data characteristics output by the first sub-model into the second sub-model. The method improves the information utilization rate in unstructured data, effectively utilizes the interpretability of logistic regression or integrated decision trees, and improves the risk assessment and prediction capability of a risk prediction model.

Description

Loan risk prediction method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a loan risk prediction method, a loan risk prediction device, electronic equipment and a storage medium.

Background

Loan risk control refers to using algorithms to analyze a borrower's credit data and repayment behavior during the loan repayment process, so as to predict the risk that the borrower may have repayment problems. Because loan businesses take many forms and risks are diverse, risk control is the core foundation of such businesses. At present, most enterprises perform risk control through a risk policy model or manually, analyzing and sorting the information filled in by a user when borrowing so as to predict loan risk. This approach does not fully exploit the characteristics and information of the data, resulting in lower prediction accuracy.

Disclosure of Invention

The embodiment of the invention aims at providing a loan risk prediction method, a loan risk prediction device, electronic equipment and a storage medium, which utilize flexibly selected data with different structures to train a risk prediction model, expand the dimension of characteristics and improve the prediction capability of the risk prediction model.

In a first aspect, an embodiment of the present application provides a loan risk prediction method, including: acquiring user data, wherein the user data comprises unstructured user data and structured user data; inputting user data into a preset risk prediction model to obtain a risk prediction result; the risk prediction model comprises a first sub-model and a second sub-model; the first sub-model is obtained by training unstructured training data; the second sub-model is obtained by obtaining data features through the first sub-model and training the data features and the structured training data.

In the implementation process, the risk prediction model comprises a first sub-model and a second sub-model, wherein the first sub-model is used for extracting information in unstructured data, and data features output through the first sub-model are input into the second sub-model to obtain a risk prediction result. The method improves the information utilization rate in unstructured data, effectively utilizes the interpretability of logistic regression or integrated decision trees, and improves the risk assessment and prediction capability of a risk prediction model.

Optionally, in an embodiment of the present application, the unstructured training data includes first unstructured training data and second unstructured training data; before inputting the user data into the preset risk prediction model to obtain the risk prediction result, the method further comprises the following steps: training a preset neural network through first unstructured training data to obtain a first sub-model; inputting the second unstructured training data into the first sub-model to obtain data characteristics; and training the preset meta model through the data characteristics and the structured training data to obtain a second sub model.

In the implementation process, the risk prediction model comprises a first sub-model and a second sub-model. The unstructured training data is used to train the first sub-model, and the structured training data is used to train the second sub-model. Data with different structures can be flexibly selected to train the risk prediction model according to risk control business requirements, which improves the prediction capability of the risk prediction model and enables accurate risk prediction under relatively complex conditions.

Optionally, in this embodiment of the present application, training, through the first unstructured training data, a preset neural network to obtain a first sub-model includes: obtaining a vector sequence based on the first unstructured training data; adding corresponding labels to the vector sequence; and training the neural network through the vector sequence after adding the label to obtain a first sub-model.

In the implementation process, training is performed on a preset neural network through the first unstructured training data to obtain a first sub-model. The neural network is used for fully utilizing the high-dimensional data, and the dimension of the features is expanded on the basis of the original structural features. The data range of the risk prediction model can be greatly enlarged, and the accuracy of risk prediction is improved.

Optionally, in an embodiment of the present application, the first unstructured training data includes event sequence data; the vector sequence comprises a sequence of event vectors; based on the first unstructured training data, a vector sequence is obtained, comprising: acquiring attribute information of event sequence data; acquiring derivative attribute information of the event sequence data based on attribute information corresponding to the event sequence data and attribute information corresponding to the previous event sequence data; splicing the attribute information of the event sequence data and the derived attribute information to generate a feature vector of the event sequence data; and splicing the feature vectors of each event sequence data according to the time sequence to obtain an event vector sequence.

In the implementation process, the event sequence data is analyzed by collecting and integrating various sources and different types of data, information hidden in the data is found, multidimensional evaluation of clients is realized, and the accuracy of the model is improved.

Optionally, in an embodiment of the present application, the vector sequence includes a behavior vector sequence; based on the first unstructured training data, a vector sequence is obtained, comprising: obtaining behavior time information corresponding to the behavior sequence data; and splicing the behavior sequence data according to the behavior time information to obtain a behavior vector sequence.

In the implementation process, the data of the high-dimensional complex structure such as behavior sequence data are deeply utilized, and the applicability and accuracy of the model are improved.

Optionally, in an embodiment of the present application, training the preset meta-model through the data features and the structured training data to obtain the second sub-model includes: generating structured training data features based on the structured training data through a preset feature generation rule; adding the data features and the structured training data features to a feature pool; screening the data features and the structured training data features in the feature pool to obtain modeling features; and training the meta-model through the modeling features to obtain the second sub-model.

In the implementation process, the first sub-model and the second sub-model are trained step by step, and the risk prediction model is a stacked model fused with the first sub-model and the second sub-model; meanwhile, the second sub-model integrates the feature with the interpretability based on rule derivation, so that the model maintains a certain degree of interpretability, and the accuracy of predicting risks by the risk prediction model is improved.

Optionally, in the embodiment of the present application, inputting the user data into a preset risk prediction model to obtain a risk prediction result includes: inputting unstructured user data into a first sub-model to obtain unstructured user data characteristics; generating structured user data features based on the structured user data; splicing the unstructured user data features and the structured user data features to generate splicing features; and inputting the spliced characteristic into a second sub-model to obtain a risk prediction result.

In the implementation process, splicing the unstructured user data features and the structured user data features to generate splicing features; and inputting the spliced characteristic into a second sub-model to obtain a risk prediction result. The unstructured user data features are fully utilized, and the dimension of the features is expanded on the basis of the original structured features. The comprehensive risk of credit and fraud is precisely controlled.

In a second aspect, an embodiment of the present application further provides a loan risk prediction apparatus, including: an acquisition module, used for acquiring user data, wherein the user data comprises unstructured user data and structured user data; and a prediction module, used for inputting the user data into a preset risk prediction model to obtain a risk prediction result; the risk prediction model comprises a first sub-model and a second sub-model; the first sub-model is obtained by training unstructured training data; the second sub-model is obtained by obtaining data features through the first sub-model and training the data features and the structured training data.

In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor and a memory storing machine-readable instructions executable by the processor to perform the method as described above when executed by the processor.

In a fourth aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method described above.

By adopting the loan risk prediction method, the loan risk prediction device, the electronic equipment and the storage medium, the risk prediction model comprises the first sub-model and the second sub-model, the first sub-model is used for information extraction in unstructured data, and the risk prediction result is obtained by inputting the data characteristics output by the first sub-model into the second sub-model. The information utilization rate in unstructured data is improved while the interpretability of logistic regression or integrated decision trees is effectively utilized, the risk assessment and prediction capacity of the risk prediction model is improved, and accurate risk prediction is achieved under relatively complex conditions.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flow chart of a loan risk prediction method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a risk prediction model according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a loan risk prediction apparatus according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Embodiments of the technical solutions of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical solutions of the present application, and thus are only examples, and are not intended to limit the scope of protection of the present application.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

In the description of the embodiments of the present application, the technical terms "first," "second," etc. are used merely to distinguish between different objects and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated, a particular order or a primary or secondary relationship. In the description of the embodiments of the present application, the meaning of "plurality" is two or more unless explicitly defined otherwise.

Existing risk prediction models for assessing user loan risk use algorithms that are based on logistic regression and integrated decision trees. For the establishment of the risk prediction model, the original data is subjected to feature derivation based on feature generation rules to generate data features corresponding to the original data, and then the model is established by using the data features. This approach makes it difficult to fully exploit high-dimensional unstructured data such as images, text, sequences, etc. Based on the characteristics generated by the characteristic generation rules, only a small amount of information in unstructured data can be extracted, and the information in high-dimensional data is difficult to cover in an all-around manner, so that the prediction accuracy of a risk prediction model mainly comprising logistic regression and an integrated decision tree is low.

The deep learning model based on the deep neural network is more suitable for extracting information from high-dimensional data, namely unstructured data. However, if the deep learning model is adopted to replace the logistic regression or the integrated decision tree in the existing risk prediction model, the neural network has a significantly lower interpretability than the logistic regression and the decision tree, and therefore, the prediction accuracy of the risk prediction model is also lower.

In the prior art, although neural networks are used in anti-fraud models and high-dimensional data such as images are utilized, the application scenarios are narrow and scattered, and are limited to identifying fraud scenarios such as whether a loan is handled on behalf of another person, whether the operation is performed by someone other than the applicant, or whether other illegal operations exist. The neural network in the prior-art anti-fraud model cannot be effectively combined with the logistic regression or integrated decision tree in the credit model to predict the repayment capability, overdue risk and the like of the user.

In order to achieve a comprehensive assessment of clients, to effectively utilize the interpretability of logistic regression or integrated decision trees while improving the information utilization rate in unstructured data, and to improve the risk assessment and prediction capacity of the risk prediction model, the embodiments of the present application provide the loan risk prediction method described below.

Please refer to Fig. 1, which illustrates a flowchart of a loan risk prediction method provided in an embodiment of the present application. The loan risk prediction method provided by the embodiment of the application can be applied to electronic equipment, and the electronic equipment can comprise a terminal and a server; the terminal can be a smart phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA) and the like; the server may be an application server or a Web server. The loan risk prediction method may include the following steps:

Step S110: acquiring user data, the user data including unstructured user data and structured user data.

Step S120: inputting user data into a preset risk prediction model to obtain a risk prediction result; the risk prediction model comprises a first sub-model and a second sub-model; the first sub-model is obtained by training unstructured training data; the second sub-model is obtained by obtaining data features through the first sub-model and training the data features and the structured training data.

In step S110, when handling a loan service, the user typically operates through credit product client software on a terminal device, for example filling in personal information and performing identity verification in a loan APP (Application) installed on a mobile phone, a tablet computer or a computer. The user data can therefore be collected by the terminal device after the user's authorization is obtained.

Unstructured user data may be high-dimensional data that is not suitable for being expressed and implemented by a two-dimensional database table. By way of example, unstructured user data may include image and video data, text data, sequence data, signal data, and the like. In loan risk prediction scenarios, image and video data include, for example, live face verification videos and identity card photographs; text data include, for example, application information filled in by the user; sequence data include, for example, page operation data and behavior operation data obtained through the terminal device; and signal data include, for example, sound signals and sensor signals. The unstructured user data collected by embodiments of the present application may be one or more of the data listed above, and may also be other unstructured data.

The structured user data may be data logically expressed and implemented by a two-dimensional table structure, which may be stored and managed by a relational database. Illustratively, the structured user data includes user personal information, device information corresponding to a terminal device that collects the user data, and the like.

In step S120, the risk prediction model includes a first sub-model and a second sub-model, and the risk prediction model may be understood as a stacked model composed of the first sub-model and the second sub-model.

The first sub-model may be a deep learning model based on a deep neural network; the first sub-model is used for extracting data characteristic information of unstructured user data. Each type of unstructured user data may have a corresponding type of first sub-model. For example, if the unstructured user data is image and video data, the first sub-model may be a computer vision model built using a CNN; if the unstructured user data is text data, the first sub-model may be a Natural Language Processing (NLP) model; if the unstructured user data is sequence data, the first sub-model may be a sequence processing model built using an LSTM; if the unstructured user data is signal data, the first sub-model may be a signal processing model built using a Transformer.

The second sub-model comprises a logistic regression model or an integrated (ensemble) decision tree, for example GBDT (Gradient Boosting Decision Tree) or a generalized linear regression model. The second sub-model performs risk prediction based on the data features of the unstructured user data extracted by the first sub-model and the structured features generated by the feature derivation rules, and obtains a risk prediction result.

The process of obtaining the risk prediction result by using the risk prediction model may be: inputting the unstructured user data in the user data into the first sub-model corresponding to that unstructured user data to obtain data features; generating corresponding structured features from the structured user data based on the feature derivation rules; and inputting the data features and the structured features into the second sub-model to obtain a risk prediction result. The risk prediction result may be a user credit score, a fraud prediction, an income prediction, a repayment overdue risk probability, or the like.
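The inference flow described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: `mean_pool_encoder` stands in for a trained neural network encoder, `toy_logistic_meta_model` stands in for a trained GBDT or logistic regression, and the field names (`age`, `monthly_income`, `monthly_debt`), the derivation rule and the weights are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins: in practice the encoder would be a trained neural
# network and the meta-model a trained GBDT or logistic regression.
Encoder = Callable[[Sequence[Sequence[float]]], List[float]]
MetaModel = Callable[[List[float]], float]


def mean_pool_encoder(sequence: Sequence[Sequence[float]]) -> List[float]:
    """Toy first sub-model: average each dimension over the sequence."""
    if not sequence:
        return []
    dim = len(sequence[0])
    return [sum(step[i] for step in sequence) / len(sequence) for i in range(dim)]


def derive_structured_features(record: Dict[str, float]) -> List[float]:
    """Toy feature derivation rule: keep age, add a debt-to-income ratio."""
    income = record["monthly_income"]
    debt = record["monthly_debt"]
    return [record["age"], debt / income if income > 0 else 0.0]


def toy_logistic_meta_model(features: List[float]) -> float:
    """Toy second sub-model: logistic regression with made-up weights."""
    weights = [0.5, -0.2, 0.01, 1.5]
    bias = -1.0
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(unstructured, structured, encoder: Encoder, meta_model: MetaModel) -> float:
    data_features = encoder(unstructured)                      # first sub-model
    structured_features = derive_structured_features(structured)
    spliced = data_features + structured_features              # splice the features
    return meta_model(spliced)                                 # second sub-model


score = predict_risk(
    unstructured=[[1.0, 2.0], [3.0, 4.0]],
    structured={"age": 30.0, "monthly_income": 10000.0, "monthly_debt": 2500.0},
    encoder=mean_pool_encoder,
    meta_model=toy_logistic_meta_model,
)
```

Here `score` plays the role of a risk probability; a credit score or another prediction target would only change the meta-model's output layer and labels.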

In the implementation process, the risk prediction model comprises a first sub-model and a second sub-model, wherein the first sub-model is used for extracting characteristic information in unstructured data, and a risk prediction result is obtained by inputting data characteristics output by the first sub-model into the second sub-model. The method improves the information utilization rate in unstructured data, effectively utilizes the interpretability of the structured data, and improves the risk assessment and prediction capability of a risk prediction model.

In the specific implementation process: the data characteristics output by the neural network encoder in the first sub-model enter a logistic regression model or an integrated decision tree in the second sub-model to form a stacking model, namely the risk prediction model. However, since the training mode of the neural network is completely different from that of the logistic regression model or the integrated decision tree, the risk prediction model formed by stacking the first sub-model and the second sub-model is difficult to train directly. Taking GBDT as an example, both the neural network and the GBDT are trained iteratively, but all parameters of the neural network change in each iteration, whereas the GBDT adds a new set of parameters in each iteration and leaves the previous parameters unchanged. The neural network and the GBDT are therefore difficult to iterate simultaneously, and direct training is not possible.
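The mismatch between the two training modes can be written out explicitly. The following assumes plain gradient descent with learning rate η for the neural network and standard gradient boosting with shrinkage ν for the GBDT; the symbols are introduced here for illustration and do not appear in the application.

```latex
% Neural network: each iteration updates the entire parameter vector \theta
\theta^{(t+1)} = \theta^{(t)} - \eta \, \nabla_{\theta} L\bigl(\theta^{(t)}\bigr)

% GBDT: each iteration adds a new tree h_m; the earlier trees in F_{m-1} stay fixed
F_m(x) = F_{m-1}(x) + \nu \, h_m(x)
```

If the encoder's parameters changed during boosting, the inputs seen by the already-fitted trees in F_{m-1} would shift underneath them, which is why the two parts are trained in separate stages.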

The embodiment of the application trains the first sub-model and the second sub-model step by step to obtain the risk prediction model. For the training process of the first sub-model, first unstructured training data is collected according to the actual risk prediction scenario. For example, if the risk prediction scenario is to predict a customer's risk of overdue repayment, the collected first unstructured training data may be various data collected on the customer's terminal device, such as touch behavior on a handheld smart device; it may also be data on various events predefined over the whole credit period, such as registration and live face verification. A preset neural network is then trained with the collected first unstructured training data to obtain the first sub-model.

For the training process of the second sub-model, the pre-collected second unstructured training data is input into the trained first sub-model, and the first sub-model performs inference on the second unstructured training data to obtain the data features corresponding to the second unstructured training data.

The second unstructured training data is a new training set relative to the first unstructured training data, and the data type, the acquisition mode and the like of the second unstructured training data can be the same as those of the first unstructured training data. The first sub-model and the second sub-model are respectively trained through different training sets, so that the problem of model overfitting caused by the fact that the same training set is adopted in the training process of the first sub-model and the second sub-model is solved.

After the data features corresponding to the second unstructured training data are obtained, training is carried out on the preset meta-model through the data features and the structured training data and combining with the preset labels, and a second sub-model is obtained. The GBDT or generalized linear regression can be selected as an algorithm of the meta-model to train the meta-model.
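The step-by-step procedure can be sketched with standard-library Python only. Everything below is an illustrative assumption: the data are synthetic, a small logistic regression on pooled sequence statistics stands in for the neural network of the first sub-model, and a second logistic regression stands in for the GBDT or generalized linear meta-model. The point of the sketch is the data flow, in particular that the two stages use different training sets.

```python
import math
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   epochs: int = 300, lr: float = 0.5) -> Model:
    """Batch gradient descent for logistic regression."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def pool(seq: Sequence[float]) -> List[float]:
    """Collapse a sequence into fixed-size inputs (mean and max)."""
    return [sum(seq) / len(seq), max(seq)]


def make_split(n: int, rng: random.Random):
    """Synthetic data: label-dependent sequences plus one structured feature."""
    seqs, structured, labels = [], [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        seqs.append([rng.gauss(1.0 if y else -1.0, 1.0) for _ in range(5)])
        structured.append([rng.gauss(0.5 if y else -0.5, 1.0)])
        labels.append(y)
    return seqs, structured, labels


rng = random.Random(0)
seq_a, _, y_a = make_split(200, rng)         # first unstructured training data
seq_b, struct_b, y_b = make_split(200, rng)  # second unstructured + structured training data

# Step 1: train the first sub-model on the first training set only.
first_sub_model = train_logistic([pool(s) for s in seq_a], y_a)

# Step 2: run inference on the second training set to obtain data features.
data_features = [[predict(first_sub_model, pool(s))] for s in seq_b]

# Step 3: train the meta-model on data features spliced with structured features.
meta_inputs = [d + s for d, s in zip(data_features, struct_b)]
second_sub_model = train_logistic(meta_inputs, y_b)

# Training-set accuracy of the stacked model on the second split.
accuracy = sum(
    (predict(second_sub_model, x) >= 0.5) == bool(t) for x, t in zip(meta_inputs, y_b)
) / len(y_b)
```

Because the first sub-model never sees `seq_b` during its own training, the data features fed to the meta-model are out-of-sample predictions, which is the overfitting safeguard the text describes.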

The training labels of the metamodel may be the same as or different from the training labels of the neural network in the first submodel. As an embodiment, the tag of the first sub-model may be set as related content of fraud, and the tag of the meta-model may be set as related content of loan repayment, debt default. And the first sub-model is used for extracting the information related to fraud in the data, and then the information is input into the meta-model for training with loan repayment, debt default and the like as labels, so that the second sub-model can establish the connection between fraud features and debt default, and the integration of credit evaluation and anti-fraud is realized.

In the specific implementation process: the process of training the first sub-model may specifically be: based on the first unstructured training data, a vector sequence corresponding to the first unstructured training data is obtained. Wherein the first unstructured training data comprises event sequence data and behavior sequence data; correspondingly, the vector sequence corresponding to the event sequence data is an event vector sequence, and the vector sequence corresponding to the behavior sequence data is a behavior vector sequence.

Data preprocessing is performed on the acquired first unstructured training data to obtain a vector sequence. Corresponding labels are added to the vector sequence according to the information in the vector sequence, where the labels include overdue repayment, normal repayment, loan compensation or fraud substitution, and the like. The neural network is then trained with the labeled vector sequence to obtain the first sub-model.

The event sequence data and the behavior sequence data can respectively train the corresponding neural networks independently, but use the same label; the corresponding neural networks can be trained independently and respectively, and different labels are used; the neural network corresponding to the event sequence data and the behavior sequence data can be integrated together, and the same label is used for training the multi-mode data.

In the implementation process, training is performed on a preset neural network through the first unstructured training data to obtain a first sub-model. The high-dimensional data is fully utilized through the neural network, and the dimension of the features is expanded on the basis of the original structured features. The data range of the risk prediction model can be greatly enlarged, and the accuracy of risk prediction is improved.

In the specific implementation process: the event sequence data is the information of each event recorded throughout the operation flow when the user operates through credit product client software on the terminal device. The event sequence data comprises information corresponding to page operation events such as registration, living body authentication, application, withdrawal and/or repayment. The attribute information of the event sequence data comprises event time information and/or event space information; the event time information is the time when the event occurs, and the event space information is the place where the event occurs, such as GPS positioning data.

Derivative attribute information of the event sequence data is obtained based on the attribute information corresponding to the event sequence data and the attribute information corresponding to the previous event sequence data. With the events arranged in chronological order, the previous event sequence data is the event that precedes the current event. Based on the event time information and/or event space information in the attribute information, the time displacement and/or space displacement between the event sequence data and the previous event sequence data is calculated and used as the derivative attribute information of the event sequence data.

The attribute information and the derivative attribute information of the event sequence data are spliced to generate a feature vector of the current event sequence data, and the feature vectors of all event sequence data are spliced in chronological order to obtain an event vector sequence.
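A minimal sketch of this construction is given below. The dictionary keys (`timestamp`, `lat`, `lon`), the use of the haversine great-circle distance for the space displacement, and the choice of zero displacement for the first event are assumptions made for illustration; the application only specifies that time and/or space displacement relative to the previous event is derived and spliced.

```python
import math
from typing import Dict, List


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def build_event_vector_sequence(events: List[Dict[str, float]]) -> List[List[float]]:
    """Return one feature vector per event, ordered by event time."""
    ordered = sorted(events, key=lambda e: e["timestamp"])
    sequence: List[List[float]] = []
    previous = None
    for event in ordered:
        # Attribute information: event time and event location.
        attributes = [event["timestamp"], event["lat"], event["lon"]]
        if previous is None:
            # Assumption: the first event has no predecessor, so use zeros.
            derived = [0.0, 0.0]
        else:
            # Derivative attribute information: time and space displacement.
            derived = [
                event["timestamp"] - previous["timestamp"],
                haversine_km(previous["lat"], previous["lon"], event["lat"], event["lon"]),
            ]
        sequence.append(attributes + derived)  # splice attributes and derived values
        previous = event
    return sequence


events = [
    {"timestamp": 160.0, "lat": 30.0, "lon": 104.0},  # e.g. living body authentication
    {"timestamp": 100.0, "lat": 30.0, "lon": 104.0},  # e.g. registration
    {"timestamp": 400.0, "lat": 31.0, "lon": 104.0},  # e.g. application
]
event_vectors = build_event_vector_sequence(events)
```

Each row of `event_vectors` has five entries (time, latitude, longitude, time displacement, space displacement), and the rows follow chronological order regardless of the input order.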

In an alternative embodiment, the attribute information of the event sequence data may further include an event type, and the change in event type between the event sequence data and the previous event sequence data may be used as derivative attribute information of the event sequence data.
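A type-based derived attribute of this kind could be computed as below. The event type names and the 0/1 encoding of the change are illustrative assumptions; other encodings, such as an identifier for each (previous type, current type) pair, would serve equally well.

```python
from typing import List


def type_change_flags(event_types: List[str]) -> List[int]:
    """Return 1 where an event's type differs from the previous event's type."""
    if not event_types:
        return []
    flags = [0]  # assumption: the first event has no predecessor, so no change
    for previous, current in zip(event_types, event_types[1:]):
        flags.append(int(previous != current))
    return flags


flags = type_change_flags(["registration", "authentication", "authentication", "withdrawal"])
```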

In the implementation process, the event sequence data is analyzed by collecting and integrating various sources and different types of data, the information hidden in the data is deeply mined, the multidimensional evaluation of clients is realized, and the accuracy of the model is improved.

In the specific implementation process: the behavior sequence data comprises information on touch behavior on the terminal device and on repayment behavior, such as whether the repayment in each period was overdue, normal or compensated, the repayment mode, and the like. The behavior time information corresponding to the behavior sequence data is obtained; specifically, the repayment time of each period may be obtained. The behavior sequence data is then spliced according to the behavior time information to obtain a behavior vector sequence.
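For repayment behavior, the ordering and splicing step might look like the following sketch. The record fields (`repaid_at`, `status`, `days_late`) and the one-hot encoding of the repayment status are hypothetical; the application does not fix a particular encoding.

```python
from typing import Dict, List, Union

# Hypothetical status vocabulary taken from the statuses named in the text.
STATUS_CODES = {"normal": 0, "overdue": 1, "compensation": 2}

Record = Dict[str, Union[str, float]]


def one_hot(index: int, size: int) -> List[float]:
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_behavior_vector_sequence(records: List[Record]) -> List[List[float]]:
    """Encode each repayment record and order the vectors by repayment time."""
    ordered = sorted(records, key=lambda r: r["repaid_at"])
    return [
        one_hot(STATUS_CODES[r["status"]], len(STATUS_CODES)) + [float(r["days_late"])]
        for r in ordered
    ]


records = [
    {"repaid_at": 200.0, "status": "overdue", "days_late": 3},
    {"repaid_at": 100.0, "status": "normal", "days_late": 0},
]
behavior_vectors = build_behavior_vector_sequence(records)
```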

In the specific implementation process: feature mapping is performed on the structured training data through a preset feature generation rule to generate the structured training data features. Feature mapping is the mapping of data to a high-dimensional space.

The data features and the structured training data features are added to a feature pool, and the features in the feature pool are analyzed and screened. Feature screening can be performed in the following ways. The first way is a filter method, which filters features based on predefined criteria, such as the correlation of an individual feature with the target variable or the information gain of an individual feature. The second way is a wrapper method, which screens features based on model performance, for example by using a recursive feature elimination algorithm to iteratively eliminate unimportant features. After screening, the retained features are used as the modeling features. Corresponding labels are added to the modeling features using pre-designed labels, GBDT or generalized linear regression is selected as the algorithm of the meta-model, and the meta-model is trained with the modeling features to obtain the second sub-model.
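The first screening approach, filtering by correlation with the target variable, can be illustrated as follows. The feature names, the sample values and the 0.3 threshold are made up for the example, and Pearson correlation is only one of the possible predefined criteria.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_features(feature_pool: Dict[str, List[float]], target: Sequence[float],
                    threshold: float = 0.3) -> List[str]:
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [
        name for name, values in feature_pool.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Feature pool mixing a neural-network data feature with rule-derived features.
feature_pool = {
    "nn_score": [0.1, 0.2, 0.8, 0.9],    # hypothetical output of the first sub-model
    "debt_ratio": [0.2, 0.1, 0.6, 0.7],  # hypothetical rule-derived feature
    "constant": [1.0, 1.0, 1.0, 1.0],
    "noise": [0.5, 0.4, 0.5, 0.4],
}
labels = [0, 0, 1, 1]
modeling_features = filter_features(feature_pool, labels)
```

A wrapper method would instead retrain the meta-model repeatedly and drop the least important feature on each round, which costs more computation but accounts for interactions between features.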

In the above implementation, the risk prediction model is a stacked model that fuses the first sub-model and the second sub-model. The first sub-model extracts unstructured data features, and the second sub-model incorporates interpretable rule-derived features, so that the model retains a degree of interpretability while the accuracy of its risk prediction is improved.

Please refer to fig. 2, which shows a schematic diagram of the risk prediction model provided in an embodiment of the present application.

In a specific implementation, as shown in fig. 2, the unstructured user data includes event sequence data and behavior sequence data, which may specifically be an event sequence and a repayment behavior sequence. The repayment behavior sequence and the event sequence are input into the LSTM neural network encoder in the first sub-model to obtain unstructured user data features, which are added to a feature pool.

The structured user data comprises user personal information and device information. Feature derivation rules are obtained based on business experience and are applied to the structured user data to generate structured user data features, which are added to the feature pool.

The unstructured user data features and the structured user data features in the feature pool can be subjected to feature screening, and the screened unstructured user data features and structured user data features are spliced to generate spliced features. The spliced features are input into the second sub-model, and a risk prediction result is obtained through the GBDT or generalized linear regression algorithm.

This enables the risk prediction model to use both the unstructured features encoded by the neural network and the structured features related to credit evaluation, making full use of the data features while preserving model interpretability.

The label of the first sub-model can be set to fraud-related content, so that the first sub-model extracts fraud-related information from the data. This information is then input into the meta-model, which is trained with labels such as loan repayment or debt default. The trained GBDT can thus establish the connection between fraud features and debt default, integrating credit evaluation with anti-fraud.

In an alternative embodiment, first unstructured training data and second unstructured training data for training the risk prediction model are collected in advance. The LSTM neural network of the first sub-model is trained on the first unstructured training data. The structure of the neural network is divided into an encoder and a prediction head: data is input into the encoder, which outputs a feature vector; the feature vector then enters the prediction head, which outputs a predicted value. The trained first sub-model is thereby obtained.
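As a non-limiting illustration of the encoder and prediction head split, the forward pass of a single-layer LSTM followed by a linear head may be sketched as follows. This is a forward pass only: the weights are random, bias terms are omitted for brevity, and training by backpropagation is not shown. The class name `TinyLSTM` and its sizes are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Single-layer LSTM encoder plus a linear prediction head (untrained sketch)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        width = input_size + hidden_size

        def matrix():
            return [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                    for _ in range(hidden_size)]

        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.w = {gate: matrix() for gate in "ifog"}
        self.head = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.hidden_size = hidden_size

    def encode(self, sequence):
        """Encoder: run the sequence through the LSTM and return the last hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            xh = list(x) + h  # current input spliced with the previous hidden state
            pre = {gate: [sum(w * v for w, v in zip(row, xh)) for row in self.w[gate]]
                   for gate in "ifog"}
            i = [sigmoid(z) for z in pre["i"]]
            f = [sigmoid(z) for z in pre["f"]]
            o = [sigmoid(z) for z in pre["o"]]
            g = [math.tanh(z) for z in pre["g"]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

    def predict(self, sequence):
        """Prediction head: map the feature vector to a value in (0, 1)."""
        h = self.encode(sequence)
        return sigmoid(sum(w * v for w, v in zip(self.head, h)))


model = TinyLSTM(input_size=2, hidden_size=3)
sequence = [[0.0, 1.0], [1.0, 0.0]]
feature_vector = model.encode(sequence)  # what the second sub-model would consume
predicted_value = model.predict(sequence)  # used only while training the first sub-model
```

The design point is that the prediction head is needed only to supply a training signal; once training is done, the encoder output alone serves as the data feature.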

Features are then extracted using the trained neural network encoder. Specifically, the second unstructured training data is input into the neural network encoder of the first sub-model, and inference is performed to obtain feature vectors.

The structured training data is obtained and mapped through a preset feature generation rule to generate structured training data features.

The feature vectors and the structured training data features are added to a feature pool, and the features in the feature pool are screened to obtain screened features.

The screened features are used as input data and combined with the designed labels; GBDT or generalized linear regression is selected as the algorithm of the meta-model; and the meta-model is trained to obtain the second sub-model, thereby completing training of the risk prediction model.
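As a non-limiting illustration of training the meta-model, the generalized linear option may be sketched with logistic regression fitted by batch gradient descent. Logistic regression is used here as one example of a generalized linear model; the GBDT option is not shown. The learning rate, the epoch count, and the toy rows (one encoder feature and one rule-derived feature per row) are assumptions.

```python
import math


def predict_proba(model, x):
    """Probability of the positive label for one row of screened features."""
    weights, bias = model
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba((weights, bias), x) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Screened features: [encoder feature, rule-derived feature]; label 1 = default.
rows = [[0.1, 0.2], [0.9, 0.7], [0.2, 0.1], [0.8, 0.9]]
labels = [0, 1, 0, 1]
meta_model = train_logistic(rows, labels)
```

With a linear meta-model, each learned weight can be read directly as the contribution of one screened feature, which is one way the interpretability discussed above can be retained.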

The process of predicting a user's loan risk with the risk prediction model is as follows. Data of the user for whom loan risk is to be predicted is obtained; the user data includes unstructured user data and structured user data. The unstructured user data is input into the first sub-model of the trained risk prediction model to obtain unstructured user data features.

The structured user data is mapped based on the preset feature generation rule to generate structured user data features.

Feature screening is performed on the unstructured user data features and the structured user data features, and the screened unstructured user data features and structured user data features are spliced to generate spliced features. The spliced features are input into the second sub-model, and a risk prediction result is obtained through the GBDT or generalized linear regression algorithm. The risk prediction result may be a user credit score, a fraud prediction, an income prediction, a repayment overdue risk probability, or the like.
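As a non-limiting illustration, the prediction flow (encode, derive, screen, splice, score) may be sketched as a single function. The encoder below is a column-mean placeholder standing in for the trained LSTM encoder, and the rule, the retained feature names, and the weights are hypothetical values rather than trained parameters.

```python
import math


def predict_risk(sequence, user, encode, rules, keep, weights, bias):
    """Return a risk probability from unstructured and structured user data."""
    # Unstructured user data features from the first sub-model's encoder.
    pool = {f"enc_{i}": v for i, v in enumerate(encode(sequence))}
    # Structured user data features from the preset feature generation rules.
    pool.update({name: rule(user) for name, rule in rules.items()})
    # Keep only the screened features and splice them in a fixed order.
    spliced = [pool[name] for name in keep]
    # Second sub-model: a linear scorer with a logistic link (assumed form).
    z = bias + sum(w * v for w, v in zip(weights, spliced))
    return 1.0 / (1.0 + math.exp(-z))


def mean_encoder(sequence):
    """Placeholder encoder: per-column mean of the vector sequence."""
    return [sum(column) / len(column) for column in zip(*sequence)]


rules = {"debt_to_income": lambda u: u["debt"] / u["income"]}
score = predict_risk(
    sequence=[[0.0, 0.0], [1.0, 1.0]],
    user={"debt": 2000.0, "income": 8000.0},
    encode=mean_encoder,
    rules=rules,
    keep=["enc_0", "debt_to_income"],
    weights=[2.0, 4.0],
    bias=-2.0,
)
```

The same list of retained feature names must be used at training time and at prediction time so that the spliced vector matches the layout the second sub-model was trained on.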

Please refer to fig. 3, which illustrates a schematic structural diagram of a loan risk prediction apparatus provided in an embodiment of the present application; the embodiment of the application provides a loan risk prediction device 200, which comprises:

an acquisition module 210, configured to acquire user data, where the user data includes unstructured user data and structured user data; and

a prediction module 220, configured to input the user data into a preset risk prediction model to obtain a risk prediction result; the risk prediction model comprises a first sub-model and a second sub-model; the first sub-model is obtained by training on unstructured training data; the second sub-model is obtained by obtaining data features through the first sub-model and training on the data features and the structured training data.

Optionally, in an embodiment of the present application, in the loan risk prediction device, the unstructured training data includes first unstructured training data and second unstructured training data. The device further comprises a training module, configured to: train a preset neural network through the first unstructured training data to obtain the first sub-model; input the second unstructured training data into the first sub-model to obtain the data features; and train a preset meta-model through the data features and the structured training data to obtain the second sub-model.

Optionally, in an embodiment of the present application, in the loan risk prediction device, the training module is further configured to: obtain a vector sequence based on the first unstructured training data; add corresponding labels to the vector sequence; and train the neural network through the labeled vector sequence to obtain the first sub-model.

Optionally, in an embodiment of the present application, in the loan risk prediction device, the first unstructured training data includes event sequence data, and the vector sequence comprises an event vector sequence. The training module is further configured to: obtain attribute information of the event sequence data; obtain derived attribute information of the event sequence data based on the attribute information corresponding to the event sequence data and the attribute information corresponding to the previous event sequence data; splice the attribute information of the event sequence data and the derived attribute information to generate a feature vector of the event sequence data; and splice the feature vectors of each event sequence data in time order to obtain the event vector sequence.

Optionally, in an embodiment of the present application, in the loan risk prediction device, the first unstructured training data includes behavior sequence data, and the vector sequence comprises a behavior vector sequence. The training module is further configured to: obtain behavior time information corresponding to the behavior sequence data; and splice the behavior sequence data according to the behavior time information to obtain the behavior vector sequence.

Optionally, in an embodiment of the present application, in the loan risk prediction device, the training module is further configured to: generate structured training data features based on the structured training data through a preset feature generation rule; add the data features and the structured training data features to a feature pool; screen the data features and the structured training data features in the feature pool to obtain model-input features; and train the meta-model through the model-input features to obtain the second sub-model.

Optionally, in an embodiment of the present application, in the loan risk prediction device, the prediction module is specifically configured to: input the unstructured user data into the first sub-model to obtain unstructured user data features; generate structured user data features based on the structured user data; splice the unstructured user data features and the structured user data features to generate spliced features; and input the spliced features into the second sub-model to obtain the risk prediction result.

It should be understood that the apparatus corresponds to the loan risk prediction method embodiment described above and can perform the steps of that method embodiment. For the specific functions of the apparatus, refer to the description above; detailed descriptions are omitted here where appropriate to avoid redundancy. The device includes at least one software functional module that can be stored in memory in the form of software or firmware, or embedded in an operating system (OS) of the device.

Please refer to fig. 4, which illustrates a schematic structural diagram of an electronic device provided in an embodiment of the present application. An electronic device 300 provided in an embodiment of the present application includes: a processor 310 and a memory 320, the memory 320 storing machine-readable instructions executable by the processor 310, which when executed by the processor 310 perform the method as described above.

The present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the method described above.

The storage medium may be implemented by any type of volatile or nonvolatile memory device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The foregoing description is merely an optional implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the embodiments of the present application, and such changes or substitutions shall be covered by the scope of the embodiments of the present application.

Claims

1. A loan risk prediction method, comprising:

obtaining user data, wherein the user data comprises unstructured user data and structured user data;

inputting the user data into a preset risk prediction model to obtain a risk prediction result; the risk prediction model comprises a first sub-model and a second sub-model; the first sub-model is obtained by training unstructured training data; the second sub-model is obtained by obtaining data features through the first sub-model and training the data features and the structured training data;

the unstructured training data includes first unstructured training data and second unstructured training data; before inputting the user data into a preset risk prediction model to obtain a risk prediction result, the method further comprises:

training a preset neural network through the first unstructured training data to obtain the first sub-model;

inputting the second unstructured training data into the first sub-model to obtain the data features;

training a preset meta-model through the data features and the structured training data to obtain the second sub-model;

wherein training a preset neural network through the first unstructured training data to obtain the first sub-model comprises:

obtaining a vector sequence based on the first unstructured training data;

adding a corresponding label to the vector sequence;

training the neural network through the vector sequence added with the label to obtain the first sub-model;

the first unstructured training data includes event sequence data; the vector sequence comprises an event vector sequence; based on the first unstructured training data, obtaining a vector sequence comprises:

obtaining attribute information of the event sequence data; the attribute information of the event sequence data comprises event time information and/or event space information;

acquiring derivative attribute information of the event sequence data based on the attribute information corresponding to the event sequence data and the attribute information corresponding to the previous event sequence data;

splicing the attribute information of the event sequence data and the derivative attribute information to generate a feature vector of the event sequence data;

and splicing the feature vectors of each event sequence data according to the time sequence to obtain the event vector sequence.

2. The method of claim 1, wherein the first unstructured training data comprises behavior sequence data; the vector sequence comprises a behavior vector sequence; based on the first unstructured training data, obtaining a vector sequence comprises:

obtaining behavior time information corresponding to the behavior sequence data;

and splicing the behavior sequence data according to the behavior time information to obtain the behavior vector sequence.

3. The method of claim 1, wherein training a preset meta-model through the data features and the structured training data to obtain the second sub-model comprises:

generating structured training data features based on the structured training data through a preset feature generation rule;

adding the data features and the structured training data features to a feature pool;

screening the data features and the structured training data features in the feature pool to obtain model-input features;

and training the meta-model through the model-input features to obtain the second sub-model.

4. A method according to any one of claims 1-3, wherein inputting the user data into a preset risk prediction model to obtain a risk prediction result comprises:

inputting the unstructured user data into the first sub-model to obtain unstructured user data features;

generating structured user data features based on the structured user data;

splicing the unstructured user data features and the structured user data features to generate spliced features;

and inputting the spliced features into the second sub-model to obtain the risk prediction result.

5. A loan risk prediction apparatus, comprising:

an acquisition module, configured to acquire user data, wherein the user data comprises unstructured user data and structured user data;

a prediction module, configured to input the user data into a preset risk prediction model to obtain a risk prediction result; the risk prediction model comprises a first sub-model and a second sub-model; the first sub-model is obtained by training on unstructured training data; the second sub-model is obtained by obtaining data features through the first sub-model and training on the data features and the structured training data;

the unstructured training data includes first unstructured training data and second unstructured training data; the device further comprises a training module, configured to: train a preset neural network through the first unstructured training data to obtain the first sub-model; input the second unstructured training data into the first sub-model to obtain the data features; and train a preset meta-model through the data features and the structured training data to obtain the second sub-model;

the training module is further configured to: obtain a vector sequence based on the first unstructured training data; add a corresponding label to the vector sequence; and train the neural network through the labeled vector sequence to obtain the first sub-model;

the first unstructured training data includes event sequence data; the vector sequence comprises an event vector sequence; the training module is further configured to: obtain attribute information of the event sequence data, wherein the attribute information of the event sequence data comprises event time information and/or event space information; obtain derived attribute information of the event sequence data based on the attribute information corresponding to the event sequence data and the attribute information corresponding to the previous event sequence data; splice the attribute information of the event sequence data and the derived attribute information to generate a feature vector of the event sequence data; and splice the feature vectors of each event sequence data in time order to obtain the event vector sequence.

6. An electronic device, comprising: a processor and a memory storing machine-readable instructions executable by the processor to perform the method of any one of claims 1 to 4 when executed by the processor.

7. A computer-readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the method according to any of claims 1 to 4.