CN116258579A - Training method of user credit scoring model and user credit scoring method - Google Patents

Publication number
CN116258579A
Authority
CN
China
Prior art keywords
neural network
training
data
model
dimensional data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310474439.7A
Other languages
Chinese (zh)
Other versions
CN116258579B (en)
Inventor
刘洪江
甘元笛
任晓东
陈昱任
吕文勇
周智杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu New Hope Finance Information Co Ltd
Original Assignee
Chengdu New Hope Finance Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu New Hope Finance Information Co Ltd
Priority to CN202310474439.7A
Publication of CN116258579A
Application granted
Publication of CN116258579B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a training method for a user credit scoring model and a user credit scoring method. Training of the user credit scoring model is divided into two stages: the first stage yields a trained neural network encoder, and the second stage trains a meta-model (for example, GBDT or generalized linear regression) on the feature vectors output by the trained neural network encoder together with derived features obtained from the raw data. The user credit scoring model obtained through this staged training can make full use of high-dimensional unstructured data such as images, text, and videos, which improves the accuracy of user credit scoring and thereby reduces risk.

Description

Training method of user credit scoring model and user credit scoring method
Technical Field
The present application relates to the field of credit assessment, and in particular, to a training method for a user credit scoring model and a user credit scoring method.
Background
Retail credit is small, short-term credit directed to a consumer, typically used to purchase a consumer product or service. Retail credit has the characteristics of convenience and rapidness, but at the same time has higher risks. Traditional retail credit risk prediction methods are typically based on a single data source, such as credit score or historical repayment record. These methods have some limitations that make it difficult to accurately predict retail credit risk.
At present, retail credit business involves a very large number of loans with a small average amount per loan, and its risks are varied, complex, and changeable, so business experience alone can hardly cover them comprehensively. Risk control in this field is therefore mostly carried out through risk models and strategies, with relatively little manual intervention. Among these, risk models are the necessary, core tools for coping with complex and varied risks; common risk models include credit assessment models and anti-fraud models.
In existing credit evaluation models, the algorithms used are mainly logistic regression and ensemble decision trees. Logistic regression is the traditional algorithm for building scorecard models; it has a long history and mature practice, and is characterized by few model parameters, high stability, a simple algorithm, and strong interpretability. Ensemble decision tree algorithms, which include random forests and gradient boosting decision trees (GBDT), are the mainstream algorithms for building machine learning risk control models; they are characterized by high model performance, low requirements on input features, nonlinearity, and partial interpretability.
Existing credit evaluation models are built by first performing feature derivation on the raw data to generate features with scalar or categorical values, and then using those features to build the model. This development method can hardly make full use of high-dimensional unstructured data such as images, text, and videos. For such high-dimensional data, the existing mainstream approach is to design a series of feature generation rules from experience and generate features with them. However, features designed this way can hardly cover all the information in the high-dimensional data, and most experience-based features extract only a small amount of information. The models best suited to extracting information from high-dimensional data are deep learning models based on deep neural networks. However, deep learning models can hardly replace logistic regression and decision trees in credit models, because neural networks are far less interpretable than logistic regression and decision trees.
Disclosure of Invention
The embodiments of the present application aim to provide a training method for a user credit scoring model and a user credit scoring method, so as to solve the problem that existing credit scoring models can hardly make full use of high-dimensional unstructured data such as images, text, and videos.
The embodiment of the application provides a training method for a user credit scoring model, wherein the user credit scoring model comprises a neural network encoder and a meta model, and the training method comprises the following steps:
inputting the high-dimensional data into a trained neural network encoder to obtain a feature vector; wherein the high-dimensional data includes at least one of image data, video data, and text data;
performing rule-based derivation grounded in business experience on the raw data to obtain derived features; wherein the raw data includes at least one of personal information, device information, credit history, and financial data;
feature screening is carried out according to all feature vectors and derivative features, and screened features are obtained;
and training the meta-model according to the screened characteristics and the corresponding labels to obtain the trained meta-model.
In the above technical solution, training of the user credit scoring model is divided into two stages: the first stage yields a trained neural network encoder, and the second stage trains the meta-model (for example, GBDT or generalized linear regression) on the feature vectors output by the trained neural network encoder together with derived features obtained from the raw data. The user credit scoring model obtained through this staged training can make full use of high-dimensional unstructured data such as images, text, and videos, which improves the accuracy of user credit scoring and thereby reduces risk.
In some alternative embodiments, before inputting the high-dimensional data into the trained neural network encoder, further comprising:
training the neural network encoder.
In some alternative embodiments, training a neural network encoder includes:
establishing a neural network structure corresponding to the high-dimensional data; the neural network structure comprises a neural network encoder and a neural network prediction head, wherein the neural network encoder is used for generating and outputting corresponding feature vectors according to the high-dimensional data, and the neural network prediction head is used for generating and outputting corresponding prediction values according to the feature vectors;
and training the neural network structure according to the high-dimensional data and the corresponding labels to obtain the trained neural network encoder.
In some optional embodiments, establishing a neural network structure corresponding to the high-dimensional data includes:
establishing a corresponding neural network structure for each type of high-dimensional data;
training the neural network structure according to the high-dimensional data and the corresponding label to obtain a trained neural network encoder, comprising:
and respectively training the corresponding neural network structure according to each type of high-dimensional data and the corresponding label to obtain a trained neural network encoder corresponding to each type of high-dimensional data.
In the above technical solution, training of the neural network structure includes the following two cases: first, each type of high-dimensional data is used to independently train its corresponding neural network, but the same labels are used; second, each type of high-dimensional data is used to independently train its corresponding neural network, and a different label is used.
In some optional embodiments, training the neural network structure according to the high-dimensional data and the corresponding labels to obtain a trained neural network encoder further comprises:
integrating the neural network structures corresponding to multiple categories of high-dimensional data into one overall neural network structure;
and performing multimodal training of the overall neural network structure using the multiple categories of high-dimensional data and the corresponding labels, to obtain a plurality of trained neural network encoders.
In the above technical solution, when training the neural network encoders, all the high-dimensional data planned to enter the model are used, the neural networks corresponding to the respective high-dimensional data are integrated together, and one label is selected for multimodal training.
In some alternative embodiments, feature screening is performed comprising:
features are filtered based on predefined criteria and/or based on model performance.
In the above technical solution, the feature screening includes: a filter approach, which screens features based on predefined criteria, for example the correlation of each individual feature with the target variable or the information gain of each individual feature; and a wrapper approach, which screens features based on model performance, for example iteratively eliminating unimportant features using a recursive feature elimination algorithm.
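The filter approach described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the predefined criterion and a hypothetical threshold `min_abs_corr`; the feature names and toy values are invented for illustration and are not taken from the application.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)

def filter_features(columns, target, min_abs_corr=0.3):
    """Keep features whose |correlation| with the target meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= min_abs_corr]

# Toy feature pool: two encoder outputs and one rule-derived feature (all hypothetical).
pool = {
    "img_vec_0": [0.9, 0.8, 0.1, 0.2, 0.85, 0.15],
    "img_vec_1": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],   # constant, uninformative
    "income_rank": [1, 2, 5, 6, 1, 5],
}
bad_label = [1, 1, 0, 0, 1, 0]  # 1 = bad-credit customer
selected = filter_features(pool, bad_label)
```

Each feature is scored on its own, so the filter is cheap but cannot see interactions between features; that is the gap the wrapper approach is meant to cover.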
In some alternative embodiments, the meta-model includes a meta-model based on a gradient boosting decision tree or a generalized linear regression algorithm.
In the above technical solution, the features screened in the previous step are used as input data and, in combination with the designed labels, GBDT or generalized linear regression is selected as the meta-model algorithm and the meta-model is trained. With this training method, a stacked model that fuses neural networks with GBDT or a generalized linear model can be built very flexibly, and the accuracy of the model is improved. In addition, data from the required sources and of different types can be flexibly selected according to the needs of the risk control business, a model adequate for risk control under complex conditions can be developed using various labels, and interpretable rule-derived features are integrated into the model, so that the model retains a certain degree of interpretability.
The user credit scoring method provided by the embodiment of the application comprises the following steps:
inputting high-dimensional data in the user data into a trained neural network encoder to obtain an actual feature vector; performing rule-based derivation grounded in business experience on the raw data in the user data to obtain actual derived features;
performing feature screening according to all the actual feature vectors and the actual derivative features to obtain screened actual features;
and inputting the screened actual characteristics into the trained meta model to obtain actual scores.
In the above technical solution, the data input to the model include high-dimensional data and raw data. Based on multimodal data, internally sourced high-dimensional data can be fully utilized, which greatly broadens the feature information available to the model, so that relatively high accuracy can still be maintained when a customer lacks credit history. When facing high-risk customers, the additional information helps identify fraud risk, so that an integrated anti-fraud and credit scoring scheme is achieved in high-risk customer scenarios where fraud risk and credit risk are difficult to separate.
An electronic device provided in an embodiment of the present application includes: a processor and a memory storing machine-readable instructions executable by the processor, which when executed by the processor, perform a method as any one of the above.
A computer readable storage medium provided by an embodiment of the present application, on which a computer program is stored, which when executed by a processor performs a method as described in any of the above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of steps of a training method for a credit scoring model for a user according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a user credit scoring model according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating steps of a method for scoring credit of a user according to an embodiment of the present application;
FIG. 4 shows a possible structure of the electronic device provided in an embodiment of the present application.
Reference numerals: 1-processor, 2-memory, 3-communication interface, 4-communication bus.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Credit scoring refers to the process of evaluating and scoring a customer's credit status in retail credit. Such scores are typically used to predict the customer's future ability to repay and to give banks or other financial institutions a reference for deciding whether to offer loans or financing products such as credit cards. Credit scoring typically takes into account a variety of factors, including the customer's financial status, credit history, and income level.
In the field of retail credit, there are currently a number of schemes for assessing a customer's credit status and predicting the customer's future repayment capacity. These schemes include rule-based methods, which use manually set rules to evaluate the customer's credit status (for example, rules that consider the customer's income level, credit history, and debt ratio), and statistical methods, which use statistical models to assess the customer's credit status (for example, a logistic regression algorithm that predicts whether a customer will default).
Existing credit scoring schemes, such as rule-based methods and statistical methods, may suffer from several drawbacks when assessing the credit status of high-risk customers. Rule-based methods may be too simple to adequately account for a customer's specific circumstances. Methods based on logistic regression or decision trees rely on rule-based feature derivation engineering and can hardly use high-dimensional data efficiently, comprehensively, and deeply; when facing customers who are high-risk or lack credit history, they have little information available and can hardly identify credit risk accurately.
The high-dimensional data mainly comprise the following. Image and video data include liveness face verification videos, identity card photographs, and the like. Text data include text filled in at the application stage, text generated by optical character recognition (OCR), and the like; for example, a natural language processing (NLP) model can be built with BERT or ERNIE as the backbone, and the large pre-trained NLP model fine-tuned by fitting the text data to various credit risk labels, thereby achieving semantics-aware text information extraction. Sequence data include predefined types of events throughout the credit cycle, such as registration and liveness face verification, or touch behavior on a handheld smart device, and may be represented as a sequence of vectors. Signal data are typically one-dimensional or three-dimensional waveform data sampled at a fixed frequency, and include sound signals and motion sensor signals.
The applicant has also found that, in actual financial business, credit risk and fraud risk are in practice difficult to separate, and the role that a pure anti-fraud model can play is often very limited. Existing anti-fraud systems rely mainly on expert experience and discrete rules. The models used in anti-fraud operate in relatively independent and narrow application scenarios; their algorithms are mainly deep learning and mainly target anomaly detection for specific targets and specific scenarios. The application scenarios of existing anti-fraud models are narrow and scattered, and the models are difficult to combine organically with credit models. However, because anti-fraud models use neural networks, they can make fuller use of high-dimensional data such as images and videos than credit models can.
Therefore, one or more embodiments of the present application provide a training method for a user credit scoring model and a user credit scoring method which, by incorporating deep learning model structures into the credit scoring model, solve the problem that existing credit scoring models can hardly make full use of high-dimensional unstructured data such as images, text, and videos.
In the embodiments of the present application, the deep learning model structures used are various deep neural networks, including convolutional neural networks (CNN), long short-term memory (LSTM) networks, and Transformers, and these models are combined with generalized linear regression or GBDT in a stacked manner. The specific model structure is as follows: each class of high-dimensional data enters a neural network structure suited to that class of data, and the neural network encoder in that structure outputs feature vectors. The feature vectors amount to the relevant information extracted from the high-dimensional data entering the model, and they are put into a feature pool together with the derived features obtained from empirical rules, to serve as candidate input features for the generalized linear regression or GBDT. Compared with features derived using empirical rules, features automatically extracted from high-dimensional data by neural network encoders are more complete, deeper, and more relevant to the target variable. Adding the feature vectors also greatly widens the feature pool and greatly increases the information available to the generalized linear regression or GBDT model, thereby improving the performance of the user credit scoring model and greatly broadening the scenarios in which it can be used.
If the features output by the neural network encoder are fed into a subsequent generalized linear regression or GBDT, a stacked model is formed. However, a stacked model consisting of a neural network and GBDT is difficult to train directly. The most important reason is that neural networks and GBDT are trained in completely different ways. In the training stage, both the neural network and GBDT iterate, but all parameters of the neural network change at every iteration, whereas GBDT adds a new set of parameters at every iteration and leaves the previous parameters unchanged. The neural network and GBDT are therefore difficult to iterate simultaneously.
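The difference in iteration style can be made concrete with a toy gradient boosting loop. This is a minimal sketch under stated assumptions: squared-error loss, depth-one regression stumps on a single feature, and a hypothetical learning rate `lr`; it is not the GBDT implementation used by the applicant. Each round appends one new stump fitted to the current residuals and never modifies the stumps already in the list, whereas a gradient step on a neural network would update every weight.

```python
def fit_stump(xs, residuals):
    """Best single-threshold regression stump on a 1-D feature (squared error)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return (t, lv, rv)

def predict(stumps, x, lr):
    """Sum of the shrunken outputs of all stumps added so far."""
    return sum(lr * (lv if x <= t else rv) for t, lv, rv in stumps)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
lr = 0.5          # hypothetical shrinkage factor
stumps = []       # the model's parameters: grows by one stump per round
history = []      # snapshot after each round, to show earlier stumps stay fixed
for _ in range(5):
    residuals = [y - predict(stumps, x, lr) for x, y in zip(xs, ys)]
    stumps.append(fit_stump(xs, residuals))
    history.append(list(stumps))
```

Comparing consecutive snapshots in `history` shows that every earlier stump is carried over unchanged, which is why a shared iteration loop with a jointly updated encoder is awkward and why the two-stage scheme trains the encoder first.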
In order to solve the above problems, an embodiment of the present application provides a training method for a user credit scoring model, where the user credit scoring model includes a neural network encoder and a meta-model. Referring to FIG. 1, the training method includes:
step 100, inputting high-dimensional data into a trained neural network encoder to obtain feature vectors; wherein the high-dimensional data includes at least one of image data, video data, and text data;
performing rule-based derivation grounded in business experience on the raw data to obtain derived features; wherein the raw data includes at least one of personal information, device information, credit history, and financial data;
step 200, performing feature screening on all the feature vectors and derived features to obtain screened features;
step 300, training the meta-model according to the screened features and the corresponding labels to obtain the trained meta-model.
In this embodiment, training of the user credit scoring model is divided into two stages: the first stage yields a trained neural network encoder, and the second stage trains the meta-model (for example, GBDT or generalized linear regression) on the feature vectors output by the trained neural network encoder together with derived features obtained from the raw data. The user credit scoring model obtained through this staged training can make full use of high-dimensional unstructured data such as images, text, and videos, which improves the accuracy of user credit scoring and thereby reduces risk.
Before the high-dimensional data is input into the trained neural network encoder, the method further includes the previous (first) stage, in which the neural network encoder is trained, specifically including:
establishing a neural network structure corresponding to the high-dimensional data; the neural network structure comprises a neural network encoder and a neural network prediction head, wherein the neural network encoder is used for generating and outputting corresponding feature vectors according to the high-dimensional data, and the neural network prediction head is used for generating and outputting corresponding prediction values according to the feature vectors;
and training the neural network structure according to the high-dimensional data and the corresponding labels to obtain the trained neural network encoder.
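The encoder-plus-prediction-head structure described above can be sketched in a few lines. This is a toy, pure-Python stand-in (one linear layer with tanh as the encoder, a logistic unit as the head, plain SGD on binary cross-entropy); the class name, dimensions, learning rate, and data are all assumptions made for illustration, and a real embodiment would use a CNN, LSTM, or Transformer encoder instead.

```python
import math
import random

class EncoderWithHead:
    """Toy encoder (linear + tanh) followed by a logistic prediction head."""

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(feat_dim)]
        self.b = [0.0] * feat_dim
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(feat_dim)]
        self.c = 0.0

    def encode(self, x):
        """Encoder: map a raw input to its feature vector."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def predict(self, x):
        """Head: map the feature vector to a predicted probability."""
        z = self.encode(x)
        logit = sum(vj * zj for vj, zj in zip(self.v, z)) + self.c
        return 1.0 / (1.0 + math.exp(-logit))

    def train_step(self, x, y, lr=0.1):
        """One SGD step on binary cross-entropy; encoder and head update together."""
        z = self.encode(x)
        g = self.predict(x) - y                    # dLoss/dlogit
        for j, zj in enumerate(z):
            gz = g * self.v[j] * (1.0 - zj * zj)   # backprop through tanh
            self.v[j] -= lr * g * zj
            for i, xi in enumerate(x):
                self.W[j][i] -= lr * gz * xi
            self.b[j] -= lr * gz
        self.c -= lr * g

def log_loss(model, data):
    eps = 1e-12
    return -sum(y * math.log(model.predict(x) + eps)
                + (1 - y) * math.log(1 - model.predict(x) + eps)
                for x, y in data) / len(data)

# Hypothetical stand-in for high-dimensional inputs with binary credit labels.
data = [([1.0, 0.2, 0.1], 1), ([0.9, 0.1, 0.3], 1),
        ([0.1, 0.8, 0.9], 0), ([0.2, 0.9, 0.7], 0)]
model = EncoderWithHead(in_dim=3, feat_dim=2)
loss_before = log_loss(model, data)
for _ in range(200):
    for x, y in data:
        model.train_step(x, y)
loss_after = log_loss(model, data)
feature_vector = model.encode(data[0][0])  # only the encoder is kept for stage two
```

The prediction head exists only to give the encoder a training signal; after training, `encode` alone supplies the feature vectors that enter the feature pool.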
In some optional embodiments, establishing a neural network structure corresponding to the high-dimensional data includes: establishing a corresponding neural network structure for each type of high-dimensional data;
correspondingly, training the neural network structure according to the high-dimensional data and the corresponding label to obtain a trained neural network encoder, comprising: and respectively training the corresponding neural network structure according to each type of high-dimensional data and the corresponding label to obtain a trained neural network encoder corresponding to each type of high-dimensional data.
In the embodiment of the present application, the training of the neural network structure includes the following two cases: first, each type of high-dimensional data is used to independently train its corresponding neural network, but the same labels are used; second, each type of high-dimensional data is used to independently train its corresponding neural network, and a different label is used.
In some optional embodiments, training the neural network structure according to the high-dimensional data and the corresponding labels to obtain a trained neural network encoder further comprises:
integrating the neural network structures corresponding to multiple categories of high-dimensional data into one overall neural network structure;
and performing multimodal training of the overall neural network structure using the multiple categories of high-dimensional data and the corresponding labels, to obtain a plurality of trained neural network encoders.
In the embodiments of the present application, when training the neural network encoders, all the high-dimensional data planned to enter the model are used, the neural networks corresponding to the respective high-dimensional data are integrated together, and one label is selected for multimodal training.
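One plausible way to integrate per-modality networks into an overall structure is to concatenate the encoders' feature vectors and feed them to a single shared prediction head. The sketch below shows only the forward pass, with hypothetical weights and inputs; concatenation as the fusion rule is an assumption, since the text does not specify how the networks are joined, and training is omitted.

```python
import math

def make_encoder(weights):
    """Return a toy encoder: one linear layer followed by tanh."""
    def encode(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return encode

def joint_forward(encoders, inputs, head_weights, head_bias=0.0):
    """Concatenate every encoder's feature vector and apply one shared head."""
    fused = []
    for modality in encoders:              # dict order keeps the layout stable
        fused.extend(encoders[modality](inputs[modality]))
    logit = sum(w * f for w, f in zip(head_weights, fused)) + head_bias
    return fused, 1.0 / (1.0 + math.exp(-logit))

# Hypothetical weights and inputs, chosen only to make the sketch runnable.
encoders = {
    "image": make_encoder([[0.2, -0.1, 0.4], [0.3, 0.1, -0.2]]),  # 3 inputs -> 2 features
    "text": make_encoder([[0.5, -0.5]]),                          # 2 inputs -> 1 feature
}
sample = {"image": [0.9, 0.1, 0.3], "text": [1.0, 0.0]}
fused, prob = joint_forward(encoders, sample, head_weights=[0.7, -0.4, 0.6])
```

Because the loss for the single selected label would be computed after the shared head, its gradient would reach every encoder in the same step, which is what distinguishes this variant from training each modality's network independently.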
In some alternative embodiments, feature screening is performed comprising: features are filtered based on predefined criteria and/or based on model performance.
In the embodiments of the present application, feature screening includes: a filter approach, which screens features based on predefined criteria, for example the correlation of each individual feature with the target variable or the information gain of each individual feature; and a wrapper approach, which screens features based on model performance, for example iteratively eliminating unimportant features using a recursive feature elimination algorithm.
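The wrapper approach can be sketched as a small recursive feature elimination loop. The sketch makes several assumptions that are not in the text: the wrapped model is a logistic regression fitted by batch gradient descent, importance is the absolute weight (which presumes features are centered and on comparable scales), and the feature names and toy values are invented.

```python
import math

def fit_logistic(rows, labels, epochs=300, lr=0.5):
    """Batch gradient descent for logistic regression; returns the feature weights."""
    d, n = len(rows[0]), len(rows)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for i in range(d):
                gw[i] += (p - y) * x[i]
            gb += p - y
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w

def rfe(columns, labels, n_keep):
    """Refit, drop the feature with the smallest |weight|, repeat until n_keep remain."""
    names = list(columns)
    while len(names) > n_keep:
        rows = [list(vals) for vals in zip(*(columns[n] for n in names))]
        weights = fit_logistic(rows, labels)
        weakest = min(range(len(names)), key=lambda i: abs(weights[i]))
        names.pop(weakest)
    return names

# Toy, pre-centered feature pool (hypothetical names and values).
columns = {
    "txt_vec_0": [0.4, 0.3, -0.4, -0.3, 0.35, -0.35],
    "noise": [0.0, -0.1, 0.0, 0.1, 0.1, -0.1],
    "debt_ratio": [0.2, 0.4, -0.3, -0.2, 0.3, -0.4],
}
labels = [1, 1, 0, 0, 1, 0]
kept = rfe(columns, labels, n_keep=2)
```

Unlike a per-feature filter, each elimination round refits the model on the surviving features, so a feature is judged by what it contributes alongside the others.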
In some alternative embodiments, the meta-model includes a meta-model based on a gradient boosting decision tree or a generalized linear regression algorithm.
In the embodiments of the present application, the features screened in the previous step are used as input data and, in combination with the designed labels, GBDT or generalized linear regression is selected as the meta-model algorithm and the meta-model is trained. With this training method, a stacked model that fuses neural networks with GBDT or a generalized linear model can be built very flexibly, and the accuracy of the model is improved. In addition, data from the required sources and of different types can be flexibly selected according to the needs of the risk control business, a model adequate for risk control under complex conditions can be developed using various labels, and interpretable rule-derived features are integrated into the model, so that the model retains a certain degree of interpretability.
Referring to fig. 2, fig. 2 is a schematic diagram of a user credit scoring model provided in an embodiment of the present application, and the working procedure of using the model to score credit is as follows:
The first step is data collection. When a user applies for a loan using the credit product's client software on a handheld touch smart device, the client can acquire various data on the device once the user has authorized it. The application flow includes a liveness authentication step, from which the client obtains a liveness authentication video, which is video data. During the application, the user must photograph the front and back of their identity card on the spot, and the client obtains the photos taken in real time as image data. The client also collects attribute information of the device on which it runs and basic personal information filled in by the user. In addition to the data collected directly by the client, third-party data such as the user's credit history, financial status, and income level are also used.
The second step is data preprocessing. For the identity card photos, optical character recognition (OCR) is used to recognize and extract the text on the photos, generating text data.
The third step is feature derivation. This step mainly maps the raw data, based on rules, to derived features. The information filled in by the user, the text generated by OCR, device attributes, and the third-party data are mapped to categorical or numerical scalar features using rules drawn from business experience. For example, education information is mapped to an education category, identity card text is mapped to the ordinal rank of per-capita disposable income of the provincial-level administrative unit, and filled-in text is mapped to an occupation category. The derived features are then added to the feature pool. Because these features are generated from rules, they are interpretable.
The fourth step is designing the labels for training the model. Good-credit customers and bad-credit customers are defined according to the user's debt default status, forming a binary classification label.
The fifth step is training the model, specifically using the staged training described above, which is not repeated here.
The sixth step is backtesting the trained model, formulating strategy rules according to the test results, and embedding the strategy rules into the application admission strategy system.
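Steps three and four of the workflow above lend themselves to a short illustration. The rule tables, field names, and the 30-day default threshold in this sketch are all hypothetical; the text only states that rules come from business experience and that labels follow the user's debt default status.

```python
# Hypothetical rule tables: the categories and ranks are made up for illustration.
EDUCATION_CATEGORY = {"primary": 0, "secondary": 1, "bachelor": 2, "postgraduate": 3}
PROVINCE_INCOME_RANK = {"Shanghai": 1, "Beijing": 2, "Sichuan": 3}

def derive_features(raw):
    """Step three: map raw fields to categorical or ordinal scalar features."""
    return {
        "education_cat": EDUCATION_CATEGORY.get(raw.get("education"), -1),
        "province_income_rank": PROVINCE_INCOME_RANK.get(raw.get("id_card_province"), -1),
    }

def make_label(max_days_past_due, bad_threshold=30):
    """Step four: 1 = bad-credit customer, 0 = good-credit customer.

    The 30-day threshold is an assumption made for this sketch.
    """
    return 1 if max_days_past_due >= bad_threshold else 0

applicant = {"education": "bachelor", "id_card_province": "Sichuan"}
features = derive_features(applicant)
labels = [make_label(days) for days in (0, 12, 45)]
```

Every derived value can be traced back to a named rule and a raw field, which is the source of the interpretability that the rule-derived features bring to the final model.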
Referring to fig. 3, fig. 3 is a flowchart illustrating steps of a user credit scoring method according to an embodiment of the present application, including:
step 400, inputting high-dimensional data in the user data into a trained neural network encoder to obtain an actual feature vector; performing rule-based derivation grounded in business experience on the raw data in the user data to obtain actual derived features;
step 500, carrying out feature screening according to all the actual feature vectors and the actual derivative features to obtain screened actual features;
step 600, inputting the screened actual characteristics into the trained meta model to obtain actual scores.
In the embodiments of the present application, the data input to the model include high-dimensional data and raw data. Based on multimodal data, internally sourced high-dimensional data can be fully utilized, which greatly broadens the feature information available to the model, so that relatively high accuracy can still be maintained when a customer lacks credit history. When facing high-risk customers, the additional information helps identify fraud risk, so that an integrated anti-fraud and credit scoring scheme is achieved in high-risk customer scenarios where fraud risk and credit risk are difficult to separate.
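Steps 400 to 600 can be read as one composition of the trained pieces. In the sketch below every component is a hypothetical stand-in (a lambda for the encoder, the derivation rule, and the meta-model), and it assumes that screening at scoring time means reusing the feature list chosen during training.

```python
def score_user(user, encoders, derive, selected, meta_model):
    """Steps 400 to 600 as one function; every argument is a hypothetical stand-in."""
    pool = {}
    # Step 400, first half: one trained encoder per modality emits a feature vector.
    for modality, encode in encoders.items():
        for i, value in enumerate(encode(user["high_dim"][modality])):
            pool[f"{modality}_vec_{i}"] = value
    # Step 400, second half: rule-based derivation on the raw data.
    pool.update(derive(user["raw"]))
    # Step 500: keep only the screened features, in a fixed order.
    row = [pool[name] for name in selected]
    # Step 600: the trained meta-model turns the row into a score.
    return meta_model(row)

# Toy stand-ins so the sketch runs end to end.
encoders = {"image": lambda pixels: [sum(pixels) / len(pixels), max(pixels)]}
derive = lambda raw: {"education_cat": raw["education_cat"]}
selected = ["image_vec_0", "education_cat"]
meta_model = lambda row: 0.5 * row[0] + 0.1 * row[1]

user = {"high_dim": {"image": [0.2, 0.4]}, "raw": {"education_cat": 2}}
score = score_user(user, encoders, derive, selected, meta_model)
```

Keeping the encoders, the derivation rules, the selected feature list, and the meta-model as separate arguments mirrors the staged training: each piece can be retrained or swapped without touching the others.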
Fig. 4 shows a possible structure of the electronic device provided in the embodiment of the present application. Referring to fig. 4, the electronic device includes: processor 1, memory 2, and communication interface 3, which are interconnected and communicate with each other by a communication bus 4 and/or other forms of connection mechanisms (not shown).
There may be one or more memories 2 (only one is shown in the figure), each of which may be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), and the like. The processor 1 and possibly other components may access the memory 2 and read and/or write data therein.
There may be one or more processors 1 (only one is shown in the figure), each of which may be an integrated circuit chip with signal processing capability. The processor 1 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a microcontroller unit (Micro Controller Unit, MCU), a network processor (Network Processor, NP), or another conventional processor; it may also be a special-purpose processor, including a neural network processor (Neural-network Processing Unit, NPU), a graphics processor (Graphics Processing Unit, GPU), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Moreover, when there are multiple processors 1, some of them may be general-purpose processors and the others may be special-purpose processors.
There may be one or more communication interfaces 3 (only one is shown in the figure), which may be used for direct or indirect communication with other devices for data interaction. The communication interface 3 may comprise an interface for wired and/or wireless communication.
One or more computer program instructions may be stored in the memory 2, which may be read and executed by the processor 1 to implement the methods provided by the embodiments of the present application.
It will be appreciated that the configuration shown in fig. 4 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 4, or have a different configuration than shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof. The electronic device may be a physical device such as a PC, a notebook, a tablet, a cell phone, a server, an embedded device, etc., or may be a virtual device such as a virtual machine, a virtualized container, etc. The electronic device is not limited to a single device, and may be a combination of a plurality of devices or a cluster of a large number of devices.
The present embodiments also provide a computer readable storage medium having stored thereon computer program instructions that, when read and executed by a processor of a computer, perform the methods provided by the embodiments of the present application. For example, the computer readable storage medium may be implemented as the memory 2 in the electronic device of fig. 4.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is merely a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application, and various modifications and variations may be suggested to one skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of training a user credit scoring model, the user credit scoring model comprising a neural network encoder and a meta-model, the training method comprising:
inputting the high-dimensional data into a trained neural network encoder to obtain a feature vector; wherein the high-dimensional data includes at least one of image data, video data, and text data;
performing rule derivation based on business experience according to the original data to obtain derived features; wherein the original data includes at least one of personal information, device information, credit history, and financial data;
performing feature screening according to all the feature vectors and the derivative features to obtain screened features;
and training the meta-model according to the screened features and the corresponding labels to obtain the trained meta-model.
2. The method of claim 1, further comprising, before inputting the high-dimensional data into the trained neural network encoder: training the neural network encoder.
3. The method of claim 2, wherein training the neural network encoder comprises:
establishing a neural network structure corresponding to the high-dimensional data; the neural network structure comprises a neural network encoder and a neural network prediction head, wherein the neural network encoder is used for generating and outputting corresponding feature vectors according to high-dimensional data, and the neural network prediction head is used for generating and outputting corresponding prediction values according to the feature vectors;
and training the neural network structure according to the high-dimensional data and the corresponding labels to obtain the trained neural network encoder.
4. The method of claim 3, wherein the establishing a neural network structure corresponding to the high-dimensional data comprises:
establishing a corresponding neural network structure for each type of high-dimensional data;
and the training of the neural network structure according to the high-dimensional data and the corresponding labels to obtain the trained neural network encoder comprises:
and respectively training the corresponding neural network structure according to each type of high-dimensional data and the corresponding label to obtain a trained neural network encoder corresponding to each type of high-dimensional data.
5. The method of claim 4, wherein training the neural network structure based on the high-dimensional data and the corresponding labels, resulting in a trained neural network encoder, further comprises:
integrating the neural network structures corresponding to the high-dimensional data of the multiple categories into a neural network overall structure;
and performing multi-modal training on the overall neural network structure using the high-dimensional data of the plurality of categories and the corresponding labels, to obtain a plurality of trained neural network encoders.
6. The method of claim 1, wherein the performing feature screening comprises:
screening features based on predefined criteria; and/or screening features based on model performance.
7. The method of claim 1, wherein the meta-model comprises a meta-model based on a gradient boosting decision tree or a generalized linear regression algorithm.
8. A method of scoring a user credit, comprising:
inputting high-dimensional data in the user data into a trained neural network encoder to obtain an actual feature vector; performing rule derivation based on business experience on the original data in the user data to obtain actual derived features;
performing feature screening according to all the actual feature vectors and the actual derived features to obtain screened actual features;
and inputting the screened actual features into the trained meta-model to obtain an actual score.
9. An electronic device, comprising: a processor and a memory storing machine-readable instructions executable by the processor, which when executed by the processor, perform the method of any of claims 1-8.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, performs the method according to any of claims 1-8.
CN202310474439.7A 2023-04-28 2023-04-28 Training method of user credit scoring model and user credit scoring method Active CN116258579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310474439.7A CN116258579B (en) 2023-04-28 2023-04-28 Training method of user credit scoring model and user credit scoring method

Publications (2)

Publication Number Publication Date
CN116258579A true CN116258579A (en) 2023-06-13
CN116258579B CN116258579B (en) 2023-08-04

Family

ID=86688203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310474439.7A Active CN116258579B (en) 2023-04-28 2023-04-28 Training method of user credit scoring model and user credit scoring method

Country Status (1)

Country Link
CN (1) CN116258579B (en)

Patent Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036571A (en) * 2014-12-08 2018-12-18 20/20基因系统股份有限公司 The method and machine learning system of a possibility that for predicting with cancer or risk
CN105976049A (en) * 2016-04-28 2016-09-28 武汉宝钢华中贸易有限公司 Chaotic neural network-based inventory prediction model and construction method thereof
CN107832718A (en) * 2017-11-13 2018-03-23 重庆工商大学 Finger vena anti false authentication method and system based on self-encoding encoder
CN108364028A (en) * 2018-03-06 2018-08-03 中国科学院信息工程研究所 A kind of internet site automatic classification method based on deep learning
CN108985941A (en) * 2018-07-18 2018-12-11 河海大学 A kind of stock intelligent Forecasting of combination newsletter archive
CN112655004A (en) * 2018-09-05 2021-04-13 赛多利斯司特蒂姆数据分析公司 Computer-implemented method, computer program product, and system for anomaly detection and/or predictive maintenance
CN109615072A (en) * 2018-11-27 2019-04-12 长威信息科技发展股份有限公司 A kind of integrated approach and computer equipment fighting neural network
CN109993412A (en) * 2019-03-01 2019-07-09 百融金融信息服务股份有限公司 The construction method and device of risk evaluation model, storage medium, computer equipment
CN110191113A (en) * 2019-05-24 2019-08-30 新华三信息安全技术有限公司 A kind of user behavior methods of risk assessment and device
CN113553540A (en) * 2020-04-24 2021-10-26 株式会社日立制作所 Commodity sales prediction method
US20210350239A1 (en) * 2020-05-05 2021-11-11 Mitsubishi Electric Research Laboratories, Inc. Non-Uniform Regularization in Artificial Neural Networks for Adaptable Scaling
CN111724083A (en) * 2020-07-21 2020-09-29 腾讯科技(深圳)有限公司 Training method and device for financial risk recognition model, computer equipment and medium
CN114118192A (en) * 2020-09-01 2022-03-01 中国移动通信有限公司研究院 Training method, prediction method, device and storage medium of user prediction model
CN111931520A (en) * 2020-10-16 2020-11-13 北京百度网讯科技有限公司 Training method and device of natural language processing model
CN112270547A (en) * 2020-10-27 2021-01-26 上海淇馥信息技术有限公司 Financial risk assessment method and device based on feature construction and electronic equipment
CN112862593A (en) * 2021-01-28 2021-05-28 深圳前海微众银行股份有限公司 Credit scoring card model training method, device, system and computer storage medium
US20230083437A1 (en) * 2021-08-27 2023-03-16 The Regents Of The University Of California Hyperdimensional learning using variational autoencoder
CN114078050A (en) * 2021-11-17 2022-02-22 中国建设银行股份有限公司 Loan overdue prediction method and device, electronic equipment and computer readable medium
CN114186831A (en) * 2021-11-30 2022-03-15 四川新网银行股份有限公司 Personal credit risk prediction method and system by applying transfer learning
CN114493826A (en) * 2021-12-22 2022-05-13 四川新网银行股份有限公司 Personal credit assessment scoring method based on neural network
CN114298417A (en) * 2021-12-29 2022-04-08 中国银联股份有限公司 Anti-fraud risk assessment method, anti-fraud risk training method, anti-fraud risk assessment device, anti-fraud risk training device and readable storage medium
CN115147155A (en) * 2022-07-05 2022-10-04 西南交通大学 Railway freight customer loss prediction method based on ensemble learning
CN115470844A (en) * 2022-08-30 2022-12-13 广东电网有限责任公司广州供电局 Feature extraction and selection method for multi-source heterogeneous data of power system
CN116012131A (en) * 2022-10-17 2023-04-25 江苏城乡建设职业学院 Method, system, device and medium for evaluating credit risk of user
CN115829722A (en) * 2022-11-30 2023-03-21 中国农业银行股份有限公司 Training method of credit risk scoring model and credit risk scoring method
CN116011071A (en) * 2022-12-28 2023-04-25 华中科技大学 Method and system for analyzing structural reliability of air building machine based on active learning
CN115796173A (en) * 2023-02-20 2023-03-14 杭银消费金融股份有限公司 Data processing method and system for supervision submission requirements

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
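Li Zhenguo (李振国): "Research and Implementation of Android Malicious Code Detection Technology", China Excellent Master's Theses Full-text Database, Information Science and Technology, no. 07, pages 138 - 97 *
Liang Qiaomei (梁巧梅): "Multi-factor Quantitative Research Based on Autoencoder and Stacking Model", China Excellent Master's Theses Full-text Database, Information Science and Technology, no. 02, pages 140 - 131 *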

Also Published As

Publication number Publication date
CN116258579B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN112613501A (en) Information auditing classification model construction method and information auditing method
CN110619568A (en) Risk assessment report generation method, device, equipment and storage medium
CN111861768B (en) Service processing method and device based on artificial intelligence, computer equipment and medium
CN112395979B (en) Image-based health state identification method, device, equipment and storage medium
CN112819604A (en) Personal credit evaluation method and system based on fusion neural network feature mining
CN107807941A (en) Information processing method and device
CN110084609B (en) Transaction fraud behavior deep detection method based on characterization learning
CN115050064A (en) Face living body detection method, device, equipment and medium
Klaas Machine learning for finance: principles and practice for financial insiders
CN105740808A (en) Human face identification method and device
CN114626731A (en) Risk identification method and device, electronic equipment and computer readable storage medium
CN115204886A (en) Account identification method and device, electronic equipment and storage medium
CN114639152A (en) Multi-modal voice interaction method, device, equipment and medium based on face recognition
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN112634017A (en) Remote card opening activation method and device, electronic equipment and computer storage medium
CN116258579B (en) Training method of user credit scoring model and user credit scoring method
CN114818999B (en) Account identification method and system based on self-encoder and generation countermeasure network
CN114973374A (en) Expression-based risk evaluation method, device, equipment and storage medium
CN113781201B (en) Risk assessment method and device for electronic financial activity
CN114202337A (en) Risk identification method, device, equipment and storage medium
CN113240513A (en) Method for determining user credit line and related device
CN116205726B (en) Loan risk prediction method and device, electronic equipment and storage medium
WO2024013939A1 (en) Machine learning program, machine learning method, and information processing device
CN115907961A (en) Credit evaluation system and method based on fusion model
CN111639536A (en) Auxiliary identification method and device for face-examination fraud, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant