CN117495546A - Bad account prediction method and device, electronic equipment and storage medium - Google Patents

Publication number
CN117495546A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311578478.8A
Other languages
Chinese (zh)
Inventor
杜培良
戈汉权
肖勃飞
陈明
何兴凤
石建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongdian Jinxin Digital Technology Group Co ltd
Original Assignee
Zhongdian Jinxin Digital Technology Group Co ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/27: Regression, e.g. linear or logistic regression
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00: Data types
    • G06F2123/02: Data types in the time domain, e.g. time-series data

Abstract

The application provides a bad account prediction method, a bad account prediction device, electronic equipment and a storage medium. The method comprises the following steps: dividing the acquired historical credit data into a training set and a test set, and splitting the training set again to obtain a training subset and a verification subset; inputting the training subset into an initial bad account prediction model to obtain an output result; determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified; performing a performance test on the bad account prediction model to be verified by using the test set, and obtaining a target bad account prediction model if the bad account prediction model to be verified meets the preset performance requirement; and inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model. By splitting the training set again and performing cross-validation, the method determines the target parameters of the initial bad account prediction model and obtains a target bad account prediction model with higher prediction accuracy.

Description

Bad account prediction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of bad account prediction technologies, and in particular, to a method, an apparatus, an electronic device, and a storage medium for bad account prediction.
Background
With the rapid development of the economy, more and more banks and other financial institutions are launching credit products to meet people's financial needs. Credit refers to a form of value movement conditioned on repayment and the payment of interest, and typically includes bank deposits, loans and the like. After a credit product is launched, the financial institution needs to monitor it so as to avoid losses caused by a large number of bad accounts.
In the prior art, logistic regression, decision trees, random forests, support vector machines (SVMs), neural networks, deep learning or ensemble learning methods are used for bad account prediction. However, these methods generally require the financial institution to collect many features; when the features collected by the financial institution are limited, bad account prediction cannot achieve an ideal effect.
Disclosure of Invention
In view of the foregoing, it is an object of the present application to provide a method, an apparatus, an electronic device and a storage medium for bad account prediction, so as to overcome the problems in the prior art.
In a first aspect, an embodiment of the present application provides a method for bad account prediction, where the method includes:
dividing the acquired historical credit data into a training set and a testing set, and splitting the training set again to obtain a training subset and a verification subset; the training subset is used for being input into an initial bad account prediction model to obtain an output result of the initial bad account prediction model;
determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified;
performing performance test on the bad account prediction model to be verified by using the test set, and obtaining a target bad account prediction model if the bad account prediction model to be verified meets the preset performance requirement;
and inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model.
In some technical solutions of the present application, the training set includes a plurality of historical credit data, and the method splits the training set by:
selecting any historical credit data from the training set as the verification subset;
historical credit data in the training set other than the verification subset is taken as a training subset.
In some technical solutions of the present application, the training subset and the verification subset are multiple;
the method obtains a plurality of the verification subsets and the training subsets by:
sequentially taking historical credit data in the training set as the verification subset;
historical credit data in the training set other than the verification subset is taken as a training subset.
In some technical solutions of the present application, selecting the bad account prediction model to be verified from the bad account prediction models to be selected includes:
based on a preset screening criterion, selecting a preliminary bad account prediction model from the bad account prediction models to be selected;
and adjusting and optimizing target parameters of the primary-selection bad account prediction model based on the output results of the training subsets and the verification subsets to obtain the bad account prediction model to be verified.
In some technical solutions of the present application, the initial bad account prediction model is an autoregressive integrated moving average model with fractional-order differencing; the determining the target parameters of the initial bad account prediction model includes:
determining the target number of differences, the target autoregressive order and the target moving average order of the autoregressive integrated moving average model.
In some technical solutions of the present application, the method obtains the bad account prediction model to be verified by:
obtaining a zero-mean parameter sequence by carrying out fractional order difference on the training subset; the zero-mean parameter sequence comprises a regression order to be selected and a moving average order to be selected corresponding to the regression order to be selected;
the regression order to be selected and the moving average order to be selected corresponding to the regression order to be selected are put into an initial bad account prediction model to obtain a bad account prediction model to be selected;
and selecting the bad account prediction model to be verified from the bad account prediction models to be selected.
In some technical solutions of the present application, the above method determines whether the bad account prediction model to be verified meets a preset performance requirement by:
and carrying out residual analysis on the predicted value of the to-be-verified bad account prediction model for predicting the test set and the true value of the test set, and determining whether the to-be-verified bad account prediction model meets the preset performance requirement.
In a second aspect, an embodiment of the present application provides an apparatus for bad account prediction, where the apparatus includes:
the dividing module is used for dividing the acquired historical credit data into a training set and a testing set, and splitting the training set again to obtain a training subset and a verification subset; the training subset is used for being input into an initial bad account prediction model to obtain an output result of the initial bad account prediction model;
the determining module is used for determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified;
the testing module is used for performing performance testing on the bad account prediction model to be verified by using the testing set, and if the bad account prediction model to be verified meets the preset performance requirement, a target bad account prediction model is obtained;
and the prediction module is used for inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor, when executing the computer program, implements the steps of the method of bad account prediction described above.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of bad account prediction described above.
The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:
dividing acquired historical credit data into a training set and a testing set, and splitting the training set again to obtain a training subset and a verification subset; the training subset is used for being input into an initial bad account prediction model to obtain an output result of the initial bad account prediction model; determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified; performing performance test on the bad account prediction model to be verified by using the test set, and obtaining a target bad account prediction model if the bad account prediction model to be verified meets the preset performance requirement; and inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model.
By splitting the training set again and performing cross-validation, the present application determines the target parameters of the initial bad account prediction model and obtains a target bad account prediction model with higher prediction accuracy. Using the target bad account prediction model to predict on the credit data to be predicted can effectively reduce the economic losses of financial institutions such as banks.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for bad account prediction according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a method for splitting a training set according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another method for splitting a training set according to an embodiment of the present application;
FIG. 4 is a schematic diagram of one implementation provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for bad account prediction according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the accompanying drawings in the present application are only for the purpose of illustration and description, and are not intended to limit the protection scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this application, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to the flow diagrams and one or more operations may be removed from the flow diagrams as directed by those skilled in the art.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but not to exclude the addition of other features.
In the credit management field, bad account prediction refers to predicting, by analyzing historical data and building a mathematical model, the likelihood that a customer or borrower will default or fail to repay a debt on time in a future period. Bad account prediction aims to help financial institutions, enterprises and creditors assess risk and develop more effective risk management strategies to reduce potential economic losses. Depending on the data collected by the financial institution, different methods may be used to predict bad accounts, including but not limited to logistic regression, decision trees, random forests, support vector machines (SVMs), neural networks, deep learning or ensemble learning methods. These methods generally require the financial institution to collect many features. When fewer features are available, a time series analysis method may be used instead; a commonly used one is the ARIMA model, that is, the autoregressive integrated moving average model.
Although the ARIMA model achieves good results in application, it still has some drawbacks: it is relatively weak in dealing with long-term memory effects; its ability to fit nonlinear features is limited; multiple differencing is required when the data show a significant trend, seasonality or other non-stationarity; choosing the model order demands considerable expertise and experience; and seasonal variations often have to be handled manually depending on the scenario, for example by introducing seasonal differences, which increases model complexity.
Based on this, the embodiment of the application provides a bad account prediction method, a bad account prediction device, electronic equipment and a storage medium, and the method, the device and the storage medium are described in the following embodiments.
FIG. 1 is a flow chart illustrating a method for bad account prediction according to an embodiment of the present application, where the method includes steps S101-S104. Specifically:
S101, dividing the acquired historical credit data into a training set and a test set, and splitting the training set again to obtain a training subset and a verification subset; the training subset is used for being input into an initial bad account prediction model to obtain an output result of the initial bad account prediction model;
S102, determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified;
S103, performing a performance test on the bad account prediction model to be verified by using the test set, and obtaining a target bad account prediction model if the bad account prediction model to be verified meets preset performance requirements;
S104, inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model.
By splitting the training set again and performing cross-validation, the present application determines the target parameters of the initial bad account prediction model and obtains a target bad account prediction model with higher prediction accuracy. Using the target bad account prediction model to predict on the credit data to be predicted can effectively reduce the economic losses of financial institutions such as banks.
Some embodiments of the present application are described in detail below. The following embodiments and features of the embodiments may be combined with each other without conflict.
In order to predict bad accounts, the embodiment of the application needs to acquire historical credit data. The historical credit data is repayment data (including repayment time, number of repayments, amounts and the like) of certain credit products. The historical credit data is analyzed as a sample in order to predict bad accounts (a bad account refers to the part of an enterprise's accounts receivable that cannot be recovered and has been approved to be written off as a loss).
Specifically, the embodiment of the application divides the acquired historical credit data into a training set and a testing set. The training set is used to build the model and the test set is used to evaluate the model. Both the training set and the test set may contain a plurality of historical credit data. For example, the historical credit data obtained in the embodiment of the application includes a1-a100, and the training set is a1-a80 and the test set is a81-a100 by dividing the historical credit data. In order to ensure the accuracy of parameters in the training model, the embodiment of the application does not directly train the initial bad account prediction model by using the training set, but splits the training set again. Splitting the training set results in a training subset and a verification subset. For example, training set a1-a80 as described above, needs to be split again to split a1-a80 into training and verification subsets.
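The a1-a100 example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_train_test`, the 80/20 ratio as a default, and the placeholder strings standing in for credit records are all hypothetical and not taken from the application. The split keeps the original order, which suits time-ordered repayment data.

```python
def split_train_test(records, train_fraction=0.8):
    """Split an ordered list: the leading part is the training set, the rest the test set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(records) * train_fraction)
    return records[:cut], records[cut:]


# Hypothetical placeholder data standing in for a1..a100.
history = [f"a{i}" for i in range(1, 101)]
train_set, test_set = split_train_test(history)
```

With these placeholder inputs, `train_set` holds a1-a80 and `test_set` holds a81-a100, matching the example in the text.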
In S101, splitting the training set includes the steps shown in FIG. 2:
S201, selecting any historical credit data from the training set as the verification subset;
S202, taking the historical credit data in the training set other than the verification subset as the training subset.
When the training set is split in this way, only one item of historical credit data needs to be split off from the training set. The split-off item is the verification subset, and the remaining historical credit data in the training set is used as the training subset. The training subset is used for training the model, and the verification subset is used for verifying and evaluating the model. For example, for the training set a1-a80 described above, a1 may be removed as the verification subset with a2-a80 as the training subset; a80 may be removed as the verification subset with a1-a79 as the training subset; or a3 may be removed as the verification subset with a1, a2 and a4-a80 as the training subset; and so on.
Further, in order to overcome the problem of few features in the historical credit data, when the training set is split to obtain the training subset and the verification subset, the embodiment of the present application further includes the steps shown in FIG. 3:
S301, sequentially taking each historical credit data in the training set as the verification subset;
S302, taking the historical credit data in the training set other than the verification subset as the training subset.
According to the embodiment of the application, the training set is split a plurality of times, and each split yields one verification subset and the training subset corresponding to it. Each item of historical credit data in the training set is split off in turn; that is, the embodiment of the application adopts a k-fold cross-validation method (holding out a single item at a time is the special case in which k equals the number of items in the training set, also known as leave-one-out cross-validation). For example, for the training set a1-a80 described above, the embodiment sequentially uses a1 as the verification subset and the historical credit data other than a1 as the training subset, then a2 as the verification subset and the historical credit data other than a2 as the training subset, then a3 as the verification subset and the historical credit data other than a3 as the training subset, and so on, until a80 is used as the verification subset and the historical credit data other than a80 as the training subset.
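The repeated splitting of steps S301-S302 can be sketched as a small generator. This is a minimal illustration, not the applicant's implementation: the name `leave_one_out_splits` and the placeholder strings are assumptions made for the example.

```python
def leave_one_out_splits(training_set):
    """Yield (training_subset, verification_subset) pairs, holding out one item per split."""
    for i, held_out in enumerate(training_set):
        training_subset = training_set[:i] + training_set[i + 1:]
        yield training_subset, [held_out]


# Hypothetical placeholder data standing in for the training set a1..a80.
training_set = [f"a{i}" for i in range(1, 81)]
splits = list(leave_one_out_splits(training_set))
```

Each of the 80 splits pairs a single-item verification subset with the 79 remaining items, so every item is used for verification exactly once.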
After the data partitioning is completed, the bad account prediction model needs to be trained and verified by using the data. In the embodiment of the application, an ARIMA (AutoRegressive Integrated Moving Average) model is selected as the bad account prediction model. The ARIMA model is a statistical model commonly used for time series analysis and prediction. It combines the concepts of autoregression (AR) and moving average (MA) with a differencing (Integrated) operation in order to capture trends, seasonality and random fluctuations in time series data.
AR (AutoRegressive) part: the autoregressive portion of the ARIMA model refers to the relationship between the current observations and some of the past observations. The autoregressive model takes into account the linear correlation between the current value and the past value, expressed as a linear combination with respect to the past value.
MA (Moving Average) part: the moving average part refers to the relationship between the current observations and some random error term in the past (white noise). The moving average model takes into account the effect of random fluctuations on the current observations.
Differencing (Integrated) operation: differencing refers to taking differences of the raw time series data to eliminate possible trends or seasonality. Differencing can convert a non-stationary time series into a stationary one, making the data more suitable for the ARIMA model.
An ARIMA model is specified by three parameters, p, d and q, which represent the autoregressive order, the number of differences and the moving average order, respectively. By choosing appropriate p, d and q values, ARIMA models of different orders can be constructed for describing and predicting time series data.
Specifically, the ARIMA model is:
$$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \epsilon_t - \theta_1 \epsilon_{t-1} - \theta_2 \epsilon_{t-2} - \dots - \theta_q \epsilon_{t-q}$$
wherein $Y_t$ is the time series data requiring prediction, and $\phi_0, \dots, \phi_p$ are parameters of the AR model that describe the relationship between the current value and the values at the past p time points. $\theta_1, \dots, \theta_q$ are parameters of the MA model that describe the relationship between the current value and the errors at the past q time points. $\epsilon_t$ is a white noise sequence, and $\phi_p \neq 0$, $\theta_q \neq 0$.
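To make the roles of the AR and MA terms concrete, the following sketch evaluates the model equation for a one-step-ahead prediction. It assumes the sign convention given above (MA terms subtracted) and sets the current white noise term to its mean of zero; the function name `arma_one_step` and the numeric inputs are hypothetical.

```python
def arma_one_step(phi0, phi, theta, past_values, past_errors):
    """Expected Y_t given the past, with the current noise term set to zero.

    phi = [phi_1, ..., phi_p], theta = [theta_1, ..., theta_q];
    past_values[0] is Y_{t-1}, past_values[1] is Y_{t-2}, and so on;
    past_errors[0] is eps_{t-1}, past_errors[1] is eps_{t-2}, and so on.
    """
    ar_part = sum(coef * y for coef, y in zip(phi, past_values))
    ma_part = sum(coef * e for coef, e in zip(theta, past_errors))
    return phi0 + ar_part - ma_part


# Hypothetical ARMA(1, 1) coefficients and history, for illustration only.
prediction = arma_one_step(phi0=1.0, phi=[0.5], theta=[0.2],
                           past_values=[4.0], past_errors=[1.0])
```

Here the prediction combines the constant, half of the previous value, and a correction of 0.2 times the previous error.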
After the training subset is obtained, it is input into the initial bad account prediction model (the initial ARIMA model), and the initial bad account prediction model processes the training subset to produce an output result. The embodiment of the application then determines the target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain the bad account prediction model to be verified. That is, the training subset and the verification subset are used to select appropriate target parameters (including the target number of differences, the target autoregressive order and the target moving average order) for the initial ARIMA model, and the initial ARIMA model with these target parameters is taken as the ARIMA model to be verified.
In an alternative embodiment, the training subset is differenced before being input into the initial bad account prediction model; notably, the embodiment of the present application uses fractional-order differencing.
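Specifically, the order d of the difference is determined by analyzing the long-term memory factors of each item of historical credit data in the training subset. The order of the fractional-order difference in the embodiment of the application is not limited to an integer and is usually a real number between -0.5 and 0.5. With B denoting the backshift operator, the fractional-order difference operator takes the standard binomial-series form:
$$(1-B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k, \qquad \binom{d}{k} = \frac{\Gamma(d+1)}{\Gamma(k+1)\,\Gamma(d-k+1)}$$
wherein $\binom{d}{k}$ represents the binomial coefficients, $\Gamma(\cdot)$ is the gamma function, B is the backshift operator applied to the historical credit data ($B Y_t = Y_{t-1}$), and d is the order of the difference.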
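A minimal sketch of fractional-order differencing follows. It assumes the standard binomial expansion of $(1-B)^d$, whose coefficients $w_k$ satisfy $w_0 = 1$ and $w_k = w_{k-1}(k-1-d)/k$, and it truncates the infinite sum at the start of the series. The function names and sample values are hypothetical, and how the application itself truncates the sum is not stated.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients w_k of B^k in the expansion of (1 - B)^d, for k = 0..n_weights-1."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply the truncated operator: x_t = sum over k = 0..t of w_k * y_{t-k}."""
    w = frac_diff_weights(d, len(series))
    return [sum(w[k] * series[t - k] for k in range(t + 1))
            for t in range(len(series))]


# Hypothetical series; d = 0.4 lies in the (-0.5, 0.5) range mentioned in the text.
fractional = frac_diff([1.0, 3.0, 6.0, 10.0], 0.4)
# Sanity check: d = 1 reduces to ordinary first differencing (first value kept).
first_difference = frac_diff([1.0, 3.0, 6.0, 10.0], 1)
```

For non-integer d the weights decay slowly instead of cutting off, which is how the operator retains long-term memory that an integer difference would discard.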
A zero-mean parameter sequence (an ARMA(p, q) sequence) is obtained by carrying out fractional-order differencing on the training subset; the zero-mean parameter sequence comprises regression orders to be selected and, for each regression order to be selected, a corresponding moving average order to be selected. The regression order to be selected and the moving average order to be selected contained in the zero-mean parameter sequence are substituted into the initial bad account prediction model to obtain a corresponding bad account prediction model to be selected.
The bad account prediction models to be selected then need to be screened, so that a bad account prediction model to be verified with better parameters is selected from them. The embodiment of the application mainly performs two screening steps: the first screening is performed based on preset screening rules, and the second is performed by parameter adjustment and optimization based on the output results of the training subsets and on the verification subsets. A preliminary bad account prediction model is obtained through the first screening, and its parameters are then adjusted and optimized to obtain the bad account prediction model to be verified.
For the first screening: the preset screening rules include the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The lower the AIC and BIC values, the better the bad account prediction model.
The Akaike information criterion measures the quality of the bad account prediction model according to its goodness of fit, its simplicity and its degree of dependence on the data, and the formula is as follows:
AIC=-2*log(L)+2k
wherein L is the maximized value of the likelihood function of the bad account prediction model, which measures how well the bad account prediction model fits the data; k is the number of parameters of the bad account prediction model.
The Bayesian information criterion builds on AIC by introducing n, the number of samples used to fit the bad account prediction model, into the penalty term, and the formula is as follows:
BIC=-2*log(L)+k*log(n)
wherein L is the maximized value of the likelihood function of the bad account prediction model, which measures how well the bad account prediction model fits the data; k is the number of parameters of the bad account prediction model; n is the number of historical credit samples.
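As a minimal sketch of the first screening, the two formulas above can be evaluated for each model to be selected, and the model with the lowest value kept. The log-likelihood values, parameter counts and sample size below are hypothetical placeholders, not results from the application.

```python
import math


def aic(log_likelihood, k):
    """AIC = -2*log(L) + 2k, where log_likelihood is log(L)."""
    return -2.0 * log_likelihood + 2.0 * k


def bic(log_likelihood, k, n):
    """BIC = -2*log(L) + k*log(n), with n the number of samples."""
    return -2.0 * log_likelihood + k * math.log(n)


# Hypothetical candidates: (p, q) -> (log-likelihood, number of parameters).
candidates = {
    (1, 1): (-120.5, 3),
    (2, 1): (-119.8, 4),
    (2, 2): (-119.7, 5),
}
n_samples = 100  # hypothetical number of historical credit samples

# Keep the candidate order with the lowest criterion value.
best_by_aic = min(candidates, key=lambda pq: aic(*candidates[pq]))
best_by_bic = min(candidates, key=lambda pq: bic(*candidates[pq], n_samples))
```

Because log(n) exceeds 2 once n is larger than about 7, BIC penalizes additional parameters more heavily than AIC for most practical sample sizes, so the two criteria may disagree on the preferred orders.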
For the second screening: the initial bad account prediction model is trained with each of the split training subsets. As shown in fig. 4, the training set includes Split1, Split2, Split3 and Split4. In the first round, Split1 is held out as the verification subset, Split2, Split3 and Split4 serve as the training subsets, training is performed once, and the trained bad account prediction model is verified with Split1. In the second round, Split2 is held out as the verification subset, Split1, Split3 and Split4 serve as the training subsets, training is performed once, and the trained bad account prediction model is verified with Split2. In the third round, Split3 is held out as the verification subset, Split1, Split2 and Split4 serve as the training subsets, training is performed once, and the trained bad account prediction model is verified with Split3. In the fourth round, Split4 is held out as the verification subset, Split1, Split2 and Split3 serve as the training subsets, training is performed once, and the trained bad account prediction model is verified with Split4. The parameters of the preliminary bad account prediction model are adjusted through these training and verification rounds to obtain the adjusted bad account prediction model to be verified.
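The rotation of Split1 to Split4 described above can be sketched as a small generator. The function name `rotating_splits` and the sample values are hypothetical; model training and verification are left out because the application does not fix a particular implementation.

```python
def rotating_splits(splits):
    """Yield (training_data, verification_data) pairs.

    In round i, splits[i] is held out as the verification subset and
    the remaining splits are concatenated into the training data.
    """
    for i, held_out in enumerate(splits):
        training = [
            row
            for j, split in enumerate(splits)
            if j != i
            for row in split
        ]
        yield training, held_out


# Hypothetical training set divided into Split1..Split4.
splits = [[0.020, 0.021], [0.022, 0.024], [0.023, 0.026], [0.027, 0.029]]
rounds = list(rotating_splits(splits))
```

Each of the four rounds produces one verification score for a given parameter setting; averaging the scores over the rounds is one common way to compare settings, although the application does not state how the scores are combined.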
In S103, after the bad account prediction model to be verified is obtained, the embodiment of the present application further tests and verifies its performance. If the performance of the bad account prediction model to be verified does not meet the preset performance requirement, training must be performed again (i.e., steps S101-S103 above are repeated). If the bad account prediction model to be verified meets the preset performance requirement, a target bad account prediction model that can be used for actual prediction is obtained.
The performance test of the bad account prediction model to be verified mainly calculates the deviation between the predicted values of the bad account prediction model to be verified and the actual values. If the deviation is greater than a preset deviation threshold, the bad account prediction model to be verified does not meet the preset performance requirement; if the deviation is less than or equal to the preset deviation threshold, the bad account prediction model to be verified meets the preset performance requirement.
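The threshold comparison can be sketched as follows. The application does not name a specific deviation measure at this point, so the mean absolute deviation used here is an assumption, as are the function names, sample values and thresholds.

```python
def mean_absolute_deviation(predicted, actual):
    """Average absolute difference between predictions and actual values.

    Assumption: the source does not specify the deviation measure.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def meets_performance_requirement(predicted, actual, threshold):
    """True if the deviation is less than or equal to the preset threshold."""
    return mean_absolute_deviation(predicted, actual) <= threshold


# Hypothetical test-set values and thresholds.
predicted = [0.10, 0.20, 0.30]
actual = [0.12, 0.18, 0.33]
passes_loose = meets_performance_requirement(predicted, actual, threshold=0.05)
passes_strict = meets_performance_requirement(predicted, actual, threshold=0.01)
```

A model that fails the check would be sent back for retraining, which corresponds to repeating steps S101-S103.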
Specifically, in the embodiment of the application, the deviation of the bad account prediction model to be verified is analyzed by residual analysis between the predicted values and the true values. Residual analysis is a method for checking the quality of a time series model fit; it helps to check whether the model captures the structure in the data and whether any pattern remains uncaptured. The residual is the difference between the actual observed value and the model's predicted value, and residual analysis aims to check whether the residuals exhibit randomness, stationarity, independence and normality. The steps are approximately as follows:
S401, calculating residuals: subtracting the model's predicted values for the observed data from the actual observed values to obtain a residual sequence.
S402, drawing a residual plot: plotting the residuals over time, and checking whether the residuals show an obvious trend, periodicity or anomalies.
S403, drawing an autocorrelation plot and a partial autocorrelation plot, and checking the correlation between residuals through the plots to further determine whether information remains that the model has not captured.
S404, a statistical test, such as the Ljung-Box test, is used to check the independence and randomness of the residual sequence.
S405, a normality test (such as the Shapiro-Wilk test) is used to check whether the residual sequence approximately follows a normal distribution.
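Steps S401, S403 and S404 can be sketched in plain Python. The sketch computes the residual sequence, the sample autocorrelation at a given lag, and the Ljung-Box Q statistic Q = n(n+2) * sum over k of r_k^2 / (n-k); the function names and sample data are hypothetical. Converting Q to a p-value requires the chi-square distribution and is omitted here, as are the plots of S402 and the normality test of S405.

```python
def residuals(actual, predicted):
    """S401: residual = actual observed value - model predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag (used in S403)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum(
        (series[t] - mean) * (series[t - lag] - mean)
        for t in range(lag, n)
    )
    return num / denom


def ljung_box_q(series, max_lag):
    """S404: Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# Hypothetical residuals that alternate in sign, i.e. clearly not random.
patterned = [1.0, -1.0] * 10
q_stat = ljung_box_q(patterned, max_lag=1)
```

A large Q relative to the chi-square critical value for the chosen number of lags suggests that the residuals are still autocorrelated, meaning the model has left some structure in the data uncaptured.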
The method in the embodiment of the application has at least the following beneficial effects:
More accurate predictions: the fractional order ARIMA model is better able to capture long-term memory and non-linear features, so in bad account prediction it can generally provide more accurate prediction results. This helps financial institutions, businesses and the like make better-informed risk management decisions.
More stable predictions: the fractional order ARIMA model enables more stable predictions when dealing with non-stationary data. By introducing fractional-order differencing, the model can better cope with the instability of the data, thereby reducing the instability of the model.
Stronger adaptability: the fractional order ARIMA model has higher flexibility and is suitable for a wider range of data patterns. It can adapt to different data distributions, trends, seasonality and periodicity, so the model is more adaptive and can better cope with different types of account age data.
Fig. 5 shows a schematic structural diagram of an apparatus for bad account prediction according to an embodiment of the present application, where the apparatus includes:
the dividing module is used for dividing the acquired historical credit data into a training set and a testing set, and splitting the training set again to obtain a training subset and a verification subset; the training subset is used for being input into an initial bad account prediction model to obtain an output result of the initial bad account prediction model;
the determining module is used for determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified;
the testing module is used for performing performance testing on the bad account prediction model to be verified by using the testing set, and if the bad account prediction model to be verified meets the preset performance requirement, a target bad account prediction model is obtained;
and the prediction module is used for inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model.
The training set includes a plurality of historical credit data, and the device splits the training set by:
selecting any historical credit data from the training set as the verification subset;
historical credit data in the training set other than the verification subset is taken as a training subset.
There are a plurality of training subsets and a plurality of verification subsets;
the apparatus obtains a plurality of the verification subsets and the training subsets by:
sequentially taking historical credit data in the training set as the verification subset;
historical credit data in the training set other than the verification subset is taken as a training subset.
The selecting the bad account prediction model to be verified from the bad account prediction models to be selected comprises the following steps:
based on a preset screening criterion, selecting a preliminary bad account prediction model from the bad account prediction models to be selected;
and adjusting and optimizing target parameters of the primary-selection bad account prediction model based on the output results of the training subsets and the verification subsets to obtain the bad account prediction model to be verified.
The initial bad account prediction model is an autoregressive moving average model including fractional-order differencing; the determining the target parameters of the initial bad account prediction model includes:
and determining the target difference times, the target autoregressive order and the target moving average order of the autoregressive moving average model.
The device obtains the bad account prediction model to be verified by the following method:
obtaining a zero-mean parameter sequence by carrying out fractional order difference on the training subset; the zero-mean parameter sequence comprises a regression order to be selected and a moving average order to be selected corresponding to the regression order to be selected;
the regression order to be selected and the moving average order to be selected corresponding to the regression order to be selected are put into an initial bad account prediction model to obtain a bad account prediction model to be selected;
and selecting the bad account prediction model to be verified from the bad account prediction models to be selected.
The device determines whether the bad account prediction model to be verified meets preset performance requirements by the following method:
and carrying out residual analysis on the predicted value of the to-be-verified bad account prediction model for predicting the test set and the true value of the test set, and determining whether the to-be-verified bad account prediction model meets the preset performance requirement.
As shown in fig. 6, an embodiment of the present application provides an electronic device, for performing a method for bad account prediction in the present application, where the device includes a memory, a processor, a bus, and a computer program stored on the memory and capable of running on the processor, where the processor implements steps of the method for bad account prediction when executing the computer program.
In particular, the above memory and processor may be general-purpose memory and processor, which are not limited herein, and the method of bad account prediction described above can be performed when the processor runs a computer program stored in the memory.
Corresponding to the method of bad account prediction in the present application, the embodiments of the present application further provide a computer readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the method of bad account prediction described above.
In particular, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, or the like, and the computer program on the storage medium, when executed, performs the method of bad account prediction described above.
In the embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. The system embodiments described above are merely illustrative, e.g., the division of the elements is merely a logical functional division, and there may be additional divisions in actual implementation, and e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, system or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the foregoing examples are merely specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art will appreciate that any person skilled in the art may, within the technical scope disclosed in the present application, still modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of bad account prediction, the method comprising:
dividing the acquired historical credit data into a training set and a testing set, and splitting the training set again to obtain a training subset and a verification subset; the training subset is used for being input into an initial bad account prediction model to obtain an output result of the initial bad account prediction model;
determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified;
performing performance test on the bad account prediction model to be verified by using the test set, and obtaining a target bad account prediction model if the bad account prediction model to be verified meets the preset performance requirement;
and inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model.
2. The method of claim 1, wherein the training set includes a plurality of historical credit data, the method splitting the training set by:
selecting any historical credit data from the training set as the verification subset;
historical credit data in the training set other than the verification subset is taken as a training subset.
3. The method of claim 2, wherein the training subset and the verification subset are each a plurality;
the method obtains a plurality of the verification subsets and the training subsets by:
sequentially taking historical credit data in the training set as the verification subset;
historical credit data in the training set other than the verification subset is taken as a training subset.
4. The method of claim 3, wherein the selecting the bad account prediction model to be verified from the bad account prediction models to be selected comprises:
based on a preset screening criterion, selecting a preliminary bad account prediction model from the bad account prediction models to be selected;
and adjusting and optimizing target parameters of the primary-selection bad account prediction model based on the output results of the training subsets and the verification subsets to obtain the bad account prediction model to be verified.
5. The method of claim 1, wherein the initial bad account prediction model is an autoregressive moving average model including fractional-order differencing; the determining the target parameters of the initial bad account prediction model includes:
and determining the target difference times, the target autoregressive order and the target moving average order of the autoregressive moving average model.
6. The method of claim 5, wherein the method obtains the bad account prediction model to be verified by:
obtaining a zero-mean parameter sequence by carrying out fractional order difference on the training subset; the zero-mean parameter sequence comprises a regression order to be selected and a moving average order to be selected corresponding to the regression order to be selected;
the regression order to be selected and the moving average order to be selected corresponding to the regression order to be selected are put into an initial bad account prediction model to obtain a bad account prediction model to be selected;
and selecting the bad account prediction model to be verified from the bad account prediction models to be selected.
7. The method of claim 1, wherein the method determines whether the bad account prediction model to be verified meets a preset performance requirement by:
and carrying out residual analysis on the predicted value of the to-be-verified bad account prediction model for predicting the test set and the true value of the test set, and determining whether the to-be-verified bad account prediction model meets the preset performance requirement.
8. An apparatus for bad account prediction, the apparatus comprising:
the dividing module is used for dividing the acquired historical credit data into a training set and a testing set, and splitting the training set again to obtain a training subset and a verification subset; the training subset is used for being input into an initial bad account prediction model to obtain an output result of the initial bad account prediction model;
the determining module is used for determining target parameters of the initial bad account prediction model based on the output result and the verification subset to obtain a bad account prediction model to be verified;
the testing module is used for performing performance testing on the bad account prediction model to be verified by using the testing set, and if the bad account prediction model to be verified meets the preset performance requirement, a target bad account prediction model is obtained;
and the prediction module is used for inputting the credit data to be predicted into the target bad account prediction model to obtain a bad account prediction result output by the target bad account prediction model.
9. An electronic device, comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the method of bad account prediction according to any of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the method of bad account prediction according to any of claims 1 to 7.
CN202311578478.8A 2023-11-23 2023-11-23 Bad account prediction method and device, electronic equipment and storage medium Pending CN117495546A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311578478.8A CN117495546A (en) 2023-11-23 2023-11-23 Bad account prediction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311578478.8A CN117495546A (en) 2023-11-23 2023-11-23 Bad account prediction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117495546A true CN117495546A (en) 2024-02-02

Family

ID=89678157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311578478.8A Pending CN117495546A (en) 2023-11-23 2023-11-23 Bad account prediction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117495546A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination