CN114239981A - Asset level prediction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114239981A
Authority
CN
China
Prior art keywords
asset level
data
prediction
asset
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111575934.4A
Other languages
Chinese (zh)
Inventor
陈庆麟
陈婷
吴三平
庄伟亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN202111575934.4A
Publication of CN114239981A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Technology Law (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an asset level prediction method, device, equipment and storage medium, and belongs to the technical field of machine learning. In the method, target data are obtained and input into a pre-constructed asset level prediction model to predict the missing asset levels of users, yielding corresponding target asset level scores. The target asset level scores are sorted to obtain a sorting result, the sorting result is input into a prediction curve to obtain the corresponding predicted asset level, and the predicted asset level is then restored to obtain the target asset level. Fitting a curve repairs the regression effect introduced by the asset level prediction model, so that the asset level distribution predicted by the model is basically consistent with the actual asset level distribution and the prediction quality for users with high and low asset levels is preserved.

Description

Asset level prediction method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of machine learning, in particular to an asset level prediction method, device, equipment and storage medium.
Background
In the field of financial risk control, a user's asset level, reflected in indicators such as income, salary and credit, is an important assessment indicator. However, asset level indicators are missing for most users and need to be filled in. The general practice in the industry is to train a machine-learning regression model on selected business samples and then predict asset level scores for the overall user base, thereby filling in the asset level indicators of the users for whom they are missing. Such models generally exhibit a regression effect: the distribution of the predicted asset level scores deviates from the distribution of users' actual asset levels, the model scores cluster near the mean, and predictions for low-asset-level and high-asset-level groups are poor, which hinders use by upper-layer services.
Existing technical schemes address the poor prediction for low-asset-level and high-asset-level groups in two main ways. The first works on the model: a better-performing machine learning method is used to improve the overall model effect. The second repairs the distribution by applying dimension balancing or standardization to the prediction scores.
However, actual production data are unique, so directly applying a model does not achieve the best effect and the regression phenomenon still occurs. In addition, dimension balancing or standardization can only translate or stretch the asset level prediction model scores as a whole; it cannot change the overall shape of the distribution, so it cannot fully eliminate the regression effect and does not improve the prediction effect well.
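The limitation of translating or stretching scores can be made precise with a short derivation. The notation is introduced here purely for illustration and does not appear in the original text: S is the model score, Y the actual asset level, F_S and F_Y their cumulative distribution functions, and Q_S and Q_Y the corresponding quantile functions. F_S is assumed to be continuous.

```latex
% Standardization or rescaling is an affine map with a > 0:
S' = aS + b \quad\Longrightarrow\quad Q_{S'}(p) = a\,Q_S(p) + b, \qquad 0 < p < 1 .
% Every quantile is shifted and stretched identically, so the shape is unchanged, e.g. the skewness:
\frac{\mathbb{E}\big[(S'-\mu_{S'})^{3}\big]}{\sigma_{S'}^{3}}
  = \frac{\mathbb{E}\big[(S-\mu_{S})^{3}\big]}{\sigma_{S}^{3}} .
% A rank-based (quantile) mapping, by contrast, reproduces the target distribution:
S'' = Q_Y\big(F_S(S)\big) \quad\Longrightarrow\quad \Pr(S'' \le y) = F_Y(y) .
```

Under these assumptions, an affine correction can match the mean and variance of the actual distribution but not its shape, whereas a mapping through ranks can in principle match the whole distribution.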
Disclosure of Invention
The main object of the invention is to provide an asset level prediction method, device, equipment and storage medium, so as to solve the problem that existing asset level prediction has low accuracy.
To achieve the above object, the present invention provides an asset level prediction method, comprising the steps of:
acquiring target data;
inputting the target data into a pre-constructed asset level prediction model to obtain corresponding target asset level scores, and sorting the target asset level scores to obtain a corresponding sorting result;
inputting the sorting result into a pre-fitted prediction curve to obtain a predicted asset level corresponding to the prediction curve;
and restoring the predicted asset level to obtain a target asset level.
Optionally, before the acquiring the target data, the method further includes:
acquiring sample asset level data and sample characteristics;
constructing the asset level prediction model from the sample asset level data and the sample features.
Optionally, the step of constructing an asset level prediction model from the sample asset levels and the sample features comprises:
performing first preprocessing on the sample asset level data to obtain first asset level data;
constructing an initial asset level prediction model based on the first asset level data and the sample features;
inputting the first asset level data and the sample characteristics into the initial asset level prediction model to obtain an asset level prediction value;
determining a model loss function from the first asset level data, optimizing the initial asset level prediction model through the model loss function, calculating an evaluation index of the initial asset level prediction model, and determining the asset level prediction model when the evaluation index reaches a preset value.
Optionally, before the acquiring the target data, the method further includes:
obtaining sample asset level data;
fitting a prediction curve according to the sample asset level data.
Optionally, the step of obtaining sample asset level data comprises:
acquiring initial asset level data;
and performing data cleaning on the initial asset level data to obtain sample asset level data.
Optionally, the step of fitting a prediction curve from the sample asset level data comprises:
performing second preprocessing on the sample asset level data to obtain second asset level data;
sorting the second asset level data, and calculating quantiles corresponding to the second asset level data;
fitting a prediction curve based on the quantile and the second asset level data.
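The curve-fitting steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not state the functional form of the prediction curve, so a piecewise-linear curve through evenly spaced empirical quantiles is assumed, and ln is assumed as the second preprocessing. The function name `fit_quantile_curve` and the parameter `n_knots` are hypothetical.

```python
import math


def fit_quantile_curve(sample_levels, n_knots=21):
    """Return (quantile, ln(level)) knots of an empirical quantile curve.

    Assumptions (not from the source): ln as the second preprocessing and
    linear interpolation between order statistics.
    """
    if n_knots < 2:
        raise ValueError("n_knots must be at least 2")
    # Second preprocessing followed by sorting, as in the steps above.
    values = sorted(math.log(v) for v in sample_levels if v > 0)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two positive sample values")
    knots = []
    for k in range(n_knots):
        q = k / (n_knots - 1)      # quantile in [0, 1]
        pos = q * (n - 1)          # fractional index into the sorted values
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        knots.append((q, values[lo] * (1 - frac) + values[hi] * frac))
    return knots


# Hypothetical sample of monthly incomes.
curve = fit_quantile_curve([1000, 2000, 4000, 8000, 16000], n_knots=5)
```

Each knot pairs a population fraction with the preprocessed asset level at that fraction; a smoother parametric fit could be substituted without changing the surrounding steps.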
Optionally, the step of sorting the sample asset level data comprises:
dividing the sample asset level data into a preset number of parts, and sorting each part of the sample asset level data to obtain a local sorting result corresponding to each part;
and merging the local sorting results.
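The split, local sort and merge steps above can be illustrated with the Python standard library. This is a sketch under the assumption that the parts are contiguous, roughly equal slices; the function name `chunked_sort` and the parameter `n_parts` are hypothetical.

```python
import heapq


def chunked_sort(values, n_parts):
    """Sort by splitting into parts, sorting each part, then merging."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Ceiling division gives the size of each part (at least 1).
    size = max(1, -(-len(values) // n_parts))
    # Local sorting result for each part.
    parts = [sorted(values[i:i + size]) for i in range(0, len(values), size)]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*parts))


merged = chunked_sort([5, 3, 9, 1, 7, 2, 8], n_parts=3)
```

Because each part is sorted independently, the local sorts could be run in parallel or on separate machines before the final merge.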
Further, to achieve the above object, the present invention provides an asset level prediction apparatus comprising:
the acquisition module is used for acquiring target data;
the prediction model module is used for inputting the target data into a pre-constructed asset level prediction model to obtain corresponding target asset level scores, and sorting the target asset level scores to obtain a corresponding sorting result;
the prediction module is used for inputting the sorting result into a pre-fitted prediction curve to obtain a predicted asset level corresponding to the prediction curve;
and the restoration module is used for restoring the predicted asset level to obtain a target asset level.
Optionally, the obtaining module is further configured to:
acquiring sample asset level data and sample characteristics;
constructing the asset level prediction model from the sample asset level data and the sample features.
Optionally, the obtaining module is further configured to:
performing first preprocessing on the sample asset level data to obtain first asset level data;
constructing an initial asset level prediction model based on the first asset level data and the sample features;
inputting the first asset level data and the sample characteristics into the initial asset level prediction model to obtain an asset level prediction value;
determining a model loss function from the first asset level data, optimizing the initial asset level prediction model through the model loss function, calculating an evaluation index of the initial asset level prediction model, and determining the asset level prediction model when the evaluation index reaches a preset value.
Optionally, the obtaining module is further configured to:
obtaining sample asset level data;
fitting a prediction curve according to the sample asset level data.
Optionally, the obtaining module is further configured to:
acquiring initial asset level data;
and performing data cleaning on the initial asset level data to obtain sample asset level data.
Optionally, the obtaining module is further configured to:
performing second preprocessing on the sample asset level data to obtain second asset level data;
sorting the second asset level data, and calculating quantiles corresponding to the second asset level data;
fitting a prediction curve based on the quantile and the second asset level data.
Optionally, the obtaining module is further configured to:
dividing the sample asset level data into a preset number of parts, and sorting each part of the sample asset level data to obtain a local sorting result corresponding to each part;
and merging the local sorting results.
Further, to achieve the above object, the present invention also provides an asset level prediction apparatus comprising: a memory, a processor, and an asset level prediction program stored on the memory and executable on the processor, the asset level prediction program configured to implement the steps of the asset level prediction method as described above.
Further, to achieve the above object, the present invention also provides a storage medium having stored thereon an asset level prediction program which, when executed by a processor, implements the steps of the asset level prediction method as described above.
According to the asset level prediction method, device, equipment and storage medium provided by the embodiments of the invention, sample asset level data and sample features are obtained, a prediction curve is fitted from the sample asset level data, and an asset level prediction model is built from the sample asset level data and the sample features. The asset level prediction model predicts on the obtained target data to produce corresponding target asset level scores; the scores are sorted to obtain a sorting result; the sorting result is input into the prediction curve to obtain the corresponding predicted asset level; and the predicted asset level is restored to obtain the target asset level. In this way, an asset level prediction model is built and its accuracy is optimized through the loss function, the asset level of the target user is predicted with the model, the asset levels corresponding to users at different population proportions are obtained from the fitted curve, and the regression effect of the model is repaired. The result is a target asset level that describes the user's asset level more realistically, which improves the accuracy of asset level prediction.
Drawings
FIG. 1 is a block diagram of an asset level prediction device for a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram illustrating an embodiment of an asset level prediction method of the present invention;
FIG. 3 is a Lorenz distribution curve;
FIG. 4 is a technical flow diagram of one embodiment of an asset level prediction method of the present invention;
FIG. 5 is a functional block diagram of an embodiment of an asset level prediction method of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of an asset level prediction device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the asset level prediction device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not limit the asset level prediction device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a data storage module, a network communication module, a user interface module, and an asset level prediction program.
In the asset level prediction device shown in fig. 1, the network interface 1004 is mainly used for data communication with other devices, and the user interface 1003 is mainly used for data interaction with a user. The processor 1001 and the memory 1005 may be provided in the asset level prediction device, which calls the asset level prediction program stored in the memory 1005 through the processor 1001 and performs the asset level prediction method provided by the embodiment of the present invention.
An embodiment of the present invention provides an asset level prediction method, and referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of an asset level prediction method according to the present invention.
In this embodiment, the asset level prediction method includes:
step S10, acquiring target data;
step S20, inputting the target data into a pre-constructed asset level prediction model to obtain corresponding target asset level scores, and sorting the target asset level scores to obtain a corresponding sorting result;
step S30, inputting the sorting result into a pre-fitted prediction curve to obtain a predicted asset level corresponding to the prediction curve;
and step S40, restoring the predicted asset level to obtain a target asset level.
The asset level prediction method is applied to asset level prediction equipment of financial institutions or financial risk control institutions, such as bank systems. In this embodiment, the asset level, i.e. the asset rating of the user, is data that must be consulted during financial risk control. When performing financial risk control, business personnel evaluate a user's financial condition from the user's assets in order to provide better service. The asset level includes the user's salary income, credit card limit, accumulation level and the like, but real asset level indicators are missing for most users. For example, user A may have only accumulation level data and no other asset information, so other asset indicators of user A are needed to characterize the user's asset level more completely. To obtain them, a machine-learning regression model can be trained on selected sample data and then used to predict the target user's asset level score from user features, thereby filling in the missing asset level indicators.
To solve the problem of missing asset levels, a target asset level score of the target user is obtained through a pre-constructed asset level prediction model, so that the predicted asset level score is as close as possible to the actual score. The sorting result of the target asset level scores is then input into a pre-fitted prediction curve to repair the regression effect introduced by the asset level prediction model, so that the predicted asset level distribution is basically consistent with the actual asset level distribution and the prediction quality for users with high and low asset levels is preserved. Business services can then build rules and indicators on the actual asset level distribution, which ensures their consistency and stability.
The respective steps will be described in detail below:
step S10, acquiring target data;
In one embodiment, target data to be predicted is obtained. It can be understood that, to predict a missing asset level, the target data of the user whose asset level is missing must be obtained. The target data provides the user's basic information and asset information, such as the user's name, gender, age, income, occupation, number, etc.
Step S20, inputting the target data into a pre-constructed asset level prediction model to obtain corresponding target asset level scores, and sorting the target asset level scores to obtain a corresponding sorting result;
In one embodiment, the target data is input into a pre-constructed asset level prediction model, which predicts the asset level to obtain a corresponding target asset level score, and the target asset level scores are sorted to obtain a sorting result. Specifically, the asset level of the target user is the Y value to be predicted: the target data is input into the asset level prediction model to obtain the corresponding target asset level scores, the scores are sorted in ascending order, and the position of each score is expressed as a percentage of the total, which serves as the input to the fitted curve. It will be appreciated that the asset level scores fitted by the model tend to be relatively close to the actual asset levels and preserve their order well, but the values cluster around the mean. For example, after monthly incomes below 1000 and above 20000 are filtered out of the target data and the remaining values are rounded into segments 1-20, the score distribution predicted by the model concentrates in segments 4-6 and deviates considerably from the actual distribution. The cumulative population distribution of the asset level also deviates from reality. For example, when the ordinate represents the asset level segment and the abscissa represents the percentile, the actual asset level segments span 1-20, whereas the predicted asset level scores span only 3-15. This tighter clustering is the regression effect of model prediction: the predicted values as a whole concentrate toward the mean. The regression effect of the model therefore needs to be repaired by fitting a curve.
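The conversion from scores to position percentages can be sketched as below. The text does not give the exact formula, so the convention rank / (n - 1), which maps the smallest score to 0 and the largest to 1, is an assumption, as is the function name `position_percentages`.

```python
def position_percentages(scores):
    """Map each score to its ascending rank position as a fraction in [0, 1].

    The rank / (n - 1) convention is an assumption; the source only says the
    percentage of the total position is calculated.
    """
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        return [0.5]  # a single score has no meaningful rank; use the midpoint
    # Indices of the scores in ascending order (stable for ties).
    order = sorted(range(n), key=lambda i: scores[i])
    pct = [0.0] * n
    for rank, idx in enumerate(order):
        pct[idx] = rank / (n - 1)
    return pct


# Hypothetical model scores clustered around the mean.
pcts = position_percentages([5.2, 4.8, 5.0, 6.1])
```

Only the order of the scores matters here, which is why the approach relies on the model preserving order well even when its values are compressed toward the mean.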
Step S30, inputting the sorting result into a pre-fitted prediction curve to obtain a prediction asset level corresponding to the prediction curve;
In one embodiment, the calculated position percentage is input into the prediction curve to obtain the corresponding predicted asset level. It is understood that, once the prediction curve has been fitted from the sample data in step S20, the predicted asset level can be read from the fitted curve: the position percentage is input as the x value of the prediction curve, and the corresponding y value is the predicted asset level.
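Reading a y value off the curve for a given position percentage can be sketched as follows, assuming the curve is stored as a list of (x, y) knots and evaluated by linear interpolation. Both the representation and the function name `evaluate_curve` are illustrative assumptions; the source does not specify the form of the curve.

```python
from bisect import bisect_right


def evaluate_curve(knots, x):
    """Linearly interpolate a curve given as (x, y) knots sorted by x."""
    xs = [k[0] for k in knots]
    ys = [k[1] for k in knots]
    # Clamp to the end points outside the fitted range.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # xs[j - 1] <= x < xs[j]
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical knots: position percentage -> preprocessed asset level.
knots = [(0.0, 7.0), (0.5, 8.0), (1.0, 10.0)]
predicted = evaluate_curve(knots, 0.25)
```

The returned value is still on the preprocessed scale, so it has to be restored before it can be used as an asset level.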
And step S40, restoring the predicted asset level to obtain a target asset level.
In one embodiment, the predicted asset level obtained from the prediction curve is restored to obtain the target asset level. It will be appreciated that the predicted asset level read from the curve is a transformed value rather than a real one. The position percentage obtained by sorting is input into the prediction curve as the x value to obtain the corresponding y value, i.e. the predicted asset level. If the data were normalized before the curve was fitted, this y value is likewise a processed value and not an actual asset level. To obtain the real value, the predicted asset level must be restored; for example, if normalization was applied earlier, inverse normalization is applied here to recover the real target asset level.
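Restoration is simply the inverse of whichever transform was applied before fitting. The sketch below covers two transforms for illustration, the natural logarithm and min-max normalization; the function names and the example bounds are hypothetical.

```python
import math


def restore_ln(y):
    """Invert a natural-log transform: ln(v) -> v."""
    return math.exp(y)


def restore_min_max(y, lo, hi):
    """Invert min-max normalization: (v - lo) / (hi - lo) -> v."""
    return y * (hi - lo) + lo


# Hypothetical values: a curve output on the ln scale and one on the [0, 1] scale.
level_from_ln = restore_ln(math.log(5000))
level_from_min_max = restore_min_max(0.25, 1000, 41000)
```

The inverse must use the same parameters as the forward transform (for min-max, the bounds taken from the sample data), so these need to be stored alongside the fitted curve.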
In this embodiment, target data is input into a pre-constructed asset level prediction model to predict users' missing asset level scores, so that the predicted target asset level scores are as close as possible to the actual scores. The target asset level scores are sorted, the sorting result is input into the fitted prediction curve, and the regression effect introduced by the asset level prediction model is repaired, so that the asset level distribution predicted by the model is basically consistent with the actual asset level distribution and prediction for users with high and low asset levels is improved.
Further, based on the first embodiment of the asset level prediction method of the present invention, a second embodiment of the asset level prediction method of the present invention is proposed.
The second embodiment of the asset level prediction method differs from the first embodiment of the asset level prediction method in that, prior to the obtaining the target data, the method further comprises:
step S11, acquiring sample asset level data and sample characteristics;
step S12, constructing the asset level prediction model according to the sample asset level data and the sample characteristics.
In this embodiment, sample asset level data and sample features are acquired, and the asset level prediction model is constructed from them so as to predict the asset level of the target user.
The respective steps will be described in detail below:
step S11, acquiring sample asset level and sample characteristics;
In one embodiment, sample asset levels and sample features are obtained. It can be understood that prediction first requires sample data. Specifically, the asset level of a sample can be obtained from records of business the user has transacted in the past, for example: the annual income of user A is 300,000, the monthly income of user B is 5000 yuan, the equity fund of user C is 3000 yuan per month, and so on. Sample features are features that help derive the asset level; they can be drawn from the user's basic profile, the user's asset attributes and the user's usage behavior, such as the user's name, gender, age, occupation, etc. For example, user D has the same occupation as user A but user D's income data is missing; in an embodiment, an annual income of 300,000 can then also be used as data for evaluating user D's asset level, and other features can of course assist the judgment.
Step S12, constructing the asset level prediction model according to the sample asset level data and the sample characteristics.
In one embodiment, an asset level prediction model is constructed from the sample asset level data and the corresponding sample features. It can be understood that, since the objective is to obtain the asset level, when data is incomplete the missing asset levels must be predicted from the sample asset levels and the sample features, and an asset level prediction model can be constructed with a machine learning algorithm. Specifically, regression analysis is a predictive modeling technique that studies the relationship between a dependent variable (target) and independent variables (predictors). Once the sample asset level data and sample features are obtained, the model can be constructed with methods such as linear regression, logistic regression, decision trees and random forests, which are not limited herein. The model is trained as follows: a feature set and labels are extracted from the first asset level data and the sample features, and the data is split into a training set, a validation set (development set) and a test set, where the training set is used to train the model (learning), the validation set is used to tune parameters (optimization), and the test set is used in practice (application). The split may use the hold-out method or cross-validation. An algorithm is then selected, for example RandomForest, GBDT (Gradient Boosting Decision Tree), LightGBM or AdaBoost, to build the asset level prediction model.
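The hold-out split into training, validation and test sets can be sketched with the standard library alone. The 70/15/15 proportions, the fixed seed and the function name `hold_out_split` are assumptions chosen for illustration; the source does not specify them.

```python
import random


def hold_out_split(rows, train=0.7, valid=0.15, seed=0):
    """Shuffle rows and split them into training, validation and test sets."""
    if not (0 < train < 1 and 0 <= valid < 1 and train + valid < 1):
        raise ValueError("train and valid must be fractions summing to less than 1")
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    train_set = shuffled[:n_train]
    valid_set = shuffled[n_train:n_train + n_valid]
    test_set = shuffled[n_train + n_valid:]  # whatever remains
    return train_set, valid_set, test_set


# Hypothetical rows standing in for (features, label) records.
train_set, valid_set, test_set = hold_out_split(list(range(100)))
```

Cross-validation, the alternative mentioned above, would instead rotate which portion of the data is held out across several rounds.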
Further, in one embodiment, the step of constructing an asset level prediction model based on the sample asset levels and the sample features comprises:
step S121, carrying out first preprocessing on the sample asset level data to obtain first asset level data;
step S122, an initial asset level prediction model is constructed based on the first asset level data and the sample characteristics.
Step S123, inputting the first asset level data and the sample characteristics into the initial asset level prediction model to obtain an asset level prediction value;
step S124, determining a model loss function from the first asset level data, optimizing the initial asset level prediction model through the model loss function, calculating an evaluation index of the initial asset level prediction model, and determining the asset level prediction model when the evaluation index reaches a preset value.
In this embodiment, first preprocessing is performed on the sample asset level data to obtain the first asset level data; an initial asset level prediction model is constructed based on the first asset level data and the obtained sample features; a predicted value is obtained from the initial asset level prediction model; the model is optimized through its loss function and evaluated; and the model obtained after optimization is used as the asset level prediction model.
The respective steps will be described in detail below:
step S121, carrying out first preprocessing on the sample asset level data to obtain first asset level data;
In one embodiment, first preprocessing is performed on the acquired sample asset level data to obtain first asset level data. It can be understood that data must be processed while constructing the model. The data in this embodiment is usually financial data with large numerical values, which makes it difficult to train a model directly on the raw data. For example, a sample y-value interval of 1000-40000 is too wide for the model to fit well. The value range can be narrowed with a log function such as ln, log2 or log10, or with the max-min method; this embodiment preferably uses ln for preprocessing.
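The effect of the two range-narrowing options on the 1000-40000 interval mentioned above can be illustrated as follows. The function names are hypothetical, and the max-min method is taken to mean ordinary min-max scaling to [0, 1], which is an assumption.

```python
import math


def ln_transform(values):
    """Natural-log transform; values must be positive."""
    return [math.log(v) for v in values]


def min_max_transform(values):
    """Scale values linearly onto [0, 1] (assumed meaning of the max-min method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all values equal
    return [(v - lo) / (hi - lo) for v in values]


# The end points of the example interval: ln maps them to about 6.91 and 10.60.
ln_range = ln_transform([1000, 40000])
scaled = min_max_transform([1000, 20500, 40000])
```

The ln transform also compresses the long right tail typical of income data, whereas min-max scaling only rescales the values without changing the shape of their distribution.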
Step S122, constructing an initial asset level prediction model based on the first asset level data and the sample characteristics;
In one embodiment, an initial asset level prediction model is constructed based on the first asset level data and the sample characteristics. It can be understood that, because of the regression effect, the initial asset level prediction model constructed in this way needs to be optimized to achieve a better prediction effect. Since business use requires interpretability and the ability to fine-tune, this embodiment uses the XGBoost model, which currently performs very well. XGBoost is a boosted decision tree model. It borrows the random forest idea of sampling fields (columns), which helps prevent overfitting and reduces the amount of computation, and it adds a regularization term to the loss function to control model complexity, which further helps prevent overfitting. Its performance is therefore stable, its interpretability strong, and its running efficiency high.
Step S123, inputting the first asset level data and the sample characteristics into the initial asset level prediction model to obtain an asset level prediction value;
In one embodiment, the first asset level data and the sample characteristics are input into the initial asset level prediction model, which outputs an asset level prediction value. It will be appreciated that a predicted value must be obtained and analyzed in order to optimize the model. Therefore, the first asset level data is taken as the y value, and the corresponding sample characteristics are input into the initial asset level prediction model to obtain a predicted y value.
Step S124, confirming a model loss function through the first asset level data, optimizing an initial asset level prediction model through the model loss function, calculating an evaluation index of the initial asset level prediction model, and confirming the asset level prediction model when the evaluation index reaches a preset value.
In one embodiment, a model loss function is determined according to the first asset level data, the loss function is then used to guide the optimization of the model, an evaluation index of the optimized model is calculated, and the asset level prediction model is obtained when the evaluation index meets the requirement, namely the preset value. Machine learning algorithms all rely, to a greater or lesser extent, on maximizing or minimizing an objective function; the function being minimized is usually called the loss function, and it is mainly used to measure the predictive ability of a machine learning model. Therefore, to optimize the initial asset level prediction model, model generation is guided by a loss function and its gradient. Specifically, a regression loss function is selected for the calculation, for example the mean squared error (MSE) loss, the mean absolute error (MAE) loss, the Huber loss, or the Log-Cosh loss. However, if the widely used MSE or MAE is used directly as the loss function, the asset levels predicted by the trained model will be concentrated around the mean or the median respectively; the so-called regression effect will occur, and the overall accuracy will not be high. This is because the asset level distribution tends to be concentrated in the low-value interval, so the selected loss function should give greater weight to that interval, and choosing the Tweedie loss function can solve this problem. The Tweedie distribution is a compound of Poisson and gamma distributions and takes different forms for different values of p. Referring to fig. 5, fig. 5 is a Tweedie distribution diagram for p = 1.8, which is similar to the asset level distribution. The Tweedie distribution is defined as follows:
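$$\operatorname{Var}(Y) = \delta^{2}\mu^{p}$$
[Formula image BDA0003422749500000111, not reproduced in the text]
[Formula image BDA0003422749500000112, not reproduced in the text]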
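Because the formula images are not available here, the following is only a sketch of a Tweedie-type loss, using the standard Tweedie unit deviance for a power parameter between 1 and 2. It is not taken from the patent figures; the function name `tweedie_deviance` and the default p = 1.8 (the value mentioned for the distribution diagram) are assumptions.

```python
def tweedie_deviance(y_true, y_pred, p=1.8):
    """Mean Tweedie unit deviance for 1 < p < 2.

    Targets must be non-negative and predictions strictly positive.
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if y < 0 or mu <= 0:
            raise ValueError("need y >= 0 and prediction > 0")
        # Standard unit deviance; it is zero exactly when y == mu.
        total += 2.0 * (
            y ** (2.0 - p) / ((1.0 - p) * (2.0 - p))
            - y * mu ** (1.0 - p) / (1.0 - p)
            + mu ** (2.0 - p) / (2.0 - p)
        )
    return total / len(y_true)


perfect = tweedie_deviance([2.0, 5.0], [2.0, 5.0])
zero_target = tweedie_deviance([0.0], [1.0])
```

In practice the XGBoost library exposes a Tweedie objective as `reg:tweedie`, with the power set through the `tweedie_variance_power` parameter, so a hand-written loss like this one would mainly serve as an evaluation metric or for checking results.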
After model optimization, the model also needs to be evaluated through an evaluation index. The mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), R-squared, and the like can be used as evaluation indexes. The choice of evaluation index depends on the problem to be solved, and in some cases the loss function itself can be used directly as the evaluation index.
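The evaluation indexes named above can be computed as in the following sketch. The function name `regression_metrics`, the sample values, and the acceptance threshold are hypothetical and serve only to illustrate checking an index against a preset value.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    # R-squared is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])

# Hypothetical preset value: accept the model once RMSE is at most 1.0.
PRESET_RMSE = 1.0
model_accepted = metrics["rmse"] <= PRESET_RMSE
```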
In this embodiment, first preprocessing is performed on the sample asset level data to obtain first asset level data that is better suited to model construction, and the asset level prediction model is then constructed from the first asset level data and the sample characteristics. To optimize the model, an initial prediction model is constructed first, the model is then optimized through a loss function selected according to the first asset level data, and its effect is evaluated through evaluation indexes; when the evaluation indexes reach the preset values, the asset level prediction model is obtained. This embodiment preprocesses the data according to the characteristics of the sample asset level data and selects a suitable algorithm and loss function for that data, which improves the accuracy of model prediction.
Further, based on the first and second embodiments of the asset level prediction method of the present invention, a third embodiment of the asset level prediction method of the present invention is proposed.
The third embodiment of the asset level prediction method differs from the first and second embodiments of the asset level prediction method in that, before the target data is acquired, the method further comprises:
step S13, acquiring sample asset level data;
step S14, fitting a prediction curve according to the sample asset level data.
In one embodiment, a prediction curve is fitted based on the obtained sample asset level data. After several discrete data points are obtained by methods such as sampling or experiment, one usually wants to derive from them a continuous function (i.e. a curve), or a denser discrete relation, that agrees with the known data; this process is called fitting. It can be understood that once the sample asset level data is obtained, statistics over all of it give the probability that a sample's asset level falls within a given interval, for example that twenty percent of users have a monthly income in the range of 2000 to 5000 yuan; the asset level distribution of the samples thus follows a distribution curve. The prediction curve may be fitted with a single function, or piecewise, using different low-order polynomials on different segments, and so on. For example, a scatter diagram of the data is drawn on coordinate axes, several suitable candidate curves are selected by observation and fitted separately, and the results are compared; the curve with the smallest least-squares index J is taken as the best-fitting curve.
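Selecting the candidate curve with the smallest least-squares index J can be sketched as follows. The candidate family y = a * g(x) + b, the three choices of g, and the synthetic data are assumptions made for illustration; the source does not specify which candidate curves are compared.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def best_candidate(xs, ys, transforms):
    """Fit y = a * g(x) + b for each candidate g and keep the smallest J."""
    best = None
    for name, g in transforms.items():
        gx = [g(x) for x in xs]
        a, b = fit_line(gx, ys)
        # J is the sum of squared residuals of this candidate.
        j = sum((a * u + b - y) ** 2 for u, y in zip(gx, ys))
        if best is None or j < best[1]:
            best = (name, j, a, b)
    return best


candidates = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "cubic": lambda x: x ** 3,
}
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x * x + 0.1 for x in xs]  # synthetic data generated from a quadratic
name, j, a, b = best_candidate(xs, ys, candidates)
```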
Further, in one embodiment, the step of obtaining sample asset level data comprises:
step S131, acquiring initial asset level data;
In one embodiment, initial sample asset level data is obtained. Sample data acquired from the business database could be used directly as the sample asset level data, but in order to use better-quality data for analysis and prediction, this embodiment treats the unprocessed data acquired directly as initial sample asset level data, which is processed further in subsequent steps.
And step S132, performing data cleaning on the initial asset level data to obtain sample asset level data.
In one embodiment, data cleaning is performed on the obtained initial sample asset level data to obtain the sample asset level data. It can be understood that sample data generally may contain abnormal values (outliers), duplicate data, and noise data. Noise data refers to data containing errors or anomalies (deviations from expected values), such as random error or variance in a measured variable. Such data interfere with analysis, so the acquired initial sample asset level data must be cleaned to remove the abnormal data, and the cleaned data is used as the sample asset level data. Various cleaning methods are possible. For example, if the monthly income of sample A is 10 million and the monthly income of sample B is 0, these can be regarded as unreasonable abnormal values and removed; if there are two records for sample A, they are duplicate data and must be deduplicated. Noise data can be smoothed by binning the ordered data values, screened manually after plotting and visual inspection, or handled by setting an alarm rule that raises a warning when a value falls outside the rule's range so that the abnormal value can be processed manually. It should be noted that sample asset level data must also be acquired when the asset level prediction model is constructed, and data cleaning can likewise be performed there to improve the quality of the sample data.
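A minimal sketch of the outlier removal and deduplication described above follows. The function name `clean_samples`, the record layout of (sample id, monthly income), and the bounds of the valid range are all hypothetical; a real rule set would come from the business side.

```python
def clean_samples(records, low, high):
    """Drop duplicate sample ids and values outside the closed range [low, high].

    When an id appears more than once, only its first record is considered.
    """
    seen = set()
    cleaned = []
    for sample_id, value in records:
        if sample_id in seen:
            continue  # duplicate record for the same sample
        seen.add(sample_id)
        if low <= value <= high:
            cleaned.append((sample_id, value))
    return cleaned


records = [
    ("A", 10_000_000.0),  # implausibly high monthly income
    ("B", 0.0),           # implausibly low monthly income
    ("C", 4500.0),
    ("C", 4500.0),        # duplicate of the previous record
    ("D", 12000.0),
]
cleaned = clean_samples(records, low=1.0, high=1_000_000.0)
```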
Further, in one embodiment, the step of fitting a prediction curve based on the sample asset level data comprises:
step S141, carrying out second preprocessing on the sample asset level data to obtain second asset level data;
In one embodiment, before curve fitting, second preprocessing is performed on the acquired sample asset level data to obtain the second asset level data. The second preprocessing is performed because some data may impair the quality of the fit or slow the fitting down. Many preprocessing methods are possible and are not enumerated here. This embodiment may normalize the data; normalization removes the effect of differing dimensions (units) and maps the data onto the same scale. This embodiment preferably uses extremum (min-max) normalization, i.e. the sample value minus the sample minimum, divided by the difference between the sample maximum and the sample minimum, which maps the data into the range 0 to 1. Developers have verified that extremum normalization works better here; of course, other methods such as root-mean-square normalization can also be used.
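Extremum normalization and its inverse can be sketched as follows. The function names `min_max_normalize` and `min_max_restore` are assumptions; the inverse is included because a later step restores normalized values to real asset levels.

```python
def min_max_normalize(values):
    """Scale values into [0, 1] and return the scaled list with the min and max used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def min_max_restore(scaled, lo, hi):
    """Invert min_max_normalize using the stored minimum and maximum."""
    return [s * (hi - lo) + lo for s in scaled]


scaled, lo, hi = min_max_normalize([1000.0, 2500.0, 4000.0])
restored = min_max_restore(scaled, lo, hi)
```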
Step S142, sequencing the second asset level data, and calculating quantiles corresponding to the second asset level data;
In one embodiment, the obtained second asset level data is sorted and the quantiles of the sorted data are calculated. A quantile, also called a quantile point, is a numerical point that divides the probability distribution range of a random variable into n equal parts; it is a point on a continuous distribution function that corresponds to a probability p. It will be appreciated that in practice the overall distribution of these asset indicators tends to follow a Lorenz distribution. A typical application of the Lorenz curve is the study of inequality in the distribution of income and wealth in a country or region. In the Lorenz curve shown in FIG. 3, the horizontal axis represents the cumulative percentage of the population and the vertical axis represents the cumulative percentage of income, so a point on the curve indicates that the bottom x% of people account for y% of the total income of society. The diagonal from the origin to the opposite vertex of the square is the line of equality, i.e. perfectly equal income distribution, which generally does not exist in practice. Actual income distribution curves, i.e. Lorenz curves, all lie to the right of and below the line of equality, for example curves 1 and 2. Therefore, to fit a prediction curve, the procedure for fitting a Lorenz curve can be followed and similar processing, i.e. sorting and calculating quantiles, applied to the data.
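Sorting the data and computing quantiles, together with the cumulative shares that make up a Lorenz curve, can be sketched as follows. The convention quantile = rank / n and the function names are assumptions; the source does not state which quantile convention is used.

```python
def quantile_points(values):
    """Return (quantile, value) pairs, with quantile = rank / n after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i + 1) / n, v) for i, v in enumerate(ordered)]


def lorenz_points(values):
    """Return (cumulative population share, cumulative value share) pairs."""
    ordered = sorted(values)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("the values must have a positive sum")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


values = [4.0, 1.0, 3.0, 2.0]
pairs = quantile_points(values)
curve = lorenz_points(values)
```

For non-negative data every point of `curve` lies on or below the line of equality, which matches the description of the Lorenz curve above.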
Step S143, fitting a prediction curve based on the quantile and the second asset level data.
In one embodiment, a Lorenz curve is fitted based on the quantiles and their corresponding asset levels. After the quantiles are calculated, curve fitting is performed using each quantile and its corresponding asset level. Specifically, the discrete data can be approximated with an analytical expression, for example by the least squares method. Because research has found that asset level indicators substantially follow a Lorenz-like curve, a Lorenz curve is fitted in one embodiment, although distribution curves other than the Lorenz curve may also be used.
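The source does not give the analytical form of the fitted curve, so the following sketch assumes a one-parameter power form, level = quantile ** k, and fits k by least squares in log space. Both the functional form and the names `fit_power_curve` and `predict_level` are assumptions chosen for illustration only.

```python
import math


def fit_power_curve(quantiles, levels):
    """Least-squares fit of ln(level) = k * ln(quantile), a line through the origin."""
    num = 0.0
    den = 0.0
    for q, y in zip(quantiles, levels):
        # ln is undefined at 0, and a point at quantile 1 carries no information.
        if 0.0 < q < 1.0 and y > 0.0:
            lq = math.log(q)
            num += lq * math.log(y)
            den += lq * lq
    if den == 0.0:
        raise ValueError("need at least one point with 0 < quantile < 1")
    return num / den


def predict_level(k, quantile):
    """Evaluate the fitted curve at a quantile in [0, 1]."""
    return quantile ** k


# Synthetic normalized levels generated from the assumed form with k = 3.
qs = [0.1 * i for i in range(1, 11)]
levels = [q ** 3 for q in qs]
k = fit_power_curve(qs, levels)
```

Fitting in log space minimizes relative rather than absolute error, so a direct least-squares fit on the original scale may give a slightly different parameter on real data.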
It should be noted that this embodiment uses the second asset level data obtained by the second preprocessing as the sample asset levels; that is, the original sample asset level data is not used directly. It should also be noted that the first preprocessing method may be the same as the second preprocessing method, for example normalization; the difference is that the first preprocessing is applied to the sample asset level data during model construction, while the second preprocessing is applied during curve fitting.
Further, in one embodiment, the step of ranking the sample asset level data comprises:
step a, dividing the sample asset level data into preset parts, and sequencing each part of the sample asset level data to obtain a local sequencing result corresponding to each part of the sample asset level data;
step b, merging the local sorting results.
In one embodiment, the sample asset level data is divided into a preset number of parts, each part is sorted locally, and the local sorting results are spliced together. Because the data volume may be large, sorting all of the data directly is inefficient. Suppose the data covers 10 billion users: sorting from 1 to 10 billion in a single pass requires a very large number of comparisons and is time-consuming. Instead, the data can be cut into 10 segments by value. If the data has been normalized, its values lie between 0 and 1, so it can be divided into ten parts: 0 to 0.1, 0.1 to 0.2, 0.2 to 0.3, and so on. Each part (1 billion records in this example) is sorted on its own, and the parts are then combined back into the full 10 billion; no single sort has to handle the full volume. In other words, the problem is decomposed into subproblems, the parts are sorted in a distributed manner in ten threads, and the sorted parts are finally spliced back together according to their labels, which improves the sorting speed.
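The divide, sort locally, and splice procedure can be sketched as follows for normalized values. The function name `bucket_sort_normalized` and the sample data are assumptions. Note that threads in CPython do not run CPU-bound sorting in parallel, so this shows the structure of the approach only; a real deployment would more likely use processes or distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor


def bucket_sort_normalized(values, parts=10):
    """Sort values in [0, 1] by range partitioning, local sorting, and splicing."""
    buckets = [[] for _ in range(parts)]
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("values must be normalized to [0, 1]")
        # The bucket index is the label of the value range; 1.0 joins the last bucket.
        index = min(int(v * parts), parts - 1)
        buckets[index].append(v)
    # Sort every bucket independently, one task per bucket.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))
    # Buckets cover increasing value ranges, so concatenation yields a full sort.
    return [v for bucket in sorted_buckets for v in bucket]


# Deterministic permutation of 0.00, 0.01, ..., 1.00 used as sample input.
data = [((i * 37) % 101) / 100.0 for i in range(101)]
result = bucket_sort_normalized(data)
```

Because the buckets are defined by value range rather than by position, no merge step with comparisons across buckets is needed; the spliced result is already in order.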
Referring to Fig. 4, Fig. 4 is a technical flowchart of an embodiment of the present invention and illustrates the asset level prediction method of the embodiment. First, the samples and sample characteristics are input, that is, the sample asset level data and the sample characteristics are obtained; the sample asset level data is taken as the Y value, and cleaning and noise processing are performed on the Y value. Data processing then divides into two parts. In one part, the logarithm of the Y value is taken, that is, the first preprocessing is performed; a Tweedie loss function is then set on an XGBoost model, the asset level prediction model is trained, and the accuracy of the model is determined. When the accuracy of the asset level prediction model reaches the expected or preset value, the model is used for target user prediction, that is, the target data is input into the asset level prediction model to obtain asset level prediction scores, and the prediction scores are sorted. In the other part, the Y value of the samples is normalized, then sorted, and the quantiles are calculated; a Lorenz curve is fitted according to the quantiles to obtain the parameters of the Lorenz curve. The overall ranking value obtained from the asset level prediction model is then input into the fitted Lorenz curve to obtain the Lorenz value, namely the predicted asset level; the Lorenz value is restored to a real asset level value, and the result is output.
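The prediction path of the flow above (score, rank, curve lookup, restoration) can be sketched as follows. The sketch assumes that the ranking value is the rank divided by the number of targets, that the curve returns a min-max normalized level, and that restoration inverts the min-max scaling; the function name `predict_asset_levels`, the stand-in curve, and all numeric values are hypothetical.

```python
def predict_asset_levels(scores, curve, lo, hi):
    """Map model scores to asset levels via rank quantiles and a fitted curve.

    scores: prediction scores from the model, one per target.
    curve:  callable mapping a quantile in (0, 1] to a normalized level.
    lo, hi: minimum and maximum used when the sample levels were normalized.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    results = [0.0] * n
    for rank, i in enumerate(order, start=1):
        quantile = rank / n            # position of this target in the overall ranking
        normalized = curve(quantile)   # predicted level on the normalized scale
        results[i] = normalized * (hi - lo) + lo  # restore to the real scale
    return results


scores = [0.7, 0.2, 0.9, 0.4]
levels = predict_asset_levels(scores, curve=lambda q: q ** 2, lo=1000.0, hi=41000.0)
```

Only the ordering of the scores matters in this sketch, which reflects the design in the flow: the model ranks the targets, and the fitted curve supplies the magnitudes.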
It should be noted that, before the target data is obtained, fitting the prediction curve and constructing the asset level prediction model may be performed simultaneously or one after the other; the execution order is not limited herein. When they are performed simultaneously, the sample asset level data need only be obtained once, and the sample characteristics for constructing the asset level prediction model are obtained separately.
In addition, the prediction curve and the asset level prediction model can be deployed on the offline side and on the online side. The online side provides a real-time service: for example, when a new user arrives, the characteristics of that user are immediately input into the asset level prediction model and an asset level result is returned. If the prediction curve and the asset level prediction model are deployed on the offline side, results are not returned in real time; instead, for example, one week of data can be input in a single batch to obtain the corresponding asset levels. Because offline and online deployments differ in timeliness, they can meet different efficiency and performance requirements, and the service is more stable.
In this embodiment, second preprocessing is performed on the acquired sample asset level data, and the resulting second asset level data is then sorted. During sorting, the sample data set is divided into a plurality of parts that are sorted locally, which achieves fast sorting. The quantiles of the second asset level data are then calculated and the prediction curve is fitted. This speeds up the whole curve-fitting process and yields a better and more accurate curve.
The invention also provides an asset level prediction device. Referring to Fig. 5, Fig. 5 is a functional block diagram of an embodiment of the asset level prediction device of the present invention.
The asset level prediction device of the present invention comprises:
an obtaining module 10, configured to obtain target data;
a prediction model module 20, configured to input the target data into a pre-constructed asset level prediction model to obtain corresponding target asset level scores, and sort the target asset level scores to obtain corresponding sorting results;
the prediction module 30 is configured to input the sorting result into a pre-fitted prediction curve to obtain a predicted asset level corresponding to the prediction curve;
and a reduction module 40, configured to restore the predicted asset level to obtain a target asset level.
Optionally, the obtaining module is further configured to:
acquiring sample asset level data and sample characteristics;
constructing the asset level prediction model from the sample asset level data and the sample features.
Optionally, the obtaining module is further configured to:
performing first preprocessing on the sample asset level data to obtain first asset level data;
constructing an initial asset level prediction model based on the first asset level data and the sample features;
inputting the first asset level data and the sample characteristics into the initial asset level prediction model to obtain an asset level prediction value;
confirming a model loss function through the first asset level data, optimizing an initial asset level prediction model through the model loss function, calculating an evaluation index of the initial asset level prediction model, and confirming the asset level prediction model when the evaluation index reaches a preset value.
Optionally, the obtaining module is further configured to:
obtaining sample asset level data;
fitting a prediction curve according to the sample asset level data.
Optionally, the obtaining module is further configured to:
acquiring initial asset level data;
and performing data cleaning on the initial asset level data to obtain sample asset level data.
Optionally, the obtaining module is further configured to:
performing second preprocessing on the sample asset level data to obtain second asset level data;
sequencing the second asset level data, and calculating quantiles corresponding to the second asset level data;
fitting a prediction curve based on the quantile and the second asset level data.
Optionally, the obtaining module is further configured to:
dividing the sample asset level data into preset parts, and sequencing each part of the sample asset level data to obtain a local sequencing result corresponding to each part of the sample asset level data;
and merging the local sorting results.
The invention also provides a storage medium.
The storage medium of the present invention has stored thereon an asset level prediction program which, when executed by a processor, implements the steps of the asset level prediction method as described above.
The method implemented when the asset level prediction program running on the processor is executed may refer to various embodiments of the asset level prediction processing method of the present invention, and details are not described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An asset level prediction method, characterized in that the asset level prediction method comprises the steps of:
acquiring target data;
inputting the target data into a pre-constructed asset level prediction model to obtain corresponding target asset level scores, and sequencing the target asset level scores to obtain corresponding sequencing results;
inputting the sequencing result into a pre-fitted prediction curve to obtain a prediction asset level corresponding to the prediction curve;
and restoring the predicted asset level to obtain a target asset level.
2. The asset level prediction method of claim 1, wherein prior to said obtaining target data, said method further comprises:
acquiring sample asset level data and sample characteristics;
constructing the asset level prediction model from the sample asset level data and the sample features.
3. The asset level prediction method according to claim 2, wherein the step of constructing an asset level prediction model based on the sample asset levels and the sample characteristics comprises:
performing first preprocessing on the sample asset level data to obtain first asset level data;
constructing an initial asset level prediction model based on the first asset level data and the sample features;
inputting the first asset level data and the sample characteristics into the initial asset level prediction model to obtain an asset level prediction value;
confirming a model loss function through the first asset level data, optimizing an initial asset level prediction model through the model loss function, calculating an evaluation index of the initial asset level prediction model, and confirming the asset level prediction model when the evaluation index reaches a preset value.
4. The asset level prediction method of claim 1, wherein prior to said obtaining target data, said method further comprises:
obtaining sample asset level data;
fitting the prediction curve according to the sample asset level data.
5. The method of asset level prediction according to claim 4, wherein said step of obtaining sample asset level data comprises:
acquiring initial asset level data;
and performing data cleaning on the initial asset level data to obtain the sample asset level data.
6. The method of asset level prediction according to claim 4, wherein said step of fitting said prediction curve based on said sample asset level data comprises:
performing second preprocessing on the sample asset level data to obtain second asset level data;
sequencing the second asset level data, and calculating quantiles corresponding to the second asset level data;
fitting the prediction curve based on the quantile and the second asset level data.
7. The method of asset level prediction according to claim 6, wherein said step of ranking said sample asset level data comprises:
dividing the sample asset level data into preset parts, and sequencing each part of the sample asset level data to obtain a local sequencing result corresponding to each part of the sample asset level data;
and merging the local sorting results.
8. An asset level prediction apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring target data;
the prediction model module is used for inputting the target data into a pre-constructed asset level prediction model to obtain corresponding target asset level scores and sequencing the target asset level scores to obtain corresponding sequencing results;
the prediction module is used for inputting the sequencing result into a prediction curve fitted in advance to obtain a prediction asset level corresponding to the prediction curve;
and the reduction module is used for restoring the predicted asset level to obtain a target asset level.
9. An asset level prediction device, characterized in that the device comprises: a memory, a processor, and an asset level prediction program stored on the memory and executable on the processor, the asset level prediction program configured to implement the steps of the asset level prediction method of any of claims 1 to 7.
10. A storage medium having stored thereon an asset level prediction program which, when executed by a processor, carries out the steps of the asset level prediction method according to any one of claims 1 to 7.
CN202111575934.4A 2021-12-21 2021-12-21 Asset level prediction method, device, equipment and storage medium Pending CN114239981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111575934.4A CN114239981A (en) 2021-12-21 2021-12-21 Asset level prediction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114239981A true CN114239981A (en) 2022-03-25

Family

ID=80760937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111575934.4A Pending CN114239981A (en) 2021-12-21 2021-12-21 Asset level prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114239981A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination