CN117408736A - Enterprise fund demand mining method and medium based on improved Stacking fusion algorithm - Google Patents


Info

Publication number
CN117408736A
Authority
CN
China
Prior art keywords
model
training
prediction
parameter
fund demand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311296560.1A
Other languages
Chinese (zh)
Inventor
姜树明
贾其辉
刘向阳
韩露
张艳青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Credit Jinqiao Small And Medium Sized Enterprise Development Service Co ltd
Qilu University of Technology
Original Assignee
Shandong Credit Jinqiao Small And Medium Sized Enterprise Development Service Co ltd
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Credit Jinqiao Small And Medium Sized Enterprise Development Service Co ltd and Qilu University of Technology
Priority to CN202311296560.1A
Publication of CN117408736A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Fuzzy Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an enterprise fund demand mining method and medium based on an improved Stacking fusion algorithm, in the technical field of machine learning. The method comprises the following steps: basic information on the enterprises to be mined is acquired; three prediction models are established, modeled differentially, and trained; an improved Stacking model is established, in which the three prediction models serve as the first-layer base learners and a kernel ridge regression model serves as the second-layer estimator; the improved Stacking model is trained on the training set data to obtain a fund demand prediction model; and the test set data are input into the trained fund demand prediction model with the decision threshold set to 0.7, so that enterprises whose prediction result is greater than 0.7 are identified as potential clients with fund demand. By applying machine learning trained on the fund demand of historical enterprises, the method mines the current fund demand of enterprises.

Description

Enterprise fund demand mining method and medium based on improved Stacking fusion algorithm
Technical Field
The invention relates to an enterprise fund demand mining method and medium based on an improved Stacking fusion algorithm, and belongs to the technical field of machine learning.
Background
Stacking is a model fusion algorithm whose basic idea is to fuse the prediction results of several single models with a further model in order to reduce the generalization error of the single models. It is an efficient ensemble method in which the predictions generated by various machine learning algorithms are used as the input to a second-layer learning algorithm, which is trained to combine those predictions optimally into a new set of predictions. At present there is no reliable and accurate way to mine enterprise fund demand, and the fusion effect of the Stacking algorithm can address this prediction-accuracy problem well.
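As a reference point for the conventional scheme (FIG. 2), the following is a minimal sketch using scikit-learn's `StackingClassifier` on synthetic data. The two base learners, the logistic-regression meta-learner, and the data are illustrative assumptions only; this is not the improved scheme of the invention.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: the enterprise features used in the patent are not public.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# First layer: heterogeneous base learners whose predictions become new features.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Second layer: a meta-learner fitted on the base learners' out-of-fold (5-fold)
# predicted probabilities, so it never sees in-sample predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

proba = stack.predict_proba(X_test)[:, 1]  # fused probability of the positive class
print(f"stacked test accuracy: {stack.score(X_test, y_test):.3f}")
```

The out-of-fold step is what distinguishes Stacking from simply averaging models: the second layer learns how much to trust each base learner on data those learners were not fitted to.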
Disclosure of Invention
The invention aims to provide an enterprise fund demand mining method and medium based on an improved Stacking fusion algorithm, which mine the current fund demand of enterprises by applying machine learning trained on historical enterprise fund demand.
To achieve this object, the invention adopts the following technical scheme:
step 1: basic information on the enterprises to be mined is acquired, including business registration information, recruitment information, judicial risk records, news and public opinion, government procurement information, and project details; the data are preprocessed to construct a feature data set, which is divided into a training set and a test set;
step 2: three prediction models are established, modeled differentially, and trained. The prediction models are a random forest model, a LightGBM model, and an XGBoost model: the random forest model uses RFE-based feature screening and is trained as a single model with grid-search tuning; the LightGBM model uses LightGBM-based feature screening and is trained as a single model with a Bayesian optimizer; and the XGBoost model uses XGBoost-based feature screening and is trained as a single model with a Bayesian optimizer;
step 3: an improved Stacking model is established, in which the three prediction models serve as the first-layer base learners and a kernel ridge regression model serves as the second-layer estimator;
because the same prediction model produces different training results on different training samples, these results are weighted by the base model's prediction accuracy and summed to determine the model parameters;
a validation set is obtained from the first-layer base learners by five-fold cross-validation, and the five prediction outputs on the validation folds are concatenated vertically to form the second-layer input features; the Stacking model and a single CatBoost model are then fused as the additional-layer models of the improved Stacking model, and their estimation results are combined by weighted summation. The model weights are assigned by exhaustive search: the prediction accuracy of the two additional-layer models is computed under each candidate weighting, the weighting with the highest prediction accuracy is taken as the weight coefficients, and the weight coefficients of the two additional-layer models sum to 1;
step 4: the improved Stacking model is trained on the training set data to obtain a fund demand prediction model;
step 5: the test set data are input into the trained fund demand prediction model with the decision threshold set to 0.7, and enterprises whose prediction result is greater than 0.7 are identified as potential clients with fund demand.
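The exhaustive weight search of step 3 and the 0.7 threshold of step 5 can be illustrated with a small NumPy sketch of the final blending stage. It assumes the Stacking model and the CatBoost model have already produced probability scores for the same enterprises. The 0.01 weight grid, the use of the 0.7 threshold when measuring accuracy during the search, the function names, and the toy scores are all hypothetical.

```python
import numpy as np

THRESHOLD = 0.7  # decision threshold from step 5


def accuracy(scores, labels, threshold=THRESHOLD):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    return float(np.mean((scores > threshold).astype(int) == labels))


def exhaustive_blend_weights(p_stack, p_cat, labels, step=0.01):
    """Try every weight pair (w, 1 - w) on a grid and keep the most accurate one."""
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + step / 2, step):
        acc = accuracy(w * p_stack + (1.0 - w) * p_cat, labels)
        if acc > best_acc:  # strict '>' keeps the smallest w among ties
            best_w, best_acc = float(w), acc
    return best_w, 1.0 - best_w, best_acc  # the two weights sum to 1


def potential_clients(p_stack, p_cat, w_stack, w_cat, threshold=THRESHOLD):
    """Indices of enterprises whose blended score exceeds the threshold."""
    blended = w_stack * p_stack + w_cat * p_cat
    return np.flatnonzero(blended > threshold)


# Hypothetical validation scores for four enterprises.
labels = np.array([1, 0, 1, 0])
p_stack = np.array([0.90, 0.20, 0.65, 0.60])
p_cat = np.array([0.60, 0.10, 0.90, 0.75])

w_stack, w_cat, acc = exhaustive_blend_weights(p_stack, p_cat, labels)
clients = potential_clients(p_stack, p_cat, w_stack, w_cat)
print(w_stack, w_cat, acc, clients)
```

In this toy case neither model alone classifies all four enterprises correctly, but a blend with a Stacking weight of roughly one third does, which is the situation the weighted additional layer is meant to exploit.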
Preferably, single-model training with grid-search tuning is performed as follows:
the parameters to tune and their search spaces are determined; the tuned parameters are the minimum number of samples in a leaf node, the minimum number of samples required to split a node, the maximum number of leaf nodes, the maximum depth of a decision tree, the proportion of samples per estimator, the number of classifiers, and the maximum number of features;
the initial search space is 1-3 for the minimum number of samples in a leaf node, 6-8 for the minimum number of samples required to split a node, (None, 1, 5, 10) for the maximum number of leaf nodes, 10-15 for the maximum depth of a decision tree, and (0.5, 0.6, 0.7) for the proportion of samples per estimator;
model training: the model and the evaluator are instantiated, a grid search is run over the configured search space to train the model, the model's prediction result is obtained from the best estimator, and the best values of the searched hyperparameters for this round are obtained from the best parameters;
the search space is adjusted according to the hyperparameter values of the previous round: if a best value equals the maximum of its search range, the range is increased; otherwise, it is decreased. Model training then continues, and the process is iterated, recording the best hyperparameter values and the prediction score of each iteration, until the optimal values of all parameters lie within the search space, at which point iteration stops;
finally, the optimal values of all parameters are substituted into the model.
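The iterative grid search can be sketched with scikit-learn's `GridSearchCV` under several assumptions: the data are synthetic; only three of the listed parameters are tuned; the number of trees, the 3-fold cross-validation, the round cap, and the per-parameter floors are hypothetical; and the adjustment rule is interpreted as shifting a range up when its optimum sits on the upper edge and down when it sits on the lower edge, stopping once every optimum is interior. Note that scikit-learn requires `max_leaf_nodes` to be `None` or at least 2, so the value 1 in the search space above would be rejected there.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the enterprise feature set.
X, y = make_classification(n_samples=300, n_features=15, random_state=0)

# Initial inclusive integer ranges taken from the text.
INITIAL_RANGES = {"min_samples_leaf": (1, 3), "min_samples_split": (6, 8), "max_depth": (10, 15)}
# Hypothetical lower bounds below which a range is never shifted.
FLOORS = {"min_samples_leaf": 1, "min_samples_split": 2, "max_depth": 1}


def adaptive_grid_search(X, y, ranges, floors, max_rounds=4):
    ranges = dict(ranges)  # work on a copy of the caller's ranges
    history = []
    for _ in range(max_rounds):
        grid = {k: list(range(lo, hi + 1)) for k, (lo, hi) in ranges.items()}
        search = GridSearchCV(RandomForestClassifier(n_estimators=10, random_state=0),
                              grid, cv=3, scoring="accuracy")
        search.fit(X, y)
        history.append((search.best_params_, search.best_score_))  # per-round record
        moved = False
        for k, (lo, hi) in list(ranges.items()):
            best, width = search.best_params_[k], hi - lo
            if best == hi:  # optimum on the upper edge: shift the range up
                ranges[k] = (hi, hi + width)
                moved = True
            elif best == lo and lo > floors[k]:  # lower edge: shift the range down
                ranges[k] = (max(floors[k], lo - width), lo)
                moved = True
        if not moved:  # every optimum is inside its range (or at its floor)
            break
    return search.best_estimator_, history  # refit=True: best model refit on all of X


model, history = adaptive_grid_search(X, y, INITIAL_RANGES, FLOORS)
print(history[-1])
```

Shifting the range rather than widening it keeps each round's grid the same size, so the cost per round stays predictable.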
Preferably, the Bayesian optimizer uses the TPE (Tree-structured Parzen Estimator) algorithm as the probabilistic surrogate model and expected improvement (EI) as the acquisition function.
Preferably, single-model training with the Bayesian optimizer is performed as follows:
the parameter space is defined as a dictionary whose keys may be chosen freely and whose values are hp functions; the parameters include the learning rate, the method used to build the decision trees, the number of leaves per tree, the maximum depth, the regularization coefficients, the minimum number of records a leaf may hold, the minimum gain required for a split, and the proportion of data used in each iteration;
the hp functions are passed to the TPE algorithm for optimization: the fund demand prediction model is trained on the training set data, a prediction result is obtained, and the TPE model is updated with that result;
the EI acquisition function is used to select the most promising hyperparameter combination from the updated TPE model;
the number of iterations is set to 100; after the iterations are completed, the algorithm stops and outputs the best hyperparameter combination and the optimal value of the objective function.
Preferably, the TPE algorithm is formulated as follows:
$$p(x\mid y)=\begin{cases} l(x), & y<y^{*}\\ g(x), & y\ge y^{*}\end{cases}$$
where $y$ is the observed or measured objective function value, $y^{*}$ is a threshold in the observation domain, $x$ is the observed value (the hyperparameter setting), $l(x)$ is the density estimate formed from the observations $x$ whose objective value is less than $y^{*}$, and $g(x)$ is the density formed from the observations $x$ whose objective value is greater than or equal to $y^{*}$.
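Preferably, the acquisition function EI is given by:
$$EI_{y^{*}}(x)=\int_{-\infty}^{y^{*}}(y^{*}-y)\,p(y\mid x)\,dy=\int_{-\infty}^{y^{*}}(y^{*}-y)\,\frac{p(x\mid y)\,p(y)}{p(x)}\,dy\propto\left(\gamma+\frac{g(x)}{l(x)}(1-\gamma)\right)^{-1}$$
where $\gamma=p(y<y^{*})$ is the quantile used by the TPE algorithm to separate $l(x)$ from $g(x)$, with $\gamma\in(0,1)$; $p(y)$ is the marginal probability distribution of $y$; and $p(x)=\gamma\,l(x)+(1-\gamma)\,g(x)$.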
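To make the TPE/EI loop concrete, here is a from-scratch NumPy sketch for a single continuous hyperparameter. It splits past observations at the γ-quantile y* of the objective, builds a density l(x) from the "good" observations and g(x) from the rest, draws candidates near the good points, and evaluates the candidate with the largest l(x)/g(x) next; because EI is proportional to (γ + (1 − γ)g(x)/l(x))⁻¹, that candidate also maximizes EI. The Gaussian Parzen estimators with a fixed bandwidth, γ = 0.25, 10 random start-up trials, and 24 candidates per iteration are simplifying assumptions, not the patent's settings. The "hp functions" mentioned above suggest the hyperopt library, where the equivalent call would typically be `fmin(objective, space, algo=tpe.suggest, max_evals=100)`.

```python
import numpy as np


def parzen_density(points, x, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated at each x."""
    z = (x[:, None] - points[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(points) * bandwidth * np.sqrt(2 * np.pi))


def tpe_minimize(objective, low, high, n_iter=100, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    rng = np.random.default_rng(seed)
    xs = list(rng.uniform(low, high, n_startup))  # random start-up trials
    ys = [objective(x) for x in xs]
    bandwidth = 0.1 * (high - low)  # fixed bandwidth (simplification)
    for _ in range(n_iter - n_startup):
        x_arr, y_arr = np.array(xs), np.array(ys)
        y_star = np.quantile(y_arr, gamma)  # threshold y*, so p(y < y*) is about gamma
        good = x_arr[y_arr < y_star]        # observations that build l(x)
        bad = x_arr[y_arr >= y_star]        # observations that build g(x)
        if good.size == 0:                  # guard: keep at least the best point
            good = x_arr[[np.argmin(y_arr)]]
        # Sample candidates from l(x): jitter good points with the kernel noise.
        cand = rng.choice(good, n_candidates) + rng.normal(0.0, bandwidth, n_candidates)
        cand = np.clip(cand, low, high)
        ratio = parzen_density(good, cand, bandwidth) / (parzen_density(bad, cand, bandwidth) + 1e-12)
        x_next = float(cand[np.argmax(ratio)])  # largest l/g is largest EI
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best]


# Toy objective standing in for a validation loss; its minimum is at x = 2.
best_x, best_y = tpe_minimize(lambda x: (x - 2.0) ** 2, low=-5.0, high=5.0)
print(best_x, best_y)
```

The total of 100 evaluations mirrors the iteration count stated above; a real search over several LightGBM or XGBoost parameters would use one density per parameter or a library implementation.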
Preferably, the results of the same prediction model trained on different training samples are weighted by the base model's prediction accuracy as follows:
the base model is trained on the training set to obtain a training result $P_i$, and its prediction accuracy $a_i$ is computed against the ground-truth labels of the training set; training is repeated five times, recording each training result $P_i$ and its prediction accuracy $a_i$ ($i = 1, \dots, 5$);
the share of each run's prediction accuracy in the sum of the five runs' accuracies is used as that run's accuracy weight, $w_i = a_i / \sum_{j=1}^{5} a_j$;
each base model's training results are weighted accordingly, and the weighted output is $\hat{P} = \sum_{i=1}^{5} w_i P_i$.
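The accuracy weighting amounts to giving run i the weight a_i / (a_1 + ... + a_5), where a_i is that run's accuracy, and summing the runs' predictions with those weights. A minimal NumPy sketch follows; the function name and the toy numbers are hypothetical.

```python
import numpy as np


def accuracy_weighted_output(run_results, run_accuracies):
    """Weight each training run's predictions by its share of the total accuracy.

    run_results: shape (n_runs, n_samples), one row of predictions per run.
    run_accuracies: shape (n_runs,), the prediction accuracy of each run.
    """
    results = np.asarray(run_results, dtype=float)
    acc = np.asarray(run_accuracies, dtype=float)
    weights = acc / acc.sum()          # w_i = a_i / sum_j a_j, sums to 1
    return weights, weights @ results  # weighted sum over runs, per sample


# Five hypothetical runs, each scoring the same two enterprises.
results = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.6, 0.4], [0.5, 0.0]]
accuracies = [0.80, 0.90, 0.85, 0.75, 0.70]
weights, fused = accuracy_weighted_output(results, accuracies)
print(weights.round(4), fused.round(4))
```

Because the weights are normalized, the fused output stays on the same scale as the individual runs, so a downstream probability threshold still applies.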
The invention has the following advantages: leveraging the strengths of Stacking fusion, it uses a multilayer weighted-fusion Stacking algorithm to predict which enterprises currently need funds. First, in the first Stacking fusion layer, three models that use different optimizers and different feature screening are fused, and learners of the same model trained on different training samples are weighted by prediction accuracy. Second, in the second fusion layer, the Stacking model and a single model serve as the additional-layer models of the improved Stacking model, and their estimation results are combined by weighted summation to mine potential clients with current fund demand.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; together with the embodiments of the invention, they serve to explain the invention.
FIG. 1 is a schematic flow chart of the method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a conventional Stacking integration;
FIG. 3 is a schematic diagram of the training flow and the weighting calculation used when training each base learner as a single model, according to an embodiment of the present application;
FIG. 4 is a schematic overall flow chart of multi-layer model fusion according to an embodiment of the present disclosure;
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The enterprise fund demand mining method based on the improved Stacking fusion algorithm comprises the following steps:
step 1: basic information on the enterprises to be mined is acquired, including business registration information, recruitment information, judicial risk records, news and public opinion, government procurement information, and project details; the data are preprocessed to construct a feature data set, which is divided into a training set and a test set;
step 2: three prediction models are established, modeled differentially, and trained. The prediction models are a random forest model, a LightGBM model, and an XGBoost model: the random forest model uses RFE-based feature screening and is trained as a single model with grid-search tuning; the LightGBM model uses LightGBM-based feature screening and is trained as a single model with a Bayesian optimizer; and the XGBoost model uses XGBoost-based feature screening and is trained as a single model with a Bayesian optimizer;
the feature screening methods include an embedded method and a wrapper method (RFE); the embedded methods are LightGBM-based feature screening and XGBoost-based feature screening.
Specifically, the embedded method selects features according to model performance and lets the algorithm decide which features to use; that is, feature selection and model training are performed simultaneously. The LightGBM and XGBoost algorithms are used for model training, and a weight coefficient (between 0 and 1) is obtained for each feature from the trained model. The magnitude of the weight coefficient directly reflects the feature's contribution to the model: the larger the coefficient, the more important the feature. Based on the LightGBM model and the XGBoost model respectively, the 40 most important features are selected.
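An embedded top-40 selection can be sketched with scikit-learn's `SelectFromModel`. Here `GradientBoostingClassifier` is a stand-in for LightGBM and XGBoost, whose scikit-learn wrappers also expose a `feature_importances_` attribute; the stand-in's impurity-based importances are normalized to sum to 1, consistent with weight coefficients between 0 and 1. The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in: 60 candidate features, 15 of them informative.
X, y = make_classification(n_samples=400, n_features=60, n_informative=15, random_state=0)

# Feature selection and model training happen together: the booster is fitted
# once and its importances decide which 40 features are kept.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    max_features=40,
    threshold=-np.inf,  # disable the importance cut-off; rank and keep the top 40
)
X_selected = selector.fit_transform(X, y)
importances = selector.estimator_.feature_importances_

print(X_selected.shape, round(float(importances.sum()), 6))
```

Running the same selector once per booster (LightGBM and XGBoost in the patent) yields the two model-specific feature subsets described above.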
The wrapper method screens features with a procedure dedicated to feature selection rather than with the algorithm used for the final modeling; this procedure selects the optimal feature subset, and the invention uses recursive feature elimination (RFE). The main idea of RFE is to start from an initial feature set, train a model, and compute the importance of each feature in every iteration; the least important features are then removed, the remaining subset is used as the input of the next iteration, and the process repeats until the desired number of features is reached. The order in which features are eliminated gives the feature ranking, and the invention selects the 40 most important features in the data set.
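Recursive feature elimination down to 40 features can be sketched with scikit-learn's `RFE`, wrapped around a random forest to match the model paired with RFE in step 2. The synthetic data and the choice to drop 5 features per iteration (`step=5`) are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in: 60 candidate features, 15 of them informative.
X, y = make_classification(n_samples=400, n_features=60, n_informative=15, random_state=0)

# Each iteration refits the forest, ranks features by importance and removes
# the 5 least important ones, until 40 remain (60 -> 55 -> 50 -> 45 -> 40).
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=40, step=5)
X_selected = rfe.fit_transform(X, y)

# ranking_ encodes the elimination order: 1 = kept, larger = eliminated earlier.
print(X_selected.shape, sorted(set(rfe.ranking_)))
```

A larger `step` trades ranking resolution for fewer refits; `step=1` gives a full ranking at the cost of 20 refits here.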
Grid-search tuning steps through the parameters in sequence within a specified range, trains a learner with each parameter setting, and selects the setting with the highest accuracy on the validation set. Single-model training with grid-search tuning is performed as follows:
the parameters to tune and their search spaces are determined; the tuned parameters are the minimum number of samples in a leaf node, the minimum number of samples required to split a node, the maximum number of leaf nodes, the maximum depth of a decision tree, the proportion of samples per estimator, the number of classifiers, and the maximum number of features;
the initial search space is 1-3 for the minimum number of samples in a leaf node, 6-8 for the minimum number of samples required to split a node, (None, 1, 5, 10) for the maximum number of leaf nodes, 10-15 for the maximum depth of a decision tree, and (0.5, 0.6, 0.7) for the proportion of samples per estimator;
model training: the model and the evaluator are instantiated, a grid search is run over the configured search space to train the model, the model's prediction result is obtained from the best estimator, and the best values of the searched hyperparameters for this round are obtained from the best parameters;
the search space is adjusted according to the hyperparameter values of the previous round: if a best value equals the maximum of its search range, the range is increased; otherwise, it is decreased. Model training then continues, and the process is iterated, recording the best hyperparameter values and the prediction score of each iteration, until the optimal values of all parameters lie within the search space, at which point iteration stops;
finally, the optimal values of all parameters are substituted into the model.
The Bayesian optimizer uses the TPE algorithm as the probabilistic surrogate model and EI as the acquisition function.
Single-model training with the Bayesian optimizer is performed as follows:
the parameter space is defined as a dictionary whose keys may be chosen freely and whose values are hp functions; the parameters include the learning rate, the method used to build the decision trees, the number of leaves per tree, the maximum depth, the regularization coefficients, the minimum number of records a leaf may hold, the minimum gain required for a split, and the proportion of data used in each iteration;
the hp functions are passed to the TPE algorithm for optimization: the fund demand prediction model is trained on the training set data, a prediction result is obtained, and the TPE model is updated with that result;
the EI acquisition function is used to select the most promising hyperparameter combination from the updated TPE model;
the number of iterations is set to 100; after the iterations are completed, the algorithm stops and outputs the best hyperparameter combination and the optimal value of the objective function.
The TPE algorithm is formulated as follows:
$$p(x\mid y)=\begin{cases} l(x), & y<y^{*}\\ g(x), & y\ge y^{*}\end{cases}$$
where $y$ is the observed or measured objective function value, $y^{*}$ is a threshold in the observation domain, $x$ is the observed value (the hyperparameter setting), $l(x)$ is the density estimate formed from the observations $x$ whose objective value is less than $y^{*}$, and $g(x)$ is the density formed from the observations $x$ whose objective value is greater than or equal to $y^{*}$.
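The acquisition function EI is given by:
$$EI_{y^{*}}(x)=\int_{-\infty}^{y^{*}}(y^{*}-y)\,p(y\mid x)\,dy=\int_{-\infty}^{y^{*}}(y^{*}-y)\,\frac{p(x\mid y)\,p(y)}{p(x)}\,dy\propto\left(\gamma+\frac{g(x)}{l(x)}(1-\gamma)\right)^{-1}$$
where $\gamma=p(y<y^{*})$ is the quantile used by the TPE algorithm to separate $l(x)$ from $g(x)$, with $\gamma\in(0,1)$; $p(y)$ is the marginal probability distribution of $y$; and $p(x)=\gamma\,l(x)+(1-\gamma)\,g(x)$.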
step 3: an improved Stacking model is established, in which the three prediction models serve as the first-layer base learners and a kernel ridge regression model serves as the second-layer estimator;
because the same prediction model produces different training results on different training samples, these results are weighted by the base model's prediction accuracy and summed to determine the model parameters;
a validation set is obtained from the first-layer base learners by five-fold cross-validation, and the five prediction outputs on the validation folds are concatenated vertically to form the second-layer input features; the Stacking model and a single CatBoost model are then fused as the additional-layer models of the improved Stacking model, and their estimation results are combined by weighted summation. The model weights are assigned by exhaustive search: the prediction accuracy of the two additional-layer models is computed under each candidate weighting, the weighting with the highest prediction accuracy is taken as the weight coefficients, and the weight coefficients of the two additional-layer models sum to 1;
the invention applies 5-fold cross-validation to the base learners, so the training set is divided into 5 parts. Taking the random forest as an example, it is trained 5 times, each time holding out one part as the validation set, so that each training set contains 4/5 of the rows and each validation set the remaining 1/5. After the first training of the random forest, its output on the validation set is denoted a1 and its output on the test set b1. Repeating this 5 times yields a1, a2, a3, a4, a5 and b1, b2, b3, b4, b5. a1 to a5 are the trained random forest's outputs on the validation folds; concatenating them gives A1, the random forest's predictions on the complete original training set. b1 to b5 are the trained random forest's outputs on the test set; weighting them by prediction accuracy gives B1, the random forest's predictions on the complete original test set.
The invention uses three base models, so the above procedure yields A1, A2, A3 and B1, B2, B3. A1, A2, and A3 are combined as the training set of the next layer, and B1, B2, and B3 form its test set.
Step 4: the improved Stacking model is trained on the training set data to obtain a fund demand prediction model;
step 5: the test set data are input into the trained fund demand prediction model with the decision threshold set to 0.7, and enterprises whose prediction result is greater than 0.7 are identified as potential clients with fund demand.
The results of the same prediction model trained on different training samples are weighted by the base model's prediction accuracy as follows:
the base model is trained on the training set to obtain a training result $P_i$, and its prediction accuracy $a_i$ is computed against the ground-truth labels of the training set; training is repeated five times, recording each training result $P_i$ and its prediction accuracy $a_i$ ($i = 1, \dots, 5$);
the share of each run's prediction accuracy in the sum of the five runs' accuracies is used as that run's accuracy weight, $w_i = a_i / \sum_{j=1}^{5} a_j$;
each base model's training results are weighted accordingly, and the weighted output is $\hat{P} = \sum_{i=1}^{5} w_i P_i$.
Example 2
Embodiments of the present disclosure provide an enterprise fund demand mining apparatus based on an improved Stacking fusion algorithm, including a processor and a memory. Optionally, the apparatus may further comprise a communication interface and a bus. The processor, the communication interface, and the memory can communicate with each other through the bus. The communication interface may be used for information transfer. The processor may invoke logic instructions in the memory to perform the enterprise fund demand mining method based on the improved Stacking fusion algorithm of the above embodiments.
Further, the logic instructions in the memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory may be used to store software programs, computer-executable programs, and program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor runs the program instructions/modules stored in the memory to perform functional applications and data processing, i.e., to implement the enterprise fund demand mining method based on the improved Stacking fusion algorithm of the above embodiments.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created through use of the terminal device, etc. Further, the memory may include high-speed random access memory and may also include nonvolatile memory.
Embodiments of the present disclosure provide a computer readable storage medium storing computer executable instructions configured to perform the above-described enterprise fund demand mining method based on an improved Stacking fusion algorithm.
The computer readable storage medium may be a transitory computer readable storage medium or a non-transitory computer readable storage medium.
Embodiments of the present disclosure may be embodied in a software product stored on a storage medium, including one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of a method according to embodiments of the present disclosure. The aforementioned storage medium may be a non-transitory medium capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disk, or it may be a transitory storage medium.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and does not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. An enterprise fund demand mining method based on an improved Stacking fusion algorithm, characterized by comprising the following steps:
step 1: basic information on the enterprises to be mined is acquired, including business registration information, recruitment information, judicial risk records, news and public opinion, government procurement information, and project details; the data are preprocessed to construct a feature data set, which is divided into a training set and a test set;
step 2: three prediction models are established, differential modeling is carried out, and the prediction models are trained; the predictive model includes: the system comprises a random forest model, a lightGBM model and an XGBoost model, wherein the characteristic screening mode of the random forest model is based on RFE characteristic screening, single model training is carried out through grid search optimization, the characteristic screening mode of the lightGBM model is based on the lightGBM characteristic screening, single model training is carried out through a Bayesian optimizer, the characteristic screening mode of the XGBoost model is based on XGBoost characteristic screening, and single model training is carried out through the Bayesian optimizer;
step 3: establishing an improved Stacking model, wherein the three prediction models serve as the base learning models of the first layer of the Stacking model, and a kernel ridge regression model serves as the estimation model (meta-learner) of the second layer;
because the same prediction model yields different training results when trained on different training samples, the outputs of each base model are weighted and summed according to their prediction accuracy to determine the model parameters;
obtaining a validation set from the first-layer base learning model results through five-fold cross-validation, and vertically concatenating the 5 predicted outputs on the validation folds to form the input features of the second layer; fusing the Stacking model with a single CatBoost model as the additional layer of the improved Stacking model, and computing a weighted sum of the estimation results of the models; wherein the model weights are assigned by exhaustive search: the prediction accuracy of the fused output is calculated for different weights of the two additional-layer models, the weights giving the highest prediction accuracy are selected as the weight coefficients, and the weight coefficients of the two additional-layer models sum to 1;
step 4: training the improved Stacking model with the training set data to obtain a fund demand prediction model;
step 5: inputting the test set data into the trained fund demand prediction model with the decision threshold set to 0.7, and identifying enterprises whose predicted value exceeds 0.7 as potential clients with fund demand.
2. The enterprise fund demand mining method based on the improved Stacking fusion algorithm according to claim 1, wherein the single-model training with grid search tuning is specifically performed as follows:
determining the parameters to be tuned and setting the parameter search space, wherein the parameters are the minimum number of samples in a leaf node, the minimum number of samples required to split a node, the maximum number of leaf nodes, the maximum depth of a decision tree, the proportion of samples used for estimation, the number of classifiers, and the maximum number of features;
the search space for the minimum number of samples in a leaf node is 1 to 3, for the minimum number of samples required to split a node is 6 to 8, for the maximum number of leaf nodes is (None, 1, 5, 10), for the maximum depth of a decision tree is 10 to 15, and for the proportion of samples used for estimation is (0.5, 0.6, 0.7);
model training: instantiating the model and the evaluator, passing the configured parameter search space to the grid search to train the model, obtaining the model prediction result from the best result, and obtaining the optimal value of each searched hyperparameter for this round from the best parameters;
adjusting the parameter search space according to the hyperparameter values of the previous round: if a value equals the maximum of its search space, the values in that search space are increased; otherwise, they are reduced; model training then continues, and the process is iterated, recording the optimal hyperparameter values and the prediction score of each iteration, until the optimal values of all parameters fall within the parameter search space, at which point iteration stops;
substituting the optimal values of all parameters into the model.
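The iterative grid search of claim 2 can be sketched with scikit-learn's `GridSearchCV`. This is a hypothetical reading of the adjustment rule: when a parameter's best value lands on the upper edge of its grid the window slides up, when it lands on the lower edge (above a hard limit) the window slides down, and the loop stops once every best value lies strictly inside its grid or after `max_rounds`. Only two of the claimed parameters are searched to keep the example fast. Mapping them to `min_samples_leaf` and `max_depth` (and the others to `min_samples_split`, `max_leaf_nodes`, `max_samples`, `n_estimators`, `max_features`) is an assumption, as are the helper names `shift_grid` and `iterative_grid_search` and the synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def shift_grid(grid, best, lower_limit):
    """Return a shifted grid if `best` sits on an edge, else None (converged)."""
    step = grid[1] - grid[0]
    n = len(grid)
    if best == grid[-1]:
        # optimum on the upper edge: slide the window up, best becomes 2nd value
        return [best + step * (k - 1) for k in range(n)]
    if best == grid[0] and best - step >= lower_limit:
        # optimum on the lower edge: slide the window down, clipped at the limit
        shifted = [best - step * (n - 2 - k) for k in range(n)]
        return [v for v in shifted if v >= lower_limit]
    return None  # interior optimum (or already at the hard lower limit)


def iterative_grid_search(X, y, grids, lower_limits, max_rounds=4):
    history = []  # (best_params, best_score) recorded for every round
    for _ in range(max_rounds):
        search = GridSearchCV(
            RandomForestClassifier(n_estimators=20, random_state=0),
            param_grid=grids, cv=3, scoring="accuracy")
        search.fit(X, y)
        best = dict(search.best_params_)
        history.append((best, search.best_score_))
        new_grids, moved = {}, False
        for name, grid in grids.items():
            shifted = shift_grid(grid, best[name], lower_limits[name])
            new_grids[name] = grid if shifted is None else shifted
            moved = moved or shifted is not None
        grids = new_grids
        if not moved:  # every optimum lies strictly inside its grid
            break
    return best, history, grids


X, y = make_classification(n_samples=200, n_features=8, random_state=0)
best, history, final_grids = iterative_grid_search(
    X, y,
    grids={"min_samples_leaf": [1, 2, 3], "max_depth": [10, 11, 12, 13, 14, 15]},
    lower_limits={"min_samples_leaf": 1, "max_depth": 1})
```

Every shifted grid still contains the previous best value, so the returned `best` always lies inside `final_grids`. If the claimed leaf-node space is mapped directly onto scikit-learn, note that `max_leaf_nodes` must be `None` or at least 2, so the value 1 would be rejected.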
3. The enterprise fund demand mining method based on the improved Stacking fusion algorithm according to claim 1, wherein the Bayesian optimizer uses the TPE (tree-structured Parzen estimator) algorithm as the probabilistic surrogate model and EI (expected improvement) as the acquisition function.
4. The enterprise fund demand mining method based on the improved Stacking fusion algorithm according to claim 3, wherein the single-model training with the Bayesian optimizer is specifically performed as follows:
defining the parameter space in a dedicated dictionary form, wherein the keys of the key-value pairs can be set arbitrarily and the values are hp functions, and the parameters comprise the learning rate, the method of constructing decision trees, the number of leaves per tree, the maximum depth, the regularization coefficients, the minimum number of records a leaf may contain, the minimum gain required for a split, and the proportion of data used in each iteration;
feeding the hp-defined parameter space into the TPE algorithm for optimization, training the fund demand prediction model with the training set data to obtain a prediction result, and updating the TPE algorithm according to the prediction result;
selecting the most promising hyperparameter combination from the updated TPE algorithm using the EI acquisition function;
setting the number of iterations of the algorithm to 100, stopping execution when the iterations are completed, and outputting the optimal hyperparameter combination and the optimal value of the objective function.
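The phrase "hp functions" suggests the hyperopt library, where a search space is a dictionary of `hp.*` expressions minimized with `fmin(..., algo=tpe.suggest, max_evals=100)`. To show the mechanics rather than a library call, the following self-contained NumPy sketch implements a simplified one-dimensional TPE loop: observations are split at the γ-quantile threshold y*, Gaussian Parzen windows give l(x) and g(x), candidates are drawn from l(x), and the candidate with the largest l(x)/g(x) (which maximizes EI) is evaluated next, for 100 evaluations in total. The fixed bandwidth, γ = 0.25, 10 random warm-up points, 24 candidates, the helper names, and the toy objective (x - 2)^2 are assumptions; hyperopt's own estimator uses adaptive bandwidths and priors.

```python
import numpy as np


def parzen(points, x, bandwidth):
    """Gaussian Parzen-window density built on `points`, evaluated at `x`."""
    z = (x[:, None] - points[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernel.mean(axis=1)


def tpe_minimize(objective, low, high, n_iter=100, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Minimize a 1-D objective on [low, high] with a simplified TPE loop."""
    rng = np.random.default_rng(seed)
    bandwidth = 0.1 * (high - low)  # fixed bandwidth (assumption)
    xs = list(rng.uniform(low, high, n_startup))  # random warm-up points
    ys = [objective(x) for x in xs]
    for _ in range(n_iter - n_startup):
        x_obs, y_obs = np.array(xs), np.array(ys)
        y_star = np.quantile(y_obs, gamma)      # threshold y*
        good = x_obs[y_obs < y_star]            # observations behind l(x)
        bad = x_obs[y_obs >= y_star]            # observations behind g(x)
        if good.size == 0:                      # e.g. ties at the quantile
            good = x_obs[[np.argmin(y_obs)]]
        # draw candidates from l(x): pick a good point, add kernel noise
        cand = rng.choice(good, n_candidates) + rng.normal(0.0, bandwidth, n_candidates)
        cand = np.clip(cand, low, high)
        # EI is increasing in l(x)/g(x), so pick the candidate maximizing the ratio
        ratio = parzen(good, cand, bandwidth) / (parzen(bad, cand, bandwidth) + 1e-12)
        x_next = float(cand[np.argmax(ratio)])
        xs.append(x_next)
        ys.append(objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best], len(ys)


best_x, best_y, n_evals = tpe_minimize(lambda x: (x - 2.0) ** 2, -5.0, 5.0)
```

In the patent's setting the objective would instead train a LightGBM or XGBoost model with the proposed hyperparameters and return a validation loss, and the search space would be multi-dimensional.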
5. The enterprise fund demand mining method based on the improved Stacking fusion algorithm of claim 4, wherein the TPE algorithm formula is specifically as follows:
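$$p(x \mid y) = \begin{cases} l(x), & y < y^* \\ g(x), & y \ge y^* \end{cases}$$
wherein $y$ represents the observed value of the objective function, $y^*$ represents a threshold in the observation domain, $x$ represents an observation (a hyperparameter configuration), $l(x)$ represents the density estimate formed by the observations $x$ whose objective value $y$ is less than $y^*$, and $g(x)$ represents the density estimate formed by the observations $x$ whose objective value $y$ is greater than or equal to $y^*$.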
6. The enterprise fund demand mining method based on the improved Stacking fusion algorithm according to claim 5, wherein the acquisition function EI is specifically as follows:
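$$\mathrm{EI}_{y^*}(x) = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy = \frac{\gamma y^* l(x) - l(x) \int_{-\infty}^{y^*} y\, p(y)\, dy}{\gamma l(x) + (1 - \gamma)\, g(x)} \propto \left(\gamma + \frac{g(x)}{l(x)}(1 - \gamma)\right)^{-1}$$
wherein $\gamma = p(y < y^*)$ is the quantile used by the TPE algorithm to divide the observations between $l(x)$ and $g(x)$, with $\gamma \in (0, 1)$, and $p(y)$ is the marginal probability distribution of $y$.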
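The closed form of claim 6 can be checked numerically. The constant C = γy* - ∫ y p(y) dy (integral from -∞ to y*) does not depend on x, so EI depends on x only through g(x)/l(x). The sketch below assumes, purely for illustration, that p(y) is a standard normal distribution, so that γ = Φ(y*) and C = γy* + φ(y*); it computes C by midpoint-rule integration and evaluates EI for a few hypothetical density values. The helper names are illustrative.

```python
import math


def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def ei_constant(y_star, lower=-10.0, n=20000):
    """gamma = p(y < y*) and C = gamma*y* - integral_{-inf}^{y*} y p(y) dy."""
    h = (y_star - lower) / n
    # midpoint rule on [lower, y*]; the normal tail below -10 is negligible
    integral = h * sum((lower + (k + 0.5) * h) * normal_pdf(lower + (k + 0.5) * h)
                       for k in range(n))
    gamma = normal_cdf(y_star)
    return gamma, gamma * y_star - integral


def tpe_ei(l_x, g_x, gamma, const):
    """EI_{y*}(x) = C / (gamma + (g(x) / l(x)) * (1 - gamma))."""
    return const / (gamma + (g_x / l_x) * (1.0 - gamma))


gamma, const = ei_constant(y_star=-0.5)
```

Since C equals the expected value of max(y* - y, 0) under p(y), it is positive, and EI grows as l(x)/g(x) grows; this is why a TPE implementation can rank candidates by the density ratio alone.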
7. The enterprise fund demand mining method based on the improved Stacking fusion algorithm according to claim 1, wherein the accuracy-based weighting of the base-model results obtained by training the same prediction model on different training samples is specifically performed as follows:
training the base model on the training set to obtain a training result $\hat{y}_i$, and calculating the prediction accuracy $a_i$ against the ground-truth labels in the training set; repeating the training five times and recording each training result $\hat{y}_i$ and its prediction accuracy $a_i$, $i = 1, \dots, 5$;
taking the ratio of the prediction accuracy of each training run to the sum of the prediction accuracies of the five runs as the accuracy weight of that run, $w_i = a_i / \sum_{j=1}^{5} a_j$;
weighting the training results of each base model and outputting the weighted result $\hat{y} = \sum_{i=1}^{5} w_i \hat{y}_i$.
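The accuracy weighting of claim 7 reduces to normalizing the five accuracies and taking a weighted average of the five results. In the sketch below, written with assumed notation (ŷ_i for the i-th run's predictions, a_i for its accuracy) and a hypothetical helper name, the weights are w_i = a_i / Σ_j a_j and the output is Σ_i w_i ŷ_i; the prediction values and accuracies are made up for illustration.

```python
import numpy as np


def accuracy_weighted_output(results, accuracies):
    """Combine repeated training runs of one base model, weighted by accuracy.

    results:    shape (n_runs, n_samples), predictions y_hat_i of each run
    accuracies: shape (n_runs,), accuracy a_i of each run
    """
    results = np.asarray(results, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    weights = accuracies / accuracies.sum()  # w_i = a_i / sum_j a_j
    return weights, weights @ results        # y_hat = sum_i w_i * y_hat_i


weights, combined = accuracy_weighted_output(
    results=[[1, 0], [0, 0], [1, 1], [1, 0], [0, 1]],  # five runs, two samples
    accuracies=[0.90, 0.80, 0.85, 0.95, 0.75])
# combined -> approx [0.635, 0.376], i.e. [2.70 / 4.25, 1.60 / 4.25]
```

Because the weights sum to 1, the combined output stays on the same scale as the individual runs, which keeps it usable as a first-layer feature.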
8. An enterprise fund demand mining apparatus based on an improved Stacking fusion algorithm, comprising a processor and a memory storing program instructions, characterized in that the processor is configured to perform, when running the program instructions, the enterprise fund demand mining method based on an improved Stacking fusion algorithm according to any one of claims 1 to 7.
9. A storage medium storing program instructions which, when executed, perform the enterprise fund demand mining method based on the improved Stacking fusion algorithm of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311296560.1A CN117408736A (en) 2023-10-09 2023-10-09 Enterprise fund demand mining method and medium based on improved Stacking fusion algorithm

Publications (1)

Publication Number Publication Date
CN117408736A true CN117408736A (en) 2024-01-16

Family

ID=89495387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311296560.1A Pending CN117408736A (en) 2023-10-09 2023-10-09 Enterprise fund demand mining method and medium based on improved Stacking fusion algorithm

Country Status (1)

Country Link
CN (1) CN117408736A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination