CN117114812A - Financial product recommendation method and device for enterprises - Google Patents
Financial product recommendation method and device for enterprises
- Publication number
- CN117114812A (application CN202311121873.3A)
- Authority
- CN
- China
- Prior art keywords
- model
- optimal
- clustering
- enterprise
- evaluation index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P80/00—Climate change mitigation technologies for sector-wide applications
- Y02P80/10—Efficient use of energy, e.g. using compressed air or pressurized fluid as energy carrier
Abstract
The application discloses a financial product recommendation method and device for enterprises, relating to the technical field of business data processing. The method comprises the following steps: determining multi-level attribute tags of an enterprise; screening modeling variables corresponding to evaluation indexes from the multi-level attribute tags, and performing VIF (variance inflation factor) collinearity analysis on the modeling variables corresponding to each evaluation index to obtain modeling variables for a machine learning model; determining an optimal K value and an optimal clustering model to obtain a clustering output model, and obtaining a clustering result for each first evaluation index through the clustering output model; determining an optimal binary classification model and an optimal hyperparameter combination, training a classification output model corresponding to each second evaluation index, and obtaining a classification result for the second evaluation index; outputting the approval probability of each financial product through a logistic regression model; and recommending the N financial products with the highest approval probability to the enterprise. The application can accurately and efficiently recommend suitable financial products to enterprises, raise the approval rate of enterprises' financial product applications, and improve the enterprise user experience.
Description
Technical Field
The application relates to the technical field of business data processing, in particular to a financial product recommendation method and device for enterprises.
Background
Enterprises generate demand for external funds as they develop and grow. At present, when financial institutions assess the credit status of enterprises, information is asymmetric and incomplete, so the institutions are unwilling or afraid to lend, and the financing needs of enterprises, especially small and micro enterprises without fixed assets to mortgage, are difficult to meet. Therefore, to ease the difficulty and high cost of enterprise financing, the information asymmetry and incompleteness between enterprises and financial institutions needs to be changed: enterprise information should be gathered from multiple dimensions to evaluate the credit status of the enterprise, and financial products should be recommended efficiently and accurately according to that credit status, thereby raising the approval rate of enterprises' financial product applications, improving the enterprise user experience, and reducing the risk financial institutions bear in financing enterprises.
Although enterprise user-portrait technology exists in the prior art and can provide a full picture of an enterprise's operating status, the label system and learning models adopted by existing user-portrait technology cannot accurately and effectively reconstruct a complete portrait of the enterprise in the lending and finance field, so financial products cannot be recommended to enterprises accurately and efficiently.
Disclosure of Invention
In view of the above-mentioned drawbacks or shortcomings in the prior art, the application provides a method and a device for recommending financial products for enterprises, which can more comprehensively measure the credit status of the enterprises through an innovative label system, an evaluation index and an evaluation index learning model, organically combine the credit status of the enterprises with a financial product recommending system, and intelligently match financial products with higher passing rate for the enterprises.
In one aspect of the present application, there is provided a financial product recommendation method for an enterprise, including the steps of:
determining a multi-level attribute tag of an enterprise according to the acquired enterprise operation data;
screening modeling variables corresponding to the evaluation indexes from the multi-level attribute tags, performing VIF (variance inflation factor) collinearity analysis on the modeling variables corresponding to each evaluation index, and removing highly correlated modeling variables to obtain modeling variables for a machine learning model; wherein the evaluation indexes comprise first evaluation indexes and a second evaluation index, and the machine learning model comprises a clustering model for generating evaluation results of the first evaluation indexes and a classification model for generating an evaluation result of the second evaluation index; the first evaluation indexes are enterprise growth, enterprise stability, enterprise profitability and enterprise innovation ability; the second evaluation index is enterprise repayment ability;
determining optimal K values of a plurality of clustering models according to the silhouette coefficient, the DB (Davies-Bouldin) index and the CH (Calinski-Harabasz) index, and comparing the silhouette coefficients, DB indexes and CH indexes of the different clustering models at their optimal K values to determine an optimal clustering model; training a clustering output model corresponding to each first evaluation index according to the optimal clustering model and the optimal K value, and obtaining a clustering result for the first evaluation index through the clustering output model;
optimizing the hyperparameter combinations of a plurality of binary classification models to obtain optimal hyperparameter combinations, taking the model with the largest KS (Kolmogorov-Smirnov) value as the optimal binary classification model, training a binary classification output model corresponding to each second evaluation index according to the optimal binary classification model and the optimal hyperparameter combination, and obtaining a classification result for the second evaluation index through the binary classification output model;
taking a logistic regression model as the recommendation model for financial products, taking the clustering results of the first evaluation indexes, the classification result of the second evaluation index and data related to the financial products as input variables of the logistic regression model, and outputting the approval probability of each financial product from the logistic regression model;
and sorting the financial products by approval probability, and recommending the N financial products with the highest probability to the enterprise, wherein N is greater than or equal to 1.
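The last two steps above (logistic regression scoring followed by top-N ranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the feature layout, the weights, the bias, and the product names are all hypothetical, and a real model would learn its coefficients from historical application outcomes.

```python
import math

def approval_probability(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def recommend_top_n(product_features, weights, bias, n=1):
    """Rank products by predicted probability and return the top n."""
    if n < 1:
        raise ValueError("N must be greater than or equal to 1")
    scored = [(name, approval_probability(x, weights, bias))
              for name, x in product_features.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]

# Hypothetical feature vector per product:
# [cluster label of a first evaluation index, repayment score, product rate]
products = {
    "loan_a": [2.0, 0.9, 0.05],
    "loan_b": [2.0, 0.9, 0.12],
    "loan_c": [2.0, 0.9, 0.08],
}
# Hypothetical coefficients, chosen only to make the example concrete.
top = recommend_top_n(products, weights=[0.3, 1.5, -10.0], bias=-1.0, n=2)
for name, prob in top:
    print(name, round(prob, 3))
```

Sorting the full list before slicing keeps the sketch short; for a large product catalog, `heapq.nlargest` would avoid the full sort.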
Further, the multi-level attribute tags include a primary tag divided according to a data source, a secondary tag divided according to a data dimension, and a tertiary tag divided according to a specific index.
Further, the clustering models include a K-means model, a SOM model, a DBSCAN model and a GMM model; the classification models include an XGBoost model, a CatBoost model and a LightGBM model.
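The method selects among the candidate classifiers by their KS value. As a minimal sketch of that selection, the KS statistic can be computed as the largest gap between the cumulative rates of the two classes as the score threshold is lowered. The model names and scores below are hypothetical, and the label 1 is assumed to mark the event being predicted (for example, an overdue repayment).

```python
def ks_statistic(y_true, y_score):
    """Largest gap between cumulative class-1 and class-0 rates over thresholds."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best

def pick_best_model(y_true, scores_by_model):
    """Return the model name with the largest KS value, plus all KS values."""
    ks = {name: ks_statistic(y_true, s) for name, s in scores_by_model.items()}
    return max(ks, key=ks.get), ks

# Hypothetical validation labels and predicted scores from two candidates.
y_true = [1, 1, 1, 0, 0, 0]
scores_by_model = {
    "model_a": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],
    "model_b": [0.9, 0.4, 0.6, 0.5, 0.2, 0.1],
}
best_model, ks_values = pick_best_model(y_true, scores_by_model)
print(best_model, ks_values)
```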
Further, the step of determining the optimal K values of the plurality of clustering models according to the silhouette coefficient, the DB index and the CH index includes:
determining an initial range of K values for the clustering model;
calculating the silhouette coefficient, DB index and CH index corresponding to each K value in the initial range, and standardizing the data;
ranking the standardized silhouette coefficients, CH indexes and DB indexes from best to worst, adding the corresponding ranks for each K value, and ranking the sums from best to worst;
and taking the K value corresponding to the best sum as the optimal K value of the clustering model.
Further, the step of comparing the silhouette coefficients, DB indexes and CH indexes of the different clustering models at their optimal K values to determine an optimal clustering model includes:
comparing the silhouette coefficients, DB indexes and CH indexes of the clustering models at their optimal K values, and taking the clustering model that achieves the best value on the largest number of these three indexes as the optimal clustering model.
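The comparison above can be sketched as a simple count of how many of the three indexes each model wins. This is an illustrative reading only: the index values are made up, and the sketch assumes the usual conventions that a larger silhouette coefficient and CH index are better while a smaller DB index is better. Ties are resolved by dictionary order here, which the text does not specify.

```python
def pick_best_clustering(metrics_by_model):
    """Return the model that is best on the most indexes, plus the win counts."""
    higher_is_better = {"silhouette": True, "db": False, "ch": True}
    wins = {name: 0 for name in metrics_by_model}
    for metric, higher in higher_is_better.items():
        chooser = max if higher else min
        winner = chooser(metrics_by_model,
                         key=lambda m: metrics_by_model[m][metric])
        wins[winner] += 1
    return max(wins, key=wins.get), wins

# Hypothetical index values, each measured at that model's own optimal K.
metrics = {
    "kmeans": {"silhouette": 0.52, "db": 0.80, "ch": 310.0},
    "gmm": {"silhouette": 0.48, "db": 0.75, "ch": 290.0},
    "som": {"silhouette": 0.40, "db": 1.10, "ch": 250.0},
}
best_clustering, win_counts = pick_best_clustering(metrics)
print(best_clustering, win_counts)
```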
Further, the step of ranking the standardized silhouette coefficients, CH indexes and DB indexes from best to worst, adding the corresponding ranks, ranking the sums from best to worst, and taking the K value corresponding to the best sum as the optimal K value of the clustering model includes:
ranking the standardized silhouette coefficients and CH indexes from largest to smallest, ranking the standardized DB indexes from smallest to largest, adding the corresponding ranks for each K value, and ranking the sums from smallest to largest;
and taking the K value corresponding to the smallest sum as the optimal K value of the clustering model.
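The rank-sum rule above can be sketched as follows. The index values are hypothetical. The standardization step is omitted because any order-preserving rescaling (such as min-max or z-score scaling) leaves the ranks unchanged; tie handling is not specified in the text, so ties here simply keep their input order.

```python
def rank_positions(values, descending):
    """Return the 1-based rank of each value, where rank 1 is the best."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=descending)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def optimal_k(k_values, silhouette, db, ch):
    """Pick the K whose summed ranks across the three indexes is smallest."""
    sil_rank = rank_positions(silhouette, descending=True)  # larger is better
    ch_rank = rank_positions(ch, descending=True)           # larger is better
    db_rank = rank_positions(db, descending=False)          # smaller is better
    totals = [s + c + d for s, c, d in zip(sil_rank, ch_rank, db_rank)]
    best = min(range(len(k_values)), key=lambda i: totals[i])
    return k_values[best], dict(zip(k_values, totals))

# Hypothetical index values for K = 2..5 from one clustering model.
k_values = [2, 3, 4, 5]
best_k, rank_sums = optimal_k(
    k_values,
    silhouette=[0.41, 0.55, 0.50, 0.38],
    db=[1.20, 0.70, 0.85, 1.30],
    ch=[180.0, 260.0, 240.0, 150.0],
)
print(best_k, rank_sums)
```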
Further, the financial product recommendation method for enterprises further comprises taking, for each clustering model at its optimal K value, the smallest value among the silhouette coefficient, the DB index and the CH index as the respective optimal value.
In another aspect of the present application, there is provided a financial product recommendation apparatus for an enterprise, including:
a first module configured to determine multi-level attribute tags of an enterprise according to acquired enterprise operation data;
a second module configured to screen modeling variables corresponding to the evaluation indexes from the multi-level attribute tags, perform VIF (variance inflation factor) collinearity analysis on the modeling variables corresponding to each evaluation index, and remove highly correlated modeling variables to obtain modeling variables for a machine learning model; wherein the evaluation indexes comprise first evaluation indexes and a second evaluation index, and the machine learning model comprises a clustering model for generating evaluation results of the first evaluation indexes and a classification model for generating an evaluation result of the second evaluation index; the first evaluation indexes are enterprise growth, enterprise stability, enterprise profitability and enterprise innovation ability; the second evaluation index is enterprise repayment ability;
a third module configured to determine optimal K values of a plurality of clustering models according to the silhouette coefficient, the DB index and the CH index, and determine an optimal clustering model by comparing the silhouette coefficients, DB indexes and CH indexes of the different clustering models at their optimal K values; and to train a clustering output model corresponding to each first evaluation index according to the optimal clustering model and the optimal K value, and obtain a clustering result for the first evaluation index through the clustering output model;
a fourth module configured to optimize the hyperparameter combinations of a plurality of binary classification models to obtain optimal hyperparameter combinations, take the model with the largest KS value as the optimal binary classification model, train a binary classification output model corresponding to each second evaluation index according to the optimal binary classification model and the optimal hyperparameter combination, and obtain a classification result for the second evaluation index through the binary classification output model;
a fifth module configured to use a logistic regression model as the recommendation model for financial products, and use the clustering results of the first evaluation indexes, the classification result of the second evaluation index and data related to the financial products as input variables of the logistic regression model, wherein the logistic regression model outputs the approval probability of each financial product;
and a sixth module configured to sort the financial products by approval probability and recommend the N financial products with the highest probability to the enterprise, wherein N is greater than or equal to 1.
Further, the third module is configured to:
determine an initial range of K values for the clustering model;
calculate the silhouette coefficient, DB index and CH index corresponding to each K value in the initial range, and standardize the data;
rank the standardized silhouette coefficients, CH indexes and DB indexes from best to worst, add the corresponding ranks for each K value, and rank the sums from best to worst;
and take the K value corresponding to the best sum as the optimal K value of the clustering model.
Further, the third module is further configured to:
compare the silhouette coefficients, DB indexes and CH indexes of the clustering models at their optimal K values, and take the clustering model that achieves the best value on the largest number of these three indexes as the optimal clustering model.
In summary, according to the financial product recommendation method and device for enterprises provided by the application, through the innovative label system, the evaluation index and the evaluation index learning model, the credit of the enterprises can be measured more comprehensively and accurately, and the financial products which are suitable for the enterprises and have higher passing rate can be recommended more efficiently and accurately.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is a workflow diagram of a method of recommending financial products for an enterprise in accordance with one embodiment of the present application;
FIG. 2 is a flow chart of a method for recommending financial products for an enterprise according to one embodiment of the present application;
fig. 3 is a schematic diagram of a financial product recommendation device for enterprises according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe the acquisition modules, these acquisition modules should not be limited to these terms. These terms are only used to distinguish the acquisition modules from each other.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination" or "in response to detection". Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event)", depending on the context.
It should be noted that, the terms "upper", "lower", "left", "right", and the like in the embodiments of the present application are described in terms of the angles shown in the drawings, and should not be construed as limiting the embodiments of the present application. In addition, in the context, it will also be understood that when an element is referred to as being formed "on" or "under" another element, it can be directly formed "on" or "under" the other element or be indirectly formed "on" or "under" the other element through intervening elements.
Referring to fig. 1, a description will be given below of a workflow of a financial product recommendation method for enterprises according to the present application.
Enterprise-related data is typically stored on data platforms such as Presto, MySQL, Hive and HBase. Enterprise data is extracted from the data platform, including business registration, tax payment, financial reports, social security, development behaviors, litigation, in-loan monitoring, repayment status, loan collection, credit reports of the enterprise owner, multi-lender borrowing and the like. An initial enterprise portrait is constructed from the enterprise data; modeling variables and samples are screened, VIF collinearity analysis is performed, and highly correlated variables are removed to obtain the variables used by the clustering models. The number of clusters K is determined using evaluation metrics such as the silhouette coefficient, the DB index and the CH index; models are trained using clustering models such as K-means++, SOM, DBSCAN and GMM; and these steps are repeated to generate, in turn, the enterprise evaluation model corresponding to each clustering model, through which evaluation indexes such as enterprise growth, enterprise stability, enterprise profitability and enterprise innovation ability are generated. For the enterprise repayment ability label, after the portrait data is extracted, the enterprise's repayment data is extracted and modeled with binary classification models, including XGBoost, LightGBM, CatBoost and other models; hyperparameters are optimized with Optuna, and the optimal model and parameters are selected. After the optimal model is trained with the optimal parameters, users are divided into six grades, A through F, by predicted overdue probability, where grade A users have the lowest overdue probability and grade F users the highest.
The first-version enterprise portrait is combined with the evaluation indexes generated by the clustering models to produce the final user portrait of the enterprise.
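The six-grade repayment label described in the workflow can be sketched as a mapping from predicted overdue probability to a grade from A (lowest probability) to F (highest). The text does not say how the grade boundaries are chosen, so this sketch assumes equal-sized bands by rank; the enterprise identifiers and probabilities are hypothetical.

```python
def assign_grades(overdue_prob_by_enterprise, grades="ABCDEF"):
    """Split enterprises into equal-sized rank bands, lowest probability first."""
    ordered = sorted(overdue_prob_by_enterprise,
                     key=overdue_prob_by_enterprise.get)
    n = len(ordered)
    result = {}
    for position, enterprise in enumerate(ordered):
        # Integer arithmetic maps positions 0..n-1 onto bands 0..len(grades)-1.
        band = position * len(grades) // n
        result[enterprise] = grades[band]
    return result

# Hypothetical overdue probabilities from the trained binary classifier.
overdue_probs = {
    "e1": 0.02, "e2": 0.40, "e3": 0.10,
    "e4": 0.75, "e5": 0.05, "e6": 0.22,
}
grade_by_enterprise = assign_grades(overdue_probs)
print(grade_by_enterprise)
```

Fixed probability cut-offs would be an equally plausible reading; rank bands are used here only because they need no calibrated thresholds.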
Referring to fig. 2, one embodiment of the present application provides a financial product recommendation method for an enterprise, including the steps of:
step S101, determining the multi-level attribute tags of the enterprises according to the acquired enterprise operation data.
Specifically, after the enterprise operation data is obtained, it is divided into primary labels according to data source, secondary labels according to data dimension, and tertiary labels according to specific index.
When constructing a user portrait of an enterprise from the enterprise data, data can be extracted from a data warehouse, which includes data storage platforms such as Presto, Spark and MySQL. The extracted data includes the enterprise's basic business registration information, invoicing transaction data, tax declarations, financial condition, litigation, credit report information, multi-lender borrowing, in-loan monitoring and the like. After data extraction is finished, the data indexes of each data source are sorted out, and the indexes used directly to build the enterprise's own labels are organized as follows:
(1) Primary label "enterprise business registration information", with secondary labels: enterprise basic profile, qualification ratings, operational risks and negative violations, asset freezes and mortgages, administrative penalties, shareholders and investment situation, directors and senior executives and their risk situation, business registration changes, enterprise-associated risks, risk situation of enterprises associated with the legal representative or shareholders, and enterprise comprehensive score.
(2) Primary label "enterprise invoicing information", with secondary labels: sales transaction volume, sales transaction trend, sales frequency, commodity sales situation, commodity purchasing situation, sales balance, sales transaction anomalies, detection of false sales invoices, validity of transactions relative to the enterprise's industry, face value per sales invoice, transaction counterparty situation, and related-party transaction situation.
(3) Primary label "tax declaration information", with secondary labels: total value-added tax sales revenue, trend of value-added tax sales revenue, stability of value-added tax sales revenue, tax payment duration, tax rate, total tax paid, taxpayer type and tax rating, current tax arrears, and late tax payments.
(4) Primary label "litigation information", with secondary labels: criminal cases, loan disputes, non-litigation preservation, civil and administrative cases, and case overview.
(5) Primary label "enterprise credit report", with secondary labels: credit history duration, enterprise credit report situation, enterprise contract performance ability, enterprise debt repayment ability, and enterprise multi-lender application situation.
(6) Primary label "personal credit and liabilities", with secondary labels: personal basic information, contract performance, appetite for funds, credit account duration, repayment ability, and credit quality.
(7) Primary label "social security payment", with secondary labels: social security payment overview, welfare payment situation, and timeliness of social security payment.
(8) Primary label "financing situation", with secondary labels: financing products, financing amount, financing period, financing rate, and credit duration.
(9) Primary label "in-loan monitoring", with secondary labels: basic information of the actual controller, property situation, other loan demand, personal risk situation, enterprise risk situation, and customer evaluation.
(10) Primary label "repayment performance", with secondary labels: normal repayment, early repayment, and overdue repayment.
Below each secondary label are the enterprise's tertiary labels; for example, under the enterprise basic profile are industry, region, number of insured employees, registered capital, enterprise type, enterprise scale and the like.
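The three-level label hierarchy described above (data source, then data dimension, then specific index) can be represented as a nested mapping. The sketch below is only one possible representation; the label strings are English paraphrases of a few labels from the list, and the tertiary labels under "qualification ratings" are left empty because the text does not enumerate them.

```python
# Level 1 (data source) -> level 2 (data dimension) -> level 3 (specific index).
tag_system = {
    "enterprise business registration information": {
        "enterprise basic profile": [
            "industry",
            "region",
            "registered capital",
            "enterprise type",
            "enterprise scale",
        ],
        "qualification ratings": [],  # tertiary labels not listed in the text
    },
}

def flatten_tags(tags):
    """Yield one (primary, secondary, tertiary) row per tertiary label."""
    rows = []
    for primary, dimensions in tags.items():
        for secondary, indicators in dimensions.items():
            for tertiary in indicators:
                rows.append((primary, secondary, tertiary))
    return rows

tag_rows = flatten_tags(tag_system)
for row in tag_rows:
    print(" / ".join(row))
```

Flattening to rows like this makes it straightforward to select the tertiary labels that become modeling variables in the next step.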
Further, after the labels are sorted out, basic data processing such as outlier handling is performed. Label data falls into three types, namely character data, Boolean data and numerical data:
(1) For character data, such as the industry type (manufacturing, finance, clothing wholesale, etc.), the data is used directly as a label;
(2) For Boolean data, such as whether the enterprise is a "specialized, refined, distinctive and innovative" little giant enterprise, the data is used directly as a label;
(3) For numerical data, such as the enterprise's total invoiced amount over the last year, the data is binned by its 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th and 90th percentiles, so that each value is labeled as falling at or below a given percentile or above the 90th percentile.
Through the above process, the establishment of the enterprise own tag system is completed.
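The percentile binning of numerical labels described in item (3) can be sketched with the standard library. The input values are hypothetical, the label strings such as `<=P10` are an illustrative naming choice, and the "inclusive" quantile method is an assumption, since the text does not say how percentiles are interpolated.

```python
import bisect
import statistics

def decile_labels(values):
    """Label each value by the first decile cut point it does not exceed."""
    # Nine cut points: the 10th, 20th, ..., 90th percentiles of the data.
    cuts = statistics.quantiles(values, n=10, method="inclusive")
    names = [f"<=P{p}" for p in range(10, 100, 10)] + ["P90+"]
    labels = [names[bisect.bisect_left(cuts, v)] for v in values]
    return labels, cuts

# Hypothetical numerical label, e.g. total invoiced amount over the last year.
invoice_totals = list(range(1, 101))
labels, cuts = decile_labels(invoice_totals)
print(labels[0], labels[49], labels[-1])
```

In practice the cut points would be computed once on a reference population and then reused to label new enterprises, rather than recomputed per batch.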
Step S102, screening modeling variables corresponding to the evaluation indexes from the multi-level attribute tags, performing VIF (variance inflation factor) collinearity analysis on the modeling variables corresponding to each evaluation index, and removing highly correlated modeling variables to obtain modeling variables for a machine learning model; wherein the evaluation indexes comprise first evaluation indexes and a second evaluation index, and the machine learning model comprises a clustering model for generating evaluation results of the first evaluation indexes and a classification model for generating an evaluation result of the second evaluation index; the first evaluation indexes are enterprise growth, enterprise stability, enterprise profitability and enterprise innovation ability; and the second evaluation index is enterprise repayment ability.
The step S102 specifically includes the following sub-steps:
Step S1021: selecting the modeling variables corresponding to the evaluation indexes.
For example, among the first evaluation indexes, enterprise growth can use labels such as the period-over-period change in the enterprise's value-added-tax taxable income over the last 12 months and the period-over-period change in its income-tax taxable income over the last 12 months; enterprise stability can use labels such as the number of changes of the enterprise's legal representative in the last 24 months and the enterprise's customer stability in the last 12 months; enterprise profitability can use labels such as the enterprise's sales transaction rate over the last 6 months and its profit margin over the last year; enterprise innovation capability can use labels such as the number of software copyrights and the number of patent applications of the enterprise in the last 12 months. For the second evaluation index, the repayment capability of the enterprise, labels such as the enterprise's average daily account inflow over the last 6 months can be used.
Step S1022: performing VIF (variance inflation factor) collinearity analysis on the modeling variables of each evaluation index and removing variables with high VIF values, i.e., variables that are highly correlated with the others, to obtain the variables finally required by the clustering model.
For example, the enterprise stability variables include the coefficient of variation (coefficient of variation = standard deviation / mean) of the monthly number of buyers over the last 12 months. Its VIF value is calculated to be 17.5, which indicates that this variable is highly collinear with the other variables selected for enterprise stability (i.e., it is highly correlated with them), so the variable is removed from the modeling sample.
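The VIF screening in step S1022 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the application's implementation: it relies on the standard identity that the VIF of variable j equals the j-th diagonal entry of the inverse correlation matrix, which is equivalent to 1 / (1 - R_j^2). The helper names and the sample columns are hypothetical, no column may be constant, and the application does not state a removal threshold.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])  # z-score each column
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each column = diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical variables for one evaluation index (8 enterprises each).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
print(vif([x1, x2]))
```

Variables whose VIF exceeds a chosen cut-off would then be dropped before modeling; the cut-off itself is a modeling decision that the text leaves open.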
Step S103, determining the optimal K value of each of a plurality of clustering models according to the silhouette coefficient, the Davies-Bouldin (DB) index and the Calinski-Harabasz (CH) metric value, and comparing the silhouette coefficients, DB indexes and CH metric values of the different clustering models at their optimal K values to determine an optimal clustering model; training, according to the optimal clustering model and the optimal K value, a clustering output model corresponding to each first evaluation index, and obtaining a clustering result of the first evaluation index through the clustering output model.
Specifically, K-means, SOM, DBSCAN, GMM and the like are selected as candidate clustering models, and the optimal K value of each clustering model is determined according to the silhouette coefficient, the DB index and the CH metric value as follows:
first, determining an initial range of K values for the clustering model;
second, calculating the silhouette coefficient, DB index and CH metric value corresponding to each K value in the initial range, and standardizing the data;
third, ranking the standardized silhouette coefficients, CH metric values and DB indexes from best to worst, adding the three ranks for each K value, and ranking the sums from best to worst;
fourth, taking the K value corresponding to the best sum as the optimal K value of the clustering model.
For example, for the K-means clustering model, K values from 2 to 9 are selected, and the silhouette coefficient, DB index and CH metric value corresponding to each K value are calculated in turn. With the best rank set to 1 and the worst set to 8, the silhouette coefficients, CH metric values and DB indexes are each ranked from best to worst. The larger the silhouette coefficient and the CH metric value, the better, so they are ranked from large to small, with the maximum assigned rank 1 and the minimum rank 8; the smaller the DB index, the better, so it is ranked from small to large, with the minimum assigned rank 1 and the maximum rank 8. The three ranks are added for each K value; the smallest sum is the optimal value, and the K value corresponding to it is selected as the optimal K value of the clustering model.
Then the silhouette coefficients, DB indexes and CH metric values of the various clustering models at their optimal K values are compared, and the clustering model with the most optimal values among the three is taken as the optimal clustering model. For example, suppose the three index values of one clustering model at its optimal K value are [3, 3, 5] and those of another clustering model are [4, 4, 3]. If a smaller value is better, the first clustering model has two values equal to the optimum of 3, whereas the other has only one, so the first clustering model is the preferred model, and so on.
Finally, according to the optimal clustering model and the optimal K value, a clustering output model is trained for each first evaluation index, namely an enterprise growth model, an enterprise stability model, an enterprise profit model and an enterprise innovation capability model, and the clustering result of the first evaluation index is obtained through the clustering output model.
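The rank-sum selection of K in step S103 can be sketched in Python as follows. The metric values are invented for illustration and the function names are hypothetical; in practice the three metrics would be computed from fitted clustering models. The standardization step is omitted because z-scoring does not change rank order. The `preferred_model` helper encodes one possible reading of the model comparison rule (the model with the most values equal to the overall best wins), which the text describes only by example.

```python
def ranks(values, larger_is_better):
    """Return rank 1 (best) .. len(values) (worst) for each value."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=larger_is_better)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def optimal_k(k_values, silhouette, db_index, ch_score):
    """Pick the K whose summed rank over the three metrics is smallest."""
    rank_sum = [s + d + c for s, d, c in zip(
        ranks(silhouette, larger_is_better=True),   # larger is better
        ranks(db_index, larger_is_better=False),    # smaller is better
        ranks(ch_score, larger_is_better=True))]    # larger is better
    best = min(range(len(k_values)), key=rank_sum.__getitem__)
    return k_values[best], rank_sum


def preferred_model(values_by_model):
    """Assumed rule: the model with the most values equal to the overall best."""
    best = min(v for vals in values_by_model.values() for v in vals)
    return max(values_by_model, key=lambda name: values_by_model[name].count(best))


# Hypothetical metric values for K = 2..9 of one clustering model.
k_values = list(range(2, 10))
silhouette = [0.41, 0.52, 0.58, 0.55, 0.47, 0.44, 0.39, 0.36]
db_index = [1.20, 0.95, 0.80, 0.85, 1.00, 1.10, 1.25, 1.30]
ch_score = [310, 420, 455, 470, 400, 380, 350, 330]

best_k, rank_sum = optimal_k(k_values, silhouette, db_index, ch_score)
print(best_k, rank_sum)
print(preferred_model({"model_1": [3, 3, 5], "model_2": [4, 4, 3]}))
```

Ties in a metric are broken by order of appearance in this sketch; the text does not say how ties should be handled.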
Step S104, optimizing the hyperparameter combinations of the binary classification models to obtain an optimal hyperparameter combination, taking the model with the largest KS value as the optimal binary classification model, training, according to the optimal binary classification model and the optimal hyperparameter combination, a binary classification output model corresponding to each second evaluation index, and obtaining the classification result of the second evaluation index through the binary classification output model.
Specifically, after VIF collinearity analysis is performed on the modeling variables of the second evaluation index to remove highly collinear variables, a plurality of candidate binary classification models are selected for training; the candidates may be an XGBoost model, a CatBoost model, a LightGBM model and the like. For example, the KS value is used as the evaluation criterion: the larger the KS value, the stronger the discriminating ability of the binary classification model. Optuna, a Python-based automatic hyperparameter optimization framework, is used for parameter tuning; the KS values are compared, and the model with the largest KS value is selected as the final model. The data is then trained again with the optimal parameters and the selected model, and enterprises are graded by predicted probability: the overdue probability is divided into six grades, A, B, C, D, E and F, using optimal decision-tree binning, and these six grades are the final enterprise repayment capability labels. It should be noted that the division into six grades is merely exemplary; in theory the number of grades can be chosen arbitrarily, but it typically does not exceed 10.
The Python-based automatic hyperparameter optimization framework is based on a Bayesian optimization algorithm. Each trial corresponds to one hyperparameter combination, and the performance of the model built with that combination is evaluated by its KS value. By modeling the distribution of the selected hyperparameters, continuously updating that distribution, and choosing the combination most likely to be optimal before evaluating the next one, the optimal solution can be found more efficiently.
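To make the KS criterion concrete, the sketch below computes the KS value as the largest gap between the cumulative share of bad samples and the cumulative share of good samples when samples are ordered by predicted score, and then picks the candidate with the largest KS. The labels, scores and model names are hypothetical, label 1 is assumed to mean an overdue (bad) sample, and the hyperparameter search itself is not shown.

```python
def ks_value(labels, scores):
    """KS = max |cumulative bad share - cumulative good share| over thresholds."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), reverse=True)  # highest score first
    bad_seen = good_seen = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample sharing this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                bad_seen += 1
            else:
                good_seen += 1
            i += 1
        best = max(best, abs(bad_seen / total_bad - good_seen / total_good))
    return best


# Hypothetical validation labels (1 = overdue) and scores of two candidates.
labels = [1, 1, 0, 1, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.6, 0.3, 0.2],
    "model_b": [0.9, 0.8, 0.2, 0.7, 0.3, 0.1],
}
best_model = max(candidate_scores,
                 key=lambda name: ks_value(labels, candidate_scores[name]))
print(best_model)
```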
Step S105, using a logistic regression model as a recommendation model of the financial product, and using the clustering result of the first evaluation index, the classification result of the second evaluation index, and the related data of the financial product as input variables of the logistic regression model, wherein the logistic regression model outputs the passing probability of the financial product.
Specifically, the data is first binned; based on modeling experience, data is generally divided into 5 or 6 bins. A weight-of-evidence (WOE) transformation is then applied to each bin. The WOE formula is WOE = ln((bad / bad_total) / (good / good_total)), where bad and good are the numbers of bad samples (overdue customers) and good samples (non-overdue customers) in the current bin, and bad_total and good_total are the numbers of bad samples and good samples among all samples. The data in the current bin is then replaced by the calculated WOE value. For example, the period-over-period change in the enterprise's value-added-tax taxable income over the last 12 months is divided into 6 bins; one bin covers the range from 0.2 to 0.4, and the WOE value of this bin is calculated to be 0.183, so the data in this bin is replaced by 0.183.
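The WOE encoding can be illustrated with a short Python sketch that applies the formula above bin by bin. The function name, bin ids and labels are hypothetical; the sketch assumes label 1 marks a bad (overdue) sample and that every bin contains at least one bad and one good sample, since the logarithm is undefined otherwise.

```python
import math


def woe_by_bin(bin_ids, labels):
    """Return {bin_id: WOE}, WOE = ln((bad / bad_total) / (good / good_total))."""
    bad_total = sum(labels)
    good_total = len(labels) - bad_total
    counts = {}
    for b, y in zip(bin_ids, labels):
        bad, good = counts.get(b, (0, 0))
        counts[b] = (bad + y, good + (1 - y))  # tally bad/good per bin
    return {b: math.log((bad / bad_total) / (good / good_total))
            for b, (bad, good) in counts.items()}


# Hypothetical sample: bin 0 holds 1 bad and 3 good, bin 1 holds 3 bad and 1 good.
bin_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
woe = woe_by_bin(bin_ids, labels)
# Replace each sample's raw value by the WOE of the bin it falls in.
encoded = [woe[b] for b in bin_ids]
print(woe)
```

With this sign convention a bin with a larger share of bad samples than of good samples gets a positive WOE.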
The logistic regression model, serving as the recommendation model for financial products, is then trained. The main parameters tuned are max_iter, C and tol, and a set of candidate values is given for each, for example max_iter in [10, 100], C in [0.00001, 0.0001, 0.001, 0.01, 0.1] and tol in [0.1, 1, 10, 100, 1000]. GridSearchCV is used for parameter tuning: it tries the combinations of max_iter, C and tol values in turn, trains with AUC as the optimization target, and outputs the optimal parameters from among all parameter combinations; the logistic regression model is then trained with these optimal parameters.
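What an exhaustive grid search enumerates can be shown with a small standard-library sketch. It is not the GridSearchCV implementation: it only lists the parameter combinations for the example grid and provides a plain AUC function of the kind that could serve as the optimization target. Treating max_iter as the two candidate values 10 and 100 is an assumption, the helper names are hypothetical, and model fitting is left out.

```python
from itertools import product


def param_combinations(grid):
    """List every combination of the candidate values in the grid."""
    names = list(grid)
    return [dict(zip(names, combo))
            for combo in product(*(grid[n] for n in names))]


def auc(labels, scores):
    """AUC: probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Candidate values from the example in the text (max_iter read as two values).
grid = {
    "max_iter": [10, 100],
    "C": [0.00001, 0.0001, 0.001, 0.01, 0.1],
    "tol": [0.1, 1, 10, 100, 1000],
}
combos = param_combinations(grid)
print(len(combos))
```

Each combination would be used to fit a logistic regression, its AUC evaluated on held-out data, and the combination with the highest AUC kept.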
After the logistic regression model is trained and passes evaluation, it is deployed online. The modeling variables of the current customer are fed into the online recommendation model, which outputs the passing probability for each funding party (i.e., financial product).
Step S106, sorting the financial products by passing probability and recommending to the enterprise the N financial products with the highest probability, where N is greater than or equal to 1.
Specifically, the funding parties (financial products) are ranked by the probability output by the model, and the three with the highest passing probability (top 3) are recommended first.
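The ranking in step S106 amounts to a sort followed by a slice, as in the sketch below. The product names and probabilities are hypothetical placeholders for the per-product outputs of the recommendation model.

```python
def recommend_top_n(pass_probability, n=3):
    """Return the names of the n products with the highest passing probability."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ranked = sorted(pass_probability.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical passing probabilities output for one enterprise.
pass_probability = {
    "product_a": 0.42,
    "product_b": 0.81,
    "product_c": 0.65,
    "product_d": 0.30,
}
top3 = recommend_top_n(pass_probability, n=3)
print(top3)
```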
According to the financial product recommendation method for the enterprises, through the innovative label system, the evaluation index and the evaluation index learning model, the enterprise credit can be measured more comprehensively and accurately, and financial products which are suitable for the enterprises and have higher loan passing rate can be recommended more efficiently and accurately.
Referring to fig. 3, another embodiment of the present application further provides a financial product recommendation device 200 for enterprises, including: a first module 201, a second module 202, a third module 203, a fourth module 204, a fifth module 205, a sixth module 206. The recommendation device 200 is configured to perform the steps of the method embodiments described above.
Specifically, the financial product recommendation device 200 for enterprises includes:
a first module 201 configured to determine a multi-level attribute tag of an enterprise according to the acquired enterprise operation data;
a second module 202, configured to screen modeling variables corresponding to the evaluation indexes from the multi-level attribute tags, perform VIF collinearity analysis on the modeling variables corresponding to each evaluation index, and remove modeling variables with higher correlation to obtain modeling variables for a machine learning model; wherein the evaluation index comprises a first evaluation index and a second evaluation index, the machine learning model comprises a clustering model for generating an evaluation result of the first evaluation index and a classification model for generating an evaluation result of the second evaluation index; the first evaluation index is enterprise growth, enterprise stability, enterprise profit and enterprise innovation ability; the second evaluation index is the repayment capability of the enterprise;
a third module 203, configured to determine the optimal K value of each of a plurality of clustering models according to the silhouette coefficient, the DB index and the CH metric value, and determine an optimal clustering model by comparing the silhouette coefficients, DB indexes and CH metric values of the different clustering models at their optimal K values; and to train, according to the optimal clustering model and the optimal K value, a clustering output model corresponding to each first evaluation index, and obtain a clustering result of the first evaluation index through the clustering output model;
a fourth module 204, configured to optimize the hyperparameter combinations of the binary classification models to obtain an optimal hyperparameter combination, take the model with the largest KS value as the optimal binary classification model, train, according to the optimal binary classification model and the optimal hyperparameter combination, a binary classification output model corresponding to each second evaluation index, and obtain the classification result of the second evaluation index through the binary classification output model;
a fifth module 205, configured to use a logistic regression model as a recommendation model of a financial product, and use a clustering result of the first evaluation index, a classification result of the second evaluation index, and related data of the financial product as input variables of the logistic regression model, where the logistic regression model outputs a passing probability of the financial product;
a sixth module 206, configured to sort the passing probability sizes of the financial products, and recommend the first N financial products with the largest probability to the enterprise, where N is greater than or equal to 1.
Further, the third module 203 is further configured to:
determine an initial range of K values for the clustering model;
calculate the silhouette coefficient, DB index and CH metric value corresponding to each K value in the initial range, and standardize the data;
rank the standardized silhouette coefficients, CH metric values and DB indexes from best to worst, add the ranks for each K value, and rank the sums from best to worst;
and take the K value corresponding to the best sum as the optimal K value of the clustering model.
Further, the third module 203 is further configured to:
compare the silhouette coefficients, DB indexes and CH metric values of the various clustering models at their optimal K values, and take the clustering model with the most optimal values among the silhouette coefficient, DB index and CH metric value as the optimal clustering model.
It should be noted that the financial product recommendation device 200 for enterprises provided in this embodiment can be used to carry out the technical solutions of the method embodiments above; its implementation principle and technical effects are similar to those of the method and are not repeated here.
The foregoing description covers only the preferred embodiments of the application. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in the present application is not limited to the specific combinations of technical features described above, but also covers other technical solutions formed by any combination of the technical features described above or their equivalents without departing from the spirit of the disclosure, for example technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.
Claims (10)
1. A financial product recommendation method for an enterprise, comprising:
determining a multi-level attribute tag of an enterprise according to the acquired enterprise operation data;
screening modeling variables corresponding to the evaluation indexes from the multi-level attribute tags, performing VIF (variance inflation factor) collinearity analysis on the modeling variables corresponding to each evaluation index, and removing modeling variables with higher correlation to obtain modeling variables for a machine learning model; wherein the evaluation index comprises a first evaluation index and a second evaluation index, the machine learning model comprises a clustering model for generating an evaluation result of the first evaluation index and a classification model for generating an evaluation result of the second evaluation index; the first evaluation index is enterprise growth, enterprise stability, enterprise profit and enterprise innovation ability; the second evaluation index is the repayment capability of the enterprise;
determining the optimal K value of each of a plurality of clustering models according to the silhouette coefficient, the DB index and the CH metric value, and comparing the silhouette coefficients, DB indexes and CH metric values of the different clustering models at their optimal K values to determine an optimal clustering model; training, according to the optimal clustering model and the optimal K value, a clustering output model corresponding to each first evaluation index, and obtaining a clustering result of the first evaluation index through the clustering output model;
optimizing the hyperparameter combinations of the binary classification models to obtain an optimal hyperparameter combination, taking the model with the largest KS value as the optimal binary classification model, training, according to the optimal binary classification model and the optimal hyperparameter combination, a binary classification output model corresponding to each second evaluation index, and obtaining a classification result of the second evaluation index through the binary classification output model;
taking a logistic regression model as a recommendation model of the financial product, taking a clustering result of the first evaluation index, a classification result of the second evaluation index and related data of the financial product as input variables of the logistic regression model, and outputting the passing probability of the financial product by the logistic regression model;
and sequencing the passing probability of the financial products, and recommending the first N financial products with the highest probability to the enterprise, wherein N is greater than or equal to 1.
2. The method of claim 1, wherein the multi-level attribute tags include a primary tag divided by data source, a secondary tag divided by data dimension, and a tertiary tag divided by specific index.
3. The method of claim 1, wherein the clustering model comprises a K-means model, an SOM model, a DBSCAN model and a GMM model; and the classification model comprises an XGBoost model, a CatBoost model and a LightGBM model.
4. The method of claim 1, wherein determining the optimal K values of the plurality of clustering models according to the silhouette coefficient, the DB index and the CH metric value comprises:
determining an initial range of K values for the clustering model;
calculating the silhouette coefficient, DB index and CH metric value corresponding to each K value in the initial range, and standardizing the data;
ranking the standardized silhouette coefficients, CH metric values and DB indexes from best to worst, adding the ranks for each K value, and ranking the sums from best to worst;
and taking the K value corresponding to the best sum as the optimal K value of the clustering model.
5. The method according to claim 4, wherein determining the optimal clustering model by comparing the silhouette coefficients, DB indexes and CH metric values of the different clustering models at the optimal K values comprises:
comparing the silhouette coefficients, DB indexes and CH metric values of the various clustering models at their optimal K values, and taking the clustering model with the most optimal values among the silhouette coefficient, DB index and CH metric value as the optimal clustering model.
6. The method according to claim 4, wherein ranking the standardized silhouette coefficients, CH metric values and DB indexes from best to worst, adding the ranks, ranking the sums from best to worst, and taking the K value corresponding to the best sum as the optimal K value of the clustering model comprises:
ranking the standardized silhouette coefficients and CH metric values from large to small, ranking the standardized DB indexes from small to large, adding the ranks for each K value, and ranking the sums from small to large;
and taking the K value corresponding to the smallest sum as the optimal K value of the clustering model.
7. The method according to claim 5, wherein the lowest value among the silhouette coefficient, DB index and CH metric value of each clustering model at the optimal K value is taken as the optimal value.
8. A financial product recommendation device for an enterprise, comprising:
a first module configured to determine a multi-level attribute tag of an enterprise according to the acquired enterprise operation data;
the second module is configured to screen modeling variables corresponding to the evaluation indexes from the multi-level attribute tags, perform VIF collinearity analysis on the modeling variables corresponding to each evaluation index, and remove modeling variables with higher correlation to obtain modeling variables for a machine learning model; wherein the evaluation index comprises a first evaluation index and a second evaluation index, the machine learning model comprises a clustering model for generating an evaluation result of the first evaluation index and a classification model for generating an evaluation result of the second evaluation index; the first evaluation index is enterprise growth, enterprise stability, enterprise profit and enterprise innovation ability; the second evaluation index is the repayment capability of the enterprise;
a third module configured to determine the optimal K value of each of a plurality of clustering models according to the silhouette coefficient, the DB index and the CH metric value, and determine an optimal clustering model by comparing the silhouette coefficients, DB indexes and CH metric values of the different clustering models at their optimal K values; and to train, according to the optimal clustering model and the optimal K value, a clustering output model corresponding to each first evaluation index, and obtain a clustering result of the first evaluation index through the clustering output model;
a fourth module configured to optimize the hyperparameter combinations of the binary classification models to obtain an optimal hyperparameter combination, take the model with the largest KS value as the optimal binary classification model, train, according to the optimal binary classification model and the optimal hyperparameter combination, a binary classification output model corresponding to each second evaluation index, and obtain the classification result of the second evaluation index through the binary classification output model;
a fifth module configured to use a logistic regression model as a recommendation model of a financial product, and use a clustering result of the first evaluation index, a classification result of the second evaluation index, and related data of the financial product as input variables of the logistic regression model, wherein the logistic regression model outputs a passing probability of the financial product;
and a sixth module configured to sort the passing probability of the financial products, and recommend the first N financial products with the highest probability to the enterprise, wherein N is greater than or equal to 1.
9. The financial product recommendation device for an enterprise of claim 8, wherein the third module is further configured to:
determine an initial range of K values for the clustering model;
calculate the silhouette coefficient, DB index and CH metric value corresponding to each K value in the initial range, and standardize the data;
rank the standardized silhouette coefficients, CH metric values and DB indexes from best to worst, add the ranks for each K value, and rank the sums from best to worst;
and take the K value corresponding to the best sum as the optimal K value of the clustering model.
10. The financial product recommendation device for an enterprise of claim 9, wherein the third module is further configured to:
compare the silhouette coefficients, DB indexes and CH metric values of the various clustering models at their optimal K values, and take the clustering model with the most optimal values among the silhouette coefficient, DB index and CH metric value as the optimal clustering model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311121873.3A CN117114812A (en) | 2023-08-31 | 2023-08-31 | Financial product recommendation method and device for enterprises |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311121873.3A CN117114812A (en) | 2023-08-31 | 2023-08-31 | Financial product recommendation method and device for enterprises |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117114812A (en) | 2023-11-24 |
Family
ID=88794551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311121873.3A Pending CN117114812A (en) | 2023-08-31 | 2023-08-31 | Financial product recommendation method and device for enterprises |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117114812A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118096223A (en) * | 2024-04-23 | 2024-05-28 | 紫金诚征信有限公司 | Financial product marketing method and device based on artificial intelligence |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109726749A (en) * | 2018-12-21 | 2019-05-07 | 齐鲁工业大学 | A kind of Optimal Clustering selection method and device based on multiple attribute decision making (MADM) |
CN112348654A (en) * | 2020-09-23 | 2021-02-09 | 民生科技有限责任公司 | Automatic assessment method, system and readable storage medium for enterprise credit line |
CN113837859A (en) * | 2021-08-25 | 2021-12-24 | 天元大数据信用管理有限公司 | Small and micro enterprise portrait construction method |
KR20220000475A (en) * | 2020-06-26 | 2022-01-04 | 미래에셋증권 주식회사 | System and method for recommendation of customized financial products |
CN115271442A (en) * | 2022-07-28 | 2022-11-01 | 江西省智能产业技术创新研究院 | Modeling method and system for evaluating enterprise growth based on natural language |
CN116226531A (en) * | 2023-02-28 | 2023-06-06 | 深圳微众信用科技股份有限公司 | Intelligent recommendation method for financial products of small and micro enterprises and related products |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||