CN114266643A - Enterprise mining method, device, equipment and storage medium based on fusion algorithm - Google Patents

Publication number
CN114266643A
Authority
CN
China
Prior art keywords
enterprise
preset
features
value
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111529428.1A
Other languages
Chinese (zh)
Inventor
李潇
岳帅
吴艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Fuli Technology Co ltd
Original Assignee
Shanghai Fuli Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Fuli Technology Co ltd
Priority to CN202111529428.1A
Publication of CN114266643A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of computers, and discloses an enterprise mining method, device, equipment and storage medium based on a fusion algorithm. The method comprises the following steps: acquiring target feature data of an enterprise to be mined; inputting the target feature data into a preset enterprise training model to obtain derived features and the corresponding feature importance; obtaining a preset distinguishing value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the corresponding feature importance; and completing the screening of the enterprises to be mined according to the preset distinguishing value, the preset accuracy and the target label value. In this method, features are derived from the target feature data, and the target label value, the preset accuracy and the preset distinguishing value are determined from the derived features and the corresponding feature importance, so that the screening of the enterprises to be mined is completed. Enterprises can thus be identified accurately even when their data features are nonlinearly distributed, which improves both the accuracy of mining identification and the proportion of bad customers identified.

Description

Enterprise mining method, device, equipment and storage medium based on fusion algorithm
Technical Field
The invention relates to the technical field of computers, in particular to an enterprise mining method, device, equipment and storage medium based on a fusion algorithm.
Background
In the credit application process for small and micro enterprise customers, a financial platform needs to ensure that the customers applying for credit are good customers (i.e. customers who repay after taking out a loan). It therefore urgently needs a model that can identify all bad customers (i.e. customers who do not repay after taking out a loan). The performance of a model for identifying bad customers is generally measured by the KS value (the absolute value of the maximum difference between the identified number of bad customers and the identified number of good customers, with a value range of 0 to 1) and the AUC value (the accuracy of the identified number of good customers and the identified number of bad customers, with a value range of 0 to 1). If the KS index and the AUC index are both 1, the model identifies all bad customers; if the KS index and the AUC index are both 0, the model identifies no bad customers. Investigation shows that the models deployed by banks and financial institutions on the market generally mine customers with a logistic regression model, and the proportion of bad customers they identify is low. The logistic regression scorecard model is a linear model, so its performance is not optimal in real scenarios where the customer data feature set of small and micro enterprises is nonlinearly distributed.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide an enterprise mining method, device, equipment and storage medium based on a fusion algorithm, and aims to solve the technical problem in the prior art that the proportion of bad customers identified when mining enterprises is low.
In order to achieve the purpose, the invention provides an enterprise mining method based on a fusion algorithm, which comprises the following steps:
acquiring target characteristic data of an enterprise to be mined;
inputting the target feature data into a preset enterprise training model to obtain derivative features and feature importance degrees corresponding to the derivative features;
obtaining a preset distinguishing value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance corresponding to the derived features;
and finishing the screening of the enterprises to be mined according to the preset distinguishing value, the preset accuracy and the target tag value.
Optionally, the inputting the target feature data into a preset enterprise training model to obtain derived features and feature importance degrees corresponding to the derived features includes:
carrying out coding derivation according to the target characteristic data to obtain initial derivation characteristics;
merging the initial derivative features and the target features in the target feature data to obtain derivative features;
and carrying out importance training on the derived features to obtain the derived features and feature importance corresponding to the derived features.
Optionally, the obtaining a preset discrimination value, a preset accuracy and a target tag value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance corresponding to the derived features includes:
screening the derived features according to a preset importance threshold and the feature importance corresponding to the derived features to obtain training features and the feature importance corresponding to the training features;
and carrying out classification training according to the training features and the feature importance degrees corresponding to the training features to obtain a preset distinguishing value, preset accuracy and a target label value corresponding to the enterprise to be mined.
Optionally, the performing classification training according to the training features and the feature importance degrees corresponding to the training features to obtain a preset discrimination value, a preset accuracy and a target tag value corresponding to the enterprise to be mined includes:
acquiring time information of the training features;
partitioning the training features according to the time information and the preset time to obtain test features and verification features;
performing classification training according to the test features and feature importance degrees corresponding to the test features to obtain an initial discrimination value, initial accuracy and a target label value corresponding to the enterprise to be mined;
and verifying the initial distinguishing value and the initial accuracy according to the verification features and the feature importance degrees corresponding to the verification features to obtain a preset distinguishing value and a preset accuracy.
Optionally, the performing classification training according to the test features and the feature importance degrees corresponding to the test features to obtain an initial discrimination value, an initial accuracy, and a target tag value corresponding to the enterprise to be mined includes:
acquiring a training threshold;
performing regression training according to the test features and feature importance degrees corresponding to the test features to obtain a target label value;
performing regression classification on the target label value according to the training threshold value to obtain a target curve;
and determining an initial discrimination value and initial accuracy according to the target curve.
Optionally, the verifying the initial distinguishing value and the initial accuracy according to the verification feature and the feature importance corresponding to the verification feature to obtain a preset distinguishing value and a preset accuracy includes:
determining a verification curve according to the verification characteristics and the characteristic importance degrees corresponding to the verification characteristics;
determining a verification distinguishing value and verification accuracy according to the verification curve;
acquiring a distinguishing difference value of the initial distinguishing value and the verification distinguishing value;
obtaining an accurate difference value between the initial accuracy and the verification accuracy;
and if the distinguishing difference value and the accurate difference value are within the corresponding preset difference value range, taking the initial distinguishing value as a preset distinguishing value, and taking the initial accuracy as a preset accuracy.
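The verification steps above can be sketched as a simple tolerance check. This is a minimal illustration, not the patented implementation: the function name `accept_metrics`, the tolerance of 0.05 for both differences, and the sample AUC and verification values are all assumptions chosen for the example (only the discrimination value 0.426 appears in the text).

```python
from typing import Optional, Tuple


def accept_metrics(
    initial_ks: float,
    initial_auc: float,
    verify_ks: float,
    verify_auc: float,
    max_ks_diff: float = 0.05,   # hypothetical preset difference range
    max_auc_diff: float = 0.05,  # hypothetical preset difference range
) -> Optional[Tuple[float, float]]:
    """Return (preset KS, preset AUC) if both differences are within range, else None."""
    ks_diff = abs(initial_ks - verify_ks)      # distinguishing difference value
    auc_diff = abs(initial_auc - verify_auc)   # accurate difference value
    if ks_diff <= max_ks_diff and auc_diff <= max_auc_diff:
        # The initial values are adopted as the preset values.
        return initial_ks, initial_auc
    return None


# Hypothetical numbers: the verification metrics are close to the initial ones.
stable = accept_metrics(0.426, 0.78, 0.41, 0.77)
# Hypothetical numbers: the verification KS has dropped too far.
unstable = accept_metrics(0.426, 0.78, 0.30, 0.77)
```

What happens when a difference falls outside its range is not stated in this passage, so the sketch simply returns `None` in that case.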
Optionally, the screening of the to-be-mined enterprise according to the preset discrimination value, the preset accuracy and the target tag value is completed, including:
sequencing the enterprises to be mined according to the target tag values to obtain a sequencing result;
determining the percentage of the target enterprise according to the preset discrimination value and the preset accuracy;
and determining target enterprises in the enterprises to be mined according to the sequencing result and the target enterprise percentage, and finishing the screening of the enterprises to be mined.
In addition, in order to achieve the above object, the present invention further provides an enterprise mining apparatus based on a fusion algorithm, where the enterprise mining apparatus based on a fusion algorithm includes:
the acquisition module is used for acquiring target characteristic data of the enterprise to be mined;
the derivative module is used for inputting the target characteristic data into a preset enterprise training model to obtain derivative characteristics and characteristic importance degrees corresponding to the derivative characteristics;
the classification module is used for obtaining a preset discrimination value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance degrees corresponding to the derived features;
and the screening module is used for finishing screening of the enterprise to be mined according to the preset distinguishing value, the preset accuracy and the target tag value.
In addition, in order to achieve the above object, the present invention further provides an enterprise mining device based on a fusion algorithm, where the enterprise mining device based on the fusion algorithm includes: a memory, a processor, and a fusion algorithm based enterprise mining program stored on the memory and executable on the processor, the fusion algorithm based enterprise mining program configured to implement a fusion algorithm based enterprise mining method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium, on which an enterprise mining program based on a fusion algorithm is stored, and when being executed by a processor, the enterprise mining program based on the fusion algorithm implements the enterprise mining method based on the fusion algorithm as described above.
The method comprises the steps of obtaining target feature data of an enterprise to be mined; inputting the target feature data into a preset enterprise training model to obtain derived features and the feature importance corresponding to the derived features; obtaining a preset distinguishing value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance corresponding to the derived features; and completing the screening of the enterprises to be mined according to the preset distinguishing value, the preset accuracy and the target tag value. Features are derived from the target feature data of the enterprise to be mined, the feature importance of the derived features is obtained, and the corresponding target label value, preset accuracy and preset distinguishing value are determined from the derived features and their feature importance, so that the screening of the enterprises to be mined is completed. Enterprises can thus be identified accurately even when their data features are nonlinearly distributed, which improves both the accuracy of mining identification and the proportion of bad customers identified.
Drawings
FIG. 1 is a schematic structural diagram of an enterprise mining device based on a fusion algorithm in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of the enterprise mining method based on the fusion algorithm according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of the enterprise mining method based on fusion algorithm according to the present invention;
FIG. 4 is a graph diagram illustrating an embodiment of an enterprise mining method based on a fusion algorithm according to the present invention;
FIG. 5 is a block diagram of a first embodiment of an enterprise mining device based on a fusion algorithm according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an enterprise mining device based on a fusion algorithm in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the enterprise mining device based on the fusion algorithm may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM), or may be a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in fig. 1 does not constitute a limitation of a fusion algorithm based enterprise mining device, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a storage medium, may include an operating system, a network communication module, a user interface module, and an enterprise mining program based on a fusion algorithm.
In the enterprise mining device based on the fusion algorithm shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 of the enterprise mining device based on the fusion algorithm may be arranged in the enterprise mining device based on the fusion algorithm, and the enterprise mining device based on the fusion algorithm calls the enterprise mining program based on the fusion algorithm stored in the memory 1005 through the processor 1001 and executes the enterprise mining method based on the fusion algorithm provided by the embodiment of the present invention.
The embodiment of the invention provides an enterprise mining method based on a fusion algorithm, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of the enterprise mining method based on the fusion algorithm.
In this embodiment, the enterprise mining method based on the fusion algorithm includes the following steps:
Step S10: Acquire target feature data of the enterprise to be mined.
It should be noted that the execution subject of this embodiment is a terminal device, which may be a computer, a mobile phone or a similar device; this embodiment does not limit the device type and takes a computer as an example. An enterprise mining system based on the fusion algorithm runs on the terminal device. After receiving a service application from the enterprise to be mined, the terminal device obtains the target feature data of the enterprise to be mined and inputs it into the preset enterprise training model to obtain derived features and the feature importance corresponding to the derived features. Based on the derived features and their feature importance, it then obtains a preset discrimination value, a preset accuracy and a target tag value through the preset enterprise classification model, and completes the screening of the enterprises to be mined according to the preset discrimination value, the preset accuracy and the target tag value.
It can be understood that the enterprise to be mined is an enterprise that submits a loan application to the financial platform. The target feature data consists of the following items from the modeling field information of the enterprise to be mined: the enterprise basic information and its variable values, the enterprise owner's mobile phone software activity and its variable values, and the application time. The modeling field information of the enterprise to be mined includes, but is not limited to, the enterprise basic information, the three-party credit scores, the three-party abnormal details and the enterprise owner's mobile phone software activity. The enterprise basic information includes, but is not limited to, personal information, financial information and loan information; the three-party credit scores include, but are not limited to, a Codun credit probability score, a Baidu credit probability score, a Hempur credit probability score and a Baiwei credit probability score; the three-party abnormal details include, but are not limited to, dispute hits and forensic detail hits.
In the specific implementation, after receiving a loan application sent by an enterprise to be mined to a financial platform, the terminal device acquires target characteristic data of the enterprise to be mined, and stores the target characteristic data of the enterprise to be mined and a corresponding variable value in a data warehouse corresponding to a local data platform.
Step S20: and inputting the target characteristic data into a preset enterprise training model to obtain derivative characteristics and characteristic importance degrees corresponding to the derivative characteristics.
It should be noted that target feature data of all the enterprises to be mined are acquired, the target feature data and corresponding variables of the enterprises to be mined are input to a preset enterprise classification model, and derived features and feature importance degrees corresponding to the derived features are acquired, wherein the derived features include the target feature data and initial derived features derived from the target feature data.
It can be understood that the preset enterprise training model refers to a model which is obtained by performing sample feature data training in advance by using a Light Gradient Boosting Machine (Light weight Gradient elevator) nonlinear algorithm and an onehot encoder coding rule, and can perform feature derivation and feature importance calculation.
In the specific implementation, after the target characteristic data is obtained, the target characteristics in the target characteristic data, namely personal information, financial information, loan information and the activity of mobile phone software of a business owner, are input into a preset enterprise training model, so that the derived characteristics and the importance corresponding to the derived characteristics are obtained.
It should be noted that, in order to obtain accurate derived features and feature importance degrees corresponding to the derived features, the step of inputting the target feature data into a preset enterprise training model to obtain derived features and feature importance degrees corresponding to the derived features includes: carrying out coding derivation according to the target characteristic data to obtain initial derivation characteristics; merging the initial derivative features and the target features in the target feature data to obtain derivative features; and carrying out importance training on the derived features to obtain the derived features and feature importance corresponding to the derived features.
It can be understood that the initial derived features are the features that the preset enterprise training model derives by encoding four variables: personal information, financial information, loan information and the business owner's mobile phone software activity. These features take only two values, 0 and 1.
In specific implementation, after the initial derived features are obtained, the initial derived features and the target features in the target feature data are combined to obtain derived features, and feature training is performed through a preset enterprise training model to obtain feature importance degrees corresponding to the derived features.
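The encode-and-merge part of this step can be sketched as follows. This is a minimal pure-Python illustration of one-hot derivation, not the preset enterprise training model itself: the column names, category values and helper names (`one_hot_derive`, `merge_features`) are hypothetical, and the importance training with the gradient boosting model is deliberately left out.

```python
from typing import Dict, List


def one_hot_derive(rows: List[Dict[str, str]], columns: List[str]) -> List[Dict[str, int]]:
    """Derive one 0/1 indicator feature per (column, value) pair observed in the data."""
    # Collect the categories seen in each column, sorted for a stable feature order.
    categories = {c: sorted({row[c] for row in rows}) for c in columns}
    derived = []
    for row in rows:
        derived.append(
            {f"{c}={v}": int(row[c] == v) for c in columns for v in categories[c]}
        )
    return derived


def merge_features(rows: List[Dict[str, str]], derived: List[Dict[str, int]]) -> List[dict]:
    """Merge the original target features with the initial derived features."""
    return [{**row, **extra} for row, extra in zip(rows, derived)]


# Hypothetical target features for two enterprises.
rows = [
    {"loan_type": "secured", "activity": "high"},
    {"loan_type": "unsecured", "activity": "low"},
]
derived = one_hot_derive(rows, ["loan_type", "activity"])
merged = merge_features(rows, derived)
```

Each merged row keeps the original fields and gains the 0/1 indicator columns, which matches the description of derived features as the union of target features and initial derived features.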
Step S30: and obtaining a preset distinguishing value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance corresponding to the derived features.
It should be noted that the preset enterprise classification model refers to a model which is trained by using logistic regression algorithm to perform sample feature training and can obtain a label value corresponding to an enterprise.
It can be understood that after the derived features and the feature importance degrees corresponding to the derived features are obtained, the derived features and the feature importance degrees corresponding to the derived features are input into a preset enterprise classification model, a target tag value corresponding to an enterprise to be mined is obtained, a target curve, namely a receiver operating characteristic curve (ROC) curve is output, and a preset distinguishing value and preset accuracy are obtained according to the output ROC curve.
In a specific implementation, the preset discrimination value refers to an absolute value of a maximum difference between the identified number of bad customers and the identified number of good customers, and a value range is 0 to 1, and the preset accuracy refers to an accuracy of the identified number of good customers and the identified number of good customers, and a value range is 0 to 1.
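As a hedged illustration of how a discrimination value and an accuracy value can be read off an ROC curve, the sketch below uses the standard definitions: KS as the maximum gap between the true positive rate and the false positive rate, and AUC as the area under the ROC curve. The text describes both only in words, so treating them this way is an assumption, as are the function names, the convention that label 1 marks a bad customer, and the six sample scores.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points found by sweeping a threshold from high to low score."""
    n_bad = sum(labels)              # label 1 = bad customer (assumed convention)
    n_good = len(labels) - n_bad
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_good, tp / n_bad))
    return points


def ks_value(points: List[Tuple[float, float]]) -> float:
    """Maximum absolute gap between TPR and FPR along the curve."""
    return max(abs(tpr - fpr) for fpr, tpr in points)


def auc_value(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up scores (predicted probability of being a bad customer) for six enterprises.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(labels, scores)
ks = ks_value(curve)    # 2/3 for this sample
auc = auc_value(curve)  # 8/9 for this sample
```

The sample needs at least one good and one bad customer, otherwise the rate denominators are zero.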
Step S40: and finishing the screening of the enterprises to be excavated according to the preset distinguishing value, the preset accuracy and the target tag value.
It should be noted that after the preset distinguishing value, the preset accuracy and the target tag value are obtained, the screening of the enterprise to be mined is completed according to the target tag value.
It can be understood that, in order to obtain an accurate screening result, further, the screening of the enterprise to be mined according to the preset discrimination value, the preset accuracy and the target tag value is completed, including: sequencing the enterprises to be mined according to the target tag values to obtain a sequencing result; determining the percentage of the target enterprise according to the preset discrimination value and the preset accuracy; and determining target enterprises in the enterprises to be mined according to the sequencing result and the target enterprise percentage, and finishing the screening of the enterprises to be mined.
In the specific implementation, every enterprise to be mined has a corresponding target tag value, which represents the evaluation score of that enterprise. The enterprises to be mined are sorted by target tag value to obtain a sorting result, and a target enterprise percentage is determined according to the preset discrimination value and the preset accuracy; the target enterprise percentage is the proportion of good customers among the enterprises to be mined. The target enterprises, i.e. the good customers, are then determined from the sorting result and the target enterprise percentage, which completes the screening. For example, if the preset discrimination value is 0.426 and the captured bad customers account for 66.8% of all bad customers, the last 25% of the enterprises to be mined in the sorting result are rejected and the first 75% are the target enterprises.
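The sort-and-reject step in the example can be sketched as below. This is only an illustration under stated assumptions: a higher target tag value is taken to mean a better customer, the 25% rejection share is passed in directly rather than derived from the discrimination value and accuracy, and the enterprise names and tag values are invented.

```python
from typing import Dict, List, Tuple


def screen_enterprises(
    tag_values: Dict[str, float], reject_fraction: float
) -> Tuple[List[str], List[str]]:
    """Sort enterprises by target tag value and reject the lowest-ranked fraction."""
    # Assumption: a higher tag value means a better evaluation score.
    ranked = sorted(tag_values, key=tag_values.get, reverse=True)
    n_reject = int(len(ranked) * reject_fraction)
    cut = len(ranked) - n_reject
    return ranked[:cut], ranked[cut:]


# Hypothetical tag values for eight enterprises to be mined.
tag_values = {
    "A": 0.91, "B": 0.85, "C": 0.40, "D": 0.77,
    "E": 0.12, "F": 0.66, "G": 0.58, "H": 0.95,
}
targets, rejected = screen_enterprises(tag_values, reject_fraction=0.25)
```

With eight enterprises and a 25% rejection share, the two lowest-ranked enterprises are rejected and the remaining six are kept as target enterprises.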
The method comprises the steps of obtaining target feature data of an enterprise to be mined; inputting the target feature data into a preset enterprise training model to obtain derived features and the feature importance corresponding to the derived features; obtaining a preset distinguishing value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance corresponding to the derived features; and completing the screening of the enterprises to be mined according to the preset distinguishing value, the preset accuracy and the target tag value. Features are derived from the target feature data of the enterprise to be mined, the feature importance of the derived features is obtained, and the corresponding target label value, preset accuracy and preset distinguishing value are determined from the derived features and their feature importance, so that the screening of the enterprises to be mined is completed. Enterprises can thus be identified accurately even when their data features are nonlinearly distributed, which improves both the accuracy of mining identification and the proportion of bad customers identified.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of the enterprise mining method based on the fusion algorithm according to the present invention.
Based on the first embodiment, the step S30 in the enterprise mining method based on the fusion algorithm in this embodiment includes:
Step S31: Screen the derived features according to a preset importance threshold and the feature importance corresponding to the derived features to obtain training features and the feature importance corresponding to the training features.
It should be noted that after the derived features and the feature importance corresponding to the derived features are obtained, the derived features are screened according to a preset importance threshold to obtain the training features and the feature importance corresponding to the training features. For example, if the feature importance of derived feature 1 is 72, that of derived feature 2 is 68, that of derived feature 3 is 52, that of derived feature 4 is 40, that of derived feature 5 is 4, that of derived feature 6 is 3, and the preset importance threshold is 5, then the training features are derived features 1 to 4, and the feature importance corresponding to these training features is obtained.
Step S32: Carry out classification training according to the training features and the feature importance corresponding to the training features to obtain a preset distinguishing value, a preset accuracy and a target label value corresponding to the enterprise to be mined.
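The threshold example above can be reproduced with a short filter. This is a minimal sketch: the feature names are placeholders, and keeping features whose importance is greater than or equal to the threshold is an assumption, since the text does not say how a value exactly equal to the threshold is treated.

```python
from typing import Dict


def select_training_features(importance: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Keep derived features whose importance reaches the preset importance threshold."""
    # Assumption: a feature exactly at the threshold is kept (>=).
    return {name: score for name, score in importance.items() if score >= threshold}


# Importance values taken from the example in the text; names are placeholders.
importance = {
    "derived_1": 72,
    "derived_2": 68,
    "derived_3": 52,
    "derived_4": 40,
    "derived_5": 4,
    "derived_6": 3,
}
training = select_training_features(importance, threshold=5)
```

The result keeps derived features 1 to 4 together with their importance values and drops features 5 and 6.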
It should be noted that after the training features and the feature importance degrees corresponding to the training features are obtained, logistic regression training is performed through a preset enterprise classification model according to the training features and the feature importance degrees corresponding to the training features to obtain target label values and target curves corresponding to the enterprises to be mined, and preset distinguishing values and preset accuracy degrees are obtained according to the target curves.
It can be understood that, in order to obtain accurate preset discrimination values and preset accuracy, further, the classification training is performed according to the training features and the feature importance degrees corresponding to the training features to obtain preset discrimination values, preset accuracy degrees and target tag values corresponding to the enterprises to be mined, including: acquiring time information of the training features; partitioning the training features according to the time information and the preset time to obtain test features and verification features; performing classification training according to the test features and feature importance degrees corresponding to the test features to obtain an initial discrimination value, initial accuracy and a target label value corresponding to the enterprise to be mined; and verifying the initial distinguishing value and the initial accuracy according to the verification features and the feature importance degrees corresponding to the verification features to obtain a preset distinguishing value and a preset accuracy.
In a specific implementation, each enterprise to be mined that corresponds to a training feature has an application time, so the time information corresponding to the training features is obtained, and the training features are partitioned according to the time information and the preset time (the time used for verifying the training) to obtain the test features and the verification features. For example, if the application time corresponding to training feature 1 is November 29, 2018, that of training feature 2 is November 28, 2018, that of training feature 3 is November 25, 2018, that of training feature 4 is November 15, 2018, that of training feature 5 is November 30, 2018, that of training feature 6 is November 30, 2018, and the preset time is November 30, 2018, then the test features are training features 1 to 4, and the verification features are training features 5 and 6.
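The time-based partition in this example can be sketched as follows. The function name `split_by_time` is hypothetical, and the rule that an application time equal to the preset time falls into the verification set is an assumption inferred from the example (features 5 and 6 share the preset date).

```python
from datetime import date


def split_by_time(application_dates, preset_time):
    """Split feature ids into test (before preset_time) and verification (on or after)."""
    test, verification = [], []
    for feature_id, applied in application_dates.items():
        if applied < preset_time:
            test.append(feature_id)
        else:
            verification.append(feature_id)
    return test, verification


# Application dates from the example in the text.
application_dates = {
    1: date(2018, 11, 29),
    2: date(2018, 11, 28),
    3: date(2018, 11, 25),
    4: date(2018, 11, 15),
    5: date(2018, 11, 30),
    6: date(2018, 11, 30),
}

test_ids, verification_ids = split_by_time(application_dates, date(2018, 11, 30))
print(test_ids, verification_ids)  # [1, 2, 3, 4] [5, 6]
```

Splitting by time rather than at random keeps the verification data strictly later than the data used for fitting, which is a common way to check that the metrics hold on newer applications.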
It should be noted that, classification training is performed according to the test features and the feature importance degrees corresponding to the test features, so as to obtain an initial discrimination value, an initial accuracy and a target label value. And verifying the initial distinguishing value and the initial accuracy according to the verification features and the feature importance degrees corresponding to the verification features to obtain a preset distinguishing value and a preset accuracy, and simultaneously obtaining a target tag value of the enterprise to be mined corresponding to the verification features.
It can be understood that, in order to obtain an accurate initial distinguishing value and initial accuracy, further, the classifying training is performed according to the test features and the feature importance degrees corresponding to the test features to obtain an initial distinguishing value, initial accuracy and a target tag value corresponding to the enterprise to be mined, including: acquiring a training threshold; performing regression training according to the test features and feature importance degrees corresponding to the test features to obtain a target label value; performing regression classification on the target label value according to the training threshold value to obtain a target curve; and determining an initial discrimination value and initial accuracy according to the target curve.
In a specific implementation, the training threshold is the threshold used to classify the target label value. The target curve is a curve in a two-dimensional coordinate system. Its abscissa is the false positive rate (FPR = FP / (FP + TN)), the proportion of all negative samples that are predicted positive but are actually negative. Its ordinate is the true positive rate (TPR = TP / (TP + FN)), the proportion of all positive samples that are predicted positive and are actually positive. For example, if the training threshold is 0.6, an enterprise is assigned to the positive class when its target label value is 0.6 or more and to the negative class when its target label value is less than 0.6. An (FPR, TPR) pair is calculated accordingly, which gives a corresponding coordinate point in the plane. As the training threshold is gradually decreased, more and more enterprises to be mined are classified as positive, but these also include enterprises that are actually negative, so the TPR and the FPR increase together. When the threshold is at its maximum, the corresponding coordinate point is (0,0); when the threshold is at its minimum, the corresponding coordinate point is (1,1). In the ideal case the TPR is close to 1 and the FPR is close to 0, so the closer the target curve is to the point (0,1), that is, the further it deviates from the 45-degree diagonal, the better.
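As a minimal sketch of how one (FPR, TPR) point is obtained for a given training threshold, the following counts TP, FP, TN and FN directly. The function name `roc_point` and the sample scores and labels are illustrative assumptions; only the threshold of 0.6 and the two formulas come from the text.

```python
def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) when scores >= threshold are classified as positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr


scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.2]  # illustrative target label values
labels = [1, 1, 0, 1, 0, 0]               # illustrative true classes

fpr, tpr = roc_point(scores, labels, threshold=0.6)
print(fpr, tpr)                           # FPR = 1/3, TPR = 2/3 for this data
print(roc_point(scores, labels, 1.1))     # (0.0, 0.0): threshold above all scores
print(roc_point(scores, labels, 0.0))     # (1.0, 1.0): threshold below all scores
```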
It should be noted that the initial accuracy is the area under the target curve, and after the target curve is obtained, the initial discrimination value is max|TPR - FPR|. When the threshold is decreased, the TPR and the FPR increase together, and when the threshold is increased, the TPR and the FPR decrease together.
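The two quantities can be computed from the curve as sketched below. This is an illustration under assumptions: the function names and sample data are hypothetical, the area is approximated with the trapezoidal rule, and both classes are assumed to be present in the data. The area under the curve is commonly called AUC, and max|TPR - FPR| is commonly called the KS statistic.

```python
def roc_curve_points(scores, labels):
    """Sweep the threshold from high to low and collect (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives   # assumes both classes are present
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]                 # threshold above every score
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def accuracy_and_discrimination(points):
    """Return (area under the curve, max |TPR - FPR|) for a list of ROC points."""
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    discrimination = max(abs(tpr - fpr) for fpr, tpr in points)
    return area, discrimination


scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.2]  # illustrative target label values
labels = [1, 1, 0, 1, 0, 0]               # illustrative true classes

curve = roc_curve_points(scores, labels)
initial_accuracy, initial_discrimination = accuracy_and_discrimination(curve)
print(initial_accuracy, initial_discrimination)  # 8/9 and 2/3 for this data
```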
It can be understood that, in order to obtain an accurate preset discrimination value and a preset accuracy, further, the verifying the initial discrimination value and the initial accuracy according to the verification feature and the feature importance corresponding to the verification feature may obtain the preset discrimination value and the preset accuracy, including: determining a verification curve according to the verification characteristics and the characteristic importance degrees corresponding to the verification characteristics; determining a verification distinguishing value and verification accuracy according to the verification curve; acquiring a distinguishing difference value of the initial distinguishing value and the verification distinguishing value; obtaining an accurate difference value between the initial accuracy and the verification accuracy; and if the distinguishing difference value and the accurate difference value are within the corresponding preset difference value range, taking the initial distinguishing value as a preset distinguishing value, and taking the initial accuracy as a preset accuracy.
In a specific implementation, a verification curve is determined according to the verification features and the feature importance corresponding to the verification features, and a verification discrimination value and a verification accuracy are determined according to the verification curve. If the difference between the verification discrimination value and the initial discrimination value is within the preset difference range, the initial discrimination value is used as the preset discrimination value, and if the difference between the verification accuracy and the initial accuracy is within the preset difference range, the initial accuracy is used as the preset accuracy. As shown in fig. 4, if the differences in accuracy and in discrimination value between the verification curve and the target curve are both within the preset difference range, the initial discrimination value is used as the preset discrimination value, and the initial accuracy is used as the preset accuracy.
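The acceptance check can be sketched as a comparison of two absolute differences against their allowed ranges. The function name `confirm_metrics`, the numeric values, and the choice to return `None` when a difference is out of range are assumptions; the text does not state what happens in that case.

```python
def confirm_metrics(initial_disc, initial_acc, verify_disc, verify_acc,
                    max_disc_diff, max_acc_diff):
    """Return (preset discrimination value, preset accuracy) if both differences
    are within their preset ranges, otherwise None (an assumed fallback)."""
    disc_diff = abs(initial_disc - verify_disc)
    acc_diff = abs(initial_acc - verify_acc)
    if disc_diff <= max_disc_diff and acc_diff <= max_acc_diff:
        return initial_disc, initial_acc
    return None


# Hypothetical metrics from the target curve and the verification curve.
accepted = confirm_metrics(0.42, 0.78, 0.40, 0.76, max_disc_diff=0.05, max_acc_diff=0.05)
rejected = confirm_metrics(0.42, 0.78, 0.30, 0.76, max_disc_diff=0.05, max_acc_diff=0.05)
print(accepted)  # (0.42, 0.78)
print(rejected)  # None
```

A small gap between the two curves suggests the model behaves similarly on the later verification data, which is why the initial values are then adopted as the preset values.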
In this embodiment, the derived features are screened according to a preset importance threshold and the feature importance corresponding to the derived features, so as to obtain training features and the feature importance corresponding to the training features; and carrying out classification training according to the training features and the feature importance degrees corresponding to the training features to obtain a preset distinguishing value, preset accuracy and a target label value corresponding to the enterprise to be mined. The derived features are screened through a preset importance threshold, so that the training features and feature importance corresponding to the training features are obtained for classification training, the risk that the generated multi-dimensional features are over-fitted is avoided, and the accuracy of the training process is improved.
In addition, referring to fig. 5, this embodiment further provides an enterprise mining device based on a fusion algorithm, where the enterprise mining device based on the fusion algorithm includes:
an acquisition module 10, configured to acquire target feature data of an enterprise to be mined;
a derivation module 20, configured to input the target feature data into a preset enterprise training model to obtain derived features and the feature importance corresponding to the derived features;
a classification module 30, configured to obtain, through a preset enterprise classification model and based on the derived features and the feature importance corresponding to the derived features, a preset discrimination value, a preset accuracy and a target label value corresponding to the enterprise to be mined;
a screening module 40, configured to complete the screening of the enterprise to be mined according to the preset discrimination value, the preset accuracy and the target label value.
In this embodiment, target feature data of an enterprise to be mined is acquired; the target feature data is input into a preset enterprise training model to obtain derived features and the feature importance corresponding to the derived features; a preset discrimination value, a preset accuracy and a target label value corresponding to the enterprise to be mined are obtained through a preset enterprise classification model based on the derived features and the feature importance corresponding to the derived features; and the screening of the enterprise to be mined is completed according to the preset discrimination value, the preset accuracy and the target label value. Because feature derivation is performed on the target feature data, the feature importance of the derived features is obtained, and the target label value, the preset accuracy and the preset discrimination value are determined from the derived features and their feature importance, the enterprise can be accurately identified even when its data features are nonlinearly distributed, which improves the accuracy and the bad-client proportion of enterprise mining identification.
In an embodiment, the derivation module 20 is further configured to perform encoding derivation according to the target feature data to obtain an initial derived feature;
merging the initial derivative features and the target features in the target feature data to obtain derivative features;
and carrying out importance training on the derived features to obtain the derived features and feature importance corresponding to the derived features.
In an embodiment, the classification module 30 is further configured to filter the derived features according to a preset importance threshold and feature importance corresponding to the derived features, so as to obtain training features and feature importance corresponding to the training features;
and carrying out classification training according to the training features and the feature importance degrees corresponding to the training features to obtain a preset distinguishing value, preset accuracy and a target label value corresponding to the enterprise to be mined.
In an embodiment, the classification module 30 is further configured to obtain time information of the training features;
partitioning the training features according to the time information and the preset time to obtain test features and verification features;
performing classification training according to the test features and feature importance degrees corresponding to the test features to obtain an initial discrimination value, initial accuracy and a target label value corresponding to the enterprise to be mined;
and verifying the initial distinguishing value and the initial accuracy according to the verification features and the feature importance degrees corresponding to the verification features to obtain a preset distinguishing value and a preset accuracy.
In an embodiment, the classification module 30 is further configured to obtain a training threshold;
performing regression training according to the test features and feature importance degrees corresponding to the test features to obtain a target label value;
performing regression classification on the target label value according to the training threshold value to obtain a target curve;
and determining an initial discrimination value and initial accuracy according to the target curve.
In an embodiment, the classification module 30 is further configured to determine a verification curve according to the verification feature and a feature importance degree corresponding to the verification feature;
determining a verification distinguishing value and verification accuracy according to the verification curve;
acquiring a distinguishing difference value of the initial distinguishing value and the verification distinguishing value;
obtaining an accurate difference value between the initial accuracy and the verification accuracy;
and if the distinguishing difference value and the accurate difference value are within the corresponding preset difference value range, taking the initial distinguishing value as a preset distinguishing value, and taking the initial accuracy as a preset accuracy.
In an embodiment, the screening module 40 is further configured to sort the to-be-mined enterprises according to the target tag values to obtain a sorting result;
determining the percentage of the target enterprise according to the preset discrimination value and the preset accuracy;
and determining target enterprises in the enterprises to be mined according to the sequencing result and the target enterprise percentage, and finishing the screening of the enterprises to be mined.
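The ranking and selection performed by the screening module can be sketched as follows. This is an illustration only: the text does not say how the target enterprise percentage is derived from the preset discrimination value and preset accuracy, so the percentage is passed in as a given parameter. The function name, the enterprise ids, the descending sort order (a higher target label value ranks first) and rounding up with `math.ceil` are all assumptions.

```python
import math


def select_target_enterprises(label_values, target_pct):
    """Rank enterprises by target label value and keep the top target_pct share.

    Descending order and rounding up are assumptions, not stated in the text.
    """
    ranked = sorted(label_values, key=label_values.get, reverse=True)
    keep = math.ceil(len(ranked) * target_pct)
    return ranked[:keep]


# Hypothetical enterprise ids and target label values.
label_values = {"A": 0.91, "B": 0.35, "C": 0.77, "D": 0.12, "E": 0.64, "F": 0.50}

targets = select_target_enterprises(label_values, target_pct=0.5)
print(targets)  # ['A', 'C', 'E']
```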
Since the present apparatus employs all technical solutions of all the above embodiments, at least all the beneficial effects brought by the technical solutions of the above embodiments are achieved, and are not described in detail herein.
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores an enterprise mining program based on a fusion algorithm, and the enterprise mining program based on the fusion algorithm implements the steps of the enterprise mining method based on the fusion algorithm as described above when executed by a processor.
Since the storage medium adopts all technical solutions of all the embodiments, at least all the beneficial effects brought by the technical solutions of the embodiments are achieved, and no further description is given here.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may refer to the enterprise mining method based on the fusion algorithm provided in any embodiment of the present invention, and are not described herein again.
Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. Read Only Memory (ROM)/RAM, magnetic disk, optical disk), and includes several instructions for enabling a terminal device (e.g. a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An enterprise mining method based on a fusion algorithm is characterized in that the enterprise mining identification method based on the fusion algorithm comprises the following steps:
acquiring target characteristic data of an enterprise to be mined;
inputting the target feature data into a preset enterprise training model to obtain derivative features and feature importance degrees corresponding to the derivative features;
obtaining a preset distinguishing value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance corresponding to the derived features;
and completing the screening of the enterprise to be mined according to the preset discrimination value, the preset accuracy and the target label value.
2. The enterprise mining method based on the fusion algorithm as claimed in claim 1, wherein the step of inputting the target feature data into a preset enterprise training model to obtain derived features and feature importance degrees corresponding to the derived features comprises:
carrying out coding derivation according to the target characteristic data to obtain initial derivation characteristics;
merging the initial derivative features and the target features in the target feature data to obtain derivative features;
and carrying out importance training on the derived features to obtain the derived features and feature importance corresponding to the derived features.
3. The enterprise mining method based on the fusion algorithm according to claim 1, wherein the obtaining of the preset discrimination value, the preset accuracy value and the target tag value corresponding to the enterprise to be mined based on the derived features and the feature importance corresponding to the derived features through a preset enterprise classification model comprises:
screening the derived features according to a preset importance threshold and the feature importance corresponding to the derived features to obtain training features and the feature importance corresponding to the training features;
and carrying out classification training according to the training features and the feature importance degrees corresponding to the training features to obtain a preset distinguishing value, preset accuracy and a target label value corresponding to the enterprise to be mined.
4. The method as claimed in claim 3, wherein the performing classification training according to the training features and the feature importance degrees corresponding to the training features to obtain a preset discrimination value, a preset accuracy and a target label value corresponding to the to-be-mined enterprise comprises:
acquiring time information of the training features;
partitioning the training features according to the time information and the preset time to obtain test features and verification features;
performing classification training according to the test features and feature importance degrees corresponding to the test features to obtain an initial discrimination value, initial accuracy and a target label value corresponding to the enterprise to be mined;
and verifying the initial distinguishing value and the initial accuracy according to the verification features and the feature importance degrees corresponding to the verification features to obtain a preset distinguishing value and a preset accuracy.
5. The enterprise mining method based on the fusion algorithm as claimed in claim 4, wherein the performing classification training according to the test features and the feature importance degrees corresponding to the test features to obtain an initial discrimination value, an initial accuracy and a target label value corresponding to the enterprise to be mined comprises:
acquiring a training threshold;
performing regression training according to the test features and feature importance degrees corresponding to the test features to obtain a target label value;
performing regression classification on the target label value according to the training threshold value to obtain a target curve;
and determining an initial discrimination value and initial accuracy according to the target curve.
6. The enterprise mining method based on fusion algorithm as claimed in claim 4, wherein said verifying said initial discrimination value and said initial accuracy according to said verification feature and feature importance corresponding to said verification feature to obtain a preset discrimination value and a preset accuracy comprises:
determining a verification curve according to the verification characteristics and the characteristic importance degrees corresponding to the verification characteristics;
determining a verification distinguishing value and verification accuracy according to the verification curve;
acquiring a distinguishing difference value of the initial distinguishing value and the verification distinguishing value;
obtaining an accurate difference value between the initial accuracy and the verification accuracy;
and if the distinguishing difference value and the accurate difference value are within the corresponding preset difference value range, taking the initial distinguishing value as a preset distinguishing value, and taking the initial accuracy as a preset accuracy.
7. The enterprise mining method based on the fusion algorithm according to any one of claims 1 to 6, wherein the screening of the enterprise to be mined according to the preset discrimination value, the preset accuracy and the target tag value comprises:
sequencing the enterprises to be mined according to the target tag values to obtain a sequencing result;
determining the percentage of the target enterprise according to the preset discrimination value and the preset accuracy;
and determining target enterprises in the enterprises to be mined according to the sequencing result and the target enterprise percentage, and finishing the screening of the enterprises to be mined.
8. An enterprise mining device based on a fusion algorithm is characterized in that the enterprise mining device based on the fusion algorithm comprises:
the acquisition module is used for acquiring target characteristic data of the enterprise to be mined;
the derivative module is used for inputting the target characteristic data into a preset enterprise training model to obtain derivative characteristics and characteristic importance degrees corresponding to the derivative characteristics;
the classification module is used for obtaining a preset discrimination value, a preset accuracy and a target label value corresponding to the enterprise to be mined through a preset enterprise classification model based on the derived features and the feature importance degrees corresponding to the derived features;
and the screening module is used for completing the screening of the enterprise to be mined according to the preset discrimination value, the preset accuracy and the target label value.
9. An enterprise mining device based on a fusion algorithm, the device comprising: a memory, a processor, and a fusion algorithm based enterprise mining program stored on the memory and executable on the processor, the fusion algorithm based enterprise mining program configured to implement the fusion algorithm based enterprise mining method of any one of claims 1 to 7.
10. A storage medium having stored thereon a fusion algorithm based enterprise mining program, which when executed by a processor implements the fusion algorithm based enterprise mining method according to any one of claims 1 to 7.
CN202111529428.1A 2021-12-14 2021-12-14 Enterprise mining method, device, equipment and storage medium based on fusion algorithm Pending CN114266643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111529428.1A CN114266643A (en) 2021-12-14 2021-12-14 Enterprise mining method, device, equipment and storage medium based on fusion algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111529428.1A CN114266643A (en) 2021-12-14 2021-12-14 Enterprise mining method, device, equipment and storage medium based on fusion algorithm

Publications (1)

Publication Number Publication Date
CN114266643A true CN114266643A (en) 2022-04-01

Family

ID=80827099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111529428.1A Pending CN114266643A (en) 2021-12-14 2021-12-14 Enterprise mining method, device, equipment and storage medium based on fusion algorithm

Country Status (1)

Country Link
CN (1) CN114266643A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943607A (en) * 2022-06-02 2022-08-26 支付宝(杭州)信息技术有限公司 Feature discovery method, attribute prediction method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination