CN110400215B - Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model
- Publication number
- CN110400215B, CN201910700190.0A
- Authority
- CN
- China
- Prior art keywords
- data
- model
- credit
- evaluation
- sample
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The invention discloses a method and a system for constructing a credit evaluation model for small and micro enterprises of enterprise-owner families. It belongs to the field of enterprise credit evaluation and addresses the technical problem of how to evaluate the credit of such enterprises. The method comprises the following steps: acquiring sample data; dividing the sample data into a training set and an evaluation set; constructing an evaluation model by a random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, and the weighted sum of the credit outputs of the two sub-models is taken as the evaluation output of the evaluation model; training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm to obtain an initial evaluation model; and, with the evaluation set as input, fine-tuning the parameters of the initial evaluation model by a grid search method. The system comprises a data acquisition module, a sample division module, a model construction module, a model training module and a model optimization module.
Description
Technical Field
The invention relates to the field of enterprise credit evaluation, and in particular to a method and a system for constructing a credit evaluation model for small and micro enterprises of enterprise-owner families.
Background
Small and micro enterprises run by enterprise-owner families play important roles in expanding employment, improving people's livelihood, and promoting stability. However, because of weak risk resistance, little collateral, irregular operation and management, opaque credit information, and the difficulty of evaluating their credit, these enterprises often fail to meet banks' credit-granting policies and therefore largely cannot obtain bank loans. A credit evaluation method for small and micro enterprises of enterprise-owner families is therefore proposed: using operator big data, credit is evaluated from both the enterprise owner and the enterprise's behavior, which supplements banks' credit standards, raises the bank loan approval rate of small and micro enterprises, and serves them better.
Based on the above, how to evaluate the credit of small and micro enterprises of enterprise-owner families is the technical problem to be solved.
Disclosure of Invention
In view of the above shortcomings, the technical task of the invention is to provide a method and a system for constructing a credit evaluation model for small and micro enterprises of enterprise-owner families, so as to solve the problem of how to evaluate the credit of such enterprises.
In a first aspect, the present invention provides a method for constructing a credit assessment model for small and micro enterprises of enterprise-owner families, including the following steps:
acquiring sample data, wherein the sample data comprises a plurality of characteristic indexes and a plurality of credit data, the characteristic indexes are credit assessment attributes extracted from the portrait characteristics of small and micro enterprises, and the credit data come from the operator's credit service and an internet platform;
dividing the sample data into a training set and an evaluation set;
constructing an evaluation model by a random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, an influence weight is set for each sub-model, and the weighted sum of the credit outputs of the two sub-models is taken as the evaluation output of the evaluation model;
training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, and constraining the ten-fold cross-validation algorithm and the parameter grid optimization algorithm with a regularization method, to obtain an initial evaluation model;
and, taking the evaluation set as input, fine-tuning the parameters of the initial evaluation model by a grid search method to obtain a final evaluation model.
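The ten-fold cross-validation named in the training step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper name `k_fold_indices`, the fixed shuffling seed, and the way leftover samples are spread across folds are all choices made for the example, and the regularization constraint is not modelled.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles the indices once, then uses each of
    the k roughly equal folds as the validation part exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for i, valid_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, valid_idx

# Illustrative use on 25 samples of the training set.
splits = list(k_fold_indices(25, k=10))
```

Each sample appears in a validation fold exactly once, so averaging the ten validation scores gives one estimate per candidate parameter setting.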
In the above embodiment, the sample data includes two parts. One part is the characteristic indexes, which are credit assessment attributes extracted from the portrait characteristics of small and micro enterprises. The other part is the credit data: credit data from the operator serves as the credit sample of the small and micro enterprise owner, and internet data, such as credit data from an enterprise lookup platform, serves as the credit sample of the first sub-model. The two constructed sub-models are trained and optimized with the sample data to obtain the final evaluation model, and the credit evaluation output by the evaluation model is the weighted sum of the two sub-models. The evaluation model can thus evaluate both the owner credit and the behavior credit of small and micro enterprises.
Preferably, the sample data is preprocessed before being divided into a training set and an evaluation set;
the preprocessing comprises the following steps:
filling NULL values in the sample data, wherein the fill value is the average value of the sample data;
performing outlier processing on the sample data, deleting the sample data that deviate from a threshold value, wherein the threshold value is determined through an index slope map;
and normalizing the sample data based on an index value standardization formula so as to limit the sample data to the same interval, wherein the index value standardization formula is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x$ represents the index value before normalization, $x'$ represents the normalized index value, $x_{\min}$ represents the minimum of the current index values, and $x_{\max}$ represents the maximum of the current index values.
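The NULL filling and normalization steps can be sketched as below. The sketch assumes the standardization formula is the usual min-max scaling implied by x min and x max; the function names, the use of `None` for NULL, and the handling of a constant column are assumptions made for illustration.

```python
def fill_nulls_with_mean(values):
    """Replace None entries with the mean of the non-null entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot compute a mean from an all-null column")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Map values into [0, 1] via (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]

monthly_spend = [120.0, None, 80.0, 200.0]  # illustrative data only
filled = fill_nulls_with_mean(monthly_spend)
normalized = min_max_normalize(filled)
```

Applying the same scaling to every index keeps indexes with large raw ranges from dominating those with small ranges.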
Preferably, the characteristic indexes are subjected to dimension reduction after the sample data are preprocessed;
the dimension reduction of the characteristic indexes comprises the following steps:
grouping the preprocessed sample data by a Spark feature label grouping method, and separating the characteristic index data from the credit data;
calculating the importance of each item of characteristic index data, and eliminating the characteristic index data with low importance, wherein the importance of the characteristic index data is calculated as:
$$\text{importance} = \frac{\sum (\mathrm{errOOB2} - \mathrm{errOOB1})}{\mathrm{Ntree}}$$
wherein errOOB1 is the error computed on the random forest out-of-bag data, errOOB2 is the error computed after noise interference is added to the out-of-bag data, and Ntree is the number of trees in the random forest algorithm;
calculating a chi-square value of the characteristic index data through a chi-square test formula, and selecting the characteristic index data that have a significant relationship with the credit score data based on the confidence level, the chi-square value and the degree of freedom, wherein the chi-square test formula is:
$$\chi^2(X) = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
wherein X represents the characteristic index data, observed represents the observed value, and expected represents the theoretical value.
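The chi-square screening step can be sketched as below, assuming the test is the standard Pearson chi-square on a contingency table of a characteristic index against a credit label. The table counts, the function name, and the choice of a 95% confidence level are illustrative assumptions; 3.841 is the standard critical value for one degree of freedom at that level.

```python
def chi_square_statistic(observed):
    """Pearson chi-square for a contingency table given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of feature and credit label.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Illustrative counts: rows = feature present / absent,
# columns = good credit / poor credit.
table = [[30, 10],
         [20, 40]]
chi2, dof = chi_square_statistic(table)
CRITICAL_95_DOF1 = 3.841  # chi-square critical value, 95% confidence, 1 d.o.f.
keep_feature = dof == 1 and chi2 > CRITICAL_95_DOF1
```

A chi-square value above the critical value for the table's degree of freedom indicates a significant relationship, so the index would be kept.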
Preferably, before the sample data are acquired, a sample library is constructed, and portrait analysis is performed on the sample library to obtain the portrait features of small and micro enterprises;
the construction of the sample library comprises the following steps:
acquiring operator data, internet data and industry data, and loading them into a data sharing platform, wherein the data sharing platform is built on the principles of multiple data sources, loose coupling and high heterogeneity;
cleaning abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
fusing the cleaned data by a data association method, integrating the loaded data into one data table;
and summarizing the fused data based on data granularity to obtain the sample library.
Preferably, the operator data, the internet data and the industry data are collected through offline collection, real-time collection, crawlers, and import from partners.
Preferably, the characteristic indexes include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information, and industry information.
In a second aspect, the present invention provides a credit assessment system for small and micro enterprises of enterprise-owner families, comprising:
the data acquisition module, used for acquiring sample data, wherein the sample data comprises a plurality of characteristic indexes and a plurality of credit data, the characteristic indexes are credit assessment attributes extracted from the portrait characteristics of small and micro enterprises, and the credit data come from the operator's credit service and an internet platform;
the sample dividing module, used for dividing the collected sample data into a training set and an evaluation set;
the assessment model construction module, used for constructing an assessment model by a random forest method, wherein the assessment model consists of a business owner credit assessment sub-model and an enterprise behavior credit assessment sub-model, an influence weight is set for each sub-model, and the weighted sum of the credit outputs of the two sub-models is taken as the assessment output of the assessment model;
the model training module, used for training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, and constraining the ten-fold cross-validation algorithm and the parameter grid optimization algorithm with a regularization method, to obtain an initial evaluation model;
and the model optimization module, used for fine-tuning the parameters of the initial evaluation model by a grid search method, with the evaluation set as input, to obtain a final evaluation model.
More preferably, the system further comprises a sample data processing module, wherein the sample data processing module comprises a data preprocessing module and a data dimension reduction module;
the data preprocessing module is used for preprocessing the sample data through the following steps:
filling NULL values in the sample data, wherein the fill value is the average value of the sample data;
performing outlier processing on the sample data, deleting the sample data that deviate from a threshold value, wherein the threshold value is determined through an index slope map;
and normalizing the sample data based on an index value standardization formula so as to limit the sample data to the same interval, wherein the index value standardization formula is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x$ represents the index value before normalization, $x'$ represents the normalized index value, $x_{\min}$ represents the minimum of the current index values, and $x_{\max}$ represents the maximum of the current index values;
the data dimension reduction module is used for reducing the dimension of the characteristic indexes through the following steps:
grouping the preprocessed sample data by a Spark feature label grouping method, and separating the characteristic index data from the credit data;
calculating the importance of each item of characteristic index data, and eliminating the characteristic index data with low importance, wherein the importance of the characteristic index data is calculated as:
$$\text{importance} = \frac{\sum (\mathrm{errOOB2} - \mathrm{errOOB1})}{\mathrm{Ntree}}$$
wherein errOOB1 is the error computed on the random forest out-of-bag data, errOOB2 is the error computed after noise interference is added to the out-of-bag data, and Ntree is the number of trees in the random forest algorithm;
calculating a chi-square value of the characteristic index data through a chi-square test formula, and selecting the characteristic index data that have a significant relationship with the credit score data based on the confidence level, the chi-square value and the degree of freedom, wherein the chi-square test formula is:
$$\chi^2(X) = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
wherein X represents the characteristic index data, observed represents the observed value, and expected represents the theoretical value.
More preferably, the system further comprises a characteristic index construction module, wherein the characteristic index construction module comprises a sample library construction module and a portrait analysis module;
the sample library construction module is used for constructing a sample library through the following steps:
acquiring operator data, internet data and industry data, and loading them into a data sharing platform, wherein the data sharing platform is built on the principles of multiple data sources, loose coupling and high heterogeneity;
cleaning abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
fusing the cleaned data by a data association method, integrating the loaded data into one data table;
summarizing the fused data based on data granularity to obtain the sample library;
the portrait analysis module is used for performing portrait analysis on the sample library to obtain the portrait features.
Preferably, the characteristic indexes include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information, and industry information.
The method and the system for constructing the credit assessment model for small and micro enterprises of enterprise-owner families have the following advantages: they take into account banks' many concerns about lending to small and micro enterprises and the resulting low loan approval rate; accurately evaluate the credit of small and micro enterprises of enterprise-owner families; integrate the credit evaluation of the enterprise owner with that of the enterprise's behavior; supplement banks' credit standards; open up multi-dimensional financing channels; and help solve the problem that financing for small and micro enterprises is difficult and expensive.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of a method for constructing a small micro enterprise credit assessment model for an enterprise family according to embodiment 1.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples, so that those skilled in the art can better understand the invention and implement it, but the examples are not meant to limit the invention, and the technical features of the embodiments of the invention and the examples can be combined with each other without conflict.
It should be appreciated that in the description of embodiments of the invention, the words "first," "second," and the like are used merely for distinguishing between the descriptions and not for indicating or implying any relative importance or order. "plurality" in the embodiments of the present invention means two or more.
The embodiments of the invention provide a method and a system for constructing a credit evaluation model for small and micro enterprises of enterprise-owner families, which solve the technical problem of how to evaluate the credit of such enterprises.
Example 1:
As shown in FIG. 1, the method for constructing the credit assessment model for small and micro enterprises of enterprise-owner families comprises the following steps:
S100, acquiring sample data, wherein the sample data comprises a plurality of characteristic indexes and a plurality of credit data, the characteristic indexes are credit assessment attributes extracted from the portrait characteristics of small and micro enterprises, the credit data come from the operator's credit service and an internet platform, and the internet platform is an enterprise lookup platform;
S200, dividing the sample data into a training set and an evaluation set;
S300, constructing an evaluation model by a random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, an influence weight is set for each sub-model, and the weighted sum of the credit outputs of the two sub-models is taken as the evaluation output of the evaluation model;
S400, training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, and constraining the ten-fold cross-validation algorithm and the parameter grid optimization algorithm with a regularization method, to obtain an initial evaluation model;
S500, taking the evaluation set as input, fine-tuning the parameters of the initial evaluation model by a grid search method to obtain a final evaluation model.
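The grid search used for parameter fine-tuning in the last step can be sketched as an exhaustive loop over a parameter grid. Everything concrete here is an assumption for illustration: the parameter names (`n_trees`, `max_depth`, `owner_weight`), their candidate values, and `toy_score`, which stands in for a real score measured on the evaluation set.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) over every grid combination.

    score_fn is a hypothetical callable that adjusts the model with the
    given parameters and returns its score on the evaluation set
    (higher is better).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Illustrative grid; the parameter names and ranges are assumptions.
grid = {"n_trees": [50, 100, 200], "max_depth": [4, 8], "owner_weight": [0.4, 0.6]}

def toy_score(params):
    # Stand-in for a real evaluation-set score, used only to make the sketch runnable.
    return (-abs(params["n_trees"] - 100)
            - abs(params["max_depth"] - 8)
            - abs(params["owner_weight"] - 0.6))

best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the candidate counts (12 combinations here), which is why fine-tuning usually searches a narrow grid around the parameters found during training.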
In this embodiment, the portrait features include: the operators are mainly aged 18 to 45, are the breadwinners of their families, and form the main body of China's labor force; they are energetic and confident and work more than 8 hours per day; the business has been registered for 1 year or more; the businesses are located in third-, fourth- and fifth-tier cities; there are fewer than 5 employees, and most businesses are run jointly by family members, relatives and friends; the scale of operation is small, transactions are frequent but small in amount, and Alipay/WeChat are used heavily; the company name contains none of the keywords franchise store, office, retail department, or branch; there are no restrictions on sales or financial indexes; and the enterprise has good market prospects.
Credit assessment attributes are extracted from the portrait features to obtain a plurality of characteristic indexes, which form a characteristic index set; the characteristic indexes include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavior information, business information and industry information.
After the sample data are obtained, the sample data are divided into a training set and an evaluation set, the training set is used for training an evaluation model, and the evaluation set is used for carrying out parameter optimization on the trained evaluation model to obtain a final evaluation model.
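The division into a training set and an evaluation set can be sketched as a shuffled split. The text does not state a split ratio, so the 80/20 proportion, the fixed seed, and the function name are assumptions for illustration.

```python
import random

def split_samples(samples, train_fraction=0.8, seed=42):
    """Shuffle a copy of samples and split it into (training_set, evaluation_set)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

sample_ids = list(range(100))  # illustrative sample identifiers
training_set, evaluation_set = split_samples(sample_ids)
```

Keeping the evaluation set disjoint from the training set means the later fine-tuning is scored on data the initial model has not seen.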
A business owner credit assessment sub-model and an enterprise behavior credit assessment sub-model are constructed by a random forest method, and the two sub-models form the assessment model. The characteristic indexes assessed by the business owner credit assessment sub-model include identity information (such as age), location information (such as daytime location, home city and small-area location movement track), behavior information (such as Alipay/WeChat payments and searches related to business items), consumption information (such as income level, consumption level, call plan, star-rated user level and whether there is a fixed payday), and social information (such as social circle stability, social influence and social circle credit). The characteristic indexes assessed by the enterprise behavior credit assessment sub-model include business information and industry information, such as business registration time, company name, financial indexes, scale of operation, transaction type, transaction frequency, transaction amount and market prospects.
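The assignment of characteristic indexes to the two sub-models can be sketched as a routing step over tagged features. The category tags, field names and record layout below are hypothetical; credit history is left out because the text does not assign it to either sub-model.

```python
# Feature categories per sub-model, following the grouping in the text.
OWNER_CATEGORIES = {"identity", "location", "behavior", "consumption", "social"}
ENTERPRISE_CATEGORIES = {"business", "industry"}

def split_features(record):
    """Split {(category, name): value} into owner and enterprise feature dicts."""
    owner, enterprise = {}, {}
    for (category, name), value in record.items():
        if category in OWNER_CATEGORIES:
            owner[name] = value
        elif category in ENTERPRISE_CATEGORIES:
            enterprise[name] = value
        else:
            raise KeyError(f"unknown feature category: {category}")
    return owner, enterprise

# Illustrative record with made-up field names and values.
record = {
    ("identity", "age"): 35,
    ("consumption", "income_level"): 3,
    ("business", "registration_years"): 2,
    ("industry", "market_prospect"): "good",
}
owner_features, enterprise_features = split_features(record)
```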
The assessment output of the assessment model is a weighted sum of business owner credit assessment sub-model and business behavioral credit assessment sub-model credits.
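The weighted sum can be written out as a short function. This sketch assumes the two influence weights are non-negative and sum to 1, so one weight fixes the other; the text only says that influence weights are set, and the scores and weight below are made-up values.

```python
def combined_credit_score(owner_score, enterprise_score, owner_weight=0.5):
    """Weighted sum of the two sub-model outputs.

    Assumption: the enterprise weight is 1 - owner_weight.
    """
    if not 0.0 <= owner_weight <= 1.0:
        raise ValueError("owner_weight must lie in [0, 1]")
    return owner_weight * owner_score + (1.0 - owner_weight) * enterprise_score

# Illustrative sub-model outputs on a 0-to-1 scale.
score = combined_credit_score(owner_score=0.8, enterprise_score=0.6, owner_weight=0.7)
```

With these numbers the combined score is 0.7 × 0.8 + 0.3 × 0.6 = 0.74.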
As a first improvement of the present embodiment, the sample data is preprocessed before being divided, specifically through the following steps:
(1) Filling NULL values in the sample data, wherein the fill value is the average value of all the sample data;
(2) Performing outlier processing on the sample data, deleting the sample data that deviate from a threshold value, wherein the threshold value is determined through an index slope map;
(3) Normalizing the sample data based on an index value standardization formula so as to limit the sample data to the same interval, wherein the index value standardization formula is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x$ represents the index value before normalization, $x'$ represents the normalized index value, $x_{\min}$ represents the minimum of the current index values, and $x_{\max}$ represents the maximum of the current index values.
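Step (2) does not say how the threshold is read off the index slope map. One simple reading, offered only as an assumption and not as the patent's procedure, is to sort the index values, find the steepest rise between neighbours, and cut there; the function names and data are hypothetical.

```python
def threshold_from_largest_jump(values):
    """Pick a cut-off at the steepest rise in the sorted values.

    Assumption: this is one simple reading of a slope plot, not the
    procedure stated in the text.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    jumps = [b - a for a, b in zip(ordered, ordered[1:])]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    # Place the threshold halfway across the largest gap.
    return (ordered[i] + ordered[i + 1]) / 2

def drop_above(values, threshold):
    """Delete values that exceed the threshold."""
    return [v for v in values if v <= threshold]

daily_transactions = [12, 15, 14, 13, 16, 400]  # illustrative data only
threshold = threshold_from_largest_jump(daily_transactions)
cleaned = drop_above(daily_transactions, threshold)
```

Because this sketch always cuts at the largest gap, a real pipeline would also need a minimum gap size before deleting anything; otherwise clean data would lose its largest values.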
Because there are many characteristic indexes, dimension reduction is applied to remove those that have no significant relationship with credit, specifically through the following steps:
(1) Grouping the preprocessed sample data by a Spark feature label grouping method, and separating the characteristic index data from the credit data;
(2) Calculating the importance of each item of characteristic index data, and eliminating the characteristic index data with low importance, wherein the importance of the characteristic index data is calculated as:
$$\text{importance} = \frac{\sum (\mathrm{errOOB2} - \mathrm{errOOB1})}{\mathrm{Ntree}}$$
wherein errOOB1 is the error computed on the random forest out-of-bag data, errOOB2 is the error computed after noise interference is added to the out-of-bag data, and Ntree is the number of trees in the random forest algorithm;
(3) Calculating a chi-square value of the characteristic index data through a chi-square test formula, and selecting the characteristic index data that have a significant relationship with the credit data based on the confidence level, the chi-square value and the degree of freedom, wherein the chi-square test formula is:
$$\chi^2(X) = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
wherein X represents the characteristic index data, observed represents the observed value, and expected represents the theoretical value.
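The importance calculation in step (2) can be sketched directly from its formula, given per-tree out-of-bag errors before and after noise is added to one characteristic index. The error values, feature names and the elimination cut-off are made up for illustration; the text does not specify what counts as low importance.

```python
def feature_importance(err_oob1, err_oob2):
    """Mean increase in out-of-bag error over all trees.

    err_oob1[t]: OOB error of tree t on the original out-of-bag data.
    err_oob2[t]: OOB error of tree t after noise is added to the feature.
    """
    if len(err_oob1) != len(err_oob2) or not err_oob1:
        raise ValueError("need one error pair per tree")
    n_tree = len(err_oob1)
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / n_tree

# Illustrative per-tree errors for two features (made-up numbers).
baseline = [0.20, 0.25, 0.22, 0.21]
importances = {
    "transaction_frequency": feature_importance(baseline, [0.35, 0.40, 0.30, 0.33]),
    "phone_plan": feature_importance(baseline, [0.20, 0.26, 0.22, 0.21]),
}
IMPORTANCE_CUTOFF = 0.01  # assumed cut-off; the text does not give one
selected = [name for name, imp in importances.items() if imp >= IMPORTANCE_CUTOFF]
```

A feature whose perturbation barely changes the out-of-bag error contributes little to the forest's predictions, which is the rationale for eliminating it.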
The sample data are preprocessed and the characteristic indexes are screened through the above steps to obtain processed sample data, and the constructed evaluation model is trained and optimized with the processed sample data.
As a further improvement on the first improvement, the method further comprises constructing a sample library, performing portrait analysis on the sample library to obtain the portrait features of small and micro enterprises, and extracting credit assessment attributes from the portrait features to obtain the characteristic index set.
Wherein the construction of the sample library comprises the following steps:
(1) Acquiring operator data, internet data and industry data through offline collection, real-time collection, crawlers, and import from partners, and loading them into a data sharing platform, wherein the data sharing platform is built on the principles of multiple data sources, loose coupling and high heterogeneity;
(2) Carrying out abnormal data cleaning on the loaded data according to a specified cleaning rule to obtain cleaned data;
(3) Carrying out data fusion on the cleaned data by a data association method, and integrating the loaded data into a data table;
(4) And summarizing the fused data based on the data granularity to obtain a sample library.
In step (1), a data acquisition program is set up whose sources are operator, internet and industry data; a data loading program is set up to load the data from the different sources into the data sharing platform, where they are stored uniformly in a file system, a database, or another form.
In step (2), abnormal data are cleaned; abnormal data are, for example, records whose user number is null.
In step (3), because the data from the three sources are collected and loaded independently, they must be fused by data association: the data are joined on user number or name and integrated into one wide data table.
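The cleaning and fusion described in steps (2) and (3) can be sketched with plain dictionaries. The field names (`user_id`, `phone_plan`, and so on) and the records are hypothetical, and keeping users that appear in only some sources (an outer join) is an assumption, since the text does not say how unmatched users are handled.

```python
def clean(records):
    """Drop records whose user number is missing or empty."""
    return [r for r in records if r.get("user_id")]

def fuse(*sources):
    """Merge records from several sources into one wide row per user_id."""
    wide = {}
    for source in sources:
        for record in clean(source):
            row = wide.setdefault(record["user_id"], {"user_id": record["user_id"]})
            row.update(record)
    return wide

# Illustrative records from the three sources.
operator_data = [{"user_id": "u1", "phone_plan": "A"}, {"user_id": None, "phone_plan": "B"}]
internet_data = [{"user_id": "u1", "registered_years": 3}]
industry_data = [{"user_id": "u1", "industry": "retail"}, {"user_id": "u2", "industry": "catering"}]
wide_table = fuse(operator_data, internet_data, industry_data)
```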
In step (4), the operator data are fine-grained, with a granularity of 5 minutes, whereas this embodiment requires data at monthly granularity, so the data are summarized; for example, payment-app usage records at 5-minute granularity are summarized into the number of payment-app uses per month.
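The roll-up from 5-minute records to monthly counts in step (4) can be sketched as follows. The event layout, a list of (user number, timestamp) pairs, and the sample events are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

def monthly_usage_counts(events):
    """Count events per (user_id, 'YYYY-MM') from (user_id, timestamp) pairs."""
    counts = Counter()
    for user_id, timestamp in events:
        counts[(user_id, timestamp.strftime("%Y-%m"))] += 1
    return counts

# Illustrative 5-minute-granularity usage events.
events = [
    ("u1", datetime(2019, 6, 30, 23, 55)),
    ("u1", datetime(2019, 7, 1, 0, 0)),
    ("u1", datetime(2019, 7, 1, 0, 5)),
    ("u2", datetime(2019, 7, 15, 12, 30)),
]
monthly = monthly_usage_counts(events)
```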
After the sample library is constructed through the steps, portrait analysis is carried out on the sample library, and portrait characteristics are obtained.
Example 2:
the invention discloses an enterprise family-oriented small micro enterprise credit assessment model construction system which comprises a data acquisition module, a sample division module, a model construction module, a model training module and a model optimization module.
The data acquisition module is used for acquiring sample data, wherein the sample data comprises a plurality of characteristic indexes and a plurality of credit data, the characteristic indexes are credit assessment attributes extracted from the portrait characteristics of small and micro enterprises, the credit data come from the operator's credit service and an internet platform, and the internet platform is an enterprise lookup platform.
The sample dividing module is used for dividing the acquired sample data into a training set and an evaluation set.
The model construction module is used for constructing an evaluation model through a random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, and is used for setting an influence weight, and taking the weighted sum of the business owner credit evaluation sub-model and the enterprise behavior credit evaluation sub-model credit as the evaluation output of the evaluation model.
The model optimization module is used for carrying out parameter fine adjustment on the initial evaluation model by taking the evaluation set as input through a grid search method to obtain a final evaluation model.
The portrait features include: the operators are mainly aged 18 to 45, are the breadwinners of their families, and form the main body of China's labor force; they are energetic and confident and work more than 8 hours per day; the business has been registered for 1 year or more; the businesses are located in third-, fourth- and fifth-tier cities; there are fewer than 5 employees, and most businesses are run jointly by family members, relatives and friends; the scale of operation is small, transactions are frequent but small in amount, and Alipay/WeChat are used heavily; the company name contains none of the keywords franchise store, office, retail department, or branch; there are no restrictions on sales or financial indexes; and the enterprise has good market prospects.
The plurality of characteristic indexes include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information, and industry information.
The model construction module constructs a business owner credit assessment sub-model and a business behavior credit assessment sub-model by a random forest method, and the two sub-models together form the assessment model. The characteristic indexes assessed by the business owner credit assessment sub-model comprise identity information (such as age), location information (such as daytime residence, home city, and small-area movement track), behavior information (such as Alipay/WeChat payments and searches for business goods), consumption information (such as income level, consumption level, call and data package, star-rated user status, and whether there is a fixed payday), and social information (such as social circle stability, social influence, and social circle credit). The characteristic indexes assessed by the business behavior credit assessment sub-model comprise business information and industry information, such as business registration time, company name, financial indexes, scale of operation, transaction type, transaction frequency, transaction amount, and market prospects.
The assessment output of the assessment model is the weighted sum of the credit scores of the business owner credit assessment sub-model and the business behavior credit assessment sub-model.
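The weighted-sum output described above can be illustrated with a minimal sketch. The function name, the parameter names, and the assumption that the two influence weights sum to 1 are illustrative only; the patent states that an influence weight is set but does not give its form or value.

```python
def combined_credit_score(owner_score: float, behavior_score: float,
                          owner_weight: float) -> float:
    """Weighted sum of the two sub-model credit scores.

    owner_weight is the (hypothetical) influence weight of the business owner
    sub-model; the business behavior sub-model receives 1 - owner_weight.
    """
    if not 0.0 <= owner_weight <= 1.0:
        raise ValueError("owner_weight must lie in [0, 1]")
    return owner_weight * owner_score + (1.0 - owner_weight) * behavior_score


# Illustrative scores only; in practice each score would come from the
# corresponding random forest sub-model.
equal_weight_score = combined_credit_score(80.0, 60.0, owner_weight=0.5)
behavior_heavy_score = combined_credit_score(80.0, 60.0, owner_weight=0.25)
```

Normalizing the weights to sum to 1 keeps the combined score on the same scale as the sub-model scores, which is one plausible design choice rather than a requirement of the method.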
The enterprise family oriented small micro enterprise credit assessment model construction system of the present embodiment can support the assessment model construction method disclosed in embodiment 1.
As a first improvement of the above embodiment, the building system further includes a sample data processing module, where the sample data processing module includes a data preprocessing module and a data dimension reduction module.
The data preprocessing module is used for preprocessing sample data by the following steps:
(1) Filling NULL values in the sample data, wherein the fill value is the mean of the sample data;
(2) Performing outlier processing on the sample data by deleting the sample data that deviate beyond a threshold, wherein the threshold is determined from an index slope map;
(3) Normalizing the sample data based on an index value standardization formula so as to limit the sample data to the same interval, wherein the index value standardization formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x$ represents the index value before normalization, $x'$ represents the normalized index value, $x_{\min}$ represents the minimum of the current index values, and $x_{\max}$ represents the maximum of the current index values.
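The three preprocessing steps above can be sketched for a single index column as follows. This is a simplified illustration, not the patented implementation: the function names are hypothetical, the outlier bounds are passed in directly (the patent derives the threshold from an index slope map, which is not reproduced here), and the normalization assumes the standard min-max form.

```python
from statistics import mean


def fill_missing(values):
    """Step (1): replace None (NULL) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in values]


def drop_outliers(values, lower, upper):
    """Step (2): delete values outside [lower, upper]; the bounds are assumed inputs."""
    return [v for v in values if lower <= v <= upper]


def min_max_normalize(values):
    """Step (3): map values into [0, 1] using (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # A constant column carries no spread; map every entry to 0.0.
        return [0.0 for _ in values]
    return [(v - x_min) / (x_max - x_min) for v in values]


# Illustrative column with one NULL and one extreme value.
raw = [2.0, None, 4.0, 6.0, 100.0]
filled = fill_missing(raw)                 # the NULL becomes the mean of 2, 4, 6, 100
kept = drop_outliers(filled, 0.0, 50.0)    # the value 100.0 is removed
scaled = min_max_normalize(kept)
```

Because the steps run in the order given in the text, the mean used for filling is computed before outliers are removed; swapping the order would be a different design choice.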
The data dimension reduction module is used for carrying out dimension reduction processing on the characteristic indexes through the following steps:
grouping the preprocessed sample data by a spark feature label grouping method, and separating characteristic index data from credit data;
calculating the importance of each piece of characteristic index data, and eliminating the characteristic index data with low importance, wherein the calculation formula of the importance of the characteristic index data is as follows:
importance of characteristic index data = Σ(errOOB2 - errOOB1) / Ntree
wherein errOOB1 is the error computed on the out-of-bag data of the random forest, errOOB2 is the error computed after noise interference is added to the out-of-bag data, and Ntree is the number of trees in the random forest algorithm;
calculating a chi-square value of the characteristic index data by a chi-square test formula, and selecting the characteristic index data that have a significant relationship with the credit score data based on the confidence level, the chi-square value, and the degrees of freedom, wherein the chi-square test formula is as follows:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
wherein $\chi^2$ is the chi-square value computed for the characteristic index data X, observed represents the observed value, and expected represents the theoretical value.
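The two screening calculations above can be sketched in plain Python. The function names and the example numbers are hypothetical; the per-tree error lists stand in for errors that a real random forest would produce, and the chi-square statistic is computed in the standard Pearson form on a contingency table of a characteristic index against a credit label. The critical value 3.841 is the standard table value for a 0.05 significance level with 1 degree of freedom.

```python
def feature_importance(err_oob1, err_oob2):
    """Mean increase in out-of-bag error over all trees after noise is added.

    err_oob1[i] is the out-of-bag error of tree i on the original data and
    err_oob2[i] is the error of the same tree after the noise interference.
    """
    if not err_oob1 or len(err_oob1) != len(err_oob2):
        raise ValueError("need one error pair per tree")
    n_tree = len(err_oob1)
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / n_tree


def chi_square_statistic(table):
    """Pearson chi-square value and degrees of freedom for a contingency table.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Illustrative numbers only: three trees, and a 2x2 table of
# (index category) x (good / bad credit) counts.
importance = feature_importance([0.25, 0.5, 0.25], [0.5, 0.5, 0.75])
stat, dof = chi_square_statistic([[30, 10], [10, 30]])
CRITICAL_95_DF1 = 3.841  # chi-square critical value, alpha = 0.05, 1 degree of freedom
is_significant = stat > CRITICAL_95_DF1
```

A characteristic index would be kept when its importance is not low and its chi-square value exceeds the critical value for the chosen confidence level; the cut-off for "low importance" is not specified in the text and would have to be chosen separately.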
After acquiring the sample data, the data acquisition module transmits them to the sample data processing module, which preprocesses the sample data, screens the characteristic indexes, and outputs the final sample data; the sample dividing module then divides the final sample data into a training set and an evaluation set for training and optimizing the evaluation model.
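The division into a training set and an evaluation set, followed by a grid search over model parameters, can be sketched as below. Everything named here is an assumption for illustration: the 80/20 split ratio, the parameter names `n_trees` and `max_depth`, and the toy scoring function, which stands in for scoring a trained evaluation model on the evaluation set.

```python
import itertools
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples reproducibly and split them into (training, evaluation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in for evaluation-set accuracy; peaks at n_trees=100, max_depth=5."""
    return -((params["n_trees"] - 100) ** 2) - (params["max_depth"] - 5) ** 2


training_set, evaluation_set = split_samples(range(10))
best_params, best_score = grid_search(
    {"n_trees": [50, 100, 200], "max_depth": [3, 5, 8]}, toy_score
)
```

In a real pipeline `score_fn` would train or adjust the evaluation model with the given parameters and return its score on the evaluation set.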
The enterprise family-oriented small micro enterprise credit assessment model construction system of this first improvement can support the first improved assessment model construction method disclosed in embodiment 1.
As a further improvement of the above improved embodiment, the construction system further includes a feature index construction module, and the feature index construction module includes a sample library construction module and a portrait analysis module.
The sample library construction module is used for constructing a sample library by the following steps:
(1) Acquiring operator data, Internet data, and industry data, and loading them into a data sharing platform, wherein the data sharing platform is constructed based on the principles of multiple data sources, loose coupling, and high heterogeneity;
(2) Cleaning abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
(3) Fusing the cleaned data by a data association method so that the loaded data are integrated into one data table;
(4) Summarizing the fused data based on the data granularity to obtain the sample library.
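The cleaning, fusion, and summarizing steps above can be sketched with in-memory records. This is a toy illustration under stated assumptions: the field names (`enterprise_id`, `monthly_payments`, `registered_years`), the cleaning rule (drop records with missing required fields), the join key, and the use of only two data sources are all hypothetical; the patent does not specify them.

```python
from collections import defaultdict


def clean(records, required_fields):
    """Cleaning rule (assumed): keep only records with all required fields present."""
    return [r for r in records
            if all(r.get(field) is not None for field in required_fields)]


def fuse(left_rows, right_rows, key):
    """Data association: join the two sources on a shared key into one table."""
    right_by_key = {row[key]: row for row in right_rows}
    return [{**row, **right_by_key[row[key]]}
            for row in left_rows if row[key] in right_by_key]


def summarize(rows, key, value_field):
    """Aggregate to the chosen granularity: one total per key value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value_field]
    return dict(totals)


# Hypothetical operator and Internet records.
operator_rows = [
    {"enterprise_id": "E1", "monthly_payments": 12.0},
    {"enterprise_id": "E1", "monthly_payments": 8.0},
    {"enterprise_id": "E2", "monthly_payments": None},  # abnormal record
]
internet_rows = [{"enterprise_id": "E1", "registered_years": 3}]

cleaned = clean(operator_rows, ["enterprise_id", "monthly_payments"])
fused = fuse(cleaned, internet_rows, "enterprise_id")
sample_library = summarize(fused, "enterprise_id", "monthly_payments")
```

Here the data granularity is one row per enterprise; a finer granularity (for example per enterprise per month) would only change the key passed to `summarize`.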
The portrait analysis module is used to perform portrait analysis on the sample library to obtain the portrait features.
The characteristic index construction module is connected with the constructed data sharing platform and with the data acquisition module; it loads data from the data sharing platform and inputs the obtained characteristic indexes into the data acquisition module.
The enterprise family-oriented small micro enterprise credit assessment model construction system of this second improvement can support the second improved assessment model construction method disclosed in embodiment 1.
The above-described embodiments are merely preferred embodiments for fully explaining the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions and modifications will occur to those skilled in the art based on the present invention, and are intended to be within the scope of the present invention. The protection scope of the invention is subject to the claims.
Claims (2)
1. The method for constructing the enterprise family-oriented small micro enterprise credit assessment model is characterized by comprising the following steps of:
acquiring sample data, wherein the sample data comprises a plurality of characteristic indexes and a plurality of credit data, the plurality of characteristic indexes are credit assessment attributes extracted based on the portrait characteristics of the small enterprise, and the plurality of credit data are from a sum credit and an internet platform of an operator;
dividing the sample data into a training set and an evaluation set;
constructing an evaluation model by a random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, and sets an influence weight, and takes the weighted sum of the business owner credit evaluation sub-model and the enterprise behavior credit evaluation sub-model credit as evaluation output of the evaluation model;
training the evaluation model by using a training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, and restraining the ten-fold cross-validation algorithm and the parameter grid optimization algorithm by adopting a regularization method;
taking the evaluation set as input, and performing parameter fine adjustment on the initial evaluation model by a grid search method to obtain a final evaluation model;
preprocessing the sample data before dividing the sample data into a training set and an evaluation set;
the pretreatment comprises the following steps:
filling NULL values of sample data, wherein a filling index is the average value of the sample data;
performing outlier processing on the sample data, and deleting the sample data which are different from a threshold value, wherein the threshold value is determined through an index slope map;
and normalizing the sample data based on an index value standardization formula so as to limit the sample data to the same interval, wherein the index value standardization formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x$ represents the index value before normalization, $x'$ represents the normalized index value, $x_{\min}$ represents the minimum of the current index values, and $x_{\max}$ represents the maximum of the current index values;
preprocessing sample data, and then performing dimension reduction processing on the characteristic indexes;
the dimension reduction processing for the characteristic index comprises the following steps:
grouping the preprocessed sample data by a spark feature label grouping method, and separating characteristic index data from credit data;
calculating the importance of each piece of characteristic index data, and eliminating the characteristic index data with low importance, wherein the calculation formula of the importance of the characteristic index data is as follows:
importance of characteristic index data = Σ(errOOB2 - errOOB1) / Ntree
wherein errOOB1 is the error computed on the out-of-bag data of the random forest, errOOB2 is the error computed after noise interference is added to the out-of-bag data, and Ntree is the number of trees in the random forest algorithm;
calculating a chi-square value of the characteristic index data by a chi-square test formula, and selecting the characteristic index data that have a significant relationship with the credit score data based on the confidence level, the chi-square value, and the degrees of freedom, wherein the chi-square test formula is as follows:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
wherein $\chi^2$ is the chi-square value computed for the characteristic index data X, observed represents the observed value, and expected represents the theoretical value;
before sample data are acquired, a sample library is constructed, and portrait analysis is carried out on the sample library to obtain portrait features of the small and micro enterprises;
the construction of the sample library comprises the following steps:
acquiring operator data, internet data and industry data, and loading the operator data, the internet data and the industry data into a data sharing platform, wherein the data sharing platform is constructed based on multiple data sources, loose coupling and high heterogeneous principles;
carrying out abnormal data cleaning on the loaded data according to a specified cleaning rule to obtain cleaned data;
carrying out data fusion on the cleaned data by a data association method, and integrating the loaded data into a data table;
summarizing the fused data based on data granularity to obtain a sample library;
acquiring operator data, internet data and industry data in an offline acquisition, real-time acquisition, crawler and partner introduction mode;
the plurality of characteristic indicators include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information, and industry information.
2. The enterprise family oriented small micro enterprise credit assessment model construction system is characterized by comprising:
the data acquisition module is used for acquiring sample data, the sample data comprises a plurality of characteristic indexes and a plurality of credit data, the characteristic indexes are credit assessment attributes extracted based on the portrait characteristics of the small and micro enterprises, and the credit data come from operators, credit and Internet platforms;
the sample dividing module is used for dividing the collected sample data into a training set and an evaluation set;
the model construction module is used for constructing an evaluation model through a random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model and is used for setting an influence weight, and the weighted sum of the business owner credit evaluation sub-model and the enterprise behavior credit evaluation sub-model credit score is used as evaluation output of the evaluation model;
the model training module is used for training the evaluation model by using a training set as input through a ten-fold cross validation algorithm and a parameter grid optimization algorithm, and restraining the ten-fold cross validation algorithm and the parameter grid optimization algorithm by adopting a regularization method;
the model optimization module is used for carrying out parameter fine adjustment on the initial evaluation model by taking the evaluation set as input through a grid search method to obtain a final evaluation model;
the system also comprises a sample data processing module, wherein the sample data processing module comprises a data preprocessing module and a data dimension reduction module,
the data preprocessing module is used for preprocessing sample data through the following steps:
filling NULL values of sample data, wherein a filling index is the average value of the sample data;
performing outlier processing on the sample data, and deleting the sample data which are different from a threshold value, wherein the threshold value is determined through an index slope map;
and normalizing the sample data based on an index value standardization formula so as to limit the sample data to the same interval, wherein the index value standardization formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
wherein $x$ represents the index value before normalization, $x'$ represents the normalized index value, $x_{\min}$ represents the minimum of the current index values, and $x_{\max}$ represents the maximum of the current index values;
the data dimension reduction module is used for carrying out dimension reduction processing on the characteristic indexes through the following steps:
grouping the preprocessed sample data by a spark feature label grouping method, and separating characteristic index data from credit data;
calculating the importance of each piece of characteristic index data, and eliminating the characteristic index data with low importance, wherein the calculation formula of the importance of the characteristic index data is as follows:
importance of characteristic index data = Σ(errOOB2 - errOOB1) / Ntree
wherein errOOB1 is the error computed on the out-of-bag data of the random forest, errOOB2 is the error computed after noise interference is added to the out-of-bag data, and Ntree is the number of trees in the random forest algorithm;
calculating a chi-square value of the characteristic index data by a chi-square test formula, and selecting the characteristic index data that have a significant relationship with the credit score data based on the confidence level, the chi-square value, and the degrees of freedom, wherein the chi-square test formula is as follows:
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
wherein $\chi^2$ is the chi-square value computed for the characteristic index data X, observed represents the observed value, and expected represents the theoretical value;
the system also comprises a characteristic index construction module, wherein the characteristic index construction module comprises a sample library construction module and a portrait analysis module;
the sample library construction module is used for constructing a sample library through the following steps:
acquiring operator data, internet data and industry data, and loading the operator data, the internet data and the industry data into a data sharing platform, wherein the data sharing platform is constructed based on multiple data sources, loose coupling and high heterogeneous principles;
carrying out abnormal data cleaning on the loaded data according to a specified cleaning rule to obtain cleaned data;
carrying out data fusion on the cleaned data by a data association method, and integrating the loaded data into a data table;
summarizing the fused data based on data granularity to obtain a sample library;
the portrait analysis module is used for carrying out portrait analysis on the sample library to obtain the portrait features;
the characteristic metrics include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information, and industry information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910700190.0A CN110400215B (en) | 2019-07-31 | 2019-07-31 | Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910700190.0A CN110400215B (en) | 2019-07-31 | 2019-07-31 | Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110400215A (en) | 2019-11-01 |
CN110400215B (en) | 2023-11-03 |
Family
ID=68326933
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910700190.0A Active CN110400215B (en) | 2019-07-31 | 2019-07-31 | Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110400215B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241746B (en) * | 2020-01-09 | 2024-01-26 | 深圳前海微众银行股份有限公司 | Forward model selection method, apparatus, and readable storage medium |
CN111241745B (en) * | 2020-01-09 | 2024-05-24 | 深圳前海微众银行股份有限公司 | Gradual model selection method, equipment and readable storage medium |
CN113537666B (en) * | 2020-04-16 | 2024-05-03 | 马上消费金融股份有限公司 | Evaluation model training method, evaluation and business auditing method, device and equipment |
CN111768298A (en) * | 2020-06-30 | 2020-10-13 | 中国建设银行股份有限公司 | Transaction data quota determining method, device, equipment and medium |
CN112017023A (en) * | 2020-07-15 | 2020-12-01 | 北京淇瑀信息科技有限公司 | Method and device for determining resource limit of small and micro enterprise and electronic equipment |
CN112232724B (en) * | 2020-12-17 | 2021-03-26 | 平安科技(深圳)有限公司 | Quantitative evaluation method, system, equipment and storage medium for personnel ability |
CN112633709A (en) * | 2020-12-26 | 2021-04-09 | 中国农业银行股份有限公司 | Enterprise credit investigation evaluation method and device |
CN112861056A (en) * | 2021-02-07 | 2021-05-28 | 杭州云搜网络技术有限公司 | Enterprise website construction information display and release system and method |
CN113269514A (en) * | 2021-05-13 | 2021-08-17 | 企家有道网络技术(北京)有限公司 | Enterprise health degree measuring method, device and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105719073A (en) * | 2016-01-18 | 2016-06-29 | 苏州汇誉通数据科技有限公司 | Enterprise credit evaluation system and method |
CN107392456A (en) * | 2017-07-14 | 2017-11-24 | 武汉理工大学 | A kind of multi-angle rating business credit modeling method for merging internet information |
CN108256993A (en) * | 2017-12-29 | 2018-07-06 | 浪潮天元通信信息系统有限公司 | A kind of credit score appraisal procedure and credit score Evaluation Platform |
CN108550077A (en) * | 2018-04-27 | 2018-09-18 | 信雅达系统工程股份有限公司 | A kind of individual credit risk appraisal procedure and assessment system towards extensive non-equilibrium collage-credit data |
CN109002839A (en) * | 2018-06-22 | 2018-12-14 | 杭州电子科技大学 | Efficient feature selection method under a kind of more attributive character environment |
CN109409677A (en) * | 2018-09-27 | 2019-03-01 | 深圳壹账通智能科技有限公司 | Enterprise Credit Risk Evaluation method, apparatus, equipment and storage medium |
CN109685526A (en) * | 2018-12-12 | 2019-04-26 | 税友软件集团股份有限公司 | A kind of method for evaluating credit rating of enterprise, device and relevant device |
CN110046984A (en) * | 2019-03-01 | 2019-07-23 | 安徽省优质采科技发展有限责任公司 | Enterprise credit risk system and evaluation method |
Non-Patent Citations (3)
Title |
---|
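Construction of a comprehensive credit rating model for small and micro enterprises; Liu Min; Journal of Southwest China Normal University (Natural Science Edition); 2017-09-20 (No. 09); full text * |
Jiang Weixiang. "Big data machine learning systems". Research on Computer Data Processing Technology in the Big Data Era. 2019. * |
Huang Dongmei et al. "Big data-oriented machine learning algorithms and examples". Case-Driven Big Data Principles, Technology and Applications. 2018. * |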
Also Published As
Publication number | Publication date |
---|---|
CN110400215A (en) | 2019-11-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||