CN110400215B - Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model - Google Patents

Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model


Publication number
CN110400215B
Authority
CN
China
Legal status
Active
Application number
CN201910700190.0A
Other languages
Chinese (zh)
Other versions
CN110400215A (en)
Inventor
沈林江
张笑笑
Current Assignee
Inspur Software Group Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Application filed by Inspur Software Group Co Ltd
Priority to CN201910700190.0A
Publication of CN110400215A
Application granted
Publication of CN110400215B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof


Abstract

The invention discloses a method and a system for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families. It belongs to the field of enterprise credit evaluation and addresses the technical problem of how to evaluate the credit of small and micro enterprises oriented to enterprise families. The method comprises the following steps: acquiring sample data; dividing the sample data into a training set and an evaluation set; constructing an evaluation model by the random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, and the weighted sum of the credit scores of the two sub-models is taken as the evaluation output of the evaluation model; training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm; and, with the evaluation set as input, fine-tuning the parameters of the initial evaluation model by grid search. The system comprises a data acquisition module, a sample division module, a model construction module, a model training module and a model optimization module.

Description

Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model
Technical Field
The invention relates to the field of enterprise credit evaluation, and in particular to a method and a system for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families.
Background
Small and micro enterprises run by enterprise families play an important role in expanding employment, improving people's livelihoods and promoting social stability. However, because of weak risk resistance, little collateral, irregular operation and management, opaque credit information and the difficulty of evaluating their credit, such enterprises often fail to meet banks' credit-granting policies and therefore, to a large extent, cannot obtain bank loans. A credit evaluation method for small and micro enterprises oriented to enterprise families is therefore provided: using operators' big data, credit is evaluated from two aspects, the business owner and the enterprise's behavior, which supplements banks' credit criteria, raises the loan approval rate of small and micro enterprises, and serves them better.
Accordingly, the technical problem to be solved is how to evaluate the credit of small and micro enterprises oriented to enterprise families.
Disclosure of Invention
In view of these shortcomings, the technical task of the invention is to provide a method and a system for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families, so as to solve the problem of how to carry out such credit evaluation.
In a first aspect, the present invention provides a method for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families, comprising the following steps:
acquiring sample data, wherein the sample data comprise a plurality of characteristic indexes and a plurality of credit data, the characteristic indexes are credit evaluation attributes extracted from the portrait features of small and micro enterprises, and the credit data come from the operator's credit service and an Internet platform;
dividing the sample data into a training set and an evaluation set;
constructing an evaluation model by the random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, influence weights are set, and the weighted sum of the credit scores of the two sub-models is taken as the evaluation output of the evaluation model;
training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, with a regularization method used to constrain the ten-fold cross-validation and the parameter grid optimization;
and, with the evaluation set as input, fine-tuning the parameters of the initial evaluation model by grid search to obtain the final evaluation model.
In the above embodiment, the sample data consist of two parts. One part is the characteristic indexes, which are credit evaluation attributes extracted from the portrait features of small and micro enterprises. The other part is the credit data: credit data from the operator are used as credit samples of the small and micro enterprise owner, and Internet data, such as credit data from an enterprise search platform, are used as credit samples of the first sub-model. The two constructed sub-models are trained and optimized with the sample data to obtain the final evaluation model, whose credit output is the weighted sum of the two sub-models. The evaluation model can therefore evaluate both the owner credit and the behavior credit of a small and micro enterprise.
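The division of the sample data into a training set and an evaluation set can be illustrated with a short Python sketch. It assumes a seeded random shuffle and an 80/20 training/evaluation ratio; the source does not state the ratio, and `split_samples` is a hypothetical helper name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], eval_ratio: float = 0.2,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Split samples into (training set, evaluation set).

    The 0.2 evaluation ratio and the fixed seed are illustrative assumptions;
    the source does not specify either.
    """
    if not 0.0 < eval_ratio < 1.0:
        raise ValueError("eval_ratio must lie strictly between 0 and 1")
    shuffled = list(samples)                 # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for reproducibility
    n_eval = int(round(len(shuffled) * eval_ratio))
    return shuffled[n_eval:], shuffled[:n_eval]


train_set, eval_set = split_samples(list(range(100)))
print(len(train_set), len(eval_set))  # 80 20
```

A seeded shuffle keeps the split reproducible, which matters when the evaluation set is later reused for grid-search fine-tuning.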
Preferably, the sample data are preprocessed before being divided into the training set and the evaluation set;
the preprocessing comprises the following steps:
filling NULL values in the sample data with the mean value of the sample data;
performing outlier processing on the sample data by deleting samples that deviate beyond a threshold, the threshold being determined from an index slope map;
and normalizing the sample data with an index value standardization formula so that all values lie in the same interval, here [0, 1], the formula being:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the index value before normalization, $x^{*}$ is the normalized index value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the current index values.
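As a rough illustration, the three preprocessing steps can be sketched per index column in plain Python. The normalization is min-max scaling, x* = (x - x_min) / (x_max - x_min). The outlier bounds are passed in by the caller, because the source derives them from an index slope map it does not specify; the function names are hypothetical, and filtering values within a single column is a simplification of deleting whole samples.

```python
from statistics import mean
from typing import List, Optional, Sequence


def fill_nulls_with_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace None (NULL) entries with the mean of the non-null entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-null values to average")
    fill = mean(present)
    return [fill if v is None else v for v in values]


def drop_outliers(values: Sequence[float], lower: float, upper: float) -> List[float]:
    """Keep values inside [lower, upper]; the bounds are supplied by the caller."""
    return [v for v in values if lower <= v <= upper]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Apply x* = (x - x_min) / (x_max - x_min), mapping the column onto [0, 1]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        return [0.0 for _ in values]   # constant column: avoid division by zero
    return [(v - x_min) / (x_max - x_min) for v in values]


column = [120.0, None, 80.0, 100.0]
print(min_max_normalize(fill_nulls_with_mean(column)))  # [1.0, 0.5, 0.0, 0.5]
```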
Preferably, after the sample data are preprocessed, dimension reduction is performed on the characteristic indexes;
the dimension reduction of the characteristic indexes comprises the following steps:
grouping the preprocessed sample data with a Spark feature-label grouping method to separate the characteristic index data from the credit data;
calculating the importance of each characteristic index and eliminating indexes of low importance, the importance being calculated as:
$$\text{importance} = \frac{\sum \left(\text{errOOB2} - \text{errOOB1}\right)}{N_{\text{tree}}}$$
where errOOB1 is the error computed on a tree's out-of-bag (OOB) data in the random forest, errOOB2 is the error on the same out-of-bag data after noise interference is added, $N_{\text{tree}}$ is the number of trees in the random forest, and the sum runs over all trees;
calculating the chi-square value of each characteristic index with the chi-square test formula, and selecting the indexes significantly related to the credit score data based on the confidence level, the chi-square value and the degrees of freedom, the chi-square test formula being:
$$\chi^{2}(X) = \sum \frac{(\text{observed} - \text{expected})^{2}}{\text{expected}}$$
where $X$ is the characteristic index data, observed is the observed frequency, and expected is the theoretical (expected) frequency.
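The importance step can be read as a permutation-style out-of-bag (OOB) importance: for each tree, compare the OOB error before and after the index is perturbed, then divide the summed differences by the number of trees. The sketch below assumes the per-tree errors errOOB1 and errOOB2 are already available from some random forest implementation, and uses a hypothetical cut-off `min_importance` for "low importance", which the source does not quantify.

```python
from typing import Dict, List, Sequence


def oob_importance(err_oob1: Sequence[float], err_oob2: Sequence[float]) -> float:
    """Importance = sum over trees of (errOOB2 - errOOB1), divided by Ntree.

    err_oob1[t]: tree t's error on its out-of-bag samples.
    err_oob2[t]: tree t's error on the same samples after noise was added
                 to the characteristic index being scored.
    """
    if len(err_oob1) != len(err_oob2) or not err_oob1:
        raise ValueError("need one errOOB1/errOOB2 pair per tree")
    n_tree = len(err_oob1)
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / n_tree


def select_important(importances: Dict[str, float], min_importance: float) -> List[str]:
    """Keep indexes whose importance exceeds the (assumed) cut-off."""
    return [name for name, imp in importances.items() if imp > min_importance]


# Hypothetical per-tree errors for two indexes in a two-tree forest.
scores = {
    "age": oob_importance([0.20, 0.25], [0.30, 0.33]),        # noise hurts: informative
    "home_city": oob_importance([0.20, 0.25], [0.19, 0.26]),  # roughly no effect
}
print(select_important(scores, min_importance=0.01))  # ['age']
```

A large positive value means perturbing the index degrades the forest, so the index carries information; values near zero or below mark candidates for elimination.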
Preferably, before the sample data are acquired, a sample library is constructed and portrait analysis is performed on it to obtain the portrait features of small and micro enterprises;
the construction of the sample library comprises the following steps:
acquiring operator data, Internet data and industry data and loading them into a data sharing platform built on the principles of multiple data sources, loose coupling and high heterogeneity;
cleaning abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
fusing the cleaned data by data association so that the loaded data are integrated into one data table;
and aggregating the fused data to the required data granularity to obtain the sample library.
Preferably, the operator data, Internet data and industry data are collected through offline collection, real-time collection, web crawlers and introduction by partners.
Preferably, the characteristic indexes include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavior information, business information and industry information.
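The cleaning and association-based fusion steps can be sketched with plain dictionaries. This assumes records keyed by a user number, here called `user_id` (a hypothetical field name), and treats only a null or empty key as abnormal, which is the example the embodiment gives; real cleaning rules would be broader.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def clean(records: Iterable[Record], key: str = "user_id") -> List[Record]:
    """Drop abnormal records; here, only those whose join key is null or empty."""
    return [r for r in records if r.get(key) not in (None, "")]


def fuse(sources: Iterable[Iterable[Record]], key: str = "user_id") -> Dict[Any, Record]:
    """Join cleaned records from several sources into one wide row per key.

    This behaves like a full outer join; if two sources share a column name,
    the later source overwrites the earlier value.
    """
    wide: Dict[Any, Record] = {}
    for source in sources:
        for record in clean(source, key):
            wide.setdefault(record[key], {}).update(record)
    return wide


operator_data = [{"user_id": "u1", "monthly_calls": 120},
                 {"user_id": None, "monthly_calls": 7}]   # abnormal: null user number
internet_data = [{"user_id": "u1", "registration_years": 3}]
industry_data = [{"user_id": "u2", "industry": "catering"}]
print(fuse([operator_data, internet_data, industry_data]))
```

An outer join keeps enterprises that appear in only one source; an inner join would be the stricter alternative if every sample must have data from all three sources.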
In a second aspect, the present invention provides a system for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families, comprising:
a data acquisition module for acquiring sample data, wherein the sample data comprise a plurality of characteristic indexes and a plurality of credit data, the characteristic indexes are credit evaluation attributes extracted from the portrait features of small and micro enterprises, and the credit data come from the operator's credit service and an Internet platform;
a sample division module for dividing the collected sample data into a training set and an evaluation set;
an evaluation model construction module for constructing an evaluation model by the random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model; the module sets influence weights and takes the weighted sum of the credit scores of the two sub-models as the evaluation output of the evaluation model;
a model training module for training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, with a regularization method used to constrain the ten-fold cross-validation and the parameter grid optimization;
and a model optimization module for fine-tuning the parameters of the initial evaluation model by grid search, with the evaluation set as input, to obtain the final evaluation model.
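The five modules form a linear pipeline in which each module consumes the previous module's output. A minimal sketch, assuming each module can be represented as a Python callable; `CreditModelBuilder` and its field names are hypothetical and not part of the source, and the lambdas in the usage example are toy stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class CreditModelBuilder:
    """Wires the five modules of the system together (hypothetical class).

    Each field is a callable standing in for one module.
    """
    acquire: Callable[[], Any]                   # data acquisition module
    divide: Callable[[Any], Tuple[Any, Any]]     # sample division module
    construct: Callable[[], Any]                 # evaluation model construction module
    train: Callable[[Any, Any], Any]             # model training module
    optimize: Callable[[Any, Any], Any]          # model optimization module

    def build(self) -> Any:
        samples = self.acquire()
        train_set, eval_set = self.divide(samples)
        initial_model = self.train(self.construct(), train_set)
        return self.optimize(initial_model, eval_set)


builder = CreditModelBuilder(
    acquire=lambda: list(range(10)),
    divide=lambda s: (s[:8], s[8:]),
    construct=lambda: {"owner_weight": 0.5},
    train=lambda m, tr: {**m, "trained_on": len(tr)},
    optimize=lambda m, ev: {**m, "tuned_on": len(ev)},
)
print(builder.build())  # {'owner_weight': 0.5, 'trained_on': 8, 'tuned_on': 2}
```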
More preferably, the system further comprises a sample data processing module, which comprises a data preprocessing module and a data dimension reduction module;
the data preprocessing module preprocesses the sample data through the following steps:
filling NULL values in the sample data with the mean value of the sample data;
performing outlier processing on the sample data by deleting samples that deviate beyond a threshold, the threshold being determined from an index slope map;
and normalizing the sample data with an index value standardization formula so that all values lie in the same interval, here [0, 1], the formula being:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the index value before normalization, $x^{*}$ is the normalized index value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the current index values;
the data dimension reduction module performs dimension reduction on the characteristic indexes through the following steps:
grouping the preprocessed sample data with a Spark feature-label grouping method to separate the characteristic index data from the credit data;
calculating the importance of each characteristic index and eliminating indexes of low importance, the importance being calculated as:
$$\text{importance} = \frac{\sum \left(\text{errOOB2} - \text{errOOB1}\right)}{N_{\text{tree}}}$$
where errOOB1 is the error computed on a tree's out-of-bag (OOB) data in the random forest, errOOB2 is the error on the same out-of-bag data after noise interference is added, $N_{\text{tree}}$ is the number of trees in the random forest, and the sum runs over all trees;
calculating the chi-square value of each characteristic index with the chi-square test formula, and selecting the indexes significantly related to the credit score data based on the confidence level, the chi-square value and the degrees of freedom, the chi-square test formula being:
$$\chi^{2}(X) = \sum \frac{(\text{observed} - \text{expected})^{2}}{\text{expected}}$$
where $X$ is the characteristic index data, observed is the observed frequency, and expected is the theoretical (expected) frequency.
More preferably, the system further comprises a characteristic index construction module, which comprises a sample library construction module and a portrait analysis module;
the sample library construction module constructs the sample library through the following steps:
acquiring operator data, Internet data and industry data and loading them into a data sharing platform built on the principles of multiple data sources, loose coupling and high heterogeneity;
cleaning abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
fusing the cleaned data by data association so that the loaded data are integrated into one data table;
aggregating the fused data to the required data granularity to obtain the sample library;
the portrait analysis module performs portrait analysis on the sample library to obtain the portrait features.
Preferably, the characteristic indexes include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavior information, business information and industry information.
The method and system for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families have the following advantages: based on an in-depth analysis of the many factors banks weigh when granting credit to small and micro enterprises and of the current low loan approval rate, the credit of small and micro enterprises run by enterprise families is evaluated accurately by combining evaluation of the business owner with evaluation of enterprise behavior. This supplements banks' credit criteria, opens multidimensional financing channels, and addresses the problem that financing for small and micro enterprises is difficult and expensive.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art could derive other drawings from them without inventive effort.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of the method for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families according to Example 1.
Detailed Description
The invention is further described below with reference to the drawings and specific examples so that those skilled in the art can better understand and implement it. The examples do not limit the invention, and the technical features of the embodiments and examples may be combined with one another where they do not conflict.
It should be appreciated that in the description of the embodiments of the invention, the words "first", "second" and the like are used only to distinguish descriptions and do not indicate or imply relative importance or order. "Plurality" in the embodiments of the present invention means two or more.
The embodiments of the invention provide a method and a system for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families, to solve the technical problem of how to evaluate the credit of such enterprises.
Example 1:
As shown in FIG. 1, the method for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families comprises the following steps:
S100, acquiring sample data, wherein the sample data comprise a plurality of characteristic indexes and a plurality of credit data; the characteristic indexes are credit evaluation attributes extracted from the portrait features of small and micro enterprises, and the credit data come from the operator's credit service and an Internet platform used for enterprise information lookup;
S200, dividing the sample data into a training set and an evaluation set;
S300, constructing an evaluation model by the random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model, influence weights are set, and the weighted sum of the credit scores of the two sub-models is taken as the evaluation output of the evaluation model;
S400, training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, with a regularization method used to constrain the ten-fold cross-validation and the parameter grid optimization;
S500, with the evaluation set as input, fine-tuning the parameters of the initial evaluation model by grid search to obtain the final evaluation model.
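Steps S400 and S500 rely on ten-fold cross-validation combined with a parameter grid. A minimal, library-free sketch of that loop follows. `toy_score` stands in for training a random forest on the training folds and scoring it on the held-out fold, and the grid values (`n_trees`, `max_depth`) are hypothetical. The source does not specify its regularization constraint, so this sketch leaves it out; a real scoring function could include it, for example as a penalty term.

```python
from itertools import product
from statistics import mean
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple

Params = Dict[str, Any]


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) for k contiguous folds; each index is validated once."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)   # spread the remainder
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def grid_search_cv(n: int, grid: Dict[str, Sequence[Any]],
                   fit_and_score: Callable[[Params, List[int], List[int]], float],
                   k: int = 10) -> Tuple[Params, float]:
    """Return the parameter combination with the best mean k-fold score."""
    names = list(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = mean([fit_and_score(params, tr, va) for tr, va in k_fold_indices(n, k)])
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: Params, train_idx: List[int], val_idx: List[int]) -> float:
    # Stand-in for "fit a random forest on train_idx, score it on val_idx".
    return -abs(params["n_trees"] - 200) - abs(params["max_depth"] - 8)


grid = {"n_trees": [100, 200, 300], "max_depth": [4, 8, 12]}
print(grid_search_cv(500, grid, toy_score))
```

In this reading, S500 repeats the same search on a narrower grid around the best parameters, scored on the evaluation set instead of the training folds.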
In this embodiment, the portrait features are as follows: the business owners are mainly aged 18 to 45, are the pillars of their families and the main force of China's labor force; they are energetic and confident and work more than 8 hours a day; the business has been registered with the industry and commerce authorities for 1 year or more; the enterprises are located in third-, fourth- and fifth-tier cities; they have fewer than 5 employees and are mostly family-run shops or run jointly with relatives and friends; operations are small in scale, with frequent transactions of small amounts, and Alipay/WeChat Pay is used heavily; the company name does not contain the keywords franchise store, office, retail department or branch; there are no restrictions on sales or financial indexes; and the enterprise has good market prospects.
Credit evaluation attributes are extracted from the portrait features to obtain a plurality of characteristic indexes that form a characteristic index set, including but not limited to identity information, location information, social information, consumption information, credit history, behavior information, business information and industry information.
After the sample data are obtained, they are divided into a training set, used to train the evaluation model, and an evaluation set, used to optimize the parameters of the trained evaluation model and obtain the final evaluation model.
A business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model are constructed by the random forest method, and together they form the evaluation model. The characteristic indexes evaluated by the business owner credit evaluation sub-model include identity information (e.g., age), location information (e.g., daytime location, home city and small-scale movement trajectory), behavior information (e.g., Alipay/WeChat payments and searches for business-related items), consumption information (e.g., income level, consumption level, telecom plan, star-rated customer level and whether there is a fixed payday) and social information (e.g., social circle stability, social influence and social circle credit). The characteristic indexes evaluated by the enterprise behavior credit evaluation sub-model include business information and industry information, such as business registration time, company name, financial indexes, operation scale, transaction type, transaction frequency, transaction amount and market prospects.
The evaluation output of the evaluation model is the weighted sum of the credit scores of the business owner credit evaluation sub-model and the enterprise behavior credit evaluation sub-model.
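Some of the portrait features above are quantitative and could be encoded as screening rules. The sketch below shows one way to do that under stated assumptions: it covers only owner age, registration time, staff count and name keywords, uses hypothetical English keyword strings in place of the original ones, and does not imply that the source applies the portrait as a hard filter.

```python
from dataclasses import dataclass

# Hypothetical English stand-ins for the excluded company-name keywords.
EXCLUDED_NAME_KEYWORDS = ("franchise store", "office", "retail department", "branch")


@dataclass
class Applicant:
    owner_age: int
    registration_years: float
    staff_count: int
    company_name: str


def matches_portrait(a: Applicant) -> bool:
    """Check the quantitative portrait rules: age 18-45, registered >= 1 year,
    fewer than 5 staff, and no excluded keyword in the company name."""
    name = a.company_name.lower()
    return (18 <= a.owner_age <= 45
            and a.registration_years >= 1
            and a.staff_count < 5
            and not any(keyword in name for keyword in EXCLUDED_NAME_KEYWORDS))


print(matches_portrait(Applicant(32, 3.0, 4, "Li Family Noodle Shop")))   # True
print(matches_portrait(Applicant(32, 3.0, 4, "Li Noodles Branch No. 2")))  # False
```

Plain substring matching is deliberately simple; it would also match keywords embedded in longer words, which a production rule set would need to handle.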
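How the index groups are divided between the two sub-models, and how their scores are combined, can be sketched as follows. The group names mirror the text; the credit history group is not assigned to either sub-model in the description, so this sketch simply ignores it. The default weight of 0.5 and the constraint that the two weights sum to 1 are assumptions, since the source only says that influence weights are set; the score values are hypothetical.

```python
from typing import Dict, Tuple

FeatureGroups = Dict[str, Dict[str, float]]

# Group-to-sub-model assignment as described in the text.
OWNER_GROUPS = {"identity", "location", "behavior", "consumption", "social"}
BEHAVIOR_GROUPS = {"business", "industry"}


def route_features(features: FeatureGroups) -> Tuple[FeatureGroups, FeatureGroups]:
    """Split grouped indexes between the two sub-models; other groups are ignored."""
    owner = {g: v for g, v in features.items() if g in OWNER_GROUPS}
    behavior = {g: v for g, v in features.items() if g in BEHAVIOR_GROUPS}
    return owner, behavior


def combined_credit(owner_score: float, behavior_score: float,
                    owner_weight: float = 0.5) -> float:
    """Weighted sum of the two sub-model credit scores (weights assumed to sum to 1)."""
    if not 0.0 <= owner_weight <= 1.0:
        raise ValueError("owner_weight must be in [0, 1]")
    return owner_weight * owner_score + (1.0 - owner_weight) * behavior_score


features = {"identity": {"age": 35.0},
            "business": {"registration_years": 3.0},
            "credit_history": {"overdue_count": 0.0}}
owner_feats, behavior_feats = route_features(features)
print(sorted(owner_feats), sorted(behavior_feats))  # ['identity'] ['business']
print(combined_credit(700.0, 600.0, owner_weight=0.6))
```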
As a first improvement of this embodiment, the sample data are preprocessed before being divided, through the following steps:
(1) Filling NULL values in the sample data with the mean value of all the sample data;
(2) Performing outlier processing on the sample data by deleting samples that deviate beyond a threshold, the threshold being determined from an index slope map;
(3) Normalizing the sample data with an index value standardization formula so that all values lie in the same interval, here [0, 1], the formula being:
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the index value before normalization, $x^{*}$ is the normalized index value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the current index values.
Because there are many characteristic indexes, dimension reduction is performed on them to remove the indexes that have no significant relation to credit, through the following steps:
(1) Grouping the preprocessed sample data with a Spark feature-label grouping method to separate the characteristic index data from the credit data;
(2) Calculating the importance of each characteristic index and eliminating indexes of low importance, the importance being calculated as:
$$\text{importance} = \frac{\sum \left(\text{errOOB2} - \text{errOOB1}\right)}{N_{\text{tree}}}$$
where errOOB1 is the error computed on a tree's out-of-bag (OOB) data in the random forest, errOOB2 is the error on the same out-of-bag data after noise interference is added, $N_{\text{tree}}$ is the number of trees in the random forest, and the sum runs over all trees;
(3) Calculating the chi-square value of each characteristic index with the chi-square test formula, and selecting the indexes significantly related to the credit data based on the confidence level, the chi-square value and the degrees of freedom, the chi-square test formula being:
$$\chi^{2}(X) = \sum \frac{(\text{observed} - \text{expected})^{2}}{\text{expected}}$$
where $X$ is the characteristic index data, observed is the observed frequency, and expected is the theoretical (expected) frequency.
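For the chi-square step, a characteristic index is typically discretized into bins and cross-tabulated against credit classes; the statistic is then χ² = Σ (observed - expected)² / expected, with each expected count computed from the row and column totals. The sketch below assumes that contingency-table setup and uses the standard critical value 3.841 for one degree of freedom at 95% confidence; the table counts and the binning are hypothetical. A table with an all-zero row or column would divide by zero and must be excluded first.

```python
from typing import Sequence


def chi_square(observed: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table.

    Rows: bins of one characteristic index X; columns: credit classes.
    expected[i][j] = row_total[i] * col_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


def degrees_of_freedom(observed: Sequence[Sequence[float]]) -> int:
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Hypothetical counts: rows = "uses mobile payment daily" yes/no,
# columns = good / bad credit.
table = [[30, 10],
         [10, 30]]
stat, dof = chi_square(table), degrees_of_freedom(table)
CRITICAL_95_DF1 = 3.841   # chi-square critical value, 95% confidence, 1 degree of freedom
print(stat, dof, stat > CRITICAL_95_DF1)  # 20.0 1 True
```

An index whose statistic exceeds the critical value for its degrees of freedom would be kept as significantly related to credit.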
Through the above steps, the sample data are preprocessed and the characteristic indexes are screened; the processed sample data are then used to train and optimize the constructed evaluation model.
As a further improvement on the first improvement, the method also comprises constructing a sample library, obtaining the portrait features of small and micro enterprises through portrait analysis of the sample library, and obtaining the characteristic index set by extracting credit evaluation attributes from the portrait features.
Wherein the construction of the sample library comprises the following steps:
(1) Acquiring operator data, Internet data and industry data through offline collection, real-time collection, web crawlers and introduction by partners, and loading them into a data sharing platform built on the principles of multiple data sources, loose coupling and high heterogeneity;
(2) Cleaning abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
(3) Fusing the cleaned data by data association so that the loaded data are integrated into one data table;
(4) Aggregating the fused data to the required data granularity to obtain the sample library.
In step (1), data collection programs are set up to collect operator, Internet and industry data, and data loading programs load the data from the different sources into the data sharing platform, where they are stored uniformly in a file system, a database or a similar form.
In step (2), abnormal data, for example records with a null user number, are cleaned out.
In step (3), because the data from the three sources are collected and loaded independently, they must be fused by data association: they are joined on the user number or name and integrated into one wide data table.
In step (4), operator data have a fine granularity of 5 minutes, whereas this embodiment requires monthly granularity, so the data are aggregated; for example, 5-minute records of payment app usage are aggregated into monthly counts of payment app usage.
After the sample library is constructed through the above steps, portrait analysis is performed on it to obtain the portrait features.
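The monthly aggregation in step (4) amounts to a group-by on user and calendar month. A minimal sketch, assuming each 5-minute record is a `(user_id, slot_start)` pair; this field layout and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def monthly_usage_counts(events: Iterable[Tuple[str, datetime]]) -> Dict[Tuple[str, str], int]:
    """Aggregate 5-minute payment-app usage records into monthly counts.

    Each event is (user_id, start of the 5-minute slot in which usage was seen);
    the result maps (user_id, "YYYY-MM") to the number of such slots.
    """
    counts: Counter = Counter()
    for user_id, slot_start in events:
        counts[(user_id, slot_start.strftime("%Y-%m"))] += 1
    return dict(counts)


events = [("u1", datetime(2019, 5, 1, 9, 0)),
          ("u1", datetime(2019, 5, 1, 9, 5)),
          ("u1", datetime(2019, 6, 2, 10, 0)),
          ("u2", datetime(2019, 5, 3, 8, 0))]
print(monthly_usage_counts(events))
# {('u1', '2019-05'): 2, ('u1', '2019-06'): 1, ('u2', '2019-05'): 1}
```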
Example 2:
This embodiment discloses a system for constructing a credit evaluation model for small and micro enterprises oriented to enterprise families, comprising a data acquisition module, a sample division module, a model construction module, a model training module and a model optimization module.
The data acquisition module acquires sample data, wherein the sample data comprise a plurality of characteristic indexes and a plurality of credit data; the characteristic indexes are credit evaluation attributes extracted from the portrait features of small and micro enterprises, and the credit data come from the operator's credit service and an Internet platform used for enterprise information lookup.
The sample division module divides the acquired sample data into a training set and an evaluation set.
The model construction module constructs an evaluation model by the random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model; the module sets influence weights and takes the weighted sum of the credit scores of the two sub-models as the evaluation output of the evaluation model.
The model optimization module fine-tunes the parameters of the initial evaluation model by grid search, with the evaluation set as input, to obtain the final evaluation model.
Wherein, the portrait characteristic includes: the operators mainly take 18-45 years old groups as home roof beams and columns, and are the main force of Chinese labor force; is full of vitality and confidence, and works for more than 8 hours per day; the registration time of the industry and commerce is 1 year or more; the distribution is carried out in three-four-five line cities; the number of staff is less than 5 people, and most of staff stores or relatives and friends are operated together; the operation scale is small, the transaction is frequent, the amount is not large, and the payment treasures/WeChat are used more; company name contains no keywords: the allied store, office, department of the market and branch are free from sales and financial index restrictions; the enterprise has good market prospect.
The plurality of characteristic indicators comprise a set of characteristic indicators including, but not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information, and industry information.
The model construction module constructs a business owner credit assessment sub-model and a business behavior credit assessment sub-model through a random forest method, wherein the two sub-models form an assessment model, characteristic indexes assessed by the business owner credit assessment sub-model comprise identity information (such as age), position information (such as daytime residence, home city and small position movement track), behavior information (such as payment treasures/WeChat payments and business object class searching), consumption information (such as income class, consumption class, telephone traffic package, star-class users and whether fixed wage release days exist), social information (such as social circle stability, social influence and social circle credit), and characteristic indexes assessed by the business behavior credit assessment sub-model comprise business information and industry information, such as business registration time, company name, financial index, operation scale, transaction type, transaction frequency, transaction amount, market prospect and the like.
The output of the assessment model is the weighted sum of the credit scores produced by the business owner credit assessment sub-model and the enterprise behavior credit assessment sub-model.
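The combination rule can be illustrated with a small function. This sketch makes one assumption that the text does not state: the influence weight w applies to the business owner sub-model and 1 - w applies to the enterprise behavior sub-model, so the two weights sum to one. The function name and the score scale are hypothetical.

```python
def combined_credit_score(owner_score: float, behavior_score: float,
                          owner_weight: float) -> float:
    """Weighted sum of the two sub-model credit scores.

    owner_weight is the (assumed) influence weight of the business owner
    sub-model; the enterprise behavior sub-model gets 1 - owner_weight.
    """
    if not 0.0 <= owner_weight <= 1.0:
        raise ValueError("owner_weight must lie in [0, 1]")
    return owner_weight * owner_score + (1.0 - owner_weight) * behavior_score


print(combined_credit_score(700.0, 600.0, 0.5))  # 650.0
```

Constraining the weights to sum to one keeps the combined score on the same scale as the sub-model scores. The patent itself only says the output is a weighted sum.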
The enterprise family-oriented small and micro enterprise credit assessment model construction system of this embodiment can support the assessment model construction method disclosed in Embodiment 1.
As a first improvement of the above embodiment, the construction system further includes a sample data processing module. This module comprises a data preprocessing module and a data dimension reduction module.
The data preprocessing module preprocesses the sample data in the following steps:
(1) Fill NULL values in the sample data with the mean of the corresponding indicator.
(2) Perform outlier processing on the sample data by deleting samples that deviate beyond a threshold. The threshold is determined from an indicator slope plot.
(3) Normalize the sample data with the following indicator value standardization formula, so that all values fall within the same interval:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the indicator value before normalization, $x'$ is the normalized value, $x_{\min}$ is the minimum value of the current indicator and $x_{\max}$ is the maximum value of the current indicator.
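Steps (1) to (3) can be sketched in plain Python as follows. The sketch rests on three assumptions:
- each sample is a row of numeric indicator values, with `None` marking NULL;
- the outlier thresholds are passed in directly as per-column upper bounds, because the text does not specify how they are read off the indicator slope plot;
- a constant column is mapped to 0 to avoid division by zero.

```python
from statistics import mean
from typing import List, Optional, Sequence


def fill_nulls_with_mean(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Step (1): replace each None with the mean of the known values in its column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        known = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(known) if known else 0.0)
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def drop_outliers(rows: List[List[float]], upper: Sequence[float]) -> List[List[float]]:
    """Step (2): delete samples in which any indicator exceeds its threshold.

    `upper` holds one hypothetical threshold per column, standing in for
    values read off the indicator slope plot.
    """
    return [r for r in rows if all(v <= t for v, t in zip(r, upper))]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Step (3): x' = (x - x_min) / (x_max - x_min), column by column."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


raw = [[2.0, 10.0], [None, 30.0], [4.0, None], [99.0, 20.0]]
clean = min_max_normalize(drop_outliers(fill_nulls_with_mean(raw), [50.0, 50.0]))
```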
The data dimension reduction module reduces the dimension of the characteristic indicators in the following steps:
Group the preprocessed sample data by Spark feature/label grouping, which separates the characteristic indicator data from the credit data.
Calculate the importance of each characteristic indicator and eliminate indicators of low importance. The importance is calculated as:
$$\mathrm{importance} = \frac{1}{N_{tree}} \sum_{t=1}^{N_{tree}} \left( \mathrm{errOOB2}_t - \mathrm{errOOB1}_t \right)$$
where $\mathrm{errOOB1}$ is the error of a random forest tree computed on its out-of-bag (OOB) data, $\mathrm{errOOB2}$ is the error on the same out-of-bag data after noise interference has been added to the indicator, and $N_{tree}$ is the number of trees in the random forest.
Calculate the chi-square value of each characteristic indicator with the chi-square test formula below. Then select the indicators that are significantly associated with the credit score data, based on the confidence level, the chi-square value and the degrees of freedom:
$$\chi^2(X) = \sum \frac{(\mathrm{observed} - \mathrm{expected})^2}{\mathrm{expected}}$$
where $X$ is the characteristic indicator data, observed is the observed frequency and expected is the theoretical (expected) frequency.
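The two screening criteria can be sketched as follows. The importance function follows the formula above, under two assumptions:
- the error is mean squared error (a classification model would use the misclassification rate instead);
- the "noise interference" is a random permutation of the indicator's values across the out-of-bag rows, as in Breiman's permutation importance.

Trees are modelled as plain callables, and the threshold passed to `keep_important` is hypothetical. The chi-square function computes Pearson's statistic from a contingency table of a discretized indicator against credit classes, with (r - 1)(c - 1) degrees of freedom.

```python
import random
from typing import Callable, List, Sequence, Tuple

Tree = Callable[[Sequence[float]], float]  # hypothetical: fitted tree, row -> prediction


def _mse(tree: Tree, rows: List[List[float]], targets: List[float]) -> float:
    return sum((tree(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def oob_importance(trees: List[Tree], oob_indices: List[List[int]],
                   X: List[List[float]], y: List[float],
                   feature: int, seed: int = 0) -> float:
    """importance = sum over trees of (errOOB2 - errOOB1), divided by Ntree."""
    rng = random.Random(seed)
    total = 0.0
    for tree, idx in zip(trees, oob_indices):
        rows = [list(X[i]) for i in idx]      # copies, so X is never modified
        targets = [y[i] for i in idx]
        err_oob1 = _mse(tree, rows, targets)
        column = [r[feature] for r in rows]   # noise: permute this feature only
        rng.shuffle(column)
        for r, v in zip(rows, column):
            r[feature] = v
        err_oob2 = _mse(tree, rows, targets)
        total += err_oob2 - err_oob1
    return total / len(trees)


def keep_important(importances: List[float], threshold: float) -> List[int]:
    """Indices of the indicators whose importance reaches the threshold."""
    return [j for j, imp in enumerate(importances) if imp >= threshold]


def chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    (rows: bins of a discretized indicator X; columns: credit classes)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


# Upper 5% point of the chi-square distribution with 1 degree of freedom
# (95% confidence); other degrees of freedom need their own critical values.
CRITICAL_95_DF1 = 3.841
```

For the table `[[10, 20], [20, 10]]`, the statistic is 20/3, about 6.67, with 1 degree of freedom. This exceeds 3.841, so at 95% confidence the indicator would be kept.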
The data acquisition module acquires the sample data and passes it to the data processing module. The data processing module preprocesses the data, screens the characteristic indicators and outputs the final sample data. The sample dividing module then splits the final sample data into a training set and an evaluation set, which are used to train and optimize the evaluation model.
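The sample dividing step can be sketched as a seeded shuffle followed by a cut. The 80/20 ratio and the fixed seed are assumptions, because the text does not state the proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_ratio: float = 0.8,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle the samples reproducibly, then cut them into a training set
    and an evaluation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train_set, eval_set = split_samples(list(range(100)))
```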
The construction system of this embodiment, with the first improvement, can support the assessment model construction method of Embodiment 1 with its first improvement.
As a further improvement of the above improved embodiment, the construction system further includes a characteristic indicator construction module. This module comprises a sample library construction module and a portrait analysis module.
The sample library construction module constructs the sample library in the following steps:
(1) Acquire operator data, internet data and industry data, and load them into a data sharing platform. The platform is built on the principles of multiple data sources, loose coupling and high heterogeneity.
(2) Remove abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data.
(3) Fuse the cleaned data using data association, integrating it into a single data table.
(4) Summarize the fused data at the chosen data granularity to obtain the sample library.
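Steps (2) to (4) can be sketched with plain dictionaries. Everything concrete in this sketch is hypothetical:
- the record fields (`eid`, `month`, `amount`, and the attribute fields);
- the cleaning rule;
- the use of an inner join as the "data association" method;
- (enterprise, month) as the summary granularity.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def clean(records: List[Record], rule: Callable[[Record], bool]) -> List[Record]:
    """Step (2): keep only the records that pass the cleaning rule."""
    return [r for r in records if rule(r)]


def fuse(facts: List[Record], lookups: List[List[Record]], key: str) -> List[Record]:
    """Step (3): attach the attributes of every lookup table to each fact record
    sharing `key` (an inner join), giving one wide data table."""
    indexed = [{r[key]: r for r in table} for table in lookups]
    fused = []
    for fact in facts:
        if all(fact[key] in idx for idx in indexed):
            row = dict(fact)
            for idx in indexed:
                row.update(idx[fact[key]])
            fused.append(row)
    return fused


def summarize(rows: List[Record], granularity: Tuple[str, ...],
              value: str) -> Dict[tuple, float]:
    """Step (4): sum `value` at the chosen granularity."""
    totals: Dict[tuple, float] = defaultdict(float)
    for r in rows:
        totals[tuple(r[f] for f in granularity)] += float(r[value])
    return dict(totals)


operator_records = [
    {"eid": 1, "month": "2019-06", "amount": 10.0},
    {"eid": 1, "month": "2019-06", "amount": 5.0},
    {"eid": 2, "month": "2019-06", "amount": -1.0},  # fails the cleaning rule
    {"eid": 3, "month": "2019-07", "amount": 7.0},   # no industry record
]
internet = [{"eid": 1, "uses_mobile_pay": True}, {"eid": 3, "uses_mobile_pay": False}]
industry = [{"eid": 1, "industry": "retail"}]
table = fuse(clean(operator_records, lambda r: r["amount"] >= 0), [internet, industry], "eid")
library = summarize(table, ("eid", "month"), "amount")
```

With these toy records, `library` holds a single entry: enterprise 1 in June 2019 with a total amount of 15.0.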
The portrait analysis module performs portrait analysis on the sample library to obtain the portrait characteristics.
The characteristic indicator construction module is connected to the data sharing platform and to the data acquisition module. It loads data from the data sharing platform and feeds the resulting characteristic indicators into the data acquisition module.
The construction system of this embodiment, with the second improvement, can support the assessment model construction method of Embodiment 1 with its second improvement.
The above embodiments are merely preferred embodiments given to fully explain the present invention, and the scope of protection of the invention is not limited to them. Equivalent substitutions and modifications made by those skilled in the art on the basis of the present invention fall within the scope of protection of the invention. The scope of protection of the invention is defined by the claims.

Claims (2)

1. A method for constructing an enterprise family-oriented small and micro enterprise credit assessment model, characterized by comprising the following steps:
acquiring sample data, wherein the sample data comprise a plurality of characteristic indicators and a plurality of credit data, the characteristic indicators are credit assessment attributes extracted from the portrait characteristics of small and micro enterprises, and the credit data come from operators, credit reporting and internet platforms;
dividing the sample data into a training set and an evaluation set;
constructing an evaluation model by the random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model with a set influence weight, and the weighted sum of the credit scores of the two sub-models is taken as the evaluation output of the evaluation model;
training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, both of which are constrained by a regularization method;
taking the evaluation set as input and fine-tuning the parameters of the initial evaluation model by grid search to obtain the final evaluation model;
preprocessing the sample data before dividing them into the training set and the evaluation set;
the preprocessing comprises the following steps:
filling NULL values in the sample data with the mean of the corresponding indicator;
performing outlier processing on the sample data by deleting samples that deviate beyond a threshold, the threshold being determined from an indicator slope plot;
normalizing the sample data with the following indicator value standardization formula so that all values fall within the same interval:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the indicator value before normalization, $x'$ is the normalized value, $x_{\min}$ is the minimum value of the current indicator and $x_{\max}$ is the maximum value of the current indicator;
after preprocessing the sample data, performing dimension reduction on the characteristic indicators;
the dimension reduction of the characteristic indicators comprises the following steps:
grouping the preprocessed sample data by Spark feature/label grouping to separate the characteristic indicator data from the credit data;
calculating the importance of each characteristic indicator and eliminating indicators of low importance, the importance being calculated as:
$$\mathrm{importance} = \frac{1}{N_{tree}} \sum_{t=1}^{N_{tree}} \left( \mathrm{errOOB2}_t - \mathrm{errOOB1}_t \right)$$
where $\mathrm{errOOB1}$ is the error of a random forest tree computed on its out-of-bag data, $\mathrm{errOOB2}$ is the error on the same out-of-bag data after noise interference has been added to the indicator, and $N_{tree}$ is the number of trees in the random forest;
calculating the chi-square value of each characteristic indicator with the following chi-square test formula, and selecting the indicators significantly associated with the credit score data based on the confidence level, the chi-square value and the degrees of freedom:
$$\chi^2(X) = \sum \frac{(\mathrm{observed} - \mathrm{expected})^2}{\mathrm{expected}}$$
where $X$ is the characteristic indicator data, observed is the observed frequency and expected is the theoretical (expected) frequency;
constructing a sample library before the sample data are acquired, and performing portrait analysis on the sample library to obtain the portrait characteristics of small and micro enterprises;
the construction of the sample library comprises the following steps:
acquiring operator data, internet data and industry data, and loading them into a data sharing platform built on the principles of multiple data sources, loose coupling and high heterogeneity;
removing abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
fusing the cleaned data by data association, integrating it into a single data table;
summarizing the fused data at the chosen data granularity to obtain the sample library;
acquiring the operator data, internet data and industry data through offline collection, real-time collection, web crawling and introduction by partners;
the plurality of characteristic indicators include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information, and industry information.
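The training step of claim 1 combines ten-fold cross-validation, a parameter grid and regularization. The sketch below shows only how these pieces fit together, and it swaps in a deliberately simple model: one-dimensional ridge regression without intercept. In this model the penalty `lam` is the regularization strength chosen from the grid. It is not the random forest of the claims; with a forest, the grid would range over forest parameters such as the number of trees. For brevity the folds are interleaved rather than shuffled.

```python
from typing import List, Sequence


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split indices 0..n-1 into k interleaved folds (ten-fold by default)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Closed-form 1-D ridge regression, no intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs: Sequence[float], ys: Sequence[float], lam: float, k: int = 10) -> float:
    """Mean squared error over k held-out folds for one regularization strength."""
    n = len(xs)
    sq_err = 0.0
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_err += sum((w * xs[i] - ys[i]) ** 2 for i in fold)
    return sq_err / n


def grid_search(xs: Sequence[float], ys: Sequence[float],
                grid: Sequence[float], k: int = 10) -> float:
    """Return the grid value with the lowest cross-validated error."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))


xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]
best_lam = grid_search(xs, ys, [0.0, 1.0, 10.0])
```

On this noise-free toy data the unpenalized fit is exact, so the search selects a penalty of 0. On real, noisy data a positive penalty can win.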
2. An enterprise family-oriented small and micro enterprise credit assessment model construction system, characterized by comprising:
a data acquisition module for acquiring sample data, wherein the sample data comprise a plurality of characteristic indicators and a plurality of credit data, the characteristic indicators are credit assessment attributes extracted from the portrait characteristics of small and micro enterprises, and the credit data come from operators, credit reporting and internet platforms;
a sample dividing module for dividing the collected sample data into a training set and an evaluation set;
a model construction module for constructing an evaluation model by the random forest method, wherein the evaluation model consists of a business owner credit evaluation sub-model and an enterprise behavior credit evaluation sub-model with a set influence weight, and the weighted sum of the credit scores of the two sub-models is taken as the evaluation output of the evaluation model;
a model training module for training the evaluation model with the training set as input through a ten-fold cross-validation algorithm and a parameter grid optimization algorithm, both of which are constrained by a regularization method;
a model optimization module for fine-tuning the parameters of the initial evaluation model by grid search, with the evaluation set as input, to obtain the final evaluation model;
the system further comprises a sample data processing module, which comprises a data preprocessing module and a data dimension reduction module;
the data preprocessing module preprocesses the sample data through the following steps:
filling NULL values in the sample data with the mean of the corresponding indicator;
performing outlier processing on the sample data by deleting samples that deviate beyond a threshold, the threshold being determined from an indicator slope plot;
normalizing the sample data with the following indicator value standardization formula so that all values fall within the same interval:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the indicator value before normalization, $x'$ is the normalized value, $x_{\min}$ is the minimum value of the current indicator and $x_{\max}$ is the maximum value of the current indicator;
the data dimension reduction module performs dimension reduction on the characteristic indicators through the following steps:
grouping the preprocessed sample data by Spark feature/label grouping to separate the characteristic indicator data from the credit data;
calculating the importance of each characteristic indicator and eliminating indicators of low importance, the importance being calculated as:
$$\mathrm{importance} = \frac{1}{N_{tree}} \sum_{t=1}^{N_{tree}} \left( \mathrm{errOOB2}_t - \mathrm{errOOB1}_t \right)$$
where $\mathrm{errOOB1}$ is the error of a random forest tree computed on its out-of-bag data, $\mathrm{errOOB2}$ is the error on the same out-of-bag data after noise interference has been added to the indicator, and $N_{tree}$ is the number of trees in the random forest;
calculating the chi-square value of each characteristic indicator with the following chi-square test formula, and selecting the indicators significantly associated with the credit score data based on the confidence level, the chi-square value and the degrees of freedom:
$$\chi^2(X) = \sum \frac{(\mathrm{observed} - \mathrm{expected})^2}{\mathrm{expected}}$$
where $X$ is the characteristic indicator data, observed is the observed frequency and expected is the theoretical (expected) frequency;
the system further comprises a characteristic indicator construction module, which comprises a sample library construction module and a portrait analysis module;
the sample library construction module constructs the sample library through the following steps:
acquiring operator data, internet data and industry data, and loading them into a data sharing platform built on the principles of multiple data sources, loose coupling and high heterogeneity;
removing abnormal data from the loaded data according to specified cleaning rules to obtain cleaned data;
fusing the cleaned data by data association, integrating it into a single data table;
summarizing the fused data at the chosen data granularity to obtain the sample library;
the portrait analysis module performs portrait analysis on the sample library to obtain the portrait characteristics;
the characteristic indicators include, but are not limited to, identity information, location information, social information, consumption information, credit history, behavioral information, business information and industry information.
CN201910700190.0A 2019-07-31 2019-07-31 Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model Active CN110400215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910700190.0A CN110400215B (en) 2019-07-31 2019-07-31 Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910700190.0A CN110400215B (en) 2019-07-31 2019-07-31 Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model

Publications (2)

Publication Number Publication Date
CN110400215A CN110400215A (en) 2019-11-01
CN110400215B true CN110400215B (en) 2023-11-03

Family

ID=68326933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910700190.0A Active CN110400215B (en) 2019-07-31 2019-07-31 Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model

Country Status (1)

Country Link
CN (1) CN110400215B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241746B (en) * 2020-01-09 2024-01-26 深圳前海微众银行股份有限公司 Forward model selection method, apparatus, and readable storage medium
CN111241745B (en) * 2020-01-09 2024-05-24 深圳前海微众银行股份有限公司 Gradual model selection method, equipment and readable storage medium
CN113537666B (en) * 2020-04-16 2024-05-03 马上消费金融股份有限公司 Evaluation model training method, evaluation and business auditing method, device and equipment
CN111768298A (en) * 2020-06-30 2020-10-13 中国建设银行股份有限公司 Transaction data quota determining method, device, equipment and medium
CN112017023A (en) * 2020-07-15 2020-12-01 北京淇瑀信息科技有限公司 Method and device for determining resource limit of small and micro enterprise and electronic equipment
CN112232724B (en) * 2020-12-17 2021-03-26 平安科技(深圳)有限公司 Quantitative evaluation method, system, equipment and storage medium for personnel ability
CN112633709A (en) * 2020-12-26 2021-04-09 中国农业银行股份有限公司 Enterprise credit investigation evaluation method and device
CN112861056A (en) * 2021-02-07 2021-05-28 杭州云搜网络技术有限公司 Enterprise website construction information display and release system and method
CN113269514A (en) * 2021-05-13 2021-08-17 企家有道网络技术(北京)有限公司 Enterprise health degree measuring method, device and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105719073A (en) * 2016-01-18 2016-06-29 苏州汇誉通数据科技有限公司 Enterprise credit evaluation system and method
CN107392456A (en) * 2017-07-14 2017-11-24 武汉理工大学 A kind of multi-angle rating business credit modeling method for merging internet information
CN108256993A (en) * 2017-12-29 2018-07-06 浪潮天元通信信息系统有限公司 A kind of credit score appraisal procedure and credit score Evaluation Platform
CN108550077A (en) * 2018-04-27 2018-09-18 信雅达系统工程股份有限公司 A kind of individual credit risk appraisal procedure and assessment system towards extensive non-equilibrium collage-credit data
CN109002839A (en) * 2018-06-22 2018-12-14 杭州电子科技大学 Efficient feature selection method under a kind of more attributive character environment
CN109409677A (en) * 2018-09-27 2019-03-01 深圳壹账通智能科技有限公司 Enterprise Credit Risk Evaluation method, apparatus, equipment and storage medium
CN109685526A (en) * 2018-12-12 2019-04-26 税友软件集团股份有限公司 A kind of method for evaluating credit rating of enterprise, device and relevant device
CN110046984A (en) * 2019-03-01 2019-07-23 安徽省优质采科技发展有限责任公司 Enterprise credit risk system and evaluation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
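Construction of a comprehensive credit rating model for small and micro enterprises; Liu Min; Journal of Southwest China Normal University (Natural Science Edition); 2017-09-20 (No. 09); full text *
Jiang Weixiang. "Big data machine learning system". In: Research on Computer Data Processing Technology in the Big Data Era. 2019. *
Huang Dongmei et al. "Machine learning algorithms and examples for big data". In: Case-Driven Big Data Principles, Technologies and Applications. 2018. *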

Also Published As

Publication number Publication date
CN110400215A (en) 2019-11-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant