CN112017025A

CN112017025A - Enterprise credit assessment method based on fusion of deep learning and logistic regression

Info

Publication number: CN112017025A
Application number: CN202010868081.2A
Authority: CN
Inventors: 尹盼盼; 边松华; 崔乐乐
Original assignee: Tianyuan Big Data Credit Management Co Ltd
Current assignee: Tianyuan Big Data Credit Management Co Ltd
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2020-12-01
Anticipated expiration: 2040-08-26

Abstract

The invention discloses an enterprise credit assessment method based on the fusion of deep learning and logistic regression, which relates to the technical field of financial credit and comprises the following steps: storing the government, internet and third-party data of each enterprise in a relational data sub-table, and merging the sub-tables into a general table stored in a standard data warehouse; screening the warehouse data and constructing an enterprise credit index system; manually labeling whether each enterprise has defaulted, and randomly dividing the corresponding data into a training sample and a prediction sample; performing exploratory data analysis and data cleaning based on the training sample and the index system, and determining preliminary model-entry indexes; constructing an enterprise credit evaluation model that fuses deep learning and logistic regression, training the model based on the training sample and the preliminary model-entry indexes, and outputting the final model-entry indexes and an optimal model; and using the optimal model to predict the default probability of an enterprise and converting the default probability into a credit score. The invention can improve the accuracy of enterprise credit scoring and provide an important basis for assessing the financial credit of enterprises.

Description

Enterprise credit assessment method based on fusion of deep learning and logistic regression

Technical Field

The invention relates to the technical field of financial credit, in particular to an enterprise credit evaluation method based on the fusion of deep learning and logistic regression.

Background

Enterprise credit scoring is one of the important links in enterprise credit risk management and control. It provides a reference for the probability of overdue payment based on existing data and expresses risk probability in the form of a score; generally, the higher the score, the safer the enterprise. Enterprise credit score modeling usually adopts machine learning methods such as logistic regression, decision trees and combined models. With the spread of artificial intelligence technology in the field of financial risk control, credit scoring models based on deep learning are also widely applied. The credit finance industry is characterized by small and dispersed loan amounts and a customer base that reaches further down-market, and it needs to keep improving the intelligence of every link, including lending, approval, customer service and post-loan management, so as to reduce user risk. Deep learning is used to mine the high-dimensional features of users in depth and analyze their potential risk, making credit approval services more efficient and faster.

Deep learning is derived from neural networks and recognizes specific patterns by simulating the human brain's ability to learn and process knowledge. Compared with traditional scoring methods, deep learning has strong parallel distributed processing capability and strong distributed storage and learning capability, can be used for both supervised tasks (classification and prediction) and unsupervised tasks (feature derivation), and can learn intricate hidden feature associations and patterns from a large number of data features. Enterprise credit scoring based on deep learning is one extended application of deep learning technology, and lays a foundation for later building various models in the field of enterprise risk control from large amounts of data and features.

Applications of deep neural networks are concentrated in fields such as image recognition, speech recognition and natural language processing; for example, Sun Min and Wang Lan proposed an efficient iris image quality evaluation method based on a deep neural network. Deep neural networks are also widely applied in the credit finance field. Image recognition and financial risk assessment differ mainly in the data preprocessing stage; once feature vectors have been extracted, the in-depth mining and analysis of features is the same, and the algorithms can be reused. On this basis, research and development personnel apply deep neural network technology to enterprise risk evaluation in the field of financial credit, mine and learn the model-entry features in depth, and comprehensively evaluate the credit risk of an enterprise.

Disclosure of Invention

In view of the needs and shortcomings of current technical development, the invention provides an enterprise credit evaluation method based on the fusion of deep learning and logistic regression.

To solve the above technical problems, the enterprise credit assessment method of the invention, based on the fusion of deep learning and logistic regression, adopts the following technical scheme:

An enterprise credit assessment method based on the fusion of deep learning and logistic regression comprises the following steps:

S1, acquiring government data, internet data and third-party data of a plurality of enterprises, and storing the government data, internet data and third-party data of the same enterprise in the same relational data sub-table;

S2, aggregating, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data general table, and storing the relational data general table in a standard data warehouse;

S3, screening the data contained in each relational data general table in the standard data warehouse, and constructing an enterprise credit index system with three levels of indexes;

S4, manually labeling each enterprise as a default user or a compliant (non-default) user based on the data contained in its relational data sub-table, and then marking the data of default users and compliant users accordingly in the relational data general table stored in the standard data warehouse;

S5, randomly dividing the default users and compliant users into a training sample and a prediction sample, wherein the training sample contains more users than the prediction sample; dividing the relational data general table into two relational data tables according to the result of the random division, and storing the two tables in the training sample and the prediction sample respectively;

S6, performing exploratory data analysis and data cleaning on the data contained in the relational data table of the training sample and on the third-level indexes of the enterprise credit index system, and determining preliminary model-entry indexes;

S7, constructing, based on a neural network, an enterprise credit evaluation model that fuses deep learning and logistic regression;

S8, training the enterprise credit evaluation model constructed in step S7 based on the training sample and the preliminary model-entry indexes determined in step S6, and outputting the final model-entry indexes and an optimal enterprise credit evaluation model;

S9, predicting the default probability of an enterprise with the optimal enterprise credit evaluation model obtained in step S8, converting the default probability into a standard credit score, performing a normality test on the overall distribution of enterprise credit scores, and determining the final enterprise credit score.

Optionally, in step S1, the government data of an enterprise includes industrial and commercial registration, housing provident fund, social security, Development and Reform Commission, banking and insurance regulatory, and administrative penalty information;

the internet data of the enterprise includes e-commerce data, marketing information, affirmation information, online store information, litigation information and news information;

the third-party data of the enterprise includes enterprise industrial and commercial information, personnel information and person-enterprise relationship data.

Further optionally, in step S2, aggregating, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data general table specifically includes:

S2.1, data aggregation stage: enterprise data is collected, including the government data, internet data and third-party data of each enterprise; the government data is obtained through interface integration and covers housing provident fund, social security, industrial and commercial registration, tax, food and drug administration, and banking and insurance regulatory data; the internet data covers enterprise background, e-commerce data, court judgment documents, bidding data and judicial data; and the third-party data is obtained through interface integration and covers enterprise industrial and commercial information, personnel information and person-enterprise relationship data;

S2.2, data alignment stage: a unified data standard specification is established, the government data, internet data and third-party data loaded into the warehouse are managed in a standardized manner, and the data from the three sources is governed with an ETL data governance tool;

S2.3, data fusion stage: the government data, internet data and third-party data of the plurality of enterprises are fused horizontally and vertically into at least one relational data general table, which is stored uniformly in the standard data warehouse; the standard data warehouse also stores three items derived from the fused data, namely standard library data, an index library and a feature library.
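The aggregation and fusion of steps S2.1 to S2.3 can be pictured with a minimal sketch. It assumes that each source delivers records keyed by a shared enterprise identifier; the field names (`enterprise_id`, `social_security_headcount`, and so on) are hypothetical and do not come from the text, and horizontal and vertical fusion are read here as merging columns per enterprise and stacking rows across enterprises.

```python
from collections import defaultdict

# Hypothetical per-source records; every field name here is illustrative only.
government = [
    {"enterprise_id": "E001", "social_security_headcount": 42},
    {"enterprise_id": "E002", "social_security_headcount": 7},
]
internet = [
    {"enterprise_id": "E001", "judgment_documents": 1},
]
third_party = [
    {"enterprise_id": "E001", "registered_capital": 5_000_000},
    {"enterprise_id": "E002", "registered_capital": 800_000},
]


def build_sub_tables(*sources):
    """Horizontal fusion: merge all records of one enterprise into one row."""
    sub_tables = defaultdict(dict)
    for source in sources:
        for record in source:
            sub_tables[record["enterprise_id"]].update(record)
    return dict(sub_tables)


def build_general_table(sub_tables):
    """Vertical fusion: stack the per-enterprise rows under one aligned schema."""
    columns = sorted({col for row in sub_tables.values() for col in row})
    # Fields a source did not supply stay None so later cleaning can count them.
    return [{col: row.get(col) for col in columns} for row in sub_tables.values()]


general_table = build_general_table(
    build_sub_tables(government, internet, third_party)
)
```

Keeping absent fields as explicit `None` values is a design choice of this sketch; it lets the later missing-value statistics see which indexes a given enterprise lacks.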

Optionally, in step S3, constructing the enterprise credit index system with three levels of indexes specifically includes:

S3.1, based on the business objective of enterprise credit evaluation, reviewing each table field of the relational data general table in the standard data warehouse to determine the original indexes;

S3.2, deriving third-level index content from the original indexes;

S3.3, abstracting and summarizing the third-level indexes to form second-level index content;

S3.4, combining the third-level and second-level index content, analyzing the evaluation dimensions of enterprise credit that the indexes reflect, and determining the first-level index content;

S3.5, constructing an enterprise credit index system covering three levels of indexes based on the third-level, second-level and first-level index content.

Preferably, the numbers of third-level, second-level and first-level indexes decrease in that order, wherein,

the third-level index content comprises the specific enterprise credit indexes extracted from the relational data general table;

the second-level index content comprises enterprise credit indexes obtained by classifying and organizing the third-level indexes with business knowledge;

the first-level index content comprises the indexes finally determined for evaluating enterprise credit risk, namely 7 indexes of repayment, industry, operation, performance, region, cash flow and operation, which are displayed on a radar chart of the enterprise portrait to evaluate the credit risk of the enterprise in each dimension.

Optionally, in step S6, exploratory data analysis is performed on the data contained in the relational data table of the training sample and on the third-level indexes of the enterprise credit index system, specifically as follows:

S6.1.1, computing descriptive statistics for the data contained in the relational data table of the training sample and for the third-level indexes of the enterprise credit index system;

S6.1.2, analyzing the descriptive statistics of step S6.1.1, treating indexes that contain time information as specific indexes, and segmenting the data of each specific index so as to analyze in more depth how the data changes over time and what values it takes under particular conditions;

S6.1.3, plotting a histogram of each single variable and a curve of each single variable against the target variable, so as to analyze the third-level indexes visually.

Further optionally, in step S6, data cleaning is performed on the data contained in the relational data table of the training sample and on the third-level indexes of the enterprise credit index system, specifically as follows:

S6.2.1, handling invalid values of the third-level indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample;

S6.2.2, converting the quantifiable third-level indexes of the enterprise credit index system into numerical values based on the data contained in the relational data table of the training sample;

S6.2.3, computing missing-value statistics for the third-level indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample, and removing the third-level indexes whose missing rate exceeds 60%;

S6.2.4, computing, based on the data contained in the relational data table of the training sample, the same-value rate of the third-level indexes remaining after step S6.2.3, removing features that take only one value, and removing the third-level indexes whose same-value rate exceeds 60%;

S6.2.5, for the third-level indexes remaining after step S6.2.4, removing the unreasonable indexes identified during exploratory data analysis, and then performing VIF (variance inflation factor) collinearity analysis;

S6.2.6, based on the missing-value statistics of step S6.2.3, calculating the proportion of missing data for the samples contained in the relational data table of the training sample, and removing the samples whose missing proportion exceeds 50%;

S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, detecting abnormal values of the third-level indexes remaining after step S6.2.5 with the interquartile range method of the box plot, screening the abnormal values of some indexes according to the quartile criterion, treating the screened abnormal values as missing values and filling them with the specific value "-999";

S6.2.8, using a random forest method: taking the indexes that have no missing values in the relational data table of the training sample after step S6.2.7 as feature variables, taking each index that has missing values as a target variable, training a random forest model on the samples in which the target variable is not missing, and using the trained random forest model to predict the missing values, thereby filling the missing indexes in all training samples;

S6.2.9, performing Z-Score standardization on the training samples after missing-value filling to form standardized training sample vectors containing the preliminary model-entry indexes, which are used to train the enterprise credit evaluation model.
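The column filters of steps S6.2.3 and S6.2.4 and the standardization of step S6.2.9 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes missing values are represented as `None`, that the same-value rate is the share of the most frequent value among the non-missing values, and that the index names are hypothetical.

```python
import statistics

MISSING = None  # marker used for a missing value (an assumption of this sketch)


def missing_rate(values):
    """Share of entries in a column that are missing."""
    return sum(v is MISSING for v in values) / len(values)


def same_value_rate(values):
    """Share of the most frequent value among the non-missing entries."""
    present = [v for v in values if v is not MISSING]
    if not present:
        return 1.0
    most_common = max(set(present), key=present.count)
    return present.count(most_common) / len(present)


def filter_indexes(table, max_missing=0.6, max_same=0.6):
    """Drop index columns per S6.2.3 and S6.2.4; `table` maps name -> column."""
    return {
        name: col
        for name, col in table.items()
        if missing_rate(col) <= max_missing and same_value_rate(col) <= max_same
    }


def z_score(values):
    """Z-Score standardization of a fully populated, non-constant column."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical index columns for five enterprises.
table = {
    "paid_in_capital": [100.0, 250.0, 400.0, 150.0, 300.0],
    "mostly_missing": [None, None, None, None, 1.0],  # 80% missing
    "mostly_constant": [0, 0, 0, 0, 1],               # 80% identical
}
kept = filter_indexes(table)
standardized = z_score(kept["paid_in_capital"])
```

Only `paid_in_capital` survives both filters in this toy table. Constant columns are removed before standardization, so the division by the standard deviation in `z_score` is safe for the columns that remain.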

Further optionally, in step S7, an enterprise credit evaluation model that fuses deep learning and logistic regression is constructed based on a neural network, and the process includes three stages: determining the neural network, determining the activation functions of the neural network, and determining the weight search strategy of the neural network;

S7.1, determining the neural network:

the neural network is a multilayer fully connected neural network comprising an input layer, a hidden layer and an output layer, wherein the number of input layer nodes equals the number of preliminary model-entry indexes that are input, the number of output layer nodes corresponds to the number of sample categories contained in the training sample, the number of hidden layer nodes of the neural network is equal to the number of output layer nodes of the neural network, and the number of hidden layer nodes is a multiple of the product of the number of input layer nodes and the number of output layer nodes;

S7.2, determining the activation functions of the neural network:

the hidden layer output of the neural network is activated with the ReLU function, the output layer is processed with the softmax activation function, and the output layer is fused with the logistic regression method, thereby constructing the enterprise credit evaluation model that fuses deep learning and logistic regression;

S7.3, determining the weight search strategy of the neural network:

based on the enterprise credit evaluation model constructed in step S7.2, determining the weight search strategy includes determining four items, namely the loss function, the optimizer, the learning rate and the number of iterations, wherein,

a) the categorical cross-entropy function is determined as the loss function of the enterprise credit evaluation model,

b) the optimizer of the enterprise credit evaluation model is determined to be tf.keras.optimizers.Adam, which searches for the optimal weight values according to changes in the loss function,

c) the learning rate is determined to be 0.001,

d) the number of iterations of the enterprise credit evaluation model is determined to be 10000.
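The text does not spell out how the output layer is fused with logistic regression. One plausible reading, offered here only as an illustration, is that a softmax output over two classes is mathematically a logistic (sigmoid) function of the difference between the two logits, so the last layer acts as a logistic regression on the hidden-layer features. The sketch below checks that identity numerically and evaluates the categorical cross-entropy loss for one sample; the class ordering is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def categorical_cross_entropy(one_hot, probs):
    """Loss for one sample: -sum(y_k * log(p_k)) over the classes."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y)


# Two output nodes: index 0 = compliant, index 1 = default (ordering assumed).
logits = [0.3, 1.7]
p_compliant, p_default = softmax(logits)

# With two classes, softmax equals a sigmoid of the logit difference.
assert abs(p_default - sigmoid(logits[1] - logits[0])) < 1e-12

# Loss for a sample whose true label is "default".
loss = categorical_cross_entropy([0, 1], [p_compliant, p_default])
```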

Further optionally, in step S8, the enterprise credit evaluation model constructed in step S7 is trained, and the final model-entry indexes and the optimal enterprise credit evaluation model are output, the process specifically including:

S8.1, training the enterprise credit evaluation model: the open-source packages TensorFlow and Keras are used to train the enterprise credit evaluation model constructed in step S7, with Python as the development and training language; the training sample and the prediction sample are used to carry out 10000 iterations of training; a learning curve of the model is plotted during training, the loss function, the accuracy on the training sample and the accuracy on the prediction sample are observed, and it is finally judged whether the model has converged and whether it is over-fitted;

S8.2, evaluating the importance of the preliminary model-entry indexes: (1) performing Z-Score standardization on the training samples after missing-value filling to form standardized training sample vectors containing the preliminary model-entry indexes; (2) randomly generating a column of perturbation variables to replace, one at a time, each index column of the preliminary model-entry indexes in the training sample vectors, thereby generating new training sample vectors, inputting the new training sample vectors into the determined neural network to obtain predicted values, and calculating the loss function from the input vectors and the output predicted values; (3) for each model-entry index, generating perturbation variables 100 times and repeating step (2), calculating the average of the loss function over the 100 newly generated training sample vectors, and evaluating the importance of each preliminary model-entry index accordingly;

S8.3, iteratively tuning the enterprise credit evaluation model: ranking the preliminary model-entry indexes by the obtained average loss, selecting different thresholds in turn to screen the preliminary model-entry indexes, inputting the preliminary model-entry indexes of the prediction sample into the enterprise credit evaluation model for training, comparing the manual labels of the prediction sample with the predictions of the model, determining the final model-entry indexes, and generating the optimal enterprise credit evaluation model.
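The perturbation-based importance evaluation of step S8.2 can be sketched as below. Several things are assumptions of this illustration rather than statements from the text: the perturbation is drawn from a standard normal distribution (plausible because the inputs are Z-Score standardized), the loss is binary log loss, and `toy_predict` is a hypothetical stand-in for the trained network that deliberately ignores the second column.

```python
import math
import random


def log_loss(y_true, p_pred):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    eps = 1e-12
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for y, p in zip(y_true, p_pred)
    ) / len(y_true)


def perturbation_importance(predict, rows, labels, n_repeats=100, seed=0):
    """Mean loss after replacing each column with random noise n_repeats times.

    A larger mean loss means the model relied more on that index.
    """
    rng = random.Random(seed)
    importance = []
    for col in range(len(rows[0])):
        losses = []
        for _ in range(n_repeats):
            noisy = [list(r) for r in rows]
            for r in noisy:
                r[col] = rng.gauss(0.0, 1.0)  # assumed perturbation distribution
            losses.append(log_loss(labels, [predict(r) for r in noisy]))
        importance.append(sum(losses) / n_repeats)
    return importance


def toy_predict(row):
    """Stand-in for the trained network: uses column 0, ignores column 1."""
    return 1.0 / (1.0 + math.exp(-3.0 * row[0]))


data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
labels = [1 if r[0] > 0 else 0 for r in rows]
scores = perturbation_importance(toy_predict, rows, labels)
```

In this toy setup the mean loss for column 0 rises well above the unperturbed loss, while the mean loss for column 1 stays at the unperturbed level; ranking indexes by this mean loss gives the ordering that step S8.3 then thresholds.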

Optionally, in step S9, the enterprise default probability predicted by the optimal enterprise credit evaluation model is converted into a standard credit score by either of the following two methods:

S9a, based on the WOE (weight of evidence) conversion method, calculating feature scores from the WOE values of the features and the coefficients obtained from the optimal enterprise credit evaluation model;

S9b, taking the enterprise default probability predicted by the optimal enterprise credit evaluation model and converting it into a standard score.
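The text does not give the conversion formula for method S9b. A common scorecard convention, shown here purely as an assumed example, maps the log-odds of default linearly to a score, with a base score at chosen base odds and a fixed number of points that halve or double the odds (PDO). The values 600, 1:50 and 20 below are illustrative and are not taken from the text.

```python
import math


def probability_to_score(p_default, base_score=600.0, base_odds=1 / 50, pdo=20.0):
    """Map a default probability to a score on a log-odds scale.

    base_score, base_odds (default : non-default) and pdo are illustrative.
    """
    factor = pdo / math.log(2)
    offset = base_score + factor * math.log(base_odds)
    odds = p_default / (1.0 - p_default)
    return offset - factor * math.log(odds)


# Default odds of 1:50 map to the base score; doubling the odds costs `pdo` points.
score_at_base = probability_to_score(1 / 51)
score_at_double = probability_to_score(2 / 52)
```

Under these assumed parameters a higher default probability always yields a lower score, which matches the statement in the background that a higher score indicates a safer enterprise.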

Compared with the prior art, the enterprise credit assessment method based on the fusion of deep learning and logistic regression has the following beneficial effects:

1) the method is based on the fusion of multi-source enterprise data: the multi-source data undergoes aggregation, alignment, fusion and other operations, and a multi-dimensional enterprise credit assessment system is built on the fused data, so that the assessment dimensions are richer and the assessment indexes more comprehensive, overcoming the defect that a single data source covers too few credit assessment dimensions;

2) the deep-learning-based neural network can mine the important features among multi-dimensional features in depth and can automatically extract and derive features to obtain more important information, overcoming the defect that traditional importance evaluation and feature extraction for high-dimensional data are complex and not automated; it is also suitable for parallel processing of large-scale training samples, has higher operating efficiency, and broadens the implementation paths and scenarios of enterprise credit evaluation based on high-dimensional features and massive training samples;

3) the enterprise credit evaluation method can score the credit of enterprises and assist enterprises in obtaining financial credit, is particularly suitable for risk control prediction over massive numbers of enterprises with big data, and has broad application prospects.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

In order to make the technical scheme, the technical problems to be solved and the technical effects of the invention clearer, the technical scheme of the invention is described clearly and completely below with reference to specific embodiments.

The first embodiment is as follows:

With reference to FIG. 1, this embodiment provides an enterprise credit evaluation method based on the fusion of deep learning and logistic regression, which includes the following steps:

S1, acquiring government data, internet data and third-party data of a plurality of enterprises, and storing the government data, internet data and third-party data of the same enterprise in the same relational data sub-table.

In this step, the government data of an enterprise includes industrial and commercial registration, housing provident fund, social security, Development and Reform Commission, banking and insurance regulatory, and administrative penalty information; the internet data of the enterprise includes e-commerce data, marketing information, affirmation information, online store information, litigation information and news information; and the third-party data of the enterprise includes enterprise industrial and commercial information, personnel information and person-enterprise relationship data.

S2, aggregating, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data general table, and storing the relational data general table in a standard data warehouse, the specific operations comprising the following steps:

S3, screening the data contained in each relational data general table in the standard data warehouse, and constructing an enterprise credit index system with three levels of indexes, the specific operations comprising the following steps:

S3.1, based on the business objective of enterprise credit evaluation, reviewing each table field of the relational data general table in the standard data warehouse to determine the original indexes;

S3.2, deriving third-level index content from the original indexes;

S3.3, abstracting and summarizing the third-level indexes to form second-level index content;

S3.4, combining the third-level and second-level index content, analyzing the evaluation dimensions of enterprise credit that the indexes reflect, and determining the first-level index content;

From steps S3.1-S3.5, it can be seen that the numbers of third-level, second-level and first-level indexes decrease in that order, wherein,

S4, manually labeling each enterprise as a default user or a compliant (non-default) user based on the data contained in its relational data sub-table, and then marking the data of default users and compliant users accordingly in the relational data general table stored in the standard data warehouse.

S5, randomly dividing the default users and compliant users into a training sample and a prediction sample, wherein the training sample contains more users than the prediction sample; then dividing the relational data general table into two relational data tables according to the result of the random division, and storing the two tables in the training sample and the prediction sample respectively.

S6, performing exploratory data analysis and data cleaning on the data contained in the relational data table of the training sample and on the third-level indexes of the enterprise credit index system, and determining preliminary model-entry indexes.

In this step, the specific operations for performing exploratory data analysis are as follows:

S6.1.1, computing descriptive statistics, such as the variance, mean, median and data distribution of each index, for the data contained in the relational data table of the training sample and for the third-level indexes of the enterprise credit index system;
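The descriptive statistics named in step S6.1.1 can be computed per index column with the Python standard library, as in the minimal sketch below. Treating `None` as the missing-value marker and using the population variance are assumptions of this illustration.

```python
import statistics


def describe(values):
    """Summary statistics for one index column, ignoring missing (None) entries."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "variance": statistics.pvariance(present),
        "min": min(present),
        "max": max(present),
    }


# Hypothetical index column with one missing entry.
summary = describe([3, 5, None, 7, 9])
```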

In this step, the specific operations for data cleaning are as follows:

S6.2.1, handling invalid values of the third-level indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample,

S6.2.2, converting the quantifiable third-level indexes of the enterprise credit index system into numerical values based on the data contained in the relational data table of the training sample,

S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, detecting abnormal values of the third-level indexes remaining after step S6.2.5 with the interquartile range method of the box plot, screening the abnormal values of some indexes according to the quartile criterion, treating the screened abnormal values as missing values and filling them with the specific value "-999";
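The box-plot interquartile range screening of step S6.2.7 can be sketched as follows. The whisker multiplier of 1.5 is the usual box-plot convention and is an assumption here, since the text only refers to the quartile criterion; the quartiles come from `statistics.quantiles` with its default method.

```python
import statistics

FILL_VALUE = -999  # the specific fill value named in step S6.2.7


def flag_outliers_iqr(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with FILL_VALUE.

    k = 1.5 is the common box-plot convention, assumed for this sketch.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v if low <= v <= high else FILL_VALUE for v in values]


# Hypothetical index column with one extreme value.
cleaned = flag_outliers_iqr([10, 12, 11, 13, 12, 11, 10, 500])
```

Marking outliers with a sentinel rather than deleting the sample keeps the row available for the later random forest imputation, which then treats the sentinel positions as missing values.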

S7, constructing, based on a neural network, an enterprise credit evaluation model that fuses deep learning and logistic regression, wherein the process comprises three stages: determining the neural network, determining the activation functions of the neural network, and determining the weight search strategy of the neural network;

S7.1, determining the neural network:

S7.2, determining the activation functions of the neural network:

S7.3, determining the weight search strategy of the neural network:

c) the learning rate is determined to be 0.001,

d) the number of iterations of the enterprise credit evaluation model is determined to be 10000. The number of iterations determines when the learning process ends during training of the neural network model.
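The settings named for step S7 (ReLU hidden layer, softmax output, categorical cross-entropy loss, Adam optimizer, learning rate 0.001) can be collected into a short Keras sketch. This is one possible arrangement, not the patented implementation: a single hidden layer, the default multiple of 1 and the mapping of "iterations" to epochs are all assumptions, and `x_train`, `y_train`, `x_test` and `y_test` are hypothetical arrays that the caller would supply.

```python
def hidden_units(n_inputs, n_outputs, multiple=1):
    """Hidden-layer width as a multiple of inputs x outputs, following S7.1."""
    return multiple * n_inputs * n_outputs


def build_model(n_inputs, n_outputs=2, multiple=1):
    """Build and compile the network; requires TensorFlow to be installed."""
    import tensorflow as tf  # imported lazily so hidden_units works without it

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(
            hidden_units(n_inputs, n_outputs, multiple), activation="relu"
        ),
        tf.keras.layers.Dense(n_outputs, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Example call with one-hot labels (arrays are hypothetical):
# model = build_model(n_inputs=17)
# history = model.fit(x_train, y_train, epochs=10000,
#                     validation_data=(x_test, y_test))
```

The `history` object returned by `fit` holds the per-epoch loss and accuracy for both samples, which is the information needed to draw a learning curve and check for convergence and over-fitting.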

S8, training the enterprise credit evaluation model constructed in step S7 based on the training sample and the preliminary model-entry indexes determined in step S6, and outputting the final model-entry indexes and the optimal enterprise credit evaluation model, wherein the process specifically comprises the following steps:

S9, predicting the default probability of an enterprise with the optimal enterprise credit evaluation model obtained in step S8, converting the default probability into a standard credit score, performing a normality test on the overall distribution of enterprise credit scores, and determining the final enterprise credit score. There are two methods for converting the default probability into the standard credit score:

The enterprise credit evaluation method of the embodiment specifically executes the following processes:

(I) Government data, internet data and third-party data of 200,000 enterprises are selected, and the government data, internet data and third-party data of the same enterprise are stored in the same relational data sub-table, giving 200,000 relational data sub-tables.

(II) The data contained in the 200,000 relational data sub-tables is aggregated, aligned and fused into one relational data general table, which is stored in the standard data warehouse.

(III) The data contained in the relational data general table is screened manually, and an enterprise credit index system with three levels of indexes is constructed, wherein: there are 1042 third-level indexes, such as the number of customs enterprise ratings obtained in the last year, paid-in capital of the enterprise, duration of existence, personnel scale, blacklist status, number of "contract-abiding and trustworthy" ratings in the last year, number of times listed as a person subject to enforcement in the last year, amount subject to enforcement in the last year, number of important industrial and commercial changes in the last year, number of external guarantees in the last year, number of ordinary industrial and commercial changes in the last three years, number of court announcements in the last 6 months, changes of business scope in the last 6 months, cumulative historical number of branches, number of overhead branch structures, number of branches in operation, and cumulative number of related-party transactions; there are 17 second-level indexes, obtained by summarizing the third-level indexes, such as risk, legal representative, association relationship, management layer, industry, legality, stability, collateral, management and region; and the first-level indexes are the indexes finally determined for evaluating enterprise credit risk, 7 in total, namely repayment, industry, operation, performance, region, cash flow and operation.

(IV) Based on the data contained in the 200,000 relational data sub-tables, manually mark each enterprise as a defaulting user or a non-defaulting user, wherein there are 50,000 defaulting users and 150,000 non-defaulting users; subsequently, in the relational data total table stored in the standard data warehouse, the relevant data of the 50,000 defaulting users and the 150,000 non-defaulting users are marked respectively.

(V) Randomly divide the 50,000 defaulting users and the 150,000 non-defaulting users into training samples and prediction samples at a ratio of 7:3, wherein 32,000 defaulting users and 108,000 non-defaulting users are assigned to the training samples, and the remaining 18,000 defaulting users and 42,000 non-defaulting users are assigned to the prediction samples. Then, according to the random division result, the corresponding data are located in the relational data total table, which is divided into relational data table I and relational data table II; relational data table I is stored with the training samples and relational data table II with the prediction samples. It should be noted that all relevant data of any given enterprise in the relational data total table are assigned entirely to either the training samples or the prediction samples.
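The random 7:3 division by enterprise can be sketched as follows. This is a minimal illustration, not the patented implementation: the field name `enterprise_id`, the helper names and the toy rows are assumptions introduced here. The split is made over enterprise identifiers rather than over rows, so that all rows of one enterprise end up on the same side.

```python
import random

def split_enterprises(enterprise_ids, train_ratio=0.7, seed=0):
    """Randomly split enterprise IDs into training and prediction sets."""
    ids = sorted(set(enterprise_ids))      # one entry per enterprise
    rng = random.Random(seed)              # fixed seed for reproducibility
    rng.shuffle(ids)
    cut = round(len(ids) * train_ratio)
    return set(ids[:cut]), set(ids[cut:])

def split_rows(rows, train_ids):
    """Route every row of the total table by its enterprise ID."""
    table_one = [r for r in rows if r["enterprise_id"] in train_ids]
    table_two = [r for r in rows if r["enterprise_id"] not in train_ids]
    return table_one, table_two

# Toy total table: 10 hypothetical enterprises with 3 rows each.
rows = [{"enterprise_id": i % 10, "value": i} for i in range(30)]
train_ids, predict_ids = split_enterprises(r["enterprise_id"] for r in rows)
table_one, table_two = split_rows(rows, train_ids)
```

Here `table_one` plays the role of relational data table I and `table_two` that of relational data table II.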

(VI) Perform exploratory data analysis on the data contained in relational data table I of the training sample and on the three-layer indexes of the enterprise credit index system, to identify 11 unreasonable indexes for removal;

Perform data cleaning on the data contained in relational data table I of the training sample and on the three-layer indexes of the enterprise credit index system: after invalid value processing, numerical quantification and missing value processing, 546 indexes remain; after same-value-rate screening, 40 indexes remain; after removing the 11 unreasonable indexes identified during exploratory data analysis and then performing VIF collinearity analysis to remove correlated features, 17 model-entry indexes remain. After screening and filtering of the training samples and abnormal value detection, 10 of the 17 model-entry indexes have missing values and 7 have none. The 7 indexes without missing values are used as feature variables, and each of the 10 indexes with missing values is taken in turn as the target variable. For the training samples remaining after screening and filtering, the samples in which the feature variables and the target variable are not missing are used to train a RandomForest model; the trained RandomForest model then predicts the missing values of that index, completing the filling of the missing indexes in all training samples. The 17 model-entry indexes with missing values filled serve as the preliminary model-entry indexes.
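The missing-value and same-value-rate screening in this step can be illustrated with a short sketch. The 60% thresholds are those stated in the claims; everything else is an assumption made for illustration: the column names, the toy values, the use of `None` for a missing value, and the choice to compute the same-value rate over non-missing values only. The VIF analysis and RandomForest filling are not shown.

```python
from collections import Counter

def missing_rate(values):
    """Share of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def same_value_rate(values):
    """Share of non-missing entries taken up by the most frequent value."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)

def screen_indexes(columns, max_missing=0.6, max_same=0.6):
    """Keep indexes whose missing rate and same-value rate are within limits."""
    kept = []
    for name, values in columns.items():
        if missing_rate(values) > max_missing:
            continue                      # too many missing values
        if same_value_rate(values) > max_same:
            continue                      # nearly constant, little information
        kept.append(name)
    return kept

# Hypothetical index columns for five enterprises.
columns = {
    "paid_in_capital": [100, 250, 80, 40, 500],
    "blacklisted": [0, 0, 0, 0, 1],                      # same-value rate 0.8
    "court_announcements": [None, None, None, None, 3],  # missing rate 0.8
    "branches_in_operation": [1, 2, None, 2, 5],
}
kept = screen_indexes(columns)
```

In this toy input, `blacklisted` is dropped for its same-value rate and `court_announcements` for its missing rate.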

(VII) Based on the number of preliminary model-entry indexes, the number of input-layer nodes of the neural network is determined to be 17; because the training samples comprise positive samples and negative samples, the number of output-layer nodes and the number of hidden layers of the neural network are each determined to be 2; and based on the learning speed, the number of preliminary model-entry indexes and the number of hidden layers of the neural network, the number of hidden-layer nodes is determined to be 17 x 2 x 100 x 5;

common neural network activation functions include Sigmoid, Tanh, Softplus, ReLU (Rectified Linear Unit) and the like; in this embodiment, the hidden-layer outputs of the neural network are activated by the ReLU function, the output layer of the neural network is processed by the softmax activation function, and the output layer is fused with the logistic regression method, thereby constructing an enterprise credit evaluation model in which deep learning and logistic regression are fused;

the weight search strategy of the neural network is determined in four aspects, namely the loss function, the optimizer, the learning rate and the number of iterations, wherein,

c) it is determined that the learning rate is 0.001,
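The network shape described in step (VII) can be sketched as a plain forward pass: 17 inputs, two ReLU hidden layers and a two-node softmax output. This is an illustrative sketch only. The hidden-layer widths, the weight initialization and the choice of output index 1 as the "default" class are assumptions, and training (loss function, optimizer, number of iterations) is not shown. The last lines check a known identity, namely that a two-class softmax equals a logistic (sigmoid) function of the difference of the two logits, which is one way to read the fusion of the output layer with logistic regression.

```python
import math
import random

N_INPUT, N_OUTPUT = 17, 2      # from the embodiment
HIDDEN_SIZES = [8, 8]          # hypothetical widths of the two hidden layers
LEARNING_RATE = 0.001          # from item c); listed only, training not shown

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)                                  # subtract max for stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """ReLU on hidden layers, softmax on the output layer."""
    h = x
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    logits = dense(h, layers[-1])
    return logits, softmax(logits)

rng = random.Random(0)
sizes = [N_INPUT] + HIDDEN_SIZES + [N_OUTPUT]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
x = [rng.random() for _ in range(N_INPUT)]      # one hypothetical enterprise
logits, probs = forward(x, layers)

# Two-class softmax equals a sigmoid of the logit difference.
p_default_sigmoid = 1.0 / (1.0 + math.exp(-(logits[1] - logits[0])))
```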

(VIII) Train the enterprise credit evaluation model using the training samples and the prediction samples, and judge whether the enterprise credit evaluation model has converged and whether it is over-fitted. Based on the preliminary model-entry indexes with missing values filled, each column of index vectors in the training sample vectors is replaced in turn by a randomly generated column of disturbance variables, and the resulting new training sample vectors are input into the determined neural network to obtain predicted values; a loss function is calculated from the input vectors and the output predicted values. For each model-entry index, disturbance variables are generated 100 times, and the average of the loss functions obtained over the 100 newly generated training sample vectors is calculated to evaluate the importance of each preliminary model-entry index. The preliminary model-entry indexes are then screened by setting different thresholds, the enterprise credit evaluation model is trained with the preliminary model-entry indexes of the prediction samples, and the manual labeling results of the prediction samples are compared with the prediction results of the enterprise credit evaluation model, so as to determine the final model-entry indexes and generate the optimal enterprise credit evaluation model.
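The disturbance-based importance evaluation in step (VIII) can be sketched as below. Several details are assumptions, since the text does not fix them: the disturbance column is produced by shuffling the observed values of that index, the loss is binary cross-entropy, and importance is reported as the average perturbed loss minus the unperturbed loss. The toy model `predict` stands in for the trained network and is hypothetical.

```python
import math
import random

def log_loss(y_true, p_pred):
    """Average binary cross-entropy, with probabilities clipped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def perturbation_importance(predict, X, y, n_repeats=100, seed=0):
    """Average loss increase when one index column is replaced by a disturbed one."""
    rng = random.Random(seed)
    baseline = log_loss(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        losses = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)           # assumed form of the disturbance
            X_new = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            losses.append(log_loss(y, [predict(row) for row in X_new]))
        importances.append(sum(losses) / n_repeats - baseline)
    return importances

def predict(row):
    """Hypothetical stand-in model: uses only the first index."""
    return 1.0 / (1.0 + math.exp(-8.0 * (row[0] - 0.5)))

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]
importances = perturbation_importance(predict, X, y)
```

Because the stand-in model ignores the second index, disturbing that column leaves the loss unchanged, while disturbing the first column raises it. Indexes could then be screened by comparing these values with a chosen threshold.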

When the credit of a given enterprise is evaluated, the government data, internet data and third-party data of the enterprise are converged, aligned and fused into a relational data sub-table, and the data of the relational data sub-table are then input into the optimal enterprise credit evaluation model. The optimal enterprise credit evaluation model predicts the default probability of the enterprise, and a standard credit score is obtained by either of the following:

A. based on a WOE conversion method, calculating the feature scores from the WOE values and the feature coefficients obtained by the prediction of the optimal enterprise credit evaluation model,

or,

B. based on the enterprise default probability predicted by the optimal enterprise credit evaluation model, converting the default probability into a standard score;

after the standard credit scores are obtained, a normal distribution test is performed on the overall distribution of enterprise credit scores to determine the final enterprise credit scores.
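Method B does not state the conversion formula, so the following is only a sketch of one widely used scorecard scaling, in which the score is linear in the log-odds of non-default. The parameters `base_score`, `base_odds` and `pdo` (points to double the odds) and their values are assumptions for illustration and are not taken from the embodiment. Method A (WOE-based scoring) is not shown.

```python
import math

def probability_to_score(p_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a default probability to a score that is linear in log-odds.

    base_score is the score assigned at good:bad odds of base_odds,
    and the score rises by pdo points each time the odds double.
    All three parameters are hypothetical.
    """
    eps = 1e-9
    p = min(max(p_default, eps), 1.0 - eps)     # avoid log(0)
    odds_good = (1.0 - p) / p                   # odds of not defaulting
    factor = pdo / math.log(2.0)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log(odds_good)

score_low_risk = probability_to_score(0.01)
score_high_risk = probability_to_score(0.50)
```

Under these assumed parameters a lower default probability yields a higher score, so `score_low_risk` exceeds `score_high_risk`.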

In conclusion, the enterprise credit assessment method based on the fusion of deep learning and logistic regression can overcome the defect that a single data source covers only one aspect of credit assessment, improve the accuracy of enterprise credit scoring, and provide an important assessment basis for the financial credit of enterprises.

Based on the above embodiments of the present invention, any improvements and modifications made by those skilled in the art without departing from the principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. An enterprise credit assessment method based on the fusion of deep learning and logistic regression is characterized by comprising the following steps:

2. The method for evaluating enterprise credit based on the fusion of deep learning and logistic regression as claimed in claim 1, wherein in step S1,

the government data of the enterprise comprise industrial and commercial registration, housing provident fund, social security, Development and Reform Commission, banking and insurance regulatory, and administrative penalty information;

3. The method for evaluating enterprise credit based on the fusion of deep learning and logistic regression as claimed in claim 2, wherein in step S2, the relational data sub-tables of multiple enterprises are converged, aligned and fused to obtain at least one relational data total table, which is specifically operated as follows:

4. The method for enterprise credit assessment based on deep learning and logistic regression fusion as claimed in claim 1, wherein step S3 is to construct an enterprise credit index system with three-level indexes, and the specific operations thereof include:

S3.2, deriving the original index to form three-level index content,

5. The method according to claim 4, wherein the contents of the three-level index, the two-level index and the one-level index are sequentially decreased, wherein,

6. The method for enterprise credit assessment based on deep learning and logistic regression integration according to claim 1, wherein in step S6, exploratory data analysis is performed on the data contained in the relational data table of the training sample and the three-tier indexes of the enterprise credit index system, and the method specifically comprises:

7. The method for enterprise credit assessment based on deep learning and logistic regression integration according to claim 6, wherein in step S6, the data included in the relational data table of the training sample and the three-tier index of the enterprise credit index system are subjected to data cleansing, which is specifically performed by:

S6.2.3, performing missing value statistics on the three-layer indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample, and removing the three-layer indexes with a missing-value rate greater than 60%,

S6.2.4, based on the data contained in the relational data table of the training sample, computing the same-value rate of the three-layer indexes remaining in the enterprise credit index system after step S6.2.3, removing features whose attribute takes only a single value, and removing the three-layer indexes whose attribute same-value rate is greater than 60%,

S6.2.5, for the three-layer indexes remaining after step S6.2.4, removing the unreasonable indexes determined in the process of exploratory data analysis, and then performing VIF collinearity analysis,

8. The method for evaluating enterprise credit based on the fusion of deep learning and logistic regression as claimed in claim 7, wherein in step S7, an enterprise credit evaluation model based on the fusion of deep learning and logistic regression is constructed, and the process includes three stages of determining a neural network, determining an activation function of the neural network, and determining a weight search strategy of the neural network;

S7.1, determining a neural network stage:

S7.2, determining an activation function stage of the neural network:

S7.3, determining the weight search strategy stage of the neural network:

c) it is determined that the learning rate is 0.001,

9. The method of claim 8, wherein in step S8, the enterprise credit assessment model constructed in step S7 is trained, and the final modeling index and the optimal enterprise credit assessment model are output, and the process specifically includes:

10. The method for enterprise credit assessment based on deep learning and logistic regression integration according to claim 1, wherein in step S9, the enterprise default probability predicted by the optimal enterprise credit assessment model is converted into the standard credit score by the following two methods: