CN112017025B

CN112017025B - Enterprise credit assessment method based on fusion of deep learning and logistic regression

Info

Publication number: CN112017025B
Application number: CN202010868081.2A
Authority: CN
Inventors: Yin Panpan (尹盼盼); Bian Songhua (边松华); Cui Lele (崔乐乐)
Original assignee: Tianyuan Big Data Credit Management Co Ltd
Current assignee: Tianyuan Big Data Credit Management Co Ltd
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2024-05-14
Anticipated expiration: 2040-08-26
Also published as: CN112017025A

Abstract

The invention discloses an enterprise credit assessment method based on the fusion of deep learning and logistic regression, relating to the technical field of financial credit. The method comprises the following steps: storing the government, internet and third-party data of enterprises in relational data sub-tables, and integrating multiple sub-tables into a total table stored in a standard data warehouse; screening the warehouse data and constructing an enterprise credit index system; manually labeling whether each enterprise has defaulted, and randomly dividing the corresponding data into a training sample and a prediction sample; performing exploratory data analysis and data cleaning based on the training sample and the index system to determine preliminary modeling indexes; establishing an enterprise credit assessment model that fuses deep learning and logistic regression, training it on the training sample and the preliminary modeling indexes, and outputting the final modeling indexes and an optimal model; and using the optimal model to predict the default probability of each enterprise and converting that probability into a credit score. The invention can improve the accuracy of enterprise credit scoring and provides an important basis for evaluating enterprises in financial credit.

Description

Enterprise credit assessment method based on fusion of deep learning and logistic regression

Technical Field

The invention relates to the technical field of financial credit, in particular to an enterprise credit assessment method based on fusion of deep learning and logistic regression.

Background

The credit score of an enterprise is one of the important links in managing and controlling enterprise credit risk: it provides a reference for the probability of overdue repayment based on existing data and expresses risk probability in the form of a score, where generally the higher the score, the lower the risk. Enterprise credit score modeling usually adopts machine learning methods such as logistic regression, decision trees and combined (ensemble) models. With the spread of artificial intelligence in financial risk control, credit scoring models based mainly on deep learning have also been widely applied. Because credit finance is characterized by small, dispersed loans and a customer base that reaches deep into lower-tier markets, intelligent management is needed at every stage of lending, approval, customer service and post-loan management to reduce user risk; deep learning is used to mine users' high-dimensional features in depth and analyze their potential risk, making credit approval more efficient and faster.

Deep learning is derived from neural networks, which recognize specific patterns by simulating the human brain's ability to learn and process knowledge. Compared with traditional scoring methods, deep learning has strong parallel and distributed processing, distributed storage and learning capabilities; it can be used in supervised settings (classification and prediction) and unsupervised settings (feature derivation), and it can learn complex, hidden feature associations and patterns from large numbers of data features. Enterprise credit scoring based on deep learning is one extension of deep learning technology to enterprise credit scoring, and it lays a foundation for later building various models in enterprise risk control by applying deep learning to large amounts of data and features.

Deep neural networks are applied mainly in fields such as image recognition, speech recognition and natural language processing. For example, Sun Zhena, Wang Leyuan et al. proposed an efficient iris image quality evaluation method based on deep neural networks: a feature extraction model extracts a feature map of the iris image, a reconstruction model estimates a heat map of the effective iris region from that feature map, and finally a quality prediction model takes the effective iris region as the region of interest and computes the overall quality score of the iris image from the feature map. Deep neural networks are also widely applied in credit finance. Image recognition and financial risk assessment differ mainly in the data preprocessing stage; once feature vectors have been extracted, the deep mining and analysis of features is common to both, so the algorithms can be reused. On this basis, the inventors apply deep neural network learning to enterprise risk assessment in financial credit, using deep learning to mine the model input features in depth and comprehensively assess enterprises' credit risk.

Disclosure of Invention

In view of the needs and shortcomings of the prior art, the invention provides an enterprise credit assessment method based on the fusion of deep learning and logistic regression.

The invention discloses an enterprise credit assessment method based on fusion of deep learning and logistic regression, which solves the technical problems by adopting the following technical scheme:

An enterprise credit assessment method based on fusion of deep learning and logistic regression comprises the following steps:

S1, acquiring government data, internet data and third-party data of a plurality of enterprises, and storing the government data, internet data and third-party data of the same enterprise in the same relational data sub-table;

S2, gathering, aligning and fusing relational data sub-tables of a plurality of enterprises into at least one relational data total table, and storing the relational data total table in a standard data warehouse;

S3, respectively screening data contained in all the relational data total tables in the standard data warehouse, and constructing an enterprise credit index system with three layers of indexes;

S4, manually labeling each enterprise as a defaulting user or a compliant user based on the data contained in its relational data sub-table, and then labeling the corresponding data of the defaulting users and compliant users in the relational data total table stored in the standard data warehouse;

S5, randomly dividing the defaulting users and compliant users into a training sample and a prediction sample, where the training sample contains more users than the prediction sample; then splitting the relational data total table into two relational data tables according to the random division result, and storing them as the training sample and the prediction sample respectively;

S6, performing exploratory data analysis and data cleaning on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system, and determining preliminary modeling indexes;

S7, constructing, based on a neural network, an enterprise credit assessment model fusing deep learning and logistic regression;

S8, training the enterprise credit assessment model constructed in step S7 based on the training sample and the preliminary modeling indexes determined in step S6, and outputting the final modeling indexes and an optimal enterprise credit assessment model;

S9, predicting the default probability of each enterprise with the optimal enterprise credit assessment model trained in step S8, converting the default probability into a standard credit score, performing a normality test on the distribution of credit scores across all enterprises, and determining the final credit score of each enterprise.

Optionally, in step S1, the government data of an enterprise include industry and commerce registration, housing provident fund, social security, development and reform commission, banking and insurance regulatory, and administrative penalty information;

The internet data of the enterprise include e-commerce data, marketing information, identification information, online store information, legal litigation, records of court enforcement and of being listed as a dishonest judgment debtor, and bidding information;

the third party data of the enterprise comprises enterprise business information, personnel information and personnel relationship data.

Further optionally, in step S2, the aggregation, alignment and fusion are performed on the relational data sub-tables of the multiple enterprises to obtain at least one relational data total table, and the specific operations include:

S2.1, data aggregation stage: collecting enterprise data, including the government, internet and third-party data of each enterprise. The government data are accessed through interfaces; the enterprise data cover enterprise background, e-commerce data, court judgment documents, bidding and judicial data; and the third-party data are also accessed through interfaces and cover enterprise business information, personnel information and personnel relationship data;

S2.2, data alignment stage: establishing a unified data standard, applying standardized management to the government, internet and third-party data as they enter the warehouse, and processing the three kinds of data with an ETL data governance tool;

S2.3, data fusion stage: fusing the government, internet and third-party data of multiple enterprises horizontally and vertically into at least one relational data total table and storing it uniformly in the standard data warehouse, which holds three kinds of information after the fusion of the three data sources: standard library data, the processed index library and the feature library.
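As a minimal illustration of the alignment and fusion stages, the sketch below joins three per-enterprise sources column-wise (horizontal fusion) and stacks the enterprises row-wise into one total table (vertical fusion). It assumes each source is a list of records carrying a hypothetical `enterprise_id` key; the `gov`/`web`/`tp` column prefixes and the outer-join behaviour are illustrative choices, not the patent's ETL tooling.

```python
from collections import defaultdict

def fuse_sources(government, internet, third_party):
    """Fuse three per-enterprise sources into one aligned total table.

    Each source is a list of dicts with a hypothetical "enterprise_id" key.
    Returns (columns, rows), one row per enterprise.
    """
    merged = defaultdict(dict)
    for prefix, source in (("gov", government), ("web", internet), ("tp", third_party)):
        for record in source:
            eid = record["enterprise_id"]
            for field, value in record.items():
                if field != "enterprise_id":
                    # Prefix field names so same-named fields from different sources do not collide.
                    merged[eid][f"{prefix}_{field}"] = value
    # Alignment: every row receives the full column set, None where a source had no value.
    columns = sorted({c for row in merged.values() for c in row})
    rows = [
        {"enterprise_id": eid, **{c: row.get(c) for c in columns}}
        for eid, row in sorted(merged.items())
    ]
    return columns, rows
```

For example, fusing a government record for one enterprise with internet records for two enterprises yields two aligned rows, with `None` wherever a source had nothing for that enterprise.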

Optionally, step S3 constructs an enterprise credit index system with three layers of indexes, which specifically includes:

S3.1, based on the business objective of enterprise credit evaluation, reviewing all table fields of the relational data total tables in the standard data warehouse to determine the original indexes;

S3.2, deriving the original indexes to form the third-level index content;

S3.3, abstracting and summarizing the third-level indexes to form the second-level index content;

S3.4, analyzing the credit evaluation dimensions that the indexes reflect, in combination with the third-level and second-level index content, to determine the first-level index content;

S3.5, constructing an enterprise credit index system covering three layers of indexes based on the third-level, second-level and first-level index content.

Preferably, the number of indexes decreases successively from the third level to the second level to the first level, wherein:

The third-level index content comprises specific enterprise credit indexes extracted from the relational data total table;

The second-level index content consists of enterprise credit indexes obtained by classifying and organizing the third-level indexes in combination with business knowledge;

The first-level index content consists of the indexes finally used to evaluate enterprise credit risk; the first level comprises 7 indexes, namely repayment, industry, operation, performance, area, cash flow and operation, and is displayed as a radar chart in the enterprise profile to evaluate the enterprise's credit risk in each subdivided dimension.
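The three-level index system is essentially a tree. The sketch below represents it as a nested dictionary and counts the distinct indexes per level, which should shrink from the third level to the first. The index names are taken from the examples in this document, but which third-level index sits under which second- and first-level index is a hypothetical assignment made only for illustration.

```python
# Hypothetical excerpt of the index system: first level -> second level -> third level.
# Names come from this document's examples; the parent/child assignment is assumed.
INDEX_SYSTEM = {
    "operation": {
        "stability": ["duration", "number of important business changes in the last year"],
        "management": ["personnel scale"],
    },
    "performance": {
        "legal": ["number of court announcements in the last 6 months"],
    },
}

def flatten(system):
    """Yield (first_level, second_level, third_level) triples, i.e. each index's lineage."""
    for first, seconds in system.items():
        for second, thirds in seconds.items():
            for third in thirds:
                yield first, second, third

def count_levels(system):
    """Return the number of distinct (first, second, third)-level indexes."""
    triples = list(flatten(system))
    return tuple(len({t[level] for t in triples}) for level in range(3))
```

The lineage triples make it straightforward to aggregate third-level results up to each first-level dimension, which is the granularity the radar chart displays.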

Optionally, in step S6, exploratory data analysis is performed on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system, specifically including:

S6.1.1, computing descriptive statistics for the data contained in the relational data table of the training sample and for the three-layer indexes of the enterprise credit index system;

S6.1.2, analyzing the descriptive statistics of step S6.1.1: indexes containing time information are taken as specific indexes, and the descriptive statistics of each specific index are segmented by time, so as to analyze in depth how the data change over time and what values they take under specific conditions;

S6.1.3, drawing a histogram of each single variable and a curve of its relationship with the target variable, so as to analyze the three-layer indexes visually.
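The exploratory steps reduce to a few per-index computations. The standard-library sketch below returns descriptive statistics (S6.1.1), equal-width histogram counts and the default rate per bin, which is the numeric content behind a univariate-versus-target curve (S6.1.3). Equal-width binning, the bin count and the encoding of missing values as `None` are assumptions; the time-segmented analysis of S6.1.2 amounts to calling `describe` on each time slice.

```python
import statistics
from collections import Counter

def describe(values):
    """Descriptive statistics for one index, ignoring None (missing) entries."""
    xs = [v for v in values if v is not None]
    return {
        "count": len(xs),
        "missing": len(values) - len(xs),
        "mean": statistics.fmean(xs),
        "median": statistics.median(xs),
        "variance": statistics.pvariance(xs),
    }

def _bin_index(x, lo, width, bins):
    # min(..., bins - 1) puts the maximum value into the last bin instead of an extra one.
    return min(int((x - lo) / width), bins - 1)

def histogram(values, bins):
    """Equal-width histogram counts for one index."""
    xs = [v for v in values if v is not None]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return Counter(_bin_index(x, lo, width, bins) for x in xs)

def default_rate_by_bin(values, labels, bins):
    """Share of defaulters (label 1) in each equal-width bin of one index."""
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    lo, hi = min(v for v, _ in pairs), max(v for v, _ in pairs)
    width = (hi - lo) / bins or 1.0
    totals, bads = Counter(), Counter()
    for v, y in pairs:
        b = _bin_index(v, lo, width, bins)
        totals[b] += 1
        bads[b] += y
    return {b: bads[b] / totals[b] for b in sorted(totals)}
```

Plotting is left out; the returned counts and rates can be passed to any charting library.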

Further optionally, in step S6, data cleaning is performed on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system, specifically including:

S6.2.1, performing invalid-value processing on the three-layer indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample;

S6.2.2, numerically quantizing the quantifiable indexes among the three layers of the enterprise credit index system based on the data contained in the relational data table of the training sample;

S6.2.3, computing missing-value statistics for the three-layer indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample, and removing indexes with more than 60% missing values;

S6.2.4, based on the data contained in the relational data table of the training sample, computing the same-value rate of the indexes remaining after step S6.2.3, removing features whose attribute takes only one value, and removing indexes whose same-value rate exceeds 60%;

S6.2.5, removing, from the indexes remaining after step S6.2.4, the unreasonable indexes identified during exploratory data analysis, and then performing VIF (variance inflation factor) collinearity analysis;

S6.2.6, based on the missing-value statistics of step S6.2.3, calculating the missing-data ratio of each record in the relational data table of the training sample, and removing records whose missing ratio exceeds 50%;

S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, detecting outliers in the indexes remaining after step S6.2.5 with the interquartile range (IQR) method of the box plot, screening outliers of some indexes against the upper quartile criterion, and filling the screened outliers with the specific value '-999' to mark them as missing values;

S6.2.8, filling missing values with the random forest (RandomForest) method: the indexes without missing values in the relational data table of the training sample after step S6.2.7 are taken as feature variables, and each index with missing values is taken in turn as the target variable; a RandomForest model is trained on the feature variables and the non-missing values of the target, and the trained model predicts the missing values of that feature, thereby completing the filling of missing indexes in all training samples;

S6.2.9, applying Z-Score standardization to the training samples after missing-value filling to form standardized training sample vectors containing the preliminary modeling indexes, which are used to train the enterprise credit assessment model.
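Several of the screening rules above translate directly into column statistics. The NumPy sketch below computes the missing rate (S6.2.3), the same-value rate (S6.2.4) and the variance inflation factor (S6.2.5) of each index, and applies Z-Score standardization (S6.2.9). It assumes missing entries are encoded as NaN; the VIF cut-off is not given in the text, and a value such as 10 is only a common rule of thumb.

```python
import numpy as np

def missing_rate(X):
    """Column-wise share of NaN entries."""
    return np.isnan(X).mean(axis=0)

def same_value_rate(X):
    """Column-wise share of the most frequent non-missing value."""
    rates = []
    for col in X.T:
        col = col[~np.isnan(col)]
        if len(col) == 0:
            rates.append(1.0)
            continue
        _, counts = np.unique(col, return_counts=True)
        rates.append(counts.max() / len(col))
    return np.array(rates)

def vif(X):
    """Variance inflation factor of each column; X must contain no NaN."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on the other columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def zscore(X):
    """Column-wise Z-Score standardization."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

A column mask such as `(missing_rate(X) <= 0.6) & (same_value_rate(X) <= 0.6)` mirrors the 60% thresholds, and `np.isnan(X).mean(axis=1) <= 0.5` mirrors the 50% record filter of S6.2.6. Because VIF is only defined on complete data, this sketch assumes it is computed on rows without NaN.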

Further optionally, in step S7, the enterprise credit assessment model fusing deep learning and logistic regression is constructed based on a neural network in three stages: determining the neural network, determining its activation functions, and determining its weight search strategy;

S7.1, determining a neural network stage:

The neural network is a multi-layer fully connected network comprising an input layer, hidden layers and an output layer. The number of input-layer nodes equals the number of input preliminary modeling indexes; the number of output-layer nodes equals the number of sample categories in the training samples; the number of hidden layers equals the number of output-layer nodes; and the number of nodes in each hidden layer is a multiple of the product of the numbers of input-layer and output-layer nodes;

S7.2, determining the activation functions of the neural network:

the hidden-layer outputs are activated with the ReLU function, the output layer uses the softmax activation function, and the output layer is fused with the logistic regression method, constructing the enterprise credit assessment model fusing deep learning and logistic regression;

S7.3, determining the weight search strategy of the neural network:

Based on the enterprise credit assessment model constructed in step S7.2, the weight search strategy is determined in four aspects: loss function, optimizer, learning rate and number of iterations, wherein:

A) the categorical cross-entropy function (categorical crossentropy) is used as the loss function of the enterprise credit assessment model;

B) the optimizer of the enterprise credit assessment model is tf.keras.optimizers.Adam, which searches for the optimal weights according to the change in the loss function;

C) the learning rate is set to 0.001;

D) the number of iterations of the enterprise credit assessment model is set to 10000.
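The patent builds this model with tf.keras. To keep the sketch dependency-light, the following substitutes a plain NumPy implementation of the same ingredients: ReLU hidden layers, a softmax output, categorical cross-entropy and the Adam update with learning rate 0.001, run for a fixed number of full-batch iterations. One way to read the "fusion with logistic regression" is that, with two classes, softmax([z0, z1])[1] = sigmoid(z1 - z0), so the output layer is a logistic regression on the last hidden layer's activations. He initialization, full-batch updates and the Adam constants (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-7, the tf.keras defaults) are assumptions; the `hidden` argument takes the hidden-layer widths.

```python
import numpy as np

def init_params(sizes, rng):
    """He-initialised (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return the activations of every layer: ReLU hidden layers, softmax output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        if i < len(params) - 1:
            acts.append(np.maximum(z, 0.0))           # ReLU
        else:
            z = z - z.max(axis=1, keepdims=True)      # numerically stable softmax
            e = np.exp(z)
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts

def cross_entropy(probs, Y):
    """Categorical cross-entropy for one-hot labels Y."""
    return -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))

def train(X, Y, hidden, lr=0.001, iters=10000, seed=0):
    """Full-batch Adam training; returns the list of (W, b) pairs."""
    rng = np.random.default_rng(seed)
    params = init_params([X.shape[1], *hidden, Y.shape[1]], rng)
    # First and second moment estimates for every weight and bias array.
    m = [tuple(np.zeros_like(p) for p in layer) for layer in params]
    v = [tuple(np.zeros_like(p) for p in layer) for layer in params]
    beta1, beta2, eps = 0.9, 0.999, 1e-7
    for t in range(1, iters + 1):
        acts = forward(params, X)
        # For softmax + categorical cross-entropy, dLoss/dlogits = (probs - Y) / n.
        delta = (acts[-1] - Y) / len(X)
        for i in reversed(range(len(params))):
            W, _ = params[i]
            grads = (acts[i].T @ delta, delta.sum(axis=0))
            if i > 0:
                # Propagate through this layer's (pre-update) weights and the previous ReLU.
                delta = (delta @ W.T) * (acts[i] > 0)
            updated = []
            for p, g, m_p, v_p in zip(params[i], grads, m[i], v[i]):
                m_p[...] = beta1 * m_p + (1 - beta1) * g
                v_p[...] = beta2 * v_p + (1 - beta2) * g * g
                m_hat = m_p / (1 - beta1 ** t)
                v_hat = v_p / (1 - beta2 ** t)
                updated.append(p - lr * m_hat / (np.sqrt(v_hat) + eps))
            params[i] = tuple(updated)
    return params
```

Predicted class probabilities are `forward(params, X)[-1]`; the second column is the default probability when class 1 denotes default.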

Further optionally, in step S8, the enterprise credit assessment model constructed in step S7 is trained, and the final modeling indexes and the optimal enterprise credit assessment model are output, specifically including:

S8.1, training the enterprise credit assessment model: the model constructed in step S7 is trained with the open-source TensorFlow and Keras packages, using Python as the development language. The training sample and the prediction sample are used for 10000 training iterations; learning curves are drawn during training to observe the loss function, the training-sample accuracy and the prediction-sample accuracy, and finally to judge whether the model has converged and whether it is over-fitting;

S8.2, evaluating the importance of the preliminary modeling indexes: (1) applying Z-Score standardization to the training samples after missing-value filling to form standardized training sample vectors containing the preliminary modeling indexes; (2) randomly generating a column of disturbance values and using it to replace, in turn, each preliminary modeling index column of the training sample vectors, generating new training sample vectors, feeding them into the trained neural network to obtain predicted values, and computing the loss function from the predictions for these perturbed inputs; (3) generating 100 disturbance columns for each modeling index, repeating step (2) for each, and taking the mean loss over the 100 newly generated training sample vectors as the importance of that preliminary modeling index;

S8.3, iteratively tuning the enterprise credit assessment model: ranking the preliminary modeling indexes by their mean loss, selecting different thresholds in turn to screen them, feeding the screened preliminary modeling indexes into the enterprise credit assessment model for training, comparing the manual labels of the prediction sample with the model's predictions on it, and thereby determining the final modeling indexes and generating the optimal enterprise credit assessment model.
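Step S8.2 is a perturbation-based importance measure: the more the loss rises when an index is replaced by noise, the more the model depends on it. A NumPy sketch, assuming a `predict_proba` callable (for instance the trained network's softmax output) and standard-normal disturbance values, which match Z-Score-standardized inputs; the text does not state the noise distribution.

```python
import numpy as np

def perturbation_importance(predict_proba, X, Y, n_repeats=100, seed=0):
    """Mean categorical cross-entropy after replacing each column with random noise.

    predict_proba: callable mapping an (n, k) array to (n, c) class probabilities.
    X: standardized training vectors; Y: one-hot labels.
    Returns one score per column; higher means the model relies more on that index.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    scores = np.empty(k)
    for j in range(k):
        losses = []
        for _ in range(n_repeats):
            Xp = X.copy()
            # Disturbance variable: standard normal noise (assumption).
            Xp[:, j] = rng.standard_normal(n)
            p = predict_proba(Xp)
            losses.append(-np.mean(np.sum(Y * np.log(p + 1e-12), axis=1)))
        scores[j] = np.mean(losses)
    return scores
```

Sorting with `np.argsort(-scores)` gives the ranking used in S8.3, where different thresholds on the mean loss select candidate index sets.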

Optionally, in step S9, the enterprise default probability predicted by the optimal enterprise credit assessment model is converted into a standard credit score by either of the following two methods:

S9a, WOE conversion: the score of each feature is calculated from its WOE (weight of evidence) value and the feature coefficient obtained from the optimal enterprise credit assessment model;

S9b, probability conversion: the enterprise default probability predicted by the optimal enterprise credit assessment model is converted directly into a standard score.
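Both conversions can be sketched with the usual scorecard scaling, in which a score is linear in the log-odds of default: score = A - B * ln(p / (1 - p)), with B = PDO / ln 2 and A chosen so that a reference odds value maps to a reference score. The base score of 600, base default-to-non-default odds of 1:50 and PDO (points to double the odds) of 20 are common illustrative choices, not values from the text; the 0.5 smoothing in the WOE table is also an assumption.

```python
import math

def woe_table(bins, labels):
    """WOE per bin: ln(share of non-defaulters in bin / share of defaulters in bin)."""
    goods = sum(1 for y in labels if y == 0)
    bads = len(labels) - goods
    out = {}
    for b in sorted(set(bins)):
        g = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        d = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        # 0.5 smoothing avoids log(0) for bins with no defaulters or no non-defaulters.
        out[b] = math.log(((g + 0.5) / goods) / ((d + 0.5) / bads))
    return out

def probability_to_score(p, base_score=600.0, base_odds=1 / 50, pdo=20.0):
    """S9b: map default probability p to a score.

    The score rises by `pdo` each time the default:non-default odds halve,
    and odds equal to base_odds map to base_score.
    """
    B = pdo / math.log(2)
    A = base_score + B * math.log(base_odds)
    return A - B * math.log(p / (1 - p))

def feature_points(coef, woe, pdo=20.0):
    """S9a: points for one feature bin when logit(p_default) = b0 + sum(coef_i * woe_i).

    The total score is then (A - B * b0) plus the sum of these per-feature points.
    """
    return -(pdo / math.log(2)) * coef * woe
```

With these defaults, a default probability of 1/51 (odds 1:50) scores 600 and a probability of 1/101 (odds 1:100) scores 620.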

Compared with the prior art, the enterprise credit assessment method based on the fusion of deep learning and logistic regression has the following beneficial effects:

1) Based on multi-source enterprise data, the method merges, aligns and fuses the data and builds a multi-dimensional enterprise credit assessment system on top of the fused data; its assessment dimensions are richer and its indexes more comprehensive, overcoming the one-sided assessment dimensions that result from relying on a single data source;

2) The deep-learning neural network can mine important features in depth from multi-dimensional features and can extract and derive features automatically to obtain more important information, overcoming the complexity and lack of automation of traditional methods for assessing feature importance and extracting features from high-dimensional data; it also lends itself to parallel implementation on large-scale training samples with higher computational efficiency, expanding the implementation paths and scenarios for enterprise credit assessment based on high-dimensional features and massive training samples;

3) The method can score enterprise credit and assist enterprises in obtaining financial credit; it is particularly suitable for risk control prediction across large numbers of enterprises with big data, and has broad application prospects.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

To make the technical scheme, the technical problems to be solved and the technical effects of the invention clearer, the technical scheme of the invention is described clearly and completely below with reference to specific embodiments.

Embodiment one:

Referring to Fig. 1, this embodiment proposes an enterprise credit assessment method based on the fusion of deep learning and logistic regression, which includes the following steps:

S1, acquiring government data, internet data and third party data of a plurality of enterprises, and storing the government data, the internet data and the third party data of the same enterprise in the same relational data sub-table.

In this step, the government data of an enterprise include industry and commerce registration, housing provident fund, social security, development and reform commission, banking and insurance regulatory, and administrative penalty information; the internet data of the enterprise include e-commerce data, marketing information, identification information, online store information, legal litigation, records of court enforcement and of being listed as a dishonest judgment debtor, and bidding information; the third-party data of the enterprise include enterprise business information, personnel information and personnel relationship data.

S2, aggregating, aligning and fusing the relational data sub-tables of multiple enterprises into at least one relational data total table and storing it in the standard data warehouse; the specific operations are steps S2.1 to S2.3 described above.

S3, respectively screening data contained in all the relational data total tables in the standard data warehouse, and constructing an enterprise credit index system with three layers of indexes, wherein the specific operation comprises the following steps:

S3.1, based on the business objective of enterprise credit evaluation, reviewing all table fields of the relational data total tables in the standard data warehouse to determine the original indexes;

S3.2, deriving the original indexes to form the third-level index content;

S3.3, abstracting and summarizing the third-level indexes to form the second-level index content;

S3.4, analyzing the credit evaluation dimensions that the indexes reflect, in combination with the third-level and second-level index content, to determine the first-level index content;

From steps S3.1 to S3.5 it can be seen that the number of indexes decreases successively from the third level to the second level to the first level; the content of each level is as described above.

S4, manually labeling each enterprise as a defaulting user or a compliant user based on the data contained in its relational data sub-table, and then labeling the corresponding data of the defaulting users and compliant users in the relational data total table stored in the standard data warehouse.

S5, randomly dividing the defaulting users and compliant users into a training sample and a prediction sample, where the training sample contains more users than the prediction sample; then splitting the relational data total table into two relational data tables according to the random division result, and storing them as the training sample and the prediction sample respectively.

S6, exploratory data analysis and data cleaning are carried out on the data contained in the relational data table of the training sample and the three-layer indexes of the enterprise credit index system, and a preliminary modeling index is determined.

In this step, the specific operations for conducting the exploratory data analysis are:

S6.1.1, computing descriptive statistics for the data contained in the relational data table of the training sample and for the three-layer indexes of the enterprise credit index system, such as the variance, mean, median and distribution of each index;

In this step, the specific operation of data cleaning is:

S6.2.1, performing invalid-value processing on the three-layer indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample;

S6.2.2, numerically quantizing the quantifiable indexes among the three layers of the enterprise credit index system based on the data contained in the relational data table of the training sample;

S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, detecting outliers in the indexes remaining after step S6.2.5 with the interquartile range (IQR) method of the box plot, screening outliers of some indexes against the upper quartile criterion, and filling the screened outliers with the specific value '-999' to mark them as missing values;
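The outlier screen of S6.2.7 can be sketched with the standard library. Following the "upper quartile criterion", only an upper fence Q3 + k * IQR is applied; k = 1.5 is the conventional box-plot whisker length and is an assumption, as is the choice to ignore entries already marked -999 when computing the quartiles.

```python
import statistics

MISSING = -999  # marker the text uses for screened outliers (treated as missing)

def flag_upper_outliers(values, k=1.5):
    """Replace values above Q3 + k * IQR with -999, leaving other values unchanged."""
    observed = [v for v in values if v != MISSING]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    upper = q3 + k * (q3 - q1)
    return [MISSING if v != MISSING and v > upper else v for v in values]
```

The flagged entries are then filled by the random-forest step, exactly like genuinely missing values.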

S7, constructing, based on a neural network, the enterprise credit assessment model fusing deep learning and logistic regression, in three stages: determining the neural network, determining its activation functions and determining its weight search strategy;

S7.1, determining a neural network stage:

S7.2, determining the activation functions of the neural network:

S7.3, determining the weight search strategy of the neural network:

C) the learning rate is set to 0.001;

D) the number of iterations of the enterprise credit assessment model is set to 10000. The number of iterations determines when the learning process ends during training of the neural network model.

S8, training the enterprise credit assessment model constructed in step S7 based on the training sample and the preliminary modeling indexes determined in step S6, and outputting the final modeling indexes and the optimal enterprise credit assessment model; the process specifically comprises steps S8.1 to S8.3 described above.

S9, predicting the default probability of each enterprise with the optimal enterprise credit assessment model trained in step S8, converting the default probability into a standard credit score, performing a normality test on the distribution of credit scores across all enterprises, and determining the final credit score of each enterprise. The default probability can be converted into a standard credit score by either of the two methods S9a and S9b described above.
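The text does not name the normality test applied to the score distribution. As one hedged option, the standard-library sketch below implements the Jarque-Bera test, which compares sample skewness and kurtosis with those of a normal distribution; `scipy.stats.normaltest` or `scipy.stats.shapiro` would serve the same purpose.

```python
import math
import statistics

def jarque_bera(scores):
    """Jarque-Bera normality test; returns (statistic, p_value).

    Under normality the statistic is asymptotically chi-square with 2 degrees
    of freedom, whose survival function is exp(-x / 2).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)
```

With samples as large as the 200,000 enterprises in the embodiment, even slight departures from normality yield very small p-values, so the skewness and kurtosis themselves are often the more informative output.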

The enterprise credit assessment method of the embodiment specifically comprises the following execution flow:

(I) Government data, internet data and third-party data of 200,000 enterprises are selected, and the data of each enterprise are stored in the same relational data sub-table, giving 200,000 relational data sub-tables in total.

(II) The data in the 200,000 relational data sub-tables are aggregated, aligned and fused into a relational data total table, which is stored in the standard data warehouse.

(III) The data in the relational data total table are screened manually and an enterprise credit index system with three layers of indexes is constructed, wherein: there are 1042 third-level indexes, such as the number of customs enterprise grades obtained in the last year, paid-in capital, operating duration, personnel scale, whether the enterprise is blacklisted, the number of "contract-abiding and trustworthy" ratings in the last year, the number of times listed as a judgment debtor in the last year, the amount subject to enforcement in the last year, the number of important business changes in the last year, the number of external guarantees in the last year, the number of ordinary business changes in the last three years, the number of court announcements in the last 6 months, changes in business scope in the last 6 months, the cumulative number of branches, the number of cancelled (deregistered) branches, the number of operating branches and the cumulative number of related-party transactions; the second-level indexes are 17 indexes obtained by summarizing the third-level indexes, such as risk, legal representative, association relationships, management team, industry, legal, stability, collateral, management and region; the first-level indexes are the indexes finally used to evaluate enterprise credit risk, 7 in total: debt repayment, industry, operation, performance, area, cash flow and operation.

(IV) Based on the data in the 200,000 relational data sub-tables, each enterprise is manually labeled as a defaulting user or a compliant user, giving 50,000 defaulting users and 150,000 compliant users; the relevant data of these 50,000 defaulting users and 150,000 compliant users are then marked accordingly in the relational data total table stored in the standard data warehouse.

(V) The 50,000 defaulting users and 150,000 compliant users are randomly divided into training samples and prediction samples at a ratio of 7:3 (140,000 and 60,000 enterprises, respectively): 32,000 defaulting users and 108,000 compliant users form the training samples, and the remaining 18,000 defaulting users and 42,000 compliant users form the prediction samples. According to this random division, the corresponding data are located in the relational data total table, which is divided into relational data table I, stored as the training samples, and relational data table II, stored as the prediction samples. Note that all data of any one enterprise in the relational data total table go entirely to either the training samples or the prediction samples.
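The enterprise-level split in step (V) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `split_enterprises` and `route_records`, the `enterprise_id` key and the fixed seed are hypothetical. The key point it shows is that the split is drawn over enterprises, and rows are then routed by enterprise, so no enterprise appears in both tables.

```python
import random

def split_enterprises(labels, train_ratio=0.7, seed=42):
    """Randomly split enterprise IDs into training and prediction sets.

    labels: dict mapping a (hypothetical) enterprise_id to 1 (defaulting)
    or 0 (compliant). The split is drawn at enterprise level, so all rows of
    one enterprise end up on the same side. Returns (train_ids, predict_ids).
    """
    ids = sorted(labels)              # fixed order before shuffling, for reproducibility
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_ratio)
    return set(ids[:n_train]), set(ids[n_train:])

def route_records(records, train_ids):
    """Divide rows of the total table into table I (training) and table II (prediction)."""
    table_1 = [r for r in records if r["enterprise_id"] in train_ids]
    table_2 = [r for r in records if r["enterprise_id"] not in train_ids]
    return table_1, table_2
```

With 200,000 labeled enterprises and `train_ratio=0.7`, 140,000 IDs go to training. Because the draw is not stratified by class, the per-class counts on each side vary with the random draw, which is consistent with the 32,000/108,000 and 18,000/42,000 figures above.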

(VI) Exploratory data analysis is performed on the data in relational data table I of the training samples and on the three-level indexes of the enterprise credit index system, and 11 unreasonable indexes are removed;

Data cleaning is then performed on the data in relational data table I of the training samples and the three-level indexes of the enterprise credit index system: 546 indexes remain after invalid-value processing, numerical quantization and missing-value processing, and 40 remain after same-value-rate screening; the 11 unreasonable indexes identified during exploratory data analysis are then removed, and VIF collinearity analysis removes correlated features, leaving 17 modeling indexes. After the training samples are screened, filtered and checked for outliers, 10 of the 17 modeling indexes contain missing values and 7 do not. The 7 complete indexes are taken as feature variables and each of the 10 incomplete indexes in turn as the target variable, and a RandomForest model is trained on the screened and filtered training samples; the trained RandomForest model predicts the missing values of the incomplete features, which completes the filling of the missing indexes in all training samples. The 17 modeling indexes after missing-value filling serve as the preliminary modeling indexes.
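The RandomForest imputation described above can be sketched with scikit-learn. This assumes all indexes are already numeric after quantization (so a regressor suffices; a categorical index would need `RandomForestClassifier` instead), and the function name `rf_impute` and its parameters are hypothetical. Each incomplete column is predicted from the complete columns only, matching the "7 complete indexes as features, each of the 10 incomplete indexes as target" scheme.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_impute(X, complete_cols, missing_cols, n_estimators=100, seed=0):
    """Fill NaNs in each column of `missing_cols` using a random forest.

    For every incomplete column, a forest is trained on the rows where that
    column is observed, using only `complete_cols` (columns with no missing
    values) as features, and then predicts the missing entries.
    """
    X = np.array(X, dtype=float, copy=True)   # never modify the caller's array
    features = X[:, complete_cols]
    for col in missing_cols:
        target = X[:, col]
        observed = ~np.isnan(target)
        if observed.all():
            continue                            # nothing to fill in this column
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        model.fit(features[observed], target[observed])
        X[~observed, col] = model.predict(features[~observed])
    return X
```

Because every forest uses only the originally complete columns, the order in which the incomplete indexes are filled does not affect the result.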

(VII) The number of input-layer nodes of the neural network is set to 17, the number of preliminary modeling indexes. Because the training samples contain positive and negative samples, the number of output-layer nodes and the number of hidden layers are both set to 2. Based on the learning speed, the number of preliminary modeling indexes and the number of hidden layers, the number of hidden-layer nodes is set to 17 × 2 × 100 × 5;
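The sizing rule above (and in claim 1, S7.1) can be sketched as a small NumPy network. This reads "17 × 2 × 100 × 5" as a multiple of 500 applied to input nodes × output nodes, giving each of the 2 hidden layers 17,000 nodes; that reading, the He initialization and the function names `layer_sizes`, `init_params` and `forward` are assumptions for illustration. The demo uses a multiple of 1 so the network stays small.

```python
import numpy as np

def layer_sizes(n_inputs, n_classes, multiple):
    """Layer widths under the S7.1 rule: input = number of modeling indexes,
    output = number of sample classes, number of hidden layers = number of
    output nodes, hidden width = multiple * (input nodes * output nodes)."""
    hidden = multiple * n_inputs * n_classes
    return [n_inputs] + [hidden] * n_classes + [n_classes]

def init_params(sizes, seed=0):
    """He-initialized weights (an assumed choice for ReLU layers), zero biases."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, X):
    """ReLU on every hidden layer, softmax on the output layer."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)
```

`layer_sizes(17, 2, 500)` returns `[17, 17000, 17000, 2]` under this reading; `layer_sizes(17, 2, 1)` gives a toy `[17, 34, 34, 2]` network whose output rows are class probabilities summing to 1.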

The most common neural network activation functions include Sigmoid, tanh, softplus and ReLU (Rectified Linear Unit). In this embodiment, the hidden-layer outputs of the neural network are activated with the ReLU function, the output layer is processed with the softmax activation function, and the output layer is fused with the logistic regression method to construct the enterprise credit assessment model fusing deep learning and logistic regression;
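One way to read "fusing the output layer with logistic regression" is that a 2-node softmax output is mathematically a logistic regression on the last hidden layer's activations: P(default) = sigmoid(h · (w1 − w0) + (b1 − b0)). The patent does not spell this out, so treat the sketch below as an interpretation; the function names `two_class_softmax`, `sigmoid` and `logistic_head` are hypothetical.

```python
import numpy as np

def two_class_softmax(z):
    """Softmax over logits z of shape (n, 2); column 1 is taken as 'default'."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_head(h, W, b):
    """Logistic-regression view of a 2-node softmax output layer.

    h: last hidden-layer activations (n, k); W: (k, 2) weights; b: (2,) bias.
    The equivalent logistic-regression coefficients are W[:, 1] - W[:, 0]
    and the intercept is b[1] - b[0].
    """
    beta = W[:, 1] - W[:, 0]
    intercept = b[1] - b[0]
    return sigmoid(h @ beta + intercept)
```

Under this reading, `beta` and `intercept` are ordinary logistic-regression coefficients over the learned hidden features, which is what makes a log-odds based score conversion possible downstream.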

The weight search strategy of the neural network is determined in four aspects: the loss function, the optimizer, the learning rate and the number of iterations, wherein,

C) the learning rate is set to 0.001;

(VIII) The enterprise credit assessment model is trained with the training samples and prediction samples, and it is checked whether the model converges and whether it overfits. Based on the preliminary modeling indexes after missing-value filling, the column of each preliminary modeling index in the training sample vectors is replaced in turn by a randomly generated column of perturbation values; the resulting new training sample vectors are fed into the determined neural network to obtain predicted values, and the loss function is computed from the input vectors and the predicted outputs. For each modeling index, 100 perturbation columns are generated in a loop, and the loss averaged over the 100 new training sample vectors is used to evaluate the importance of that preliminary modeling index. The preliminary modeling indexes are then screened with different thresholds, the model is trained with the preliminary modeling indexes of the prediction samples, and the final modeling indexes are determined by comparing the manual labels of the prediction samples with the model's predictions, yielding the optimal enterprise credit assessment model.
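The perturbation-based importance in step (VIII) can be sketched as below. The patent does not say how the random perturbation column is generated; this sketch assumes a random permutation of the column's own values (a common choice, as in permutation importance), uses binary log loss as the loss, and reports the mean perturbed loss minus the unperturbed baseline. The names `perturbation_importance`, `select_indexes` and `predict` are hypothetical.

```python
import numpy as np

def log_loss(y, p_default, eps=1e-12):
    """Binary cross-entropy between 0/1 labels and predicted default probabilities."""
    p = np.clip(p_default, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def perturbation_importance(predict, X, y, n_repeats=100, seed=0):
    """Importance of each column = mean loss with that column perturbed,
    over n_repeats draws, minus the loss on the unperturbed data.

    predict: callable mapping an (n, d) array to P(default), shape (n,).
    Assumption: each perturbation is a random permutation of column j.
    """
    rng = np.random.default_rng(seed)
    baseline = log_loss(y, predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        losses = []
        for _ in range(n_repeats):
            X_new = X.copy()
            X_new[:, j] = rng.permutation(X[:, j])   # break the link between index j and y
            losses.append(log_loss(y, predict(X_new)))
        scores[j] = np.mean(losses) - baseline
    return baseline, scores

def select_indexes(scores, threshold):
    """Keep indexes whose importance exceeds the threshold; trying several
    thresholds and comparing on the prediction samples is left to the caller."""
    return [j for j, s in enumerate(scores) if s > threshold]
```

An index the model ignores scores about zero, because shuffling it leaves every prediction unchanged; an index the model relies on raises the loss when shuffled.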

(IX) To assess an enterprise's credit, its government data, internet data and third-party data are collected, aligned and fused into a relational data sub-table; the data of this sub-table are input into the optimal enterprise credit assessment model, which predicts the enterprise's default probability, and a standard credit score is then obtained by either

A. the WOE conversion method, in which the score of each feature is computed from its WOE value and the coefficient of that feature obtained by the optimal enterprise credit assessment model,

or

B. the probability conversion method, in which the enterprise default probability predicted by the optimal enterprise credit assessment model is converted directly into a standard score.

After the standard credit score is obtained, a normality test is carried out on the distribution of all enterprise credit scores to determine the final enterprise credit score.
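Both conversion methods can be sketched with the conventional scorecard scaling score = A − B·ln(odds), where odds = p / (1 − p) and B = PDO / ln 2. The patent gives no calibration constants, so the base score of 600 at bad:good odds of 1:50 and 20 points to double the odds (PDO) are assumed values, as are the function names and the WOE sign convention ln(%bad / %good). Method A assumes the model's log-odds are intercept + Σ βᵢ·WOEᵢ.

```python
import math

def scaling(base_score=600.0, base_odds=1 / 50, pdo=20.0):
    """Return (A, B) so that score = A - B * ln(odds of default).
    Assumed calibration: base_score at bad:good odds of base_odds, and the
    score drops by pdo points each time the odds of default double."""
    B = pdo / math.log(2)
    A = base_score + B * math.log(base_odds)
    return A, B

def score_from_probability(p_default, A, B):
    """Method B: convert a predicted default probability into a standard score."""
    odds = p_default / (1.0 - p_default)
    return A - B * math.log(odds)

def woe(bad_in_bin, good_in_bin, total_bad, total_good):
    """Weight of evidence of one bin, defined here as ln(%bad / %good)."""
    return math.log((bad_in_bin / total_bad) / (good_in_bin / total_good))

def score_from_woe(intercept, coefs, woes, A, B):
    """Method A: base points A - B * intercept plus per-feature points
    -B * beta_i * WOE_i, assuming ln(odds) = intercept + sum(beta_i * WOE_i)."""
    base_points = A - B * intercept
    feature_points = [-B * beta * w for beta, w in zip(coefs, woes)]
    return base_points, feature_points
```

With these constants, a default probability of 1/51 (odds 1:50) maps to 600 and doubling the odds maps to 580. Under the stated assumption the two methods agree: base points plus feature points equal the score obtained from the probability directly.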

In summary, the enterprise credit assessment method based on the fusion of deep learning and logistic regression overcomes the one-sided assessment dimension caused by relying on a single data source, improves the accuracy of enterprise credit scoring, and provides an important reference for enterprise financial credit.

Any improvements and modifications made to the above embodiments by those skilled in the art without departing from the principles of the present invention shall fall within the scope of the present invention.

Claims

1. An enterprise credit assessment method based on the fusion of deep learning and logistic regression, characterized by comprising the following steps:

S7, constructing, based on a neural network, an enterprise credit assessment model fusing deep learning and logistic regression, in three stages: determining the neural network, determining the activation functions of the neural network, and determining the weight search strategy of the neural network,

S7.1, stage of determining the neural network:

The neural network is a multi-layer fully connected network consisting of an input layer, hidden layers and an output layer, wherein the number of input-layer nodes equals the number of input preliminary modeling indexes, the number of output-layer nodes equals the number of sample categories in the training samples, the number of hidden layers equals the number of output-layer nodes, and the number of hidden-layer nodes is a multiple of the product of the number of input-layer nodes and the number of output-layer nodes,

S7.2, stage of determining the activation functions of the neural network:

The hidden-layer outputs of the neural network are activated with the ReLU function, the output layer is processed with the softmax activation function, and the output layer is fused with the logistic regression method to construct the enterprise credit assessment model fusing deep learning and logistic regression,

S7.3, stage of determining the weight search strategy of the neural network:

Based on the enterprise credit assessment model constructed in step S7.2, determining the weight search strategy of the neural network comprises determining four aspects: the loss function, the optimizer, the learning rate and the number of iterations, wherein,

C) the learning rate is set to 0.001,

D) the number of iterations of the enterprise credit assessment model is set to 10,000;
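A training loop using the two hyperparameters the claim does state (learning rate 0.001, 10,000 iterations) can be sketched in NumPy. Items A) and B), the loss function and optimizer, are not reproduced here, so this sketch assumes softmax cross-entropy and plain full-batch gradient descent; the function name `train` and the default hidden widths are hypothetical.

```python
import numpy as np

def train(X, y, hidden=(34, 34), lr=0.001, n_iter=10_000, seed=0):
    """Full-batch gradient descent for a ReLU MLP with a softmax output.

    lr=0.001 and n_iter=10000 follow claim 1; the cross-entropy loss and the
    plain gradient-descent optimizer are assumptions. y holds integer classes.
    Returns (weights, biases, per-iteration loss history).
    """
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    sizes = [X.shape[1], *hidden, n_classes]
    Ws = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) for n in sizes[1:]]
    Y = np.eye(n_classes)[y]                       # one-hot targets
    history = []
    for _ in range(n_iter):
        # forward pass, keeping every layer's activations for backpropagation
        acts = [X]
        for W, b in zip(Ws[:-1], bs[:-1]):
            acts.append(np.maximum(acts[-1] @ W + b, 0.0))   # ReLU hidden layers
        z = acts[-1] @ Ws[-1] + bs[-1]
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        P = e / e.sum(axis=1, keepdims=True)                 # softmax output
        history.append(float(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))))
        # backward pass: gradient of mean cross-entropy w.r.t. the output logits
        delta = (P - Y) / X.shape[0]
        for i in range(len(Ws) - 1, -1, -1):
            gW = acts[i].T @ delta
            gb = delta.sum(axis=0)
            if i > 0:
                # propagate through layer i before updating it; ReLU derivative mask
                delta = (delta @ Ws[i].T) * (acts[i] > 0)
            Ws[i] -= lr * gW
            bs[i] -= lr * gb
    return Ws, bs, history
```

The loss history can be used for the convergence check in step (VIII); an overfitting check would compare it against the loss on held-out prediction samples.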

2. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein in step S1,

the government data of the enterprise comprise information on industry and commerce, housing provident fund, social security, the development and reform commission, banking, and administrative penalties;

the internet data of the enterprise comprise e-commerce data, marketing information, identification information, online store information, legal litigation, information on being listed as a dishonest party subject to enforcement, and bidding information;

3. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 2, wherein in step S2, the relational data sub-tables of multiple enterprises are collected, aligned and fused to obtain at least one relational data total table, specifically comprising the following operations:

S2.1, data aggregation stage: enterprise data, comprising the government data, internet data and third-party data of the enterprise, are collected. The government data are connected through interfaces and cover housing provident fund, social security, industry and commerce, taxation, food and drug supervision, and banking and insurance supervision; the internet data cover enterprise background, e-commerce data, court judgment documents, bidding, and judicial data; the third-party data are connected through interfaces and cover enterprise business information, personnel information, and relationship data between personnel and enterprises;

S2.3, data fusion stage: the government data, internet data and third-party data of multiple enterprises undergo horizontal and vertical data fusion into at least one relational data total table, which is stored uniformly in a standard data warehouse; after the three-party data fusion, the standard data warehouse holds three kinds of information: standard library data, a processed index library, and a feature library.
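The horizontal and vertical fusion in S2.3 can be read as a join on enterprise identity (horizontal: fields from the three sources merged into one row per enterprise) followed by stacking those rows into one table (vertical). The sketch below is an interpretation, not the patent's procedure: it assumes all three sources already share a normalized key, here a hypothetical `enterprise_id`, and that each source holds at most one record per enterprise (a later record would overwrite an earlier one). The function name and the `gov`/`web`/`tp` prefixes are also hypothetical.

```python
from collections import defaultdict

def fuse_sources(government, internet, third_party):
    """Fuse three lists of dict records into one total table.

    Horizontal fusion: records from all sources with the same enterprise_id
    are merged into one row; field names are prefixed with the source name
    to avoid collisions. Vertical fusion: the per-enterprise rows are stacked
    into a single list, sorted by enterprise_id for a stable order.
    """
    rows = defaultdict(dict)
    for source_name, records in (("gov", government), ("web", internet), ("tp", third_party)):
        for rec in records:
            eid = rec["enterprise_id"]
            for key, value in rec.items():
                if key != "enterprise_id":
                    rows[eid][f"{source_name}_{key}"] = value
    return [{"enterprise_id": eid, **fields} for eid, fields in sorted(rows.items())]
```

Enterprises missing from a source simply lack that source's fields, which later shows up as missing values for the cleaning steps in S6.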

4. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein step S3 constructs an enterprise credit index system with three levels of indexes, specifically comprising:

S3.2, deriving new indexes from the original indexes to form the third-level index content,

5. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 4, wherein the content of the third-level, second-level and first-level indexes decreases in that order, wherein,

the content of the second-level indexes is enterprise credit indexes obtained by classifying and organizing the third-level indexes according to business knowledge;

6. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein in step S6, exploratory data analysis is performed on the data in the relational data table of the training samples and the three-level indexes of the enterprise credit index system, specifically as follows:

7. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 6, wherein in step S6, data cleaning is performed on the three-level indexes of the enterprise credit index system and the data in the relational data table of the training samples, specifically comprising the following steps:

S6.2.3, based on the data in the relational data table of the training samples, computing missing-value statistics for the three-level indexes of the enterprise credit index system and removing indexes with more than 60% missing values,

S6.2.4, based on the data in the relational data table of the training samples, computing the same-value rate of the indexes remaining in the enterprise credit index system after step S6.2.3, removing features whose attribute takes only one value, and removing indexes with a same-value rate above 60%,

S6.2.5, for the indexes remaining after step S6.2.4, first removing the unreasonable indexes identified during exploratory data analysis, and then performing VIF collinearity analysis,
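Steps S6.2.3 to S6.2.5 can be sketched as three NumPy filters. The 60% thresholds come from the claims; the VIF cut-off of 10 is a common rule of thumb and is an assumption, as is dropping the single worst column per round and computing VIF only on rows complete in the selected columns. Function names are hypothetical. VIF is read off the diagonal of the inverse correlation matrix, which equals 1 / (1 − R²ⱼ) for each column.

```python
import numpy as np

def missing_rate_filter(X, max_missing=0.6):
    """S6.2.3: keep columns whose share of NaNs is at most max_missing."""
    rate = np.isnan(X).mean(axis=0)
    return [j for j in range(X.shape[1]) if rate[j] <= max_missing]

def same_value_rate(col):
    """Share of non-missing entries equal to the column's most frequent value."""
    values = col[~np.isnan(col)]
    if values.size == 0:
        return 1.0
    _, counts = np.unique(values, return_counts=True)
    return counts.max() / values.size

def same_value_filter(X, cols, max_rate=0.6):
    """S6.2.4: drop single-valued columns and columns with same-value rate > max_rate."""
    kept = []
    for j in cols:
        values = X[~np.isnan(X[:, j]), j]
        if np.unique(values).size > 1 and same_value_rate(X[:, j]) <= max_rate:
            kept.append(j)
    return kept

def vif(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def vif_filter(X, cols, exclude=(), threshold=10.0):
    """S6.2.5: drop EDA-flagged columns, then repeatedly drop the column with
    the largest VIF until every VIF is <= threshold (threshold assumed)."""
    excluded = set(exclude)
    cols = [j for j in cols if j not in excluded]
    while len(cols) > 1:
        sub = X[:, cols]
        sub = sub[~np.isnan(sub).any(axis=1)]      # complete rows only (assumption)
        v = vif(sub)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        cols.pop(worst)
    return cols
```

Dropping one column per round, rather than every column above the threshold at once, avoids discarding both members of a collinear pair.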

8. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 7, wherein in step S8, the enterprise credit assessment model constructed in step S7 is trained, and the final modeling indexes and the optimal enterprise credit assessment model are output, the process specifically comprising:

9. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein in step S9, the enterprise default probability predicted by the optimal enterprise credit assessment model is converted into a standard credit score by either of the following two methods: