CN112017025A - Enterprise credit assessment method based on fusion of deep learning and logistic regression - Google Patents

Publication number
CN112017025A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010868081.2A
Other languages
Chinese (zh)
Other versions
CN112017025B (en)
Inventor
尹盼盼
边松华
崔乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyuan Big Data Credit Management Co Ltd
Original Assignee
Tianyuan Big Data Credit Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Tianyuan Big Data Credit Management Co Ltd
Priority to CN202010868081.2A
Publication of CN112017025A
Application granted
Publication of CN112017025B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Abstract

The invention discloses an enterprise credit assessment method based on the fusion of deep learning and logistic regression, which relates to the technical field of financial credit and comprises the following steps: storing the government, internet and third-party data of each enterprise in a relational data sub-table, and merging the sub-tables into a general table stored in a standard data warehouse; screening the warehouse data and constructing an enterprise credit index system; manually labeling whether each enterprise has defaulted, and randomly dividing the corresponding data into a training sample and a prediction sample; performing exploratory data analysis and data cleaning based on the training sample and the index system, and determining preliminary model-input indexes; constructing an enterprise credit evaluation model that fuses deep learning and logistic regression, training the model on the training sample and the preliminary model-input indexes, and outputting the final model-input indexes and the optimal model; and predicting the default probability of an enterprise with the optimal model and converting the default probability into a credit score. The invention can improve the accuracy of enterprise credit scoring and provide an important assessment basis for enterprise financial credit.

Description

Enterprise credit assessment method based on fusion of deep learning and logistic regression
Technical Field
The invention relates to the technical field of financial credit, in particular to an enterprise credit evaluation method based on the fusion of deep learning and logistic regression.
Background
Enterprise credit scoring is an important part of enterprise credit risk management and control. It provides a reference for the probability of overdue repayment based on existing data and expresses risk probability as a score; generally, the higher the score, the lower the risk. Enterprise credit scoring models are usually built with machine learning methods such as logistic regression, decision trees and combined models. With the growing use of artificial intelligence in financial risk control, credit scoring models based on deep learning have also been widely applied. The credit finance industry is characterized by small, dispersed loan amounts and a customer base that extends further into lower-tier markets, so loan origination, approval, customer service and post-loan management all need to become steadily more intelligent in order to reduce customer risk. Deep learning is used to mine the high-dimensional features of customers and analyze their potential risk, making credit approval more efficient and faster.
Deep learning originates from neural networks and recognizes specific patterns by simulating the human brain's ability to learn and process knowledge. Compared with traditional scoring methods, deep learning has strong parallel distributed processing, distributed storage and learning capabilities; it can be used for supervised tasks (classification and prediction) and unsupervised tasks (feature derivation), and it can learn intricate hidden feature associations and patterns from a large number of data features. Deep-learning-based enterprise credit scoring is one extended application of deep learning, and it lays a foundation for later building various models in the enterprise risk control field from large volumes of data and features.
Applications of deep neural networks are concentrated in fields such as image recognition, speech recognition and natural language processing; for example, Sun Min and Wanglan proposed a novel and efficient iris image quality evaluation method based on a deep neural network. Deep neural networks are also widely applied in credit finance. Image recognition and financial risk assessment differ in the data preprocessing stage, but once feature vectors have been extracted, the in-depth mining and analysis of features is common to both, so the algorithms can be reused. On this basis, research and development personnel apply deep neural network learning to enterprise risk evaluation in the field of financial credit, mine and learn the model-input features in depth, and comprehensively evaluate the credit risk of an enterprise.
Disclosure of Invention
In view of the needs and shortcomings of the prior art, the invention provides an enterprise credit assessment method based on the fusion of deep learning and logistic regression.
To solve the above technical problems, the enterprise credit assessment method based on the fusion of deep learning and logistic regression adopts the following technical scheme:
An enterprise credit assessment method based on the fusion of deep learning and logistic regression comprises the following steps:
S1, acquiring government data, internet data and third-party data of a plurality of enterprises, and storing the government data, internet data and third-party data of the same enterprise in the same relational data sub-table;
S2, aggregating, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data general table, and storing the relational data general table in a standard data warehouse;
S3, screening the data contained in each relational data general table in the standard data warehouse, and constructing an enterprise credit index system with three levels of indexes;
S4, manually labeling each enterprise as a default user or a compliant (non-defaulting) user based on the data contained in the enterprise's relational data sub-table, and then labeling the related data of default users and compliant users accordingly in the relational data general table stored in the standard data warehouse;
S5, randomly dividing the default users and compliant users into a training sample and a prediction sample, wherein the training sample contains more users than the prediction sample; dividing the relational data general table into two relational data tables according to the random division; and storing the two tables in the training sample and the prediction sample respectively;
S6, performing exploratory data analysis and data cleaning on the data contained in the relational data table of the training sample and on the three levels of indexes of the enterprise credit index system, and determining preliminary model-input indexes;
S7, constructing, based on a neural network, an enterprise credit evaluation model that fuses deep learning and logistic regression;
S8, training the enterprise credit evaluation model constructed in step S7 based on the training sample and the preliminary model-input indexes determined in step S6, and outputting the final model-input indexes and the optimal enterprise credit evaluation model;
S9, predicting the default probability of an enterprise with the optimal enterprise credit evaluation model obtained in step S8, converting the default probability into a standard credit score, performing a normality test on the overall distribution of enterprise credit scores, and determining the final enterprise credit score.
Optionally, in step S1, the government data of the enterprise includes business registration (industry and commerce), housing provident fund, social security, Development and Reform Commission, banking and insurance regulatory, and administrative penalty information;
the internet data of the enterprise includes e-commerce data, marketing information, affirmation information, online store information, lawsuits, information of information;
the third-party data of the enterprise includes enterprise business registration information, personnel information and person-enterprise relationship data.
Further optionally, in step S2, the relational data sub-tables of the plurality of enterprises are aggregated, aligned and fused into at least one relational data general table, and the specific operations include:
S2.1, data aggregation stage: enterprise data is collected, comprising the government data, internet data and third-party data of an enterprise, wherein the government data is accessed through interfaces and covers housing provident fund, social security, business registration, tax, food and drug regulatory, and banking and insurance regulatory data; the internet data covers enterprise background, e-commerce data, court judgment documents, bidding data and judicial data; and the third-party data is accessed through interfaces and covers enterprise business registration information, personnel information and person-enterprise relationship data;
S2.2, data alignment stage: a unified data standard is established, the government data, internet data and third-party data loaded into the warehouse are managed in a standardized manner, and the three types of data are governed with an ETL data governance tool;
S2.3, data fusion stage: the government data, internet data and third-party data of the plurality of enterprises are fused horizontally and vertically and merged into at least one relational data general table, which is stored uniformly in a standard data warehouse; the standard data warehouse stores three kinds of information obtained by processing the fused data, namely standard library data, an index library and a feature library.
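The aggregation, alignment and fusion stages can be pictured with a small sketch. It is only an illustration under stated assumptions: the field names (`enterprise_id`, `registered_capital`, `online_shops`, `shareholder_count`) and the helper `fuse_sub_tables` are hypothetical, and plain Python dictionaries stand in for the relational tables and the ETL tool.

```python
from typing import Dict, List

Row = Dict[str, object]


def fuse_sub_tables(sources: Dict[str, List[Row]], key: str = "enterprise_id") -> List[Row]:
    """Merge per-source rows into one wide row per enterprise.

    Horizontal fusion: fields from different sources are joined on `key`.
    Vertical fusion: rows of different enterprises are stacked into one table.
    """
    # Alignment: build one shared schema so every fused row has the same columns.
    columns: List[str] = []
    for source_name, rows in sources.items():
        for row in rows:
            for field in row:
                name = f"{source_name}.{field}"
                if field != key and name not in columns:
                    columns.append(name)

    merged: Dict[object, Row] = {}
    for source_name, rows in sources.items():
        for row in rows:
            # Fields a source does not provide stay None (missing).
            wide = merged.setdefault(row[key], {key: row[key], **dict.fromkeys(columns)})
            for field, value in row.items():
                if field != key:
                    wide[f"{source_name}.{field}"] = value
    return [merged[k] for k in sorted(merged)]


# Hypothetical sub-table contents for two enterprises.
sources = {
    "government": [{"enterprise_id": "E001", "registered_capital": 500.0},
                   {"enterprise_id": "E002", "registered_capital": 120.0}],
    "internet": [{"enterprise_id": "E001", "online_shops": 3}],
    "third_party": [{"enterprise_id": "E002", "shareholder_count": 4}],
}
general_table = fuse_sub_tables(sources)
```

Prefixing each column with its source keeps same-named fields from different sources apart; a real warehouse would more likely enforce this through the unified data standard of stage S2.2.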
Optionally, in step S3, an enterprise credit index system with three levels of indexes is constructed, and the specific operations include:
S3.1, based on the business objective of enterprise credit evaluation, reviewing each table field of the relational data general table in the standard data warehouse to determine the original indexes;
S3.2, deriving from the original indexes to form the third-level index content;
S3.3, abstracting and summarizing the third-level indexes to form the second-level index content;
S3.4, analyzing, in combination with the third-level and second-level index contents, the enterprise credit evaluation dimensions reflected by the indexes, and determining the first-level index content;
S3.5, constructing an enterprise credit index system covering three levels of indexes based on the third-level, second-level and first-level index contents.
Preferably, the content of the third-level, second-level and first-level indexes decreases in that order, wherein
the third-level index content comprises specific enterprise credit indexes extracted from the relational data general table;
the second-level index content comprises enterprise credit indexes obtained by classifying and organizing the third-level indexes with business knowledge;
the first-level index content comprises the indexes finally determined for evaluating enterprise credit risk, namely 7 indexes: repayment, industry, operation, performance, region, cash flow and operation; these are displayed in the radar chart of the enterprise profile to evaluate the enterprise's credit risk on each dimension.
Optionally, in step S6, exploratory data analysis is performed on the data contained in the relational data table of the training sample and on the three levels of indexes of the enterprise credit index system, and the specific operations are as follows:
S6.1.1, computing descriptive statistics for the data contained in the relational data table of the training sample and for the three levels of indexes of the enterprise credit index system;
S6.1.2, analyzing the descriptive statistics of step S6.1.1, treating indexes that contain time information as specific indexes, and segmenting the data of each specific index to further analyze how the data changes over time and what values it takes under particular conditions;
S6.1.3, plotting the histogram of each single variable and the curve relating each single variable to the target variable, so as to analyze the three levels of indexes visually.
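The exploratory steps above can be sketched as follows. This is a minimal illustration, not the patented procedure: the index name `overdue_count`, the sample values and the bin edges are hypothetical, and the "curve" relating a single variable to the target is represented as the default rate per histogram bin rather than as a plot.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def describe(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Descriptive statistics for one index, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing_rate": 1 - len(present) / len(values),
        "mean": statistics.mean(present),
        "std": statistics.pstdev(present),
        "min": min(present),
        "max": max(present),
    }


def default_rate_by_bin(values: Sequence[Optional[float]], labels: Sequence[int],
                        edges: Sequence[float]) -> List[Tuple[str, int, float]]:
    """Per bin: sample count (the histogram) and default rate (variable vs. target)."""
    result = []
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = [y for x, y in zip(values, labels) if x is not None and low <= x < high]
        rate = sum(in_bin) / len(in_bin) if in_bin else float("nan")
        result.append((f"[{low}, {high})", len(in_bin), rate))
    return result


# Hypothetical index values and manual labels (1 = default, 0 = compliant).
overdue_count = [0, 1, None, 4, 5, 0, 2, 6]
is_default = [0, 0, 0, 1, 1, 0, 0, 1]

stats = describe(overdue_count)
curve = default_rate_by_bin(overdue_count, is_default, edges=[0, 2, 4, 8])
```

An index whose default rate barely changes across bins is a candidate for the "unreasonable index" review that the cleaning stage refers back to.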
Further optionally, in step S6, data cleaning is performed on the data contained in the relational data table of the training sample and on the three levels of indexes of the enterprise credit index system, and the specific operations are as follows:
S6.2.1, handling invalid values of the three levels of indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample;
S6.2.2, converting the quantifiable indexes among the three levels of indexes of the enterprise credit index system into numerical values based on the data contained in the relational data table of the training sample;
S6.2.3, computing missing-value statistics for the three levels of indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample, and removing indexes whose missing rate is greater than 60%;
S6.2.4, computing the same-value rate of the indexes remaining in the enterprise credit index system after step S6.2.3 based on the data contained in the relational data table of the training sample, removing features whose attribute takes only one value, and removing indexes whose same-value rate is greater than 60%;
S6.2.5, removing from the indexes remaining after step S6.2.4 the unreasonable indexes identified during exploratory data analysis, and then performing variance inflation factor (VIF) collinearity analysis;
S6.2.6, based on the missing-value statistics of step S6.2.3, calculating the proportion of missing data for the data contained in the relational data table of the training sample, and removing data whose missing proportion is greater than 50%;
S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, detecting outliers in the indexes remaining in the enterprise credit index system after step S6.2.5 with the interquartile range method of the box plot, screening out the outliers of some indexes according to the quartile criterion, and treating the screened outliers as missing values filled with the specific value "-999";
S6.2.8, using a random forest method: taking the indexes without missing values in the relational data table of the training sample after step S6.2.7 as feature variables, taking each index with missing values as the target variable, training a random forest model on the samples in which the target variable is not missing, and using the trained random forest model to predict the missing values, thereby filling the missing indexes in all training samples;
S6.2.9, applying Z-Score standardization to the training samples after missing-value filling to form standardized training sample vectors containing the preliminary model-input indexes, for use in training the enterprise credit evaluation model.
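Several of the cleaning steps above can be sketched in plain Python. The sketch makes assumptions that the text does not state: the box-plot fences use the conventional 1.5 × IQR multiplier, the column names and values are hypothetical, and the random forest imputation of step S6.2.8 is not reproduced; a median fill is substituted purely as a placeholder so that the standardization step can be shown.

```python
import statistics
from typing import Dict, List, Optional

MISSING_SENTINEL = -999  # the specific fill value named in step S6.2.7
Column = List[Optional[float]]


def missing_rate(col: Column) -> float:
    return sum(v is None for v in col) / len(col)


def same_value_rate(col: Column) -> float:
    """Share of non-missing entries taken by the most frequent value."""
    present = [v for v in col if v is not None]
    if not present:
        return 1.0
    return max(present.count(v) for v in set(present)) / len(present)


def filter_indexes(table: Dict[str, Column], max_missing: float = 0.6,
                   max_same: float = 0.6) -> Dict[str, Column]:
    """S6.2.3 / S6.2.4: drop indexes that are mostly missing or mostly constant."""
    return {name: col for name, col in table.items()
            if missing_rate(col) <= max_missing and same_value_rate(col) <= max_same}


def mark_outliers(col: Column, k: float = 1.5) -> Column:
    """S6.2.7: box-plot (IQR) rule; outliers become the missing-value sentinel."""
    present = [v for v in col if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [MISSING_SENTINEL if v is not None and not (low <= v <= high) else v
            for v in col]


def fill_missing_with_median(col: Column) -> List[float]:
    """Placeholder for S6.2.8, which in the method uses random forest imputation."""
    present = [v for v in col if v is not None and v != MISSING_SENTINEL]
    median = statistics.median(present)
    return [median if v is None or v == MISSING_SENTINEL else v for v in col]


def z_score(col: List[float]) -> List[float]:
    """S6.2.9: standardize to zero mean and unit standard deviation."""
    mean, std = statistics.mean(col), statistics.pstdev(col)
    return [(v - mean) / std for v in col]


# Hypothetical training-sample columns.
table = {
    "revenue": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 200.0],
    "mostly_missing": [None, None, None, None, None, None, 1.0, 2.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0],
}
kept = filter_indexes(table)
marked = mark_outliers(kept["revenue"])
filled = fill_missing_with_median(marked)
standardized = z_score(filled)
```

Only `revenue` survives the filters here, and its extreme last value is flagged, filled and then standardized with the rest of the column.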
Further optionally, in step S7, an enterprise credit evaluation model that fuses deep learning and logistic regression is constructed based on a neural network, and the process comprises three stages: determining the neural network, determining the activation functions of the neural network, and determining the weight search strategy of the neural network;
S7.1, determining the neural network:
the neural network is a multilayer fully connected neural network comprising an input layer, a hidden layer and an output layer, wherein the number of input layer nodes is the number of input preliminary model-input indexes, the number of output layer nodes corresponds to the number of sample categories contained in the training sample, the number of hidden layer nodes of the neural network is equal to the number of output layer nodes of the neural network, and the number of hidden layer nodes of the neural network is a multiple of the product of the number of input layer nodes and the number of output layer nodes;
S7.2, determining the activation functions of the neural network:
the hidden layer output of the neural network is activated with the ReLU function and the output layer is processed with the softmax activation function, so that the output layer of the neural network is fused with the logistic regression method and an enterprise credit evaluation model fusing deep learning and logistic regression is constructed;
S7.3, determining the weight search strategy of the neural network:
based on the enterprise credit evaluation model constructed in step S7.2, determining the weight search strategy covers four aspects: the loss function, the optimizer, the learning rate and the number of iterations, wherein
a) the categorical cross-entropy function is determined as the loss function of the enterprise credit evaluation model,
b) the optimizer of the enterprise credit evaluation model is determined to be tf.keras.optimizers.Adam, which searches for the optimal weight values according to changes in the loss function,
c) the learning rate is determined to be 0.001,
d) the number of iterations of the enterprise credit evaluation model is determined to be 10000.
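The structure described above can be illustrated with a forward pass written in plain Python. This is a sketch under assumptions, not the TensorFlow/Keras model of the method: the weights are random rather than trained, the input values are hypothetical, class index 1 is assumed to mean "default", and the hidden width follows one reading of the text (a multiple of the product of input and output nodes). It also shows why a two-class softmax output behaves like logistic regression on the hidden features: the softmax probability of class 1 equals the sigmoid of the logit difference.

```python
import math
import random
from typing import Dict, List


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: weights[j] holds the input weights of output node j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> Dict[str, list]:
    rng = random.Random(seed)

    def layer(rows: int, cols: int) -> List[List[float]]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"w1": layer(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": layer(n_out, n_hidden), "b2": [0.0] * n_out}


def logits(x: List[float], params: Dict[str, list]) -> List[float]:
    hidden = relu(dense(x, params["w1"], params["b1"]))  # ReLU hidden layer
    return dense(hidden, params["w2"], params["b2"])      # pre-softmax outputs


def categorical_cross_entropy(probs: List[float], one_hot: List[int]) -> float:
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)


n_in, n_out = 4, 2  # 4 hypothetical indexes, 2 classes (compliant, default)
params = init_params(n_in, n_hidden=2 * n_in * n_out, n_out=n_out)
x = [0.2, -1.0, 0.5, 1.3]  # one standardized sample (made-up values)

z = logits(x, params)
probs = softmax(z)
# Logistic-regression view of the same output: sigmoid of the logit difference.
p_default_sigmoid = 1.0 / (1.0 + math.exp(-(z[1] - z[0])))
loss = categorical_cross_entropy(probs, [0, 1])
```

Training with the Adam optimizer, learning rate 0.001 and 10000 iterations would sit on top of this forward pass; it is omitted here to keep the sketch free of external dependencies.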
Further optionally, in step S8, the enterprise credit evaluation model constructed in step S7 is trained and the final model-input indexes and the optimal enterprise credit evaluation model are output, and the process specifically includes:
S8.1, training the enterprise credit evaluation model: the model constructed in step S7 is trained with the open-source TensorFlow and Keras packages, with Python as the development and training language; the training sample and the prediction sample are used for 10000 iterations of training; the learning curve of the model is plotted during training, and the loss function, the training-sample accuracy and the prediction-sample accuracy are observed in order to judge whether the model has converged and whether it is over-fitted;
S8.2, evaluating the importance of the preliminary model-input indexes: (1) Z-Score standardization is applied to the training samples after missing-value filling to form standardized training sample vectors containing the preliminary model-input indexes; (2) a column of perturbation variables is randomly generated to replace, in turn, each index column of the preliminary model-input indexes in the training sample vectors, generating new training sample vectors; the new training sample vectors are input into the determined neural network to obtain predicted values, and the loss function is calculated from the input vectors and the output predicted values; (3) for each model-input index, perturbation variables are generated 100 times and step (2) is repeated, the average of the loss function over the 100 newly generated training sample vectors is calculated, and the importance of each preliminary model-input index is evaluated;
S8.3, iteratively tuning the enterprise credit evaluation model: the preliminary model-input indexes are sorted by the obtained average loss, different thresholds are selected in turn to screen the preliminary model-input indexes, the preliminary model-input indexes of the prediction sample are input into the enterprise credit evaluation model for training, the manual labels of the prediction sample are compared with the predictions of the enterprise credit evaluation model, the final model-input indexes are determined, and the optimal enterprise credit evaluation model is generated.
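The perturbation-based importance evaluation of step S8.2 can be sketched as below. Several pieces are assumptions made for illustration: `stand_in_model` is a fixed logistic function standing in for the trained network, the perturbation is drawn from a standard normal distribution (a natural choice for Z-Score standardized data, though the text does not name the distribution), and the data is synthetic.

```python
import math
import random
from typing import Callable, List, Sequence


def log_loss(y_true: Sequence[int], p_default: Sequence[float]) -> float:
    """Two-class cross-entropy averaged over samples."""
    eps = 1e-12
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
                for y, p in zip(y_true, p_default)) / len(y_true)


def perturbation_importance(predict: Callable[[List[float]], float],
                            rows: List[List[float]], labels: Sequence[int],
                            repeats: int = 100, seed: int = 0) -> List[float]:
    """Mean loss after replacing each index column with random noise, per index."""
    rng = random.Random(seed)
    mean_losses = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(repeats):
            # Replace column j with a freshly generated perturbation column.
            noisy = [row[:j] + [rng.gauss(0.0, 1.0)] + row[j + 1:] for row in rows]
            total += log_loss(labels, [predict(r) for r in noisy])
        mean_losses.append(total / repeats)
    return mean_losses


def stand_in_model(row: List[float]) -> float:
    """Hypothetical predictor: depends on index 0 only, ignores index 1."""
    return 1.0 / (1.0 + math.exp(-3.0 * row[0]))


data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
labels = [1 if r[0] > 0 else 0 for r in rows]

baseline = log_loss(labels, [stand_in_model(r) for r in rows])
mean_losses = perturbation_importance(stand_in_model, rows, labels)
# A larger mean loss after perturbation suggests a more important index.
ranking = sorted(range(len(mean_losses)), key=lambda j: mean_losses[j], reverse=True)
```

Perturbing the index the predictor relies on raises the loss well above the baseline, while perturbing the ignored index leaves it unchanged, which is the signal the threshold screening in step S8.3 works from.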
Optionally, in step S9, the enterprise default probability predicted by the optimal enterprise credit evaluation model is converted into a standard credit score by either of the following two methods:
S9a, based on the weight of evidence (WOE) conversion method, calculating feature scores from the WOE values and the feature coefficients obtained from the optimal enterprise credit evaluation model;
S9b, converting the enterprise default probability predicted by the optimal enterprise credit evaluation model into a standard score.
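The text does not give the conversion formulas, so the following is only a sketch of one widely used scorecard convention, not necessarily the one intended here. The parameters `base_score`, `base_odds` and `pdo` (points to double the odds) and their values are assumptions, as is the sign convention of the WOE helper, which takes WOE as ln(share of compliant / share of default) within a bin.

```python
import math


def probability_to_score(p_default: float, base_score: float = 600.0,
                         base_odds: float = 1 / 20, pdo: float = 20.0) -> float:
    """Map a default probability to a score; a higher score means lower risk.

    An enterprise whose default odds equal `base_odds` receives `base_score`,
    and every halving of the default odds adds `pdo` points.
    """
    p = min(max(p_default, 1e-9), 1 - 1e-9)  # keep the odds finite
    odds = p / (1 - p)
    factor = pdo / math.log(2)
    return base_score - factor * math.log(odds / base_odds)


def weight_of_evidence(good: int, bad: int, good_total: int, bad_total: int) -> float:
    """WOE of one bin: ln(share of compliant samples / share of default samples)."""
    return math.log((good / good_total) / (bad / bad_total))


score_at_base = probability_to_score(1 / 21)   # default odds of 1:20
score_half_odds = probability_to_score(1 / 41)  # default odds of 1:40
woe_neutral_bin = weight_of_evidence(good=50, bad=10, good_total=500, bad_total=100)
```

Because the mapping is monotonic in the default probability, the ranking of enterprises is unchanged; only the scale is made easier to read.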
Compared with the prior art, the enterprise credit assessment method based on the fusion of deep learning and logistic regression has the following beneficial effects:
1) the method is based on the fusion of multi-source enterprise data: the multi-source data is aggregated, aligned and fused, and a multi-dimensional enterprise credit evaluation system is established on that basis, so that the credit evaluation dimensions are richer and the evaluation indexes more comprehensive, overcoming the defect that the credit evaluation dimensions covered by a single data source are not comprehensive enough;
2) the deep-learning-based neural network can mine the important features among multi-dimensional features in depth and can automatically extract and derive features to obtain more important information, overcoming the defect that traditional importance evaluation and feature extraction methods for high-dimensional data are complex and not automated; it is also suited to parallel processing of large-scale training samples, runs more efficiently, and broadens the implementation paths and scenarios of enterprise credit evaluation based on high-dimensional features and massive training samples;
3) the method can score the credit of enterprises and assist enterprises in obtaining financial credit; it is particularly suitable for risk control prediction over massive numbers of enterprises with big data, and has broad application prospects.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the technical scheme, the technical problems to be solved and the technical effects of the invention clearer, the technical scheme of the invention is described clearly and completely below with reference to specific embodiments.
Embodiment one:
With reference to FIG. 1, this embodiment provides an enterprise credit assessment method based on the fusion of deep learning and logistic regression, which comprises the following steps:
S1, acquiring government data, internet data and third-party data of a plurality of enterprises, and storing the government data, internet data and third-party data of the same enterprise in the same relational data sub-table.
In this step, the government data of the enterprise includes business registration (industry and commerce), housing provident fund, social security, Development and Reform Commission, banking and insurance regulatory, and administrative penalty information; the internet data of the enterprise includes e-commerce data, marketing information, affirmation information, online store information, lawsuits, information of information; the third-party data of the enterprise includes enterprise business registration information, personnel information and person-enterprise relationship data.
S2, aggregating, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data general table, and storing the relational data general table in a standard data warehouse, wherein the specific operations include:
S2.1, data aggregation stage: enterprise data is collected, comprising the government data, internet data and third-party data of an enterprise, wherein the government data is accessed through interfaces and covers housing provident fund, social security, business registration, tax, food and drug regulatory, and banking and insurance regulatory data; the internet data covers enterprise background, e-commerce data, court judgment documents, bidding data and judicial data; and the third-party data is accessed through interfaces and covers enterprise business registration information, personnel information and person-enterprise relationship data;
S2.2, data alignment stage: a unified data standard is established, the government data, internet data and third-party data loaded into the warehouse are managed in a standardized manner, and the three types of data are governed with an ETL data governance tool;
S2.3, data fusion stage: the government data, internet data and third-party data of the plurality of enterprises are fused horizontally and vertically and merged into at least one relational data general table, which is stored uniformly in a standard data warehouse; the standard data warehouse stores three kinds of information obtained by processing the fused data, namely standard library data, an index library and a feature library.
S3, screening data contained in all relational data general tables in the standard data warehouse respectively, and constructing an enterprise credit index system with three-layer indexes, wherein the specific operation comprises the following steps:
s3.1, based on the business objective of enterprise credit evaluation, respectively combing each form field of a relational data total form in a standard data warehouse to determine an original index;
s3.2, deriving the original index to form third-level index content;
s3.3, abstracting and summarizing the three-level indexes to form second-level index content;
s3.4, analyzing the evaluation dimension of the enterprise credit embodied by the indexes by combining the contents of the third-level indexes and the second-level indexes, and determining the contents of the first-level indexes;
and S3.5, constructing an enterprise credit index system covering three layers of indexes based on the contents of the three-level indexes, the contents of the two-level indexes and the contents of the first-level indexes.
From steps S3.1-S3.5, it can be seen that the number of indexes decreases in sequence from the third-level indexes to the second-level indexes to the first-level indexes, wherein,
the content of the third-level indexes comprises specific enterprise credit indexes extracted through a relational data total table;
the content of the second-level index is an enterprise credit index which is integrated with business knowledge classification and arrangement on the basis of the third-level index;
the content of the primary index is an index finally determined by evaluating the credit risk of the enterprise, and the primary index comprises 7 indexes of repayment, industry, operation, performance, region, cash flow and operation, and is applied to radar map display of an enterprise portrait to evaluate the credit risk condition of the enterprise on each subdivision dimension.
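One possible way to hold the three-layer index system in memory is sketched below. The index names and, in particular, the assignment of each third-level index to a second-level and a first-level index are hypothetical; the method does not disclose the actual mapping.

```python
# Hypothetical mapping: third-level index -> (second-level index, first-level index).
INDEX_SYSTEM = {
    "paid_in_capital": ("management", "operation"),
    "court_notices_last_6_months": ("legality", "performance"),
    "blacklisted": ("risks", "performance"),
    "branch_count": ("stability", "operation"),
}


def group_by_level(index_system):
    """Return a nested dict: first level -> second level -> [third-level indexes]."""
    tree = {}
    for tertiary, (secondary, primary) in index_system.items():
        tree.setdefault(primary, {}).setdefault(secondary, []).append(tertiary)
    return tree


tree = group_by_level(INDEX_SYSTEM)
```

A nested structure of this kind makes it straightforward to roll third-level values up to the first-level dimensions shown on the radar map.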
S4, manually marking each enterprise as a default user or a conservative user based on the data contained in the relational data sub-table of the enterprise, and then marking the relevant data of the default users and the conservative users respectively in the relational data general tables stored in the standard data warehouse.
S5, dividing default users and conservative users into training samples and prediction samples at random, wherein the number of users contained in the training samples is more than that of users contained in the prediction samples, then dividing the relational data total table into two relational data tables according to the result of random division, and correspondingly storing the two divided relational data tables into the training samples and the prediction samples.
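A minimal sketch of the random division of step S5 is given below, assuming that each row of the relational data total table carries an `enterprise_id` field (a hypothetical name). The split is made on enterprises rather than on rows, so that all rows of one enterprise fall on the same side; the 7:3 ratio and the fixed seed are illustrative choices.

```python
import random


def split_enterprises(enterprise_ids, train_ratio=0.7, seed=42):
    """Randomly split enterprise ids into a training set and a prediction set."""
    ids = sorted(enterprise_ids)   # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_ratio)
    return set(ids[:cut]), set(ids[cut:])


# Toy total table: 200 enterprises with two rows each.
all_ids = [f"E{i:04d}" for i in range(200)]
rows = [{"enterprise_id": eid, "record": k} for eid in all_ids for k in range(2)]

train_ids, predict_ids = split_enterprises(all_ids)
train_rows = [r for r in rows if r["enterprise_id"] in train_ids]
predict_rows = [r for r in rows if r["enterprise_id"] in predict_ids]
```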
And S6, performing exploratory data analysis and data cleaning on data contained in the relational data table of the training sample and three-layer indexes of an enterprise credit index system, and determining a primary modeling index.
In this step, the specific operations for performing exploratory data analysis are as follows:
s6.1.1, describing and counting the data contained in the relational data table of the training sample and the three-layer indexes of the enterprise credit index system, such as the variance, the mean, the median and the data distribution of each index;
s6.1.2, analyzing the description statistics of step S6.1.1, referring the index containing time information as a specific index, and segmenting the description data of the specific index to further deeply analyze the dynamic change situation of the data and the value taking situation under a certain specific condition;
s6.1.3, drawing a histogram curve of the univariates and a relation curve of the univariates and the target variable so as to perform visual analysis on the three-layer indexes.
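The descriptive statistics of step S6.1.1 may, for example, be computed as in the following sketch. The sample values and the use of `None` to represent a missing entry are assumptions made for illustration.

```python
import statistics


def describe(values):
    """Summarise one index; None marks a missing entry."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing_rate": 1 - len(present) / len(values),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "variance": statistics.pvariance(present),  # population variance
    }


registered_capital = [100, 250, None, 80, 1200, 250]  # hypothetical index values
summary = describe(registered_capital)
```

A median (250) well below the mean (376), as in this toy column, is the kind of skew that the histogram of step S6.1.3 would make visible.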
In this step, the specific operations for data cleaning are as follows:
s6.2.1, based on the data contained in the relational data table of the training sample, carrying out invalid value processing on the three-layer indexes of the enterprise credit index system,
s6.2.2, based on the data contained in the relational data table of the training sample, the three layers of indexes which can be quantified in the enterprise credit index system are quantified numerically,
s6.2.3, performing missing value statistics on three-layer indexes of the enterprise credit index system based on data contained in the relational data table of the training sample, and removing the three-layer indexes with missing values larger than 60%;
s6.2.4, carrying out statistics of the same-value rate on the remaining three-layer indexes in the enterprise credit index system after step S6.2.3 based on the data contained in the relational data table of the training sample, removing the characteristic that the attribute has only one value, and removing the three-layer indexes of which the attribute same-value rate is more than 60%;
s6.2.5, removing unreasonable indexes determined in the process of exploratory data analysis for the three layers of indexes left after step S6.2.4, and then performing VIF collinearity analysis;
S6.2.6, based on the missing value statistics carried out in step S6.2.3, calculating the proportion of missing data for each data record contained in the relational data table of the training sample, and removing the data records whose proportion of missing data is larger than 50%;
S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, carrying out abnormal value detection on the three-layer indexes remaining in the enterprise credit index system after step S6.2.5 by adopting the interquartile range (IQR) method of a box plot, screening out abnormal values of some indexes according to the quartile criterion, treating the screened abnormal values as missing values, and filling them with the specific numerical value "-999";
S6.2.8, using a random forest method, taking the indexes without missing values in the relational data table of the training sample after step S6.2.7 as feature variables, selecting each index with missing values in the relational data table of the training sample in turn as the target variable, and training a random forest model on the feature variables and the non-missing values of the target variable, wherein the trained random forest model predicts the missing values of the target variable, thereby completing the filling of the missing indexes in all the training samples;
s6.2.9, carrying out Z-Score standardization treatment on the training samples filled with the missing values to form standardized training sample vectors containing the preliminary model-entering indexes for carrying out the training of the enterprise credit evaluation model.
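The threshold-based parts of the cleaning procedure (steps S6.2.3, S6.2.4, S6.2.7 and S6.2.9) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the method: missing entries are represented by `None`, the same-value rate is computed over non-missing entries, the box-plot whiskers use the conventional factor 1.5, and the random forest filling of step S6.2.8 is omitted.

```python
import statistics

MISSING_SENTINEL = -999  # value used in step S6.2.7 to mark screened outliers


def missing_rate(column):
    return sum(v is None for v in column) / len(column)


def same_value_rate(column):
    """Share of the most frequent value among non-missing entries."""
    present = [v for v in column if v is not None]
    if not present:
        return 1.0
    return max(present.count(v) for v in set(present)) / len(present)


def filter_indexes(table, max_missing=0.6, max_same=0.6):
    """Keep indexes whose missing rate and same-value rate are within limits."""
    return {
        name: col
        for name, col in table.items()
        if missing_rate(col) <= max_missing and same_value_rate(col) <= max_same
    }


def flag_outliers_iqr(column, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the sentinel."""
    present = sorted(v for v in column if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [
        MISSING_SENTINEL if v is not None and not (low <= v <= high) else v
        for v in column
    ]


def z_score(column):
    """Z-Score standardization of a column without missing values."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


table = {
    "staff_count": [10, 12, 11, 13, 12, 11, 10, 500],
    "blacklisted": [0, 0, 0, 0, 0, 0, 0, 1],       # same-value rate 87.5%
    "rare_field": [None] * 6 + [1, 2],             # missing rate 75%
}
kept = filter_indexes(table)
cleaned = flag_outliers_iqr(kept["staff_count"])
standardized = z_score([1.0, 2.0, 3.0])
```

In the full procedure the sentinel values would be filled in by the random forest model before the Z-Score step, so that the sentinel itself never enters the standardization.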
S7, constructing an enterprise credit evaluation model integrating deep learning and logistic regression based on the neural network, wherein the process comprises three stages of determining the neural network, determining an activation function of the neural network and determining a weight search strategy of the neural network;
s7.1, determining a neural network stage:
the neural network is a multilayer fully-connected neural network comprising an input layer, hidden layers and an output layer, wherein the number of input layer nodes of the neural network is the number of input preliminary model-entry indexes, the number of output layer nodes of the neural network corresponds to the number of sample categories contained in the training sample, the number of hidden layers of the neural network is equal to the number of output layer nodes of the neural network, and the number of hidden layer nodes of the neural network is a multiple of the product of the number of input layer nodes and the number of output layer nodes;
S7.2, determining an activation function stage of the neural network:
the hidden layer output of the neural network is activated by adopting a ReLU function, the output layer of the neural network is processed by adopting a softmax activation function, the output layer of the neural network is fused with a logistic regression method, and an enterprise credit evaluation model integrating deep learning and logistic regression is constructed;
S7.3, determining the weight search strategy stage of the neural network:
based on the enterprise credit evaluation model constructed in the step S7.2, the stage of determining the weight search strategy of the neural network includes four aspects of determining a loss function, an optimizer, a learning rate and an iteration number, wherein,
a) determining a categorical cross-entropy classification function as a loss function for the enterprise credit assessment model,
b) determining the optimizer of the enterprise credit evaluation model to be tf.keras.optimizers.Adam, so as to find the optimal value of the weights according to the change of the loss function,
c) it is determined that the learning rate is 0.001,
d) and determining the number of iterations of the enterprise credit evaluation model to be 10000. The iteration times determine whether the learning process is finished or not in the training process of the neural network model.
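The text does not spell out how the output layer is "fused with a logistic regression method". One reading, offered here only as an assumption, is that a two-node softmax output layer behaves as a logistic regression on the features produced by the last hidden layer: for two classes, the softmax probability of the default class equals the sigmoid of the difference between the two output logits. The sketch below checks this identity numerically and evaluates the categorical cross-entropy loss for a single sample; the logit values are hypothetical.

```python
import math


def softmax(logits):
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def categorical_cross_entropy(one_hot_target, probabilities):
    """Loss for one sample: minus the log-probability of the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities) if t)


# Hypothetical output-layer logits, ordered [non-default, default].
logits = [0.3, 1.7]
probs = softmax(logits)
logistic_view = sigmoid(logits[1] - logits[0])
loss = categorical_cross_entropy([0, 1], probs)  # true class: default
```

Under this reading, the hidden layers learn a nonlinear feature transformation and the output layer plays the role of the logistic regression that yields the default probability.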
S8, training the enterprise credit evaluation model constructed in the step S7 based on the training samples and the preliminary model entry indexes determined in the step S6, and outputting the final model entry indexes and the optimal enterprise credit evaluation model, wherein the process specifically comprises the following steps:
S8.1, training an enterprise credit evaluation model: adopting the open source packages TensorFlow and Keras to train the enterprise credit evaluation model constructed in the step S7, selecting Python as the development and training language of the enterprise credit evaluation model, using the training samples and the prediction samples to carry out 10000 iterations of training of the enterprise credit evaluation model, drawing a learning curve of the enterprise credit evaluation model during training, observing the loss function, the accuracy on the training samples and the accuracy on the prediction samples during training, and finally judging whether the enterprise credit evaluation model has converged and whether it is over-fitted;
S8.2, evaluating the importance of the preliminary model-entry indexes: (1) performing Z-Score standardization processing on the training samples filled with missing values to form standardized training sample vectors containing the preliminary model-entry indexes; (2) randomly generating a column of disturbance variables to replace, in turn, each column of index vectors of the preliminary model-entry indexes in the training sample vectors, thereby generating new training sample vectors, inputting the generated new training sample vectors into the determined neural network to obtain predicted values, and calculating the loss function according to the input vectors and the output predicted values; (3) for each model-entry index, generating the disturbance variables 100 times and repeating step (2), calculating the average value of the loss function over the 100 newly generated training sample vectors, and evaluating the importance of each preliminary model-entry index accordingly;
s8.3, performing iterative tuning on the enterprise credit evaluation model: and sorting the initial modeling indexes according to the obtained loss function average value, sequentially selecting different threshold values to screen the initial modeling indexes, inputting the initial modeling indexes of the prediction samples into the enterprise credit evaluation model for training, comparing the manual labeling results of the prediction samples with the prediction results of the enterprise credit evaluation model, determining the final modeling indexes, and generating the optimal enterprise credit evaluation model.
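The disturbance-based importance evaluation of step S8.2 may be sketched as follows. A toy predictor stands in for the trained neural network, the disturbance column is drawn from a standard normal distribution (an assumption, chosen because the inputs are Z-Score standardized), and the loss is the binary cross-entropy of the predicted default probability. A larger average loss after an index is replaced suggests that the model relies more heavily on that index.

```python
import math
import random


def mean_loss(predict, rows, labels):
    """Average binary cross-entropy of predict(row) -> P(default)."""
    eps = 1e-12
    total = 0.0
    for row, y in zip(rows, labels):
        p = min(max(predict(row), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


def perturbation_importance(predict, rows, labels, repeats=100, seed=0):
    """Average loss after replacing each index column with random noise."""
    rng = random.Random(seed)
    scores = []
    for j in range(len(rows[0])):
        losses = []
        for _ in range(repeats):
            noise = [rng.gauss(0.0, 1.0) for _ in rows]  # disturbance column
            perturbed = [
                row[:j] + [noise[i]] + row[j + 1:] for i, row in enumerate(rows)
            ]
            losses.append(mean_loss(predict, perturbed, labels))
        scores.append(sum(losses) / repeats)
    return scores


def toy_predict(row):
    """Stand-in for the trained network: uses index 0 only."""
    return 1.0 / (1.0 + math.exp(-3.0 * row[0]))


data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
labels = [1 if row[0] > 0 else 0 for row in rows]

baseline = mean_loss(toy_predict, rows, labels)
scores = perturbation_importance(toy_predict, rows, labels, repeats=20)
```

Here disturbing index 1 leaves the loss at its baseline because the toy predictor ignores it, whereas disturbing index 0 raises the loss; sorting the indexes by these averages gives the ranking used in step S8.3.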
And S9, predicting the default probability of the enterprise based on the optimal enterprise credit evaluation model obtained by training in the step S8, converting the default probability into standard credit score, carrying out normal distribution test on the whole enterprise credit score distribution, and determining the final enterprise credit score. There are two methods for converting the default probability into the standard credit score:
s9a, calculating a feature score through a WOE value and a coefficient of a feature obtained by predicting an optimal enterprise credit evaluation model based on a WOE conversion method;
and S9b, predicting the obtained default probability of the enterprise based on the optimal enterprise credit evaluation model, and converting the standard score according to the default probability.
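The conversion formulas themselves are not given in the text. The sketch below uses the scaling commonly applied to credit scorecards, in which the score is linear in the logarithm of the good-to-bad odds; the base score of 600, the base odds of 50:1 and the 20 points to double the odds are assumptions, as is the sign convention of the WOE helper.

```python
import math


def probability_to_score(p_default, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a default probability to a score (method S9b, assumed scaling)."""
    factor = pdo / math.log(2)                 # points added when odds double
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - p_default) / p_default       # good-to-bad odds
    return offset + factor * math.log(odds)


def weight_of_evidence(good_in_bin, bad_in_bin, good_total, bad_total):
    """WOE of one feature bin, as used by method S9a (assumed convention)."""
    return math.log((good_in_bin / good_total) / (bad_in_bin / bad_total))


score_at_base = probability_to_score(1 / 51)    # odds of 50:1
score_doubled = probability_to_score(1 / 101)   # odds of 100:1
neutral_woe = weight_of_evidence(50, 50, 100, 100)
```

With this scaling a lower default probability always yields a higher score, and each doubling of the odds adds a fixed number of points.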
The enterprise credit evaluation method of the embodiment specifically executes the following processes:
(I) Selecting government data, internet data and third-party data of 200,000 enterprises, and storing the government data, the internet data and the third-party data of the same enterprise in the same relational data sub-table to obtain 200,000 relational data sub-tables.
(II) After the data contained in the 200,000 relational data sub-tables are converged, aligned and fused, a relational data total table is obtained and stored in a standard data warehouse.
(III) manually screening data contained in the relational data general table, and constructing an enterprise credit index system with three layers of indexes, wherein: 1042 tertiary indexes, such as number of customs enterprise grades acquired in the last year, real payment capital of an enterprise, duration, personnel scale, blacklisting, number of times of close contractual credit rating in the last year, number of times of executed persons brought into the last year, amount of executed targets in the last year, number of times of change of important business in the last year, number of external guarantee in the last year, number of times of change of common business in the last three years, number of times of bulletin of court in the last 6 months, change of operating range in the last 6 months, historical accumulated branch number, number of overhead branch structures, number of branches in operation, accumulated associated transaction number and the like; the second-level indexes are 17 indexes obtained after the third-level indexes are summarized, such as risks, legal representatives, association relations, management layers, industries, legality, stability, collateral products, management, regions and the like; the primary indexes are finally determined indexes for evaluating the credit risk of the enterprise, and comprise 7 indexes in total, namely, repayment, industry, operation, performance, region, cash flow and operation.
(IV) Manually marking the enterprises as default users or conservative users based on the data contained in the 200,000 relational data sub-tables, wherein there are 50,000 default users and 150,000 conservative users; subsequently, in the relational data total table stored in the standard data warehouse, the relevant data of the 50,000 default users and the 150,000 conservative users are respectively marked.
(V) Randomly dividing the 50,000 default users and the 150,000 conservative users into training samples and prediction samples at a ratio of 7:3, wherein 32,000 default users and 108,000 conservative users are assigned to the training samples, and the remaining 18,000 default users and 42,000 conservative users are assigned to the prediction samples; then, according to the result of the random division, finding the corresponding data in the relational data total table, dividing the relational data total table into a relational data table I and a relational data table II, storing the relational data table I in the training samples, and storing the relational data table II in the prediction samples. It should be noted here that all the relevant data of any given enterprise in the relational data total table is assigned entirely to either the training samples or the prediction samples.
(VI) Carrying out exploratory data analysis on the data contained in the relational data table I of the training sample and the three-layer indexes of the enterprise credit index system, whereby 11 unreasonable indexes are identified for removal;
carrying out data cleaning on the data contained in the relational data table I of the training sample and the three-layer indexes of the enterprise credit index system, wherein 546 indexes remain after invalid value processing, numerical quantification and missing value processing; 40 indexes remain after the same-value rate screening; after the 11 unreasonable indexes identified during the exploratory data analysis are additionally removed and VIF collinearity analysis is performed to remove correlated features, 17 model-entry indexes remain; after the screening and filtering of the training samples and the abnormal value detection, 10 of the 17 model-entry indexes have missing values and 7 have no missing values; the 7 indexes without missing values are used as feature variables, and each of the 10 indexes with missing values is selected in turn as the target variable; for the training samples remaining after the screening and filtering, a RandomForest model is trained on the feature variables and the non-missing values of the target variable, and the trained RandomForest model predicts the missing values of the target variable, thereby completing the filling of the missing indexes in all the training samples; the 17 model-entry indexes, with their missing values filled, serve as the preliminary model-entry indexes.
(VII) Determining the number of input layer nodes of the neural network to be 17 based on the number of preliminary model-entry indexes; determining the number of output layer nodes and the number of hidden layers of the neural network to be 2 each, based on the fact that the training samples comprise positive samples and negative samples; and determining the number of hidden layer nodes to be 17 x 2 x 100 x 5 based on the learning speed, the number of preliminary model-entry indexes and the number of hidden layers of the neural network;
the most common neural network activation functions include Sigmoid, Tanh, Softplus, ReLU (Rectified Linear Units) and the like; in this embodiment, the hidden layer output of the neural network is activated by the ReLU function, the output layer of the neural network is processed by a softmax activation function, the output layer of the neural network is fused with a logistic regression method, and an enterprise credit evaluation model integrating deep learning and logistic regression is constructed;
determining a weight search strategy of the neural network by determining four aspects of a loss function, an optimizer, a learning rate and an iteration number, wherein,
a) determining a categorical cross-entropy classification function as a loss function for the enterprise credit assessment model,
b) determining the optimizer of the enterprise credit evaluation model to be tf.keras.optimizers.Adam, so as to find the optimal value of the weights according to the change of the loss function,
c) it is determined that the learning rate is 0.001,
d) and determining the number of iterations of the enterprise credit evaluation model to be 10000. The iteration times determine whether the learning process is finished or not in the training process of the neural network model.
(VIII) training the enterprise credit evaluation model by using the training samples and the prediction samples, and judging whether the enterprise credit evaluation model is converged and is over-fitted; based on the initial model-entering index filled with the missing value, sequentially replacing each column of index vectors of the initial model-entering index in the training sample vector by randomly generating a column of disturbance variables, inputting the generated new training sample vector into a determined neural network to obtain a predicted value, calculating a loss function according to the input vector and the output predicted value, circularly generating 100 times of disturbance variables for each model-entering index, calculating the average value of the loss functions obtained under 100 times of newly generated training sample vectors, and evaluating the importance of each initial model-entering index; screening preliminary model entry indexes by setting different threshold values, training an enterprise credit evaluation model by the preliminary model entry indexes of the prediction samples, comparing the manual labeling result of the prediction samples with the prediction result of the enterprise credit evaluation model, determining the final model entry indexes, and generating an optimal enterprise credit evaluation model.
When the credit of a given enterprise is to be evaluated, the government data, internet data and third-party data of the enterprise are converged, aligned and fused into a relational data sub-table, and the data of the relational data sub-table is then input into the optimal enterprise credit evaluation model. The optimal enterprise credit evaluation model predicts the default probability of the enterprise, and a standard credit score is obtained by either of the following methods:
A. based on a WOE conversion method, calculating the feature scores from the WOE values and the coefficients of the features obtained by the optimal enterprise credit evaluation model,
or
B. based on the enterprise default probability predicted by the optimal enterprise credit evaluation model, converting the default probability into a standard score.
A normal distribution test is then carried out on the overall distribution of enterprise credit scores to determine the final enterprise credit score.
In conclusion, the enterprise credit assessment method based on the fusion of deep learning and logistic regression can overcome the defect that a single data source covers only one aspect of the credit assessment dimensions, improve the accuracy of enterprise credit scoring, and provide an important basis for assessing the financial credit of enterprises.
Any improvements and modifications made by those skilled in the art on the basis of the above embodiments, without departing from the principle of the present invention, shall fall within the protection scope of the present invention.

Claims (10)

1. An enterprise credit assessment method based on the fusion of deep learning and logistic regression is characterized by comprising the following steps:
s1, acquiring government data, internet data and third-party data of multiple enterprises, and storing the government data, the internet data and the third-party data of the same enterprise in the same relational data sub-table;
s2, converging, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data general table, and storing the relational data general table in a standard data warehouse;
s3, screening data contained in all relational data general tables in the standard data warehouse respectively, and constructing an enterprise credit index system with three levels of indexes;
s4, manually marking the enterprise as a default user or a conservative user based on the data contained in the relational data sub-table of the enterprise, and then respectively marking the related data of the default user and the conservative user in a relational data general table stored in a standard data warehouse;
s5, dividing default users and conservative users into training samples and prediction samples at random, wherein the number of users contained in the training samples is more than that of users contained in the prediction samples, dividing the relational data total table into two relational data tables according to the result of random division, and correspondingly storing the two divided relational data tables into the training samples and the prediction samples;
s6, performing exploratory data analysis and data cleaning on data contained in the relational data table of the training sample and three-layer indexes of an enterprise credit index system, and determining a primary modeling index;
s7, constructing an enterprise credit evaluation model integrating deep learning and logistic regression based on the neural network;
s8, training the enterprise credit evaluation model constructed in the step S7 based on the training samples and the preliminary model entry indexes determined in the step S6, and outputting final model entry indexes and an optimal enterprise credit evaluation model;
and S9, predicting the default probability of the enterprise based on the optimal enterprise credit evaluation model obtained by training in the step S8, converting the default probability into standard credit score, carrying out normal distribution test on the whole enterprise credit score distribution, and determining the final enterprise credit score.
2. The method for evaluating enterprise credit based on the fusion of deep learning and logistic regression as claimed in claim 1, wherein in step S1,
the government data of the enterprise comprises information on industry and commerce, housing provident fund, social security, the Development and Reform Commission, banking and insurance regulation, and administrative penalties;
the internet data of the enterprise comprises E-commerce data, marketing information, affirmation information, online store information, lawsuits, information of information;
the third-party data of the enterprise comprises enterprise business information, personnel information and various items of information of the human-enterprise relationship data.
3. The method for evaluating enterprise credit based on the fusion of deep learning and logistic regression as claimed in claim 2, wherein in step S2, the relational data sub-tables of multiple enterprises are converged, aligned and fused to obtain at least one relational data total table, which is specifically operated as follows:
S2.1, data aggregation stage: enterprise data is collected, wherein the enterprise data comprises government data, internet data and third-party data of an enterprise; the government data of the enterprise is accessed through an interface and covers housing provident fund, social security, industry and commerce, tax, food and drug administration, and banking and insurance regulatory data; the internet data of the enterprise covers enterprise background, e-commerce data, court judgment documents, bidding data and judicial data; and the third-party data of the enterprise is accessed through an interface and covers enterprise industrial and commercial information, personnel information and person-enterprise relationship data;
S2.2, data alignment stage: establishing a unified data standard specification, carrying out standardized management of the government data, internet data and third-party data of the enterprise that are loaded into the warehouse, and processing the data from the three sources with an ETL data governance tool;
s2.3, data fusion stage: the method comprises the steps of performing horizontal and vertical data fusion on government data, internet data and third-party data of multiple enterprises, fusing and converging the government data, the internet data and the third-party data into at least one relational data general table, uniformly storing the at least one relational data general table into a standard data warehouse, and storing three items of information of standard library data, an index library and a feature library which are obtained by processing after the three-party data are fused in the standard data warehouse.
4. The method for enterprise credit assessment based on deep learning and logistic regression fusion as claimed in claim 1, wherein step S3 is to construct an enterprise credit index system with three-level indexes, and the specific operations thereof include:
s3.1, based on the business objective of enterprise credit evaluation, respectively combing each form field of the relational data total form in the standard data warehouse to determine the original index,
s3.2, deriving the original index to form three-level index content,
s3.3, abstracting and summarizing the three-level indexes to form second-level index content,
s3.4, analyzing the evaluation dimension of the enterprise credit embodied by the indexes by combining the contents of the third-level indexes and the second-level indexes, determining the contents of the first-level indexes,
and S3.5, constructing an enterprise credit index system covering three layers of indexes based on the contents of the three-level indexes, the contents of the two-level indexes and the contents of the first-level indexes.
5. The method according to claim 4, wherein the number of indexes decreases in sequence from the third-level indexes to the second-level indexes to the first-level indexes, wherein,
the content of the third-level indexes comprises specific enterprise credit indexes extracted through a relational data total table;
the content of the second-level index is an enterprise credit index which is integrated with business knowledge classification and arrangement on the basis of the third-level index;
the content of the primary index is an index finally determined by evaluating the credit risk of the enterprise, and the primary index comprises 7 indexes of repayment, industry, operation, performance, region, cash flow and operation, and is applied to radar map display of an enterprise portrait to evaluate the credit risk condition of the enterprise on each subdivision dimension.
6. The method for enterprise credit assessment based on deep learning and logistic regression integration according to claim 1, wherein in step S6, exploratory data analysis is performed on the data contained in the relational data table of the training sample and the three-tier indexes of the enterprise credit index system, and the method specifically comprises:
s6.1.1, describing and counting the data contained in the relational data table of the training sample and the three-layer indexes of the enterprise credit index system;
s6.1.2, analyzing the description statistics of step S6.1.1, referring the index containing time information as a specific index, and segmenting the description data of the specific index to further deeply analyze the dynamic change situation of the data and the value taking situation under a certain specific condition;
s6.1.3, drawing a histogram curve of the univariates and a relation curve of the univariates and the target variable so as to perform visual analysis on the three-layer indexes.
7. The method for enterprise credit assessment based on deep learning and logistic regression integration according to claim 6, wherein in step S6, data cleansing is performed on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system, specifically by:
S6.2.1, based on the data contained in the relational data table of the training sample, performing invalid value processing on the three-layer indexes of the enterprise credit index system,
S6.2.2, based on the data contained in the relational data table of the training sample, numerically quantifying the three-layer indexes of the enterprise credit index system that can be quantified,
S6.2.3, based on the data contained in the relational data table of the training sample, performing missing value statistics on the three-layer indexes of the enterprise credit index system, and removing the three-layer indexes whose proportion of missing values is larger than 60%,
S6.2.4, based on the data contained in the relational data table of the training sample, computing the same-value rate of the three-layer indexes remaining in the enterprise credit index system after step S6.2.3, removing the features whose attribute takes only one value, and removing the three-layer indexes whose same-value rate is larger than 60%,
S6.2.5, for the three-layer indexes remaining after step S6.2.4, removing the unreasonable indexes identified during exploratory data analysis and performing VIF (variance inflation factor) collinearity analysis,
S6.2.6, based on the missing value statistics carried out in step S6.2.3, calculating the proportion of missing data for each data set contained in the relational data table of the training sample, and removing the data sets whose proportion of missing data is larger than 50%;
S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, performing abnormal value detection on the three-layer indexes remaining in the enterprise credit index system after step S6.2.5 by the interquartile range method of the box plot, screening out the abnormal values of some indexes according to the quartile criterion, and treating the screened abnormal values as missing values filled with the specific value "-999",
S6.2.8, using a random forest method: taking the indexes without missing values in the relational data table of the training sample after step S6.2.7 as feature variables and the indexes with missing values as target variables, training a random forest model on the samples in which the target variable is not missing, and using the trained random forest model to predict the missing values, thereby completing the filling of the missing indexes in all the training samples;
S6.2.9, performing Z-Score standardization on the training samples filled with missing values to form standardized training sample vectors containing the preliminary model-entering indexes, for use in training the enterprise credit evaluation model.
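The threshold filters of S6.2.3 and S6.2.4, the box-plot screening of S6.2.7 and the standardization of S6.2.9 can be sketched as follows. This is a minimal illustration and not the patented implementation: the `None` marker for missing cells, the whisker factor `k=1.5`, the function names and the toy table are all assumptions, and the random forest filling of S6.2.8 is not shown.

```python
from collections import Counter
from statistics import mean, pstdev, quantiles

MISSING = None      # assumption: missing cells are represented as None
SENTINEL = -999     # the specific fill value named in step S6.2.7


def missing_rate(column):
    """Share of cells in one index column that are missing."""
    return sum(v is MISSING for v in column) / len(column)


def same_value_rate(column):
    """Share of non-missing cells taken by the most frequent value."""
    present = [v for v in column if v is not MISSING]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)


def drop_sparse_and_constant(table, max_missing=0.6, max_same=0.6):
    """Keep only index columns that pass the 60% thresholds of S6.2.3 and S6.2.4."""
    return {
        name: column
        for name, column in table.items()
        if missing_rate(column) <= max_missing and same_value_rate(column) <= max_same
    }


def flag_outliers_iqr(column, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the sentinel (S6.2.7)."""
    present = [v for v in column if v is not MISSING]
    q1, _, q3 = quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v if v is MISSING or low <= v <= high else SENTINEL for v in column]


def z_score(column):
    """Standardize a fully filled column to zero mean and unit variance (S6.2.9)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


# Hypothetical three-layer indexes, one list of values per index.
table = {
    "revenue_growth": [10, 11, 12, 13, 14, 15, 16, 200],
    "mostly_missing": [None, None, None, None, None, None, 1, 2],
    "nearly_constant": [5, 5, 5, 5, 5, 5, 5, 1],
}
kept = drop_sparse_and_constant(table)
cleaned = flag_outliers_iqr(kept["revenue_growth"])
```

In this toy table only `revenue_growth` survives the two filters, and its extreme value is replaced by the sentinel so that a later imputation step can treat it as missing.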
8. The method for evaluating enterprise credit based on the fusion of deep learning and logistic regression as claimed in claim 7, wherein in step S7, an enterprise credit evaluation model based on the fusion of deep learning and logistic regression is constructed, and the process includes three stages: determining a neural network, determining an activation function of the neural network, and determining a weight search strategy of the neural network;
S7.1, the stage of determining a neural network:
the neural network selects a multilayer fully-connected neural network, which comprises an input layer, a hidden layer and an output layer, wherein the number of input layer nodes of the neural network is the number of input preliminary model-entering indexes, the number of output layer nodes of the neural network corresponds to the number of sample categories contained in a training sample, the number of hidden layer nodes of the neural network is equal to the number of output layer nodes of the neural network, and the number of hidden layer nodes of the neural network is a multiple of the product of the number of input layer nodes and the number of output layer nodes;
S7.2, the stage of determining an activation function of the neural network:
the hidden layer output of the neural network is activated by a ReLU function, the output layer of the neural network is processed by a softmax activation function, and the output layer of the neural network is fused with a logistic regression method, thereby constructing an enterprise credit evaluation model that fuses deep learning and logistic regression;
S7.3, the stage of determining the weight search strategy of the neural network:
based on the enterprise credit evaluation model constructed in step S7.2, the stage of determining the weight search strategy of the neural network includes four aspects: determining a loss function, an optimizer, a learning rate and a number of iterations, wherein,
a) determining categorical cross-entropy as the loss function of the enterprise credit evaluation model,
b) determining the optimizer of the enterprise credit evaluation model to be tf.keras.optimizers.Adam, which searches for the optimal values of the weights according to changes in the loss function,
c) determining the learning rate to be 0.001,
d) determining the number of iterations of the enterprise credit evaluation model to be 10000.
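The forward pass and loss described in this claim can be sketched in plain Python as follows. This is an illustrative sketch only: the layer sizes, the random weights and all function names are assumptions, and the Adam optimizer, the 0.001 learning rate and the 10000 iterations named above are not implemented here.

```python
import math
import random


def relu(values):
    """ReLU activation used on the hidden layer output."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Numerically stable softmax used on the output layer."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, bias):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (illustrative initialization only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(inputs, layers):
    """ReLU on every hidden layer, softmax on the final layer."""
    *hidden, (w_out, b_out) = layers
    for weights, bias in hidden:
        inputs = relu(dense(inputs, weights, bias))
    return softmax(dense(inputs, w_out, b_out))


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample given a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


rng = random.Random(0)
n_inputs, n_hidden, n_classes = 4, 8, 2   # hypothetical sizes
layers = [init_layer(n_inputs, n_hidden, rng), init_layer(n_hidden, n_classes, rng)]
probs = forward([0.2, -1.0, 0.5, 1.3], layers)
loss = categorical_cross_entropy([0, 1], probs)
```

One way to read the fusion with logistic regression: with two output classes, the softmax probability of the second class equals the logistic (sigmoid) function applied to the difference of the two output logits, so the output layer behaves like a logistic regression on the hidden layer features.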
9. The method of claim 8, wherein in step S8, the enterprise credit evaluation model constructed in step S7 is trained, and the final model-entering indexes and the optimal enterprise credit evaluation model are output, and the process specifically includes:
S8.1, training the enterprise credit evaluation model: using the open source packages tensorflow and keras to train the enterprise credit evaluation model constructed in step S7, selecting Python as the development and training language of the enterprise credit evaluation model, selecting a training sample and a prediction sample to carry out 10000 iterations of training of the enterprise credit evaluation model, drawing a learning curve of the enterprise credit evaluation model during training, observing the loss function, the accuracy on the training sample and the accuracy on the prediction sample during training, and finally judging whether the enterprise credit evaluation model has converged and whether it is over-fitted;
S8.2, evaluating the importance of the preliminary model-entering indexes: (1) performing Z-Score standardization on the training samples filled with missing values to form standardized training sample vectors containing the preliminary model-entering indexes; (2) randomly generating a column of disturbance variables to replace, in turn, each column of index vectors of the preliminary model-entering indexes in the training sample vectors, thereby generating new training sample vectors, inputting the new training sample vectors into the determined neural network to obtain predicted values, and calculating the loss function from the input vectors and the output predicted values; (3) for each model-entering index, generating disturbance variables 100 times and repeating step (2) each time, calculating the average value of the loss function over the 100 newly generated training sample vectors, and evaluating the importance of each preliminary model-entering index accordingly;
S8.3, performing iterative tuning on the enterprise credit evaluation model: sorting the preliminary model-entering indexes according to the obtained average loss function values, selecting different threshold values in turn to screen the preliminary model-entering indexes, inputting the preliminary model-entering indexes of the prediction samples into the enterprise credit evaluation model for training, comparing the manual labeling results of the prediction samples with the prediction results of the enterprise credit evaluation model, determining the final model-entering indexes, and generating the optimal enterprise credit evaluation model.
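The perturbation-based importance evaluation of S8.2 can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_model` is a hypothetical stand-in for the trained network, the disturbance values are drawn from a standard normal distribution (chosen here because the inputs are Z-Score standardized), and a binary log-loss replaces the network's own loss function.

```python
import math
import random


def mean_log_loss(predict, rows, labels, eps=1e-12):
    """Average binary log-loss of `predict` over all samples."""
    total = 0.0
    for row, label in zip(rows, labels):
        p = min(max(predict(row), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(rows)


def perturbation_importance(predict, rows, labels, n_repeats=100, seed=0):
    """For each index column, replace it with random noise n_repeats times
    and return the average loss; a higher average suggests a more important index."""
    rng = random.Random(seed)
    scores = []
    for j in range(len(rows[0])):
        losses = []
        for _ in range(n_repeats):
            # Assumption: disturbance values are standard normal draws.
            perturbed = [row[:j] + [rng.gauss(0.0, 1.0)] + row[j + 1:] for row in rows]
            losses.append(mean_log_loss(predict, perturbed, labels))
        scores.append(sum(losses) / n_repeats)
    return scores


def toy_model(row):
    """Hypothetical stand-in model: uses only the first index, ignores the second."""
    return 1.0 / (1.0 + math.exp(-3.0 * row[0]))


data_rng = random.Random(1)
rows = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(50)]
labels = [1 if row[0] > 0 else 0 for row in rows]

baseline = mean_log_loss(toy_model, rows, labels)
scores = perturbation_importance(toy_model, rows, labels)
```

Perturbing the index that the stand-in model ignores leaves the average loss at the baseline, while perturbing the index it relies on raises the loss, which is the ordering that S8.3 then uses for threshold-based screening.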
10. The method for enterprise credit assessment based on deep learning and logistic regression integration according to claim 1, wherein in step S9, the enterprise default probability predicted by the optimal enterprise credit evaluation model is converted into the standard credit score by the following two methods:
S9a, based on a WOE (weight of evidence) conversion method, calculating feature scores from the WOE values and the feature coefficients obtained from the optimal enterprise credit evaluation model;
S9b, based on the enterprise default probability predicted by the optimal enterprise credit evaluation model, converting the default probability into a standard score.
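The claim does not state the conversion formula for S9b. One common scorecard convention, shown here purely as an assumed example, maps the log-odds of non-default linearly onto a score. The base score of 600, the base odds of 50:1 and the 20 points needed to double the odds are hypothetical parameters, not values from the patent.

```python
import math


def probability_to_score(p_default, base_score=600.0, base_odds=50.0, pdo=20.0, eps=1e-12):
    """Map a default probability to a score that rises as default risk falls.

    base_score is awarded at good:bad odds of base_odds, and every doubling
    of the odds adds pdo points (all three are illustrative assumptions).
    """
    p = min(max(p_default, eps), 1.0 - eps)
    factor = pdo / math.log(2.0)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p) / p
    return offset + factor * math.log(odds_good)


score_at_base_odds = probability_to_score(1.0 / 51.0)      # odds of 50:1
score_at_double_odds = probability_to_score(1.0 / 101.0)   # odds of 100:1
```

Under these assumed parameters, an enterprise with odds of 50:1 receives the base score, and one with odds of 100:1 scores 20 points higher.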
CN202010868081.2A 2020-08-26 Enterprise credit assessment method based on fusion of deep learning and logistic regression Active CN112017025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010868081.2A CN112017025B (en) 2020-08-26 Enterprise credit assessment method based on fusion of deep learning and logistic regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010868081.2A CN112017025B (en) 2020-08-26 Enterprise credit assessment method based on fusion of deep learning and logistic regression

Publications (2)

Publication Number Publication Date
CN112017025A true CN112017025A (en) 2020-12-01
CN112017025B CN112017025B (en) 2024-05-14

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364182A (en) * 2020-12-09 2021-02-12 交通银行股份有限公司 Graph feature-based enterprise risk conduction prediction method and device and storage medium
CN112767126A (en) * 2021-01-21 2021-05-07 诺亚阿客(上海)网络科技有限公司 Collateral grading method and device based on big data
CN112783884A (en) * 2021-01-29 2021-05-11 浪潮软件股份有限公司 Data optimization method based on normal distribution
CN112906772A (en) * 2021-02-04 2021-06-04 深圳前海微众银行股份有限公司 Sample processing method, device, equipment and computer readable storage medium
CN112990946A (en) * 2021-03-31 2021-06-18 建信金融科技有限责任公司 Enterprise default prediction method, device, medium and electronic equipment
CN113011752A (en) * 2021-03-19 2021-06-22 天道金科股份有限公司 Enterprise credit evaluation index system based on big data
CN113239199A (en) * 2021-05-18 2021-08-10 重庆邮电大学 Credit classification method based on multi-party data set
CN113283583A (en) * 2021-05-18 2021-08-20 广州致景信息科技有限公司 Method and device for predicting default rate of textile industry, storage medium and processor
CN113298221A (en) * 2021-04-26 2021-08-24 上海淇玥信息技术有限公司 User risk prediction method and device based on logistic regression and graph neural network
CN113449819A (en) * 2021-08-27 2021-09-28 中国测绘科学研究院 Credit evaluation model method based on capsule network and storage medium thereof
CN113610630A (en) * 2021-08-06 2021-11-05 东方口岸科技有限公司 Financial credit modeling method and system based on import and export trade data
CN113643125A (en) * 2021-08-30 2021-11-12 天元大数据信用管理有限公司 Credit line measuring and calculating method, equipment and medium
CN113822542A (en) * 2021-08-30 2021-12-21 天元大数据信用管理有限公司 Enterprise credit investigation platform construction method based on government affair big data
CN114462516A (en) * 2022-01-21 2022-05-10 天元大数据信用管理有限公司 Enterprise credit score sample labeling method and device
CN114742238A (en) * 2022-06-14 2022-07-12 四川省郫县豆瓣股份有限公司 Method, device, equipment and medium for screening raw materials of thick broad-bean sauce
CN115456753A (en) * 2022-09-07 2022-12-09 安徽省优质采科技发展有限责任公司 Enterprise credit information analysis method and system for bidding platform
CN115471056A (en) * 2022-08-31 2022-12-13 鼎翰文化股份有限公司 Data transmission method and data transmission system
CN115545880A (en) * 2022-09-02 2022-12-30 睿智合创(北京)科技有限公司 Product evaluation method and system applied to credit field
CN116596095A (en) * 2023-07-17 2023-08-15 华能山东发电有限公司众泰电厂 Training method and device of carbon emission prediction model based on machine learning
CN116645014A (en) * 2023-07-27 2023-08-25 湖南华菱电子商务有限公司 Provider supply data model construction method based on artificial intelligence
CN116757837A (en) * 2023-08-22 2023-09-15 国泰新点软件股份有限公司 Credit wind control method and system applied to winning bid
CN117149293A (en) * 2023-10-30 2023-12-01 北京谷器数据科技有限公司 Personalized configuration method for operating system
CN117151867A (en) * 2023-09-20 2023-12-01 江苏数诚信息技术有限公司 Enterprise exception identification method and system based on big data
CN115330531B (en) * 2022-09-05 2023-12-22 南方电网数字电网研究院有限公司 Enterprise risk prediction method based on electricity consumption fluctuation period
CN112990946B (en) * 2021-03-31 2024-05-14 建信金融科技有限责任公司 Enterprise default prediction method, device, medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880934A (en) * 2012-09-07 2013-01-16 中国标准化研究院 Integrity evaluation method for food enterprise
CN105913195A (en) * 2016-04-29 2016-08-31 浙江汇信科技有限公司 All-industry data based enterprise's financial risk scoring method
WO2018090657A1 (en) * 2016-11-18 2018-05-24 同济大学 Bp_adaboost model-based method and system for predicting credit card user default
CN110163467A (en) * 2019-04-02 2019-08-23 苏州纤联电子商务有限公司 A kind of risk quantification modeling method based on textile industry medium-sized and small enterprises credit
CN110580268A (en) * 2019-08-05 2019-12-17 西北大学 Credit scoring integrated classification system and method based on deep learning
WO2020020088A1 (en) * 2018-07-23 2020-01-30 第四范式(北京)技术有限公司 Neural network model training method and system, and prediction method and system
CN111080442A (en) * 2019-12-21 2020-04-28 湖南大学 Credit scoring model construction method, device, equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880934A (en) * 2012-09-07 2013-01-16 中国标准化研究院 Integrity evaluation method for food enterprise
CN105913195A (en) * 2016-04-29 2016-08-31 浙江汇信科技有限公司 All-industry data based enterprise's financial risk scoring method
WO2018090657A1 (en) * 2016-11-18 2018-05-24 同济大学 Bp_adaboost model-based method and system for predicting credit card user default
WO2020020088A1 (en) * 2018-07-23 2020-01-30 第四范式(北京)技术有限公司 Neural network model training method and system, and prediction method and system
CN110163467A (en) * 2019-04-02 2019-08-23 苏州纤联电子商务有限公司 A kind of risk quantification modeling method based on textile industry medium-sized and small enterprises credit
CN110580268A (en) * 2019-08-05 2019-12-17 西北大学 Credit scoring integrated classification system and method based on deep learning
CN111080442A (en) * 2019-12-21 2020-04-28 湖南大学 Credit scoring model construction method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CUICUI LUO等: "A deep learning approach for credit scoring using credit default swaps", ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, vol. 65, 15 June 2016 (2016-06-15), pages 465 - 470 *
张德栋, 张强: "基于神经网络的企业信用评估模型", 北京理工大学学报, vol. 24, no. 11, 30 November 2004 (2004-11-30), pages 982 - 985 *
王金珠: "基于证据权重逻辑回归模型的P2P公司信用风险评估", 中国优秀硕士学位论文全文数据库 经济与管理科学辑, no. 03, 15 March 2017 (2017-03-15), pages 157 - 534 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364182A (en) * 2020-12-09 2021-02-12 交通银行股份有限公司 Graph feature-based enterprise risk conduction prediction method and device and storage medium
CN112767126A (en) * 2021-01-21 2021-05-07 诺亚阿客(上海)网络科技有限公司 Collateral grading method and device based on big data
CN112783884A (en) * 2021-01-29 2021-05-11 浪潮软件股份有限公司 Data optimization method based on normal distribution
CN112906772A (en) * 2021-02-04 2021-06-04 深圳前海微众银行股份有限公司 Sample processing method, device, equipment and computer readable storage medium
CN113011752A (en) * 2021-03-19 2021-06-22 天道金科股份有限公司 Enterprise credit evaluation index system based on big data
CN112990946A (en) * 2021-03-31 2021-06-18 建信金融科技有限责任公司 Enterprise default prediction method, device, medium and electronic equipment
CN112990946B (en) * 2021-03-31 2024-05-14 建信金融科技有限责任公司 Enterprise default prediction method, device, medium and electronic equipment
CN113298221B (en) * 2021-04-26 2023-08-22 上海淇玥信息技术有限公司 User Risk Prediction Method and Device Based on Logistic Regression and Graph Neural Network
CN113298221A (en) * 2021-04-26 2021-08-24 上海淇玥信息技术有限公司 User risk prediction method and device based on logistic regression and graph neural network
CN113239199B (en) * 2021-05-18 2022-09-23 重庆邮电大学 Credit classification method based on multi-party data set
CN113239199A (en) * 2021-05-18 2021-08-10 重庆邮电大学 Credit classification method based on multi-party data set
CN113283583A (en) * 2021-05-18 2021-08-20 广州致景信息科技有限公司 Method and device for predicting default rate of textile industry, storage medium and processor
CN113610630A (en) * 2021-08-06 2021-11-05 东方口岸科技有限公司 Financial credit modeling method and system based on import and export trade data
CN113449819A (en) * 2021-08-27 2021-09-28 中国测绘科学研究院 Credit evaluation model method based on capsule network and storage medium thereof
CN113643125A (en) * 2021-08-30 2021-11-12 天元大数据信用管理有限公司 Credit line measuring and calculating method, equipment and medium
CN113822542A (en) * 2021-08-30 2021-12-21 天元大数据信用管理有限公司 Enterprise credit investigation platform construction method based on government affair big data
CN114462516A (en) * 2022-01-21 2022-05-10 天元大数据信用管理有限公司 Enterprise credit score sample labeling method and device
CN114462516B (en) * 2022-01-21 2024-04-16 天元大数据信用管理有限公司 Enterprise credit scoring sample labeling method and device
CN114742238B (en) * 2022-06-14 2022-09-09 四川省郫县豆瓣股份有限公司 Method, device, equipment and medium for screening raw materials of thick broad-bean sauce
CN114742238A (en) * 2022-06-14 2022-07-12 四川省郫县豆瓣股份有限公司 Method, device, equipment and medium for screening raw materials of thick broad-bean sauce
CN115471056A (en) * 2022-08-31 2022-12-13 鼎翰文化股份有限公司 Data transmission method and data transmission system
CN115545880A (en) * 2022-09-02 2022-12-30 睿智合创(北京)科技有限公司 Product evaluation method and system applied to credit field
CN115330531B (en) * 2022-09-05 2023-12-22 南方电网数字电网研究院有限公司 Enterprise risk prediction method based on electricity consumption fluctuation period
CN115456753A (en) * 2022-09-07 2022-12-09 安徽省优质采科技发展有限责任公司 Enterprise credit information analysis method and system for bidding platform
CN116596095A (en) * 2023-07-17 2023-08-15 华能山东发电有限公司众泰电厂 Training method and device of carbon emission prediction model based on machine learning
CN116596095B (en) * 2023-07-17 2023-11-07 华能山东泰丰新能源有限公司 Training method and device of carbon emission prediction model based on machine learning
CN116645014A (en) * 2023-07-27 2023-08-25 湖南华菱电子商务有限公司 Provider supply data model construction method based on artificial intelligence
CN116757837A (en) * 2023-08-22 2023-09-15 国泰新点软件股份有限公司 Credit wind control method and system applied to winning bid
CN117151867A (en) * 2023-09-20 2023-12-01 江苏数诚信息技术有限公司 Enterprise exception identification method and system based on big data
CN117151867B (en) * 2023-09-20 2024-04-30 江苏数诚信息技术有限公司 Enterprise exception identification method and system based on big data
CN117149293A (en) * 2023-10-30 2023-12-01 北京谷器数据科技有限公司 Personalized configuration method for operating system
CN117149293B (en) * 2023-10-30 2024-01-23 北京谷器数据科技有限公司 Personalized configuration method for operating system

Similar Documents

Publication Publication Date Title
CN108898479B (en) Credit evaluation model construction method and device
CN109446416B (en) Law recommendation method based on word vector model
CN111882446A (en) Abnormal account detection method based on graph convolution network
Foroghi et al. Applying decision tree to predict bankruptcy
CN112613977A (en) Personal credit loan admission credit granting method and system based on government affair data
CN112417176B (en) Method, equipment and medium for mining implicit association relation between enterprises based on graph characteristics
CN112767136A (en) Credit anti-fraud identification method, credit anti-fraud identification device, credit anti-fraud identification equipment and credit anti-fraud identification medium based on big data
CN111401600A (en) Enterprise credit risk evaluation method and system based on incidence relation
CN115221387A (en) Enterprise information integration method based on deep neural network
Zhu et al. Loan default prediction based on convolutional neural network and LightGBM
Rokaha et al. Enhancement of supermarket business and market plan by using hierarchical clustering and association mining technique
Kun et al. Default identification of p2p lending based on stacking ensemble learning
CN111951050A (en) Financial product recommendation method and device
CN112017025B (en) Enterprise credit assessment method based on fusion of deep learning and logistic regression
CN112017025A (en) Enterprise credit assessment method based on fusion of deep learning and logistic regression
CN115330526A (en) Enterprise credit scoring method and device
CN106022915A (en) Enterprise credit risk assessment method and apparatus
CN115796635A (en) Bank digital transformation maturity evaluation system based on big data and machine learning
CN114820074A (en) Target user group prediction model construction method based on machine learning
Kezelj et al. A Systematic Literature Review on Corporate Insolvency Prevention Using Artificial Intelligence Algorithms
Pang et al. WT combined early warning model and applications for loaning platform customers default prediction in smart city
Kulothungan Loan Forecast by Using Machine Learning
Dattachaudhuri et al. Transparent neural based expert system for credit risk (TNESCR): an automated credit risk evaluation system
CN113837859B (en) Image construction method for small and micro enterprises
Zhou et al. Bank Customer Classification Algorithm Based on Improved Decision Tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant