CN112017025B - Enterprise credit assessment method based on fusion of deep learning and logistic regression - Google Patents


Publication number
CN112017025B
Authority
CN
China
Prior art keywords
data
enterprise
index
enterprise credit
indexes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010868081.2A
Other languages
Chinese (zh)
Other versions
CN112017025A (en
Inventor
尹盼盼
边松华
崔乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyuan Big Data Credit Management Co Ltd
Original Assignee
Tianyuan Big Data Credit Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Abstract

The invention discloses an enterprise credit assessment method based on the fusion of deep learning and logistic regression, which relates to the technical field of financial credit and comprises the following steps: storing government, Internet and third-party data of enterprises in relational data sub-tables, and integrating a plurality of sub-tables into a total table stored in a standard data warehouse; screening the warehouse data and constructing an enterprise credit index system; manually labeling whether each enterprise has defaulted, and randomly dividing the corresponding data into a training sample and a prediction sample; performing exploratory data analysis and data cleaning based on the training sample and the index system, and determining preliminary modeling indexes; establishing an enterprise credit assessment model that fuses deep learning and logistic regression, training the model based on the training sample and the preliminary modeling indexes, and outputting the final modeling indexes and an optimal model; and using the optimal model to predict the default probability of an enterprise and converting the default probability into a credit score. The invention can improve the accuracy of enterprise credit scoring and provide an important evaluation basis for enterprise financial credit.

Description

Enterprise credit assessment method based on fusion of deep learning and logistic regression
Technical Field
The invention relates to the technical field of financial credit, in particular to an enterprise credit assessment method based on fusion of deep learning and logistic regression.
Background
Enterprise credit scoring is one of the important links in managing and controlling enterprise credit risk. It provides a reference for the probability of overdue repayment based on existing data and expresses risk probability in the form of a score; generally, the higher the score, the lower the risk. Enterprise credit score modeling usually adopts machine learning methods such as logistic regression, decision trees and combined models. With the spread of artificial intelligence technology in financial risk control, credit scoring models based mainly on deep learning have also come into wide use. Because credit finance is characterized by small, scattered loans and a customer base that reaches further down-market, each link of lending, approval, customer service and post-loan management needs continuous intelligent management to reduce user risk; deep learning is used to mine the high-dimensional features of users and analyze their potential risk, making credit approval more efficient and faster.
Deep learning is derived from neural networks, which recognize specific patterns by simulating the human brain's ability to learn and process knowledge. Compared with traditional scoring methods, deep learning has strong parallel distributed processing capability and strong distributed storage and learning capability, can be used in both supervised settings (classification and prediction) and unsupervised settings (feature derivation), and can learn complex, hidden feature associations and patterns from a large number of data features. Enterprise credit scoring based on deep learning is one extension of deep learning technology, and lays a foundation for later building various models in the enterprise risk control field from large amounts of data and features.
Applications of deep neural networks are concentrated in fields such as image recognition, speech recognition and natural language processing. For example, Sun Zhena, Wang Leyuan et al. proposed an efficient iris image quality evaluation method based on a deep neural network: a feature extraction model extracts a feature map of the iris image, a reconstruction model estimates a heat map of the effective iris area from the feature map, and a quality prediction model takes the effective iris area as the region of interest and calculates the overall quality score of the iris image from the feature map. Deep neural networks are also widely applied in the field of credit finance. Image recognition and financial risk assessment differ in the data preprocessing stage, but the deep mining and analysis of features after feature vector extraction is common to both, so the algorithm can be reused. On this basis, the inventors apply deep neural network learning to enterprise risk assessment in the field of financial credit, perform deep mining and learning on the modeling features through deep learning, and comprehensively assess the credit risk of enterprises.
Disclosure of Invention
Aiming at the needs and the shortcomings of the prior art development, the invention provides an enterprise credit assessment method based on the fusion of deep learning and logistic regression.
The invention discloses an enterprise credit assessment method based on fusion of deep learning and logistic regression, which solves the technical problems by adopting the following technical scheme:
An enterprise credit assessment method based on fusion of deep learning and logistic regression comprises the following steps:
S1, acquiring government data, internet data and third party data of a plurality of enterprises, and storing the government data, the Internet data and the third party data of the same enterprise in the same relational data sub-table;
S2, gathering, aligning and fusing relational data sub-tables of a plurality of enterprises into at least one relational data total table, and storing the relational data total table in a standard data warehouse;
S3, respectively screening data contained in all the relational data total tables in the standard data warehouse, and constructing an enterprise credit index system with three layers of indexes;
S4, manually labeling each enterprise as a defaulting user or a compliant user based on the data contained in the relational data sub-table of the enterprise, and then labeling the related data of defaulting users and compliant users respectively in the relational data total table stored in the standard data warehouse;
S5, randomly dividing the defaulting users and compliant users into a training sample and a prediction sample, wherein the training sample contains more users than the prediction sample, then splitting the relational data total table into two relational data tables according to the random division result, and storing the two resulting tables in the training sample and the prediction sample respectively;
S6, performing exploratory data analysis and data cleaning on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system, and determining preliminary modeling indexes;
S7, constructing, based on a neural network, an enterprise credit assessment model that fuses deep learning and logistic regression;
S8, training the enterprise credit assessment model constructed in step S7 based on the training sample and the preliminary modeling indexes determined in step S6, and outputting the final modeling indexes and an optimal enterprise credit assessment model;
S9, predicting the default probability of the enterprise based on the optimal enterprise credit evaluation model trained in the step S8, converting the default probability into a standard credit score, carrying out normal distribution inspection on the distribution of the credit scores of the whole enterprise, and determining the final credit score of the enterprise.
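As an illustration of the random division in step S5, the following is a minimal Python sketch, not the patented implementation. The enterprise IDs, the 70/30 ratio and the fixed seed are assumptions chosen for the example; the text only requires that the training sample be the larger of the two.

```python
import random

def split_enterprises(labels, train_fraction=0.7, seed=42):
    """Randomly split labeled enterprises into training and prediction samples.

    labels: dict mapping enterprise ID -> 1 (defaulting) or 0 (compliant).
    train_fraction is a hypothetical ratio; it must exceed 0.5 so that the
    training sample contains more users than the prediction sample.
    """
    if not 0.5 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0.5 and 1.0")
    ids = sorted(labels)              # fixed order so the seed gives a reproducible split
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

# Hypothetical labeled population: every fifth enterprise is a defaulting user.
labels = {f"ent_{i:03d}": int(i % 5 == 0) for i in range(100)}
train_ids, predict_ids = split_enterprises(labels)

# The total table would then be split row by row according to the two ID lists.
train_table = [{"enterprise_id": e, "label": labels[e]} for e in train_ids]
predict_table = [{"enterprise_id": e, "label": labels[e]} for e in predict_ids]
```

Splitting on enterprise IDs rather than on table rows keeps all records of one enterprise in the same sample.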
Optionally, in step S1, the government data of an enterprise includes information from industrial and commercial registration, the housing provident fund, social security, the Development and Reform Commission, the Banking and Insurance Regulatory Commission, and administrative penalties;
The Internet data of the enterprise includes e-commerce data, marketing information, identification information, online store information, legal litigation, records of dishonest judgment debtors, and bidding information;
The third-party data of the enterprise includes enterprise business information, personnel information and personnel relationship data.
Further optionally, in step S2, aggregating, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data total table specifically includes:
S2.1, data aggregation stage: collecting enterprise data, which comprises the government data, Internet data and third-party data of an enterprise, wherein the government data of the enterprise is accessed through interfaces, the enterprise data covers enterprise background, e-commerce data, judgment documents, bidding and judicial data, and the third-party data of the enterprise is accessed through interfaces and covers enterprise business information, personnel information and personnel relationship data;
S2.2, data alignment stage: establishing a unified data standard specification, applying standardized management to the government data, Internet data and third-party data as they are loaded into the warehouse, and cleansing the data from the three sources with an ETL data governance tool;
S2.3, data fusion stage: performing horizontal and vertical data fusion on the government data, Internet data and third-party data of the plurality of enterprises, merging them into at least one relational data total table, and storing the at least one relational data total table uniformly in the standard data warehouse, which holds three kinds of information after the three-source fusion: the standard library data, the processed index library and the feature library.
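The fusion in step S2.3 can be pictured with a small Python sketch. It assumes that "horizontal" fusion means joining the fields of the three sources on a shared enterprise ID and that "vertical" fusion means stacking the per-enterprise records into one table with aligned columns; the text does not define either term, and all field names and values below are hypothetical.

```python
def fuse_horizontally(*sources):
    """Join several sources on enterprise ID; each source maps ID -> field dict."""
    fused = {}
    for source in sources:
        for ent_id, fields in source.items():
            fused.setdefault(ent_id, {}).update(fields)
    return fused

def fuse_vertically(fused, columns):
    """Stack per-enterprise records into one table (list of rows) with aligned columns.

    Fields that a source did not supply are left as None (a missing value).
    """
    return [
        {"enterprise_id": ent_id, **{c: record.get(c) for c in columns}}
        for ent_id, record in sorted(fused.items())
    ]

# Hypothetical sub-table contents for two enterprises.
government = {"E1": {"social_security_months": 36}, "E2": {"social_security_months": 12}}
internet = {"E1": {"lawsuit_count": 0}, "E2": {"lawsuit_count": 3}}
third_party = {"E1": {"shareholder_count": 4}}

columns = ["social_security_months", "lawsuit_count", "shareholder_count"]
total_table = fuse_vertically(fuse_horizontally(government, internet, third_party), columns)
```

Leaving absent fields as None keeps the gaps visible to the missing-value statistics performed later during data cleaning.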
Optionally, constructing the enterprise credit index system with three layers of indexes in step S3 specifically includes:
S3.1, based on the business objective of enterprise credit evaluation, sorting through all table fields of the relational data total table in the standard data warehouse to determine the original indexes;
S3.2, deriving from the original indexes to form the third-level index content;
S3.3, abstracting and summarizing the third-level indexes to form the second-level index content;
S3.4, analyzing the enterprise credit evaluation dimensions reflected by the indexes, in combination with the third-level and second-level index content, to determine the first-level index content;
S3.5, constructing an enterprise credit index system covering three layers of indexes based on the third-level, second-level and first-level index content.
Preferably, the content of the third-level, second-level and first-level indexes decreases in that order, wherein
The third-level index content comprises the specific enterprise credit indexes extracted from the relational data total table;
The second-level index content comprises enterprise credit indexes obtained by classifying and organizing the third-level indexes in combination with business knowledge;
The first-level index content comprises the indexes finally determined for evaluating enterprise credit risk; the first-level indexes comprise 7 indexes, namely repayment, industry, operation, performance, area, cash flow and operation, and are displayed in a radar chart of the enterprise portrait to evaluate the credit risk of the enterprise in each subdivided dimension.
Optionally, in step S6, performing exploratory data analysis on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system specifically includes:
S6.1.1, computing descriptive statistics for the data contained in the relational data table of the training sample and for the three-layer indexes of the enterprise credit index system;
S6.1.2, analyzing the descriptive statistics of step S6.1.1 by treating indexes that contain time information as specific indexes and segmenting the descriptive data by those indexes, so as to further analyze how the data change over time and what values they take under specific conditions;
S6.1.3, plotting a histogram for each single variable and a curve relating each single variable to the target variable, so as to analyze the three-layer indexes visually.
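The descriptive statistics of S6.1.1 and the time-based segmentation of S6.1.2 might look like the following Python sketch. The field names ("year", "revenue") and the choice of statistics are assumptions made for illustration; plotting (S6.1.3) is omitted.

```python
import statistics
from collections import defaultdict

def describe(values):
    """Basic descriptive statistics for one index; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {"count": len(present), "missing": len(values) - len(present)}
    if present:
        summary.update(
            mean=statistics.fmean(present),
            median=statistics.median(present),
            min=min(present),
            max=max(present),
        )
    return summary

def describe_by_period(rows, period_key, value_key):
    """Segment an index by a time-bearing index and describe each segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[period_key]].append(row[value_key])
    return {period: describe(values) for period, values in sorted(groups.items())}

# Hypothetical training rows: annual revenue per enterprise record.
rows = [
    {"year": 2018, "revenue": 100.0},
    {"year": 2018, "revenue": 140.0},
    {"year": 2019, "revenue": 90.0},
    {"year": 2019, "revenue": None},
]
summary = describe_by_period(rows, "year", "revenue")
```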
Further optionally, in step S6, performing data cleaning on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system specifically includes:
S6.2.1, handling invalid values of the three-layer indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample;
S6.2.2, converting the quantifiable three-layer indexes in the enterprise credit index system into numerical values based on the data contained in the relational data table of the training sample;
S6.2.3, computing missing-value statistics for the three-layer indexes of the enterprise credit index system based on the data contained in the relational data table of the training sample, and removing three-layer indexes whose missing rate exceeds 60%;
S6.2.4, based on the data contained in the relational data table of the training sample, computing the same-value rate of the three-layer indexes remaining in the enterprise credit index system after step S6.2.3, removing features whose attribute takes only one value, and removing three-layer indexes whose same-value rate exceeds 60%;
S6.2.5, removing, from the three-layer indexes remaining after step S6.2.4, the unreasonable indexes identified during exploratory data analysis, and then performing VIF (variance inflation factor) collinearity analysis;
S6.2.6, based on the missing-value statistics of step S6.2.3, calculating the proportion of missing data in the data contained in the relational data table of the training sample, and removing data whose missing proportion exceeds 50%;
S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, detecting outliers of the three-layer indexes remaining in the enterprise credit index system after step S6.2.5 using the interquartile range method of the box plot, screening out the outliers of some indexes according to the upper-quartile criterion, and filling the screened outliers with the specific value '-999' so that they are treated as missing values;
S6.2.8, using the random forest (RandomForest) method: taking the indexes without missing values in the relational data table of the training sample after step S6.2.7 as feature variables, taking each index with missing values in turn as the target variable, training a RandomForest model on the records in which the target variable is not missing, and using the trained RandomForest model to predict the missing values, thereby filling the missing indexes in all training samples;
S6.2.9, applying Z-Score standardization to the training sample after the missing values are filled, forming a standardized training sample vector containing the preliminary modeling indexes, which is used to train the enterprise credit assessment model.
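Several of the cleaning steps above reduce to simple column-wise rules. The following Python sketch illustrates the missing-rate and same-value-rate filters (S6.2.3, S6.2.4), the box-plot outlier marking (S6.2.7) and Z-Score standardization (S6.2.9). It is a sketch under assumptions: missing values are represented as None, the upper fence is taken as Q3 + 1.5 × IQR (the text says only "upper quartile standard"), and the column names are hypothetical. VIF analysis and random forest imputation are not shown.

```python
import statistics

def missing_rate(column):
    """Share of entries in a column that are missing (None)."""
    return sum(v is None for v in column) / len(column)

def same_value_rate(column):
    """Share of non-missing entries taken up by the most frequent value."""
    present = [v for v in column if v is not None]
    if not present:
        return 1.0
    return max(present.count(v) for v in set(present)) / len(present)

def drop_poor_indexes(table, max_missing=0.6, max_same=0.6):
    """Keep only columns within both thresholds (60% each in the text)."""
    return {
        name: column
        for name, column in table.items()
        if missing_rate(column) <= max_missing and same_value_rate(column) <= max_same
    }

def mark_upper_outliers(column, sentinel=-999, k=1.5):
    """Replace values above Q3 + k * IQR with the sentinel (k = 1.5 is an assumption)."""
    present = [v for v in column if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [sentinel if v is not None and v > upper_fence else v for v in column]

def z_score(column):
    """Standardize a fully filled, non-constant column to zero mean and unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

# Hypothetical index columns from a training table.
table = {
    "registered_capital": [10, 10, 11, 11, 12, 12, 12, 13, 13, 500],
    "mostly_missing": [None] * 8 + [1, 2],
    "nearly_constant": [7] * 9 + [1],
}
kept = drop_poor_indexes(table)
marked = mark_upper_outliers(kept["registered_capital"])
standardized = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
```

In this sketch the sentinel stays in the column; in the described method it would subsequently be treated as a missing value and filled by the random forest step before standardization.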
Further optionally, in step S7, constructing, based on a neural network, the enterprise credit assessment model that fuses deep learning and logistic regression comprises three stages: determining the neural network, determining the activation functions of the neural network, and determining the weight search strategy of the neural network;
S7.1, stage of determining the neural network:
A multi-layer fully connected neural network is selected, comprising an input layer, hidden layers and an output layer, wherein the number of input-layer nodes equals the number of input preliminary modeling indexes, the number of output-layer nodes corresponds to the number of sample categories contained in the training sample, the number of hidden layers equals the number of output-layer nodes, and the number of nodes in each hidden layer is a multiple of the product of the number of input-layer nodes and the number of output-layer nodes;
S7.2, stage of determining the activation functions of the neural network:
The hidden-layer outputs of the neural network are activated by the ReLU function, the output layer is processed by the softmax activation function, and the output layer is fused with the logistic regression method, thereby constructing the enterprise credit assessment model that fuses deep learning and logistic regression;
S7.3, stage of determining the weight search strategy of the neural network:
Based on the enterprise credit assessment model constructed in step S7.2, determining the weight search strategy of the neural network involves four aspects: the loss function, the optimizer, the learning rate and the number of iterations, wherein
A) the categorical cross-entropy function (categorical_crossentropy) is chosen as the loss function of the enterprise credit assessment model,
B) the optimizer of the enterprise credit assessment model is tf.keras.optimizers.Adam, which searches for the optimal weights based on changes in the loss function,
C) the learning rate is set to 0.001,
D) the number of iterations of the enterprise credit assessment model is set to 10000.
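To make the structure of step S7 concrete, the following dependency-free Python sketch runs one forward pass through a tiny fully connected network with a ReLU hidden layer and a softmax output, and evaluates the categorical cross-entropy loss. The layer sizes, weights and input are made up for illustration and are not taken from the patent. The sketch also checks a standard identity: with two output classes, the softmax probability of the second class equals a logistic (sigmoid) function of the difference of the two logits, so the output layer behaves like a logistic regression on the hidden-layer features. Whether this is exactly what the text means by "fused with the logistic regression method" is an assumption.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer; weights[j][i] connects input i to output node j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(logits):
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)

# Hypothetical network: 3 input indexes, 4 hidden nodes, 2 classes (compliant, default).
W1 = [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2], [-0.3, 0.8, 0.1], [0.1, 0.1, 0.1]]
b1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[0.3, -0.6, 0.2, 0.1], [-0.4, 0.5, 0.7, -0.2]]
b2 = [0.05, -0.05]

x = [0.5, -1.2, 0.8]                         # one standardized sample
hidden = relu(dense(x, W1, b1))
logits = dense(hidden, W2, b2)
probs = softmax(logits)                      # [P(compliant), P(default)]
loss = categorical_crossentropy([0, 1], probs)

# Two-class softmax is a logistic regression on the hidden features.
p_default_logistic = sigmoid(logits[1] - logits[0])
```

In training, the weights would be adjusted by an optimizer such as Adam to reduce this loss over the training sample; the sketch shows only the forward computation.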
Further optionally, in step S8, training the enterprise credit assessment model constructed in step S7 and outputting the final modeling indexes and the optimal enterprise credit assessment model specifically includes:
S8.1, training the enterprise credit assessment model: the model constructed in step S7 is trained with the open-source TensorFlow and Keras packages, with Python as the development and training language; the training sample and the prediction sample are used for 10000 training iterations; the learning curve of the model is plotted during training, and the loss function, the training-sample accuracy and the prediction-sample accuracy are observed in order to judge whether the model has converged and whether it is overfitted;
S8.2, assessing the importance of the preliminary modeling indexes: (1) applying Z-Score standardization to the training sample after the missing values are filled, forming a standardized training sample vector containing the preliminary modeling indexes; (2) randomly generating a column of perturbation values to replace, one column at a time, each preliminary modeling index column in the training sample vector, thereby generating a new training sample vector, inputting the new vector into the determined neural network to obtain predicted values, and calculating the loss function from the input vector and the output predicted values; (3) for each modeling index, generating 100 perturbation columns in a loop and repeating step (2), calculating the mean of the loss function over the 100 newly generated training sample vectors, and evaluating the importance of each preliminary modeling index accordingly;
S8.3, iteratively tuning the enterprise credit assessment model: ranking the preliminary modeling indexes by the mean loss obtained, selecting different thresholds in turn to screen the preliminary modeling indexes, inputting the preliminary modeling indexes of the prediction sample into the enterprise credit assessment model for training, comparing the manual labels of the prediction sample with the predictions of the model, determining the final modeling indexes, and generating the optimal enterprise credit assessment model.
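The importance assessment of S8.2 can be sketched in Python as follows. A trained network is replaced here by a hypothetical stand-in model (`toy_model`) that depends only on the first index, the perturbation values are drawn from a standard normal distribution (an assumption that matches standardized inputs), and binary log loss is used in place of the network's loss function.

```python
import math
import random

def log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels and predicted default probabilities."""
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for y, p in zip(y_true, y_prob)
    ) / len(y_true)

def perturbation_importance(predict, rows, y_true, n_repeats=100, seed=0):
    """Mean loss after replacing each index column with random noise, one column at a time.

    A larger mean loss means the model relied more heavily on that index.
    """
    rng = random.Random(seed)
    mean_losses = []
    for col in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            perturbed = [list(row) for row in rows]
            for row in perturbed:
                row[col] = rng.gauss(0.0, 1.0)   # assumed noise distribution
            total += log_loss(y_true, [predict(row) for row in perturbed])
        mean_losses.append(total / n_repeats)
    return mean_losses

def toy_model(row):
    """Hypothetical stand-in for the trained network: only index 0 matters."""
    return 1.0 / (1.0 + math.exp(-3.0 * row[0]))

data_rng = random.Random(1)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y_true = [1 if row[0] > 0 else 0 for row in rows]

baseline = log_loss(y_true, [toy_model(row) for row in rows])
mean_losses = perturbation_importance(toy_model, rows, y_true)
ranking = sorted(range(len(mean_losses)), key=lambda c: mean_losses[c], reverse=True)
```

Ranking the indexes by mean loss, as in `ranking`, gives the ordering that S8.3 then screens with different thresholds.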
Optionally, in step S9, the enterprise default probability predicted by the optimal enterprise credit assessment model is converted into a standard credit score by either of the following two methods:
S9a, based on the WOE (weight of evidence) conversion method, calculating feature scores from the WOE values and the feature coefficients obtained from the optimal enterprise credit assessment model;
S9b, converting the enterprise default probability predicted by the optimal enterprise credit assessment model directly into a standard score.
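The text does not give the conversion formula for S9b. One widely used scorecard scaling, shown here as a Python sketch, is score = offset - factor × ln(odds), where odds = p / (1 - p) is the odds of default and factor = PDO / ln 2. The base score of 600, the base odds of 1:20 and the PDO (points to double the odds) of 20 are illustrative assumptions, not values from the patent.

```python
import math

def probability_to_score(p_default, base_score=600.0, base_odds=1 / 20, pdo=20.0):
    """Map a default probability to a credit score; a higher score means lower risk.

    base_score is the score assigned at default odds of base_odds, and the score
    rises by pdo points each time the default odds halve (all values assumed).
    """
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must lie strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score + factor * math.log(base_odds)
    odds = p_default / (1.0 - p_default)
    return offset - factor * math.log(odds)

score_at_base = probability_to_score(1 / 21)    # default odds 1:20
score_at_half = probability_to_score(1 / 41)    # default odds 1:40
score_risky = probability_to_score(0.5)         # default odds 1:1
```

With these assumed parameters, halving the default odds raises the score by 20 points, so lower default probabilities always map to higher scores.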
Compared with the prior art, the enterprise credit assessment method based on the fusion of deep learning and logistic regression has the following beneficial effects:
1) The method is based on the fusion of multi-source enterprise data: it merges, aligns and fuses the multi-source data and builds a multi-dimensional enterprise credit assessment system on that basis, giving richer assessment dimensions and more comprehensive assessment indexes and overcoming the one-sidedness of credit assessment based on a single data source;
2) The deep-learning-based neural network can mine the important features among multi-dimensional features and automatically extract and derive features to obtain more important information, overcoming the complexity and lack of automation of traditional methods for high-dimensional feature importance assessment and feature extraction; it is also suited to parallel processing of large-scale training samples, runs more efficiently, and broadens the implementation paths and scenarios for enterprise credit assessment based on high-dimensional features and massive training samples;
3) The enterprise credit assessment method can score the credit of enterprises and assist enterprises in obtaining financial credit; it is particularly suitable for risk control prediction over large numbers of enterprises with big data, and has broad application prospects.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the technical scheme, the technical problems to be solved and the technical effects of the invention clearer, the technical scheme of the invention is described clearly and completely below with reference to specific embodiments.
Embodiment one:
Referring to FIG. 1, this embodiment proposes an enterprise credit assessment method based on the fusion of deep learning and logistic regression, which includes the following steps:
S1, acquiring government data, internet data and third party data of a plurality of enterprises, and storing the government data, the internet data and the third party data of the same enterprise in the same relational data sub-table.
In this step, the government data of an enterprise includes information from industrial and commercial registration, the housing provident fund, social security, the Development and Reform Commission, the Banking and Insurance Regulatory Commission, and administrative penalties; the Internet data of the enterprise includes e-commerce data, marketing information, identification information, online store information, legal litigation, records of dishonest judgment debtors, and bidding information; the third-party data of the enterprise includes enterprise business information, personnel information and personnel relationship data.
S2, aggregating, aligning and fusing the relational data sub-tables of the plurality of enterprises into at least one relational data total table, and storing the relational data total table in a standard data warehouse, wherein the specific operations include:
S2.1, data aggregation stage: collecting enterprise data, which comprises the government data, Internet data and third-party data of an enterprise, wherein the government data of the enterprise is accessed through interfaces, the enterprise data covers enterprise background, e-commerce data, judgment documents, bidding and judicial data, and the third-party data of the enterprise is accessed through interfaces and covers enterprise business information, personnel information and personnel relationship data;
S2.2, data alignment stage: establishing a unified data standard specification, applying standardized management to the government data, Internet data and third-party data as they are loaded into the warehouse, and cleansing the data from the three sources with an ETL data governance tool;
S2.3, data fusion stage: performing horizontal and vertical data fusion on the government data, Internet data and third-party data of the plurality of enterprises, merging them into at least one relational data total table, and storing the at least one relational data total table uniformly in the standard data warehouse, which holds three kinds of information after the three-source fusion: the standard library data, the processed index library and the feature library.
S3, respectively screening data contained in all the relational data total tables in the standard data warehouse, and constructing an enterprise credit index system with three layers of indexes, wherein the specific operation comprises the following steps:
S3.1, based on the business objective of enterprise credit evaluation, reviewing all table fields of each relational data total table in the standard data warehouse to determine the original indexes;
S3.2, deriving the third-level index content from the original indexes;
S3.3, abstracting and summarizing the third-level indexes to form the second-level index content;
S3.4, analyzing the evaluation dimensions of enterprise credit reflected by the indexes, in combination with the content of the third-level and second-level indexes, to determine the content of the first-level indexes;
S3.5, constructing an enterprise credit index system covering three layers of indexes based on the third-level, second-level and first-level index content.
It follows from steps S3.1-S3.5 that the third-level, second-level and first-level indexes decrease in number in that order, wherein:
The third-level indexes are the specific enterprise credit indexes extracted from the relational data total table;
The second-level indexes are enterprise credit indexes obtained by classifying and organizing the third-level indexes with the help of business knowledge;
The first-level indexes are the indexes finally determined for evaluating the credit risk of an enterprise; they comprise 7 indexes, namely debt repayment, industry, operation, performance, region, cash flow and operation, and are used in the radar chart of the enterprise portrait to show the credit risk of the enterprise in each sub-dimension.
S4, manually labeling each enterprise as a default user or a compliant user based on the data contained in its relational data sub-table, and then marking the data related to default users and compliant users, respectively, in the relational data total table stored in the standard data warehouse.
S5, randomly dividing the default users and compliant users into a training sample and a prediction sample, wherein the training sample contains more users than the prediction sample, then splitting the relational data total table into two relational data tables according to the result of the random division, and storing the two resulting tables in the training sample and the prediction sample respectively.
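The random division in S5 can be sketched as follows. The sketch assumes that every row of the total table carries a hypothetical `enterprise_id` field, and it uses a 7:3 ratio only as an example of a split in which the training sample is the larger one; the function names are illustrative.

```python
import random

def split_enterprises(labels, train_ratio=0.7, seed=0):
    """Randomly assign whole enterprises to the training or prediction sample.

    labels maps enterprise id -> 1 (default user) or 0 (compliant user).
    """
    ids = sorted(labels)
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(ids)
    cut = round(len(ids) * train_ratio)
    return set(ids[:cut]), set(ids[cut:])

def split_table(total_table, train_ids):
    """Split the total table so every row of an enterprise lands on one side."""
    train_rows = [r for r in total_table if r["enterprise_id"] in train_ids]
    predict_rows = [r for r in total_table if r["enterprise_id"] not in train_ids]
    return train_rows, predict_rows

labels = {f"E{i}": int(i % 4 == 0) for i in range(20)}   # toy labels
total_table = [{"enterprise_id": e, "label": y} for e, y in labels.items()]
train_ids, predict_ids = split_enterprises(labels)
train_rows, predict_rows = split_table(total_table, train_ids)
```

Splitting by enterprise identifier rather than by row keeps all data of one enterprise in a single sample. This plain shuffle does not stratify by label; whether the method stratifies is not stated.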
S6, exploratory data analysis and data cleaning are carried out on the data contained in the relational data table of the training sample and the three-layer indexes of the enterprise credit index system, and a preliminary modeling index is determined.
In this step, the specific operations for conducting the exploratory data analysis are:
S6.1.1, computing descriptive statistics for the data contained in the relational data table of the training sample and for the three-layer indexes of the enterprise credit index system, for example the variance, mean, median and data distribution of each index;
S6.1.2, analyzing the descriptive statistics of step S6.1.1: treating indexes that contain time information as specific indexes and segmenting their descriptive data by time, so as to analyze in more depth how the data changes over time and what values it takes under specific conditions;
S6.1.3, plotting the histogram of each single variable and the curve relating each single variable to the target variable, so as to analyze the three-layer indexes visually.
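Steps S6.1.1 and S6.1.2 can be illustrated with a short sketch that describes one index overall and then per time period. The index name `lawsuit_count`, the period field `year` and both helper functions are hypothetical, and missing entries are assumed to be represented by `None`.

```python
import statistics

def describe(values):
    """Summarize one index, ignoring missing entries (None)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing_rate": 1 - len(present) / len(values),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "variance": statistics.pvariance(present),
    }

def describe_by_period(rows, index_name, period_name):
    """Segment a time-stamped index by period and describe each segment."""
    groups = {}
    for row in rows:
        groups.setdefault(row[period_name], []).append(row[index_name])
    return {period: describe(vals) for period, vals in sorted(groups.items())}

rows = [
    {"year": 2018, "lawsuit_count": 1},
    {"year": 2018, "lawsuit_count": 3},
    {"year": 2019, "lawsuit_count": None},
    {"year": 2019, "lawsuit_count": 4},
]
summary = describe_by_period(rows, "lawsuit_count", "year")
```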
In this step, the specific operation of data cleaning is:
S6.2.1, based on the data contained in the relational data table of the training sample, processing invalid values of the three-layer indexes of the enterprise credit index system;
S6.2.2, based on the data contained in the relational data table of the training sample, converting the quantifiable three-layer indexes of the enterprise credit index system into numerical form;
S6.2.3, based on the data contained in the relational data table of the training sample, computing missing-value statistics for the three-layer indexes of the enterprise credit index system, and removing indexes whose missing rate exceeds 60%;
S6.2.4, based on the data contained in the relational data table of the training sample, computing the same-value rate of the indexes remaining after step S6.2.3, removing features that take only one value, and removing indexes whose same-value rate exceeds 60%;
S6.2.5, for the indexes remaining after step S6.2.4, removing the unreasonable indexes identified during exploratory data analysis, and then performing VIF collinearity analysis;
S6.2.6, based on the missing-value statistics of step S6.2.3, calculating the proportion of missing data for the data records contained in the relational data table of the training sample, and removing records whose missing proportion exceeds 50%;
S6.2.7, based on the data remaining in the relational data table of the training sample after step S6.2.6, detecting outliers in the indexes remaining after step S6.2.5 using the interquartile range (IQR) method of the box plot, screening the outliers of some indexes against the upper-quartile criterion, and filling the screened outliers with the specific value '-999' so that they are treated as missing values;
S6.2.8, using the random forest (RandomForest) method: taking the indexes without missing values in the relational data table of the training sample after step S6.2.7 as feature variables, selecting each index with missing values in turn as the target variable, training a RandomForest model on the records in which the target variable is not missing, and using the trained RandomForest model to predict the missing values, thereby filling the missing indexes in all training samples;
S6.2.9, applying Z-Score standardization to the training sample after the missing values have been filled, forming a standardized training sample vector that contains the preliminary modeling indexes and is used to train the enterprise credit assessment model.
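The column-level screening of S6.2.3 and S6.2.4 and the standardization of S6.2.9 can be sketched as below. The sketch assumes missing entries are stored as `None` and that the same-value rate is measured over the non-missing entries, which the text does not specify; the table contents and function names are illustrative.

```python
import statistics
from collections import Counter

MISSING = None

def missing_rate(column):
    return sum(v is MISSING for v in column) / len(column)

def same_value_rate(column):
    """Share of the non-missing entries taken by the most frequent value."""
    present = [v for v in column if v is not MISSING]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)

def filter_indexes(table, max_missing=0.6, max_same=0.6):
    """Keep indexes whose missing rate and same-value rate stay within the thresholds."""
    return {name: col for name, col in table.items()
            if missing_rate(col) <= max_missing and same_value_rate(col) <= max_same}

def z_score(column):
    """Standardize a fully populated column to zero mean and unit variance."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

table = {
    "paid_in_capital": [100, 200, 300, 400, 500],
    "mostly_missing": [None, None, None, None, 1],
    "nearly_constant": [0, 0, 0, 0, 1],
}
kept = filter_indexes(table)
standardized = z_score(kept["paid_in_capital"])
```

Here both `mostly_missing` (80% missing) and `nearly_constant` (80% identical values) are dropped, and only `paid_in_capital` is standardized.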
S7, constructing an enterprise credit assessment model fused by deep learning and logistic regression based on a neural network, wherein the process comprises three stages of determining the neural network, determining an activation function of the neural network and determining a weight search strategy of the neural network;
S7.1, determining a neural network stage:
The neural network is a multi-layer fully-connected neural network comprising three parts, namely an input layer, hidden layers and an output layer, wherein the number of input-layer nodes equals the number of preliminary modeling indexes that are input, the number of output-layer nodes corresponds to the number of sample categories contained in the training samples, the number of hidden layers equals the number of output-layer nodes, and the number of nodes in a hidden layer is a multiple of the product of the number of input-layer nodes and the number of output-layer nodes;
S7.2, determining an activation function stage of the neural network:
The hidden-layer output of the neural network is activated by the ReLU function, the output layer of the neural network is processed by the softmax activation function, and the output layer of the neural network is fused with the logistic regression method, thereby constructing an enterprise credit assessment model that fuses deep learning and logistic regression;
S7.3, determining a weight search strategy stage of the neural network:
Based on the enterprise credit assessment model constructed in step S7.2, determining the weight search strategy of the neural network comprises determining four aspects, namely the loss function, the optimizer, the learning rate and the number of iterations, wherein,
A) The categorical cross-entropy function (categorical_crossentropy) is determined as the loss function of the enterprise credit assessment model,
B) The optimizer of the enterprise credit assessment model is determined to be tf.keras.optimizers.Adam, which searches for the optimal values of the weights based on the change in the loss function,
C) The learning rate is determined to be 0.001,
D) The number of iterations of the enterprise credit assessment model is determined to be 10000. During training of the neural network model, the number of iterations determines when the learning process ends.
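The combination of ReLU hidden layers, a softmax output layer and a categorical cross-entropy loss can be made concrete with a small forward pass written in plain Python. The toy weights and layer sizes below are invented for illustration. With two classes, softmax reduces to the logistic (sigmoid) function of the difference between the two logits, which is one way to read the statement that the output layer is fused with logistic regression; that reading is an interpretation, not something the text spells out.

```python
import math

def relu(vector):
    return [max(0.0, v) for v in vector]

def dense(vector, weights, bias):
    """Fully connected layer: weights holds one row of input weights per output node."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)

def forward(x, params):
    """ReLU hidden layers followed by a softmax output layer."""
    *hidden, (w_out, b_out) = params
    for w, b in hidden:
        x = relu(dense(x, w, b))
    return softmax(dense(x, w_out, b_out))

# Toy network: 2 inputs -> 2 hidden nodes -> 2 classes (compliant, default).
params = [
    ([[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]),
    ([[0.4, -0.6], [-0.3, 0.8]], [0.0, 0.0]),
]
probs = forward([1.0, 2.0], params)
loss = categorical_crossentropy([0, 1], probs)
```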
S8, training the enterprise credit assessment model constructed in the step S7 based on the training sample and the preliminary modeling index determined in the step S6, and outputting a final modeling index and an optimal enterprise credit assessment model, wherein the process specifically comprises the following steps:
S8.1, training the enterprise credit assessment model: using the open-source TensorFlow and Keras packages, with Python as the development and training language, to train the enterprise credit assessment model constructed in step S7; performing 10000 training iterations with the training sample and the prediction sample; plotting the learning curve of the model during training; observing the loss function, the training-sample accuracy and the prediction-sample accuracy during training; and finally judging whether the model has converged and whether it is properly fitted;
S8.2, evaluating the importance of the preliminary modeling indexes: (1) applying Z-Score standardization to the training sample after the missing values have been filled, forming a standardized training sample vector containing the preliminary modeling indexes; (2) randomly generating a column of perturbation variables and using it to replace, one column at a time, each index column of the preliminary modeling indexes in the training sample vector, thereby generating new training sample vectors, inputting each new training sample vector into the determined neural network to obtain predicted values, and calculating the loss function from the input vectors and the output predicted values; (3) for each modeling index, generating 100 perturbation variables in a loop and repeating step (2), calculating the average of the loss function over the 100 newly generated training sample vectors, and evaluating the importance of each preliminary modeling index from that average;
S8.3, iteratively tuning the enterprise credit assessment model: ranking the preliminary modeling indexes by the average loss obtained, screening them with a series of different thresholds, inputting the screened preliminary modeling indexes of the prediction sample into the enterprise credit assessment model for training, comparing the manual labels of the prediction sample with the predictions of the enterprise credit assessment model, determining the final modeling indexes, and generating the optimal enterprise credit assessment model.
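The perturbation-based importance evaluation of S8.2 can be sketched as follows. The sketch makes several assumptions: the perturbation column is drawn from a standard normal distribution (the text only says it is randomly generated), the loss is the binary form of cross-entropy, and `toy_model` is a stand-in for the trained network that deliberately uses only the first index.

```python
import math
import random

def log_loss(y_true, p_default):
    eps = 1e-12
    p = min(max(p_default, eps), 1 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def mean_loss(model, X, y):
    return sum(log_loss(t, model(row)) for row, t in zip(X, y)) / len(y)

def perturbation_importance(model, X, y, repeats=100, seed=0):
    """Average loss after replacing one column at a time with random noise.

    X is assumed to be Z-Score standardized, so the noise is drawn from N(0, 1).
    A larger average loss suggests the model relies more on that index.
    """
    rng = random.Random(seed)
    scores = []
    for col in range(len(X[0])):
        losses = []
        for _ in range(repeats):
            perturbed = [row[:col] + [rng.gauss(0, 1)] + row[col + 1:] for row in X]
            losses.append(mean_loss(model, perturbed, y))
        scores.append(sum(losses) / repeats)
    return scores

def toy_model(row):
    """Stand-in for the trained network: uses only the first index."""
    return 1 / (1 + math.exp(-3 * row[0]))

rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if row[0] > 0 else 0 for row in X]
importance = perturbation_importance(toy_model, X, y, repeats=20)
ranking = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
```

Perturbing the second column leaves the loss at its baseline because the toy model ignores it, so the first index ranks as the more important one. Thresholds of the kind used in S8.3 would then be applied to such a ranking.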
S9, predicting the default probability of an enterprise with the optimal enterprise credit assessment model trained in step S8, converting the default probability into a standard credit score, testing the overall distribution of enterprise credit scores for normality, and determining the final credit score of the enterprise. Two methods are available for converting the default probability into a standard credit score:
S9a, based on the WOE conversion method, calculating feature scores from the WOE values and the feature coefficients obtained from the optimal enterprise credit assessment model;
S9b, taking the enterprise default probability predicted by the optimal enterprise credit assessment model and converting it into a standard score according to that probability.
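The text does not give the formula used in S9b. A common scorecard convention, shown here purely as an assumption, maps the default probability to a score on a log-odds scale; the base score of 600, the base odds of 1:20 and the 50 points per halving of the odds are all invented for illustration.

```python
import math

def probability_to_score(p_default, base_score=600.0, base_odds=1 / 20, pdo=50.0):
    """Map a default probability to a score on a log-odds scale.

    base_score is the score assigned at default odds base_odds (default : compliant);
    every halving of the default odds adds pdo points. All three values are
    illustrative assumptions.
    """
    eps = 1e-12
    p = min(max(p_default, eps), 1 - eps)    # keep the logarithm finite
    odds = p / (1 - p)
    factor = pdo / math.log(2)
    return base_score - factor * math.log(odds / base_odds)

scores = [probability_to_score(p) for p in (0.01, 1 / 21, 0.5)]
```

Under these assumed parameters a default probability of 1/21 (odds 1:20) maps to the base score, and lower probabilities map to higher scores.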
The enterprise credit assessment method of the embodiment specifically comprises the following execution flow:
(1) Government data, internet data and third-party data of 200,000 enterprises are selected, and the three kinds of data of the same enterprise are stored in the same relational data sub-table, giving 200,000 relational data sub-tables in total.
(2) The data contained in the 200,000 relational data sub-tables is aggregated, aligned and fused to obtain one relational data total table, which is stored in a standard data warehouse.
(3) The data contained in the relational data total table is screened manually, and an enterprise credit index system with three layers of indexes is constructed, wherein: there are 1042 third-level indexes, such as the number of customs enterprise ratings obtained in the last year, paid-in capital, years in operation, personnel scale, whether the enterprise is on a blacklist, the number of "contract-abiding and trustworthy" ratings in the last year, the number of times listed as a judgment debtor in the last year, the amount subject to enforcement in the last year, the number of important business registration changes in the last year, the number of external guarantees in the last year, the number of ordinary business registration changes in the last three years, the number of court announcements in the last 6 months, the number of changes of business scope in the last 6 months, the cumulative number of branches, the number of deregistered branches, the number of operating branches, the cumulative number of related transactions, and the like; the second-level indexes are 17 indexes obtained by summarizing the third-level indexes, such as risk, legal representative, association relation, management layer, industry, legal, stability, collateral, management, region, and the like; the first-level indexes are the indexes finally determined for evaluating the credit risk of an enterprise and comprise 7 indexes in total, namely debt repayment, industry, operation, performance, region, cash flow and operation.
(4) Based on the data contained in the 200,000 relational data sub-tables, each enterprise is manually labeled as a default user or a compliant user, giving 50,000 default users and 150,000 compliant users; the data related to the 50,000 default users and the 150,000 compliant users is then marked accordingly in the relational data total table stored in the standard data warehouse.
(5) The 50,000 default users and 150,000 compliant users are randomly divided into a training sample and a prediction sample at a ratio of 7:3, with 32,000 default users and 108,000 compliant users assigned to the training sample and the remaining 18,000 default users and 42,000 compliant users assigned to the prediction sample; the corresponding data is then located in the relational data total table according to the result of the random division, and the total table is split into relational data table I, which is stored in the training sample, and relational data table II, which is stored in the prediction sample. It should be noted that all data of a given enterprise in the relational data total table is assigned either to the training sample or to the prediction sample.
(6) Exploratory data analysis is performed on the data contained in relational data table I of the training sample and on the three-layer indexes of the enterprise credit index system, and 11 unreasonable indexes are identified for removal;
Data cleaning is performed on the data contained in relational data table I of the training sample and on the three-layer indexes of the enterprise credit index system: 546 indexes remain after invalid-value processing, numerical quantization and missing-value processing; 40 indexes remain after same-value-rate screening; after removing the 11 unreasonable indexes identified during exploratory data analysis and then performing VIF collinearity analysis to remove correlated features, 17 modeling indexes remain; after the training samples are screened and filtered and outliers are detected, 10 of the 17 modeling indexes have missing values and 7 do not; the 7 indexes without missing values are taken as feature variables, each of the 10 indexes with missing values is selected in turn as the target variable, and a RandomForest model is trained on the training samples remaining after screening and filtering; the trained RandomForest model predicts the missing values, completing the filling of the missing indexes in all training samples; the 17 modeling indexes with their missing values filled serve as the preliminary modeling indexes.
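The outlier step of the cleaning procedure, which flags values above the upper box-plot fence and fills them with the sentinel -999, can be sketched as follows. The multiplier 1.5 is the usual box-plot convention and is an assumption here, as are the sample values and the function name; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics

SENTINEL = -999  # the specific value used in the text to mark screened outliers as missing

def flag_upper_outliers(column, k=1.5):
    """Replace values above Q3 + k * IQR with the sentinel.

    k = 1.5 is the usual box-plot convention and is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(column, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [SENTINEL if v > upper_fence else v for v in column]

branch_counts = [1, 2, 2, 3, 3, 4, 4, 5, 120]
cleaned = flag_upper_outliers(branch_counts)
```

Values marked with the sentinel are afterwards treated like any other missing value and filled by the random forest imputation step.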
(7) Based on the number of preliminary modeling indexes, the number of input-layer nodes of the neural network is determined to be 17; because the training samples contain positive and negative samples, the number of output-layer nodes and the number of hidden layers are each determined to be 2; and based on the learning speed, the number of preliminary modeling indexes and the number of hidden layers, the number of hidden-layer nodes is determined to be 17 × 2 × 100 × 5;
The most common neural network activation functions include Sigmoid, tanh, softplus and ReLU (Rectified Linear Unit); in this embodiment the hidden-layer output of the neural network is activated by the ReLU function, the output layer is processed by the softmax activation function, and the output layer is fused with the logistic regression method, thereby constructing an enterprise credit assessment model that fuses deep learning and logistic regression;
The weight search strategy of the neural network is determined through four aspects, namely the loss function, the optimizer, the learning rate and the number of iterations, wherein,
A) The categorical cross-entropy function (categorical_crossentropy) is determined as the loss function of the enterprise credit assessment model,
B) The optimizer of the enterprise credit assessment model is determined to be tf.keras.optimizers.Adam, which searches for the optimal values of the weights based on the change in the loss function,
C) The learning rate is determined to be 0.001,
D) The number of iterations of the enterprise credit assessment model is determined to be 10000. During training of the neural network model, the number of iterations determines when the learning process ends.
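How an Adam-type optimizer with a learning rate of 0.001 adjusts a weight over 10000 iterations can be illustrated on a single scalar weight. This is a plain-Python sketch of the standard Adam update rule, not the TensorFlow implementation; `beta1`, `beta2` and `eps` are commonly used defaults assumed here, and the quadratic toy loss is invented.

```python
import math

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar weight.

    lr = 0.001 is the learning rate named in the text; beta1, beta2 and eps are
    commonly used defaults and are assumptions here.
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])     # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])     # bias-corrected second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

# Minimize the toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
w, state = 0.0, {"t": 0, "m": 0.0, "v": 0.0}
first = adam_step(w, 2 * (w - 3), {"t": 0, "m": 0.0, "v": 0.0})
for _ in range(10000):                                  # iteration count from the text
    w = adam_step(w, 2 * (w - 3), state)
```

Because the update is normalized by the running gradient magnitude, each early step moves the weight by roughly the learning rate, which is why a small learning rate goes together with a large iteration count.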
(8) The enterprise credit assessment model is trained with the training sample and the prediction sample, and it is judged whether the model has converged and whether it is properly fitted; based on the preliminary modeling indexes with missing values filled, a column of perturbation variables is randomly generated and used to replace, one column at a time, each index column of the preliminary modeling indexes in the training sample vector, the new training sample vectors are input into the determined neural network to obtain predicted values, and the loss function is calculated from the input vectors and the output predicted values; 100 perturbation variables are generated in a loop for each modeling index, the average of the loss function over the 100 newly generated training sample vectors is calculated, and the importance of each preliminary modeling index is evaluated; the preliminary modeling indexes are screened by setting different thresholds, the enterprise credit assessment model is trained with the preliminary modeling indexes of the prediction sample, the final modeling indexes are determined by comparing the manual labels of the prediction sample with the predictions of the enterprise credit assessment model, and the optimal enterprise credit assessment model is generated.
(9) When the credit of an enterprise is evaluated, the government data, internet data and third-party data of the enterprise are aggregated, aligned and fused into a relational data sub-table, the data of the sub-table is input into the optimal enterprise credit assessment model, and the model predicts the default probability of the enterprise; then, either
A. based on the WOE conversion method, feature scores are calculated from the WOE values and the feature coefficients obtained from the optimal enterprise credit assessment model,
or
B. the enterprise default probability predicted by the optimal enterprise credit assessment model is converted into a standard score,
so as to obtain a standard credit score; the overall distribution of enterprise credit scores is then tested for normality to determine the final enterprise credit score.
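Option A relies on weight-of-evidence (WOE) values. The sketch below uses the usual definition of WOE per bin and a conventional scorecard rule for turning a coefficient and a WOE value into points. The bins, counts, coefficient and scaling constant are all hypothetical, and the sign convention (safer bins positive) is one of several in use.

```python
import math

def weight_of_evidence(bins):
    """WOE per bin, defined here as ln(share of compliant / share of default).

    bins maps a bin label to (compliant_count, default_count). Other sign
    conventions exist; this one makes safer bins positive.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    return {label: math.log((good / total_good) / (bad / total_bad))
            for label, (good, bad) in bins.items()}

def bin_points(woe, coefficient, pdo=50.0):
    """Score contribution of each bin: -factor * coefficient * WOE."""
    factor = pdo / math.log(2)
    return {label: -factor * coefficient * value for label, value in woe.items()}

# Hypothetical bins for 'number of lawsuits in the last year'.
bins = {"0": (900, 100), "1-2": (500, 200), ">=3": (100, 200)}
woe = weight_of_evidence(bins)
points = bin_points(woe, coefficient=-0.8)   # illustrative coefficient on the WOE feature
```

A bin with no default or no compliant enterprises would make the logarithm undefined, so practical implementations usually smooth the counts first.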
In summary, the enterprise credit assessment method based on the fusion of deep learning and logistic regression overcomes the one-sided credit assessment that results from relying on a single data source, improves the accuracy of enterprise credit scoring, and provides an important basis for assessing the financial credit of enterprises.
Based on the above-mentioned embodiments of the present invention, any improvements and modifications made by those skilled in the art without departing from the principles of the present invention should fall within the scope of the present invention.

Claims (9)

1. The enterprise credit assessment method based on the fusion of deep learning and logistic regression is characterized by comprising the following steps:
S1, acquiring government data, internet data and third party data of a plurality of enterprises, and storing the government data, the internet data and the third party data of the same enterprise in the same relational data sub-table;
S2, gathering, aligning and fusing relational data sub-tables of a plurality of enterprises into at least one relational data total table, and storing the relational data total table in a standard data warehouse;
S3, respectively screening data contained in all the relational data total tables in the standard data warehouse, and constructing an enterprise credit index system with three layers of indexes;
S4, manually labeling each enterprise as a default user or a compliant user based on the data contained in its relational data sub-table, and then marking the data related to default users and compliant users, respectively, in the relational data total table stored in the standard data warehouse;
S5, randomly dividing the default users and compliant users into a training sample and a prediction sample, wherein the training sample contains more users than the prediction sample, then splitting the relational data total table into two relational data tables according to the result of the random division, and storing the two resulting tables in the training sample and the prediction sample respectively;
S6, performing exploratory data analysis and data cleaning on the data contained in the relational data table of the training sample and on the three-layer indexes of the enterprise credit index system, and determining the preliminary modeling indexes;
S7, constructing, based on a neural network, an enterprise credit assessment model that fuses deep learning and logistic regression, comprising three stages of determining the neural network, determining the activation function of the neural network and determining the weight search strategy of the neural network,
S7.1, determining a neural network stage:
The neural network is a multi-layer fully-connected neural network comprising three parts, namely an input layer, hidden layers and an output layer, wherein the number of input-layer nodes equals the number of preliminary modeling indexes that are input, the number of output-layer nodes corresponds to the number of sample categories contained in the training samples, the number of hidden layers equals the number of output-layer nodes, and the number of nodes in a hidden layer is a multiple of the product of the number of input-layer nodes and the number of output-layer nodes,
S7.2, determining an activation function stage of the neural network:
The hidden-layer output of the neural network is activated by the ReLU function, the output layer of the neural network is processed by the softmax activation function, and the output layer of the neural network is fused with the logistic regression method, thereby constructing an enterprise credit assessment model that fuses deep learning and logistic regression,
S7.3, determining a weight search strategy stage of the neural network:
Based on the enterprise credit assessment model constructed in step S7.2, determining a weight search strategy stage of the neural network includes determining four aspects of a loss function, an optimizer, a learning rate and a number of iterations, wherein,
A) The categorical cross-entropy function (categorical_crossentropy) is determined as the loss function of the enterprise credit assessment model,
B) The optimizer of the enterprise credit assessment model is determined to be tf.keras.optimizers.Adam, which searches for the optimal values of the weights based on the change in the loss function,
C) The learning rate was determined to be 0.001,
D) Determining that the iteration number of the enterprise credit assessment model is 10000 times;
S8, training the enterprise credit assessment model constructed in step S7 based on the training sample and the preliminary modeling indexes determined in step S6, and outputting the final modeling indexes and the optimal enterprise credit assessment model;
S9, predicting the default probability of the enterprise based on the optimal enterprise credit evaluation model trained in the step S8, converting the default probability into a standard credit score, carrying out normal distribution inspection on the distribution of the credit scores of the whole enterprise, and determining the final credit score of the enterprise.
2. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein in step S1,
The government data of an enterprise includes information from industry and commerce administration, the housing provident fund, social security, the Development and Reform Commission, banking and insurance regulators, and administrative penalties;
The internet data of the enterprise includes e-commerce data, marketing information, identification information, online store information, legal litigation, dishonest judgment debtor records, and bidding information;
The third party data of the enterprise comprises enterprise business information, personnel information and personnel relationship data.
3. The enterprise credit assessment method based on the combination of deep learning and logistic regression according to claim 2, wherein in step S2, the relational data sub-tables of the multiple enterprises are collected, aligned and combined to obtain at least one relational data total table, which specifically comprises the following operations:
S2.1, data aggregation stage: collecting enterprise data, comprising the government data, internet data and third-party data of an enterprise, wherein the government data is accessed through interfaces and covers the housing provident fund, social security, industry and commerce, taxation, food and drug supervision, and banking and insurance supervision; the internet data covers enterprise background, e-commerce data, judgment documents, bidding and judicial data; and the third-party data is accessed through interfaces and covers enterprise business information, personnel information, and data on the relationships between personnel and enterprises;
S2.2, data alignment stage: establishing a unified data standard specification, applying standardized management to the government data, internet data and third-party data as they are loaded into the warehouse, and governing the three kinds of data with an ETL data governance tool;
S2.3, data fusion stage: performing horizontal and vertical data fusion on the government data, internet data and third-party data of the plurality of enterprises, fusing and converging them into at least one relational data total table, and storing the at least one relational data total table uniformly in a standard data warehouse, wherein the standard data warehouse stores three kinds of information obtained after fusing the three kinds of data: the standard library data, the processed index library and the feature library.
4. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein step S3 constructs an enterprise credit index system with three layers of indexes, and the specific operations thereof include:
S3.1, based on the business objective of enterprise credit evaluation, reviewing all table fields of each relational data total table in the standard data warehouse to determine the original indexes,
S3.2, deriving the third-level index content from the original indexes,
S3.3, abstracting and summarizing the third-level indexes to form the second-level index content,
S3.4, analyzing the evaluation dimension of the enterprise credit embodied by the index by combining the contents of the third-level index and the second-level index to determine the content of the first-level index,
S3.5, constructing an enterprise credit index system covering three layers of indexes based on the three-level index content, the two-level index content and the first-level index content.
5. The enterprise credit assessment method based on deep learning and logistic regression fusion according to claim 4, wherein the contents of the tertiary, secondary and primary indices decrease in sequence, wherein,
The content of the three-level index comprises specific enterprise credit indexes extracted through a relational data total table;
The content of the second-level index is enterprise credit index which is integrated with business knowledge classification and arrangement based on the third-level index;
The content of the first-level index is an index for evaluating final determination of credit risk of an enterprise, and the first-level index comprises 7 indexes including repayment, industry, operation, performance, area, cash flow and operation, and is applied to radar chart display of an enterprise portrait for evaluating credit risk conditions of the enterprise in each subdivision dimension.
6. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein in step S6, exploratory data analysis is performed on the data contained in the relational data table of the training samples and on the three-layer indexes of the enterprise credit index system, the specific operations comprising:
S6.1.1, computing descriptive statistics for the data contained in the relational data table of the training samples and for the three-layer indexes of the enterprise credit index system;
S6.1.2, analyzing the descriptive statistics of step S6.1.1, namely taking indexes that contain time information as specific indexes and splitting the descriptive data by those indexes, so as to analyze in more depth how the data changes over time and what values it takes under specific conditions;
S6.1.3, drawing a histogram of each single variable and a curve relating each single variable to the target variable, so as to analyze the three-layer indexes visually.
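The three exploratory steps can be sketched with the Python standard library. The field names (`year`, `overdue_count`, `default`), the bin edges and the use of `None` for a missing value are assumptions made for illustration; `target_rate_by_bin` produces the tabular data behind a variable-versus-target curve rather than the plot itself.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, median, pstdev


def describe(values):
    """Descriptive statistics for one index (needs at least one present value)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "std": pstdev(present),
        "min": min(present),
        "max": max(present),
    }


def describe_by_period(rows, index_name, period_field="year"):
    """Split one index by a time field and describe each slice (step S6.1.2)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[period_field]].append(row.get(index_name))
    return {period: describe(vals) for period, vals in sorted(groups.items())}


def target_rate_by_bin(rows, index_name, edges, target_field="default"):
    """Mean of the 0/1 target inside each bin of one index (step S6.1.3)."""
    hits = defaultdict(list)
    for row in rows:
        value = row.get(index_name)
        if value is not None:
            hits[bisect_right(edges, value)].append(row[target_field])
    return {bin_id: mean(targets) for bin_id, targets in sorted(hits.items())}


rows = [
    {"year": 2018, "overdue_count": 0, "default": 0},
    {"year": 2018, "overdue_count": 2, "default": 1},
    {"year": 2019, "overdue_count": None, "default": 0},
    {"year": 2019, "overdue_count": 4, "default": 1},
]
by_year = describe_by_period(rows, "overdue_count")
rates = target_rate_by_bin(rows, "overdue_count", edges=[1, 3])
```

Bin `0` collects values below the first edge, bin `1` values between the two edges, and so on; plotting `rates` against the bin index gives the relation curve described in step S6.1.3.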
7. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 6, wherein in step S6, data cleaning is performed on the three-layer indexes of the enterprise credit index system and on the data contained in the relational data table of the training samples, specifically comprising the following steps:
S6.2.1, based on the data contained in the relational data table of the training samples, processing invalid values in the three-layer indexes of the enterprise credit index system,
S6.2.2, based on the data contained in the relational data table of the training samples, converting the quantifiable indexes among the three-layer indexes of the enterprise credit index system into numerical values,
S6.2.3, based on the data contained in the relational data table of the training samples, computing missing-value statistics for the three-layer indexes of the enterprise credit index system, and removing indexes whose missing-value rate exceeds 60%,
S6.2.4, based on the data contained in the relational data table of the training samples, computing the same-value rate of the indexes remaining in the enterprise credit index system after step S6.2.3, removing features whose attribute takes only one value, and removing indexes whose same-value rate exceeds 60%,
S6.2.5, for the indexes remaining after step S6.2.4, first removing the unreasonable indexes identified during exploratory data analysis, and then performing VIF (variance inflation factor) collinearity analysis,
S6.2.6, based on the missing-value statistics performed in step S6.2.3, calculating the proportion of missing data in the data contained in the relational data table of the training samples, and removing data sets whose missing proportion exceeds 50%;
S6.2.7, based on the data remaining in the relational data table of the training samples after step S6.2.6, detecting abnormal values in the indexes remaining in the enterprise credit index system after step S6.2.5 using the interquartile range method of a box plot, screening out the abnormal values of some indexes according to the upper quartile criterion, and filling the screened abnormal values with the specific value '-999' so that they are treated as missing values,
S6.2.8, using a random forest (RandomForest) method: taking the indexes without missing values in the relational data table of the training samples after step S6.2.7 as feature variables, selecting in turn each index with missing values as the target variable, training a RandomForest model on the samples in which the target variable is not missing, and using the trained RandomForest model to predict the missing values of that index, thereby completing the filling of missing indexes in all training samples;
S6.2.9, performing Z-Score standardization on the training samples whose missing values have been filled, to form standardized training sample vectors containing the preliminary modeling indexes, which are used to train the enterprise credit assessment model.
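Several of the cleaning steps above reduce to short column-wise computations, sketched here with the standard library. The 60% thresholds and the '-999' fill value come from the claim; the 1.5 × IQR upper fence, the use of `None` for a missing value, computing the same-value rate over present values only, and the population standard deviation in the Z-Score are assumptions. The VIF analysis of step S6.2.5 and the random forest filling of step S6.2.8 are not sketched because they need a regression or tree library.

```python
from collections import Counter
from statistics import mean, pstdev, quantiles

MISSING = None        # assumed marker for an absent value
OUTLIER_FILL = -999   # fill value named in step S6.2.7


def missing_rate(values):
    """Share of missing entries in one index column (step S6.2.3)."""
    return sum(v is MISSING for v in values) / len(values)


def same_value_rate(values):
    """Share of the most frequent value among present entries (step S6.2.4)."""
    present = [v for v in values if v is not MISSING]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)


def drop_weak_indexes(table, max_missing=0.6, max_same=0.6):
    """Keep only columns that pass both thresholds; table maps name -> values."""
    return {
        name: values
        for name, values in table.items()
        if missing_rate(values) <= max_missing
        and same_value_rate(values) <= max_same
    }


def flag_upper_outliers(values, k=1.5):
    """Replace values above Q3 + k * IQR with the fill value (step S6.2.7)."""
    present = [v for v in values if v is not MISSING]
    q1, _, q3 = quantiles(present, n=4)
    upper = q3 + k * (q3 - q1)
    return [OUTLIER_FILL if v is not MISSING and v > upper else v
            for v in values]


def z_score(values):
    """Z-Score standardization of a fully filled column (step S6.2.9)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


table = {
    "mostly_missing": [1, None, None, None, None],
    "mostly_constant": [7, 7, 7, 7, 1],
    "usable": [1, 2, 3, 4, 100],
}
kept = drop_weak_indexes(table)
flagged = flag_upper_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
standardized = z_score([1, 2, 3])
```

For the collinearity check, the variance inflation factor of index j is VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing index j on the remaining indexes; the claim does not state the VIF cut-off it uses.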
8. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 7, wherein in step S8, the enterprise credit assessment model constructed in step S7 is trained, and the final modeling indexes and the optimal enterprise credit assessment model are output, the process specifically comprising:
S8.1, training the enterprise credit assessment model: training the enterprise credit assessment model constructed in step S7 with the open-source TensorFlow and Keras packages, with Python selected as the development and training language of the enterprise credit assessment model; selecting training samples and prediction samples to perform 10,000 training iterations of the enterprise credit assessment model; drawing a learning curve of the enterprise credit assessment model during training; observing the loss function, the training sample accuracy and the prediction sample accuracy during training; and finally judging whether the enterprise credit assessment model converges and whether it overfits;
S8.2, evaluating the importance of the preliminary modeling indexes: (1) performing Z-Score standardization on the training samples whose missing values have been filled, to form standardized training sample vectors containing the preliminary modeling indexes; (2) randomly generating a column of disturbance variables and using it to replace, one at a time, each column index vector of the preliminary modeling indexes in the training sample vectors, thereby generating new training sample vectors, inputting the new training sample vectors into the determined neural network to obtain predicted values, and calculating a loss function from the input vectors and the output predicted values; (3) for each preliminary modeling index, generating 100 disturbance variables in a loop and repeating step (2), calculating the average of the loss function over the 100 newly generated training sample vectors, and evaluating the importance of each preliminary modeling index from that average;
S8.3, iteratively tuning the enterprise credit assessment model: sorting the preliminary modeling indexes by the average value of the obtained loss function, selecting different thresholds in turn to screen the preliminary modeling indexes, inputting the screened preliminary modeling indexes of the prediction samples into the enterprise credit assessment model for training, comparing the manual labeling results of the prediction samples with the prediction results of the enterprise credit assessment model, determining the final modeling indexes, and generating the optimal enterprise credit assessment model.
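The disturbance-based importance evaluation of step S8.2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the trained network is replaced by an arbitrary `predict` callable, the disturbance column is drawn from a standard normal distribution, and the loss is binary cross-entropy against the sample labels. All three choices are assumptions, since the claim specifies neither the disturbance distribution nor the loss function.

```python
import math
import random
from statistics import mean


def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy averaged over samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def perturbation_importance(predict, X, y, n_rounds=100, seed=0):
    """Mean loss after replacing each column with random disturbance values.

    predict: callable mapping a list of rows to default probabilities
             (a stand-in for the trained network).
    X: list of rows (each a list of standardized index values); y: 0/1 labels.
    """
    rng = random.Random(seed)
    importances = []
    for col in range(len(X[0])):
        losses = []
        for _ in range(n_rounds):
            # new sample vectors: column `col` replaced by a disturbance column
            noisy = [row[:col] + [rng.gauss(0.0, 1.0)] + row[col + 1:]
                     for row in X]
            losses.append(log_loss(y, predict(noisy)))
        importances.append(mean(losses))
    return importances


def toy_predict(rows):
    """Hypothetical model that only looks at the first index."""
    return [1.0 / (1.0 + math.exp(-4.0 * row[0])) for row in rows]


X = [[-1.0, 0.3], [1.0, -0.2], [-1.5, 0.1], [1.2, -0.4]]
y = [0, 1, 0, 1]
baseline = log_loss(y, toy_predict(X))
importance = perturbation_importance(toy_predict, X, y)
```

A larger mean loss after disturbance indicates that the model relies more heavily on that index. In the toy example the second index is ignored by the model, so its mean loss stays at the baseline, while disturbing the first index raises the loss sharply; sorting by these averages gives the ranking used in step S8.3.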
9. The enterprise credit assessment method based on the fusion of deep learning and logistic regression according to claim 1, wherein in step S9, the enterprise default probability predicted by the optimal enterprise credit assessment model is converted into a standard credit score by the following two methods:
S9a, based on a WOE (weight of evidence) conversion method, calculating feature scores from the WOE values and the feature coefficients obtained from the optimal enterprise credit assessment model;
S9b, based on the enterprise default probability predicted by the optimal enterprise credit assessment model, converting the default probability into a standard score.
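The claim does not give the conversion formulas, so the sketch below uses the conventional scorecard scaling as an assumption: score = offset + factor × ln(good odds), with factor = PDO / ln 2. The parameters `pdo`, `base_score` and `base_odds`, and the coefficients in the example, are hypothetical. It also assumes the model has the logistic form logit(p_default) = intercept + Σ coef_i × WOE_i, under which the WOE-based feature scores (S9a) and the probability-based score (S9b) sum to the same total.

```python
import math


def scaling(pdo=20.0, base_score=600.0, base_odds=50.0):
    """Return (factor, offset) so that good odds of base_odds map to
    base_score and every doubling of the odds adds pdo points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_probability(p_default, factor, offset):
    """S9b: convert a predicted default probability into a score."""
    good_odds = (1.0 - p_default) / p_default
    return offset + factor * math.log(good_odds)


def score_from_woe(intercept, coefs, woes, factor, offset):
    """S9a: per-feature points from coefficients and WOE values."""
    base_points = offset - factor * intercept
    feature_points = [-factor * b * w for b, w in zip(coefs, woes)]
    return base_points + sum(feature_points), feature_points


factor, offset = scaling()

# Hypothetical fitted model and one enterprise's WOE-encoded features.
intercept, coefs, woes = -2.0, [0.8, 1.2], [0.5, -0.3]
logit = intercept + sum(b * w for b, w in zip(coefs, woes))
p_default = 1.0 / (1.0 + math.exp(-logit))

total_a, points = score_from_woe(intercept, coefs, woes, factor, offset)
total_b = score_from_probability(p_default, factor, offset)
```

Under these assumptions `total_a` and `total_b` agree up to rounding, and `points` shows how much each feature contributes, which is what makes the WOE route useful for explaining a score.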
CN202010868081.2A 2020-08-26 2020-08-26 Enterprise credit assessment method based on fusion of deep learning and logistic regression Active CN112017025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010868081.2A CN112017025B (en) 2020-08-26 2020-08-26 Enterprise credit assessment method based on fusion of deep learning and logistic regression

Publications (2)

Publication Number Publication Date
CN112017025A (en) 2020-12-01
CN112017025B (en) 2024-05-14

Family

ID=73503117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010868081.2A Active CN112017025B (en) 2020-08-26 2020-08-26 Enterprise credit assessment method based on fusion of deep learning and logistic regression

Country Status (1)

Country Link
CN (1) CN112017025B (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364182A (en) * 2020-12-09 2021-02-12 交通银行股份有限公司 Graph feature-based enterprise risk conduction prediction method and device and storage medium
CN112767126A (en) * 2021-01-21 2021-05-07 诺亚阿客(上海)网络科技有限公司 Collateral grading method and device based on big data
CN112783884A (en) * 2021-01-29 2021-05-11 浪潮软件股份有限公司 Data optimization method based on normal distribution
CN112906772A (en) * 2021-02-04 2021-06-04 深圳前海微众银行股份有限公司 Sample processing method, device, equipment and computer readable storage medium
CN113011752A (en) * 2021-03-19 2021-06-22 天道金科股份有限公司 Enterprise credit evaluation index system based on big data
CN112990946B (en) * 2021-03-31 2024-05-14 建信金融科技有限责任公司 Enterprise default prediction method, device, medium and electronic equipment
CN113298221B (en) * 2021-04-26 2023-08-22 上海淇玥信息技术有限公司 User Risk Prediction Method and Device Based on Logistic Regression and Graph Neural Network
CN113239199B (en) * 2021-05-18 2022-09-23 重庆邮电大学 Credit classification method based on multi-party data set
CN113283583A (en) * 2021-05-18 2021-08-20 广州致景信息科技有限公司 Method and device for predicting default rate of textile industry, storage medium and processor
CN113610630A (en) * 2021-08-06 2021-11-05 东方口岸科技有限公司 Financial credit modeling method and system based on import and export trade data
CN113449819A (en) * 2021-08-27 2021-09-28 中国测绘科学研究院 Credit evaluation model method based on capsule network and storage medium thereof
CN113643125A (en) * 2021-08-30 2021-11-12 天元大数据信用管理有限公司 Credit line measuring and calculating method, equipment and medium
CN113822542A (en) * 2021-08-30 2021-12-21 天元大数据信用管理有限公司 Enterprise credit investigation platform construction method based on government affair big data
CN114266641A (en) * 2021-09-27 2022-04-01 东方微银科技股份有限公司 Scoring model construction method based on logistic regression and rules
CN114462516B (en) * 2022-01-21 2024-04-16 天元大数据信用管理有限公司 Enterprise credit scoring sample labeling method and device
CN114742238B (en) * 2022-06-14 2022-09-09 四川省郫县豆瓣股份有限公司 Method, device, equipment and medium for screening raw materials of thick broad-bean sauce
CN115471056B (en) * 2022-08-31 2023-05-23 鼎翰文化股份有限公司 Data transmission method and data transmission system
CN115545880A (en) * 2022-09-02 2022-12-30 睿智合创(北京)科技有限公司 Product evaluation method and system applied to credit field
CN115330531B (en) * 2022-09-05 2023-12-22 南方电网数字电网研究院有限公司 Enterprise risk prediction method based on electricity consumption fluctuation period
CN115456753A (en) * 2022-09-07 2022-12-09 安徽省优质采科技发展有限责任公司 Enterprise credit information analysis method and system for bidding platform
CN116596095B (en) * 2023-07-17 2023-11-07 华能山东泰丰新能源有限公司 Training method and device of carbon emission prediction model based on machine learning
CN116645014A (en) * 2023-07-27 2023-08-25 湖南华菱电子商务有限公司 Provider supply data model construction method based on artificial intelligence
CN116757837A (en) * 2023-08-22 2023-09-15 国泰新点软件股份有限公司 Credit wind control method and system applied to winning bid
CN117151867B (en) * 2023-09-20 2024-04-30 江苏数诚信息技术有限公司 Enterprise exception identification method and system based on big data
CN117149293B (en) * 2023-10-30 2024-01-23 北京谷器数据科技有限公司 Personalized configuration method for operating system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880934A (en) * 2012-09-07 2013-01-16 中国标准化研究院 Integrity evaluation method for food enterprise
CN105913195A (en) * 2016-04-29 2016-08-31 浙江汇信科技有限公司 All-industry data based enterprise's financial risk scoring method
WO2018090657A1 (en) * 2016-11-18 2018-05-24 同济大学 Bp_adaboost model-based method and system for predicting credit card user default
CN110163467A (en) * 2019-04-02 2019-08-23 苏州纤联电子商务有限公司 A kind of risk quantification modeling method based on textile industry medium-sized and small enterprises credit
CN110580268A (en) * 2019-08-05 2019-12-17 西北大学 Credit scoring integrated classification system and method based on deep learning
WO2020020088A1 (en) * 2018-07-23 2020-01-30 第四范式(北京)技术有限公司 Neural network model training method and system, and prediction method and system
CN111080442A (en) * 2019-12-21 2020-04-28 湖南大学 Credit scoring model construction method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
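A deep learning approach for credit scoring using credit default swaps; Cuicui Luo et al.; Engineering Applications of Artificial Intelligence; 20160615; Vol. 65; 465-470 *
Enterprise credit assessment model based on neural networks; Zhang Dedong, Zhang Qiang; Transactions of Beijing Institute of Technology; 20041130; Vol. 24 (No. 11); 982-985 *
Credit risk assessment of P2P companies based on a weight-of-evidence logistic regression model; Wang Jinzhu; China Master's Theses Full-text Database, Economics and Management Sciences; 20170315 (No. 03); J157-534 *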

Also Published As

Publication number Publication date
CN112017025A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
CN112017025B (en) Enterprise credit assessment method based on fusion of deep learning and logistic regression
CN110704572B (en) Suspected illegal fundraising risk early warning method, device, equipment and storage medium
CN110334212A (en) A kind of territoriality audit knowledge mapping construction method based on machine learning
CN109446416B (en) Law recommendation method based on word vector model
CN111882446A (en) Abnormal account detection method based on graph convolution network
CN112613977A (en) Personal credit loan admission credit granting method and system based on government affair data
CN113837859B (en) Image construction method for small and micro enterprises
CN112767136A (en) Credit anti-fraud identification method, credit anti-fraud identification device, credit anti-fraud identification equipment and credit anti-fraud identification medium based on big data
CN112053234A (en) Enterprise credit rating method based on macroscopic region economic index and microscopic factor
CN109345133B (en) Review method based on big data and deep learning and robot system
Samsir et al. Predicting the loan risk towards new customer applying data mining using nearest neighbor algorithm
CN114202243A (en) Engineering project management risk early warning method and system based on random forest
CN111951050A (en) Financial product recommendation method and device
CN112232944A (en) Scoring card creating method and device and electronic equipment
CN115794803A (en) Engineering audit problem monitoring method and system based on big data AI technology
CN115221387A (en) Enterprise information integration method based on deep neural network
CN114912772A (en) Urban right transparency differential evaluation system matching method and system based on urban economic classification analysis
CN113077271A (en) Enterprise credit rating method and device based on BP neural network
CN117114812A (en) Financial product recommendation method and device for enterprises
Kun et al. Default identification of p2p lending based on stacking ensemble learning
CN116823487A (en) ESG evaluation system investment decision-making system
CN115330526A (en) Enterprise credit scoring method and device
CN115796635A (en) Bank digital transformation maturity evaluation system based on big data and machine learning
CN114820074A (en) Target user group prediction model construction method based on machine learning
Kulothungan Loan Forecast by Using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant