CN117522561A - Enterprise credit risk determination method and device - Google Patents

Enterprise credit risk determination method and device

Info

Publication number
CN117522561A
Authority
CN
China
Prior art keywords
data
enterprise
credit risk
credit
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311549551.9A
Other languages
Chinese (zh)
Inventor
彭丰年
张宜
方松
傅士光
陈晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING JOIN-CHEER SOFTWARE CO LTD
Original Assignee
BEIJING JOIN-CHEER SOFTWARE CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING JOIN-CHEER SOFTWARE CO LTD
Priority to CN202311549551.9A
Publication of CN117522561A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155: Bayesian classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Technology Law (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a method and a device for determining enterprise credit risk and relates to the technical field of artificial intelligence. The method comprises: acquiring enterprise business condition data and operation management data of an enterprise to be predicted, and obtaining first credit risk information of the enterprise to be predicted by applying a trained credit risk prediction model to that data, wherein the credit risk prediction model is obtained by training a naive Bayes model on the enterprise business condition data, operation management data and credit evaluation data of historical transportation enterprises; performing structured analysis on the enterprise business condition data and operation management data of the enterprise to be predicted to obtain second credit risk information; and obtaining comprehensive risk information of the enterprise to be predicted from the first credit risk information and the second credit risk information. The method and the device can improve both the accuracy and the efficiency of enterprise credit risk determination.

Description

Enterprise credit risk determination method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an enterprise credit risk determining method and device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
With the establishment of the market economy and of market-oriented, enterprise-based operation, credit has gradually become an important basis for activities such as bidding and administrative licensing. To keep an enterprise developing in a healthy and sustainable way, a rigorous credit management system is needed that identifies credit risks, proactively determines them and issues reminders, and reduces credit losses.
With the rapid development of information technology, data sources keep multiplying. On the one hand, this allows the credit condition of transportation enterprises to be described more accurately and scientifically; on the other hand, the large number of sources and their complex structure challenge traditional credit investigation techniques. Most software products currently on the market lack a reminder function for determining the risks of transportation enterprises, so enterprises essentially accept a reduction in credit grade passively. As a result, they fail to meet the relevant requirements in bidding and similar activities, and their operations suffer.
Disclosure of Invention
An embodiment of the invention provides an enterprise credit risk determination method for rapidly determining the credit risk of an enterprise and improving both the accuracy and the efficiency of that determination. The method comprises the following steps:
acquiring enterprise business condition data and operation management data of an enterprise to be predicted, and obtaining first credit risk information of the enterprise to be predicted by applying a trained credit risk prediction model to that data, wherein the credit risk prediction model is obtained by training a naive Bayes model on the enterprise business condition data, operation management data and credit evaluation data of historical transportation enterprises;
performing structured analysis on the enterprise business condition data and operation management data of the enterprise to be predicted to obtain second credit risk information;
and obtaining comprehensive risk information of the enterprise to be predicted from the first credit risk information and the second credit risk information.
An embodiment of the invention also provides an enterprise credit risk determination device for rapidly determining the credit risk of an enterprise and improving both the accuracy and the efficiency of that determination. The device comprises:
a first processing module, configured to acquire enterprise business condition data and operation management data of the enterprise to be predicted, and to obtain first credit risk information of the enterprise to be predicted by applying a trained credit risk prediction model to that data, wherein the credit risk prediction model is obtained by training a naive Bayes model on the enterprise business condition data, operation management data and credit evaluation data of historical transportation enterprises;
a second processing module, configured to perform structured analysis on the enterprise business condition data and operation management data of the enterprise to be predicted to obtain second credit risk information;
and a third processing module, configured to obtain comprehensive risk information of the enterprise to be predicted from the first credit risk information and the second credit risk information.
The embodiment of the invention also provides computer equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the enterprise credit risk determination method when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the enterprise credit risk determination method when being executed by a processor.
Embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements the enterprise credit risk determination method described above.
In the embodiments of the invention, enterprise business condition data and operation management data of an enterprise to be predicted are acquired, and first credit risk information of the enterprise to be predicted is obtained by applying a trained credit risk prediction model to that data, the credit risk prediction model being obtained by training a naive Bayes model on the enterprise business condition data, operation management data and credit evaluation data of historical transportation enterprises; structured analysis is performed on the enterprise business condition data and operation management data of the enterprise to be predicted to obtain second credit risk information; and comprehensive risk information of the enterprise to be predicted is obtained from the first credit risk information and the second credit risk information. In this way, the credit risk of an enterprise can be determined rapidly, and both the accuracy and the efficiency of the determination are improved.
Drawings
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of an enterprise credit risk determination method provided in an embodiment of the present invention;
FIG. 2 is a flowchart of a method for training a credit risk prediction model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an exemplary architecture for multi-source data fusion according to an embodiment of the present invention;
FIG. 4 is an exemplary diagram of a multi-source data acquisition technique provided in an embodiment of the present invention;
FIG. 5 is an exemplary diagram of a multi-source data preprocessing flow provided in an embodiment of the present invention;
FIG. 6 is a flowchart of a method for multi-source data fusion according to an embodiment of the present invention;
FIG. 7 is a flow chart of model training provided in an embodiment of the present invention;
FIG. 8 is a diagram of an overall logical framework of an enterprise credit risk determination method provided in an embodiment of the present invention;
FIG. 9 is a schematic diagram of an enterprise credit risk determination apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
In this technical solution, the acquisition, storage, use and processing of data all comply with the relevant laws and regulations.
The term "and/or" herein merely describes an association between objects and covers three cases; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set consisting of A, B and C.
In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are open-ended terms, meaning including, but not limited to. Reference to the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the embodiments is used to schematically illustrate the practice of the present application, and is not limited thereto and may be appropriately adjusted as desired.
Research shows that, with the establishment of the market economy and of market-oriented, enterprise-based operation, credit has gradually become an important basis for activities such as bidding and administrative licensing. To keep an enterprise developing in a healthy and sustainable way, a rigorous credit management system is needed that identifies credit risks, proactively determines them and issues reminders, and reduces credit losses.
With the rapid development of information technology, data sources keep multiplying. On the one hand, this allows the credit condition of transportation enterprises to be described more accurately and scientifically; on the other hand, the large number of sources and their complex structure challenge traditional credit investigation techniques. Most software products currently on the market lack a reminder function for determining the risks of transportation enterprises, so enterprises essentially accept a reduction in credit grade passively. As a result, they fail to meet the relevant requirements in bidding and similar activities, and their operations suffer.
In view of the above, as shown in FIG. 1, an embodiment of the present invention provides an enterprise credit risk determination method, including:
S101: acquiring enterprise business condition data and operation management data of an enterprise to be predicted, and obtaining first credit risk information of the enterprise to be predicted by applying a trained credit risk prediction model to that data, wherein the credit risk prediction model is obtained by training a naive Bayes model on the enterprise business condition data, operation management data and credit evaluation data of historical transportation enterprises;
S102: performing structured analysis on the enterprise business condition data and operation management data of the enterprise to be predicted to obtain second credit risk information;
S103: obtaining comprehensive risk information of the enterprise to be predicted from the first credit risk information and the second credit risk information.
In the embodiments of the invention, first credit risk information is obtained from a naive Bayes model trained on historical transportation enterprise data, second credit risk information is obtained from structured analysis of the same input data, and the two are combined into comprehensive risk information. In this way, the credit risk of an enterprise can be determined rapidly, and both the accuracy and the efficiency of the determination are improved.
The above-described enterprise credit risk determination method is described in detail below.
For S101, the enterprise business condition data and operation management data of the enterprise to be predicted may, for example, be collected from a plurality of data channels and then fused by weighted averaging.
In addition, the credit risk prediction model is trained on the enterprise business condition data, operation management data and credit evaluation data of historical transportation enterprises.
FIG. 2 is a flowchart of a method for training the credit risk prediction model according to an embodiment of the present invention. As shown in FIG. 2, the method includes:
S201: Collect credit-related multi-source data of historical transportation enterprises.
As shown in FIG. 3, the multi-source data includes enterprise business condition data, operation management data and credit evaluation data collected from a plurality of data channels.
In one embodiment of the present invention, the enterprise business condition data includes, for example: enterprise financial data, tax credit data, public announcement data, and the like; the operation management data includes, for example: intellectual property data, industry regulatory data, administrative license data, public service data, judicial information, legal litigation data, and the like; the credit evaluation data includes, for example: enterprise blacklists, enterprise trust behavior data, credit slip data, personnel performance data, administrative penalty data, violation record data, abnormal operation data, and the like.
As shown in FIG. 4, collecting credit-related multi-source data of historical transportation enterprises includes, for example: collecting enterprise credit-related multi-source data from a transportation industry data management platform by at least one of the HTTP network protocol, an ETL data extraction tool and an asynchronous message queue; and collecting enterprise credit-related multi-source data from the Internet by at least one of an open API platform, a script service and a cloud collection crawling tool.
The transportation industry data management platform includes, for example, the data centers, data middle platforms and data sharing and exchange platforms of the transportation industry. The data they manage is mostly stored in private-network and intranet environments and is collected, depending on the situation, in one of the following ways:
1) For small data volumes, a network interface protocol is used: data is fetched from a server through HTTP requests;
2) For large data volumes and scheduled acquisition, ETL data extraction is used: the data is cleaned, integrated and converted, and finally loaded into a target database or data warehouse;
3) For large volumes of data that still need to be processed, an asynchronous message queue is used: data is sent to the queue as messages, and consumers then read and process the messages asynchronously to complete the collection.
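The asynchronous message queue mode in item 3) can be sketched as follows. This is a minimal in-process illustration built on Python's standard `queue` and `threading` modules, not the platform's actual middleware; the function names and the `None` end-of-stream sentinel are assumptions made for the example.

```python
import queue
import threading


def producer(records, q):
    """Send each record to the queue as a message, then signal the end."""
    for record in records:
        q.put(record)
    q.put(None)  # hypothetical end-of-stream sentinel


def consumer(q, sink):
    """Read messages asynchronously and store them in the collection sink."""
    while True:
        message = q.get()
        if message is None:
            break
        sink.append(message)


def collect(records):
    """Run one producer and one consumer and return the collected records."""
    q = queue.Queue(maxsize=100)  # bounded queue gives simple back-pressure
    sink = []
    worker = threading.Thread(target=consumer, args=(q, sink))
    worker.start()
    producer(records, q)
    worker.join()
    return sink


if __name__ == "__main__":
    print(collect([{"enterprise": "A", "item": "administrative penalty"}]))
```

With a single consumer the messages arrive in the order they were sent; a production deployment would typically replace the in-process queue with a message broker.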
In addition, open Internet data comes from public online platforms, mainly enterprise websites, transportation credit platforms, Internet credit publicity systems, public social platforms of enterprises, and the like. This data has wider coverage, and it can be acquired in the following ways:
1) Open API platform: a large amount of structured data can be obtained through an open API platform. This approach applies where the data source provides an API, for example some public social platforms of enterprises.
2) Script service: a script service can access and crawl website data automatically. This approach applies where large amounts of web page data need to be obtained automatically, for example from enterprise websites and government public information websites.
3) Cloud collection crawling tool: a cloud collection crawling tool can likewise access and crawl website data automatically. This approach also applies where large amounts of web page data need to be obtained automatically, for example from enterprise websites and some public information websites.
S202: Perform data preprocessing and data fusion on the multi-source data to obtain a training data set. The training data set comprises training samples and test samples; in both, the input feature values consist of enterprise business condition data and operation management data, and the output feature values consist of credit evaluation data.
In one embodiment of the present invention, as shown in Table 1, the data preprocessing performed on the multi-source data includes, for example, at least one of missing value processing, outlier processing, deduplication, encoding problem processing, format conversion and data standardization.
TABLE 1
As shown in FIG. 5, missing value processing includes: replacing a missing value in the multi-source data with a first preset value, or, when the multi-source data contains other data related to the missing value, inferring the missing value from that data. Outlier processing includes: deleting an outlier in the multi-source data or replacing it with a second preset value. Deduplication includes: deleting repeated data in the multi-source data. Encoding problem processing includes: converting characters in the multi-source data into a third preset value. Format conversion includes: converting data in the multi-source data from one format to another. Data standardization includes: scaling the numerical data in the multi-source data to a preset data range.
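Three of these operations (missing value replacement, deduplication and scaling to a preset range) can be sketched in a few lines. This is an illustrative sketch only: the function names, the choice of min-max scaling for data standardization, and the default range [0, 1] are assumptions, not details given in the description.

```python
def fill_missing(values, preset):
    """Replace missing entries (None) with a preset value."""
    return [preset if v is None else v for v in values]


def deduplicate(records):
    """Drop repeated records (dicts with string keys), keeping first occurrences."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def min_max_scale(values, low=0.0, high=1.0):
    """Scale numeric values linearly into the preset range [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low for _ in values]  # constant column: map everything to low
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


if __name__ == "__main__":
    capital = fill_missing([100, None, 300], preset=0)
    print(min_max_scale(capital))
```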
For outlier processing, outliers include, for example, data that differs from most observations in the data source or that does not meet the standard specification. An outlier in the multi-source data may, for example, be determined by the following methods:
1. Statistical methods: outliers are identified with statistically based detection methods, such as the three-sigma (three standard deviations) rule or the box-plot method.
2. Domain knowledge: expertise and experience in the transportation credit field are used to decide whether a given value is an outlier.
3. Data range: a reasonable range for the data is set according to business requirements and actual conditions, and values outside that range are removed.
4. Logical relationship: the logical relationships and prior knowledge among the data are considered; if a value is clearly inconsistent with or contradicts other variables, it may be treated as an outlier.
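The two statistical methods named in item 1 can be sketched as follows. The sketch assumes the conventional definitions (more than three population standard deviations from the mean; more than 1.5 interquartile ranges beyond the quartiles), which the description does not spell out, and the function names are hypothetical.

```python
import statistics


def three_sigma_outliers(values):
    """Return values more than three standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) > 3 * sd]


def box_plot_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


if __name__ == "__main__":
    print(three_sigma_outliers([10] * 20 + [100]))
    print(box_plot_outliers([10, 11, 12, 13, 14, 15, 16, 100]))
```

The three-sigma rule needs a reasonably large sample to flag anything, whereas the box-plot rule also works on short series, which is one reason to keep both available.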
In addition, the data requiring format conversion in the embodiments of the present invention falls into the following three types:
1. Relational data (SQL, RDBMS): such data sources typically use SQL statements or specialized ETL tools for format conversion, for example of date formats, string formats, currency formats, phone number formats and file formats.
2. File-type data: such data sources typically use a programming language or framework for format conversion, such as Python, Java, Spark or Flink.
3. Log file data, XML/HTML, JSON and CSV/TSV: such data sources typically use text processing tools or programming languages for format conversion, such as Excel, Python or Java.
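As a small illustration of the third type, the sketch below converts CSV text to JSON and normalizes a date column on the way. The column name `filing_date`, the source date format and the function name are hypothetical choices made for the example.

```python
import csv
import io
import json
from datetime import datetime


def csv_to_json(csv_text, date_field="filing_date", source_format="%Y/%m/%d"):
    """Convert CSV text to a JSON array, rewriting one date column as ISO 8601."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed = datetime.strptime(row[date_field], source_format)
        row[date_field] = parsed.date().isoformat()
        rows.append(row)
    return json.dumps(rows, ensure_ascii=False)


if __name__ == "__main__":
    print(csv_to_json("enterprise,filing_date\nACME,2023/11/20\n"))
```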
In another embodiment of the present invention, as shown in FIG. 6, fusing the preprocessed multi-source data to obtain a training data set includes, for example: determining first weight values of the enterprise business condition data, the operation management data and the credit evaluation data for each data acquisition channel; using these first weight values and a weighted average algorithm to fuse the enterprise business condition data collected from the different channels into fused enterprise business condition data, to fuse the operation management data collected from the different channels into fused operation management data, and to fuse the credit evaluation data collected from the different channels into fused credit evaluation data; and obtaining the training data set from the fused enterprise business condition data, the fused operation management data and the fused credit evaluation data.
The fusion flow consists of the following steps:
1. Data collection: data is collected from different data sources; it may be of the same type or of different types.
2. Data preprocessing: the collected data is cleaned, normalized or otherwise preprocessed as needed so that the different data sources become comparable.
3. Weight distribution: the contribution weight of each data source is determined, either by subjective evaluation of factors such as the reliability, accuracy and coverage of the source, or by estimating the weights automatically with statistical or machine learning methods.
4. Weighted average: the data of each source is averaged according to the assigned weights to obtain the final fusion result. A simple weighted average formula is typically used: fusion result = Σ(weight × data) / Σ(weight).
5. Evaluation of results: the accuracy and reliability of the fusion result are evaluated with suitable indicators or with the experience of domain experts.
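The weighted average formula, fusion result = Σ(weight × data) / Σ(weight), can be written directly as a small function. The function name and the example channel values are assumptions for illustration; the weights would come from one of the weight-determination methods.

```python
def weighted_fusion(values, weights):
    """Fuse one indicator reported by several channels by weighted averaging."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


if __name__ == "__main__":
    # Hypothetical scores for the same indicator from three channels.
    print(weighted_fusion([80, 90, 70], [5, 3, 2]))
```

Dividing by the sum of the weights means the weights do not have to be normalized beforehand.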
In an embodiment of the present invention, the first weight value may be determined, for example, in any one of the following ways:
1. The first weight value is determined according to the reliability and authority of the data sources; for example, a matrix comparison method or an expert scoring method is used to evaluate the relative credibility of data from different sources, and the corresponding first weight value is calculated.
2. A historical data method or an information concentration method is used to analyze the patterns or trends of the data in different time periods, and the corresponding first weight value is calculated.
3. A numerical relative size method or a volatility and correlation method is used to compare the cost effectiveness or degree of variation of the different data, and the corresponding first weight value is calculated.
4. An entropy method or an information concentration method is used to evaluate the information quantity or information density of the different data, and the corresponding first weight value is calculated.
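As an illustration of the entropy method in item 4, the sketch below uses one common formulation of the entropy weight method: each channel's column is normalized to proportions, its normalized Shannon entropy is computed, and the weight is proportional to one minus that entropy. The description does not specify the formulation, so this choice, the function name and the input layout (rows are samples, columns are channels, values non-negative) are all assumptions.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column; columns with more variation weigh more."""
    m = len(matrix)
    if m < 2:
        raise ValueError("at least two samples are required")
    n = len(matrix[0])
    diversities = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each column needs a positive sum")
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(m)  # normalize entropy into [0, 1]
        diversities.append(max(0.0, 1.0 - entropy))
    total_diversity = sum(diversities)
    if total_diversity == 0:
        return [1.0 / n] * n  # no column is informative: fall back to equal weights
    return [d / total_diversity for d in diversities]


if __name__ == "__main__":
    # Channel 0 reports the same value everywhere; channel 1 varies.
    print(entropy_weights([[1, 1], [1, 2], [1, 3]]))
```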
In addition, as shown in FIG. 3, in another embodiment of the present invention, the preprocessed multi-source data may be fused not only with a weighted average algorithm but also with a logic model, a neural network model, a machine learning model, and the like.
S203: Train the naive Bayes model with the training data set to obtain the trained credit risk prediction model.
Specifically, the model is trained on the training data set, and a classification algorithm such as naive Bayes or a decision tree may be used.
As shown in FIG. 7, the model training process includes, for example: grouping the sample data, selecting feature values, preprocessing the data, generating a word-frequency matrix, splitting the data into a training set and a test set, training the model, predicting with the model, and evaluating the model. The model is trained with the naive Bayes algorithm on the extracted feature values. For example, suppose an item of enterprise data contains the terms "newly added" and "case filing" (sample 7 in Table 2), and the probability that it indicates a larger risk is to be predicted. The problem becomes evaluating P(larger | newly added, case filing). Applying Bayes' theorem together with the naive assumption that the terms are independent gives:

$$P(\text{larger} \mid \text{newly added}, \text{case filing}) = \frac{P(\text{newly added} \mid \text{larger}) \, P(\text{case filing} \mid \text{larger}) \, P(\text{larger})}{P(\text{newly added}) \, P(\text{case filing})}$$
Sample data for the enterprise behavior portion is shown in Table 2 below:
TABLE 2
| Sample number | Enterprise business situation data | Content after data preprocessing | Credit rating data |
|---|---|---|---|
| 1 | Shareholding ratio decreased | Shareholding ratio decreased | Smaller |
| 2 | Registered capital reduced | Capital reduced | Larger |
| 3 | Bid-winning result | Winning bid | Normal |
| 4 | Registered capital increased | Capital increased | Normal |
| 5 | Significant changes in enterprise personnel | Personnel, change | Smaller |
| 6 | Newly added administrative permit | Newly added, administrative permit | Smaller |
| 7 | Newly added case filing | Newly added, case filing | Larger |
| 8 | Newly added court hearing announcement | Newly added, court hearing announcement | Larger |
| 9 | Newly added judgment document | Newly added, judgment document | Smaller |
| 10 | Newly added double-random spot check | Newly added, spot check | Smaller |
| 11 | Administrative penalty imposed | Administrative penalty | Larger |
| 12 | Sign post | Sign and notice | Normal |
| 13 | Final beneficiary changed | Beneficiary, change | Smaller |
| 14 | Business scope change | Business scope, change | Smaller |
| 15 | Newly added external investment | Newly added, investment | Normal |
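The naive Bayes calculation on the Table 2 samples can be sketched as follows. This is an illustrative sketch, not the patent's code: the token strings are English renderings of the preprocessed content, the tokenization of the rows without commas is an assumption, and the Laplace smoothing (`alpha`) and the function names `train` and `predict_proba` are additions for illustration. Because the denominator of the posterior is the same for every class, the sketch normalizes the class scores so that they sum to 1 instead of dividing by the product of the term probabilities.

```python
from collections import Counter, defaultdict

# (tokens after preprocessing, credit rating) for the 15 samples of Table 2.
SAMPLES = [
    (["shareholding ratio", "decreased"], "smaller"),
    (["capital", "reduced"], "larger"),
    (["winning bid"], "normal"),
    (["capital", "increased"], "normal"),
    (["personnel", "change"], "smaller"),
    (["newly added", "administrative permit"], "smaller"),
    (["newly added", "case filing"], "larger"),
    (["newly added", "court hearing announcement"], "larger"),
    (["newly added", "judgment document"], "smaller"),
    (["newly added", "spot check"], "smaller"),
    (["administrative penalty"], "larger"),
    (["sign", "notice"], "normal"),
    (["beneficiary", "change"], "smaller"),
    (["business scope", "change"], "smaller"),
    (["newly added", "investment"], "normal"),
]


def train(samples, alpha=1.0):
    """Count classes and per-class token frequencies (the word frequency matrix)."""
    class_counts = Counter(label for _, label in samples)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary, alpha


def predict_proba(model, tokens):
    """Return P(class | tokens) for every class, normalized to sum to 1."""
    class_counts, token_counts, vocabulary, alpha = model
    n_samples = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / n_samples  # prior P(class)
        denom = sum(token_counts[label].values()) + alpha * len(vocabulary)
        for token in tokens:
            # Smoothed likelihood P(token | class).
            score *= (token_counts[label][token] + alpha) / denom
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


model = train(SAMPLES)
probabilities = predict_proba(model, ["newly added", "case filing"])
print(max(probabilities, key=probabilities.get), probabilities)
```

With these assumptions, the class with the highest posterior for the terms "newly added" and "case filing" is "larger", which matches the rating of sample 7.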
The calculation process in the model training process is shown in the following table 3:
TABLE 3
During model evaluation, the performance of the model is evaluated on the test set. The classification performance of the model can be evaluated with metrics such as accuracy, precision, recall, and F1 score; the regression performance can also be evaluated with metrics such as mean squared error and mean absolute error.
Specifically, when evaluating the performance of the naive Bayes model, the evaluation metrics shown in Table 4 below are considered:
TABLE 4
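The contents of Table 4 are not reproduced in this text. As a hedged illustration of the classification metrics named above (accuracy, precision, recall, F1 score), the following sketch computes them for one class treated as the positive class. The function name `classification_report` and the sample labels are hypothetical.

```python
def classification_report(y_true, y_pred, positive):
    """Compute accuracy plus precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative test-set labels only; they are not results from the patent.
report = classification_report(
    y_true=["larger", "larger", "smaller", "normal"],
    y_pred=["larger", "smaller", "smaller", "normal"],
    positive="larger",
)
print(report)
```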
For S102, performing structural analysis processing on the enterprise business condition data and operation management data of the enterprise to be predicted to obtain the second credit risk information includes: configuring a credit risk level threshold for each data item contained in the enterprise business condition data and the operation management data; converting the enterprise business condition data and operation management data of the enterprise to be predicted into structured data, and determining the credit risk level of the enterprise to be predicted under each data item according to the structured data and the credit risk level threshold corresponding to each data item; and averaging the credit risk levels of the enterprise to be predicted under the data items to obtain the second credit risk information.
Specifically, a weighted average method is applied to the multi-source enterprise business condition data and operation management data collected for the enterprise to be predicted. Based on the weighted average principle, weights are assigned to the data from the different sources to determine each source's contribution to the final result. This yields more accurate and reliable structured data in a standard form, which is then statistically classified.
Further, configuring a credit risk level threshold for each data item contained in the enterprise business condition data and the operation management data includes, for example: setting different risk thresholds according to different data parameters.
Exemplary:
(1) Not updated:
① Annual financial information of the enterprise: enterprise revenue is 0;
② Enterprise business license expired: the business license expiration date exceeds the current time;
③ Cumulative output value of the contract section: the output value is 0.
(2) Overdue:
① Project bid section overdue and not accepted: the planned delivery date exceeds the monitoring date;
② Expired contract section participating in evaluation: the annual evaluation score at the evaluation time is 0.
(3) Evaluation not timely:
① Credit evaluation: not evaluated 2 days before the credit evaluation node ends;
② Evaluate all that should be evaluated: not evaluated;
③ Deduct all that should be deducted: not deducted;
④ Enter all that should be entered: not entered.
(4) Enterprise credit:
① Enterprise blacklist: the annual score is grade D;
② Enterprise dishonest behavior: more than 3 over-limit or overloading incidents;
③ Credit decline: the comparable credit score decreases.
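The threshold-based structural analysis of S102 can be sketched as a set of per-item rules whose resulting levels are averaged. This is a minimal sketch under assumptions: the numeric scale (1 = normal, 2 = smaller risk, 3 = larger risk), the field names, and the level assigned by each rule are hypothetical; only the trigger conditions (revenue of 0, output value of 0, grade D, more than 3 overloading incidents) come from the examples above.

```python
# Hypothetical numeric scale: 1 = normal, 2 = smaller risk, 3 = larger risk.
NORMAL, SMALLER, LARGER = 1, 2, 3

# One rule per data item; each maps the structured value to a risk level.
RULES = {
    "annual_revenue": lambda v: LARGER if v == 0 else NORMAL,
    "cumulative_output_value": lambda v: SMALLER if v == 0 else NORMAL,
    "annual_credit_grade": lambda v: LARGER if v == "D" else NORMAL,
    "overload_incidents": lambda v: LARGER if v > 3 else NORMAL,
}


def second_credit_risk(record, rules=RULES):
    """Return (average risk level, per-item levels) for one structured record."""
    levels = {item: rule(record[item]) for item, rule in rules.items() if item in record}
    if not levels:
        raise ValueError("record contains none of the configured data items")
    return sum(levels.values()) / len(levels), levels


# Illustrative structured record for one enterprise to be predicted.
score, levels = second_credit_risk({
    "annual_revenue": 0,
    "cumulative_output_value": 500,
    "annual_credit_grade": "B",
    "overload_incidents": 1,
})
print(score, levels)
```

Keeping the rules in a plain mapping makes it straightforward to configure a different threshold per data item, as the text describes.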
For S103, obtaining the comprehensive risk information of the enterprise to be predicted according to the first credit risk information and the second credit risk information includes, for example: obtaining the comprehensive risk information of the enterprise to be predicted with a weighted average algorithm, according to a preconfigured second weight value for the first credit risk information and a preconfigured third weight value for the second credit risk information.
For example, the second weight value and the third weight value can be set by a subjective assignment method, an objective assignment method, or a combined subjective and objective assignment method:
1. Subjective assignment method: the decision maker judges and assigns the weights subjectively, based on expert experience, historical data, or other relevant information. For example, the weight of a credit decline may be determined according to its probability and degree of influence.
2. Objective assignment method: the system automatically calculates and assigns the weights using statistical or machine learning methods, based on factors such as the reliability, accuracy, and coverage of the credit data source. For example, the weights may be calculated from historical data on credit declines and a predictive model.
3. Combined subjective and objective assignment method: the subjective and objective assignment methods are combined, integrating the decision maker's experience with objective analysis of the data to give more accurate and comprehensive weights. For example, the probability and degree of influence of a credit decline can be quantified objectively, and the final weight obtained by combining this with the decision maker's judgment. The risk level of the credit decline is then determined according to the set risk level standard to form the comprehensive risk data.
In the embodiment of the invention, as shown in Fig. 8, early warning of the main risks is realized on the basis of the data set in two ways, structural analysis processing and a model algorithm, so that the credit risk of an enterprise can be determined rapidly, improving both the accuracy and the efficiency of enterprise credit risk determination.
The embodiment of the invention also provides an enterprise credit risk determining device, which is described in the following embodiment. Because the principle by which the device solves the problem is similar to that of the enterprise credit risk determination method, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.
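The weighted-average combination of S103 can be sketched as follows. This is an illustrative sketch under assumptions: both kinds of credit risk information are expressed on the same hypothetical numeric scale (1 = normal, 2 = smaller risk, 3 = larger risk), the model's class probabilities are reduced to an expected level, and the weight values used in the example are arbitrary.

```python
# Hypothetical numeric scale shared by both kinds of credit risk information.
LEVEL_OF = {"normal": 1, "smaller": 2, "larger": 3}


def expected_level(class_probs, level_of=LEVEL_OF):
    """Turn class probabilities from the prediction model into one numeric level."""
    return sum(p * level_of[label] for label, p in class_probs.items())


def comprehensive_risk(first, second, second_weight, third_weight):
    """Weighted average of the first and second credit risk information.

    second_weight applies to the first (model-based) information and
    third_weight to the second (threshold-based) information, following
    the naming used in the text.
    """
    total = second_weight + third_weight
    if second_weight < 0 or third_weight < 0 or total <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return (second_weight * first + third_weight * second) / total


# Illustrative inputs only.
first_info = expected_level({"normal": 0.2, "smaller": 0.3, "larger": 0.5})
second_info = 1.5
print(comprehensive_risk(first_info, second_info, second_weight=0.6, third_weight=0.4))
```

Dividing by the sum of the weights means the two weight values do not have to be normalized in advance, whichever assignment method produced them.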
As shown in fig. 9, a schematic diagram of an enterprise credit risk determining apparatus according to an embodiment of the present invention includes:
The first processing module 901 is configured to obtain enterprise business condition data and operation management data of an enterprise to be predicted, and obtain first credit risk information of the enterprise to be predicted by adopting a trained credit risk prediction model according to the enterprise business condition data and operation management data of the enterprise to be predicted; the credit risk prediction model is obtained by training a naive Bayesian model according to enterprise operation condition data, operation management data and credit evaluation data of a historical traffic enterprise;
the second processing module 902 is configured to perform structural analysis processing on enterprise business condition data and operation management data of an enterprise to be predicted, so as to obtain second credit risk information;
the third processing module 903 is configured to obtain comprehensive risk information of the enterprise to be predicted according to the first credit risk information and the second credit risk information.
In one possible embodiment, the apparatus further comprises a model training module, configured to: collect multi-source data related to historical traffic enterprise credit, the multi-source data comprising enterprise operation condition data, operation management data and credit evaluation data collected from a plurality of data channels; perform data preprocessing and data fusion on the multi-source data to obtain a training data set, the training data set comprising training samples and test samples, where the input feature values of the training samples and test samples consist of enterprise operation condition data and operation management data, and the output feature values consist of credit evaluation data; and train the naive Bayes model with the training data set to obtain a trained credit risk prediction model.
In one possible implementation manner, the model training module is specifically configured to collect enterprise credit related multi-source data from the traffic industry data management platform through at least one mode of a network HTTP protocol, an ETL data extraction tool and an asynchronous message queue; and acquiring enterprise credit related multi-source data from the Internet in at least one mode of an open API interface platform, script service and cloud acquisition crawling tool.
In one possible implementation manner, the model training module is specifically configured to perform at least one operation of missing value processing, outlier processing, deduplication processing, coding problem processing, format conversion, and data normalization on the multi-source data; wherein the missing value processing includes: replacing the missing value in the multi-source data with a first preset value, or deducing the missing value by adopting other data when other related data of the missing value exists in the multi-source data; outlier processing includes: deleting or replacing the abnormal value in the multi-source data with a second preset value; the de-duplication process comprises: deleting repeated data in the multi-source data; the coding problem processing includes: converting characters in the multi-source data into a third preset value; the format conversion includes: converting data in the multi-source data from one format to another format; data normalization includes: and scaling the numerical data in the multi-source data to a preset data range.
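Three of the preprocessing operations listed above (missing value processing, outlier processing with a preset replacement value, and data normalization), together with de-duplication, can be sketched as below. This is a minimal sketch under assumptions: the function names, the valid range used to detect outliers, and the preset values are hypothetical, and min-max scaling to [0, 1] is only one possible choice of preset data range.

```python
def deduplicate(records):
    """Drop exact duplicate records while keeping the first occurrence."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def clean_and_scale(values, fill_value, low, high, outlier_value):
    """Fill missing values, replace outliers, then min-max scale to [0, 1]."""
    cleaned = []
    for value in values:
        if value is None:
            value = fill_value        # first preset value for missing data
        elif value < low or value > high:
            value = outlier_value     # second preset value for outliers
        cleaned.append(value)
    lowest, highest = min(cleaned), max(cleaned)
    if highest == lowest:
        # A constant column carries no spread; map it to 0.0.
        return [0.0 for _ in cleaned]
    return [(value - lowest) / (highest - lowest) for value in cleaned]


# Illustrative data only.
records = deduplicate([
    {"enterprise": "A", "score": 10.0},
    {"enterprise": "A", "score": 10.0},
    {"enterprise": "B", "score": 30.0},
])
scaled = clean_and_scale([10.0, None, 1000.0, 30.0], fill_value=20.0,
                         low=0.0, high=100.0, outlier_value=20.0)
print(records, scaled)
```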
In one possible implementation manner, the model training module is specifically configured to determine first weight values of the enterprise business condition data, the operation management data, and the credit evaluation data in different data acquisition channels respectively; according to first weight values of enterprise operation condition data, operation management data and credit evaluation data in different data acquisition channels, adopting a weighted average algorithm to fuse the enterprise operation condition data acquired by different channels to obtain fused enterprise operation condition data, adopting the weighted average algorithm to fuse the operation management data acquired by different channels to obtain fused operation management data, and adopting the weighted average algorithm to fuse the credit evaluation data acquired by different channels to obtain fused credit evaluation data; and obtaining a training data set according to the fused enterprise business condition data, the fused operation management data and the fused credit evaluation data.
In a possible implementation manner, the second processing module is specifically configured to configure a credit risk level threshold for each data item included in the enterprise business situation data and the operation management data; converting enterprise business condition data and operation management data of an enterprise to be predicted into structured data, and determining a credit risk level corresponding to each data item of the enterprise to be predicted according to the enterprise business condition data and operation management data structured by the enterprise to be predicted and a credit risk level threshold corresponding to each data item; and averaging the credit risk grades corresponding to the enterprises to be predicted under each data item to obtain second credit risk information.
In a possible implementation manner, the third processing module is specifically configured to obtain the comprehensive risk information of the enterprise to be predicted with a weighted average algorithm, according to the preconfigured second weight value of the first credit risk information and third weight value of the second credit risk information.
Based on the foregoing inventive concept, as shown in fig. 10, the present invention further proposes a computer device 1000, including a memory 1010, a processor 1020, and a computer program 1030 stored on the memory 1010 and executable on the processor 1020, where the processor 1020 implements the aforementioned enterprise credit risk determination method when executing the computer program 1030.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the enterprise credit risk determination method when being executed by a processor.
Embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements the enterprise credit risk determination method described above.
In the embodiment of the invention, enterprise business condition data and operation management data of an enterprise to be predicted are obtained, and a trained credit risk prediction model is adopted to obtain first credit risk information of the enterprise to be predicted according to the enterprise business condition data and the operation management data of the enterprise to be predicted; the credit risk prediction model is obtained by training a naive Bayesian model according to enterprise operation condition data, operation management data and credit evaluation data of a historical traffic enterprise; carrying out structural analysis processing on enterprise operation condition data and operation management data of an enterprise to be predicted to obtain second credit risk information; and obtaining comprehensive risk information of the enterprise to be predicted according to the first credit risk information and the second credit risk information. Thus, the credit risk of the enterprise can be rapidly determined, the accuracy of the credit risk determination of the enterprise is improved, and the efficiency of the credit risk determination of the enterprise is improved.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention or to limit the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (14)

1. A method for determining credit risk of an enterprise, comprising:
acquiring enterprise business condition data and operation management data of an enterprise to be predicted, and acquiring first credit risk information of the enterprise to be predicted by adopting a trained credit risk prediction model according to the enterprise business condition data and the operation management data of the enterprise to be predicted; the credit risk prediction model is obtained by training a naive Bayesian model according to enterprise operation condition data, operation management data and credit evaluation data of a historical traffic enterprise;
carrying out structural analysis processing on enterprise operation condition data and operation management data of an enterprise to be predicted to obtain second credit risk information;
and obtaining comprehensive risk information of the enterprise to be predicted according to the first credit risk information and the second credit risk information.
2. The enterprise credit risk determination method of claim 1, further comprising:
collecting multi-source data related to historical traffic enterprise credit; the multi-source data comprises enterprise operation condition data, operation management data and credit evaluation data which are collected from a plurality of data channels;
carrying out data preprocessing and data fusion on the multi-source data to obtain a training data set; the training data set comprises a training sample and a test sample, input characteristic values in the training sample and the test sample are composed of enterprise operation condition data and operation management data, and output characteristic values in the training sample and the test sample are composed of credit evaluation data;
and training the naive Bayes model by adopting a training data set to obtain a trained credit risk prediction model.
3. The enterprise credit risk determination method of claim 2, wherein collecting historical traffic enterprise credit-related multi-source data comprises:
collecting enterprise credit related multi-source data from a traffic industry data management platform in at least one mode of a network HTTP protocol, an ETL data extraction tool and an asynchronous message queue;
and acquiring enterprise credit related multi-source data from the Internet in at least one mode of an open API interface platform, script service and cloud acquisition crawling tool.
4. The enterprise credit risk determination method of claim 2, wherein the data preprocessing of the multi-source data comprises: performing at least one operation of missing value processing, outlier processing, deduplication processing, encoding problem processing, format conversion and data standardization on the multi-source data;
wherein the missing value processing includes: replacing the missing value in the multi-source data with a first preset value, or deducing the missing value by adopting other data when other related data of the missing value exists in the multi-source data;
outlier processing includes: deleting or replacing the abnormal value in the multi-source data with a second preset value;
the de-duplication process comprises: deleting repeated data in the multi-source data;
the coding problem processing includes: converting characters in the multi-source data into a third preset value;
the format conversion includes: converting data in the multi-source data from one format to another format;
data normalization includes: and scaling the numerical data in the multi-source data to a preset data range.
5. The enterprise credit risk determination method of claim 2, wherein fusing the data pre-processed multi-source data to obtain a training data set comprises:
determining first weight values of enterprise operation condition data, operation management data and credit evaluation data in different data acquisition channels respectively;
according to first weight values of enterprise operation condition data, operation management data and credit evaluation data in different data acquisition channels, adopting a weighted average algorithm to fuse the enterprise operation condition data acquired by different channels to obtain fused enterprise operation condition data, adopting the weighted average algorithm to fuse the operation management data acquired by different channels to obtain fused operation management data, and adopting the weighted average algorithm to fuse the credit evaluation data acquired by different channels to obtain fused credit evaluation data;
and obtaining a training data set according to the fused enterprise business condition data, the fused operation management data and the fused credit evaluation data.
6. The method for determining credit risk of an enterprise according to claim 1, wherein the performing structural analysis processing on the enterprise business condition data and the operation management data of the enterprise to be predicted to obtain the second credit risk information includes:
configuring a credit risk level threshold for each data item contained in the enterprise business condition data and the operation management data;
converting enterprise business condition data and operation management data of an enterprise to be predicted into structured data, and determining a credit risk level corresponding to each data item of the enterprise to be predicted according to the enterprise business condition data and operation management data structured by the enterprise to be predicted and a credit risk level threshold corresponding to each data item;
and averaging the credit risk grades corresponding to the enterprises to be predicted under each data item to obtain second credit risk information.
7. The method for determining credit risk of an enterprise according to claim 1, wherein obtaining comprehensive risk information of the enterprise to be predicted based on the first credit risk information and the second credit risk information comprises:
obtaining the comprehensive risk information of the enterprise to be predicted by adopting a weighted average algorithm according to a preconfigured second weight value of the first credit risk information and a preconfigured third weight value of the second credit risk information.
8. An enterprise credit risk determination apparatus, comprising:
the first processing module is used for acquiring enterprise business condition data and operation management data of the enterprise to be predicted, and acquiring first credit risk information of the enterprise to be predicted by adopting a trained credit risk prediction model according to the enterprise business condition data and the operation management data of the enterprise to be predicted; the credit risk prediction model is obtained by training a naive Bayesian model according to enterprise operation condition data, operation management data and credit evaluation data of a historical traffic enterprise;
the second processing module is used for carrying out structural analysis processing on enterprise business condition data and operation management data of an enterprise to be predicted to obtain second credit risk information;
and the third processing module is used for obtaining comprehensive risk information of the enterprise to be predicted according to the first credit risk information and the second credit risk information.
9. The enterprise credit risk determination apparatus of claim 8, further comprising:
a model training module, configured to collect multi-source data related to historical traffic enterprise credit; the multi-source data comprises enterprise operation condition data, operation management data and credit evaluation data which are collected from a plurality of data channels;
carrying out data preprocessing and data fusion on the multi-source data to obtain a training data set; the training data set comprises a training sample and a test sample, input characteristic values in the training sample and the test sample are composed of enterprise operation condition data and operation management data, and output characteristic values in the training sample and the test sample are composed of credit evaluation data;
and training the naive Bayes model by adopting a training data set to obtain a trained credit risk prediction model.
10. The enterprise credit risk determination apparatus of claim 8, wherein the second processing module is specifically configured to configure a credit risk level threshold for each data item included in the enterprise business situation data, the operation management data;
converting enterprise business condition data and operation management data of an enterprise to be predicted into structured data, and determining a credit risk level corresponding to each data item of the enterprise to be predicted according to the enterprise business condition data and operation management data structured by the enterprise to be predicted and a credit risk level threshold corresponding to each data item;
and averaging the credit risk grades corresponding to the enterprises to be predicted under each data item to obtain second credit risk information.
11. The enterprise credit risk determination apparatus of claim 8, wherein the third processing module is specifically configured to obtain the comprehensive risk information of the enterprise to be predicted by adopting a weighted average algorithm according to the second weight value of the preconfigured first credit risk information and the third weight value of the second credit risk information.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 7 when executing the computer program.
13. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 7.
14. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the method of any of claims 1 to 7.
CN202311549551.9A 2023-11-20 2023-11-20 Enterprise credit risk determination method and device Pending CN117522561A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311549551.9A CN117522561A (en) 2023-11-20 2023-11-20 Enterprise credit risk determination method and device


Publications (1)

Publication Number Publication Date
CN117522561A true CN117522561A (en) 2024-02-06

Family

ID=89752723


Country Status (1)

Country Link
CN (1) CN117522561A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination