CN117764724A - Intelligent credit rating report construction method and system - Google Patents

Intelligent credit rating report construction method and system

Info

Publication number
CN117764724A
Authority
CN
China
Prior art keywords
data
enterprise
generate
public opinion
credit rating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410012492.XA
Other languages
Chinese (zh)
Inventor
方园
毛继恩
蒋申
郑惠文
张祺
王卓林
柯志平
牛海洋
方深田
钟亚剑
梁永寿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Securities Pengyuan Credit Rating Co ltd
Original Assignee
China Securities Pengyuan Credit Rating Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Securities Pengyuan Credit Rating Co ltd
Priority to CN202410012492.XA
Publication of CN117764724A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of credit rating, and in particular to an intelligent credit rating report construction method and system. The method comprises the following steps: acquiring target enterprise data and due diligence document data; integrating the due diligence document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; performing data preprocessing on the enterprise multi-source heterogeneous data set to generate enterprise multi-source unified data; performing semantic association mining on the enterprise multi-source unified data to generate enterprise association graph data; performing sentiment-related item removal on the enterprise multi-source unified data according to the enterprise association graph data to generate enterprise multi-source associated data; and performing multidimensional feature extraction on the enterprise multi-source associated data to generate enterprise multi-source feature data. By combining multi-source data, multi-dimensional analysis and dynamic updating, the invention improves the accuracy and comprehensiveness of credit rating.

Description

Intelligent credit rating report construction method and system
Technical Field
The invention relates to the technical field of credit rating, and in particular to an intelligent credit rating report construction method and system.
Background
Credit rating was originally done manually. Banks and financial institutions relied on analysts to collect and evaluate large amounts of financial data and personal information by hand for risk assessment, so this stage depended on the experience and judgment of the practitioner. With the development of information technology, credit rating began to shift toward data-driven and automated approaches. Financial institutions began to use computer technology and databases to store and process large amounts of financial and personal data. This transition made the rating process more efficient, but data interpretation and analysis still required human intervention. As machine learning and artificial intelligence are applied more widely in credit rating, the demands on the interpretability and transparency of algorithms have also grown. Researchers and practitioners are working to develop algorithms that can explain their decision process, so as to ensure that rating results are fair and reasonable. However, current rating methods still rely on static evaluation, and the heterogeneity of data sources can lead to data silos and fragmentation, so that credit rating reports are insufficiently accurate and insufficiently dynamic.
Disclosure of Invention
Based on this, it is necessary to provide an intelligent credit rating report construction method and system to solve at least one of the above technical problems.
To achieve the above object, an intelligent credit rating report construction method includes the steps of:
Step S1: acquiring target enterprise data and due diligence document data; integrating the due diligence document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; performing data preprocessing on the enterprise multi-source heterogeneous data set to generate enterprise multi-source unified data; and performing semantic association mining on the enterprise multi-source unified data to generate enterprise association graph data;
Step S2: performing sentiment-related item removal on the enterprise multi-source unified data according to the enterprise association graph data to generate enterprise multi-source associated data; performing multidimensional feature extraction on the enterprise multi-source associated data to generate enterprise multi-source feature data; and performing model training on the enterprise multi-source feature data to generate an enterprise credit rating optimization model;
Step S3: importing the enterprise multi-source feature data into the enterprise credit rating optimization model to perform enterprise credit rating prediction, thereby generating a prediction model result summary data set; generating a text description of the prediction model result summary data set to produce first text-credit rating association options; performing association score correction on the first text-credit rating association options to generate text-credit association option comprehensive score data; and performing debt repayment capability assessment on the text-credit association option comprehensive score data to generate enterprise debt repayment capability assessment data;
Step S4: performing real-time data collection for the enterprise debt repayment capability assessment data to generate a real-time debt repayment capability assessment data set; performing real-time event update classification on the real-time debt repayment capability assessment data set to generate an enterprise credit assessment update data set; and merging the enterprise credit assessment update data set to obtain an enterprise intelligent credit rating report.
By acquiring target enterprise data and due diligence document data, the invention draws on information of different types from different sources and enriches the data on which the credit rating model is based. Integrating the due diligence document data and the target enterprise data into an enterprise multi-source heterogeneous data set allows multiple aspects of the enterprise to be considered together and makes the rating model more comprehensive. Preprocessing the enterprise multi-source heterogeneous data set cleans and standardizes the data, addresses data quality problems, and supports the reliability and accuracy of subsequent analysis. Organizing the heterogeneous data into multi-source unified data lets different types of data be analyzed under the same framework and simplifies the subsequent processing flow. Semantic association mining generates enterprise association graph data, which reveals potential relationships between data items, improves understanding of the data, and supports deeper analysis and modeling. Sentiment-related item removal is then performed on the enterprise multi-source unified data according to the enterprise association graph data, so that information irrelevant to the credit rating can be removed and the data become cleaner and more accurate. The resulting enterprise multi-source associated data contain less redundant information and are more compact, which improves the efficiency and accuracy of the credit rating model.
Multidimensional feature extraction on the enterprise multi-source associated data mines potential influencing factors, enriches the feature set used by the credit rating model, and improves the model's overall understanding of the enterprise's condition. The resulting enterprise multi-source feature data contain information in more dimensions, so the credit rating model can consider more aspects of the enterprise and its predictive capability improves. Training on the enterprise multi-source feature data makes it possible to build a more accurate and reliable credit rating optimization model, which predicts the credit status of the enterprise by learning patterns and correlations in the data. The enterprise multi-source feature data are imported into the credit rating optimization model for prediction, generating a prediction model result summary data set. Because the model is built on multi-source data and features, its predictions are more accurate and more comprehensive. A text description of the prediction model result summary data set is generated to obtain first text-credit rating association options. Association score correction is then applied to these options, by a manual or an automated process, to correct possible bias or inaccuracy of the model and improve the reliability and accuracy of its predictions. After association score correction, text-credit association option comprehensive score data are generated. This step may incorporate various scoring criteria or metrics to obtain a more complete assessment of the enterprise's credit rating.
Debt repayment capability assessment is performed using the text-credit association option comprehensive score data. This may involve assessing the financial condition, liability level, business condition and so on of the enterprise to generate debt repayment capability assessment data, providing an important reference for addressing the enterprise's current and future financial risks. Through real-time data collection and event update classification, the system can provide an immediate assessment of the enterprise's debt repayment capability and real-time decision support for the enterprise and its stakeholders. This is valuable to investors, financial institutions and other stakeholders, especially where market fluctuations are large and information changes quickly. Real-time updates allow the system to capture and respond more quickly to changes in the enterprise's economic status, which is critical for making timely decisions under rapidly changing market conditions and helps to avoid enterprise risks. Real-time data and event classification help to identify potential risk factors more accurately, enabling better risk management. This helps to anticipate possible financial problems and take appropriate action to avoid potential negative effects. By updating the data set in real time, the credit rating can more accurately reflect the current economic status of the enterprise. This avoids inaccurate evaluations based on outdated information and improves the accuracy and reliability of the credit rating. Real-time updates to the enterprise credit assessment may also help financial institutions and other service providers better understand the financial status of their customers, so that they can provide more personalized services, design more suitable products, and offer more flexible loan conditions.
By integrating real-time data, event classification and credit assessment, an enterprise can obtain more comprehensive and deeper business intelligence. This helps to discover potential opportunities, optimize business strategies, and adapt better to market changes. Therefore, the invention improves the accuracy and the comprehensiveness of credit rating by integrating the characteristics of multi-source data, multi-dimensional analysis and dynamic updating.
The beneficial effects of the method are as follows. Multiple data sources, including target enterprise data and due diligence document data, are integrated and preprocessed to form multi-source unified data, giving a comprehensive view of the condition and characteristics of the enterprise. Semantic association mining generates enterprise association graph data, so that the relationships among different data items can be understood and the meaning behind the data better grasped. Removing sentiment-related items improves data quality and reduces their distorting influence on the credit rating, making the rating more objective and accurate. Multidimensional features are extracted from the multi-source associated data and a credit rating optimization model is trained on these features, which strengthens the modeling capability and prediction accuracy for enterprise credit ratings. Enterprise debt repayment capability assessment data are collected, updated and classified in real time, so that changes in the enterprise's condition are captured promptly and the credit rating report remains timely and accurate. The enterprise credit assessment data set updated in real time is merged to generate an intelligent credit rating report, which presents the user with a rating result that takes data from multiple parties into account and provides more comprehensive and accurate enterprise credit assessment information. Because debt repayment capability assessment is performed on the text-credit association option comprehensive score data, the rating result depends not only on numerical data but also on an analysis of the text description, and is therefore more comprehensive.
Therefore, the invention improves the accuracy and the comprehensiveness of credit rating by integrating the characteristics of multi-source data, multi-dimensional analysis and dynamic updating.
Drawings
FIG. 1 is a flow chart illustrating steps of an intelligent credit rating report construction method;
FIG. 2 is a flowchart illustrating the detailed implementation of step S2 in FIG. 1;
FIG. 3 is a flowchart illustrating the detailed implementation of step S24 in FIG. 2;
FIG. 4 is a flowchart illustrating the detailed implementation of step S244 in FIG. 3;
The achievement of the objects, the functional features and the advantages of the present invention will be further described in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Description
The technical method of the present patent is described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
To achieve the above object, referring to FIGS. 1 to 4, an intelligent credit rating report construction method includes the steps of:
Step S1: acquiring target enterprise data and due diligence document data; integrating the due diligence document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; performing data preprocessing on the enterprise multi-source heterogeneous data set to generate enterprise multi-source unified data; and performing semantic association mining on the enterprise multi-source unified data to generate enterprise association graph data;
Step S2: performing sentiment-related item removal on the enterprise multi-source unified data according to the enterprise association graph data to generate enterprise multi-source associated data; performing multidimensional feature extraction on the enterprise multi-source associated data to generate enterprise multi-source feature data; and performing model training on the enterprise multi-source feature data to generate an enterprise credit rating optimization model;
Step S3: importing the enterprise multi-source feature data into the enterprise credit rating optimization model to perform enterprise credit rating prediction, thereby generating a prediction model result summary data set; generating a text description of the prediction model result summary data set to produce first text-credit rating association options; performing association score correction on the first text-credit rating association options to generate text-credit association option comprehensive score data; and performing debt repayment capability assessment on the text-credit association option comprehensive score data to generate enterprise debt repayment capability assessment data;
Step S4: performing real-time data collection for the enterprise debt repayment capability assessment data to generate a real-time debt repayment capability assessment data set; performing real-time event update classification on the real-time debt repayment capability assessment data set to generate an enterprise credit assessment update data set; and merging the enterprise credit assessment update data set to obtain an enterprise intelligent credit rating report.
In an embodiment of the present invention, described with reference to FIG. 1, the intelligent credit rating report construction method includes the following steps:
Step S1: acquiring target enterprise data and due diligence document data; integrating the due diligence document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; performing data preprocessing on the enterprise multi-source heterogeneous data set to generate enterprise multi-source unified data; and performing semantic association mining on the enterprise multi-source unified data to generate enterprise association graph data;
In the embodiment of the invention, financial data, human resource information, business operation data and the like of the target enterprise are acquired through various channels (such as a database, an API (application programming interface), or file import). Due diligence document data from different sources are also acquired, including audit reports, contract files, annual reports, legal documents and the like. These heterogeneous data sources are then integrated, which requires data cleaning, standardization and conversion so that the data can be processed and analyzed later. Data integration may be accomplished using ETL (Extract, Transform, Load) tools or related libraries in a programming language. Missing values, abnormal values and duplicate records in the data are handled; data formats, units and naming conventions are unified to ensure consistency; and features are extracted, converted or newly created to support the subsequent association mining. Text analysis is performed using natural language processing techniques to identify the entities in the documents (e.g., companies, people, key terms) and the relationships between them. A graph database or graph model is built from the identified entities and relationships, which are stored in the form of a graph and analyzed using graph algorithms.
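The preprocessing described above (removing duplicate records, unifying units, and handling missing values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the record fields (`company`, `year`, `revenue`, `unit`), the `UNIT_TO_YUAN` table, and the choice of median imputation are all hypothetical.

```python
from statistics import median

# Hypothetical unit multipliers for converting reported amounts to yuan.
UNIT_TO_YUAN = {"yuan": 1, "thousand_yuan": 1_000, "million_yuan": 1_000_000}


def preprocess(records):
    """Deduplicate records, unify revenue units, and fill missing revenue."""
    seen = set()
    unified = []
    for rec in records:
        key = (rec["company"], rec["year"])
        if key in seen:  # drop repeated records for the same company-year
            continue
        seen.add(key)
        revenue = rec.get("revenue")
        if revenue is not None:
            # Convert every amount to a single unit (yuan).
            revenue = revenue * UNIT_TO_YUAN[rec.get("unit", "yuan")]
        unified.append(
            {"company": rec["company"], "year": rec["year"], "revenue_yuan": revenue}
        )

    # Assumed imputation rule: fill missing values with the observed median.
    observed = [r["revenue_yuan"] for r in unified if r["revenue_yuan"] is not None]
    fill = median(observed) if observed else None
    for r in unified:
        if r["revenue_yuan"] is None:
            r["revenue_yuan"] = fill
    return unified


raw = [
    {"company": "A", "year": 2022, "revenue": 5, "unit": "million_yuan"},
    {"company": "A", "year": 2022, "revenue": 5, "unit": "million_yuan"},  # duplicate
    {"company": "B", "year": 2022, "revenue": 3000, "unit": "thousand_yuan"},
    {"company": "C", "year": 2022, "revenue": None},  # missing value
]
clean = preprocess(raw)
```

In practice the deduplication key, the unit table and the imputation rule would depend on the actual data sources; the sketch only shows the order of operations.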
Step S2: performing sentiment-related item removal on the enterprise multi-source unified data according to the enterprise association graph data to generate enterprise multi-source associated data; performing multidimensional feature extraction on the enterprise multi-source associated data to generate enterprise multi-source feature data; and performing model training on the enterprise multi-source feature data to generate an enterprise credit rating optimization model;
In the embodiment of the invention, the sentiment-related items in the enterprise multi-source unified data are identified on the basis of the enterprise association graph data. These may include text, comments, ratings and the like attached to an association. Sentiment analysis techniques are used to determine the polarity of each sentiment-related item, e.g., positive, negative or neutral. This may involve natural language processing techniques such as sentiment dictionaries or machine learning models. A series of multidimensional features is defined based on business requirements and relevance to the credit rating. These may cover many aspects, such as financial data, business condition and market influence. The features are extracted from the enterprise multi-source associated data, which may require statistical methods, aggregation functions, time series analysis and similar techniques to ensure that the extracted features are representative and useful. An appropriate machine learning or deep learning model is selected according to the nature of the problem. For credit rating problems, classification algorithms such as logistic regression, decision trees or random forests may be selected. The data set is divided into a training set and a test set to evaluate the performance of the model. The model is trained on the training set and validated on the test set, which may require hyperparameter tuning to optimize performance. The performance of the model is evaluated using indicators such as accuracy, recall and precision. The model is optimized as required, which may involve feature engineering, model parameter adjustment and similar operations. The optimized model is deployed to a production environment for practical use. This may require integrating the model into the enterprise's decision-making system or other related systems.
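The polarity determination mentioned above can be illustrated with a dictionary-based approach, one of the techniques the text names. The sketch below is an assumption-laden toy: the `POSITIVE` and `NEGATIVE` word sets are hypothetical, and the text does not specify which items are ultimately removed, so the helper `split_by_sentiment` only separates neutral items from sentiment-bearing ones.

```python
# Hypothetical toy lexicon; a real system would use a curated sentiment dictionary.
POSITIVE = {"growth", "profit", "stable", "upgrade"}
NEGATIVE = {"default", "lawsuit", "loss", "downgrade"}


def polarity(text):
    """Return 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def split_by_sentiment(items):
    """Separate neutral items from sentiment-bearing ones (assumed policy)."""
    neutral, flagged = [], []
    for item in items:
        label = polarity(item)
        (neutral if label == "neutral" else flagged).append((item, label))
    return neutral, flagged


neutral_items, flagged_items = split_by_sentiment(
    [
        "Revenue growth was stable.",
        "The company faces a lawsuit and a loss",
        "Annual report filed",
    ]
)
```

A machine learning classifier could replace `polarity` without changing the surrounding logic; the lexicon version is shown only because it is the simplest to follow.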
Step S3: importing the enterprise multi-source feature data into the enterprise credit rating optimization model to perform enterprise credit rating prediction, thereby generating a prediction model result summary data set; generating a text description of the prediction model result summary data set to produce first text-credit rating association options; performing association score correction on the first text-credit rating association options to generate text-credit association option comprehensive score data; and performing debt repayment capability assessment on the text-credit association option comprehensive score data to generate enterprise debt repayment capability assessment data;
In the embodiment of the invention, the enterprise multi-source feature data are imported into the previously trained credit rating optimization model, and the model is used to predict the credit rating of the enterprise. This may involve inputting the multi-source feature data into the model and obtaining a credit rating prediction for each enterprise. The credit rating predictions are consolidated into a summary data set, and a text description is generated for the prediction for each enterprise. The description may include information such as a rating level, a probability or a score. Text description options related to the credit rating are defined, for example the meaning of a rating level or its possible influencing factors, and each association option is assigned an association score reflecting its degree of association with the credit rating. The association scores of the first text-credit rating association options are then corrected, for example by a statistical method, a machine learning method or expert adjustment, to make them more accurate. The corrected association scores are integrated to form the text-credit association option comprehensive score data. The indicators required for debt repayment capability assessment, such as the debt ratio, the current ratio and the interest coverage ratio, are extracted from the enterprise multi-source feature data and organized. The debt repayment capability of the enterprise is assessed using the selected indicators, for which a rules engine or a dedicated assessment model can be used. The assessment results are consolidated into a data set and associated with the text-credit association option comprehensive score data to generate the final enterprise debt repayment capability assessment data.
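As a rough illustration of the indicator-based assessment, the sketch below computes the three indicators named above from their standard definitions and applies a small rule set. The thresholds in `RULES` and the labels returned by `assess` are hypothetical values chosen only for illustration; the text does not specify them.

```python
def solvency_indicators(total_liabilities, total_assets, current_assets,
                        current_liabilities, ebit, interest_expense):
    """Compute three common debt-servicing indicators."""
    return {
        "debt_ratio": total_liabilities / total_assets,
        "current_ratio": current_assets / current_liabilities,
        "interest_coverage": ebit / interest_expense,
    }


# Hypothetical thresholds, chosen only for illustration.
RULES = [
    ("debt_ratio", lambda v: v <= 0.6),
    ("current_ratio", lambda v: v >= 1.5),
    ("interest_coverage", lambda v: v >= 3.0),
]


def assess(indicators):
    """Count how many rules pass and map the count to a coarse label."""
    passed = sum(1 for name, ok in RULES if ok(indicators[name]))
    label = {3: "strong", 2: "adequate"}.get(passed, "weak")
    return passed, label


ind = solvency_indicators(
    total_liabilities=400, total_assets=1000,
    current_assets=300, current_liabilities=150,
    ebit=90, interest_expense=20,
)
result = assess(ind)
```

A rules engine of this kind is easy to audit; a dedicated assessment model, which the text also allows, would replace `assess` with a learned scoring function.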
Step S4: collecting real-time data of the enterprise repayment capability assessment data, thereby generating a real-time repayment capability assessment data set; carrying out real-time event update classification on the real-time debt repayment capability evaluation data set to generate an enterprise credit evaluation update data set; and carrying out data merging on the enterprise credit evaluation updating data set so as to obtain an enterprise intelligent credit rating report.
In the embodiment of the invention, a data collection system is configured to obtain, in real time, the data required for the repayment capability assessment from the multiple data sources of the enterprise. These data may include financial statements, transaction records, market data, etc., and the collection system should synchronize them in real time so that the latest situation of the enterprise is reflected promptly. The data collected in real time are processed to ensure a consistent and accurate format, and the previously defined repayment capability assessment indicators are calculated on them to generate the real-time repayment capability assessment data set. An event monitoring system is configured to monitor real-time events related to the repayment capability of the enterprise, such as significant financial fluctuations and market fluctuations; the monitored events are classified, and their degree of influence on the credit assessment of the enterprise is determined. According to the classification result, the parts of the enterprise's credit assessment that need to be updated are marked, and the real-time repayment capability assessment data set is combined with the update marks to generate the enterprise credit evaluation update data set. The update data set is merged with the previous credit assessment data set so that the latest assessment information is included, and the merged data set is kept consistent in format and structure to facilitate subsequent processing. The enterprise intelligent credit rating report is generated from the merged data set; the report may include the credit rating, the repayment capability analysis, the impact of the latest events, etc. A periodic update mechanism is set up so that the information in the report is updated as the real-time data change.
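The event classification, update marking, and merging described above might look like the following sketch. The event types, impact levels, and dictionary keys are hypothetical names chosen for illustration; the source does not specify a data format.

```python
# Hypothetical mapping from event type to impact level on the credit assessment.
EVENT_IMPACT = {
    "financial_fluctuation": "high",
    "market_fluctuation": "medium",
}


def classify_event(event_type: str) -> str:
    """Classify a monitored event; unknown types default to low impact."""
    return EVENT_IMPACT.get(event_type, "low")


def build_update_dataset(realtime_assessment: dict, events: list) -> dict:
    """Mark the report sections that need updating and attach the marks."""
    sections = set()
    for event in events:
        if classify_event(event["type"]) in ("high", "medium"):
            sections.add(event["section"])
    return {**realtime_assessment, "sections_to_update": sorted(sections)}


def merge_report(previous: dict, update: dict) -> dict:
    """Merge so that the newest values override the earlier assessment."""
    merged = dict(previous)
    merged.update(update)
    return merged


update = build_update_dataset(
    {"enterprise_id": "E001", "current_ratio": 1.8},
    [
        {"type": "financial_fluctuation", "section": "repayment"},
        {"type": "minor_news", "section": "profile"},
    ],
)
report = merge_report(
    {"enterprise_id": "E001", "rating": "AA", "current_ratio": 2.0}, update
)
```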
Preferably, step S1 comprises the steps of:
step S11: acquiring target enterprise data by using a distributed crawler technology; compiling investigation documents according to the target enterprise data, so as to obtain due-job investigation document data;
step S12: integrating the due-job investigation document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; performing data cleaning on the enterprise multi-source heterogeneous data set to generate enterprise multi-source cleaning data; filling the data missing values of the enterprise multi-source cleaning data to generate enterprise multi-source filling data;
step S13: carrying out data standardization on the enterprise multi-source filling data by a maximum-minimum standardization method to generate standard enterprise multi-source data;
step S14: carrying out data structuring unified processing on the standard enterprise multi-source data to generate enterprise multi-source unified data; and carrying out semantic association mining on the enterprise multi-source unified data by using a natural language technology, and generating enterprise association graph data.
The invention can effectively collect the data of the target enterprises from a plurality of sources by utilizing the distributed crawler technology, which is helpful for acquiring more comprehensive and diversified information, and the collected data is tidied and archived by compiling the investigation document, thereby facilitating the subsequent processing and analysis. Integrating the due-job investigation document data with the target enterprise data to form a set containing multiple sources and multiple types of data, wherein data cleaning is helpful for removing errors, inconsistencies or incompleteness in the data, ensuring the accuracy and consistency of the data, and filling missing values is to process blank or missing parts possibly existing in the data so as to ensure the integrity and usability of the data set. The data is mapped into a unified numerical range by a maximum-minimum standardization method, so that the data with different scales or units can be compared and analyzed, which is helpful to eliminate the measurement units and scale differences possibly existing in different data sources, and the data is more comparable. The standardized data is further processed so as to better understand the association and meaning between the data, the semantic association mining is carried out on the data through natural language technology, the relationship hidden in the data can be found, enterprise association graph data is formed, and the key association and structure inside the enterprise can be revealed. 
Integrating and cleansing data can help form a more comprehensive data set, provide deeper insight, reduce misleading caused by fragmentation of information, cleansing, filling and standardization can improve quality and accuracy of data, reduce misunderstanding or erroneous decision caused by data errors, and through semantic association mining, potential associations and patterns behind the data can be found, provide insight with more depth and insight for enterprises, and help make more intelligent decisions.
In the embodiment of the invention, a distributed crawler framework such as Scrapy or Apache Nutch is used to accelerate data acquisition. The websites likely to carry target enterprise information are identified, and crawling rules are designed so that key information is captured effectively. Structured data such as basic company information and financial data are extracted from the web pages, and the crawled data are stored in a suitable data warehouse or database in a way that ensures data security and accessibility. Data obtained from different sources are integrated into a centralized data store, for which ETL (extract, transform, load) tools may be used. Outliers, duplicates, and inconsistencies are identified and handled to ensure data quality and consistency, and missing values are filled using a suitable method, such as a statistical method or a machine learning model. The data are mapped to a specific range using the maximum-minimum standardization method so that different data items are comparable; a data processing library such as Pandas (Python) or Spark can be used for the standardization. Depending on the nature of the data, further structuring is carried out with scripts or programs written using data processing tools, so that the data format is consistent and convenient for subsequent analysis and mining. Text data are processed with NLP techniques such as word embedding and entity recognition to extract key information, a data mining algorithm or a graph database is used to find associations in the data and construct the enterprise association graph, and a visualization tool is used to display the graph so that the internal relationships of the enterprise can be understood more intuitively.
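The missing-value filling of step S12 and the maximum-minimum standardization of step S13 can be illustrated with a short sketch. Mean imputation is only one of the statistical methods the text allows and is chosen here as an assumption; the function names and sample figures are illustrative.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Map values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no scale information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


revenue = [120.0, None, 80.0, 200.0]   # illustrative figures
filled = fill_missing_with_mean(revenue)
normalized = min_max_normalize(filled)
```

After normalization the smallest observed value maps to 0 and the largest to 1, so columns measured in different units become directly comparable.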
Preferably, step S2 comprises the steps of:
step S21: carrying out emotion dictionary construction according to enterprise association graph data to generate an emotion dictionary; carrying out emotion marking on the target enterprise data and the due-job investigation document data based on the emotion dictionary, and generating emotion marking data;
step S22: carrying out context weight distribution on the emotion marking data to generate vocabulary emotion weight data; semantic analysis is carried out on the enterprise multisource unified data by using vocabulary emotion weight data and emotion dictionary, and enterprise multisource semantic analysis data are generated;
step S23: carrying out emotion recognition on the enterprise multisource semantic analysis data to generate enterprise multisource emotion feature data; carrying out emotion association rejection processing on the enterprise multisource emotion feature data so as to generate enterprise multisource association data;
step S24: carrying out multidimensional feature extraction on the enterprise multisource associated data to generate enterprise multisource feature data;
step S25: carrying out data division on the enterprise multisource characteristic data to generate a model training set and a model testing set; model training is carried out on the model training set through a supervised learning algorithm, and a credit rating pre-model is generated; performing model test on the credit rating pre-model according to the model test set, so as to generate an enterprise credit rating model;
Step S26: performing model performance evaluation on the enterprise credit rating model by using a cross verification method to generate model performance evaluation data; and performing parameter tuning on the enterprise credit rating model according to the model performance evaluation data, so as to generate an enterprise credit rating optimization model.
According to the invention, through emotion dictionary construction, emotion information in enterprise association graphs can be identified, a basis is provided for subsequent emotion analysis, emotion marking data generation is beneficial to understanding emotion colors expressed by enterprises in investigation documents and data, and more comprehensive information is provided for credit rating. Through context weight distribution, meaning of emotion words can be more accurately understood, emotion analysis accuracy is improved, enterprise multisource semantic analysis data generation enables a system to be capable of deeply understanding enterprise data, including semantics and emotion information, and deeper analysis is provided for credit rating. Emotion recognition can help determine emotional trends of enterprises in different aspects, generation of associated data is helpful for understanding relationships among different data sources, and generated enterprise multi-source associated data is helpful for constructing a more comprehensive enterprise portrait, so that credit conditions of the enterprise are better understood. The extraction of the multidimensional features is helpful for describing enterprises more comprehensively, including finance, operation, emotion and the like, and the generated enterprise multisource feature data provides richer information for the credit rating model, so that the credit rating model has more discriminant. The model can learn the mode in the data through a supervised learning algorithm, so that the credit rating is predicted, the generated credit rating pre-model can be used for carrying out credit evaluation on new enterprises, and the efficiency and accuracy of rating are improved. 
Through cross verification and parameter tuning, the robustness and accuracy of the credit rating model can be improved, and the effectiveness and stability of the enterprise credit rating model in practical application are ensured through performance evaluation and tuning of the enterprise credit rating model.
As an example of the present invention, referring to fig. 2, the step S2 in this example includes:
step S21: carrying out emotion dictionary construction according to enterprise association graph data to generate an emotion dictionary; carrying out emotion marking on the target enterprise data and the due-job investigation document data based on the emotion dictionary, and generating emotion marking data;
According to the embodiment of the invention, text data related to the enterprise association graph data, including enterprise reports, news articles, social media comments and the like, are collected so that texts of different sources and types are covered and a comprehensive emotion vocabulary is obtained. Words in the text are extracted through natural language processing (NLP) technology, and the emotion polarity of each word is marked manually or automatically with a positive, negative, or neutral label; an emotion intensity can be attached to the label to describe the emotion more finely. Emotion vocabulary specific to the enterprise or industry is added in combination with the enterprise association graph data to make the dictionary more domain-specific, and the marked words and their polarities are integrated into the emotion dictionary. Relevant data of the target enterprise, including financial reports, news reports, social media content and the like, are collected, and due-job investigation documents, which may include legal documents, contracts, research reports and the like, are acquired. The target enterprise data and the due-job investigation documents undergo text preprocessing, including word segmentation, stop word removal, and stemming. The preprocessed text is emotion-marked using the constructed emotion dictionary, taking into account the frequency and position of each word in the text to reflect emotion more accurately, and the marking results are integrated into a data set for subsequent analysis and modeling, generating the emotion marking data.
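A dictionary-based emotion marking pass of the kind described above can be sketched as follows. The dictionary entries, the stop-word list, and the function names are hypothetical; stemming is omitted for brevity, and the tokenizer assumes English text, whereas Chinese text would need a word segmentation library.

```python
import re

# Hypothetical emotion dictionary: word -> polarity (+1 positive, -1 negative).
EMOTION_DICT = {"growth": 1, "profit": 1, "default": -1, "lawsuit": -1}
STOP_WORDS = {"the", "a", "and", "of", "in", "on", "is"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tag_emotions(text):
    """Return (position, word, polarity) for every dictionary hit."""
    return [
        (i, tok, EMOTION_DICT[tok])
        for i, tok in enumerate(preprocess(text))
        if tok in EMOTION_DICT
    ]


tags = tag_emotions("The lawsuit is a risk, and profit growth slowed.")
```

Keeping the token position in each tuple preserves the information that the later context weighting step needs.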
Step S22: carrying out context weight distribution on the emotion marking data to generate vocabulary emotion weight data; semantic analysis is carried out on the enterprise multisource unified data by using vocabulary emotion weight data and emotion dictionary, and enterprise multisource semantic analysis data are generated;
In the embodiment of the invention, context analysis is performed on the emotion marking data to understand the context in which each emotion word appears. The contextual relationship determines the weight of an emotion word in a specific context, and each emotion word is assigned a weight accordingly; the weight may be a function of the word's position in the text, its frequency, and other contextual features. The context weight results are integrated into the emotion marking data to form the vocabulary emotion weight data. Enterprise multi-source data, including financial data, social media content, news stories, and the like, are collected from a wide range of sources to obtain comprehensive information, and undergo text preprocessing, including word segmentation, stop word removal, and stemming, in preparation for semantic analysis. The previously constructed emotion dictionary is integrated with the vocabulary emotion weight data to form a comprehensive emotion dictionary in which each emotion word has an associated weight. Semantic analysis is performed on the enterprise multi-source data using the comprehensive emotion dictionary and the weight data; conventional emotion analysis methods may be used, and deep learning models such as recurrent neural networks (RNN) or long short-term memory networks (LSTM) may also be considered. The enterprise multi-source semantic analysis data are generated, including the emotion polarity and weight information for each data point.
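One way to realize a weight that depends on position and frequency is sketched below. The specific weighting function (earlier words weigh more, plus relative frequency) is an assumption made for illustration; the source only states that the weight may be a function of such features.

```python
from collections import Counter


def context_weights(tokens, emotion_dict):
    """Assign each emotion word a weight from its first position and frequency."""
    counts = Counter(tokens)
    n = len(tokens)
    weights = {}
    for pos, tok in enumerate(tokens):
        if tok in emotion_dict and tok not in weights:
            position_factor = 1.0 - pos / n     # assumed: earlier words weigh more
            frequency_factor = counts[tok] / n  # relative frequency in the text
            weights[tok] = position_factor + frequency_factor
    return weights


def weighted_polarity(tokens, emotion_dict):
    """Combine dictionary polarity and context weight into one score."""
    weights = context_weights(tokens, emotion_dict)
    return sum(emotion_dict[tok] * w for tok, w in weights.items())


tokens = ["default", "risk", "rising", "profit"]
emotion_dict = {"default": -1, "profit": 1}   # hypothetical entries
weights = context_weights(tokens, emotion_dict)
score = weighted_polarity(tokens, emotion_dict)
```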
Step S23: carrying out emotion recognition on the enterprise multisource semantic analysis data to generate enterprise multisource emotion feature data; carrying out emotion association rejection processing on the enterprise multisource emotion feature data so as to generate enterprise multisource association data;
In the embodiment of the invention, a suitable emotion classification model is selected. A machine learning algorithm such as a support vector machine (SVM) or Naive Bayes can be used, and a deep learning model such as a convolutional neural network (CNN) or a long short-term memory network (LSTM) can also be considered; the choice should take the characteristics and size of the data into account. Features are extracted from the marked enterprise multi-source semantic analysis data, for example text features obtained with a bag-of-words model or TF-IDF, and are used as inputs to train the emotion classification model. The training set should contain samples of every emotion category to improve the generalization capability of the model, and the model is tuned after training to improve its performance. The trained model performs emotion recognition on the enterprise multi-source semantic analysis data and generates the enterprise multi-source emotion feature data, which may include the emotion class, probability scores, etc. for each data point. It is then defined which features are considered emotion association items; these may include words unrelated to emotion, stop words, industry-specific terms, and the like. The enterprise multi-source emotion feature data are traversed and the emotion association items in them are rejected, which may be done by setting a threshold or by using a predefined list of association items. The data are cleaned to remove noise or abnormal data that could cause association items to be misjudged, and the data remaining after the rejection processing are taken as the enterprise multi-source associated data. 
These data can be used for subsequent analysis and decision making.
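The rejection step described above, using both a predefined list and a threshold, can be sketched as follows. The excluded terms, the confidence threshold, and the record layout are hypothetical values introduced for illustration.

```python
EXCLUDED_TERMS = {"company", "limited"}   # hypothetical predefined association items
MIN_CONFIDENCE = 0.6                      # hypothetical probability threshold


def reject_associations(features):
    """Drop listed association items and low-confidence emotion predictions."""
    kept = []
    for item in features:
        if item["term"] in EXCLUDED_TERMS:
            continue   # rejected by the predefined list
        if item["probability"] < MIN_CONFIDENCE:
            continue   # rejected by the threshold
        kept.append(item)
    return kept


features = [
    {"term": "default", "label": "negative", "probability": 0.92},
    {"term": "company", "label": "neutral", "probability": 0.99},
    {"term": "growth", "label": "positive", "probability": 0.41},
]
associated = reject_associations(features)
```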
Step S24: carrying out multidimensional feature extraction on the enterprise multisource associated data to generate enterprise multisource feature data;
In the embodiment of the invention, preliminary exploratory data analysis (EDA) is performed on the enterprise multi-source associated data to understand its distribution, statistical characteristics, and the like, and data quality is ensured by handling missing values, outliers, etc. The multi-dimensional features to be extracted are defined; they may include text features, temporal features, numerical features, etc., and suitable features are selected according to the business requirements and data characteristics. Features are extracted from text data using methods such as a bag-of-words model, TF-IDF, word embedding models (such as Word2Vec and GloVe), and text topic models (such as Latent Dirichlet Allocation, LDA). If the data contain time information, temporal features such as season, month, and day of the week are extracted, which helps capture how the data change over time. Statistical features, including the mean, standard deviation, maximum, and minimum, are extracted from numerical data and provide information about the distribution and variability of the values. Domain features, such as product sales volume or a customer satisfaction index, are extracted based on knowledge of the enterprise's specific field. The features extracted from the different dimensions are combined into a data set of multi-dimensional features, and a unique identifier may be used for the association so that the data correspond correctly.
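The statistical and temporal feature extraction, and the combination under a unique identifier, can be sketched with the Python standard library. The function names, the `enterprise_id` key, and the sample values are assumptions for illustration only.

```python
from datetime import date
from statistics import mean, pstdev


def numeric_features(values):
    """Statistical features of one numeric series."""
    return {
        "mean": mean(values),
        "std": pstdev(values),   # population standard deviation
        "max": max(values),
        "min": min(values),
    }


def time_features(d):
    """Calendar features of one observation date."""
    return {
        "month": d.month,
        "weekday": d.weekday(),            # Monday is 0
        "quarter": (d.month - 1) // 3 + 1,
    }


def combine(enterprise_id, values, d):
    """Join features from different dimensions under a unique identifier."""
    return {"enterprise_id": enterprise_id, **numeric_features(values), **time_features(d)}


row = combine("E001", [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0], date(2024, 1, 4))
```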
Step S25: carrying out data division on the enterprise multisource characteristic data to generate a model training set and a model testing set; model training is carried out on the model training set through a supervised learning algorithm, and a credit rating pre-model is generated; performing model test on the credit rating pre-model according to the model test set, so as to generate an enterprise credit rating model;
In the embodiment of the invention, the enterprise multi-source feature data are divided into a training set and a test set; random sampling or time-ordered partitioning may be employed to keep both sets representative. An appropriate supervised learning algorithm is selected according to the problem type and the data characteristics; for credit rating problems, common algorithms include decision trees, random forests, support vector machines (SVM), and logistic regression. The features in the training set are scaled and standardized so that different features have consistent numerical ranges and no single feature has an excessive influence on training. The selected algorithm is trained on the training set, and its parameters are adjusted to obtain the best performance. The test set undergoes the same feature processing as the training set, using the scaling parameters obtained from the training set to ensure consistency. The trained model then predicts on the test set to obtain credit rating predictions. Appropriate evaluation metrics, such as accuracy, precision, recall, and F1 score, are selected, and their relative importance is determined according to the business requirements. The predictions are compared with the actual labels, the selected metrics are calculated, and the performance of the model on the test set is evaluated. If the performance does not meet the requirements, adjusting the model parameters, trying different algorithms, or further feature engineering can be considered.
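The point that the test set must be scaled with parameters learned from the training set can be shown in a few lines. This is a standard-library sketch with illustrative function names; the 75/25 split ratio and the seed are assumptions.

```python
import random


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train_values):
    """Learn min-max scaling parameters from the training set only."""
    lo, hi = min(train_values), max(train_values)
    return lo, (hi - lo) or 1.0   # avoid division by zero for constant data


def apply_scaler(values, params):
    """Apply previously learned parameters; test values may fall outside [0, 1]."""
    lo, span = params
    return [(v - lo) / span for v in values]


rows = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
train, test = train_test_split(rows)
params = fit_scaler(train)
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)   # reuses the training parameters
```

Fitting the scaler on the training set alone keeps information about the test set from leaking into training.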
Step S26: performing model performance evaluation on the enterprise credit rating model by using a cross verification method to generate model performance evaluation data; and performing parameter tuning on the enterprise credit rating model according to the model performance evaluation data, so as to generate an enterprise credit rating optimization model.
In the embodiment of the invention, common cross-validation methods include k-fold cross-validation and leave-one-out cross-validation; the choice depends on the data size and the available computing resources. The training set is divided into several subsets (folds). In each round, one fold serves as the validation set and the remaining folds are used for model training, and the process is repeated until every fold has served as the validation set once. The performance metrics of the model on each fold, such as accuracy, precision, and recall, are recorded and then integrated into comprehensive model performance evaluation data, which may include the mean and standard deviation of each metric. The performance evaluation data can be analyzed with visualization tools (e.g., charts and graphs), which helps to identify performance differences across folds and to locate areas of relatively poor performance. The model parameters to be adjusted, for example the learning rate, tree depth, or regularization parameters, are determined from the performance evaluation data, and the best combination is searched for in the parameter space using grid search or random search: grid search tries every combination in a specified grid, whereas random search samples combinations from specified parameter distributions. Cross-validation is performed for each parameter combination to obtain the corresponding performance evaluation data, which shows which combination performs best across the folds. The enterprise credit rating model is then retrained on the whole training set with the best parameter combination found during tuning, and the optimized model is given a final performance evaluation on the test set to confirm that it performs well on unseen data.
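The combination of k-fold cross-validation and grid search can be sketched without external libraries. A one-dimensional k-nearest-neighbour classifier stands in for the credit rating model here purely as an assumption, since the source does not fix an algorithm; the data, fold count, and parameter grid are illustrative.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def knn_predict(train, x, n_neighbors):
    """Majority vote among the nearest training points (stand-in model)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def cross_val_accuracy(data, k, n_neighbors):
    """Mean validation accuracy over k folds for one parameter value."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held_out]
        val = [data[i] for i in fold]
        correct = sum(knn_predict(train, x, n_neighbors) == y for x, y in val)
        scores.append(correct / len(val))
    return mean(scores)


def grid_search(data, k, grid):
    """Evaluate every parameter value in the grid and return the best one."""
    results = {p: cross_val_accuracy(data, k, p) for p in grid}
    best = max(results, key=results.get)
    return best, results


data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
best, results = grid_search(data, k=4, grid=[1, 3])
```

The per-fold scores could also be kept, rather than only their mean, when the standard deviation across folds is needed for the performance evaluation data.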
Preferably, step S24 includes the steps of:
step S241: extracting financial data from the enterprise multisource associated data to obtain enterprise financial index data; calculating enterprise profit margin for the enterprise financial index data to generate enterprise profit margin assessment data; carrying out enterprise asset liability assessment on the enterprise financial index data through enterprise profit margin assessment data to generate enterprise asset liability assessment data; carrying out enterprise cash flow calculation on the enterprise profit margin assessment data through the enterprise asset liability assessment data to generate enterprise cash flow assessment data; carrying out dimension average processing on enterprise profit margin assessment data, enterprise asset liability assessment data and enterprise cash flow assessment data to generate enterprise forward first fluctuation data;
step S242: carrying out investment and financing data extraction on the enterprise multisource associated data to obtain enterprise investment and financing index data; performing financing round calculation on enterprise financing index data to generate enterprise financing round data; carrying out investment amount evaluation on the enterprise investment and financing index data through the enterprise financing round data to generate enterprise investment amount evaluation data; performing dimension average processing on the enterprise financing round data and the enterprise investment amount evaluation data to generate enterprise forward second fluctuation data;
Step S243: carrying out data integration according to the enterprise forward first fluctuation data and the enterprise forward second fluctuation data to generate enterprise built-in fluctuation data; collecting enterprise external data through enterprise internal fluctuation data to generate enterprise external environment data; carrying out external dimension evaluation on the external environment data of the enterprise so as to generate external fluctuation data of the enterprise;
step S244: public opinion monitoring processing is carried out based on the enterprise built-in fluctuation data and the enterprise external fluctuation data, so that enterprise public opinion monitoring data are generated; performing enterprise star classification on the enterprise public opinion monitoring data by using an enterprise confidence analysis formula to generate enterprise confidence star class data;
step S245: comparing the enterprise confidence star level data with a preset star level threshold, and when the enterprise confidence star level data is greater than or equal to the star level threshold, integrating the enterprise built-in fluctuation data, the enterprise external fluctuation data and the enterprise public opinion monitoring data to generate enterprise multi-source characteristic data; and when the enterprise confidence star level data is smaller than the star level threshold, rejecting the enterprise corresponding to the enterprise multisource associated data.
The invention extracts enterprise financial data, including financial index data, from the multi-source associated data. Calculating the enterprise profit margin reveals the profitability of the enterprise, and the enterprise asset liability assessment performed with the profit margin assessment data provides information on the financial robustness of the enterprise. The enterprise cash flow is then calculated through the asset liability assessment data, providing a basis for understanding the cash flow condition of the enterprise. Dimension average processing of the enterprise profit margin assessment data, the enterprise asset liability assessment data, and the enterprise cash flow assessment data smooths data fluctuation, improves data stability, and forms the enterprise forward first fluctuation data. Enterprise investment and financing data, including investment and financing indexes, are extracted from the multi-source associated data. The financing round calculation and the investment amount evaluation reveal the financing condition of the enterprise at different financing stages, and dimension average processing of the enterprise financing round data and the enterprise investment amount evaluation data forms the enterprise forward second fluctuation data. By comprehensively analyzing multi-source data such as finance, investment and financing, and the external environment, the operating condition and risk of an enterprise can be assessed comprehensively. The generated multi-source feature data contain information from different fields and provide richer features for subsequent modeling. Public opinion monitoring and confidence rating give a better understanding of the reputation of an enterprise and the market's response to it. 
Setting the star threshold allows the whole process to automatically retain enterprises that meet the threshold and reject the rest, which improves efficiency and accuracy.
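The threshold comparison of step S245 can be sketched as below. The star values are taken as given inputs because the enterprise confidence analysis formula is not reproduced in this passage; the threshold value and the identifiers are hypothetical.

```python
STAR_THRESHOLD = 3   # hypothetical preset star-level threshold


def screen_enterprises(star_levels, threshold=STAR_THRESHOLD):
    """Keep enterprises at or above the threshold and reject the others."""
    kept, rejected = [], []
    for enterprise_id, stars in star_levels.items():
        if stars >= threshold:
            kept.append(enterprise_id)      # proceeds to feature integration
        else:
            rejected.append(enterprise_id)  # removed from further processing
    return kept, rejected


kept, rejected = screen_enterprises({"E001": 4, "E002": 2, "E003": 3})
```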
As an example of the present invention, referring to fig. 3, the step S24 in this example includes:
step S241: extracting financial data from the enterprise multisource associated data to obtain enterprise financial index data; calculating enterprise profit margin for the enterprise financial index data to generate enterprise profit margin assessment data; carrying out enterprise asset liability assessment on the enterprise financial index data through enterprise profit margin assessment data to generate enterprise asset liability assessment data; carrying out enterprise cash flow calculation on the enterprise profit margin assessment data through the enterprise asset liability assessment data to generate enterprise cash flow assessment data; carrying out dimension average processing on enterprise profit margin assessment data, enterprise asset liability assessment data and enterprise cash flow assessment data to generate enterprise forward first fluctuation data;
In the embodiment of the invention, the enterprise multi-source associated data, such as but not limited to financial statements, transaction data, and cost data, are collected, and finance-related data, such as the key indexes of income, cost, and profit, are extracted from the organized data; a data extraction tool or script can automate this process to improve efficiency. The extracted financial data are used to calculate the profit margin of the enterprise, typically the ratio of net profit to business income, to generate the profit margin assessment data. The asset liability assessment is performed based on the enterprise financial index data and may cover the various items of the balance sheet, generating the asset liability assessment data. The cash flow calculation uses the asset liability assessment data and may involve the cash flows of operating, investing, and financing activities, generating the cash flow assessment data. The enterprise profit margin assessment data, asset liability assessment data, and cash flow assessment data undergo dimension average processing to obtain a comprehensive assessment result; this may be a weighted average or another statistical treatment of the data of the different dimensions. The averaged data are taken as the enterprise forward first fluctuation data.
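The profit margin calculation and the dimension average processing can be sketched as follows. The sketch assumes that the three assessment values have already been brought to a comparable scale; the asset liability and cash flow scores used in the example are placeholder numbers, not outputs of the patented assessment.

```python
def profit_margin(net_profit, revenue):
    """Profit margin as the ratio of net profit to business income."""
    return net_profit / revenue


def dimension_average(scores, weights=None):
    """Plain or weighted average over assessment dimensions."""
    if weights is None:
        weights = [1.0] * len(scores)   # equal weights by default
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


margin = profit_margin(20.0, 100.0)
# 0.5 and 0.8 are placeholder asset liability and cash flow scores.
first_fluctuation = dimension_average([margin, 0.5, 0.8])
weighted = dimension_average([1.0, 3.0], weights=[3.0, 1.0])
```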
Step S242: carrying out investment and financing data extraction on the enterprise multisource associated data to obtain enterprise investment and financing index data; performing financing round calculation on enterprise financing index data to generate enterprise financing round data; carrying out investment amount evaluation on the enterprise investment and financing index data through the enterprise financing round data to generate enterprise investment amount evaluation data; performing dimension average processing on the enterprise financing round data and the enterprise investment amount evaluation data to generate enterprise forward second fluctuation data;
In the embodiment of the invention, financing-related information in the enterprise multisource associated data, such as the financing amount, the investors and the financing time, is collected, and the round of each financing is calculated based on the financing index data. Financing rounds are typically determined from the time at which the investment occurred, the financing amount and other relevant information, and the generated financing round data includes the round number, the round type and the like. The investment amount of the enterprise investment and financing index data is then evaluated using the financing round data, which may involve weighting the financing amounts of different rounds or other evaluation methods, to generate investment amount evaluation data. Dimension average processing is performed on the financing round data and the investment amount evaluation data, similar to the processing in step S241; this may include a weighted average or another statistical treatment of the different dimensions, to generate the enterprise forward second fluctuation data.
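One possible way to derive round numbers and a round-weighted investment amount is sketched below. The `FinancingEvent` type, the rule that rounds are numbered in date order and the choice of the round number as the weight are hypothetical; the invention only states that rounds are determined from time, amount and other information.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FinancingEvent:
    when: date
    amount: float
    investor: str


def assign_rounds(events: list) -> list:
    """Order events by date and number them 1, 2, 3, ..."""
    ordered = sorted(events, key=lambda e: e.when)
    return list(enumerate(ordered, start=1))


def weighted_investment_amount(rounds: list) -> float:
    """Average amount, weighting each round by its round number."""
    total_weight = sum(number for number, _ in rounds)
    if total_weight == 0:
        return 0.0
    return sum(number * e.amount for number, e in rounds) / total_weight


# Hypothetical financing history, deliberately out of order.
events = [
    FinancingEvent(date(2021, 1, 1), 100.0, "Investor B"),
    FinancingEvent(date(2020, 1, 1), 50.0, "Investor A"),
    FinancingEvent(date(2022, 1, 1), 200.0, "Investor C"),
]
rounds = assign_rounds(events)
amount_score = weighted_investment_amount(rounds)
```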
Step S243: carrying out data integration according to the enterprise forward first fluctuation data and the enterprise forward second fluctuation data to generate enterprise built-in fluctuation data; collecting enterprise external data through enterprise internal fluctuation data to generate enterprise external environment data; carrying out external dimension evaluation on the external environment data of the enterprise so as to generate external fluctuation data of the enterprise;
In the embodiment of the invention, the enterprise forward first fluctuation data and the enterprise forward second fluctuation data are integrated, which may include merging, joining or otherwise combining the two data sets so that the records are correctly associated. The integrated data is then processed, possibly including the calculation of fluctuations, trends or other relevant metrics, to generate the enterprise built-in fluctuation data. Using the enterprise built-in fluctuation data as a reference, the type and source of the external environment data to be collected are determined; this may include macroeconomic indicators, industry data, market trends and the like. Selecting an appropriate data source may involve accessing an external data provider, an API or another data collection channel. The external data is collected and collated to ensure its accuracy and consistency, and is then associated with the enterprise built-in fluctuation data, for example by matching on shared key fields or timestamps, to form the enterprise external environment data. Based on the enterprise external environment data, an assessment of external dimensions is made, which may cover market conditions, the competitive environment, regulatory changes and the like; indexes and trends of the external environment data are calculated, and the result of the external dimension assessment, which may be a quantitative indicator or a qualitative description, is generated as the enterprise external fluctuation data. Finally, the enterprise built-in fluctuation data and the enterprise external fluctuation data are integrated to form a complete fluctuation data set.
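The matching on shared key fields or timestamps can be illustrated with a small sketch. The period labels, the sample values and the use of the relative period-over-period change as the fluctuation metric are assumptions for the example.

```python
def join_by_period(internal: dict, external: dict) -> dict:
    """Pair internal and external values for periods present in both series."""
    shared = sorted(internal.keys() & external.keys())
    return {p: (internal[p], external[p]) for p in shared}


def period_changes(series: dict) -> dict:
    """Relative change between consecutive periods (skips zero baselines)."""
    periods = sorted(series)
    return {
        cur: (series[cur] - series[prev]) / series[prev]
        for prev, cur in zip(periods, periods[1:])
        if series[prev]
    }


# Hypothetical built-in fluctuation scores and an external industry index.
built_in = {"2023Q1": 0.20, "2023Q2": 0.25, "2023Q3": 0.30}
industry_index = {"2023Q2": 1.0, "2023Q3": 1.1, "2023Q4": 1.2}

joined = join_by_period(built_in, industry_index)
external_changes = period_changes(industry_index)
```

Only the overlapping periods survive the join, so any period missing from either source is dropped rather than filled in.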
Step S244: public opinion monitoring processing is carried out based on the enterprise built-in fluctuation data and the enterprise external fluctuation data, so that enterprise public opinion monitoring data are generated; performing enterprise star classification on the enterprise public opinion monitoring data by using an enterprise confidence analysis formula to generate enterprise confidence star class data;
In the embodiment of the invention, public opinion data related to the enterprise is collected using the enterprise built-in fluctuation data and the enterprise external fluctuation data; this may include news, social media comments, industry reports and the like. The raw public opinion data is cleaned to remove noise and duplicate information, and the text data is preprocessed, including word segmentation, stop word removal, stemming and the like. The processed public opinion data is used to generate enterprise public opinion monitoring data, which may include metrics such as a public opinion score, trend analysis and emotion analysis. An enterprise confidence analysis formula is formulated based on a plurality of indexes, such as the quantity of public opinion, source credibility and the emotion analysis result; the weights and calculation modes of the indexes are determined according to the specific requirements and industry characteristics of the enterprise. The confidence analysis formula is applied to the enterprise public opinion monitoring data, which may include a weighted sum or another mathematical operation on each index, to yield a comprehensive confidence score. A star level division standard is formulated according to the enterprise confidence score; for example, the confidence score may be divided into different intervals, each corresponding to a star level. The enterprise public opinion monitoring data is then associated with the star level to generate enterprise confidence star level data.
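The cleaning and preprocessing stage can be sketched as below. The stop word list is a tiny hypothetical one, the tokenizer handles only ASCII letters and digits, and stemming is omitted; Chinese-language text would need a dedicated word segmenter, which is outside this illustration.

```python
import re

# Hypothetical, deliberately small stop word list.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text: str) -> list:
    """Lowercase, split into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def clean_corpus(documents: list) -> list:
    """Remove empty and duplicate documents, keeping first occurrences."""
    seen = set()
    cleaned = []
    for doc in documents:
        tokens = tuple(tokenize(doc))
        if tokens and tokens not in seen:
            seen.add(tokens)
            cleaned.append(list(tokens))
    return cleaned


raw = [
    "The company is growing.",
    "the company is GROWING",      # duplicate after normalisation
    "",                            # noise: empty item
    "Profit of the company fell",
]
corpus = clean_corpus(raw)
```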
Step S245: comparing the enterprise confidence star level data with a preset star level threshold, and when the enterprise confidence star level data is greater than or equal to the star level threshold, integrating the enterprise built-in fluctuation data, the enterprise external fluctuation data and the enterprise public opinion monitoring data to generate enterprise multi-source characteristic data; and when the enterprise confidence star level data is smaller than the star level threshold, rejecting the enterprise corresponding to the enterprise multisource associated data.
In the embodiment of the invention, the enterprise confidence star level data and the preset star level threshold are obtained and compared. If the enterprise confidence star level data is greater than or equal to the star level threshold, the enterprise built-in fluctuation data, the enterprise external fluctuation data and the enterprise public opinion monitoring data are integrated to generate enterprise multi-source characteristic data, which may include operations such as merging fields from different data sources, data normalization and statistical index generation. If the enterprise confidence star level data is smaller than the star level threshold, the corresponding enterprise is removed from the multi-source associated data, which may involve deleting the records of that enterprise from the data set. The integration and removal operations are logged for subsequent review and traceability, for example by recording which enterprises were integrated and which were removed.
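The threshold comparison and the accompanying audit record can be sketched as follows. The enterprise names, the star values and the shape of the audit log are hypothetical.

```python
def filter_by_star(star_levels: dict, threshold: int):
    """Split enterprises into kept and rejected, logging each decision."""
    kept, rejected, audit_log = [], [], []
    for name, stars in star_levels.items():
        if stars >= threshold:
            kept.append(name)
            audit_log.append((name, stars, "integrated"))
        else:
            rejected.append(name)
            audit_log.append((name, stars, "rejected"))
    return kept, rejected, audit_log


# Hypothetical star levels for three enterprises.
kept, rejected, audit_log = filter_by_star(
    {"Enterprise A": 4, "Enterprise B": 2, "Enterprise C": 3}, threshold=3
)
```

An enterprise whose star level equals the threshold is kept, matching the "greater than or equal to" condition of step S245.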
Preferably, step S244 includes the steps of:
step S2441: first public opinion monitoring is conducted on the built-in fluctuation data of the enterprise, and first public opinion data are generated; performing second public opinion monitoring on the external fluctuation data of the enterprise to generate second public opinion data, wherein the public opinion monitoring comprises positive public opinion monitoring and negative public opinion monitoring;
Step S2442: positive public opinion data extraction is carried out on the first public opinion data and the second public opinion data, and positive first public opinion data and positive second public opinion data are generated; negative public opinion data extraction is carried out on the first public opinion data and the second public opinion data, and negative first public opinion data and negative second public opinion data are generated;
step S2443: carrying out mean value weight distribution on the negative first public opinion data and the positive first public opinion data to generate first public opinion weight data; carrying out mean value weight distribution on the negative second public opinion data and the positive second public opinion data to generate second public opinion weight data; the first public opinion weight data and the second public opinion weight data are subjected to weight comparison, and when the first public opinion weight data is larger than the second public opinion weight data, the first public opinion weight data is marked as main dimension public opinion data, and the second public opinion weight data is marked as secondary dimension public opinion data; when the first public opinion weight data is smaller than the second public opinion weight data, marking the second public opinion weight data as main dimension public opinion data, and marking the first public opinion weight data as secondary dimension public opinion data;
Step S2444: carrying out enterprise confidence calculation on the main dimension public opinion data and the secondary dimension public opinion data by using an enterprise confidence analysis formula to generate enterprise confidence evaluation data; and performing star classification on the enterprise confidence evaluation data through a data visualization method to generate enterprise confidence star-level data.
The invention covers positive and negative public opinion through the first public opinion monitoring and the second public opinion monitoring including the monitoring of the internal fluctuation data and the external fluctuation data. This ensures a comprehensive understanding of the public opinion of the enterprise, helps to discover and cope with potential public opinion risks in time, and positive and negative public opinion data extraction helps to refine the understanding of public opinion. By classifying data into positive and negative, the enterprise's image and reputation in the public and market can be more specifically analyzed. The public opinion of the main dimension and the secondary dimension can be determined by carrying out average weight distribution and weight comparison on the positive public opinion data and the negative public opinion data. This helps focus on the public opinion aspects that have the greatest impact on the enterprise image, providing a more targeted analysis. And calculating the primary dimension and the secondary dimension public opinion data by using an enterprise confidence analysis formula to generate enterprise confidence evaluation data. The public opinion condition of enterprises can be quantified, and objective indexes are provided for decision making. The data visualization method performs star classification on enterprise confidence evaluation data, so that information is easier to understand and convey. Through visualization, the whole condition of the enterprise can be quickly grasped, and a decision maker is helped to make decisions more quickly. The real-time nature of the process enables the enterprise to discover and deal with public opinion that may have an impact on reputation and image in time. This helps take precautions and repair measures before the problem is amplified.
As an example of the present invention, referring to fig. 4, step S244 includes, in this example:
step S2441: first public opinion monitoring is conducted on the built-in fluctuation data of the enterprise, and first public opinion data are generated; performing second public opinion monitoring on the external fluctuation data of the enterprise to generate second public opinion data, wherein the public opinion monitoring comprises positive public opinion monitoring and negative public opinion monitoring;
In the embodiment of the invention, data related to the inside of the enterprise is collected, which may include social media comments, employee feedback, internal reports and the like; these data sources may be accessed through APIs, crawlers and similar means. Emotion analysis is performed using natural language processing techniques to classify the built-in fluctuation data as positive, negative or neutral, which helps determine the overall emotional tendency of the internal public opinion. External information related to the enterprise, such as news reports, social media comments and forum discussions, is monitored using web crawlers or specialized public opinion monitoring tools. The collected information is screened and filtered so that only enterprise-related content is retained and irrelevant or misleading information is excluded. Keywords related to positive public opinion, which may include the company name, product names, positive industry terms and the like, are set to identify positive information, and data containing positive emotion is selected during monitoring so that information favorable to the enterprise is captured. Keywords related to negative public opinion, including the company name, product names, negative industry terms and the like, are likewise set, and data containing negative emotion is selected during monitoring so that negative public opinion can be identified and handled in time. The data that has undergone emotion analysis and keyword filtering is integrated to generate the first public opinion data (from the built-in fluctuation data) and the second public opinion data (from the external fluctuation data).
Step S2442: positive public opinion data extraction is carried out on the first public opinion data and the second public opinion data, and positive first public opinion data and positive second public opinion data are generated; negative public opinion data extraction is carried out on the first public opinion data and the second public opinion data, and negative first public opinion data and negative second public opinion data are generated;
in the embodiment of the invention, the first public opinion data and the second public opinion data are processed by using the emotion analysis technology to identify and distinguish positive emotion contents, and a machine learning model, such as a neural network based on deep learning or a traditional rule-based method, can be adopted to carry out emotion classification on the text. Keywords or features related to positive emotion, such as positive vocabulary, praise, support and the like, are set for filtering and extracting positive public opinion data. And marking the data identified as the positive emotion as positive first public opinion data and positive second public opinion data, and respectively corresponding to positive contents in the internal fluctuation data and the external fluctuation data. And identifying and marking negative emotion contents according to the first public opinion data and the second public opinion data by using an emotion analysis technology, and classifying and extracting the negative emotion texts by means of a machine learning model. And setting key words or characteristics related to negative emotion, such as negative vocabulary, criticism, complaints and the like, for filtering and extracting negative public opinion data. The data identified as negative emotion is marked as negative first public opinion data and negative second public opinion data, corresponding to negative content in the internal and external fluctuation data, respectively.
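The keyword-based variant of this extraction can be sketched as below; it stands in for the machine learning classifier mentioned above and is not a substitute for one. The two word lists and the sample texts are hypothetical.

```python
# Hypothetical sentiment lexicons.
POSITIVE = {"growth", "praise", "support", "profit"}
NEGATIVE = {"complaint", "criticism", "lawsuit", "loss"}


def classify(text: str) -> str:
    """Label a text by comparing counts of positive and negative keywords."""
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def split_by_sentiment(texts: list) -> dict:
    """Group texts into positive, negative and neutral buckets."""
    buckets = {"positive": [], "negative": [], "neutral": []}
    for text in texts:
        buckets[classify(text)].append(text)
    return buckets


first_opinion = [
    "Strong growth and profit this year",
    "Customer complaint led to lawsuit.",
    "Quarterly report published",
]
first_split = split_by_sentiment(first_opinion)
```

The same function would be applied separately to the first and second public opinion data to obtain the four positive and negative subsets.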
Step S2443: carrying out mean value weight distribution on the negative first public opinion data and the positive first public opinion data to generate first public opinion weight data; carrying out mean value weight distribution on the negative second public opinion data and the positive second public opinion data to generate second public opinion weight data; the first public opinion weight data and the second public opinion weight data are subjected to weight comparison, and when the first public opinion weight data is larger than the second public opinion weight data, the first public opinion weight data is marked as main dimension public opinion data, and the second public opinion weight data is marked as secondary dimension public opinion data; when the first public opinion weight data is smaller than the second public opinion weight data, marking the second public opinion weight data as main dimension public opinion data, and marking the first public opinion weight data as secondary dimension public opinion data;
In the embodiment of the invention, the first public opinion weight data is calculated by taking a weighted average or a simple average of the negative first public opinion data and the positive first public opinion data, and the second public opinion weight data is calculated in the same way from the negative second public opinion data and the positive second public opinion data. The first public opinion weight data is then compared with the second public opinion weight data to determine which is larger. If the first public opinion weight data is larger than the second public opinion weight data, the first public opinion weight data is marked as main dimension public opinion data and the second public opinion weight data is marked as secondary dimension public opinion data; if the first public opinion weight data is smaller than the second public opinion weight data, the second public opinion weight data is marked as main dimension public opinion data and the first public opinion weight data is marked as secondary dimension public opinion data.
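The weight calculation and comparison can be sketched as follows, using the simple average. Representing each item of public opinion data as a numeric intensity score is an assumption, and so is the tie rule: the text does not say what happens when the two weights are equal, so this sketch arbitrarily treats the first as the main dimension in that case.

```python
from statistics import mean


def opinion_weight(positive_scores: list, negative_scores: list) -> float:
    """Simple average over all positive and negative intensity scores."""
    scores = list(positive_scores) + list(negative_scores)
    return mean(scores) if scores else 0.0


def rank_dimensions(first_weight: float, second_weight: float) -> dict:
    """Mark the larger weight as main and the other as secondary."""
    # Assumption: on a tie, the first (built-in) dimension is the main one.
    if first_weight >= second_weight:
        return {"main": "first", "secondary": "second"}
    return {"main": "second", "secondary": "first"}


# Hypothetical intensity scores in the range 0 to 1.
first_weight = opinion_weight([0.8, 0.6], [0.4])
second_weight = opinion_weight([0.5], [0.3])
ranking = rank_dimensions(first_weight, second_weight)
```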
Step S2444: carrying out enterprise confidence calculation on the main dimension public opinion data and the secondary dimension public opinion data by using an enterprise confidence analysis formula to generate enterprise confidence evaluation data; and performing star classification on the enterprise confidence evaluation data through a data visualization method to generate enterprise confidence star-level data.
In the embodiment of the invention, the enterprise confidence analysis formula may involve a weighted combination of several metrics or other mathematical expressions; for example, factors such as public opinion weight data, historical enterprise performance and market share can be considered. The main dimension public opinion data and the secondary dimension public opinion data are substituted into the enterprise confidence analysis formula to obtain the enterprise confidence evaluation data. A suitable visualization tool, such as Matplotlib or Seaborn (Python libraries), Tableau or Power BI, is selected according to the data distribution and requirements, and the enterprise confidence evaluation data is visualized in the form of histograms, pie charts, radar charts and the like to better understand the distribution and characteristics of the data. Star level division criteria are defined based on the value range of the enterprise confidence; for example, the confidence may be divided into five star levels: one star, two stars, three stars, four stars and five stars. The enterprise confidence evaluation data of each enterprise is mapped to the corresponding star level to generate the enterprise confidence star level data.
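The mapping from a confidence value to a star level can be sketched as below. The sketch assumes that the confidence is normalised to the range 0 to 1 and that the five intervals are of equal width; the invention leaves the actual interval boundaries open.

```python
def to_star_level(confidence: float, levels: int = 5) -> int:
    """Map a confidence in [0, 1] to a star level from 1 to `levels`."""
    clamped = min(max(confidence, 0.0), 1.0)   # out-of-range values are clipped
    # Equal-width intervals; a confidence of exactly 1.0 stays in the top level.
    return min(levels, int(clamped * levels) + 1)


# Hypothetical confidence evaluation data for three enterprises.
confidence_by_enterprise = {"Enterprise A": 0.95, "Enterprise B": 0.5,
                            "Enterprise C": 0.1}
star_by_enterprise = {name: to_star_level(c)
                      for name, c in confidence_by_enterprise.items()}
```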
Preferably, the enterprise confidence analysis formula in step S2444 is specifically as follows:
wherein C denotes the enterprise confidence evaluation data, N denotes the total number of public opinion data samples, T denotes the observation time period of the public opinion data, f(t) denotes the weight value of the main dimension public opinion data at time point t, g(t) denotes the weight value of the secondary dimension public opinion data at time point t, h(t) denotes the standard confidence weight value at time point t, w_1 denotes the source judgment coefficient of the main dimension public opinion data, w_2 denotes the source judgment coefficient of the secondary dimension public opinion data, and μ denotes the enterprise confidence analysis anomaly adjustment value.
The invention constructs an enterprise confidence analysis formula. N, the total number of public opinion data samples, is considered in the formula; a larger N may improve the accuracy and statistical reliability of the assessment. T is the time range of the public opinion data considered; a longer time period may reflect the confidence level of the enterprise more fully. f(t), the weight value of the main dimension public opinion data at time point t, measures the importance of the main dimension public opinion data in the evaluation; the main dimension public opinion data may include key events, public opinion guidance and the like, and can be flexibly weighted by adjusting f(t). g(t), the weight value of the secondary dimension public opinion data at time point t, measures the importance of the secondary dimension public opinion data in the evaluation; the secondary dimension public opinion data may include detail information, user comments and the like, and can be flexibly weighted by adjusting g(t). h(t), the standard confidence weight value at time point t, measures the confidence level of the public opinion data; it can be set according to the specific situation and adjusted according to factors such as the source and credibility of the public opinion data to improve the accuracy of the evaluation. w_1, the source judgment coefficient of the main dimension public opinion data, accounts for the reliability of its source; by adjusting w_1, the degree of confidence placed in the main dimension public opinion data can be weighed. A functional relationship is formed according to the interrelationship between the total number of public opinion data samples and the above parameters:
The formula comprehensively considers factors such as the main dimension and secondary dimension public opinion data, the data weights, the confidence weight and the anomaly adjustment value, and the enterprise confidence evaluation data obtained through calculation helps to evaluate the confidence level of an enterprise. By reasonably setting the values of the parameters, different types of public opinion data can be weighted according to the specific situation, improving the accuracy and practicability of the evaluation result. The enterprise confidence analysis anomaly adjustment value μ is used to correct errors and deviations caused by the complexity and non-ideality of the actual system. It corrects the difference between the theoretical assumptions in the formula and the actual system, improving the accuracy and reliability of the enterprise confidence analysis so that the enterprise confidence evaluation data C is generated more accurately. At the same time, parameters in the formula such as the weight value of the main dimension public opinion data at time point t and the standard confidence weight value at time point t can be adjusted according to the actual situation, so that the formula adapts to different enterprise confidence analysis scenarios and the applicability and flexibility of the algorithm are improved. Enterprise confidence evaluation data can also be obtained with an enterprise confidence analysis formula conventional in the art, but it can be calculated more accurately by applying the enterprise confidence analysis formula provided by the invention.
Preferably, step S3 comprises the steps of:
step S31: importing enterprise multisource characteristic data into an enterprise credit rating optimization model to conduct enterprise credit rating prediction, so that a prediction model result summarization data set is generated, wherein the prediction model result summarization data set comprises enterprise credit rating data, key index data and enterprise derivative data;
step S32: carrying out credit rating key index extraction on the enterprise credit rating data by utilizing the key index data to generate enterprise key index credit rating data; performing text description on the enterprise key index credit rating data and the enterprise derivative data through a natural language technology to generate a text description data set;
step S33: performing chart visualization on the text description data set to generate a data visualization chart; integrating the text description dataset and the data visualization chart to obtain a first text-credit rating association option;
step S34: performing intelligent AI scoring on the first text-credit rating associated options by using the enterprise credit rating optimization model to generate first text-credit associated option scoring data; manually scoring based on the first text-credit associated option scoring data to generate second text-credit associated option scoring data;
Step S35: performing score difference comparison on the first text-credit associated option scoring data and the second text-credit associated option scoring data to generate associated score difference data; carrying out associated score correction on the first text-credit associated option scoring data using the associated score difference data to generate text-credit associated option comprehensive scoring data; and carrying out repayment capability assessment on the text-credit associated option comprehensive scoring data through an enterprise repayment capability assessment formula to generate enterprise repayment capability assessment data.
According to the invention, the enterprise characteristic data from multiple sources is imported into the enterprise credit rating optimization model, and the prediction result summarized data set output by the model comprises enterprise credit rating data, key index data and enterprise derivative data. Comprehensive enterprise credit rating information is provided, data of all aspects are integrated, and the enterprise credit status can be estimated more comprehensively and accurately. The key indexes are extracted from the enterprise credit rating data, so that the key indexes are helpful for understanding main influencing factors of enterprise credit rating, and the key index data and the enterprise derivative data are subjected to text description by using natural language technology, so that deeper understanding of the rating factors is provided. Intuitive and interpretable key indicators are provided, while understandability and user friendliness are enhanced by textual descriptions. And performing chart visualization on the text description data set, integrating the text description and the data visualization chart to form a first text-credit rating association option, and enhancing the information transmission effect through chart visualization so that the credit rating information related to the text is clearer and more understandable. And the enterprise credit rating optimization model is utilized to carry out intelligent AI scoring on the first text-credit rating associated options, manual scoring is carried out based on the AI scoring, second text-credit associated option scoring data is generated, intelligent AI and manual scoring are combined, more comprehensive evaluation is provided, model errors are reduced, and credibility is increased. 
And comparing the AI score with the manual score to generate associated score difference data, correcting the score of the first text-credit associated option according to the score difference data, and generating text-credit associated option comprehensive score data. And carrying out repayment capability assessment on the comprehensive scoring data through an enterprise repayment capability assessment formula to generate enterprise repayment capability assessment data. By comparing and correcting the grading difference, the accuracy and reliability of the assessment are improved, and the assessment of the repayment capability of the enterprise is added.
In the embodiment of the invention, the enterprise multi-source characteristic data is imported into the enterprise credit rating optimization model, enterprise credit rating prediction is carried out using the model, and a prediction model result summarization data set is generated, which comprises enterprise credit rating data, key index data and enterprise derivative data. Key indexes are extracted from the enterprise credit rating data using the key index data, and the enterprise key index credit rating data and the enterprise derivative data are converted into text descriptions using natural language technology to generate a text description data set. The text description data set is charted to generate a data visualization chart, and the text description data set and the data visualization chart are integrated to form a first text-credit rating associated option. The enterprise credit rating optimization model performs intelligent AI scoring on the first text-credit rating associated option to generate first text-credit associated option scoring data, and manual scoring is performed based on the intelligent scoring result to generate second text-credit associated option scoring data. The first text-credit associated option scoring data is compared with the second text-credit associated option scoring data to generate associated score difference data, and associated score correction is carried out on the first text-credit associated option scoring data using the associated score difference data to generate text-credit associated option comprehensive scoring data. Repayment capability assessment is then carried out on the text-credit associated option comprehensive scoring data using the enterprise repayment capability assessment formula to generate enterprise repayment capability assessment data.
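The score difference comparison and correction can be illustrated with a short sketch. The correction rule used here, moving the AI score toward the manual score by a fraction `alpha` of the difference, is an assumption chosen for illustration; the invention does not specify how the correction is computed. The option names and scores are hypothetical.

```python
def score_differences(ai_scores: dict, manual_scores: dict) -> dict:
    """Manual score minus AI score for every option scored by both."""
    return {k: manual_scores[k] - ai_scores[k]
            for k in ai_scores if k in manual_scores}


def corrected_scores(ai_scores: dict, manual_scores: dict,
                     alpha: float = 0.5) -> dict:
    """Shift each AI score toward the manual score by alpha * difference."""
    diffs = score_differences(ai_scores, manual_scores)
    # Options without a manual score keep their AI score unchanged.
    return {k: ai_scores[k] + alpha * diffs.get(k, 0.0) for k in ai_scores}


# Hypothetical scores on a 0 to 100 scale.
ai = {"profitability": 80.0, "liquidity": 60.0, "governance": 70.0}
manual = {"profitability": 70.0, "liquidity": 66.0}

diffs = score_differences(ai, manual)
combined = corrected_scores(ai, manual, alpha=0.5)
```

With `alpha = 0.5` the comprehensive score is the midpoint of the AI and manual scores; `alpha = 1.0` would adopt the manual score outright.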
Preferably, the enterprise repayment capability evaluation formula in step S35 is specifically as follows:
where D denotes the overall repayment capability score of the enterprise, Q denotes the total asset value of the enterprise, r denotes the total non-current liabilities of the enterprise, k denotes the total current liabilities of the enterprise, z denotes the current loan interest rate of the enterprise, x denotes the sales growth rate of the enterprise, b_1 denotes the start time of the cash inflow integral, b_2 denotes the end time of the cash inflow integral, v denotes the total current liabilities of the enterprise, m denotes the total cash inflow of the enterprise over the time period b_1 to b_2, ε denotes the cash inflow expansion adjustment coefficient, and σ denotes the anomaly correction amount of the enterprise repayment capability assessment.
The invention constructs an enterprise repayment capability assessment formula. The overall repayment capability score D is the output of the formula and reflects the repayment capability level of the enterprise. Q is the total asset value of the enterprise, that is, the sum of all its assets; a larger Q generally means a stronger repayment capability. r is the total non-current liabilities of the enterprise, including long-term borrowings and long-term payables; a smaller r indicates relatively low non-current liabilities, which raises the repayment capability score. k is the total current liabilities of the enterprise, including short-term borrowings and accounts payable; a smaller k indicates relatively low current liabilities, which likewise raises the score. A functional relationship is formed according to the correlation between the total asset value of the enterprise and the above parameters:
The formula also takes into account the current loan interest rate z of the enterprise, that is, the borrowing cost it faces; a lower z reduces the financial burden and raises the repayment capability score. The sales growth rate x reflects how fast the enterprise's sales are growing; a higher x raises the score. b1 and b2 bound the time range of the cash inflow integration, that is, the observation period for cash inflow; adjusting b1 and b2 allows the effect of cash inflow over different periods on repayment capability to be considered. v, the total current liabilities of the enterprise, is used to calculate the expansion adjustment of cash inflow; a smaller v indicates relatively low current liabilities and raises the score. m is the total cash inflow of the enterprise over the period from b1 to b2; a larger m indicates a stronger cash inflow capacity and raises the score. The anomaly correction amount σ corrects errors and deviations caused by the complexity and non-ideal behavior of the actual system.
This correction accounts for the difference between the theoretical assumptions of the formula and the actual system, improving the accuracy and reliability of the assessment so that the overall repayment capability score D is generated more accurately. Parameters such as the current loan interest rate and the total current liabilities of the enterprise can also be adjusted to the actual situation, so that the formula adapts to different repayment capability assessment scenarios and the algorithm becomes more widely applicable and flexible. A repayment capability assessment formula conventional in the art also yields an overall repayment capability score, but the formula provided by the invention calculates that score more accurately. The formula jointly considers the enterprise's liabilities, loan interest rate, sales growth rate and the time distribution of its cash inflow. The calculated score can assist in assessing the repayment status of the enterprise under given conditions. By setting the parameter values reasonably, the repayment capability of an enterprise can be comprehensively evaluated for its specific circumstances, supporting decision making and risk management.
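One input of the formula is m, the total cash inflow over the period from b1 to b2. As a minimal sketch of how that quantity might be obtained from sampled data, the Python below integrates a sampled cash inflow rate over the window with the trapezoidal rule. The function names, the sampling scheme and the linear interpolation at the window edges are assumptions made for illustration; the sketch does not reproduce the assessment formula itself.

```python
from bisect import bisect_left


def interpolate(times, rates, t):
    """Linearly interpolate the inflow rate at time t (times must be sorted)."""
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return rates[i]
    if i == 0 or i == len(times):
        raise ValueError("t lies outside the sampled range")
    t0, t1 = times[i - 1], times[i]
    r0, r1 = rates[i - 1], rates[i]
    return r0 + (r1 - r0) * (t - t0) / (t1 - t0)


def total_cash_inflow(times, rates, b1, b2):
    """Trapezoidal estimate of the cash inflow accumulated over [b1, b2].

    times: sorted sample times (hypothetical units, e.g. months)
    rates: cash inflow rate observed at each sample time
    """
    if not b1 < b2:
        raise ValueError("b1 must be earlier than b2")
    # Integration grid: window start, the samples strictly inside, window end.
    grid = [b1] + [t for t in times if b1 < t < b2] + [b2]
    values = [interpolate(times, rates, t) for t in grid]
    return sum(
        (grid[i + 1] - grid[i]) * (values[i] + values[i + 1]) / 2
        for i in range(len(grid) - 1)
    )


if __name__ == "__main__":
    months = [0, 1, 2, 3, 4]
    inflow_rate = [10, 10, 20, 20, 10]
    print(total_cash_inflow(months, inflow_rate, 0.5, 3.5))
```

Shifting b1 and b2 changes which part of the inflow history contributes to m, which is the adjustment of the observation period described above.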
Preferably, step S4 comprises the following steps:
step S41: collecting real-time data of the enterprise repayment capability assessment data, thereby generating a real-time repayment capability assessment data set; monitoring the real-time repayment capability evaluation data set in real time to generate a real-time analysis result data set;
step S42: performing event classification based on the real-time analysis result data set to generate an event identification classification data set; updating the enterprise repayment capability evaluation data in real time through the event identification classification data set to generate an enterprise credit evaluation update data set; and carrying out data merging on the enterprise credit evaluation updating data set so as to obtain an enterprise intelligent credit rating report.
The invention collects the enterprise repayment capability assessment data in real time to keep the data timely and accurate. The real-time repayment capability assessment data set is monitored in real time, which may involve tracking the enterprise's financial indicators, market changes and the like, so as to obtain the latest repayment capability information and generate a real-time analysis result data set containing the analysis results. Event classification is then performed on the real-time analysis result data set, relating different types of events to their possible effects on the enterprise's credit rating, and an event identification classification data set containing the classification of each event is generated. The event identification classification data set is used to update the enterprise repayment capability assessment data in real time, which may involve dynamically adjusting assessment model parameters or other update strategies, and yields an enterprise credit evaluation update data set containing the updated credit rating data. Finally, the enterprise credit evaluation update data set is merged, integrating the updated information with the previous assessment data, to obtain the enterprise intelligent credit rating report. The report may include the latest credit rating, the dynamic changes of key indicators, the effects of events and so on. Real-time collection and monitoring keep the credit rating information current, which helps decisions to be made promptly in a changing market.
Through event classification and real-time updating, the system can flexibly cope with the influence of different events on credit rating, so that the evaluation is more accurate, and the intelligent credit rating report integrates real-time data and historical evaluation data, thereby providing comprehensive enterprise credit conditions for users.
In the embodiment of the invention, the data sources from which the enterprise repayment capability assessment data is collected are first determined; these may include financial statements, transaction records, supply chain data and the like. The data is collected in real time using appropriate tools and techniques, which may include API calls, crawler technology and real-time database connections. The collected real-time data is cleaned and preprocessed to ensure accuracy and consistency, and a real-time monitoring system is deployed to monitor changes in the repayment capability assessment data set. Monitoring may rely on thresholds, rules or machine learning algorithms. On the basis of the real-time monitoring, a real-time analysis result data set is generated that records the current repayment capability status of the enterprise. An event classification model is established that classifies events, such as financial distress or market change, according to the real-time analysis result data set. Applying the model to the real-time analysis result data set produces an event identification classification data set that records the occurrence of the different events, and the enterprise repayment capability assessment data is updated on that basis so that it reflects the latest situation. The enterprise credit evaluation data set updated in real time is merged with the historical assessment data set to form a complete data set. A credit rating model is established, using a machine learning algorithm or another statistical method, and applied to the merged data set to produce the final credit rating and the enterprise intelligent credit rating report.
The report may include the enterprise credit rating, key risk factors, event impact and similar information. The generated intelligent credit rating report is output to the user by means such as interface display or email.
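The flow of steps S41 and S42 can be sketched in Python as follows. This is an illustrative outline only: the metric names, threshold values, event labels and score penalties are hypothetical, and a simple rule table stands in for the event classification model, which the embodiment leaves open (thresholds, rules or machine learning).

```python
from dataclasses import dataclass


@dataclass
class Observation:
    enterprise: str
    metric: str
    value: float


# Hypothetical monitoring rules: metric -> (limit, direction, event label).
RULES = {
    "current_ratio": (1.0, "below", "liquidity_stress"),
    "negative_news_count": (5, "above", "market_change"),
}

# Hypothetical score deduction applied per event type.
PENALTY = {"liquidity_stress": 10.0, "market_change": 5.0}


def monitor(observations):
    """S41 sketch: flag observations that cross a configured threshold."""
    flagged = []
    for obs in observations:
        rule = RULES.get(obs.metric)
        if rule is None:
            continue
        limit, direction, label = rule
        crossed = obs.value < limit if direction == "below" else obs.value > limit
        if crossed:
            flagged.append((obs, label))
    return flagged


def classify(flagged):
    """S42 sketch: group flagged observations by event label."""
    events = {}
    for obs, label in flagged:
        events.setdefault(label, []).append(obs)
    return events


def update_scores(scores, events):
    """S42 sketch: adjust each affected enterprise's score once per event type."""
    updated = dict(scores)
    for label, obs_list in events.items():
        for enterprise in {o.enterprise for o in obs_list}:
            updated[enterprise] = max(0.0, updated.get(enterprise, 0.0) - PENALTY[label])
    return updated


def merge_report(historical, updated, events):
    """S42 sketch: merge historical and updated scores into a report."""
    report = {}
    for enterprise, score in updated.items():
        report[enterprise] = {
            "previous_score": historical.get(enterprise),
            "current_score": score,
            "events": sorted(
                label
                for label, obs_list in events.items()
                if any(o.enterprise == enterprise for o in obs_list)
            ),
        }
    return report


if __name__ == "__main__":
    history = {"A": 80.0, "B": 70.0}
    feed = [
        Observation("A", "current_ratio", 0.8),
        Observation("A", "negative_news_count", 7),
        Observation("B", "current_ratio", 1.5),
    ]
    events = classify(monitor(feed))
    print(merge_report(history, update_scores(history, events), events))
```

Keeping the rules and penalties in plain tables means either could be swapped for a trained classifier or a model-parameter update without changing the merge step.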
In this specification, there is provided an intelligent credit rating report construction system for performing the above-described method of intelligent credit rating report construction, the intelligent credit rating report construction system comprising:
the semantic mining module is used for acquiring target enterprise data and due-job investigation document data; integrating the due-job investigation document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; carrying out data preprocessing on the enterprise multi-source heterogeneous data set to generate enterprise multi-source unified data; semantic association mining is carried out on the enterprise multi-source unified data, and enterprise association graph data are generated;
the dimension feature extraction module is used for carrying out emotion association rejection processing on the enterprise multisource unified data according to the enterprise association graph data so as to generate enterprise multisource association data; carrying out multidimensional feature extraction on the enterprise multisource associated data to generate enterprise multisource feature data; model training is carried out on the enterprise multisource characteristic data, and an enterprise credit rating optimization model is generated;
the text association module is used for importing the enterprise multisource characteristic data into an enterprise credit rating optimization model to conduct enterprise credit rating prediction, so that a prediction model result summarization data set is generated; performing text description on the prediction model result summarization dataset to generate a first text-credit rating association option; performing association score correction on the first text-credit rating association options to generate text-credit association option comprehensive score data; performing repayment capability assessment on the text-credit associated option comprehensive scoring data to generate enterprise repayment capability assessment data;
The real-time updating module is used for carrying out real-time data collection on the enterprise repayment capability assessment data so as to generate a real-time repayment capability assessment data set; carrying out real-time event update classification on the real-time debt repayment capability evaluation data set to generate an enterprise credit evaluation update data set; and carrying out data merging on the enterprise credit evaluation updating data set so as to obtain an enterprise intelligent credit rating report.
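As a rough sketch of how the four modules above hand data to one another, the Python below chains four placeholder stages in the stated order. The dictionary keys, the stub computations and the run_pipeline helper are all hypothetical; only the module order and the dependency of each module on the previous module's output come from the description.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Callable[[Context], Context]


def semantic_mining(ctx: Context) -> Context:
    # Placeholder: a real module would integrate, preprocess and mine the data.
    graph = {"nodes": sorted(ctx["enterprise_data"])}
    return {**ctx, "association_graph": graph}


def feature_extraction(ctx: Context) -> Context:
    # Depends on the association graph produced by the previous module.
    graph = ctx["association_graph"]
    features = {name: 1.0 for name in graph["nodes"]}
    return {**ctx, "features": features, "model": "rating-model-stub"}


def text_association(ctx: Context) -> Context:
    # Placeholder scoring: a real module would predict, describe and correct scores.
    assessment = {name: 50.0 * weight for name, weight in ctx["features"].items()}
    return {**ctx, "repayment_assessment": assessment}


def realtime_update(ctx: Context) -> Context:
    # Placeholder: a real module would apply real-time events before merging.
    return {**ctx, "report": dict(ctx["repayment_assessment"])}


MODULES: List[Tuple[str, Stage]] = [
    ("semantic mining", semantic_mining),
    ("dimension feature extraction", feature_extraction),
    ("text association", text_association),
    ("real-time updating", realtime_update),
]


def run_pipeline(ctx: Context, modules: List[Tuple[str, Stage]] = MODULES) -> Context:
    """Run the modules in order and record which ones executed."""
    trace = []
    for name, stage in modules:
        ctx = stage(ctx)
        trace.append(name)
    return {**ctx, "trace": trace}


if __name__ == "__main__":
    result = run_pipeline({"enterprise_data": {"acme": {}, "beta": {}}})
    print(result["trace"])
    print(result["report"])
```

Because each stage reads a key written by its predecessor, running the modules out of order fails immediately with a KeyError, which mirrors the dependency between the modules.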
The beneficial effects of the invention are as follows. Multiple data sources, including the target enterprise data and the due-job investigation document data, are integrated and preprocessed into multi-source unified data, giving a comprehensive view of the enterprise's condition and characteristics. Semantic association mining generates enterprise association graph data, which makes the relationships among data items explicit and helps reveal the meaning behind the data. Eliminating emotion-associated items improves data quality and reduces their negative influence on the credit rating, making the rating more objective and accurate. Multidimensional features are extracted from the multi-source associated data and a credit rating optimization model is trained on them, strengthening the modeling capability and prediction accuracy for enterprise credit ratings. The enterprise repayment capability assessment data is collected, updated and classified in real time, so that changes in the enterprise's condition are captured promptly and the credit rating report stays timely and accurate. The enterprise credit evaluation data set updated in real time is merged to generate the intelligent credit rating report, which presents users with a rating result that weighs data from multiple parties and provides more comprehensive and accurate credit evaluation information. Because the repayment capability assessment is based on the text-credit association option comprehensive scoring data, the rating result depends not only on numerical data but also on a comprehensive analysis of the text descriptions.
Therefore, the invention improves the accuracy and the comprehensiveness of credit rating by integrating the characteristics of multi-source data, multi-dimensional analysis and dynamic updating.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An intelligent credit rating report construction method is characterized by comprising the following steps:
step S1: acquiring target enterprise data and due-job investigation document data; integrating the due-job investigation document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; carrying out data preprocessing on the enterprise multi-source heterogeneous data set to generate enterprise multi-source unified data; semantic association mining is carried out on the enterprise multi-source unified data, and enterprise association graph data are generated;
Step S2: carrying out emotion association rejection processing on the enterprise multisource unified data according to the enterprise association graph data so as to generate enterprise multisource association data; carrying out multidimensional feature extraction on the enterprise multisource associated data to generate enterprise multisource feature data; model training is carried out on the enterprise multisource characteristic data, and an enterprise credit rating optimization model is generated;
step S3: importing the enterprise multisource characteristic data into an enterprise credit rating optimization model to conduct enterprise credit rating prediction, so that a prediction model result summarization data set is generated; performing text description on the prediction model result summarization dataset to generate a first text-credit rating association option; performing association score correction on the first text-credit rating association options to generate text-credit association option comprehensive score data; performing repayment capability assessment on the text-credit associated option comprehensive scoring data to generate enterprise repayment capability assessment data;
step S4: collecting real-time data of the enterprise repayment capability assessment data, thereby generating a real-time repayment capability assessment data set; carrying out real-time event update classification on the real-time debt repayment capability evaluation data set to generate an enterprise credit evaluation update data set; and carrying out data merging on the enterprise credit evaluation updating data set so as to obtain an enterprise intelligent credit rating report.
2. The method for constructing an intelligent credit rating report according to claim 1, wherein the step S1 comprises the steps of:
step S11: acquiring target enterprise data by using a distributed crawler technology; compiling investigation documents according to the target enterprise data, so as to obtain due-job investigation document data;
step S12: integrating the due-job investigation document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; performing data cleaning on the enterprise multi-source heterogeneous data set to generate enterprise multi-source cleaning data; filling the data missing values of the enterprise multi-source cleaning data to generate enterprise multi-source filling data;
step S13: carrying out data standardization on the enterprise multi-source filling data by a maximum-minimum standardization method to generate standard enterprise multi-source data;
step S14: carrying out data structuring unified processing on the standard enterprise multi-source data to generate enterprise multi-source unified data; and carrying out semantic association mining on the enterprise multi-source unified data by using a natural language technology, and generating enterprise association graph data.
3. The method for constructing an intelligent credit rating report according to claim 2, wherein the step S2 comprises the steps of:
Step S21: carrying out emotion dictionary construction according to enterprise association graph data to generate an emotion dictionary; carrying out emotion marking on the target enterprise data and the due-job investigation document data based on the emotion dictionary, and generating emotion marking data;
step S22: carrying out context weight distribution on the emotion marking data to generate vocabulary emotion weight data; semantic analysis is carried out on the enterprise multisource unified data by using vocabulary emotion weight data and emotion dictionary, and enterprise multisource semantic analysis data are generated;
step S23: carrying out emotion recognition on the enterprise multisource semantic analysis data to generate enterprise multisource emotion feature data; carrying out emotion association rejection processing on the enterprise multisource emotion feature data so as to generate enterprise multisource association data;
step S24: carrying out multidimensional feature extraction on the enterprise multisource associated data to generate enterprise multisource feature data;
step S25: carrying out data division on the enterprise multisource characteristic data to generate a model training set and a model testing set; model training is carried out on the model training set through a supervised learning algorithm, and a credit rating pre-model is generated; performing model test on the credit rating pre-model according to the model test set, so as to generate an enterprise credit rating model;
Step S26: performing model performance evaluation on the enterprise credit rating model by using a cross verification method to generate model performance evaluation data; and performing parameter tuning on the enterprise credit rating model according to the model performance evaluation data, so as to generate an enterprise credit rating optimization model.
4. The intelligent credit rating report construction method according to claim 3, wherein the step S24 comprises the steps of:
step S241: extracting financial data from the enterprise multisource associated data to obtain enterprise financial index data; calculating enterprise profit margin for the enterprise financial index data to generate enterprise profit margin assessment data; carrying out enterprise asset liability assessment on the enterprise financial index data through enterprise profit margin assessment data to generate enterprise asset liability assessment data; carrying out enterprise cash flow calculation on the enterprise profit margin assessment data through the enterprise asset liability assessment data to generate enterprise cash flow assessment data; carrying out dimension average processing on the enterprise profit margin assessment data, the enterprise asset liability assessment data and the enterprise cash flow assessment data to generate enterprise forward first fluctuation data;
step S242: carrying out investment and financing data extraction on the enterprise multisource associated data to obtain enterprise investment and financing index data; performing financing round calculation on enterprise financing index data to generate enterprise financing round data; carrying out investment amount evaluation on the enterprise investment and financing index data through the enterprise financing round data to generate enterprise investment amount evaluation data; performing dimension average processing on the enterprise financing round data and the enterprise investment amount evaluation data to generate enterprise forward second fluctuation data;
Step S243: carrying out data integration according to the enterprise forward first fluctuation data and the enterprise forward second fluctuation data to generate enterprise built-in fluctuation data; collecting enterprise external data through enterprise internal fluctuation data to generate enterprise external environment data; carrying out external dimension evaluation on the external environment data of the enterprise so as to generate external fluctuation data of the enterprise;
step S244: public opinion monitoring processing is carried out based on the enterprise built-in fluctuation data and the enterprise external fluctuation data, so that enterprise public opinion monitoring data are generated; performing enterprise star classification on the enterprise public opinion monitoring data by using an enterprise confidence analysis formula to generate enterprise confidence star class data;
step S245: comparing the enterprise confidence star level data with a preset star level threshold, and when the enterprise confidence star level data is greater than or equal to the star level threshold, integrating the enterprise built-in fluctuation data, the enterprise external fluctuation data and the enterprise public opinion monitoring data to generate enterprise multi-source characteristic data; and when the enterprise confidence star level data is smaller than the star level threshold, rejecting the enterprise corresponding to the enterprise multisource associated data.
5. The intelligent credit rating report construction method according to claim 4, wherein the step S244 includes the steps of:
Step S2441: first public opinion monitoring is conducted on the built-in fluctuation data of the enterprise, and first public opinion data are generated; performing second public opinion monitoring on the external fluctuation data of the enterprise to generate second public opinion data, wherein the public opinion monitoring comprises positive public opinion monitoring and negative public opinion monitoring;
step S2442: front public opinion data extraction is carried out on the first public opinion data and the second public opinion data, and front first public opinion data and front second public opinion data are generated; negative public opinion data extraction is carried out on the first public opinion data and the second public opinion data, and negative first public opinion data and negative second public opinion data are generated;
step S2443: carrying out mean value weight distribution on the negative first public opinion data and the positive first public opinion data to generate first public opinion weight data; carrying out mean value weight distribution on the negative second public opinion data and the positive second public opinion data to generate second public opinion weight data; the first public opinion weight data and the second public opinion weight data are subjected to weight comparison, and when the first public opinion weight data is larger than the second public opinion weight data, the first public opinion weight data is marked as main dimension public opinion data, and the second public opinion weight data is marked as secondary dimension public opinion data; when the first public opinion weight data is smaller than the second public opinion weight data, marking the second public opinion weight data as main dimension public opinion data, and marking the first public opinion weight data as secondary dimension public opinion data;
Step S2444: carrying out enterprise confidence calculation on the main dimension public opinion data and the secondary dimension public opinion data by using an enterprise confidence analysis formula to generate enterprise confidence evaluation data; and performing star classification on the enterprise confidence evaluation data through a data visualization method to generate enterprise confidence star-level data.
6. The intelligent credit rating report construction method according to claim 4, wherein the enterprise confidence analysis formula in step S2444 is as follows:
wherein C is expressed as the enterprise confidence evaluation data, N is expressed as the total number of public opinion data samples, T is expressed as the observation time period of the public opinion data, f(T) is expressed as the weight value of the main dimension public opinion data at time point T, g(T) is expressed as the weight value of the secondary dimension public opinion data at time point T, h(T) is expressed as the standard confidence weight value at time point T, w1 is expressed as the source judgment coefficient of the main dimension public opinion data, w2 is expressed as the source judgment coefficient of the secondary dimension public opinion data, and μ is expressed as the enterprise confidence analysis anomaly adjustment value.
7. The intelligent credit rating report construction method according to claim 6, wherein the step S3 comprises the steps of:
Step S31: importing enterprise multisource characteristic data into an enterprise credit rating optimization model to conduct enterprise credit rating prediction, so that a prediction model result summarization data set is generated, wherein the prediction model result summarization data set comprises enterprise credit rating data, key index data and enterprise derivative data;
step S32: carrying out credit rating key index extraction on the enterprise credit rating data by utilizing the key index data to generate enterprise key index credit rating data; performing text description on the enterprise key index credit rating data and the enterprise derivative data through a natural language technology to generate a text description data set;
step S33: performing chart visualization on the text description data set to generate a data visualization chart; integrating the text description dataset and the data visualization chart to obtain a first text-credit rating association option;
step S34: performing intelligent AI scoring on the first text-credit rating associated options by using the enterprise credit rating optimization model to generate first text-credit associated option scoring data; manually scoring based on the first text-credit associated option scoring data to generate second text-credit associated option scoring data;
Step S35: performing score difference comparison on the first text-credit associated option score data and the second text-credit associated option score data to generate associated score difference data; the association score difference data carries out association score correction on the first text-credit association option score data so as to generate text-credit association option comprehensive score data; and carrying out repayment capability assessment on the text-credit associated option comprehensive scoring data through an enterprise repayment capability assessment formula to generate enterprise repayment capability assessment data.
8. The intelligent credit rating report construction method according to claim 7, wherein the enterprise repayment capability assessment formula in step S35 is as follows:
where D is the overall repayment capability score of the enterprise, Q is the total asset value of the enterprise, r is the total non-current liabilities of the enterprise, k is the total current liabilities of the enterprise, z is the current loan interest rate of the enterprise, x is the sales growth rate of the enterprise, b1 is the start time of the cash inflow integration, b2 is the end time of the cash inflow integration, v is the total current liabilities of the enterprise, m is the total cash inflow of the enterprise over the period from b1 to b2, ε is the expansion adjustment coefficient for cash inflow, and σ is the anomaly correction amount for the enterprise repayment capability assessment.
9. The intelligent credit rating report construction method according to claim 7, wherein the step S4 includes the steps of:
step S41: collecting real-time data of the enterprise repayment capability assessment data, thereby generating a real-time repayment capability assessment data set; monitoring the real-time repayment capability evaluation data set in real time to generate a real-time analysis result data set;
step S42: performing event classification based on the real-time analysis result data set to generate an event identification classification data set; updating the enterprise repayment capability evaluation data in real time through the event identification classification data set to generate an enterprise credit evaluation update data set; and carrying out data merging on the enterprise credit evaluation updating data set so as to obtain an enterprise intelligent credit rating report.
10. An intelligent credit rating report construction system for performing an intelligent credit rating report construction method as claimed in claim 1, the intelligent credit rating report construction system comprising:
The semantic mining module is used for acquiring target enterprise data and due-job investigation document data; integrating the due-job investigation document data and the target enterprise data to generate an enterprise multi-source heterogeneous data set; carrying out data preprocessing on the enterprise multi-source heterogeneous data set to generate enterprise multi-source unified data; semantic association mining is carried out on the enterprise multi-source unified data, and enterprise association graph data are generated;
the dimension feature extraction module is used for carrying out emotion association rejection processing on the enterprise multisource unified data according to the enterprise association graph data so as to generate enterprise multisource association data; carrying out multidimensional feature extraction on the enterprise multisource associated data to generate enterprise multisource feature data; model training is carried out on the enterprise multisource characteristic data, and an enterprise credit rating optimization model is generated;
the text association module is used for importing the enterprise multisource characteristic data into an enterprise credit rating optimization model to conduct enterprise credit rating prediction, so that a prediction model result summarization data set is generated; performing text description on the prediction model result summarization dataset to generate a first text-credit rating association option; performing association score correction on the first text-credit rating association options to generate text-credit association option comprehensive score data; performing repayment capability assessment on the text-credit associated option comprehensive scoring data to generate enterprise repayment capability assessment data;
The real-time updating module is used for carrying out real-time data collection on the enterprise repayment capability assessment data so as to generate a real-time repayment capability assessment data set; carrying out real-time event update classification on the real-time debt repayment capability evaluation data set to generate an enterprise credit evaluation update data set; and carrying out data merging on the enterprise credit evaluation updating data set so as to obtain an enterprise intelligent credit rating report.
CN202410012492.XA 2024-01-02 2024-01-02 Intelligent credit rating report construction method and system Pending CN117764724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410012492.XA CN117764724A (en) 2024-01-02 2024-01-02 Intelligent credit rating report construction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410012492.XA CN117764724A (en) 2024-01-02 2024-01-02 Intelligent credit rating report construction method and system

Publications (1)

Publication Number Publication Date
CN117764724A (en) 2024-03-26

Family

ID=90325887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410012492.XA Pending CN117764724A (en) 2024-01-02 2024-01-02 Intelligent credit rating report construction method and system

Country Status (1)

Country Link
CN (1) CN117764724A (en)

Similar Documents

Publication Publication Date Title
Chu et al. A global supply chain risk management framework: An application of text-mining to identify region-specific supply chain risks
No et al. Multidimensional audit data selection (MADS): A framework for using data analytics in the audit data selection process
García et al. An insight into the experimental design for credit risk and corporate bankruptcy prediction systems
US11372896B2 (en) Method and apparatus for grouping data records
Rožanec et al. Knowledge graph-based rich and confidentiality preserving Explainable Artificial Intelligence (XAI)
Tuarob et al. DAViS: a unified solution for data collection, analyzation, and visualization in real-time stock market prediction
CN115358481A (en) Early warning and identification method, system and device for enterprise ex-situ migration
Jaiswal et al. Data Mining Techniques and Knowledge Discovery Database
Jeyaraman et al. Practical Machine Learning with R: Define, build, and evaluate machine learning models for real-world applications
Fang et al. LDA-based industry classification
Wei et al. Using machine learning to detect PII from attributes and supporting activities of information assets
CN117764724A (en) Intelligent credit rating report construction method and system
Pakhchanyan et al. Machine learning for categorization of operational risk events using textual description
Gebru Association pattern discovery of import export items in Ethiopia
Ding et al. LSTM Deep Neural Network Based Power Data Credit Tagging Technology.
Dagnaw et al. Data management practice in 21st century: systematic review
Seo et al. Measuring News Sentiment of Korea Using Transformer
Widad et al. Quality Anomaly Detection Using Predictive Techniques: An Extensive Big Data Quality Framework for Reliable Data Analysis
RU2777958C2 (en) Ai transaction administration system
Li et al. LSTM Deep Neural Network Based Power Data Credit Tagging Technology
Zahra et al. Sentiment Analysis Of Twitter Dataset Using Lle And Classification Methods
Phutela et al. Applying Descriptive and Predictive Analytics on Academic Dataset
Sun Sourcing Risk Detection and Prediction with Online Public Data: An Application of Machine Learning Techniques in Supply Chain Risk Management
Föhr Artificial Intelligence Technologies and the Risk-based Audit Approach: A Categorization and Classification Method
Tsapani et al. Knowledge mining from accounting data as a mechanism for decision support

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination