CN114817557A - Enterprise risk detection method and device based on enterprise credit investigation big data knowledge graph - Google Patents


Info

Publication number
CN114817557A
Authority
CN
China
Prior art keywords
enterprise
credit investigation
big data
enterprise credit
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210302732.0A
Other languages
Chinese (zh)
Inventor
宋美娜
刘毓
鄂海红
欧中洪
张光卫
于勰
董亚飞
李国英
冯煜
国晓雪
郭京荆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202210302732.0A
Priority to PCT/CN2022/087210 (WO2023178767A1)
Publication of CN114817557A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an enterprise risk detection method and device based on an enterprise credit investigation big data knowledge graph. The method comprises: constructing an enterprise credit investigation big data unified information model from a plurality of scattered data subdomains; constructing a first enterprise credit investigation big data domain ontology in a top-down manner based on the unified information model; performing entity extraction and relation extraction on data in the enterprise credit investigation big data domain in a bottom-up manner, and selecting high-quality new words to expand the first domain ontology into a second enterprise credit investigation big data domain ontology; and, based on the constructed ontology, building an enterprise credit investigation big data knowledge graph from the enterprise credit investigation big data, acquiring features from the knowledge graph, inputting the acquired feature data into a trained risk control model, and outputting a classification result that is used to classify enterprises. The method and device improve the accuracy of the knowledge graph ontology in the enterprise credit investigation domain and improve the performance of the risk control model.

Description

Enterprise risk detection method and device based on enterprise credit investigation big data knowledge graph
Technical Field
The invention relates to the field of enterprise risk detection, in particular to an enterprise risk detection method and device based on an enterprise credit investigation big data knowledge graph.
Background
At present, the mainstream approach to knowledge-graph-based enterprise risk detection is to extract the attributes of enterprise nodes in the knowledge graph as basic attribute features, extract the relations between an enterprise and other enterprise entities in the knowledge graph as association relation features, and use both as the input features of a subsequent risk control model. One approach extracts network feature information for each enterprise, including the number and proportion of defaulting enterprises among its first-order and second-order neighbors, as the enterprise's relational features, and feeds them together with the enterprise's basic attribute features into a gradient boosting decision tree classification model. Another approach defines three kinds of enterprise-risk-related knowledge graphs according to the business and data background, covering upstream and downstream relations, investment and financing relations, and close associations between enterprises, and obtains the relations between enterprises with a community detection algorithm. A third approach comprehensively mines enterprise associations from data such as equity relations and personnel relations, constructs an enterprise credit investigation knowledge graph, and builds two models on it, an enterprise association relation analysis model and an enterprise group association risk model, to help identify enterprise risks throughout the credit process of commercial banks.
As described above, the features used by current knowledge-graph-based enterprise risk detection methods fall into two types: basic attribute features (mainly an enterprise's financial and judicial data) and association relation features (representing how closely an enterprise entity is related to other enterprise entities in the knowledge graph).
However, because credit investigation data is highly private, different industries cannot share it, so the data is incomplete and suffers from information silos. Since an enterprise credit investigation graph is built on enterprise credit investigation data, existing enterprise credit investigation graphs suffer from missing information: the attributes of enterprise entities come only from fields such as finance and the judiciary, which makes it difficult to express an enterprise's credit condition completely. The data dimensions need to be increased and the model performance needs to be improved.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the invention is to provide an enterprise risk detection method based on an enterprise credit investigation big data knowledge graph that addresses the problem of missing information in enterprise credit investigation graphs and improves the accuracy of predicting defaulting enterprises.
The invention further aims to provide an enterprise risk detection device based on the enterprise credit investigation big data knowledge graph.
In order to achieve the above purpose, the present invention provides an enterprise risk detection method based on an enterprise credit investigation big data knowledge graph, including:
acquiring an enterprise credit investigation big data unified information model based on a plurality of scattered data subdomains, wherein the enterprise credit investigation big data unified information model comprises a hierarchical enterprise information architecture and a hierarchical key personnel architecture;
extracting the relations between key persons and enterprises through the enterprise information of the hierarchical key personnel architecture and the enterprise personnel information of the hierarchical enterprise information architecture, so as to realize cross-domain connection of the enterprise credit investigation big data;
constructing a first enterprise credit investigation big data domain ontology in a top-down manner based on the enterprise credit investigation big data unified information model that realizes the cross-domain connection; performing entity extraction and relation extraction on data in the enterprise credit investigation big data domain in a bottom-up manner, and selecting high-quality new words to expand the first enterprise credit investigation big data domain ontology so as to construct a second enterprise credit investigation big data domain ontology;
based on the second enterprise credit investigation big data domain ontology, constructing an enterprise credit investigation big data knowledge graph from the enterprise credit investigation big data and storing the knowledge graph in a graph database; and
acquiring enterprise feature data from the enterprise credit investigation big data knowledge graph, inputting the acquired enterprise feature data into a trained risk control model for calculation and classification, and outputting a classification result.
According to the enterprise risk detection method based on the enterprise credit investigation big data knowledge graph of the embodiment of the invention, concepts and relations are strictly constrained in a top-down manner and the ontology is expanded in scale by integrating a bottom-up manner. This improves the accuracy of the knowledge graph ontology in the enterprise credit investigation domain and lays a solid foundation for subsequently generating a high-quality knowledge graph. In addition, enterprise R&D and innovation capability features are introduced as input to the risk control model, which improves the performance of the risk control model.
In addition, the enterprise risk detection method based on the enterprise credit investigation big data knowledge graph according to the embodiment of the invention further comprises the following steps:
Further, the hierarchical enterprise information architecture of the enterprise credit investigation big data unified information model comprises the following subdomains: enterprise basic information, enterprise personnel information, enterprise business information, enterprise asset information, enterprise intellectual property information, enterprise financial information, enterprise equity information, judicial data, enterprise risk information, and auxiliary reference information.
Further, performing entity extraction and relation extraction on data in the enterprise credit investigation big data domain in a bottom-up manner and selecting high-quality new words to expand the first enterprise credit investigation big data domain ontology so as to construct the second enterprise credit investigation big data domain ontology includes: performing entity extraction and relation extraction on data in the enterprise credit investigation big data domain in a bottom-up manner; identifying named entities and relation instances in the data based on the entity extraction and the relation extraction, and performing quality judgment on the named entities and relation instances that cannot be recognized; and determining a quality ranking based on the quality judgment, selecting high-quality new words, and expanding the first enterprise credit investigation big data domain ontology to construct the second enterprise credit investigation big data domain ontology.
Further, acquiring the enterprise feature data includes: acquiring the basic attribute features, association relation features, and R&D and innovation capability features of enterprises; the basic attribute features and the R&D and innovation capability features of the enterprises are obtained from the enterprise credit investigation big data knowledge graph; enterprise relational features are extracted through four types of relations, and network features in the enterprise credit investigation big data knowledge graph are extracted through a shortest path algorithm and a community detection algorithm, to obtain the association relation features of the enterprises; the four types of relations are a participation relation, an investment relation, a transaction relation and a litigation relation.
Further, the risk control model comprises: data preprocessing, feature engineering and result classification.
Further, the data preprocessing comprises: preprocessing the acquired enterprise feature data by converting date-type data into string variables, then converting all string variables into numerical data, and computing the IV value, WOE, efficiency and rate of the numerical data.
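The two conversion steps can be sketched as follows. This is a minimal illustration only: the field names are hypothetical, and integer label encoding is an assumed choice, since the text does not state how string variables are turned into numerical data.

```python
from datetime import date

def encode_features(rows):
    """Convert dates to strings, then map every string column to integer codes."""
    # Step 1: dates become ISO strings so they are treated like other string variables.
    as_text = [
        {k: (v.isoformat() if isinstance(v, date) else v) for k, v in row.items()}
        for row in rows
    ]
    # Step 2: each distinct string in a column gets an integer code
    # (label encoding is an assumption, not taken from the source).
    codebooks = {}
    encoded = []
    for row in as_text:
        out = {}
        for k, v in row.items():
            if isinstance(v, str):
                book = codebooks.setdefault(k, {})
                out[k] = book.setdefault(v, len(book))
            else:
                out[k] = v
        encoded.append(out)
    return encoded, codebooks

# Hypothetical enterprise records.
rows = [
    {"founded": date(2015, 3, 1), "industry": "software", "capital": 500.0},
    {"founded": date(2018, 7, 9), "industry": "retail", "capital": 120.0},
    {"founded": date(2015, 3, 1), "industry": "software", "capital": 80.0},
]
encoded, codebooks = encode_features(rows)
```

Keeping the per-column codebooks makes it possible to apply the same mapping to new enterprises at prediction time.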
Further, the formulas for the IV value, WOE, efficacy, and rate are:
[Four formula images (IV value, WOE, efficacy, rate) from the original publication are not reproduced in this text.]
where Good_i and Bad_i denote the numbers of non-default enterprises and default enterprises in the i-th bin, and Good_T and Bad_T denote the total numbers of non-default enterprises and default enterprises, respectively.
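For orientation, the sketch below computes WOE and IV using the definitions commonly found in credit scoring practice, WOE_i = ln((Good_i / Good_T) / (Bad_i / Bad_T)) and IV = sum over bins of (Good_i / Good_T - Bad_i / Bad_T) * WOE_i. These are assumed to correspond to the patent's formulas, but the sign convention and exact form cannot be confirmed from this text. The efficacy and rate formulas are not sketched because their definitions are not recoverable here.

```python
import math

def woe_iv(bins):
    """bins: list of (good_i, bad_i) counts per bin. Returns (woe_per_bin, iv).

    Assumes every bin has non-zero good and bad counts; real data usually
    needs smoothing or bin merging to guarantee this.
    """
    good_t = sum(g for g, _ in bins)  # total non-default enterprises
    bad_t = sum(b for _, b in bins)   # total default enterprises
    woes, iv = [], 0.0
    for good_i, bad_i in bins:
        good_share = good_i / good_t
        bad_share = bad_i / bad_t
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        iv += (good_share - bad_share) * woe
    return woes, iv

# Two hypothetical bins of one feature.
woes, iv = woe_iv([(80, 20), (20, 80)])
```

Under this convention a positive WOE marks a bin dominated by non-default enterprises, and a larger IV indicates a feature that separates the two classes more strongly.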
Further, the feature engineering includes deleting: features with more than 50% missing values; features containing only a single unique value; features whose correlation with another feature is greater than 60%; features with a feature importance of 0.0 in the gradient boosting model; and low-importance features that do not contribute to 99% of the cumulative feature importance from the gradient boosting model.
Further, the result classification includes: acquiring enterprise feature data samples and enterprise labels; training a LightGBM classification model under supervision using the enterprise feature data samples and the enterprise labels to obtain a trained LightGBM classification model; and inputting the features processed by the feature engineering into the trained LightGBM classification model for calculation and classification to obtain a classification result; wherein the classification result is either default or normal.
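The five deletion rules can be sketched in plain Python as below. The column names and importance values are hypothetical. The importances are assumed to come from an already trained gradient boosting model. Using Pearson correlation, and dropping the later feature of a correlated pair, are assumptions the text does not specify.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def select_features(columns, importances, missing_max=0.5, corr_max=0.6, cum_keep=0.99):
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > missing_max:
            continue  # more than 50% of values missing
        if len(set(present)) <= 1:
            continue  # only a single unique value
        if importances.get(name, 0.0) == 0.0:
            continue  # zero importance in the boosting model
        if any(abs(pearson(values, columns[k])) > corr_max for k in kept):
            continue  # correlation above 60% with a feature already kept
        kept.append(name)
    # Keep the most important features until 99% of cumulative importance is covered.
    # Simplification: the total is taken over the surviving features only.
    total = sum(importances[n] for n in kept)
    selected, running = [], 0.0
    for name in sorted(kept, key=lambda n: importances[n], reverse=True):
        if running >= cum_keep * total:
            break
        selected.append(name)
        running += importances[name]
    return selected

columns = {
    "reg_capital": [1.0, 2.0, 3.0, 4.0],
    "paid_capital": [2.0, 4.0, 6.0, 8.1],       # nearly collinear with reg_capital
    "mostly_missing": [None, None, None, 1.0],  # 75% missing
    "constant": [1, 1, 1, 1],
    "patents": [5.0, 0.0, 7.0, 1.0],
    "noise": [0.0, 1.0, 1.0, 0.0],
}
importances = {"reg_capital": 60.0, "paid_capital": 20.0, "mostly_missing": 5.0,
               "constant": 0.0, "patents": 39.5, "noise": 0.5}
selected = select_features(columns, importances)
```

In this toy data, `noise` survives the first four rules but is dropped by the cumulative-importance rule, because the two stronger features already cover more than 99% of the total.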
In order to achieve the above object, another aspect of the present invention provides an enterprise risk detection apparatus based on an enterprise credit investigation big data knowledge graph, including:
an information acquisition module, configured to acquire an enterprise credit investigation big data unified information model based on a plurality of scattered data subdomains, wherein the enterprise credit investigation big data unified information model comprises a hierarchical enterprise information architecture and a hierarchical key personnel architecture;
a relation connection module, configured to extract the relations between key persons and enterprises through the enterprise information of the hierarchical key personnel architecture and the enterprise personnel information of the hierarchical enterprise information architecture, so as to realize cross-domain connection of the enterprise credit investigation big data;
an ontology construction module, configured to construct a first enterprise credit investigation big data domain ontology in a top-down manner based on the enterprise credit investigation big data unified information model that realizes the cross-domain connection, and to perform entity extraction and relation extraction on data in the enterprise credit investigation big data domain in a bottom-up manner, selecting high-quality new words to expand the first enterprise credit investigation big data domain ontology so as to construct a second enterprise credit investigation big data domain ontology;
a graph construction module, configured to construct, based on the second enterprise credit investigation big data domain ontology, an enterprise credit investigation big data knowledge graph from the enterprise credit investigation big data and to store the knowledge graph in a graph database; and
a calculation and classification module, configured to acquire enterprise feature data from the enterprise credit investigation big data knowledge graph, input the acquired enterprise feature data into a trained risk control model for calculation and classification, and output a classification result.
According to the enterprise risk detection device based on the enterprise credit investigation big data knowledge graph of the embodiment of the invention, concepts and relations are strictly constrained in a top-down manner and the ontology is expanded in scale by integrating a bottom-up manner. This improves the accuracy of the knowledge graph ontology in the enterprise credit investigation domain and lays a solid foundation for subsequently generating a high-quality knowledge graph. In addition, enterprise R&D and innovation capability features are introduced as input to the risk control model, which improves the performance of the risk control model.
The invention has the beneficial effects that:
(1) the technique for constructing the enterprise credit investigation big data knowledge graph addresses the information loss found in existing enterprise credit investigation graphs;
(2) the risk control model provided by the invention, which introduces enterprise R&D and innovation capability features, outperforms conventional risk control models based on enterprise credit investigation knowledge graphs, making it easier to identify defaulting enterprises in advance and reduce risk.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic diagram of an enterprise risk detection architecture based on an enterprise credit investigation big data knowledge-graph according to an embodiment of the present invention;
FIG. 2 is a flowchart of an enterprise risk detection method based on an enterprise credit investigation big data knowledge-graph according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the hierarchical enterprise information architecture of an enterprise credit investigation big data unified information model according to an embodiment of the present invention;
FIGS. 4(a) and 4(b) are schematic diagrams of the second-level enterprise financial information architecture of an enterprise credit investigation big data unified information model according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the hierarchical key personnel architecture of an enterprise credit investigation big data unified information model according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart of constructing an enterprise credit investigation big data knowledge graph ontology with a top-down main line assisted by a bottom-up line according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of the design of a risk control model according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an enterprise risk detection device based on an enterprise credit investigation big data knowledge graph according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The enterprise risk detection method and device based on the enterprise credit investigation big data knowledge graph provided by the embodiment of the invention are described below with reference to the accompanying drawings.
The overall flow of the enterprise risk detection method based on the enterprise credit investigation big data knowledge graph of the embodiment of the invention is shown in FIG. 1. On the basis of the original risk control model, the embodiment adds enterprise R&D and innovation capability features to increase the levels and dimensions of the features. In addition, the enterprise risk control model in the enterprise risk detection method of the embodiment uses LightGBM, a gradient boosting framework based on decision tree algorithms. LightGBM can therefore also yield the importance of each feature to the model during training, which is used to evaluate how strongly different features influence enterprise default.
FIG. 2 is a flowchart of a method for enterprise risk detection based on enterprise credit investigation big data knowledge-graph according to an embodiment of the present invention.
As shown in FIG. 2, the enterprise risk detection method based on the enterprise credit investigation big data knowledge graph includes the following steps:
Step S1: acquiring an enterprise credit investigation big data unified information model based on a plurality of scattered data subdomains, wherein the enterprise credit investigation big data unified information model comprises a hierarchical enterprise information architecture and a hierarchical key personnel architecture.
Specifically, drawing on expert knowledge, a survey of relevant enterprise credit investigation data standard systems, and a study of papers and patents on enterprise credit investigation knowledge graphs, the embodiment of the invention extracts a joint enterprise and key person framework from scattered data subdomains such as government affairs, industry and commerce, judicial affairs and public opinion. It designs a hierarchical enterprise information architecture and a key personnel architecture for enterprise credit investigation big data scenarios, and realizes global entity association of the enterprise credit investigation big data by using the relations among entities as connecting edges.
The hierarchical enterprise information architecture of the enterprise credit investigation big data unified information model is jointly supported by 10 information subdomains: enterprise basic information, enterprise personnel information, enterprise business information, enterprise asset information, enterprise intellectual property information, enterprise financial information, enterprise equity information, judicial data, enterprise risk information and auxiliary reference information. The architecture is shown in FIG. 3.
FIGS. 4(a) and 4(b) show a fine-grained view of the enterprise information architecture, taking enterprise financial information as an example.
Step S2: extracting the relations between key persons and enterprises through the enterprise information of the hierarchical key personnel architecture and the enterprise personnel information of the hierarchical enterprise information architecture, so as to realize cross-domain connection of the enterprise credit investigation big data.
The hierarchical key personnel architecture of the enterprise credit investigation big data unified information model consists of four information subdomains: basic information, work information, social relations and historical risk. Through the enterprise information in the key personnel architecture and the enterprise personnel information in the enterprise information architecture, the barrier between the key personnel architecture and the enterprise architecture can be broken and a mapping between entity objects can be formed. This gives the credit investigation big data a hierarchical, associated "enterprise-key person" structure and provides an initial solution to the difficulty of cross-domain connection of enterprise credit investigation big data.
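As an illustration of this two-sided mapping, the sketch below builds person-to-enterprise edges from both directions. All record layouts and field names (`personnel`, `work`, `enterprise_name`, `role`) are hypothetical and are not taken from the patent.

```python
def link_persons_to_enterprises(enterprises, persons):
    """Build (person_id, role, enterprise_id) edges from both architectures."""
    name_to_id = {e["name"]: e["id"] for e in enterprises}
    edges = set()
    # Direction 1: the enterprise personnel information names each person and role.
    for e in enterprises:
        for member in e["personnel"]:
            edges.add((member["person_id"], member["role"], e["id"]))
    # Direction 2: the enterprise information inside each key person record
    # is resolved to an enterprise entity by name.
    for p in persons:
        for job in p["work"]:
            ent_id = name_to_id.get(job["enterprise_name"])
            if ent_id is not None:
                edges.add((p["id"], job["role"], ent_id))
    return sorted(edges)

enterprises = [
    {"id": "E1", "name": "Alpha Ltd",
     "personnel": [{"person_id": "P1", "role": "legal_representative"}]},
    {"id": "E2", "name": "Beta Ltd", "personnel": []},
]
persons = [
    {"id": "P1", "work": [
        {"enterprise_name": "Alpha Ltd", "role": "legal_representative"},
        {"enterprise_name": "Beta Ltd", "role": "shareholder"},
    ]},
]
edges = link_persons_to_enterprises(enterprises, persons)
```

Here the link between P1 and Beta Ltd is recoverable only from the person-side record, which is the kind of cross-domain connection the step describes.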
FIG. 5 shows the hierarchical key personnel architecture of the enterprise credit investigation big data unified information model.
The hierarchical enterprise information architecture and the key personnel architecture for enterprise credit investigation big data scenarios are intended to realize global entity association of the enterprise credit investigation big data in a dual-core manner, which requires the relations between entities to be defined. The entity relations are set out in Table 1.
Table 1: Entity relationship design table
[Table 1 is provided as images in the original publication and is not reproduced in this text.]
Step S3: constructing a first enterprise credit investigation big data domain ontology in a top-down manner based on the enterprise credit investigation big data unified information model that realizes the cross-domain connection; and performing entity extraction and relation extraction on data in the enterprise credit investigation big data domain in a bottom-up manner, and selecting high-quality new words to expand the first enterprise credit investigation big data domain ontology so as to construct a second enterprise credit investigation big data domain ontology.
Specifically, the first step in constructing a high-quality enterprise credit investigation big data knowledge graph is to define an accurate and clear knowledge schema, that is, an ontology describing the basic cognitive framework of the enterprise credit investigation domain. However, a traditional construction method that relies only on the top-down approach depends heavily on domain experts, while for a purely bottom-up approach, massive, multi-source, heterogeneous data pose great challenges to ontology construction and to subsequent knowledge fusion.
In view of these shortcomings of single-method ontology construction, the embodiment uses a construction method for the enterprise credit investigation big data knowledge graph ontology in which top-down is the main line and bottom-up is the auxiliary line. Concepts and relations are constrained by the top-down method, and the ontology is expanded in scale by the bottom-up method. This improves the accuracy and granularity of the knowledge graph ontology and lays a solid foundation for subsequently generating a high-quality knowledge graph. The specific construction process is shown in FIG. 6.
Forming the domain ontology in a top-down manner requires mining the knowledge in domain knowledge bases and consulting domain experts. The domain knowledge bases include, but are not limited to, Internet knowledge bases, encyclopedia websites, authoritative industry guides, national metadata standards and relational databases of the domain. For example, the enterprise and key personnel system that the embodiment of the invention derives from the hierarchical enterprise information architecture and the key personnel architecture organizes the massive data resources of the enterprise credit investigation big data domain in an orderly way. From this label system, high-quality concepts and attributes of the enterprise credit investigation domain, together with the relations among the concepts, can be screened out to construct a prototype domain ontology.
A domain ontology created in a top-down manner can already guide the construction of the instance library of the enterprise credit investigation big data knowledge graph. However, as the data resources of the enterprise credit investigation domain grow, an ontology model constructed only in a top-down manner is limited in scale and cannot meet the requirements of subsequent knowledge graph construction techniques such as knowledge extraction and knowledge fusion. If the multi-source, massive and heterogeneous data resources of the enterprise credit investigation domain can be organized, used and refined, they provide a strong data-driven impetus for constructing the knowledge graph of the domain, so the bottom-up manner is also an important link in constructing the ontology and data of the enterprise credit investigation big data knowledge graph. In the bottom-up auxiliary line, entity extraction and relation extraction are first performed on data in the enterprise credit investigation domain to extract named entities and relation instances, and quality judgment is performed on the named entities and relation instances that cannot be recognized. Credit investigation experts then judge whether the top-ranked new words are high-quality phrases, and the approved words are used to expand the current ontology structure of the enterprise credit investigation domain.
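The bottom-up expansion loop described above can be sketched as follows. The quality scores, the `top_k` cut-off and the `expert_approves` callback are all hypothetical stand-ins: the text does not say how quality is scored, and the expert review is a manual step in the source.

```python
def rank_new_terms(ontology_terms, extracted, top_k=2):
    """Return the top_k unrecognized terms, ranked by an assumed quality score."""
    unknown = {t: s for t, s in extracted.items() if t not in ontology_terms}
    return sorted(unknown, key=lambda t: (-unknown[t], t))[:top_k]

def expand_ontology(ontology_terms, candidates, expert_approves):
    """Add only the candidates that the (simulated) expert review accepts."""
    return ontology_terms | {t for t in candidates if expert_approves(t)}

# First ontology built top-down (hypothetical concept names).
first_ontology = {"enterprise", "shareholder", "lawsuit"}
# Terms found by entity and relation extraction, with assumed quality scores.
extracted = {"enterprise": 0.9, "patent pledge": 0.8, "equity freeze": 0.7, "asdf": 0.1}

candidates = rank_new_terms(first_ontology, extracted)
second_ontology = expand_ontology(
    first_ontology, candidates, expert_approves=lambda t: t != "equity freeze"
)
```

Separating ranking from approval keeps the expert in the loop: low-scoring noise never reaches the reviewer, and nothing enters the second ontology without review.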
And step S4, based on the second enterprise credit investigation big data field ontology, constructing an enterprise credit investigation big data knowledge graph by utilizing the enterprise credit investigation big data and storing the enterprise credit investigation big data knowledge graph in a graph database.
Specifically, after the ontology of the enterprise credit investigation big data knowledge graph has been constructed by the above method, the existing enterprise credit investigation big data is used to construct the knowledge graph, which is stored in a Neo4j graph database and provides the data basis for the subsequent risk control model.
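As a hedged illustration of how one enterprise-to-enterprise relation might be written to the graph database, the sketch below builds a parameterized Cypher `MERGE` statement. The node label `Enterprise`, the property `id` and the four relationship type names are hypothetical and do not come from the description.

```python
# Hypothetical relationship type names for the four relation types.
ALLOWED_RELATIONS = {
    "PARTICIPATES_IN",
    "INVESTS_IN",
    "TRADES_WITH",
    "LITIGATES_WITH",
}


def build_merge_statement(src_id: str, rel_type: str, dst_id: str) -> tuple[str, dict]:
    """Build a parameterized Cypher MERGE for one enterprise-to-enterprise edge.

    Cypher does not accept a relationship type as a query parameter, so the
    type is checked against a whitelist before being placed in the query text.
    """
    if rel_type not in ALLOWED_RELATIONS:
        raise ValueError(f"unsupported relationship type: {rel_type}")
    query = (
        "MERGE (a:Enterprise {id: $src}) "
        "MERGE (b:Enterprise {id: $dst}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_id, "dst": dst_id}
```

The returned query text and parameter dictionary would typically be handed to a Neo4j client session for execution; using `MERGE` rather than `CREATE` keeps repeated loads from duplicating nodes or edges.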
And step S5, acquiring enterprise characteristic data by using the enterprise credit investigation big data knowledge graph, inputting the acquired enterprise characteristic data into the trained risk control model for calculation and classification, and outputting a classification result.
Specifically, in the enterprise risk control model module, the basic attribute features, association relationship features, and research and development (R&D) innovation capability features of an enterprise are obtained from the enterprise credit investigation big data knowledge graph, processed, and jointly used as the input of the risk control model, and a LightGBM classification model is trained in a supervised manner. Introducing the R&D innovation capability features of the enterprise improves the performance of the risk control model. The processing flow of the embodiment of the present invention is shown in fig. 7, and includes:
(1) A data acquisition module:
In the enterprise credit investigation big data knowledge graph, the basic attribute features and the R&D innovation capability features of an enterprise exist in the form of enterprise node attributes and are exported directly from the Neo4j database. The enterprise association relationship features reflect the relationship between an enterprise entity and defaulting enterprise entities. Because a heterogeneous network contains many types of nodes and edges, which makes graph feature extraction more difficult, this scheme restricts the enterprise credit investigation big data knowledge graph to a homogeneous network: the nodes at both ends of every relation must be enterprises, and person nodes are folded away to reduce the interference of persons on the network, so that every relation is between enterprises. Combining the existing data with conventional domain reasoning, four types of enterprise relations with higher risk are retained: a participation relation, an investment relation, a transaction relation, and a litigation relation. Enterprise relationship features are extracted on the basis of these four types of relations, and the network features in the knowledge graph are extracted with a shortest path algorithm and a community discovery algorithm.
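The folding of person nodes and a shortest-path feature can be sketched as follows. This is an illustration under assumptions: the concrete features of table 2 are not reproduced here, so the "hop distance to the nearest defaulting enterprise" is only one plausible shortest-path feature, and all function names are hypothetical. Community discovery is not shown.

```python
from collections import deque
from itertools import combinations


def fold_person_nodes(person_links: dict[str, list[str]]) -> set[frozenset]:
    """Replace each person node by direct edges between the enterprises it links."""
    edges = set()
    for enterprises in person_links.values():
        for a, b in combinations(sorted(set(enterprises)), 2):
            edges.add(frozenset((a, b)))
    return edges


def build_adjacency(edges) -> dict[str, set[str]]:
    """Build an undirected adjacency map from two-element edges."""
    adjacency: dict[str, set[str]] = {}
    for edge in edges:
        a, b = tuple(edge)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def distance_to_nearest_default(adjacency: dict[str, set[str]], defaulted: set[str]) -> dict[str, int]:
    """Multi-source BFS: hops from each reachable enterprise to the nearest defaulting one."""
    distance = {node: 0 for node in defaulted if node in adjacency}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance
```

Enterprises that cannot reach any defaulting enterprise are absent from the returned mapping; a feature pipeline would need to choose a sentinel value for them.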
The extracted network features are shown in table 2:
Table 2: Enterprise association relationship features
Figure BDA0003563449960000081
The R&D innovation capability features are shown in table 3:
Table 3: R&D innovation capability features
Figure BDA0003563449960000091
(2) A data preprocessing module:
New features for the model are derived with credit scorecard techniques: the IV (Information Value), WOE (Weight of Evidence), Efficiency and rate of the non-numerical data are extracted and passed on for subsequent processing.
The enterprise data contains many attributes in pure string format, such as fixed-length coded data like the enterprise type and the industry category, as well as date-type data such as the establishment date and the approval date. Date-type data is first parsed, then converted into numerical data in units of seconds, and then converted into character format. All character-type variables are then converted into numerical data, from which the IV (Information Value), WOE, Efficiency and rate are extracted.
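The conversions described above can be sketched as follows. The date format `%Y-%m-%d`, the use of UTC, and the simple sorted integer coding of string categories are assumptions made for this illustration; the description does not state which encoding is used.

```python
from datetime import datetime, timezone


def date_to_seconds(date_text: str, fmt: str = "%Y-%m-%d") -> int:
    """Parse a date string and return seconds since the Unix epoch (UTC assumed)."""
    parsed = datetime.strptime(date_text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def encode_categories(values: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct string to an integer code assigned in sorted order."""
    mapping = {value: code for code, value in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping
```

For example, an establishment date would first pass through `date_to_seconds`, and a coded attribute such as an industry category would pass through `encode_categories`, after which both are numerical and can be binned for the scorecard statistics.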
WOE, IV, Efficiency and rate are given by the following formulas:
Figure BDA0003563449960000092
Figure BDA0003563449960000093
Figure BDA0003563449960000101
Figure BDA0003563449960000102
where Good_i and Bad_i denote the number of non-default enterprises and the number of default enterprises counted in each bin i, and Good_T and Bad_T denote the total number of non-default enterprises and the total number of default enterprises, respectively.
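For orientation, the sketch below computes WOE and IV using the conventional credit scorecard definitions, WOE_i = ln((Good_i / Good_T) / (Bad_i / Bad_T)) and IV = sum over i of (Good_i / Good_T - Bad_i / Bad_T) * WOE_i. These are assumed to correspond to the formulas given as images above, which are not reproduced here; the good-over-bad orientation is an assumption, and Efficiency and rate are not covered because their definitions cannot be recovered from the text.

```python
import math


def woe_iv(bins: list[tuple[int, int]]) -> tuple[list[float], float]:
    """Compute per-bin WOE and the total IV from (good_i, bad_i) counts.

    Assumes every bin contains at least one non-default (good) and one
    default (bad) enterprise; empty cells would need smoothing, which is
    not shown here.
    """
    good_total = sum(good for good, _ in bins)
    bad_total = sum(bad for _, bad in bins)
    woe_values = []
    iv = 0.0
    for good, bad in bins:
        good_share = good / good_total
        bad_share = bad / bad_total
        woe = math.log(good_share / bad_share)
        woe_values.append(woe)
        iv += (good_share - bad_share) * woe
    return woe_values, iv
```

A bin whose share of non-default enterprises equals its share of default enterprises has a WOE of zero and contributes nothing to IV, which is why IV is commonly read as a measure of how well a variable separates the two groups.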
(3) A feature engineering module:
In the feature engineering stage, the features must first be processed to deal with problems such as the large number of missing values in the original data and excessively high correlation between features. The main steps are to delete features with more than 50% missing values, features with only a single unique value, features whose correlation with another feature is higher than 60%, features whose feature importance in the gradient boosting machine (GBM) is 0.0, and low-importance features that do not contribute to 99% of the cumulative feature importance from the GBM.
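The five filters can be sketched in plain Python as below. This is an illustration under assumptions: the importances are supplied by the caller (for example from a previously trained GBM), the filters are applied in the listed order, and of two highly correlated features the later one is dropped. None of these choices is stated in the description.

```python
import math


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


def select_features(columns, importances, missing_max=0.5, corr_max=0.6, cumulative=0.99):
    """Return the names of the features that survive all five filters.

    columns: feature name -> list of numeric values (None marks a missing value).
    importances: feature name -> importance from a trained GBM (caller-supplied).
    """
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > missing_max:
            continue  # more than 50% of the values are missing
        if len(set(present)) <= 1:
            continue  # only a single unique value
        # Drop a feature that is too correlated with one already kept.
        if any(abs(pearson(columns[k], values)) > corr_max for k in kept):
            continue
        kept.append(name)

    # Remove zero-importance features and rank the rest by importance.
    ranked = sorted(
        (n for n in kept if importances.get(n, 0.0) > 0.0),
        key=lambda n: importances[n],
        reverse=True,
    )
    total = sum(importances[n] for n in ranked)
    selected, running = [], 0.0
    for name in ranked:
        if running >= cumulative * total:
            break  # remaining features lie outside the 99% cumulative share
        selected.append(name)
        running += importances[name]
    return selected
```

The thresholds are exposed as parameters so that the 50%, 60% and 99% values from the description can be varied when experimenting.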
(4) A classification module:
This module uses the LightGBM algorithm. The features processed by the feature engineering module are input into the model to obtain a classification result, which falls into one of two classes: default or normal. Because LightGBM is a gradient boosting framework based on decision tree algorithms, it can also provide the importance of each feature to the model during training. The importance of a feature can be used as a measure of how strongly different features influence enterprise default.
Through the above steps, concepts and relations are strictly defined in a top-down manner and the scale of the ontology is expanded by incorporating the bottom-up mode, which greatly improves the accuracy of the knowledge graph ontology in the enterprise credit investigation field and lays a solid foundation for later generating a high-quality knowledge graph. In addition, the R&D innovation capability features of enterprises are innovatively introduced as input to the risk control model, which improves the performance of the risk control model.
In order to implement the foregoing embodiment, as shown in fig. 8, this embodiment further provides an enterprise risk detection apparatus 10 based on an enterprise credit investigation big data knowledge graph, where the apparatus 10 includes: an information acquisition module 100, a relation connection module 200, an ontology construction module 300, a graph construction module 400 and a calculation classification module 500.
The information acquisition module 100 is configured to acquire an enterprise credit investigation big data unified information model based on a plurality of scattered data subdomains; the enterprise credit investigation big data unified information model comprises a hierarchical enterprise information architecture and a hierarchical key personnel architecture;
the relation connection module 200 is configured to extract the relationship between key persons and enterprises through the enterprise information of the hierarchical key personnel architecture and the enterprise personnel information of the hierarchical enterprise information architecture, so as to realize cross-domain connection of the enterprise credit investigation big data;
the ontology construction module 300 is configured to determine the enterprise credit investigation big data field in a top-down mode and construct a first enterprise credit investigation big data field ontology based on the enterprise credit investigation big data unified information model for realizing the cross-domain connection; and to perform entity extraction and relationship extraction on data in the enterprise credit investigation big data field in a bottom-up construction mode, selecting high-quality new words and expanding the scale of the first enterprise credit investigation big data field ontology to construct a second enterprise credit investigation big data field ontology;
the graph construction module 400 is configured to construct, based on the second enterprise credit investigation big data field ontology, an enterprise credit investigation big data knowledge graph by utilizing the enterprise credit investigation big data and to store the enterprise credit investigation big data knowledge graph in a graph database;
and the calculation classification module 500 is configured to acquire enterprise characteristic data by using the enterprise credit investigation big data knowledge graph, input the acquired enterprise characteristic data into the trained risk control model for calculation and classification, and output a classification result.
According to the enterprise risk detection apparatus based on the enterprise credit investigation big data knowledge graph of the embodiment of the invention, concepts and relations are strictly defined in a top-down manner and the scale of the ontology is expanded by incorporating the bottom-up mode, which greatly improves the accuracy of the knowledge graph ontology in the enterprise credit investigation field and lays a solid foundation for later generating a high-quality knowledge graph; in addition, the R&D innovation capability features of enterprises are innovatively introduced as input to the risk control model, which improves the performance of the risk control model.
It should be noted that the explanation of the embodiment of the enterprise risk detection method based on the enterprise credit investigation big data knowledge graph is also applicable to the enterprise risk detection device based on the enterprise credit investigation big data knowledge graph of the embodiment, and details are not repeated here.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be joined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An enterprise risk detection method based on an enterprise credit investigation big data knowledge graph is characterized by comprising the following steps:
acquiring an enterprise credit investigation big data unified information model based on a plurality of scattered data subdomains; the enterprise credit investigation big data unified information model comprises a hierarchical enterprise information architecture and a hierarchical key personnel architecture;
extracting the relationship between key persons and enterprises through the enterprise information of the hierarchical key personnel architecture and the enterprise personnel information of the hierarchical enterprise information architecture so as to realize cross-domain connection of the enterprise credit investigation big data;
constructing a first enterprise credit investigation big data field ontology in a top-down mode based on the enterprise credit investigation big data unified information model for realizing the cross-domain connection; performing entity extraction and relationship extraction on data in the enterprise credit investigation big data field in a bottom-up construction mode, and selecting high-quality new words to expand the scale of the first enterprise credit investigation big data field ontology so as to construct a second enterprise credit investigation big data field ontology;
based on the second enterprise credit investigation big data field ontology, constructing an enterprise credit investigation big data knowledge graph by utilizing the enterprise credit investigation big data and storing the enterprise credit investigation big data knowledge graph in a graph database;
and acquiring enterprise characteristic data by using the enterprise credit investigation big data knowledge graph, inputting the acquired enterprise characteristic data into a trained risk control model for calculation and classification, and outputting a classification result.
2. The method of claim 1, wherein the hierarchical enterprise information architecture of the enterprise credit big data unified information model comprises:
enterprise basic information, enterprise personnel information, enterprise business information, enterprise asset information, enterprise intellectual property information, enterprise financial information, enterprise equity information, judicial data, enterprise risk information, and auxiliary reference information subdomains.
3. The method according to claim 1, wherein said performing entity extraction and relationship extraction on data in the enterprise credit investigation big data field through a bottom-up construction method, selecting a high-quality new word to expand the ontology scale of the first enterprise credit investigation big data field to construct a second enterprise credit investigation big data field ontology, comprises:
performing entity extraction and relationship extraction on data in the enterprise credit investigation big data field in a bottom-up construction mode;
identifying named entities and relationship examples in the data based on the entity extraction and the relationship extraction, and performing quality judgment on the named entities and the relationship examples which cannot be identified;
and determining a quality rank based on the quality judgment, selecting a high-quality new word and expanding the first enterprise credit investigation big data field ontology to construct the second enterprise credit investigation big data field ontology.
4. The method of claim 1, wherein acquiring the enterprise characteristic data comprises: acquiring basic attribute features, association relationship features and research and development innovation capability features of enterprises; wherein,
the basic attribute features of the enterprise and the research and development innovation capability features of the enterprise are acquired from the enterprise credit investigation big data knowledge graph; enterprise relationship features are extracted on the basis of four types of relations, and network features in the enterprise credit investigation big data knowledge graph are extracted through a shortest path algorithm and a community discovery algorithm to obtain the association relationship features of the enterprise; the four types of relations comprise a participation relation, an investment relation, a transaction relation and a litigation relation.
5. The method of claim 1, wherein the risk control model comprises: data preprocessing, feature processing engineering and result classification.
6. The method of claim 5, wherein the data preprocessing comprises:
preprocessing the acquired enterprise characteristic data, converting date type data into character type variables, then converting all the character type variables to obtain numerical type data, and extracting the IV value, WOE, efficiency and rate of the numerical type data.
7. The method of claim 6, wherein the IV value, WOE, efficiency, and rate are expressed by the following formula:
Figure FDA0003563449950000021
Figure FDA0003563449950000022
Figure FDA0003563449950000023
Figure FDA0003563449950000024
wherein Good_i and Bad_i denote the number of non-default enterprises and the number of default enterprises counted in each bin, and Good_T and Bad_T denote the total number of non-default enterprises and the total number of default enterprises, respectively.
8. The method of claim 4, wherein the feature processing engineering comprises:
deleting features with more than 50% missing values, features containing only a single unique value, features whose correlation with another feature is greater than 60%, features whose feature importance in the gradient boosting machine is 0.0, and low-importance features that do not contribute to 99% of the cumulative feature importance from the gradient boosting machine.
9. The method of claim 4, wherein the result classification comprises:
acquiring the enterprise characteristic data sample and an enterprise label;
training a LightGBM classification model in a supervised manner by utilizing the enterprise characteristic data sample and the enterprise label to obtain a trained LightGBM classification model;
inputting the features processed by the feature processing engineering into the trained LightGBM classification model, and performing calculation classification to obtain a classification result; wherein the classification result is classified as default and normal.
10. An enterprise risk detection device based on enterprise credit investigation big data knowledge-graph is characterized by comprising:
the information acquisition module is used for acquiring an enterprise credit investigation big data unified information model based on a plurality of scattered data subdomains; the enterprise credit investigation big data unified information model comprises a hierarchical enterprise information architecture and a hierarchical key personnel architecture;
the relation connection module is used for extracting the relation between key persons and enterprises through the enterprise information of the hierarchical key personnel architecture and the enterprise personnel information of the hierarchical enterprise information architecture so as to realize cross-domain connection of the enterprise credit investigation big data;
the ontology construction module is used for determining the enterprise credit investigation big data field in a top-down mode and constructing a first enterprise credit investigation big data field ontology based on the enterprise credit investigation big data unified information model for realizing the cross-domain connection; performing entity extraction and relationship extraction on data in the enterprise credit investigation big data field in a bottom-up construction mode, selecting high-quality new words and expanding the scale of the first enterprise credit investigation big data field ontology to construct a second enterprise credit investigation big data field ontology;
the graph construction module is used for constructing, based on the second enterprise credit investigation big data field ontology, an enterprise credit investigation big data knowledge graph by utilizing the enterprise credit investigation big data and storing the enterprise credit investigation big data knowledge graph in a graph database;
and the calculation classification module is used for acquiring enterprise characteristic data by using the enterprise credit investigation big data knowledge graph, inputting the acquired enterprise characteristic data into a trained risk control model for calculation classification and outputting a classification result.
CN202210302732.0A 2022-03-24 2022-03-24 Enterprise risk detection method and device based on enterprise credit investigation big data knowledge graph Pending CN114817557A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210302732.0A CN114817557A (en) 2022-03-24 2022-03-24 Enterprise risk detection method and device based on enterprise credit investigation big data knowledge graph
PCT/CN2022/087210 WO2023178767A1 (en) 2022-03-24 2022-04-15 Enterprise risk detection method and apparatus based on enterprise credit investigation big data knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210302732.0A CN114817557A (en) 2022-03-24 2022-03-24 Enterprise risk detection method and device based on enterprise credit investigation big data knowledge graph

Publications (1)

Publication Number Publication Date
CN114817557A true CN114817557A (en) 2022-07-29

Family

ID=82529928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210302732.0A Pending CN114817557A (en) 2022-03-24 2022-03-24 Enterprise risk detection method and device based on enterprise credit investigation big data knowledge graph

Country Status (2)

Country Link
CN (1) CN114817557A (en)
WO (1) WO2023178767A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115934963A (en) * 2022-12-26 2023-04-07 深度(山东)数字科技集团有限公司 Business draft big data analysis method and application map for enterprise financial customer acquisition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210166167A1 (en) * 2019-12-02 2021-06-03 Asia University Artificial intelligence and blockchain-based inter-enterprise credit rating and risk assessment method and system
CN112131275B (en) * 2020-09-23 2023-07-25 长三角信息智能创新研究院 Enterprise portrait construction method of holographic city big data model and knowledge graph
CN113537796A (en) * 2021-07-22 2021-10-22 大路网络科技有限公司 Enterprise risk assessment method, device and equipment
CN114066242A (en) * 2021-11-11 2022-02-18 北京道口金科科技有限公司 Enterprise risk early warning method and device
CN114202223A (en) * 2021-12-16 2022-03-18 深圳前海微众银行股份有限公司 Enterprise credit risk scoring method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115934963A (en) * 2022-12-26 2023-04-07 深度(山东)数字科技集团有限公司 Business draft big data analysis method and application map for enterprise financial customer acquisition
CN115934963B (en) * 2022-12-26 2023-07-18 深度(山东)数字科技集团有限公司 Commercial draft big data analysis method and application map for enterprise finance acquisition

Also Published As

Publication number Publication date
WO2023178767A1 (en) 2023-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination