CN114066242A - Enterprise risk early warning method and device - Google Patents

Info

Publication number
CN114066242A
Authority
CN
China
Prior art keywords
enterprise
data
risk
early warning
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111359407.XA
Other languages
Chinese (zh)
Inventor
张彩虹
孔祥永
王浩
袁伟
蔡明�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Daokou Jinke Technology Co ltd
Original Assignee
Beijing Daokou Jinke Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Daokou Jinke Technology Co ltd
Priority to CN202111359407.XA
Publication of CN114066242A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Abstract

The invention provides an enterprise risk early warning method and device, and relates to the technical field of the Internet. The device comprises a data preprocessing module, a native index extraction module, an index derivation module, an association relationship mining module, an enterprise risk early warning rule generation module and a real-time risk early warning module. The method comprises the following steps: collecting information data of an enterprise entity and its enterprise owner, preprocessing the data, extracting native indexes and constructing an original database; performing index derivation from the native indexes to form a derivative database; mining the association relationships between the enterprise entity and the enterprise owner by using a knowledge graph; selecting indexes with good discrimination and constructing enterprise risk early warning rules; and carrying out enterprise risk early warning in real time by using the generated early warning rules. The invention performs deep mining and analysis on the panoramic data of enterprises and the behavior data of enterprise owners, so that risk changes of monitored enterprises are detected more accurately, promptly and comprehensively, achieving active discovery and timely early warning.

Description

Enterprise risk early warning method and device
Technical Field
The invention relates to the technical fields of enterprise risk data mining, knowledge graphs and the Internet, and in particular to an enterprise risk early warning method and device.
Background
Enterprise risk involves cluttered data, wide dimensionality and complex association relationships among entities, and is currently assessed mainly by traditional means: related data accounts are queried manually offline to obtain the enterprise's industrial and commercial registration, business operation and judicial litigation information, the information is integrated manually, and the association relationships of the enterprise are mined by hand. This approach has poor timeliness and low accuracy, analyzes data along a single dimension, and struggles with complex association relationships. As a result, information matching accuracy is low, information asymmetry is high, the association relationships and risk changes between the enterprise and its related parties cannot be obtained in time, and the opportunity to discover risk early and intercept it in time is lost.
There is also existing research on enterprise risk prediction. For example, the invention patent application with publication number CN112801773A discloses an enterprise risk early warning method that obtains the transaction flow information of a target enterprise, constructs corresponding transaction characteristic data, and performs risk detection on the target enterprise by using a preset abnormal repayment and information recovery source identification model, the identification model being a decision tree model. This technology judges enterprise risk only from the transaction flow information of the target enterprise, so the judgment standard is single and enterprise risk is easily missed. As another example, the invention patent application with publication number CN113297283A discloses a public opinion analysis method for enterprise risk early warning. It collects public opinion text data from designated websites to construct a data source sequence, matches risk labels of the public opinion text data against a preset risk label set to construct a risk label sequence, classifies the emotion polarity of the public opinion text data with an emotion classification model to construct an emotion polarity sequence, identifies the enterprise entity names related to the public opinion text data to construct an enterprise relevance sequence, and finally performs public opinion analysis by combining these sequences to obtain the enterprise risk. This technology relies mainly on preset risk labels for detection; the risk labels are generally obtained by manual integration and are therefore uncertain, so enterprise risk is again easily missed.
Therefore, a technology is needed that performs big data analysis on holographic data of the "enterprise and enterprise owner" dimensions, so that data can be analyzed more quickly and accurately and enterprise risks can be found more comprehensively and in time.
Disclosure of Invention
The invention aims to provide an enterprise risk early warning method and device that use big data analysis and knowledge graph techniques to perform deep mining and analysis on the panoramic data of an enterprise and the behavior data of its enterprise owner, and that construct a real-time, dynamic post-loan monitoring and early warning mechanism for enterprises, so that risk changes of the monitored enterprise are detected more accurately, promptly and comprehensively, achieving active discovery and timely early warning.
The enterprise risk early warning method of the invention comprises the following steps:
Step 1: collecting information data of the enterprise entity and the enterprise owner, preprocessing the data, extracting native indexes and constructing an original database;
standardized data is stored in the original database, the standardized data comprising the extracted native fields and their corresponding values, each native field being a native index;
Step 2: performing index derivation from the native indexes to form a derivative database;
the index derivation comprises: taking each native index as a feature, performing feature selection with the Xgboost algorithm, and combining features of strong importance to derive new indexes;
new indexes can also be derived from expert experience, including: counting the number of cases in which the enterprise is the defendant and the total amount involved in the cases in which the enterprise is the defendant.
Step 3: mining the association relationships between the enterprise entity and the enterprise owner by using a knowledge graph; enterprises or individuals are taken as entity nodes, the node attributes comprise the risk probabilities of the enterprises or individuals, and nodes that have a relationship are connected by an edge;
Step 4: selecting indexes with good discrimination for enterprise risk prediction, and constructing enterprise risk early warning rules;
calculating the information value (IV) of each index in the original database and the derivative database from the collected training sample set, and selecting the indexes with IV greater than 0.1 for risk prediction;
the established enterprise risk early warning rules comprise, for the enterprise, risk early warning rules set in the dimensions of industrial and commercial registration, judicial litigation, mortgage, tax-related matters, abnormal operation, regulatory penalties, financial loans, risk-related public opinion, financial risk, tax risk, industry risk and association relationships; and, for the enterprise owner, risk early warning rules set in the dimensions of anti-fraud, multi-head loans, social behavior and travel behavior; risk grades are set in advance according to the training sample set and expert experience, and risk thresholds and risk weights are set for the different enterprise risk early warning rules;
Step 5: carrying out risk early warning on the enterprise in real time by using the generated enterprise risk early warning rules;
batch processing the enterprise data collected in real time, including: cleaning the data, acquiring the index data required by each rule, performing risk scoring with each rule, and combining the early warning results of all rules to obtain a final enterprise risk value and risk grade;
based on the association relationships between enterprises and enterprise owners mined with the knowledge graph, locating the enterprises to which risk is conducted by using a label propagation algorithm, and carrying out risk early warning on those enterprises.
Correspondingly, the enterprise risk early warning device provided by the invention comprises:
a data preprocessing module, which preprocesses the collected enterprise and enterprise owner data, including data verification and auditing, data cleaning, data matching, data conversion and data loading;
a native index extraction module, which extracts the native fields recorded in the collected enterprise and enterprise owner data and stores each native field in the original database as a native index;
an index derivation module, which reads the native indexes from the original database, performs index derivation and stores the derived new indexes in the derivative database; the index derivation comprises: taking each native index as a feature, performing feature selection with the Xgboost algorithm, and combining features of strong importance to derive new indexes; and deriving new indexes from expert experience, the new indexes including the number of cases in which the enterprise is the defendant and the total amount involved in those cases;
an association relationship mining module, which mines the association relationships between the enterprise entity and the enterprise owner to generate a knowledge graph; in the knowledge graph, enterprises or individuals are entity nodes, nodes that have a relationship are connected by an edge, the node attributes comprise the risk probabilities of the enterprises or individuals, and the association relationships between the enterprise entity and the enterprise owner are mined through the connecting edges;
an enterprise risk early warning rule generation module, which selects indexes with strong discrimination from the original database and the derivative database for enterprise risk prediction, then sets a rule configuration scheme in advance according to the training sample data and expert experience, generates the enterprise risk early warning rules and stores them in an early warning rule base; an index with strong discrimination is identified by calculating the information value (IV) of the index from the training sample set and taking the indexes with IV greater than 0.1;
a real-time risk early warning module, which uses the early warning rule base to batch process the enterprise data collected in real time and issue warnings, including: cleaning the data, acquiring the index data required by each rule, performing risk scoring with each rule, and combining the early warning results of all rules to obtain a final enterprise risk value and risk grade; meanwhile, based on the association relationships between enterprises and enterprise owners mined with the knowledge graph, the module locates the enterprises to which risk is conducted by using a label propagation algorithm and carries out risk early warning on those enterprises.
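As a non-limiting illustration of the rule scoring performed by the real-time risk early warning module, the following Python sketch sums the risk weights of the rules whose thresholds are exceeded and maps the resulting risk value to a risk grade. The rule names, index names, weights and grade cutoffs are hypothetical and are not specified by the invention; combining the rule results by summing weights is likewise only one possible reading.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str         # hypothetical rule name
    index: str        # name of the index the rule reads
    threshold: float  # risk threshold for that index
    weight: float     # risk weight contributed when the rule triggers


def score_enterprise(indexes, rules):
    """Return (risk value, names of triggered rules) for one enterprise."""
    # A rule triggers when its index value exceeds the rule's threshold.
    triggered = [r for r in rules if indexes.get(r.index, 0.0) > r.threshold]
    return sum(r.weight for r in triggered), [r.name for r in triggered]


def risk_grade(value, cutoffs=((60, "high"), (30, "medium"))):
    """Map a risk value to a grade; the cutoffs are illustrative assumptions."""
    for cutoff, grade in cutoffs:
        if value >= cutoff:
            return grade
    return "low"
```

For example, score_enterprise({"defendant_case_count": 7}, rules) followed by risk_grade(...) yields the final risk value and grade for one batch record.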
Compared with the prior art, the invention has the advantages and positive effects that:
(1) The method and device of the invention perform deep mining and analysis on the panoramic data of the enterprise and the behavior data of the enterprise owner, so that risk changes of the monitored enterprise are detected more accurately, promptly and comprehensively, achieving active discovery and timely early warning. Data processing efficiency and business/IT agility are also improved, so that managers can adjust business strategies more quickly and more often, and data-driven insight helps decision makers develop the enterprise's business.
(2) The method and device of the invention process and analyze data by means of big data analysis and knowledge graph techniques and, combined with process management and a decision engine, realize batch execution, early warning and result output for enterprise risks. This effectively addresses risks that traditional means cannot detect and realizes real-time, dynamic early warning and monitoring of enterprise risk.
Drawings
FIG. 1 is a flowchart of an enterprise risk early warning method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of extracting native indexes from a value-added tax invoice according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of risk early warning rule construction according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The enterprise risk early warning method and device of the invention use big data analysis and knowledge graph techniques to perform deep mining and analysis on the panoramic data of the enterprise and the behavior data of the enterprise owner, and construct a real-time, dynamic post-loan monitoring and early warning mechanism for enterprises. The invention can also automatically generate a risk early warning report, or display the results visually within the system and issue dynamic reminders.
Knowledge graph technology can handle complex and diverse association analysis and supports the analysis and processing of the relationships among different enterprise roles. It can also simulate the human thinking process to discover, verify and reason; using interactive machine learning, it supports learning from interactive actions such as reasoning, error correction and labeling, so that knowledge logic and models accumulate continuously, the intelligence of the system improves, knowledge is retained within the enterprise, and dependence on individual experience is reduced. In addition, compared with traditional storage, graph-based data storage offers faster data retrieval: the graph database can compute the attribute distribution of more than a million potential entities and return results within seconds, which enables real-time human-computer interaction and lets the user make immediate decisions.
As shown in FIG. 1, the enterprise risk early warning method of the invention comprises the following steps 1 to 5, which mainly involve: building a data mart, constructing an enterprise risk label library, mining relationships based on the knowledge graph, and applying a post-loan risk early warning model.
Step 1: Collect the related information data of the enterprise entity and the enterprise owner; clean, convert and integrate the data; and construct a data mart. Extract the native indexes from the collected data and store them in the original database.
Enterprise risk includes enterprise entity risk and enterprise owner risk. Enterprise entity risk is evaluated mainly through the enterprise's basic strength, the risk of its team members, its operating stability, profitability, growth capacity, capacity to perform contracts and associated performance history. The deep mining and analysis draw mainly on the enterprise's credit investigation information, business change information, violation and litigation information, quality assurance information, creditor's rights and debt information, abnormal operation information, financial operation information, tax declaration information, invoice transaction information, enterprise public opinion information and the like. An accurate portrait of the enterprise is constructed from its panoramic data, and multiple data sources back up and complement one another to maintain fast service response and broad coverage, which ensures stable output of the enterprise's risk control early warning results. Enterprise owner risk is evaluated mainly through the credit standing of the enterprise operator, personal judicial risk, work experience and other behavioral risks, and draws mainly on the enterprise owner's personal credit information, judicial litigation information, multi-head loan information, fraudulent behavior information and the like.
The data used to evaluate enterprise risk spans many dimensions and comes from many sources, with complex structures and a low degree of standardization. The collected data therefore needs to be preprocessed, including data verification and auditing, data cleaning, data matching, data conversion and loading. The preprocessed data is loaded into the warehouse and subjected to data quality management to form the native indexes, and the warehoused data is built into a data mart that provides reliable support for the subsequent index derivation.
Data verification and auditing mainly means verifying the collected data against the data format and data content agreed for data collection, writing the data that does not meet the requirements to an error library, generating detailed error records, and returning the erroneous data and the error logs to the data provider so that the provider can analyze and modify its data output program. Data whose quality does not reach the minimum verification standard must be retransmitted.
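By way of a non-limiting sketch, the verification and auditing step could be expressed in Python as follows. The agreed field names and types, the error record layout and the 90% minimum pass rate are assumptions made for illustration only; the invention does not specify them.

```python
# Hypothetical agreed data format: field name -> expected Python type.
AGREED_FORMAT = {"enterprise_id": str, "invoice_no": str, "amount": float}


def verify(records, agreed_format=AGREED_FORMAT):
    """Split records into accepted data and an error library with error logs."""
    accepted, error_library = [], []
    for record in records:
        errors = []
        for name, expected in agreed_format.items():
            if name not in record:
                errors.append(f"missing field: {name}")
            elif not isinstance(record[name], expected):
                errors.append(f"wrong type for {name}: expected {expected.__name__}")
        if errors:
            # Keep the offending record together with its detailed error log.
            error_library.append({"record": record, "errors": errors})
        else:
            accepted.append(record)
    return accepted, error_library


def needs_retransmission(accepted, error_library, min_pass_rate=0.9):
    """True when the batch falls below the (assumed) minimum verification standard."""
    total = len(accepted) + len(error_library)
    return total > 0 and len(accepted) / total < min_pass_rate
```

The contents of error_library are what would be returned to the data provider for analysis.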
Data cleaning removes omissions, errors, inconsistencies and similar defects to ensure that the data is correct and consistent. It mainly covers three kinds of problem data. The first is incomplete data, in which essential information is missing; such data needs to be filtered out. The second is erroneous data, which arises when a business system is not robust enough and writes input directly to the back-end database without the necessary logical checks; such problems are discovered and screened by setting data correlation and data boundary conditions. The third is missing and duplicate data, which is found and handled according to defined rules; an example is duplicate data in which multiple records have identical main fields but inconsistent values in some other fields. For each type of problem, the system of the invention applies targeted cleaning rules to screen for and find problem data, some of which is difficult to process automatically. The cleaning rule base can be extended dynamically.
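The three kinds of cleaning described above can be illustrated with the following minimal Python sketch. The function name, parameter names and rejection labels are hypothetical, and a real cleaning rule base would be configurable and extensible rather than hard-coded.

```python
def clean(records, required, bounds, key_fields):
    """Apply three illustrative cleaning rules and report why records were rejected.

    required   -- fields that must be present and non-empty (incomplete data)
    bounds     -- field -> (low, high) boundary settings (erroneous data)
    key_fields -- main fields that identify a record (duplicate data)
    """
    kept, rejected, seen = [], [], set()
    for record in records:
        # Rule 1: filter out records missing essential information.
        if any(record.get(f) in (None, "") for f in required):
            rejected.append((record, "incomplete"))
            continue
        # Rule 2: screen values that fall outside the configured boundaries.
        if any(f in record and not (low <= record[f] <= high)
               for f, (low, high) in bounds.items()):
            rejected.append((record, "out_of_bounds"))
            continue
        # Rule 3: keep only the first record for each combination of main fields.
        key = tuple(record[f] for f in key_fields)
        if key in seen:
            rejected.append((record, "duplicate"))
            continue
        seen.add(key)
        kept.append(record)
    return kept, rejected
```

Rejected records are returned with a reason so that cases that cannot be handled automatically can be reviewed separately.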
Data matching means that the collected data is matched against the metadata information. If the match succeeds, the credit information can be integrated and merged; otherwise manual identification and intervention are required, and data that cannot be matched or determined is returned to the information-providing unit for revision.
After the data has been cleaned and matched, data conversion and loading are carried out. The main task of data conversion is to perform field mapping, code conversion and calculated filling of certain fields for collected data whose table structures and code contents are inconsistent with those of the database. The data conversion module works by reading the corresponding parameter configuration from the operation management system.
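The field mapping and code conversion performed during data conversion might look like the following Python sketch. The mapping tables stand in for the parameter configuration read from the operation management system; their contents and the field names are invented for illustration.

```python
# Hypothetical parameter configuration (in practice read from the
# operation management system rather than hard-coded).
FIELD_MAP = {"ent_name": "enterprise_name", "reg_cap": "registered_capital"}
CODE_MAP = {"status": {"1": "active", "2": "revoked"}}


def convert(record, field_map=FIELD_MAP, code_map=CODE_MAP):
    """Rename source fields to target fields and translate coded values."""
    converted = {}
    for source_field, value in record.items():
        target_field = field_map.get(source_field, source_field)
        # Translate the value only if a code table exists for the target field;
        # unknown codes are passed through unchanged.
        converted[target_field] = code_map.get(target_field, {}).get(value, value)
    return converted
```

Calculated filling of specific fields would be added as a further step over the converted record.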
During preprocessing the data is extracted, converted and loaded, and quality problems may be introduced by system integration or by historical data issues. Quality management therefore needs to be carried out on the warehoused data, focusing mainly on the following aspects:
Correctness: whether the data correctly represents a real or verifiable source;
Integrity: whether referential integrity between data exists and is consistent;
Consistency: whether the data is consistently defined and understood;
Completeness: whether all required data exists;
Validity: whether the data is within the acceptable ranges defined by the enterprise.
In the end, the original database holds standardized data comprising the extracted native fields and their corresponding values. Each native field is a native index.
As shown in FIG. 2, when value-added tax invoice data is processed, the invoice information fields are simple, but different native fields are formed according to the invoice state and the transaction amount. First, the invoice data is cleaned: each invoice is judged to be voided or valid, the voided and valid invoices are each deduplicated by invoice number (codenumber), and the corresponding native field is then obtained according to whether the amount is positive. For a valid invoice, a positive amount is recorded as a valid invoiced amount; otherwise it is recorded as a valid red-letter (reversal) amount. The data is then further integrated to obtain the valid and red-letter amounts, and T1, T2 and the tax rate are calculated from the output (sales) and input (purchase) items, where T1 = output tax amount − input tax amount, T2 = total sales amount, and tax rate = T1/T2. The valid and red-letter amounts, T1, T2 and the tax rate are also native fields.
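A non-limiting Python sketch of this invoice processing is given below. It reads the "sales tax amount" as the output tax and the "intake tax amount" as the input tax, and it assumes each invoice record carries the hypothetical keys number, voided, direction ("output" or "input"), amount and tax. Tallying amounts over both directions and computing T1 and T2 from valid invoices only are assumptions of the sketch.

```python
def invoice_native_fields(invoices):
    """Derive illustrative native fields from value-added tax invoice records."""
    seen = set()
    fields = {
        "valid_invoiced_amount": 0.0, "valid_red_amount": 0.0,
        "voided_invoiced_amount": 0.0, "voided_red_amount": 0.0,
    }
    output_tax = input_tax = sales_total = 0.0
    for inv in invoices:
        # Deduplicate voided and valid invoices separately by invoice number.
        key = (inv["voided"], inv["number"])
        if key in seen:
            continue
        seen.add(key)
        state = "voided" if inv["voided"] else "valid"
        kind = "invoiced" if inv["amount"] > 0 else "red"
        fields[f"{state}_{kind}_amount"] += inv["amount"]
        if not inv["voided"]:
            if inv["direction"] == "output":
                output_tax += inv["tax"]
                sales_total += inv["amount"]
            else:
                input_tax += inv["tax"]
    t1 = output_tax - input_tax  # T1 = output tax amount - input tax amount
    t2 = sales_total             # T2 = total sales amount
    fields.update(T1=t1, T2=t2, tax_rate=(t1 / t2 if t2 else None))
    return fields
```

The tax rate is left as None when there are no sales, to avoid dividing by zero.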
Step 2: Perform index derivation from the native indexes to generate a derived index library.
Index derivation is performed on the standardized native indexes obtained above. On the one hand, risk-derived indexes can be constructed from the native indexes by drawing on expert experience and a deep understanding of the dimensions of enterprise risk. For example, when an enterprise is involved in a large number of cases, it may face risks related to the concentration of cases within a given period, its role in the cases, the case types, the case amounts and so on. If the enterprise is the defendant and the amounts claimed are large, it may face substantial liabilities in the future that affect its operation and development; if the enterprise is the plaintiff, the amounts claimed are large and the cases are numerous, this indicates that the enterprise protects itself poorly, that its operational management has gaps, and that too much of its energy is consumed by external disputes, which is not conducive to its development. Based on this analysis, the invention can construct derived indexes such as the number of cases in which the enterprise is the defendant and the total amount involved in those cases as new indexes for subsequent analysis, and other new indexes can be compiled from expert experience. On the other hand, feature engineering can be used: the Xgboost (eXtreme Gradient Boosting) algorithm selects features, the features being the native indexes, and combined derived indexes are constructed quickly from them. All derived indexes are aggregated to form a derived data label library.
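As an illustration of an expert-derived index, the following Python sketch counts the cases in which the enterprise is the defendant and totals the amounts involved. The record keys role and amount and the output index names are hypothetical.

```python
def litigation_indexes(cases):
    """Derive two illustrative indexes from an enterprise's case records."""
    # Keep only the cases in which the enterprise appears as the defendant.
    as_defendant = [case for case in cases if case["role"] == "defendant"]
    return {
        "defendant_case_count": len(as_defendant),
        "defendant_case_amount": sum(case["amount"] for case in as_defendant),
    }
```

Analogous indexes for the plaintiff role, or counts restricted to a time window to capture case concentration, could be derived in the same way.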
The invention uses the Xgboost algorithm to select features according to feature importance. The selection process is as follows. Decision trees are built over the features with the Xgboost algorithm, and for each feature the improvement in the performance measure at each split point where the feature is used is calculated in every decision tree; the performance measure may be the Gini impurity of the selected split node or another metric function. The improvement calculated at a feature node is taken as its weight, and the weight and the number of times each feature is selected by the decision trees are recorded: the larger the improvement a feature brings at a split point (the closer the split is to the root node), the larger its weight, and the more decision trees select the feature, the more important it is. Finally, for each feature, the weights and selection counts over all decision trees are combined by weighted summation and then averaged to obtain the importance score of the feature. The features are ranked by importance score, a higher score indicating greater importance, and the features with high importance are selected for derivation. An importance score threshold may be preset to decide whether a feature counts as a feature of strong importance.
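The aggregation of split improvements into importance scores can be sketched in plain Python as follows. This is not the Xgboost library itself: it assumes the trees have already been built and that each tree is available as a list of (feature, improvement) pairs, one per split. Summing the improvements of a feature over all of its splits and dividing by the number of trees is one possible reading of the weighted summation and averaging described above.

```python
from collections import defaultdict


def importance_scores(trees):
    """Average per-tree total split improvement for each feature.

    trees -- list of trees, each a list of (feature, improvement) pairs
    """
    if not trees:
        return {}
    totals = defaultdict(float)
    for splits in trees:
        for feature, improvement in splits:
            # A feature used in more splits, or in splits with larger
            # improvement, accumulates a larger total.
            totals[feature] += improvement
    return {feature: total / len(trees) for feature, total in totals.items()}


def strong_features(scores, threshold):
    """Features whose importance score reaches the preset threshold, best first."""
    selected = [f for f, score in scores.items() if score >= threshold]
    return sorted(selected, key=lambda f: scores[f], reverse=True)
```

The threshold passed to strong_features plays the role of the preset importance score threshold.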
When combined derived indexes are constructed, several indexes of strong importance can be selected at random and new indexes obtained from them by mathematical operations. Mathematical operation formulas, such as summation and averaging, can be designed in advance and applied to combinations of features to obtain new features. Features may also be selected and concatenated to form new features.
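A minimal Python sketch of combined index derivation is shown below. For determinism it enumerates all feature combinations of a given size rather than sampling them at random, and the two operations and the naming scheme for derived indexes are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean

# Pre-designed mathematical operations applied to each feature combination.
OPERATIONS = {"sum": sum, "mean": mean}


def derive_combined(sample, features, size=2):
    """Build new indexes from every combination of `size` important features."""
    derived = {}
    for combo in combinations(features, size):
        values = [sample[f] for f in combo]
        for op_name, op in OPERATIONS.items():
            derived[f"{op_name}({','.join(combo)})"] = op(values)
    return derived
```

Replacing combinations(...) with a random sample of feature groups would match the random selection described above more closely.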
The invention focuses on the comprehensiveness and accuracy of the enterprise data dimensions: the analyzed data covers both enterprise entities and enterprise owners, and the constructed risk indexes include both native and derived indexes, so that enterprise risk can be monitored more comprehensively than in the prior art.
And step 3: and mining the association relationship between the enterprise main body and the enterprise main body by using the knowledge graph.
Enterprise risk is not just the risk of the enterprise, but often the risk can be conducted through its associated enterprises, such as the holdings or investments, or the enterprise risk can be mined through the associated transactions. Therefore, aiming at risk enterprises in the enterprise historical business and all-dimensional data thereof, such as group relation, investment and financing relation, upstream-downstream relation, guarantee relation, arbitrary role relation between the enterprises and individuals, actual control relation, beneficiary relation, associated complaint relation and the like, conducting paths of entities, attributes, relations and associated relation are subjected to detailed analysis, knowledge is summarized and extracted, then the knowledge is subjected to simulation learning by using a knowledge map technology, the extraction and classification of the knowledge are realized, the relation extraction and the knowledge fusion are performed, the bottom layer penetration of the enterprises is realized, the risk conduction enterprises are positioned by using a label propagation algorithm, the future potential risks of the enterprises are predicted in time, and the enterprise risk early warning is performed. For example, through the penetration of the equity, a company A finds that a company B and a company C which are externally held and have a large number of associated invoice transactions with a company D which is externally held by a legal representative of the company A, so that the income scale is increased virtually, and the taxes are reduced through cost items; therefore, internal transactions among enterprises can be found through the equity relation and the job information among the enterprises, and the association risk among the enterprises is deeply mined.
In the constructed knowledge graph, enterprises or individuals are entity nodes, the node attributes include the risk probability of the enterprise or individual, and the edges between nodes represent the relationships between enterprises/individuals and other enterprises/individuals.
When association relationships are mined, for example through equity penetration, an equity relationship network among enterprises is established. By sorting out investment and financing, position-holding, transaction, guarantee, and other relationships, knowledge graph technology is used to build an association knowledge network among enterprises, which provides a reliable data source for subsequent enterprise risk discovery, monitoring, and early warning.
The method penetrates the enterprise down to its underlying layer and establishes group, investment, upstream-downstream, and guarantee relationships between enterprises, as well as position-holding, actual-control, and concerted-action relationships between enterprises and individuals. Through the conduction effect of associated risks it predicts potential future risks in time, preserves clues from enterprise risk information, and provides a reliable basis for subsequent risk disposal.
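The risk conduction described above can be illustrated with a minimal label propagation sketch. The source does not specify which variant of label propagation is used, so the version below is an assumption: nodes with known labels (seeds) keep their risk value, and every other node repeatedly takes the weighted average of its neighbors' risk. The function name `propagate_risk`, the edge weights, the company names, and the 0.5 flagging cutoff are all hypothetical.

```python
from collections import defaultdict


def propagate_risk(edges, seeds, iterations=50):
    """Spread risk scores from labeled seed nodes over an undirected graph.

    edges: iterable of (node_a, node_b, weight) relationships.
    seeds: {node: risk} for nodes with a known label (1.0 = risky, 0.0 = normal).
    """
    neighbors = defaultdict(dict)
    for a, b, weight in edges:
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    risk = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, linked in neighbors.items():
            if node in seeds:
                # Known labels stay fixed (assumption: seeds are clamped).
                updated[node] = seeds[node]
            else:
                # Unlabeled nodes take the weighted mean of their neighbors.
                total = sum(linked.values())
                updated[node] = sum(w * risk[n] for n, w in linked.items()) / total
        risk = updated
    return risk


# Hypothetical chain of relationships: D (known risky) - B - A - E (known normal).
edges = [
    ("Company D", "Company B", 1.0),
    ("Company B", "Company A", 1.0),
    ("Company A", "Company E", 1.0),
]
seeds = {"Company D": 1.0, "Company E": 0.0}
risk = propagate_risk(edges, seeds)

# Flag unlabeled enterprises whose propagated risk exceeds a hypothetical cutoff.
flagged = sorted(n for n, r in risk.items() if n not in seeds and r > 0.5)
print(flagged)  # ['Company B']
```

In this toy graph the enterprise closest to the known risky node ends up with the higher propagated risk, which is the conduction effect the text relies on. In practice the edge weights would presumably depend on the relationship type, which the source does not quantify.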
Step 4: Select the features used for enterprise risk prediction, construct enterprise risk early warning rules, and store them in a risk rule base.
First, the indexes in the original database and the derivative database are combined, and risk indexes with strong discriminative power are screened out. In the embodiment of the invention, the IV (Information Value) of each index is calculated, and the indexes that meet the requirement are then selected.
The magnitude of the IV value represents the predictive power of the index: the larger the IV value, the stronger the predictive power. Generally, an index is usable when its IV > 0.025. In the embodiment of the invention, indexes with IV > 0.1 are selected for risk prediction when the rule indexes are screened. The embodiment also found that indexes with IV > 0.3 discriminate strongly but are few in number, and that indexes below 0.3 can still discriminate well. Therefore, when the number of indexes is preset, indexes with IV > 0.3 can be selected preferentially, and the remaining indexes are then selected from the range IV > 0.1 in order of IV value. Training samples are collected and training data is constructed for each index. Each training sample includes a feature value and an enterprise risk probability. The IV value is calculated as follows: the training samples of the index are grouped, and in the embodiment of the invention they are divided into groups of equal sample count. Suppose there are n groups; the IV value of the i-th group, IV_i, is calculated as follows:
$$IV_i = (Py_i - Pn_i) \times WOE_i = \left(\frac{y_i}{y_s} - \frac{n_i}{n_s}\right) \times \ln\left(\frac{y_i / y_s}{n_i / n_s}\right)$$
where WOE_i is the weight of evidence of the i-th group, y_i is the number of responding clients in the i-th group, n_i is the number of non-responding clients in the i-th group, y_s is the number of all responding clients in the training sample set, n_s is the number of all non-responding clients in the training sample set, Py_i is the proportion of the i-th group's responding clients among all responding clients, and Pn_i is the proportion of the i-th group's non-responding clients among all non-responding clients. In the present invention, a responding client is an enterprise for which a risk alarm occurred.
After the IV values of the n groups are obtained, the IV value of the index is calculated as follows:
$$IV = \sum_{i=1}^{n} IV_i$$
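The grouped IV calculation and the index screening can be sketched as follows. This is an illustrative sketch rather than the patented implementation. It assumes each training sample has been reduced to a binary label (1 for a responding sample, that is, an enterprise with a risk alarm, and 0 otherwise), whereas the source says only that each sample carries an enterprise risk probability. The function names, the sample data, and the index names are hypothetical, and the handling of a group that lacks one class (raising an error) is also an assumption.

```python
import math


def equal_frequency_groups(values, labels, n_groups):
    """Sort samples by index value and split them into n groups of equal size."""
    ordered = [label for _, label in sorted(zip(values, labels))]
    total = len(ordered)
    return [
        ordered[i * total // n_groups:(i + 1) * total // n_groups]
        for i in range(n_groups)
    ]


def information_value(values, labels, n_groups):
    """IV of one index: the sum over groups of (Py_i - Pn_i) * WOE_i."""
    y_s = sum(labels)            # all responding samples
    n_s = len(labels) - y_s      # all non-responding samples
    iv = 0.0
    for group in equal_frequency_groups(values, labels, n_groups):
        y_i = sum(group)
        n_i = len(group) - y_i
        if y_i == 0 or n_i == 0:
            # ln(0) is undefined; the source does not say how this is handled.
            raise ValueError("a group lacks one class; use fewer groups")
        py_i, pn_i = y_i / y_s, n_i / n_s
        woe_i = math.log(py_i / pn_i)
        iv += (py_i - pn_i) * woe_i
    return iv


def select_indexes(iv_by_index, max_count, floor=0.1):
    """Keep indexes with IV above the floor, highest IV first.

    Sorting in descending order means indexes with IV > 0.3 are taken first.
    """
    usable = [name for name, iv in iv_by_index.items() if iv > floor]
    usable.sort(key=lambda name: iv_by_index[name], reverse=True)
    return usable[:max_count]


# Hypothetical training data for one index, split into two groups.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
iv = information_value(values, labels, n_groups=2)

# Hypothetical IV values for four indexes, with a preset count of two.
chosen = select_indexes(
    {"red_reversal_ratio": 0.35, "staff_count": 0.05,
     "lawsuit_count": 0.2, "tax_rate": 0.12},
    max_count=2,
)
print(round(iv, 4), chosen)
```

With two groups holding one and three responding samples out of four each, both groups contribute 0.5 × ln 3, so the index IV equals ln 3.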
After the indexes are screened by IV value, risk early warning rules are constructed from the selected indexes. The rule threshold is determined by binning the rule index, and the risk level and risk weight of a rule can be set by combining the degree of risk corresponding to each threshold interval with the risk attribute of the rule's dimension. The effectiveness of the enterprise risk early warning rules is then verified against historical data samples.
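One way to read the binning step is sketched below. The source states only that thresholds come from binning the rule index and that rules are verified on historical samples; the specific procedure here (take the lowest bin edge above which every bin reaches a target risk rate, then report precision and recall) is an assumption, as are the function names, the bin edges, the 0.8 target, and the sample data.

```python
from bisect import bisect_right


def bin_risk_rates(values, labels, edges):
    """Return (lower_edge, sample_count, risk_rate) for each bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    risky = [0] * (len(edges) - 1)
    for value, label in zip(values, labels):
        k = bisect_right(edges, value) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
            risky[k] += label
    return [
        (edges[k], counts[k], risky[k] / counts[k] if counts[k] else 0.0)
        for k in range(len(counts))
    ]


def choose_threshold(bins, min_risk_rate):
    """Lowest bin edge such that it and every higher bin reach the target risk rate."""
    threshold = None
    for lower, count, rate in reversed(bins):
        if count and rate >= min_risk_rate:
            threshold = lower
        else:
            break
    return threshold


def validate_rule(values, labels, threshold):
    """Precision and recall of the rule 'value >= threshold' on historical samples."""
    fired = [label for value, label in zip(values, labels) if value >= threshold]
    precision = sum(fired) / len(fired) if fired else 0.0
    recall = sum(fired) / sum(labels) if sum(labels) else 0.0
    return precision, recall


# Hypothetical historical samples of a ratio-type index; label 1 = risk alarm.
values = [0.02, 0.05, 0.10, 0.30, 0.35, 0.60, 0.85, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
edges = [0.0, 0.25, 0.5, 0.75, float("inf")]

bins = bin_risk_rates(values, labels, edges)
threshold = choose_threshold(bins, min_risk_rate=0.8)
precision, recall = validate_rule(values, labels, threshold)
print(threshold, precision, recall)  # 0.5 1.0 0.75
```

The per-bin risk rates could also drive the risk level and weight assigned to each threshold interval, as the paragraph describes, though the mapping itself would be set from training data and expert experience.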
According to the importance of the index labels and the enterprise risk requirements, the invention constructs risk rules for different dimensions. For enterprise entities, the rules cover industry and commerce, judicial litigation, mortgage, tax-related matters, abnormal operation, regulatory penalties, financial loans, public-opinion risk, financial risk, tax risk, industry risk, association relationships, and other dimensions. For individuals, the rules cover anti-fraud, multi-head loans, social behavior, travel behavior, and other dimensions. Risk is quantified according to the degree of risk of each type of rule, and a risk threshold, risk weight, risk level, and so on are set. During risk quantification, the enterprise risk levels, the risk thresholds in the different early warning rules, and the weights of the risks determined by the different rules are set according to the collected training data and expert experience. The configured values are continuously updated as new data is collected, so as to improve the accuracy of risk early warning.
As shown in FIG. 3, the decision engine contains the constructed risk early warning rules. The variables needed to configure the rules, including a variable library, a constant library, and a parameter library, must be imported in advance. Risk early warning rules are stored as decision sets, decision tables, or decision trees. A decision set outputs scattered early warning logic as a rule set, a decision table outputs rules in tabular form, and a decision tree presents and outputs complex rules in a tree structure. The rule configuration mode and the variable parameters may be generated in advance. The decision tree may be generated using the Xgboost algorithm.
Step 5: Perform real-time risk early warning on enterprises using the generated enterprise risk early warning rules.
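A decision table of the kind described above can be sketched as a list of rows evaluated against pre-imported variables and constants. The table layout, the rule rows, the constant names and values, and the `evaluate` function are all hypothetical; the source does not describe the decision engine's internal format.

```python
import operator

# Comparison operators a table row may reference.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Hypothetical constant library, imported before the rules are configured.
CONSTANTS = {"RED_REVERSAL_RATIO_MAX": 0.5, "ABNORMAL_LISTING_MAX": 2}

# Hypothetical decision table: rule name, variable, operator, constant, level, score.
DECISION_TABLE = [
    ("suspected short-term fictitious transactions",
     "red_reversal_ratio", ">=", "RED_REVERSAL_RATIO_MAX", "high", 40),
    ("repeated abnormal operation listings",
     "abnormal_listing_count", ">", "ABNORMAL_LISTING_MAX", "medium", 20),
]


def evaluate(table, constants, variables):
    """Return (rule name, risk level, score) for every row whose condition holds."""
    hits = []
    for name, variable, op, constant, level, score in table:
        if variable not in variables:
            continue  # index not available for this enterprise
        if OPS[op](variables[variable], constants[constant]):
            hits.append((name, level, score))
    return hits


# Index data for one enterprise (values are illustrative).
variables = {"red_reversal_ratio": 0.982, "abnormal_listing_count": 1}
hits = evaluate(DECISION_TABLE, CONSTANTS, variables)
print(hits)
```

Keeping thresholds in a separate constant library means a threshold can be retuned without touching the rule rows, which fits the text's point that variables are imported before rules are configured.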
The invention manages data with an Extract-Transform-Load (ETL) process, using the ETL tool KETTLE for data access, cleaning, and storage, and for computing and storing derived indexes. As shown in FIG. 3, the decision engine is linked with KETTLE to receive the screened indexes as input, and the engine's functional modules are used to configure rule thresholds, scores, levels, and so on, and to test, deploy, and launch the rules, thereby converting derived data into risk rules.
For real-time early warning, the currently collected enterprise data is cleaned by KETTLE to obtain the index data required by each risk rule, and the index data is fed into the decision engine for risk early warning. Monitoring is triggered in batches, all rules are scanned automatically, and early warning results are returned within seconds. Finally, based on the enterprise's early warning results, the triggered risk dimensions, rule scores, and risk levels are combined to produce an overall early warning model score, which quantifies enterprise risk. The early warning score is monotonic, and the enterprise's risk level is determined from it, so that risky enterprises can be located quickly; the details of the early warning signals provide reliable clues for subsequent risk investigation and disposal. In addition, based on the association relationships between enterprises and enterprise owners mined with the knowledge graph, a label propagation algorithm is used to locate enterprises to which risk is conducted, predict their potential future risk in time, and issue risk early warnings.
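The combination of triggered rules into one score and level can be sketched as a weighted sum. The source does not give the aggregation formula, so the weighted sum, the cap at 100, the dimension weights, and the level cutoffs below are all assumptions chosen for illustration.

```python
def early_warning_score(hits, weights, level_cutoffs, cap=100.0):
    """Combine triggered rules into an overall score and risk level.

    hits: list of (rule name, risk dimension, rule score) for triggered rules.
    weights: {risk dimension: weight}; unknown dimensions default to 1.0.
    level_cutoffs: ascending list of (minimum score, level name).
    """
    total = sum(weights.get(dimension, 1.0) * score for _, dimension, score in hits)
    total = min(total, cap)
    level = "normal"
    for minimum, name in level_cutoffs:
        if total >= minimum:
            level = name
    return total, level


# Hypothetical weights and cutoffs, which in practice would come from
# training data and expert experience.
weights = {"tax risk": 1.5, "abnormal operation": 1.0}
level_cutoffs = [(30, "low"), (60, "medium"), (80, "high")]

one_hit = [("suspected short-term fictitious transactions", "tax risk", 40)]
two_hits = one_hit + [("abnormal operation listing", "abnormal operation", 20)]

print(early_warning_score(one_hit, weights, level_cutoffs))   # (60.0, 'medium')
print(early_warning_score(two_hits, weights, level_cutoffs))  # (80.0, 'high')
```

Because weights and rule scores are non-negative, triggering an additional rule can never lower the total, which is one simple way to obtain the monotonic score the text describes.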
The invention displays enterprise risk early warning results in two ways. First, a monitored enterprise is added to monitoring through the system, which runs batches and outputs results automatically, giving a visual display in the system interface. Second, an automatically generated early warning report can be provided for the monitored enterprise, giving a visual output of the risk early warning.
Correspondingly, the invention provides an enterprise risk early warning device comprising: a data preprocessing module, a native index extraction module, an index derivation module, an association relationship mining module, an enterprise risk early warning rule generation module, and a real-time risk early warning module.
The data preprocessing module preprocesses the collected enterprise and enterprise owner data, including data verification and auditing, data cleaning, data matching, and data conversion and loading. The specific preprocessing procedure is described in step 1 above and is not repeated here.
The native index extraction module extracts the native fields recorded in the collected raw data of enterprises and enterprise owners, and stores each native field as a native index in the original database.
The index derivation module reads the native indexes from the original database, performs index derivation, and stores the derived new indexes in the derivative database. The specific derivation method is described in step 2 above and is not repeated here.
The association relationship mining module mines the association relationships between enterprise entities and enterprise owners to generate a knowledge graph. In the knowledge graph, enterprises or individuals are entity nodes, nodes that have a relationship are connected by an edge, the node attributes include the risk probability of the enterprise or individual, and the association relationships between enterprise entities and enterprise owners can be found by following the edges.
The enterprise risk early warning rule generation module first selects indexes with strong discriminative power from the original database and the derivative database for enterprise risk prediction, then sets a rule configuration scheme in advance according to training sample data and expert experience, generates enterprise risk early warning rules, and stores them in an early warning rule base. The selection of indexes with strong discriminative power and the generation of the early warning rules are described in step 4 above and are not repeated here.
The real-time risk early warning module uses the early warning rule base to batch-process enterprise data collected in real time and issue early warnings, including cleaning the data, obtaining the index data required by each rule, scoring risk with each rule, and combining the early warning results of all rules to obtain a final enterprise risk value and risk level. In addition, based on the association relationships between enterprises and enterprise owners mined with the knowledge graph, a label propagation algorithm is used to locate enterprises to which risk is conducted, and risk early warnings are issued for those enterprises.
Example: The device and method of the invention were used to perform risk early warning on an enterprise, with the following result:
Enterprise name: Hubei Qingyi Jia Endustrie Co., Ltd
Triggered risk early warning rule: suspected short-term fictitious inflation of transactions
Event details: In 2018 the proportion of red-letter (reversal) invoices in sales was 98.2%. Sales invoiced that year totaled about 214 million, red-letter reversals that year totaled 210 million, and the actual invoiced amount for goods sold was about 4 million; a short-term fictitious inflation of transactions is suspected.
Potential enterprise risk: It cannot be ruled out that the time gap between invoicing and red-letter reversal was used for bill or contract financing, so a significant risk exists. Triggering the rule preserves a risk clue for the monitoring party and prompts it in time to verify the real operating and transaction status of the risky enterprise.
Except for the technical features described in the specification, all other techniques are known to those skilled in the art, and their description is omitted. The embodiments described above do not represent all embodiments consistent with the present application; modifications or variations made by those skilled in the art without inventive effort on the basis of the technical solution of the invention remain within the scope of protection of the invention.

Claims (10)

1. An enterprise risk early warning method, characterized by comprising the following steps:
step 1: collecting information data of enterprise entities and enterprise owners, preprocessing the data, extracting native indexes, and constructing an original database; the original database stores standardized data comprising the extracted native fields and their corresponding values, each native field being a native index;
step 2: performing index derivation from the native indexes of the original database, and storing the derived new indexes in a derivative database;
the index derivation comprises: taking each native index as a feature, performing feature selection using the Xgboost algorithm, and selecting combinations of highly important features to derive new indexes;
step 3: mining the association relationships between enterprise entities and enterprise owners using a knowledge graph; enterprises or individuals serve as entity nodes, the node attributes include the risk probability of the enterprise or individual, and nodes that have a relationship are connected by an edge;
step 4: selecting the indexes used for enterprise risk prediction, and constructing enterprise risk early warning rules;
calculating, from the collected training sample set, the information value (IV) of each index in the original database and the derivative database, and selecting indexes with IV greater than 0.1 for risk prediction;
the constructed enterprise risk early warning rules comprise: for enterprises, risk early warning rules in the dimensions of industry and commerce, judicial litigation, mortgage, tax-related matters, abnormal operation, regulatory penalties, financial loans, public-opinion risk, financial risk, tax risk, industry risk, and association relationships; for enterprise owners, risk early warning rules in the dimensions of anti-fraud, multi-head loans, social behavior, and travel behavior; risk levels are set in advance according to the training sample set and expert experience, and the risk thresholds and risk weights of the different enterprise risk early warning rules are set;
step 5: performing real-time risk early warning on enterprises using the generated enterprise risk early warning rules;
performing batch processing on enterprise data collected in real time, including: cleaning the data, obtaining the index data required by each rule, scoring risk with each rule, and combining the early warning results of all rules to obtain a final enterprise risk value and risk level;
based on the association relationships between enterprises and enterprise owners mined with the knowledge graph, locating enterprises to which risk is conducted using a label propagation algorithm, and issuing risk early warnings for those enterprises.
2. The method according to claim 1, wherein in step 1 the information data of the enterprise entity comprises the enterprise's credit investigation information, industrial and commercial change information, violation and litigation information, pledge information, creditor's rights and debt information, abnormal operation information, financial operation information, tax declaration information, invoice transaction information, and enterprise public opinion information; the information data of the enterprise owner comprises the owner's personal credit information, judicial litigation information, multi-head loan information, and fraud behavior information.
3. The method according to claim 1, wherein in step 1 the preprocessing comprises data verification and auditing, data cleaning, data matching, and data conversion and loading;
the data verification and auditing verifies and audits the format and content of the collected information data;
the data cleaning cleans the data using a configured cleaning rule base, including: (1) filtering out data that lacks required information; (2) detecting and filtering out erroneous data by means of data association and configured data boundaries; (3) detecting and handling duplicate data and missing data;
the data matching matches the collected data against metadata information and integrates and merges the successfully matched data;
the data conversion and loading performs field mapping, code conversion, and field calculation and filling for collected data whose structure or coded content is inconsistent with the database tables.
4. The method according to claim 3, wherein in step 1 the preprocessed data is stored in the database and quality management is performed on the stored data, covering five aspects: correctness, integrity, consistency, completeness, and validity.
5. The method according to claim 1 or 3, wherein in step 1 the value-added tax invoice data is preprocessed and the native fields are obtained as follows: determining whether each value-added tax invoice obtained after data cleaning is a voided invoice; deduplicating the voided invoices and the valid invoices by invoice number; and obtaining the corresponding native field according to whether the invoice amount is positive: for a voided invoice, a positive amount is recorded as the voided invoiced amount and otherwise as the voided red-reversal amount; for a valid invoice, a positive amount is recorded as the valid invoiced amount and otherwise as the valid red-reversal amount; further obtaining the valid amount and the red-reversal amount, and calculating T1, T2, and the tax rate for the different sale items, where T1 = output tax amount - input tax amount, T2 = total sales amount, and tax rate = T1/T2; the valid amount, the red-reversal amount, T1, T2, and the tax rate are also native fields.
6. The method according to claim 1, wherein in step 2 the feature selection using the Xgboost algorithm comprises: constructing decision trees over the features using the Xgboost algorithm; calculating, for each decision tree, the amount by which each feature improves the performance measure at its split points, and taking that amount as the weight of the feature node; counting the number of times each feature node is selected by the decision trees; and, for each feature, computing the weighted sum of its weights and selection counts over all decision trees and then averaging to obtain the importance score of the feature, where a higher score indicates greater importance; features whose importance score exceeds a preset threshold are selected.
7. The method according to claim 1, wherein in step 2 new indexes are also derived from expert experience, including: counting the number of cases in which the enterprise is the defendant and the amounts involved in those cases.
8. The method according to claim 1, wherein in step 3 the relationships between nodes include group relationships, investment and financing relationships, upstream-downstream relationships, guarantee relationships, position-holding relationships between enterprises and individuals, actual-control relationships, beneficiary relationships, and associated litigation relationships.
9. The method according to claim 1, wherein in step 4, when the number of indexes is preset, indexes with IV > 0.3 are selected preferentially, and the remaining indexes are then selected from the range IV > 0.1 in order of IV value.
10. An early warning device based on the early warning method of claim 1, comprising:
a data preprocessing module, which preprocesses the collected enterprise and enterprise owner data, including data verification and auditing, data cleaning, data matching, and data conversion and loading;
a native index extraction module, which extracts the native fields recorded in the collected data of enterprises and enterprise owners and stores each native field as a native index in an original database;
an index derivation module, which reads the native indexes from the original database, performs index derivation, and stores the derived new indexes in a derivative database; the index derivation comprises: (1) taking each native index as a feature, performing feature selection using the Xgboost algorithm, and selecting combinations of highly important features to derive new indexes; (2) deriving new indexes from expert experience, including counting the number of cases in which the enterprise is the defendant and the amounts involved in those cases;
an association relationship mining module, which mines the association relationships between enterprise entities and enterprise owners to generate a knowledge graph; in the knowledge graph, enterprises or individuals are entity nodes, nodes that have a relationship are connected by an edge, the node attributes include the risk probability of the enterprise or individual, and the association relationships between enterprise entities and enterprise owners are mined by following the edges;
an enterprise risk early warning rule generation module, which selects indexes with strong discriminative power from the original database and the derivative database for enterprise risk prediction, then sets a rule configuration scheme in advance according to training sample data and expert experience, generates enterprise risk early warning rules, and stores them in an early warning rule base; the indexes with strong discriminative power are obtained by calculating the information value (IV) of each index from the training sample set and taking the indexes with IV greater than 0.1;
a real-time risk early warning module, which uses the early warning rule base to batch-process enterprise data collected in real time and issue early warnings, including: cleaning the data, obtaining the index data required by each rule, scoring risk with each rule, and combining the early warning results of all rules to obtain a final enterprise risk value and risk level; in addition, based on the association relationships between enterprises and enterprise owners mined with the knowledge graph, a label propagation algorithm is used to locate enterprises to which risk is conducted, and risk early warnings are issued for those enterprises.
CN202111359407.XA 2021-11-11 2021-11-11 Enterprise risk early warning method and device Pending CN114066242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111359407.XA CN114066242A (en) 2021-11-11 2021-11-11 Enterprise risk early warning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111359407.XA CN114066242A (en) 2021-11-11 2021-11-11 Enterprise risk early warning method and device

Publications (1)

Publication Number Publication Date
CN114066242A true CN114066242A (en) 2022-02-18

Family

ID=80273002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111359407.XA Pending CN114066242A (en) 2021-11-11 2021-11-11 Enterprise risk early warning method and device

Country Status (1)

Country Link
CN (1) CN114066242A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination