WO2022121083A1 - Enterprise risk early warning method based on the association analysis FP-Tree algorithm (基于关联分析FP-Tree算法的企业风险预警方法)


Info

Publication number
WO2022121083A1
Authority
WO
WIPO (PCT)
Prior art keywords
risk
enterprise
index
early warning
rule
Prior art date
Application number
PCT/CN2021/071403
Other languages
English (en)
French (fr)
Inventor
吴志雄
甘建武
李晓琼
黄鼎
Original Assignee
南威软件股份有限公司
Priority date
Filing date
Publication date
Application filed by 南威软件股份有限公司
Publication of WO2022121083A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Definitions

  • The invention belongs to the field of enterprise risk early warning, and in particular relates to an enterprise risk early warning method based on the association analysis FP-Tree algorithm.
  • Enterprise risk early warning is an effective means of establishing a risk assessment system, carrying out risk pre-control, defusing risks before they occur, and minimizing the losses they cause. Analyzing and managing the risks of enterprise activities, preventing and defusing risks, and keeping the resulting losses to a minimum have become important measures for safeguarding enterprise operations and maximizing benefits.
  • The enterprise risk early warning indicator system is a yardstick and an important basis for measuring the financial risk status of enterprises.
  • In the prior art, enterprise risk is divided into internal risk and external risk, covering four comprehensive risk indicators: financial, technical, operational and strategic.
  • Financial risk factors: liquidity, financing, investment, liquidation, profitability, asset utilization, growth, etc.
  • Operational risk factors: judicial matters, abnormal operation, administrative penalties, etc.
  • At present, enterprise risk early warning mostly adopts the following methods: for external environmental risk, the six forces analysis model is used to analyze the competitive environment in which the enterprise operates; for internal environmental risk, an indicator system based mainly on financial, technical, operational and strategic risk factors is established, taking into account domestic and foreign research literature and the availability of data.
  • Commonly used rating methods include discriminant analysis, comprehensive evaluation and fuzzy analysis; finally, early warning intervals are set according to the evaluation results and corresponding countermeasures are taken.
  • Prior-art early warning analyzes either single-indicator data or the overall indicator data. Because enterprises lack basic professional knowledge, enterprise data is high-dimensional and voluminous, and information acquisition, updating, processing and analysis all take a long time and cannot be done dynamically, the timeliness of risk early warning is seriously affected, so enterprise risk early warning suffers from a serious time lag.
  • The purpose of the present invention is to provide an enterprise risk early warning method based on the association analysis FP-Tree algorithm, which can not only analyze enterprise risk from single-indicator data but also mine enterprise risk from two or more indicators combined, uncovering the risks an enterprise faces more comprehensively.
  • The technical solution of the present invention is an enterprise risk early warning method based on the association analysis FP-Tree algorithm, comprising the following steps:
  • Step S2: according to the risk indicator system, use big data analysis to form risk rules; that is, if the values of one or more indicators equal a predetermined value or fall within a predetermined interval, the enterprise is considered to possibly have the corresponding risk. This yields the risk rule set B:
  • X_k is a subset of the indicator system X; risk_k is the corresponding textual risk description derived from X_k by analysis and inference;
  • Step S4: based on the training indicator data set, the risk level of each enterprise is obtained by calculation from the enterprise's credit-dimension data, using the following formula:
  • creditScore_new represents the normalized value of the latest credit risk score; 100·creditScore_new is used as the base score of the risk score;
  • creditScore_i represents the credit risk score from i years earlier; a further term of the formula (not reproduced in this extract) represents the stability of the credit score;
  • riskListCount represents the number of times the enterprise was placed on a blacklist or dishonesty list in the past 5 years; 4·riskListCount represents the risk of being blacklisted or listed as dishonest;
  • Step S5: use the mutual entropy-interval nesting method for binning and the chi-square test for correlation-based indicator screening, convert the indicators to character labels according to the binning results, and save the binning rules and the list of indicators remaining after screening;
  • Step S6: obtain the association rule set: use the association analysis FP-Tree algorithm to mine association rules between enterprise behavior and each enterprise risk level, then traverse the association rules and consolidate them into an association rule set composed of indicator sets, risk levels and confidences;
  • the association rule set consists of elements of the form "(indicator set):(risk level, confidence)" whose confidence is greater than 0.5;
  • A represents an indicator set;
  • B represents a risk level;
  • count(A∩B) and count(A) denote, respectively, the number of samples in which the elements of indicator set A and risk level B occur together, and the number of samples in which the elements of indicator set A occur together;
  • Step S7: according to the association rule set obtained in step S6 and the risk rule set obtained in step S2, issue early warnings for the enterprises to be warned based on their indicator data set: identify the association rules each enterprise hits, predict its risk level and possible risk points, and output the early warning result.
  • In step S5, binning with the mutual entropy-interval nesting method and correlation-based indicator screening with the chi-square test are implemented as follows:
  • For discrete-variable indicators with more than 5 distinct values and for continuous-variable indicators, the supervised mutual entropy-interval nesting method is used to bin the indicator variables, and continuous variables are converted to character labels according to the binning results, reducing the risk of model overfitting;
  • Step 0: preset a threshold (threshold) and the maximum number of bins n;
  • For an indicator I to be binned, with values in [a,b]=[min{I},max{I}], the initial set of bin boundaries is Boundary={a,b};
  • Step 1: take the split point a_0, divide [a,b] into two intervals [a,a_0] and (a_0,b], and, combining mutual information and information entropy, propose a new category-uncertainty evaluation function MiEntropy:
  • t is an interval; C={c_1,c_2,…,c_m} is the set of categories, and m is the number of categories;
  • p(c_i), p(t) and p(t,c_i) are, respectively, the ratios of the number of samples of class c_i, the number of samples whose indicator value lies in interval t, and the number of samples whose indicator value lies in t and that belong to class c_i, to the total number of training samples;
  • p(c_i|t) is the ratio of the number of samples whose indicator value lies in t and that belong to c_i to the number of samples whose indicator value lies in t; η is a hyperparameter satisfying η∈[0,1];
  • Step 2: if MiEntropy([a,a_0])≥threshold or MiEntropy((a_0,b])≥threshold, add a_0 to Boundary and go to step 3;
  • Step 3: obtain the number of bins numb(I) of indicator I from Boundary, and stop binning once numb(I)≥n; otherwise continue nesting on the sub-interval selected by the MiEntropy comparison;
  • Chi-square correlation screening of indicators: the chi-square test is used to test the correlation between each indicator variable and enterprise risk, and indicators that contribute little to early warning are filtered out.
  • The sample space used for the chi-square correlation analysis is partitioned on the basis of the supervised binning.
  • Step S7 is implemented as follows:
  • First, the indicator data of the enterprise to be warned is converted to character labels: the conversion is determined by the binning rules of step S5, and the original indicator data is converted into the corresponding character labels, giving the enterprise's transformed indicator set. C_i is the set of results after converting each indicator value of the i-th sample enterprise, and its elements are the converted result values of the c_i-th indicator of the i-th sample enterprise;
  • For the q_i-th risk rule hit by the i-th early-warning enterprise, the accompanying symbols denote, respectively, its indicator set, its risk level and its confidence;
  • Then the risk level is obtained: it is determined by the risk levels and confidences of the association rules that are hit. The risk level of each association rule is converted into a corresponding score, a confidence-weighted average gives the final risk score, and the risk level is read off from the score interval of each risk level;
  • high risk is denoted P0;
  • medium-high risk has two levels, P1 and P2, with P1 riskier than P2; low risk is denoted P3 and no risk P4;
  • riskScore_i represents the risk score of the i-th early-warning enterprise;
  • SP_ij represents the risk-level score of the j-th risk rule hit by the i-th early-warning enterprise;
  • P_ij represents the risk level of the j-th risk rule hit by the i-th early-warning enterprise;
  • Conf_ij represents the confidence of the j-th risk rule hit by the i-th early-warning enterprise;
  • r_i represents the sum of the confidences of the risk rules hit by the i-th early-warning enterprise;
  • riskLevel is the function that maps a risk score to a risk level;
  • Finally, the risk description is obtained by joining the elements of the risk point set with semicolons.
  • Compared with the prior art, the present invention has the following beneficial effects:
  • The invention is a specific application of the association analysis FP-Tree algorithm to enterprise risk early warning analysis, filling a gap in the use of association analysis algorithms in this field. Before FP-Tree mines association rules, the data is preprocessed: the chi-square test principle is used for indicator screening and binning, and weakly correlated indicators are removed to improve early warning accuracy, so that the risks in enterprise behavior can be mined more comprehensively.
  • The enterprise risk early warning analysis method based on the association analysis algorithm FP-Tree is a black box to the end user: the end user does not need to care about how the model is built, and only needs to save and update the required basic enterprise information and behavior information in the enterprise information database. The invention displays the resulting early warning clues in the domain model risk clue list through the interface of the risk early warning system.
  • Figure 1 is a schematic structural diagram of the method of the present invention.
  • The present invention is implemented through the following steps:
  • x_i denotes the name of the i-th indicator. For example, by studying enterprise behavior data across administrative inspections, administrative penalties, administrative enforcement, contract performance history, product quality inspections, complaints and reports, and credit rating evaluations, together with the enterprise's own attributes, a risk indicator system consisting of 7 first-level indicators, 30 second-level indicators and 81 third-level indicators is designed;
  • Step 2: according to the indicator system, use existing big data analysis to form risk rules; that is, when the values of one or more indicators equal a specific value or fall within a specific range, infer that the enterprise may have a certain risk. This yields the risk rule set B.
  • For the q_i-th risk rule hit by the i-th early-warning enterprise, the accompanying symbols denote, respectively, its indicator set, its risk level and its confidence;
  • Enterprise-related behavior data includes basic enterprise information, administrative inspection information, administrative penalty information, administrative enforcement information, contract performance history, complaint and report information, enterprise credit scores, the enterprise product information table, etc.
  • Step 4: obtain the target variable of the training samples by computing each sample's risk level from a formula designed on the enterprise's credit-dimension data. Using data such as the enterprise's credit scores over the past 5 years and the number of times it was placed on a blacklist or dishonesty list in the past 5 years, the risk level of each sample in the training data set is assessed and used as the target variable "Y"; the target variable "Y" and the training indicator data set are then fed into the association analysis algorithm for association rule mining;
  • Step 5: use the mutual entropy-interval nesting method for binning and the chi-square test for indicator screening (filtering out indicators that contribute little to the early warning model), convert the indicators to character labels according to the binning results, and save the binning rules and the list of indicators remaining after screening.
  • The chi-square binning and character-label conversion of indicator variables in step 5 is as follows: for discrete-variable indicators with more than 5 distinct values and for continuous-variable indicators, the supervised mutual entropy-interval nesting method is used to bin the indicator variables, and continuous variables are converted to character labels according to the binning results, reducing the risk of model overfitting. For example, if the indicator "registered capital (x1)" is divided into 3 bins under chi-square binning, then after conversion its value becomes x1_bin0, x1_bin1 or x1_bin2.
  • Step 0: preset a threshold (threshold) and the maximum number of bins n;
  • Step 1: take the split point a_0, divide [a,b] into two intervals [a,a_0] and (a_0,b], and, combining mutual information and information entropy, propose a new category-uncertainty evaluation function MiEntropy:
  • t is an interval; C={c_1,c_2,…,c_m} is the set of categories, and m is the number of categories;
  • p(c_i), p(t) and p(t,c_i) are, respectively, the ratios of the number of samples of class c_i, the number of samples whose indicator value lies in interval t, and the number of samples whose indicator value lies in t and that belong to class c_i, to the total number of training samples;
  • p(c_i|t) is the ratio of the number of samples whose indicator value lies in t and that belong to c_i to the number of samples whose indicator value lies in t; η is a hyperparameter satisfying η∈[0,1], with a default value of 0.5.
  • Step 2: if MiEntropy([a,a_0])≥threshold or MiEntropy((a_0,b])≥threshold, add a_0 to Boundary and go to step 3;
  • Step 3: obtain the number of bins numb(I) of indicator I from Boundary, and stop binning once numb(I)≥n; otherwise continue nesting on the sub-interval selected by the MiEntropy comparison;
  • Chi-square correlation screening of indicators: the chi-square test is used to test the correlation between each indicator variable and enterprise risk, and indicators that contribute little to early warning are filtered out. The result of a traditional chi-square correlation analysis depends on how the sample space is partitioned, and different partitions may lead to different inferences; this proposal partitions the sample space on the basis of supervised binning, which gives higher test power and is robust.
  • Step 6: obtain the association rule set. Based on the above steps, the complete indicator set of the enterprise training samples and the target variable "Y" are obtained; the classic association rule mining algorithm FP-Tree is used to mine, from the training data, association rules between enterprise behavior and each enterprise risk level; the association rules are traversed and consolidated into an association rule set consisting of indicator sets, risk levels and confidences.
  • The association rule set consists of elements of the form "(indicator set):(risk level, confidence)" whose confidence is greater than 0.5.
  • The association rule set mined by the FP-Tree algorithm has the form: {(x1_bin0,x3_bin1,x7_bin3,x15_bin4):(P0,0.98),...}.
  • The association rules described in step 6 reflect the interdependence and correlation between one thing and others: if things are associated, one of them can be predicted from the others. Extending this idea, the association analysis algorithm is applied to enterprise risk early warning, and the classic association rule mining algorithm FP-Tree is used to mine association rules between enterprise risk levels and enterprise behavior.
  • Step 7: according to the association rules obtained and the indicator-system risk rules compiled in step 2, issue early warnings for the enterprises to be warned: identify the association rules each enterprise hits and predict its risk level and possible risk points.
  • For any enterprise to be warned, the early warning result can be obtained through the following steps:
  • First, following step 5, convert the indicator data of the enterprise to be warned to character labels.
  • The conversion of the indicator data is determined by the binning rules of step 5:
  • the original indicator data is converted into the corresponding character labels, giving the enterprise's indicator set.
  • The risk level is determined by the risk levels and confidences of the association rules that are hit:
  • the risk level of each association rule is converted into a corresponding score, and a confidence-weighted average gives the final risk score.
  • The risk level is then obtained from the score interval of each risk level.
  • Example of an early warning result: an enterprise's risk level is P0 (high risk), and the clues are described as: annual report not published; frequent changes to enterprise registration, with a risk of unstable operation; possible risk of operational and financial fraud; business license expired or invalid; too high a proportion of affiliated enterprises with abnormal operations, with a risk of being listed for abnormal operation; too high a proportion of dishonest affiliated enterprises, with a risk of dishonesty.

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

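The present invention relates to an enterprise risk early warning method based on the association analysis FP-Tree algorithm. An enterprise indicator data set is constructed; the mutual entropy-interval nesting method is then used for binning and the chi-square test for correlation-based indicator screening; finally, the association analysis FP-Tree algorithm is used for enterprise risk early warning. The invention can not only analyze enterprise risk from single-indicator data but also mine enterprise risk from two or more indicators combined, uncovering the risks an enterprise faces more comprehensively.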

Description

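Enterprise risk early warning method based on the association analysis FP-Tree algorithm
Technical Field
The present invention belongs to the field of enterprise risk early warning, and in particular relates to an enterprise risk early warning method based on the association analysis FP-Tree algorithm.
Background
As comprehensive social activities that combine economic, technical, managerial, organizational and other aspects, enterprise activities are subject to uncertainty in every respect. Enterprise risk early warning is an effective means of establishing a risk assessment system, carrying out risk pre-control, defusing risks before they occur, and minimizing the losses they cause. Analyzing and managing the risks of enterprise activities, preventing and defusing risks, and keeping the resulting losses to a minimum have become important measures for safeguarding enterprise operations and maximizing benefits. The enterprise risk early warning indicator system is a yardstick and an important basis for measuring the financial risk status of enterprises. Building a risk early warning indicator system that fits the characteristics of an enterprise should follow these basic principles: (1) comprehensiveness; (2) scientific soundness; (3) purposefulness; (4) typicality; (5) operability; (6) impartiality.
In the prior art, enterprise risk is divided into internal risk and external risk, covering four comprehensive risk indicators: financial, technical, operational and strategic.
(1) Financial risk factors: liquidity, financing, investment, liquidation, profitability, asset utilization, growth, etc.
(2) Technical risk factors: trademarks, patents, software copyrights, works, key technologies, etc.
(3) Operational risk factors: judicial matters, abnormal operation, administrative penalties, etc.
(4) Strategic risk factors: competing products, enterprise affiliations, development history, etc.
At present, enterprise risk early warning mostly adopts the following methods: for external environmental risk, the six forces analysis model is used to analyze the competitive environment in which the enterprise operates; for internal environmental risk, an indicator system based mainly on financial, technical, operational and strategic risk factors is established, taking into account domestic and foreign research literature and the availability of data. Commonly used rating methods include discriminant analysis, comprehensive evaluation and fuzzy analysis. Finally, early warning intervals are set according to the evaluation results and corresponding countermeasures are taken.
Prior-art early warning analyzes either single-indicator data or the overall indicator data. Because enterprises lack basic professional knowledge, enterprise data is high-dimensional and voluminous, and information acquisition, updating, processing and analysis all take a long time and cannot be done dynamically, the timeliness of risk early warning is seriously affected, so enterprise risk early warning suffers from a serious time lag.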
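Summary of the Invention
The purpose of the present invention is to provide an enterprise risk early warning method based on the association analysis FP-Tree algorithm, which can not only analyze enterprise risk from single-indicator data but also mine enterprise risk from two or more indicators combined, uncovering the risks an enterprise faces more comprehensively.
To achieve the above purpose, the technical solution of the present invention is an enterprise risk early warning method based on the association analysis FP-Tree algorithm, comprising the following steps:
Step S1: based on historical enterprise behavior data, analyze the yardsticks and key bases for measuring enterprise risk status, and design the risk indicator system X={x_1,x_2,…,x_i}, where x_i denotes the name of the i-th indicator of the risk indicator system;
Step S2: according to the risk indicator system, use big data analysis to form risk rules; that is, if the values of one or more indicators equal a predetermined value or fall within a predetermined interval, the enterprise is considered to possibly have the corresponding risk. This yields the risk rule set B:
Figure PCTCN2021071403-appb-000001
where X_k is a subset of the indicator system X, and risk_k is the corresponding textual risk description derived from X_k by analysis and inference;
Step S3: collect enterprise behavior data and construct the training indicator data set of the enterprise risk early warning model and the indicator data set of the enterprises to be warned; the training indicator data set is split into training set : test set = 4:1;
Step S4: based on the training indicator data set, the risk level of each enterprise is obtained by calculation from the enterprise's credit-dimension data, using the following formula:
Figure PCTCN2021071403-appb-000002
Figure PCTCN2021071403-appb-000003
where creditScore_new represents the normalized value of the latest credit risk score, and 100·creditScore_new is used as the base score of the risk score; creditScore_i represents the credit risk score from i years earlier,
Figure PCTCN2021071403-appb-000004
represents the stability of the credit score; riskListCount represents the number of times the enterprise was placed on a blacklist or dishonesty list in the past 5 years, and 4·riskListCount represents the risk of being blacklisted or listed as dishonest;
Step S5: use the mutual entropy-interval nesting method for binning and the chi-square test for correlation-based indicator screening, convert the indicators to character labels according to the binning results, and save the binning rules and the list of indicators remaining after screening;
Step S6: obtain the association rule set: use the association analysis FP-Tree algorithm to mine association rules between enterprise behavior and each enterprise risk level, then traverse the association rules and consolidate them into an association rule set composed of indicator sets, risk levels and confidences; the association rule set consists of elements of the form "(indicator set):(risk level, confidence)" whose confidence is greater than 0.5;
Figure PCTCN2021071403-appb-000005
where A represents an indicator set; B represents a risk level;
Figure PCTCN2021071403-appb-000006
denotes the confidence with which risk level B is inferred from indicator set A; count(A∩B) and count(A) denote, respectively, the number of samples in which the elements of indicator set A and risk level B occur together, and the number of samples in which the elements of indicator set A occur together;
Step S7: according to the association rule set obtained in step S6 and the risk rule set obtained in step S2, issue early warnings for the enterprises to be warned based on their indicator data set: identify the association rules each enterprise hits, predict its risk level and possible risk points, and output the early warning result.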
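In one embodiment of the present invention, in step S5, binning with the mutual entropy-interval nesting method and correlation-based indicator screening with the chi-square test are implemented as follows:
For discrete-variable indicators with more than 5 distinct values and for continuous-variable indicators, the supervised mutual entropy-interval nesting method is used to bin the indicator variables, and continuous variables are converted to character labels according to the binning results, reducing the risk of model overfitting;
The mutual entropy-interval nesting binning steps are as follows:
Step 0: preset a threshold (threshold) and the maximum number of bins n;
For an indicator I to be binned, let
Figure PCTCN2021071403-appb-000007
The initial set of bin boundaries is Boundary={a,b}; indicator I is binned as follows:
Step 1: take
Figure PCTCN2021071403-appb-000008
and divide [a,b] into two intervals [a,a_0] and (a_0,b]. Combining mutual information and information entropy, a new category-uncertainty evaluation function MiEntropy is proposed:
Figure PCTCN2021071403-appb-000009
where t is an interval; C is the set of categories, C={c_1,c_2,…,c_m}, and m is the number of categories; p(c_i), p(t) and p(t,c_i) are, respectively, the ratios of the number of samples of class c_i, the number of samples whose indicator value lies in interval t, and the number of samples whose indicator value lies in t and that belong to class c_i, to the total number of training samples; p(c_i|t) is the ratio of the number of samples whose indicator value lies in t and that belong to c_i to the number of samples whose indicator value lies in t; η is a hyperparameter satisfying η∈[0,1];
Apply MiEntropy to evaluate [a,a_0] and (a_0,b], then go to step 2;
Step 2: if MiEntropy([a,a_0])≥threshold or MiEntropy((a_0,b])≥threshold, add a_0 to Boundary and go to step 3;
Step 3: obtain the number of bins numb(I) of indicator I from Boundary:
If numb(I)≥n, stop binning;
If MiEntropy([a,a_0])≥threshold, take a=a, b=a_0 and jump to step 1;
If MiEntropy((a_0,b])≥threshold, take a=a_0, b=b and jump to step 1;
If MiEntropy([a,a_0])≤MiEntropy((a_0,b])<threshold, take a=a_0, b=b and jump to step 1;
If MiEntropy((a_0,b])≤MiEntropy([a,a_0])<threshold, take a=a, b=a_0 and jump to step 1;
Step 4: when binning ends, a set of bin boundaries has been obtained; sorting it in ascending order gives Boundary={a,a_1,a_2,…,a_k,b}, and indicator I is divided into k+1 bins according to Boundary: {[a,a_1],(a_1,a_2],…,(a_k,b]};
Chi-square correlation screening of indicators: the chi-square test is used to test the correlation between each indicator variable and enterprise risk, and indicators that contribute little to early warning are filtered out; the sample space used for the chi-square correlation analysis is partitioned on the basis of the supervised binning.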
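In one embodiment of the present invention, step S7 is implemented as follows:
First, convert the indicator data of each enterprise in the to-be-warned indicator data set to character labels: the conversion is determined by the binning rules of step S5; the original indicator data is converted into the corresponding character labels, giving the enterprise's transformed indicator set
Figure PCTCN2021071403-appb-000010
where C_i is the set of results after converting each indicator value of the i-th sample enterprise;
Figure PCTCN2021071403-appb-000011
denotes the converted result value of the c_i-th indicator of the i-th sample enterprise;
Next, obtain the hit association rules: traverse the association rules; if the indicator set of an association rule
Figure PCTCN2021071403-appb-000012
satisfies C_i∩R_j=R_j, the enterprise hits the association rule corresponding to R_j. This gives the enterprise's set of hit risk rule indicator sets:
Figure PCTCN2021071403-appb-000013
where
Figure PCTCN2021071403-appb-000014
denotes the indicator set of the q_i-th risk rule hit by the i-th early-warning enterprise;
Figure PCTCN2021071403-appb-000015
denotes the risk level of the q_i-th risk rule hit by the i-th early-warning enterprise;
Figure PCTCN2021071403-appb-000016
denotes the confidence of the q_i-th risk rule hit by the i-th early-warning enterprise;
Then, obtain the risk level: it is determined by the risk levels and confidences of the association rules that are hit. The risk level of each association rule is converted into a corresponding score, a confidence-weighted average gives the final risk score, and the risk level is read off from the score interval of each risk level;
Figure PCTCN2021071403-appb-000017
Figure PCTCN2021071403-appb-000018
where high risk is denoted P0; medium-high risk has two levels, P1 and P2, with P1 riskier than P2; low risk is denoted P3 and no risk P4; riskScore_i denotes the risk score of the i-th early-warning enterprise; SP_ij denotes the risk-level score of the j-th risk rule hit by the i-th early-warning enterprise; P_ij denotes the risk level of the j-th risk rule hit by the i-th early-warning enterprise; Conf_ij denotes the confidence of the j-th risk rule hit by the i-th early-warning enterprise; r_i denotes the sum of the confidences of the risk rules hit by the i-th early-warning enterprise; riskLevel is the function that maps a risk score to a risk level;
Finally, obtain the risk description: traverse the risk rule set obtained in step S2
Figure PCTCN2021071403-appb-000019
and the enterprise's set of hit risk rule indicator sets
Figure PCTCN2021071403-appb-000020
If X_k∩R_ir=X_k, the enterprise very likely has the risk point risk_k corresponding to X_k; after the traversal, the enterprise's risk point set
Figure PCTCN2021071403-appb-000021
is obtained, and its elements are joined with semicolons to give the risk description.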
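The three operations of step S7 (rule hit test, confidence-weighted score, semicolon-joined description) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the rule data layout and the `level_scores` mapping from risk level to score are all hypothetical, because the score formulas appear only as figures that are not reproduced here. The hit test uses the stated condition C_i∩R_j=R_j, which is equivalent to R_j being a subset of C_i, and the score is taken to be the sum of SP_ij·Conf_ij divided by r_i, following the textual description of a confidence-weighted average.

```python
def hit_rules(company_labels, association_rules):
    """Return (indicator set, level, confidence) for every rule whose
    indicator set R_j is contained in the company's label set C_i."""
    labels = set(company_labels)
    return [
        (set(r), level, conf)
        for r, (level, conf) in association_rules.items()
        if set(r) <= labels  # same condition as C_i ∩ R_j = R_j
    ]


def weighted_risk_score(hits, level_scores):
    """Confidence-weighted average of the level scores of the hit rules."""
    r_i = sum(conf for _, _, conf in hits)  # sum of confidences
    if r_i == 0:
        return None  # no rule was hit, so no score can be formed
    return sum(level_scores[level] * conf for _, level, conf in hits) / r_i


def risk_description(hits, risk_rules):
    """Join the texts of all risk rules X_k contained in some hit rule set."""
    points = []
    for x_k, text in risk_rules:
        if any(set(x_k) <= r for r, _, _ in hits) and text not in points:
            points.append(text)
    return ";".join(points)


# Hypothetical data: level scores, mined rules and risk rules are made up.
level_scores = {"P0": 90, "P1": 70, "P2": 50, "P3": 30, "P4": 10}
rules = {
    ("x1_bin0", "x3_bin1"): ("P0", 0.9),
    ("x7_bin3",): ("P2", 0.6),
    ("x9_bin1",): ("P4", 0.8),
}
risk_rules = [({"x1_bin0"}, "frequent registration changes"),
              ({"x9_bin1"}, "expired business license")]
company = {"x1_bin0", "x3_bin1", "x7_bin3"}

hits = hit_rules(company, rules)
score = weighted_risk_score(hits, level_scores)
description = risk_description(hits, risk_rules)
```

In this sketch the third rule is not hit because `x9_bin1` is absent from the company's labels, so it contributes neither to the score nor to the description.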
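Compared with the prior art, the present invention has the following beneficial effects:
(1) Highly innovative. The invention is a specific application of the association analysis FP-Tree algorithm to enterprise risk early warning analysis, filling a gap in the use of association analysis algorithms in this field. Before FP-Tree mines association rules, the data is preprocessed: the chi-square test principle is used for indicator screening and binning, and weakly correlated indicators are removed to improve early warning accuracy, so that the risks in enterprise behavior can be mined more comprehensively.
(2) Timeliness. Each time an enterprise early warning is run, a script obtains real-time data from the raw data tables to generate the indicators, and indicator screening, binning and the association rules are all updated dynamically. The invention can therefore adjust automatically, in real time, to external changes, which greatly reduces the time lag of enterprise risk early warning in data processing and analysis.
(3) Low barrier to use. The enterprise risk early warning analysis method based on the association analysis algorithm FP-Tree is a black box to the end user: the end user does not need to care about how the model is built, and only needs to save and update the required basic enterprise information and behavior information in the enterprise information database. The invention displays the resulting early warning clues in the domain model risk clue list through the interface of the risk early warning system.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of the method of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the drawing.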
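The present invention provides an enterprise risk early warning method based on the association analysis FP-Tree algorithm, comprising the following steps:
Step S1: based on historical enterprise behavior data, analyze the yardsticks and key bases for measuring enterprise risk status, and design the risk indicator system X={x_1,x_2,…,x_i}, where x_i denotes the name of the i-th indicator of the risk indicator system;
Step S2: according to the risk indicator system, use big data analysis to form risk rules; that is, if the values of one or more indicators equal a predetermined value or fall within a predetermined interval, the enterprise is considered to possibly have the corresponding risk. This yields the risk rule set B:
Figure PCTCN2021071403-appb-000022
where X_k is a subset of the indicator system X, and risk_k is the corresponding textual risk description derived from X_k by analysis and inference;
Step S3: collect enterprise behavior data and construct the training indicator data set of the enterprise risk early warning model and the indicator data set of the enterprises to be warned; the training indicator data set is split into training set : test set = 4:1;
Step S4: based on the training indicator data set, the risk level of each enterprise is obtained by calculation from the enterprise's credit-dimension data, using the following formula:
Figure PCTCN2021071403-appb-000023
Figure PCTCN2021071403-appb-000024
where creditScore_new represents the normalized value of the latest credit risk score, and 100·creditScore_new is used as the base score of the risk score; creditScore_i represents the credit risk score from i years earlier,
Figure PCTCN2021071403-appb-000025
represents the stability of the credit score; riskListCount represents the number of times the enterprise was placed on a blacklist or dishonesty list in the past 5 years, and 4·riskListCount represents the risk of being blacklisted or listed as dishonest;
Step S5: use the mutual entropy-interval nesting method for binning and the chi-square test for correlation-based indicator screening, convert the indicators to character labels according to the binning results, and save the binning rules and the list of indicators remaining after screening;
Step S6: obtain the association rule set: use the association analysis FP-Tree algorithm to mine association rules between enterprise behavior and each enterprise risk level, then traverse the association rules and consolidate them into an association rule set composed of indicator sets, risk levels and confidences; the association rule set consists of elements of the form "(indicator set):(risk level, confidence)" whose confidence is greater than 0.5;
Figure PCTCN2021071403-appb-000026
where A represents an indicator set; B represents a risk level;
Figure PCTCN2021071403-appb-000027
denotes the confidence with which risk level B is inferred from indicator set A; count(A∩B) and count(A) denote, respectively, the number of samples in which the elements of indicator set A and risk level B occur together, and the number of samples in which the elements of indicator set A occur together;
Step S7: according to the association rule set obtained in step S6 and the risk rule set obtained in step S2, issue early warnings for the enterprises to be warned based on their indicator data set: identify the association rules each enterprise hits, predict its risk level and possible risk points, and output the early warning result.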
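The specific implementation of the present invention is as follows.
The present invention is implemented through the following steps:
Step 1: through preliminary investigation and study of enterprise behavior data, analyze the yardsticks and key bases for measuring enterprise risk status, and design the risk indicator system X={x_1,x_2,…,x_i}, where x_i denotes the name of the i-th indicator of the indicator system. For example, by studying enterprise behavior data across administrative inspections, administrative penalties, administrative enforcement, contract performance history, product quality inspections, complaints and reports, and credit rating evaluations, together with the enterprise's own attributes, a risk indicator system consisting of 7 first-level indicators, 30 second-level indicators and 81 third-level indicators is designed;
Table 1. Enterprise risk indicator system
Figure PCTCN2021071403-appb-000028
Table 1 (continued 1)
Figure PCTCN2021071403-appb-000029
Table 1 (continued 2)
Figure PCTCN2021071403-appb-000030
Table 1 (continued 3)
Figure PCTCN2021071403-appb-000031
Step 2: according to the indicator system, use existing big data analysis to form risk rules; that is, when the values of one or more indicators equal a specific value or fall within a specific range, infer that the enterprise may have a certain risk. This yields the risk rule set B:
Figure PCTCN2021071403-appb-000032
where
Figure PCTCN2021071403-appb-000033
denotes the indicator set of the q_i-th risk rule hit by the i-th early-warning enterprise;
Figure PCTCN2021071403-appb-000034
denotes the risk level of the q_i-th risk rule hit by the i-th early-warning enterprise;
Figure PCTCN2021071403-appb-000035
denotes the confidence of the q_i-th risk rule hit by the i-th early-warning enterprise;
Step S3: collect enterprise behavior data and construct the training indicator data set of the enterprise risk early warning model and the indicator data set of the enterprises to be warned; the training indicator data set is split into training set : test set = 4:1;
Taking the risk indicator system of Table 1 as an example: from the third-level indicators abnormal revenue, abnormal assets, abnormal profit, abnormal staffing, abnormal tax payment and abnormal logical relationships under the first-level indicator "annual report publication", the risk "the enterprise is suspected of tax evasion and falsification" can be inferred; or, from changes of legal representative, enterprise name, registered address and other registered items each occurring more than 10 times in the past three years, together with a large increase or decrease in registered capital, the risk "the enterprise's basic information, equity, etc. change too frequently, and its operation is suspected to be unstable" can be inferred; and so on.
Step 3: establish the model training data standard, covering enterprise-related behavior data (basic enterprise information, administrative inspection information, administrative penalty information, administrative enforcement information, contract performance history, complaint and report information, enterprise credit scores, the enterprise product information table, etc.). Training data is gathered through the data governance system, and Python scripts are written to generate in real time the training indicator data set of the proposed early warning model (training set : test set = 4:1) and the indicator data set of the enterprises to be warned;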
步骤4,获取训练样本的目标变量,根据企业信用维度数据设计公式计算获得其对应的风险等级。结合企业近5年信用分数、近5年来列入黑名单或失信名单次数等数据对训练数据集样本进行风险等级评估,依据下述公式及各风险等级分值区间对训练样本企业进行风险等级评估,以此作为训练数据集的目标变量“Y”,后续将获得的目标变量“Y”与训练指标数据集输入关联分析算法中进行关联规则挖掘;
Figure PCTCN2021071403-appb-000036
Figure PCTCN2021071403-appb-000037
其中,creditScore new表示最新信用风险分值归一化后数值,100·creditScore new作为风险得分的基础分值;creditScore i表示前i年信用风险分值,
Figure PCTCN2021071403-appb-000038
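represents the stability of the credit score; riskListCount is the number of times the enterprise was placed on a blacklist or dishonesty list in the past 5 years, and 4·riskListCount represents the risk of being placed on a blacklist or dishonesty list; Table 2 maps risk scores to risk levels.
Table 2. Risk score to risk level mapping
riskScore     (-∞,20)        [20,40)         [40,60)            [60,80)                 [80,+∞)
Risk level    No risk (P4)   Low risk (P3)   Medium risk (P2)   Medium-high risk (P1)   High risk (P0)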
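The mapping in Table 2 can be sketched as a lookup over the interval boundaries. The names `CUTS`, `LEVELS`, and `risk_level` are illustrative and do not come from the source; the boundaries and level codes are taken from Table 2.

```python
from bisect import bisect_right

# Interval boundaries and level codes from Table 2.
# Intervals are left-closed, right-open: [20,40), [40,60), ...
CUTS = [20, 40, 60, 80]
LEVELS = ["P4", "P3", "P2", "P1", "P0"]


def risk_level(risk_score):
    """Map a risk score to its Table 2 risk level."""
    # bisect_right places a score equal to a boundary into the higher interval,
    # which matches the left-closed intervals of Table 2.
    return LEVELS[bisect_right(CUTS, risk_score)]
```

A score of exactly 20 therefore falls into [20,40) and maps to P3, while 19.9 maps to P4.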
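Step 5: bin the indicators using the mutual-entropy interval-nesting method and screen them by chi-square testing (filtering out indicators that contribute little to the early-warning model); convert the indicators to character labels according to the binning results; and save the binning rules and the list of indicators remaining after screening.
Further, the chi-square-binning characterization of indicator variables described in step 5 is as follows: for discrete indicators with more than 5 distinct values and for continuous indicators, the supervised mutual-entropy interval-nesting method is used to bin the indicator variable, and continuous variables are converted to character labels according to the binning result, which reduces the risk of model overfitting. For example, if the indicator "enterprise registered capital (x1)" is split into 3 bins under chi-square binning, its values become x1_bin0, x1_bin1, or x1_bin2 after characterization.
The binning steps of the mutual-entropy interval-nesting method are as follows:
Step 0: preset a threshold, threshold, and a maximum number of bins, n;
For the indicator I to be binned, I ∈ [a,b] = [min{I}, max{I}], and the initial set of bin boundaries is Boundary = {a,b}. Bin indicator I as follows:
Step 1: take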
Figure PCTCN2021071403-appb-000039
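to split [a,b] into two intervals [a,a_0] and (a_0,b]. Combining mutual information and information entropy, a new class-uncertainty evaluation function MiEntropy is proposed: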
Figure PCTCN2021071403-appb-000040
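where t is an interval; C is the set of classes, C = {c_1, c_2, …, c_m}, and m is the number of classes; p(c_i), p(t), and p(t,c_i) are, respectively, the ratios to the total number of training samples of the number of samples in class c_i, the number of samples whose indicator value lies in interval t, and the number of samples whose indicator value lies in interval t and which belong to class c_i; p(c_i|t) is the ratio of the number of samples whose indicator value lies in interval t and which belong to class c_i to the number of samples whose indicator value lies in interval t; η is a hyperparameter satisfying η ∈ [0,1], with a default value of 0.5.
Evaluate [a,a_0] and (a_0,b] with MiEntropy, then go to step 2;
Step 2: if MiEntropy([a,a_0]) ≥ threshold or MiEntropy((a_0,b]) ≥ threshold, add a_0 to Boundary and go to step 3;
Step 3: obtain the number of bins numb(I) of indicator I from Boundary:
if numb(I) ≥ n, stop binning;
if MiEntropy([a,a_0]) ≥ threshold, set a = a, b = a_0 and jump to step 1;
if MiEntropy((a_0,b]) ≥ threshold, set a = a_0, b = b and jump to step 1;
if MiEntropy([a,a_0]) ≤ MiEntropy((a_0,b]) < threshold, set a = a_0, b = b and jump to step 1;
if MiEntropy((a_0,b]) ≤ MiEntropy([a,a_0]) < threshold, set a = a, b = a_0 and jump to step 1.
Step 4: when binning ends, a set of bin boundaries is obtained; sorting it in ascending order gives Boundary = {a, a_1, a_2, …, a_k, b}, and indicator I is divided into k+1 bins according to Boundary: {[a,a_1], (a_1,a_2], …, (a_k,b]}.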
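Once Boundary has been obtained, turning a raw value into a label such as x1_bin0 is a lookup over the sorted boundaries. The sketch below covers only this final labelling step, not the MiEntropy search itself. The function name `bin_label`, the example boundaries, and the choice to clamp out-of-range values to the first or last bin are assumptions.

```python
from bisect import bisect_left


def bin_label(name, value, boundary):
    """Return a label like 'x1_bin0' for a raw indicator value.

    boundary is the sorted list [a, a_1, ..., a_k, b]; the bins are
    [a, a_1], (a_1, a_2], ..., (a_k, b], i.e. closed on the right.
    Values outside [a, b] are clamped to the first or last bin (an assumption).
    """
    inner = boundary[1:-1]  # interior cut points a_1 .. a_k
    # bisect_left keeps a value equal to a_j in the bin that ends at a_j.
    index = bisect_left(inner, value)
    return f"{name}_bin{index}"


# Hypothetical boundaries for "enterprise registered capital (x1)", giving 3 bins.
x1_boundary = [0, 100, 500, 1000]
```

For example, `bin_label("x1", 100, x1_boundary)` yields `x1_bin0` because 100 is the closed right end of the first bin, while 101 yields `x1_bin1`.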
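Chi-square correlation screening of indicators works as follows: the chi-square test is used to test the correlation between each indicator variable and enterprise risk, and indicators that contribute little to early warning are filtered out. The result of a conventional chi-square correlation analysis depends on how the sample space is partitioned, and different partitions may lead to different inferences; this proposal partitions the sample space using supervised binning, which gives the test high power and makes it robust.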
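As a rough illustration of the screening step, the Pearson chi-square statistic between a binned indicator and the risk level can be computed from their contingency table. The function name `chi_square_statistic` and the toy data are assumptions; the source does not specify a significance level or cutoff, so the comparison against a critical value is left out.

```python
from collections import Counter


def chi_square_statistic(bins, levels):
    """Pearson chi-square statistic for two equal-length label sequences."""
    n = len(bins)
    observed = Counter(zip(bins, levels))  # joint counts per (bin, level) cell
    bin_totals = Counter(bins)             # row totals
    level_totals = Counter(levels)         # column totals
    stat = 0.0
    for b, b_total in bin_totals.items():
        for lv, lv_total in level_totals.items():
            expected = b_total * lv_total / n
            stat += (observed[(b, lv)] - expected) ** 2 / expected
    return stat


# Toy data: the first indicator is unrelated to the level, the second determines it.
levels = ["P0", "P4", "P0", "P4"]
unrelated = ["x1_bin0", "x1_bin0", "x1_bin1", "x1_bin1"]
related = ["x2_bin0", "x2_bin1", "x2_bin0", "x2_bin1"]
```

A statistic near zero suggests the indicator carries little information about the risk level; the degrees of freedom for the test are (number of bins - 1) × (number of levels - 1).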
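Step 6: obtain the association rule set. The preceding steps yield the complete indicator set of the enterprise training samples and the target variable "Y". The classical association rule mining algorithm FP-Tree is applied to the training data to mine association rules between enterprise behavior and each risk level; the rules are then traversed and consolidated into an association rule set composed of indicator sets, risk levels, and confidences, whose elements have the form "(indicator set):(risk level, confidence)" and a confidence greater than 0.5. An association rule set mined with the FP-Tree algorithm looks like: {(x1_bin0,x3_bin1,x7_bin3,x15_bin4):(P0,0.98),……}.
Further, the association rules described in step 6 reflect the interdependence and association between one thing and other things: if things are associated, one of them can be predicted from the others. Extending this idea, the association analysis algorithm is applied to enterprise risk early warning, and the classical association rule mining algorithm FP-Tree is used to mine association rules between each enterprise risk level and enterprise behavior.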
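The shape of the resulting rule set and the confidence filter can be illustrated with a small sketch. It is not an FP-Tree implementation: it counts indicator combinations by brute-force enumeration, which is only practical for tiny examples, and it applies no minimum-support threshold. The names `mine_rules` and `max_len` and the sample data are assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_rules(samples, max_len=2, min_conf=0.5):
    """Build {(indicator labels...): (risk_level, confidence)} by brute force.

    samples is a list of (set_of_labels, risk_level) pairs.
    confidence = count(A and B together) / count(A); rules are kept only
    when confidence > min_conf, as the text requires for 0.5.
    """
    count_a = Counter()
    count_ab = Counter()
    for labels, level in samples:
        items = sorted(labels)
        for size in range(1, max_len + 1):
            for combo in combinations(items, size):
                count_a[combo] += 1
                count_ab[(combo, level)] += 1
    rules = {}
    for (combo, level), n_ab in count_ab.items():
        conf = n_ab / count_a[combo]
        if conf > min_conf:  # above 0.5, at most one level can qualify per combo
            rules[combo] = (level, conf)
    return rules


# Hypothetical characterized training samples.
samples = [
    ({"x1_bin0", "x3_bin1"}, "P0"),
    ({"x1_bin0", "x3_bin1"}, "P0"),
    ({"x1_bin0", "x3_bin2"}, "P4"),
    ({"x1_bin1", "x3_bin2"}, "P4"),
]
rules = mine_rules(samples)
```

In this toy data, x1_bin0 appears in three samples, two of them P0, so the rule (x1_bin0):(P0, 2/3) passes the 0.5 filter. An FP-growth implementation would reach the same confidences without enumerating every combination.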
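Step 7: based on the association rules obtained and the indicator-system risk rules compiled in step 2, issue early warnings for the enterprises to be warned: identify the association rules each enterprise hits and predict its risk level and possible risk points. For any enterprise to be warned, the early-warning result is obtained as follows:
First, convert the enterprise's indicator data to character labels. The conversion is determined by the binning rules of step 5: the raw indicator data are converted to the corresponding character labels, giving the enterprise's indicator set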
Figure PCTCN2021071403-appb-000041
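Next, obtain the hit association rules. Traverse the association rules; if the indicator set of an association rule
Figure PCTCN2021071403-appb-000042
satisfies C_i ∩ R_j = R_j, the enterprise hits the association rule corresponding to R_j. This gives the set of risk rules hit by the enterprise: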
Figure PCTCN2021071403-appb-000043
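The hit condition C_i ∩ R_j = R_j is equivalent to R_j being a subset of C_i, so matching reduces to a subset test. In the sketch below, the function name `hit_rules` and all rule values except the format of the first rule are hypothetical.

```python
def hit_rules(enterprise_labels, rule_set):
    """Return the rules whose indicator set R_j is contained in C_i."""
    c_i = set(enterprise_labels)
    hits = {}
    for r_j, (level, confidence) in rule_set.items():
        if set(r_j) <= c_i:  # same condition as C_i ∩ R_j = R_j
            hits[r_j] = (level, confidence)
    return hits


# Hypothetical association rule set in the "(indicator set):(level, confidence)" form.
rule_set = {
    ("x1_bin0", "x3_bin1"): ("P0", 0.98),
    ("x7_bin3",): ("P2", 0.61),
    ("x15_bin4", "x2_bin0"): ("P3", 0.75),
}
c_i = {"x1_bin0", "x3_bin1", "x7_bin3", "x15_bin4"}
q_i = hit_rules(c_i, rule_set)
```

Here the third rule is not hit because x2_bin0 is missing from the enterprise's indicator set.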
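Then, obtain the risk level. The risk level is determined by the risk levels and confidences of the hit association rules: each rule's risk level is converted to its corresponding score, the confidences are used as weights in a weighted average to compute the final risk score, and the risk level is obtained from the score interval of each risk level.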
Figure PCTCN2021071403-appb-000044
Figure PCTCN2021071403-appb-000045
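The confidence-weighted average described above can be sketched as follows. The level-to-score table `LEVEL_SCORE` is hypothetical: the source gives the conversion only as an image, so the values here are invented for illustration and should not be read as the patent's actual scores. The function name `risk_score` and the choice to return `None` when no rule is hit are also assumptions.

```python
# Hypothetical level-to-score conversion; the patent's actual values are not
# recoverable from the text.
LEVEL_SCORE = {"P0": 90, "P1": 70, "P2": 50, "P3": 30, "P4": 10}


def risk_score(hits):
    """Confidence-weighted average of the risk-level scores of the hit rules.

    hits is an iterable of (risk_level, confidence) pairs.
    Returns None when no rule is hit (an assumption; the source does not
    cover this case).
    """
    hits = list(hits)
    r_i = sum(conf for _, conf in hits)  # sum of confidences
    if r_i == 0:
        return None
    return sum(conf * LEVEL_SCORE[level] for level, conf in hits) / r_i


score = risk_score([("P0", 0.9), ("P2", 0.6)])
```

With these assumed scores, the two hits give (0.9·90 + 0.6·50) / 1.5 = 74, which the score intervals would then map to a risk level.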
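Finally, obtain the risk description. Traverse the risk rule set obtained in step 2
Figure PCTCN2021071403-appb-000046
and the indicator sets of the risk rules hit by the enterprise
Figure PCTCN2021071403-appb-000047
If X_k ∩ R_ir = X_k, the enterprise very likely has the risk point risk_k corresponding to X_k. After the traversal, the enterprise's risk point set is obtained:
Figure PCTCN2021071403-appb-000048
The elements of the risk point set are joined with semicolons to form the risk description.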
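A minimal sketch of this last step, assuming the risk rules of step 2 and the hit indicator sets are expressed over the same character labels. The function name `risk_description`, the rule contents, and the de-duplication of repeated risk points are illustrative assumptions.

```python
def risk_description(risk_rules, hit_indicator_sets):
    """Join the risk points whose indicator set X_k lies inside some hit set R_ir.

    risk_rules is a list of (X_k, risk_k) pairs; hit_indicator_sets is a list
    of indicator sets from the hit association rules.
    """
    points = []
    for x_k, risk_k in risk_rules:
        matched = any(set(x_k) <= set(r_ir) for r_ir in hit_indicator_sets)
        if matched and risk_k not in points:  # skip duplicates (an assumption)
            points.append(risk_k)
    return ";".join(points)


# Hypothetical risk rules and hit indicator sets.
risk_rules = [
    (("x1_bin0", "x3_bin1"), "frequent registration changes"),
    (("x9_bin2",), "annual report disclosure missing"),
    (("x7_bin3",), "business license expired or invalid"),
]
hit_sets = [("x1_bin0", "x3_bin1", "x7_bin3"), ("x7_bin3",)]
description = risk_description(risk_rules, hit_sets)
```

The second risk rule contributes nothing here because x9_bin2 appears in none of the hit sets.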
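Example of an early-warning result: an enterprise has risk level P0 (high risk), with the clue description: annual report disclosure missing; enterprise registration changes are frequent, indicating a risk of unstable operations; operating finances may involve falsification; business license expired or invalid; the proportion of affiliated enterprises with abnormal operations is too high, so the enterprise risks being listed as operating abnormally; the proportion of dishonest affiliated enterprises is too high, so the enterprise itself carries a dishonesty risk.
The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention whose resulting function and effect do not go beyond the scope of that technical solution falls within the protection scope of the present invention.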

Claims (3)

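  1. An enterprise risk early-warning method based on the association-analysis FP-Tree algorithm, characterized by comprising the following steps:
    Step S1: based on historical enterprise behavior data, analyze the yardsticks and key bases for measuring enterprise risk, and design a risk indicator system X = {x_1, x_2, …, x_i}, where x_i is the name of the i-th indicator of the risk indicator system;
    Step S2: based on the risk indicator system, apply big-data analysis to form risk rules, that is, when the values of one or more indicators equal a predetermined value or fall within a predetermined interval, the enterprise is considered to possibly carry the corresponding risk; this yields the risk rule set B:
    Figure PCTCN2021071403-appb-100001
    where X_k is a subset of the indicator system X, and risk_k is the corresponding textual risk description inferred from X_k;
    Step S3: collect enterprise behavior data and construct the training indicator data set of the enterprise risk early-warning model and the to-be-warned enterprise indicator data set, with training set : test set = 4:1 within the training indicator data set;
    Step S4: based on the training indicator data set, compute each enterprise's risk level from the enterprise credit-dimension data, using the following formulas:
    Figure PCTCN2021071403-appb-100002
    Figure PCTCN2021071403-appb-100003
    where creditScore_new is the normalized value of the latest credit risk score, and 100·creditScore_new serves as the base value of the risk score; creditScore_i is the credit risk score of the i-th preceding year,
    Figure PCTCN2021071403-appb-100004
    represents the stability of the credit score; riskListCount is the number of times the enterprise was placed on a blacklist or dishonesty list in the past 5 years, and 4·riskListCount represents the risk of being placed on a blacklist or dishonesty list;
    Step S5: bin the indicators using the mutual-entropy interval-nesting method and screen them by chi-square correlation testing; convert the indicators to character labels according to the binning results; and save the binning rules and the list of indicators remaining after screening;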
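    Step S6: obtain the association rule set: use the association-analysis FP-Tree algorithm to mine association rules between enterprise behavior and each enterprise risk level, then traverse the association rules and consolidate them into an association rule set composed of indicator sets, risk levels, and confidences; the association rule set consists of elements of the form "(indicator set):(risk level, confidence)" whose confidence is greater than 0.5;
    $$\mathrm{conf}(A \Rightarrow B) = \frac{\mathrm{count}(A \cap B)}{\mathrm{count}(A)}$$
    where A denotes one of the indicator sets; B denotes one of the risk levels; $\mathrm{conf}(A \Rightarrow B)$ denotes the confidence with which risk level B is inferred from indicator set A; count(A∩B) is the number of samples in which all elements of indicator set A and risk level B occur together, and count(A) is the number of samples in which all elements of indicator set A occur together;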
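    Step S7: based on the association rule set obtained in step S6 and the risk rule set obtained in step S2, issue early warnings for the enterprises in the to-be-warned enterprise indicator data set: identify the association rules each enterprise hits, predict its risk level and possible risk points, and output the early-warning result.
  2. The enterprise risk early-warning method based on the association-analysis FP-Tree algorithm according to claim 1, characterized in that, in step S5, binning with the mutual-entropy interval-nesting method and screening indicators by chi-square correlation testing are implemented as follows:
    for discrete indicators with more than 5 distinct values and for continuous indicators, the supervised mutual-entropy interval-nesting method is used to bin the indicator variable, and continuous variables are converted to character labels according to the binning result, which reduces the risk of model overfitting;
    the binning steps of the mutual-entropy interval-nesting method are as follows:
    Step 0: preset a threshold, threshold, and a maximum number of bins, n;
    for the indicator I to be binned,
    Figure PCTCN2021071403-appb-100007
    and the initial set of bin boundaries is Boundary = {a,b}; bin indicator I as follows:
    Step 1: take
    Figure PCTCN2021071403-appb-100008
    to split [a,b] into two intervals [a,a_0] and (a_0,b]; combining mutual information and information entropy, a new class-uncertainty evaluation function MiEntropy is proposed: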
    Figure PCTCN2021071403-appb-100009
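    where t is an interval; C is the set of classes, C = {c_1, c_2, …, c_m}, and m is the number of classes; p(c_i), p(t), and p(t,c_i) are, respectively, the ratios to the total number of training samples of the number of samples in class c_i, the number of samples whose indicator value lies in interval t, and the number of samples whose indicator value lies in interval t and which belong to class c_i; p(c_i|t) is the ratio of the number of samples whose indicator value lies in interval t and which belong to class c_i to the number of samples whose indicator value lies in interval t; η is a hyperparameter satisfying η ∈ [0,1];
    evaluate [a,a_0] and (a_0,b] with MiEntropy, then go to step 2;
    Step 2: if MiEntropy([a,a_0]) ≥ threshold or MiEntropy((a_0,b]) ≥ threshold, add a_0 to Boundary and go to step 3;
    Step 3: obtain the number of bins numb(I) of indicator I from Boundary:
    if numb(I) ≥ n, stop binning;
    if MiEntropy([a,a_0]) ≥ threshold, set a = a, b = a_0 and jump to step 1;
    if MiEntropy((a_0,b]) ≥ threshold, set a = a_0, b = b and jump to step 1;
    if MiEntropy([a,a_0]) ≤ MiEntropy((a_0,b]) < threshold, set a = a_0, b = b and jump to step 1;
    if MiEntropy((a_0,b]) ≤ MiEntropy([a,a_0]) < threshold, set a = a, b = a_0 and jump to step 1;
    Step 4: when binning ends, a set of bin boundaries is obtained; sorting it in ascending order gives Boundary = {a, a_1, a_2, …, a_k, b}, and indicator I is divided into k+1 bins according to Boundary: {[a,a_1], (a_1,a_2], …, (a_k,b]};
    chi-square correlation screening of indicators is specifically: the chi-square test is used to test the correlation between each indicator variable and enterprise risk, and indicators that contribute little to early warning are filtered out; the chi-square correlation analysis partitions the sample space based on supervised binning.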
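  3. The enterprise risk early-warning method based on the association-analysis FP-Tree algorithm according to claim 1, characterized in that step S7 is implemented as follows:
    First, convert the indicator data of each to-be-warned enterprise in the to-be-warned enterprise indicator data set to character labels: the conversion is determined by the binning rules of step S5, and the raw indicator data are converted to the corresponding character labels, giving the enterprise's converted indicator set
    Figure PCTCN2021071403-appb-100010
    where C_i is the set of character-label results of the indicator values of the i-th sample enterprise;
    Figure PCTCN2021071403-appb-100011
    denotes the character-label value of the c_i-th indicator of the i-th sample enterprise;
    Next, obtain the hit association rules: traverse the association rules; if the indicator set of an association rule
    Figure PCTCN2021071403-appb-100012
    satisfies C_i ∩ R_j = R_j, the enterprise hits the association rule corresponding to R_j, which gives the indicator set Q_i of the risk rules hit by the enterprise
    Figure PCTCN2021071403-appb-100013
    where
    Figure PCTCN2021071403-appb-100014
    denotes the indicator set of the q_i-th risk rule hit by the i-th warned enterprise;
    Figure PCTCN2021071403-appb-100015
    denotes the risk level of the q_i-th risk rule hit by the i-th warned enterprise;
    Figure PCTCN2021071403-appb-100016
    denotes the confidence of the q_i-th risk rule hit by the i-th warned enterprise;
    Then, obtain the risk level: the risk level is determined by the risk levels and confidences of the hit association rules; each rule's risk level is converted to its corresponding score, the confidences are used as weights in a weighted average to compute the final risk score, and the risk level is obtained from the score interval of each risk level;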
    Figure PCTCN2021071403-appb-100017
    Figure PCTCN2021071403-appb-100018
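    where high risk is denoted P0; medium-high risk has two levels, P1 and P2, with P1 riskier than P2; low risk is denoted P3; and no risk is denoted P4; riskScore_i is the risk score of the i-th warned enterprise; SP_ij is the risk-level score of the j-th risk rule hit by the i-th warned enterprise; P_ij is the risk level of the j-th risk rule hit by the i-th warned enterprise; Conf_ij is the confidence of the j-th risk rule hit by the i-th warned enterprise; r_i is the sum of the confidences of the risk rules hit by the i-th warned enterprise; riskLevel is the function that maps a risk score to a risk level;
    Finally, obtain the risk description: traverse the risk rule set obtained in step S2
    Figure PCTCN2021071403-appb-100019
    and the indicator set of the risk rules hit by the enterprise
    Figure PCTCN2021071403-appb-100020
    if X_k ∩ R_ir = X_k, the enterprise very likely has the risk point risk_k corresponding to X_k; after the traversal, the enterprise's risk point set is obtained
    Figure PCTCN2021071403-appb-100021
    the elements of the risk point set are joined with semicolons to form the risk description.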
PCT/CN2021/071403 2020-12-09 2021-01-13 Enterprise risk early-warning method based on the association-analysis FP-Tree algorithm WO2022121083A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011461438.1A CN112465393B (zh) 2020-12-09 2020-12-09 Enterprise risk early-warning method based on the association-analysis FP-Tree algorithm
CN202011461438.1 2020-12-09

Publications (1)

Publication Number Publication Date
WO2022121083A1 true WO2022121083A1 (zh) 2022-06-16

Family

ID=74803925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071403 WO2022121083A1 (zh) 2020-12-09 2021-01-13 Enterprise risk early-warning method based on the association-analysis FP-Tree algorithm

Country Status (2)

Country Link
CN (1) CN112465393B (zh)
WO (1) WO2022121083A1 (zh)


Also Published As

Publication number Publication date
CN112465393A (zh) 2021-03-09
CN112465393B (zh) 2022-07-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21901802

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21901802

Country of ref document: EP

Kind code of ref document: A1