CN114461630A

CN114461630A - Intelligent attribution analysis method, device, equipment and storage medium

Info

Publication number: CN114461630A
Application number: CN202210134402.5A
Authority: CN
Inventors: 贺民
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2022-05-10
Anticipated expiration: 2042-02-14
Also published as: CN114461630B

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent attribution analysis method, comprising: acquiring an initial data set and performing data replacement processing on its outliers to obtain a standard data set; dividing the standard data set into an attributed phenomenon data set and an attribution factor data set; importing the attributed phenomenon data set and the attribution factor data set into a preset model library and calculating the prediction success rate of each model in the library; determining the optimal attribution phenomenon prediction model according to the prediction success rates; and selecting a model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model and using that algorithm to calculate the contribution of each attribution factor data item. In addition, the invention relates to blockchain technology: the initial data set and the contribution of each attribution factor can be stored in the nodes of a blockchain. The invention also provides an intelligent attribution analysis device, an electronic device and a storage medium. The invention can improve the accuracy of attribution analysis.

Description

Intelligent attribution analysis method, device, equipment and storage medium

Technical Field

The present invention relates to the technical field of artificial intelligence, and in particular to an intelligent attribution analysis method and device, an electronic device, and a computer-readable storage medium.

Background

Attribution analysis is an analysis method that explains which factors give rise to a given phenomenon or effect. It is widely used in industries such as Internet advertising and insurance to analyze which user behaviors industry data originates from and to improve user stickiness.

The main attribution algorithms at present fall into two categories, rule-based attribution and data-driven attribution, both of which require mathematical relationships to be set manually in advance. Real business scenarios, however, are complex and changeable, and there is no attribution analysis algorithm that applies across the business scenarios of different industries, so the accuracy of existing attribution analysis methods is low.

Summary of the Invention

The present invention provides an intelligent attribution analysis method, device and computer-readable storage medium, whose main purpose is to solve the problem of low accuracy in attribution analysis.

To achieve the above purpose, the intelligent attribution analysis method provided by the present invention includes:

obtaining an initial data set, and performing data replacement processing on the outliers in the initial data set to obtain a standard data set;

using a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

importing the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculating the prediction success rate of each attribution phenomenon prediction model in the library;

determining an optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rates;

selecting a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and using the model interpretation algorithm to calculate the contribution of each attribution factor data item in the attribution factor data set to the attributed phenomenon data.

Optionally, performing data replacement processing on the outliers in the initial data set to obtain a standard data set includes:

calculating the local reachability density ratio between each initial data item in the initial data set and the neighborhood data of that item;

when the local reachability density ratio is less than or equal to a preset density ratio threshold, determining that the initial data item is an outlier;

replacing the outlier using a preset correct data set to obtain the standard data set.

Optionally, using a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set includes:

comparing the variable data in the standard data set with the pre-built variable library, determining the variable data that is consistent with the variable library as attributed phenomenon data, and determining the variable data that is inconsistent with the variable library as attribution factor data;

calculating the degree of correlation between the attributed phenomenon data and the attribution factor data, and determining the attribution factor data whose degree of correlation is greater than a preset threshold as target attribution factor data corresponding to the attributed phenomenon data;

collecting the attributed phenomenon data and the target attribution factor data to obtain the attributed phenomenon data set and the attribution factor data set corresponding to it.

Optionally, calculating the degree of correlation between the attributed phenomenon data and the attribution factor data includes:

$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \, \sigma_y}$$

where r(X,Y) is the degree of correlation, X is the attributed phenomenon data, Y is the attribution factor data, Cov(X,Y) is the covariance between the attributed phenomenon data and the attribution factor data, σ_x is the standard deviation of the attributed phenomenon data, and σ_y is the standard deviation of the attribution factor data.
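The symbols defined above (covariance divided by the product of two standard deviations) match the Pearson correlation coefficient. Assuming that is the intended formula, a minimal sketch of the calculation might look as follows; the function name `pearson` and the sample figures are illustrative and do not come from the patent.

```python
import math

def pearson(x, y):
    """Degree of correlation r(X, Y) = Cov(X, Y) / (sigma_x * sigma_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and population standard deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / (sigma_x * sigma_y)

# Illustrative figures only: renewal rate (X) against a complaint rate (Y).
renewal_rate = [0.90, 0.88, 0.85, 0.80]
complaint_rate = [0.01, 0.02, 0.03, 0.05]
r = pearson(renewal_rate, complaint_rate)
```

With these made-up figures the two series move in opposite directions, so `r` comes out strongly negative.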

Optionally, calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library includes:

dividing the attributed phenomenon data set and the attribution factor data set into training samples and test samples according to a preset ratio;

training each attribution phenomenon prediction model in the preset attribution phenomenon prediction model library on the training samples to obtain a plurality of initial prediction models;

using each initial prediction model to predict on the test samples to obtain the test data of each initial prediction model;

calculating the difference between the test data of each initial prediction model and the attributed phenomenon data of the test samples;

determining the test data whose difference is less than a preset threshold as correctly predicted data, and calculating the proportion of correctly predicted data in the test data of each initial prediction model to obtain the prediction success rate of each attribution phenomenon prediction model.

Optionally, using the model interpretation algorithm to calculate the contribution of each attribution factor data item in the attribution factor data set to the attributed phenomenon data includes:

calculating the standard deviation of each attribution factor data item in the attribution factor data set, and determining the perturbation range of each attribution factor data item according to the standard deviation;

perturbing each attribution factor data item within the perturbation range to obtain new data for each attribution factor;

based on the model interpretation algorithm, training a target linear regression model on the new data of each attribution factor to obtain the weight corresponding to each attribution factor data item;

multiplying each attribution factor data item by its corresponding weight to obtain the contribution of each attribution factor data item.

Optionally, training the target linear regression model on the new data of each attribution factor based on the model interpretation algorithm includes:

calculating the distance between each attribution factor data item and the new data of that attribution factor, and using the distance value as the weight of the new data;

using the optimal attribution phenomenon prediction model to predict the attributed phenomenon from the new data of each attribution factor to obtain attributed phenomenon prediction data, and using the attributed phenomenon prediction data as the label data corresponding to the new data;

based on the preset model interpretation algorithm, training the target linear regression model on the label data and the weighted new data of each attribution factor.

To solve the above problem, the present invention also provides an intelligent attribution analysis device, the device including:

a standard data set obtaining module, configured to obtain an initial data set and perform data replacement processing on the outliers in the initial data set to obtain a standard data set;

a standard data set dividing module, configured to use a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

a model prediction success rate calculation module, configured to import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library and calculate the prediction success rate of each attribution phenomenon prediction model in the library;

an optimal attribution phenomenon prediction model determination module, configured to determine an optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rates;

an attribution factor data contribution calculation module, configured to select a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model and use the model interpretation algorithm to calculate the contribution of each attribution factor data item in the attribution factor data set to the attributed phenomenon data.

To solve the above problem, the present invention also provides an electronic device, the electronic device including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor to enable the at least one processor to perform the intelligent attribution analysis method described above.

To solve the above problem, the present invention also provides a computer-readable storage medium storing at least one computer program, the at least one computer program being executed by a processor in an electronic device to implement the intelligent attribution analysis method described above.

In the embodiments of the present invention, the standard data set is divided into an attributed phenomenon data set and a corresponding attribution factor data set by means of a pre-built variable library; dividing the data according to the variables of each industry helps improve the accuracy of the data. The attributed phenomenon data set and the corresponding attribution factor data set are imported into a preset attribution phenomenon prediction model library, and the prediction success rate of each attribution phenomenon prediction model is calculated. The optimal attribution phenomenon prediction model is then determined from the library according to the prediction success rates, so that the model that best fits the data at hand is selected, which further improves the accuracy of attribution analysis. A corresponding model interpretation algorithm is selected according to the type of the optimal model and used to calculate the contribution of each attribution factor data item in the attribution factor data set to the attributed phenomenon data, yielding a more accurate attribution analysis result. Therefore, the intelligent attribution analysis method, device, electronic device and computer-readable storage medium proposed by the present invention can solve the problem of low accuracy in attribution analysis.

Description of the Drawings

FIG. 1 is a schematic flowchart of an intelligent attribution analysis method provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of dividing the standard data set according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of calculating the model prediction success rate according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of calculating the contribution of attribution factor data according to an embodiment of the present invention;

FIG. 5 is a functional block diagram of an intelligent attribution analysis device provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device implementing the intelligent attribution analysis method according to an embodiment of the present invention.

The realization of the purpose, the functional characteristics and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.

An embodiment of the present application provides an intelligent attribution analysis method. The execution subject of the method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the method can be executed by software or hardware installed on a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.

FIG. 1 is a schematic flowchart of an intelligent attribution analysis method provided by an embodiment of the present invention. In this embodiment, the intelligent attribution analysis method includes:

S1. Obtain an initial data set, and perform data replacement processing on the outliers in the initial data set to obtain a standard data set;

In the embodiment of the present invention, the initial data set is a business data set from any of various industries, for example, data from the Internet advertising industry, data from the insurance industry, or data from an intelligent customer service scenario.

In detail, performing data replacement processing on the outliers in the initial data set to obtain a standard data set includes:

In the embodiment of the present invention, the local reachability density is the reciprocal of the average distance from the neighborhood data of each initial data item to that item. If the local reachability density ratio is greater than the preset density ratio threshold, the neighboring data and the initial data item can be regarded as belonging to the same cluster, and the initial data item is not an outlier.

In detail, in the embodiment of the present invention, the local reachability density ratio between each initial data item in the initial data set and its neighboring data is calculated using the following formula:

where LOF_k(P) is the local reachability density ratio, N_k(P) is the P-th initial data item of the initial data set, ρ_k(P) is the local reachability density of the P-th data item, ρ_k(O) is the average local reachability density of the neighborhood data O, and d_k(P,O) is the distance between the P-th initial data item and the neighborhood data O.

In the embodiment of the present invention, performing data replacement processing on the outliers in the initial data set removes the abnormal data from the initial data set, ensures that the data in the initial data set is reasonable, and improves the accuracy of the subsequent model selection.
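The outlier step in S1 can be sketched for one-dimensional data as follows. The formula itself is not reproduced in this text, so the sketch makes several assumptions that should be read as such: the ratio is taken as the density of a point divided by the mean density of its k nearest neighbors (so that a low ratio marks an outlier, in line with the threshold rule stated above), and the "preset correct data set" is reduced to a single hypothetical `correct_value`. The function names, `k` and `threshold` are illustrative.

```python
from statistics import mean

def k_neighbors(data, i, k):
    """Indices of the k points closest to data[i], excluding i itself."""
    others = [j for j in range(len(data)) if j != i]
    return sorted(others, key=lambda j: abs(data[j] - data[i]))[:k]

def local_density(data, i, k):
    """Reciprocal of the average distance from the neighbors to data[i]."""
    avg = mean(abs(data[j] - data[i]) for j in k_neighbors(data, i, k))
    return 1.0 / max(avg, 1e-12)  # guard against duplicates at zero distance

def density_ratio(data, i, k):
    """Assumed ratio: density of the point over the mean density of its neighbors."""
    neighbor_density = mean(local_density(data, j, k) for j in k_neighbors(data, i, k))
    return local_density(data, i, k) / neighbor_density

def replace_outliers(data, correct_value, k=3, threshold=0.2):
    """Flag points whose ratio is <= threshold and replace them."""
    flags = [density_ratio(data, i, k) <= threshold for i in range(len(data))]
    cleaned = [correct_value if flagged else value for value, flagged in zip(data, flags)]
    return cleaned, flags

# Illustrative values: five readings near 10 and one far away.
values = [10.0, 10.5, 9.8, 10.2, 10.1, 55.0]
cleaned, flags = replace_outliers(values, correct_value=10.0)
```

In this toy run only the last value sits in a sparse region relative to its neighbors, so it is the only one flagged and replaced. A production system would more likely rely on an established implementation of the local outlier factor than on this quadratic-time sketch.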

S2. Use a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

In the embodiment of the present invention, the attributed phenomenon data set consists of elements that cannot be quantified directly, the attribution factor data set consists of the possible factors that affect the attributed phenomenon data, and the pre-built variable library contains a number of attributed phenomenon data items determined in advance.

In detail, referring to FIG. 2, using a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set includes:

S21. Compare the variable data in the standard data set with the pre-built variable library, determine the variable data that is consistent with the variable library as attributed phenomenon data, and determine the variable data that is inconsistent with the variable library as attribution factor data;

S22. Calculate the degree of correlation between the attributed phenomenon data and the attribution factor data, and determine the attribution factor data whose degree of correlation is greater than a preset threshold as target attribution factor data corresponding to the attributed phenomenon data;

S23. Collect the attributed phenomenon data and the target attribution factor data to obtain the attributed phenomenon data set and the attribution factor data set corresponding to it.
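Steps S21 to S23 can be sketched as below, under stated assumptions: the standard data set is modelled as a dictionary of named columns, the variable library as a set of column names, and the degree of correlation as the Pearson coefficient. Using the absolute value of the correlation, so that strongly negative factors are also kept, is a choice made for this sketch; the text itself only says "greater than a preset threshold". All names and figures are hypothetical.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    sx, sy = pstdev(x), pstdev(y)
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def split_dataset(columns, variable_library, threshold=0.5):
    # S21: columns named in the variable library are attributed phenomena,
    # every other column is a candidate attribution factor.
    phenomena = {n: v for n, v in columns.items() if n in variable_library}
    candidates = {n: v for n, v in columns.items() if n not in variable_library}
    # S22: keep, per phenomenon, the factors whose correlation clears the threshold.
    factors = {
        p_name: {
            f_name: f_values
            for f_name, f_values in candidates.items()
            if abs(pearson(p_values, f_values)) > threshold
        }
        for p_name, p_values in phenomena.items()
    }
    # S23: the two collections together form the paired data sets.
    return phenomena, factors

columns = {
    "ad_revenue": [100, 120, 150, 170],
    "click_through_rate": [0.010, 0.012, 0.016, 0.018],
    "browsing_time": [30, 36, 44, 52],
    "weekday_code": [3, 1, 4, 1],
}
phenomena, factors = split_dataset(columns, variable_library={"ad_revenue"})
```

Here `weekday_code` is only weakly correlated with `ad_revenue` and is dropped, while the other two columns are retained as target attribution factors.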

Further, in the embodiment of the present invention, calculating the degree of correlation between the attributed phenomenon data and the attribution factor data includes:

Specifically, the attributed variables differ from one industry's data to another. The pre-built variable library corresponding to the industry data of the embodiment is therefore selected, and the standard data set is divided into the attributed phenomenon data set and the attribution factor data set according to that library.

For example, in the embodiment of the present invention, in the advertising industry the attributed variable may be advertising revenue, and the corresponding target attribution factors may be the advertisement click-through rate, user browsing time, and so on; in the insurance industry the attributed variable may be the renewal rate, and the corresponding target attribution factors may be the customers' average insured age, the age composition of customers, the renewal rate of each insurance channel, and so on.

S3. Import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculate the prediction success rate of each attribution phenomenon prediction model in the library;

In the embodiment of the present invention, the preset attribution phenomenon prediction model library may contain hundreds of models with prediction capability, including but not limited to time-decay attribution models, Bayesian models, XGBoost, linear regression models, and fully connected neural networks.

In detail, referring to FIG. 3, calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library includes:

S31. Divide the attributed phenomenon data set and the attribution factor data set into training samples and test samples according to a preset ratio;

S32. Train each attribution phenomenon prediction model in the preset attribution phenomenon prediction model library on the training samples to obtain a plurality of initial prediction models;

S33. Use each initial prediction model to predict on the test samples to obtain the test data of each initial prediction model;

S34. Calculate the difference between the test data of each initial prediction model and the attributed phenomenon data of the test samples;

S35. Determine the test data whose difference is less than a preset threshold as correctly predicted data, and calculate the proportion of correctly predicted data in the test data of each initial prediction model to obtain the prediction success rate of each attribution phenomenon prediction model.

Specifically, in the embodiment of the present invention, the preset threshold may be set differently for different attributed phenomenon data sets; for example, if the attributed phenomenon data is the renewal rate, the threshold may be 0.01.
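Steps S31 to S35 can be sketched as follows. The two toy models (`MeanModel`, `LineModel`) merely stand in for entries of the model library and are not taken from the patent; the `fit`/`predict` interface, the sequential (unshuffled) split and the single-factor data are likewise simplifying assumptions. The tolerance of 0.01 follows the renewal-rate example above.

```python
from statistics import mean

class MeanModel:
    """Toy model: always predicts the mean of the training labels."""
    def fit(self, xs, ys):
        self.value = mean(ys)
        return self
    def predict(self, xs):
        return [self.value for _ in xs]

class LineModel:
    """Toy model: ordinary least squares on a single factor."""
    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        self.slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        self.intercept = my - self.slope * mx
        return self
    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]

def success_rates(model_library, xs, ys, train_ratio=0.75, tolerance=0.01):
    split = int(len(xs) * train_ratio)                    # S31
    x_train, y_train = xs[:split], ys[:split]
    x_test, y_test = xs[split:], ys[split:]
    rates = {}
    for name, model in model_library.items():
        predictions = model.fit(x_train, y_train).predict(x_test)  # S32, S33
        # S34, S35: a prediction counts as correct when its error is below the tolerance.
        correct = sum(abs(p - y) < tolerance for p, y in zip(predictions, y_test))
        rates[name] = correct / len(y_test)
    return rates

# Illustrative data: renewal rate falling linearly with a complaint rate.
complaint_rate = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
renewal_rate = [0.95 - 2.0 * x for x in complaint_rate]
rates = success_rates({"mean": MeanModel(), "line": LineModel()},
                      complaint_rate, renewal_rate)
```

On this synthetic data the line model reproduces the held-out values within tolerance and the mean model does not, so the resulting rates separate the two cleanly.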

S4. Determine the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rates;

In the embodiment of the present invention, the attribution phenomenon prediction model that best matches the standard data set, that is, the optimal attribution phenomenon prediction model, is selected from the preset attribution phenomenon prediction model library according to the prediction success rates.

For example, in one practical application scenario of the present invention, the preset attribution phenomenon prediction model library contains a time-decay attribution model, a Bayesian model, XGBoost, a linear model, a fully connected neural network, and so on. The prediction success rate of the time-decay attribution model is 89%, that of the Bayesian model is 85%, that of XGBoost is 89%, that of the linear model is 78%, and that of the fully connected neural network is 94%, so the fully connected neural network is determined to be the optimal attribution phenomenon prediction model.
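Using the success rates quoted in this example, the selection in S4 reduces to picking the entry with the highest rate. The dictionary keys below are illustrative labels, not identifiers from the patent.

```python
# Prediction success rates from the example above.
success_rates = {
    "time_decay_attribution": 0.89,
    "bayesian": 0.85,
    "xgboost": 0.89,
    "linear": 0.78,
    "fully_connected_nn": 0.94,
}

# The optimal model is the one with the highest prediction success rate.
best_model = max(success_rates, key=success_rates.get)

# A full ranking can help when the top candidates are close or tied.
ranking = sorted(success_rates, key=success_rates.get, reverse=True)
```

The text does not say how ties are broken; `max` simply returns the first of several equal entries, which is an implementation detail of this sketch.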

In the embodiment of the present invention, determining the optimal attribution phenomenon prediction model from the prediction success rates allows a different prediction model to be selected for each industry's data, which widens the range of industry data the method applies to and ensures the accuracy of the subsequent calculation of attribution factor contributions.

S5. Select a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and use the model interpretation algorithm to calculate the contribution of each attribution factor data item in the attribution factor data set to the attributed phenomenon data.

In the embodiment of the present invention, the corresponding model interpretation algorithm estimates the contribution of the variables in the optimal attribution phenomenon prediction model, that is, the attribution factors; in other words, it estimates how much each attribution factor data item affects the prediction result of the optimal attribution phenomenon prediction model.

Specifically, in the embodiment of the present invention, the model interpretation algorithm corresponding to the optimal attribution phenomenon prediction model may be called from a pre-stored model interpretation algorithm library according to the type of the optimal model.

In detail, referring to FIG. 4, using the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data includes:

S51. Calculate the standard deviation of each attribution factor data in the attribution factor data set, and determine the perturbation range of each attribution factor data according to the standard deviation;

S52. Perturb each attribution factor data within the perturbation range to obtain new data for each attribution factor;

S53. Based on the model interpretation algorithm, train a target linear regression model with the new data of each attribution factor, and obtain the weight corresponding to each attribution factor data;

S54. Multiply each attribution factor data by its corresponding weight to obtain the contribution of that attribution factor data.
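Steps S51 to S54 can be sketched as follows, using only the Python standard library. This is a sketch under stated assumptions, not the patented implementation: the function names, the uniform perturbation within plus or minus `scale` standard deviations, the exponential proximity kernel, and the parameters `scale`, `n_samples`, `kernel_width` and `seed` are all hypothetical choices. In particular, the claims use the distance value itself as the sample weight, whereas this sketch converts the distance into a proximity weight so that nearer samples count more.

```python
import math
import random
import statistics


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (no zero pivot after row swaps).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def local_contributions(predict, factor_rows, instance, scale=1.0,
                        n_samples=200, kernel_width=1.0, seed=0):
    """Estimate per-factor contributions for one instance (S51 to S54)."""
    rng = random.Random(seed)
    n = len(instance)
    # S51: the standard deviation of each factor sets its perturbation range.
    sigmas = [statistics.pstdev([row[j] for row in factor_rows]) for j in range(n)]
    # S52: perturb the instance uniformly inside +/- scale * sigma (assumption).
    samples = [
        [instance[j] + rng.uniform(-1.0, 1.0) * scale * sigmas[j] for j in range(n)]
        for _ in range(n_samples)
    ]
    # Labels come from the model being explained.
    labels = [predict(s) for s in samples]

    def proximity(sample):
        # Squared distance in units of each factor's standard deviation.
        d2 = sum(((sample[j] - instance[j]) / (sigmas[j] or 1.0)) ** 2 for j in range(n))
        return math.exp(-d2 / kernel_width ** 2)

    sample_weights = [proximity(s) for s in samples]
    # S53: weighted least squares with an intercept, via the normal equations.
    design = [[1.0] + s for s in samples]
    k = n + 1
    ata = [[sum(w * r[p] * r[q] for w, r in zip(sample_weights, design))
            for q in range(k)] for p in range(k)]
    aty = [sum(w * r[p] * y for w, r, y in zip(sample_weights, design, labels))
           for p in range(k)]
    factor_weights = solve_linear_system(ata, aty)[1:]  # drop the intercept
    # S54: contribution = factor value * fitted weight.
    return [w * v for w, v in zip(factor_weights, instance)]


def toy_model(sample):
    """Stand-in for the optimal prediction model (purely illustrative)."""
    return 2.0 * sample[0] - 3.0 * sample[1] + 1.0


history = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 1.0]]
print(local_contributions(toy_model, history, [1.0, 2.0]))
```

Because the toy model is exactly linear, the fitted weights recover its coefficients 2 and -3, so the printed contributions for the instance (1, 2) are close to 2 and -6. For a non-linear model the fitted weights would only describe its behaviour near the chosen instance.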

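Further, in the embodiment of the present invention, training the target linear regression model with the new data of each attribution factor based on the model interpretation algorithm includes:

calculating the distance between each attribution factor data and the new data of that attribution factor, and using the distance value as the weight of the new data;

using the optimal attribution phenomenon prediction model to predict the attributed phenomenon for the new data of each attribution factor, and using the resulting prediction data as the label data corresponding to the new data;

training the target linear regression model with the label data and the weighted new data of each attribution factor based on the preset model interpretation algorithm.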

For example, in a practical application scenario of the present invention, when the optimal attribution phenomenon prediction model is a fully connected neural network, the DeepLIFT algorithm may be called to interpret it; when the optimal attribution phenomenon prediction model is an XGBoost model, Shapley values may be used to interpret the XGBoost model. In either case, the contribution of each attribution factor in the optimal attribution phenomenon prediction model is obtained.
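To make the Shapley value concrete, the following sketch computes exact Shapley values for a three-factor toy model by averaging each factor's marginal contribution over every ordering of the factors. It illustrates the textbook definition only; it is not how tree-model tooling computes Shapley values in practice, and the names `shapley_values` and `toy_prediction`, the toy model itself, and the convention that an absent factor takes the baseline value 0 are all assumptions made for the example.

```python
from itertools import permutations


def shapley_values(value_fn, n_factors):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = [0.0] * n_factors
    orderings = list(permutations(range(n_factors)))
    for order in orderings:
        present = set()
        previous = value_fn(frozenset(present))
        for factor in order:
            present.add(factor)
            current = value_fn(frozenset(present))
            totals[factor] += current - previous  # marginal contribution
            previous = current
    return [total / len(orderings) for total in totals]


def toy_prediction(present):
    """Toy model 3*x0 + 2*x1*x2 on the instance (1, 1, 1).

    Factors outside `present` are replaced by the baseline value 0.
    """
    x = [1.0 if i in present else 0.0 for i in range(3)]
    return 3.0 * x[0] + 2.0 * x[1] * x[2]


print(shapley_values(toy_prediction, 3))  # → [3.0, 1.0, 1.0]
```

The interaction term is split evenly between factors 1 and 2, and the three values sum to the difference between the full prediction (5) and the baseline prediction (0). Enumerating all orderings grows factorially with the number of factors, so this form is only practical for a handful of factors.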

In the embodiment of the present invention, the model interpretation algorithm yields the contribution of each attribution factor data to the attributed phenomenon data. This reveals what kind of user behavior mainly drives the industry data, so that a corresponding response strategy can be formulated according to the contribution of each attribution factor. For example, when insurance industry data show that the policy renewal rate has declined, the intelligent attribution analysis method may calculate that the decline is mainly caused by a rise in the complaint rate among existing customers of a certain insurance channel.

In the embodiment of the present invention, the standard data set is divided into an attributed phenomenon data set and a corresponding attribution factor data set by means of a pre-built variable library; dividing the data according to different industry variables helps improve the accuracy of the data. The attributed phenomenon data set and the corresponding attribution factor data set are imported into a preset attribution phenomenon prediction model library, and the prediction success rate of each attribution phenomenon prediction model is calculated. The optimal attribution phenomenon prediction model is then determined from the attribution phenomenon prediction model library according to the prediction success rate; selecting the model that best fits the data at hand further improves the accuracy of the attribution analysis. A corresponding model interpretation algorithm is selected according to the type of the optimal attribution phenomenon prediction model and used to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data, giving a more accurate attribution analysis result. Therefore, the intelligent attribution analysis method proposed by the present invention can solve the problem of low accuracy in attribution analysis.

FIG. 5 is a functional block diagram of an intelligent attribution analysis device provided by an embodiment of the present invention.

The intelligent attribution analysis device 100 of the present invention can be installed in an electronic device. According to the functions implemented, the intelligent attribution analysis device 100 may include a standard data set acquisition module 101, a standard data set division module 102, a model prediction success rate calculation module 103, an optimal attribution phenomenon prediction model determination module 104, and an attribution factor data contribution calculation module 105. A module in the present invention, which may also be called a unit, refers to a series of computer program segments that can be executed by the processor of the electronic device to perform a fixed function, and that are stored in the memory of the electronic device.

In this embodiment, the functions of the modules/units are as follows:

The standard data set acquisition module 101 is configured to obtain an initial data set and perform data replacement processing on outliers in the initial data set to obtain a standard data set;

The standard data set division module 102 is configured to use a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

The model prediction success rate calculation module 103 is configured to import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculate the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

The optimal attribution phenomenon prediction model determination module 104 is configured to determine the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rate;

The attribution factor data contribution calculation module 105 is configured to select a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and use the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data.
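The work of modules 103 and 104 can be sketched as follows. This is a minimal illustration that assumes, as in the claims, that a prediction counts as correct when its difference from the actual attributed phenomenon value is below a preset threshold. The function names, the `tolerance` parameter and the two toy models are hypothetical, and models are represented as plain callables that are taken to be already trained.

```python
def prediction_success_rate(predictions, actuals, tolerance):
    """Share of predictions whose absolute error is below the tolerance."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("predictions and actuals must be non-empty and equal in length")
    hits = sum(1 for p, a in zip(predictions, actuals) if abs(p - a) < tolerance)
    return hits / len(predictions)


def select_best_model(models, test_inputs, test_targets, tolerance):
    """Score every candidate model and return the best name with all rates."""
    rates = {
        name: prediction_success_rate(
            [model(x) for x in test_inputs], test_targets, tolerance
        )
        for name, model in models.items()
    }
    best = max(rates, key=rates.get)  # ties resolve to the first model listed
    return best, rates


# Toy model library: each entry maps a factor value to a predicted phenomenon value.
candidate_models = {
    "double": lambda x: 2 * x,
    "plus_one": lambda x: x + 1,
}
best_name, success_rates = select_best_model(
    candidate_models, [1, 2, 3, 4], [2, 4, 6, 8], tolerance=0.5
)
print(best_name, success_rates)  # → double {'double': 1.0, 'plus_one': 0.25}
```

Returning the full table of success rates alongside the winner keeps the selection auditable, which is useful when two candidate models score almost the same.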

In detail, when in use, the modules of the intelligent attribution analysis device 100 in the embodiment of the present invention adopt the same technical means as the intelligent attribution analysis method described above with reference to FIG. 1 to FIG. 4 and can produce the same technical effects, which are not repeated here.

FIG. 6 is a schematic structural diagram of an electronic device implementing the intelligent attribution analysis method provided by an embodiment of the present invention.

The electronic device 1 may include a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as an intelligent attribution analysis program.

In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing programs or modules stored in the memory 11 (for example, the intelligent attribution analysis program) and by calling data stored in the memory 11.

The memory 11 includes at least one type of readable storage medium, such as flash memory, a removable hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device, such as a removable hard disk of the electronic device. In other embodiments, the memory 11 may be an external storage device of the electronic device, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device. The memory 11 can be used not only to store application software installed on the electronic device and various types of data, such as the code of the intelligent attribution analysis program, but also to temporarily store data that has been output or is to be output.

The communication bus 12 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. The bus is configured to provide connection and communication between the memory 11, the at least one processor 10 and other components.

The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is generally used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display or an input unit such as a keyboard; optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be called a display screen or display unit, and is used to display information processed in the electronic device and to display a visual user interface.

The figure shows only an electronic device having the components described. Those skilled in the art will understand that the structure shown in the figure does not limit the electronic device, which may include fewer or more components than shown, combine certain components, or arrange the components differently.

For example, although not shown, the electronic device may also include a power supply (such as a battery) for powering the components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that charging management, discharging management, power consumption management and similar functions are implemented through the power management device. The power supply may also include any of one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and similar components. The electronic device may further include various sensors, a Bluetooth module, a Wi-Fi module and the like, which are not described further here.

It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.

The intelligent attribution analysis program stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement the intelligent attribution analysis method described above.

Specifically, for the way the processor 10 implements these instructions, reference may be made to the description of the relevant steps in the embodiments corresponding to the drawings, which is not repeated here.

Further, if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).

The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor of an electronic device, can implement the intelligent attribution analysis method described above.

In the several embodiments provided by the present invention, it should be understood that the disclosed equipment, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments above, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics.

The embodiments are therefore to be regarded in all respects as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. No reference sign in the claims shall be construed as limiting the claim concerned.

The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and so on.

The embodiments of the present application may acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results.

Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in the system claims may also be implemented by a single unit or device through software or hardware. Words such as first and second are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from the spirit and scope of those technical solutions.

Claims

1. An intelligent attribution analysis method, characterized in that the method comprises:

obtaining an initial data set, and performing data replacement processing on outliers in the initial data set to obtain a standard data set;

using a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

importing the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

determining an optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rate;

selecting a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and using the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data.

2. The intelligent attribution analysis method according to claim 1, characterized in that performing data replacement processing on the outliers in the initial data set to obtain the standard data set comprises:

calculating the local reachability density ratio between each initial data in the initial data set and the neighborhood data of that initial data;

when the local reachability density ratio is less than or equal to a preset density ratio threshold, determining that the initial data is an outlier;

performing data replacement processing on the outlier using a preset correct data set to obtain the standard data set.

3. The intelligent attribution analysis method according to claim 1, wherein using the pre-built variable library to divide the standard data set into the attributed phenomenon data set and the attribution factor data set corresponding to the attributed phenomenon data set comprises:

comparing the variable data in the standard data set with the pre-built variable library, determining the variable data in the standard data set that is consistent with the variable library as attributed phenomenon data, and determining the variable data in the standard data set that is inconsistent with the variable library as attribution factor data;

calculating the degree of correlation between the attributed phenomenon data and the attribution factor data, and determining the attribution factor data whose degree of correlation is greater than a preset threshold as the target attribution factor data corresponding to the attributed phenomenon data;

collecting the attributed phenomenon data and the target attribution factor data to obtain the attributed phenomenon data set and the attribution factor data set corresponding to the attributed phenomenon data set.

4. The intelligent attribution analysis method according to claim 3, wherein calculating the degree of correlation between the attributed phenomenon data and the attribution factor data comprises calculating the degree of correlation using the following formula:

$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \, \sigma_y}$$

where r(X, Y) is the degree of correlation, X is the attributed phenomenon data, Y is the Y-th attribution factor data, Cov(X, Y) is the covariance between the attributed phenomenon data and the attribution factor data, σ_x is the standard deviation of the attributed phenomenon data, and σ_y is the standard deviation of the attribution factor data.

5. The intelligent attribution analysis method according to claim 1, wherein calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library comprises:

dividing the attributed phenomenon data set and the attribution factor data set into training samples and test samples according to a preset ratio;

performing model training on each attribution phenomenon prediction model in the preset attribution phenomenon prediction model library with the training samples to obtain a plurality of initial prediction models;

using each of the initial prediction models to perform model prediction on the test samples to obtain test data of each of the initial prediction models;

calculating the difference between the test data of each of the initial prediction models and the attributed phenomenon data of the test samples;

determining the test data whose difference is less than a preset threshold as correct prediction data, and calculating the proportion of the correct prediction data in the test data of each of the initial prediction models to obtain the prediction success rate of each attribution phenomenon prediction model.

6. The intelligent attribution analysis method according to claim 1, wherein using the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data comprises:

calculating the standard deviation of each attribution factor data in the attribution factor data set, and determining the perturbation range of each attribution factor data according to the standard deviation;

perturbing each attribution factor data within the perturbation range to obtain new data for each attribution factor;

based on the model interpretation algorithm, training a target linear regression model with the new data of each attribution factor, and obtaining the weight corresponding to each attribution factor data;

multiplying each attribution factor data by its corresponding weight to obtain the contribution of that attribution factor data.

7. The intelligent attribution analysis method according to any one of claims 1 to 6, wherein training the target linear regression model with the new data of each attribution factor based on the model interpretation algorithm comprises:

calculating the distance between each attribution factor data and the new data of that attribution factor, and using the distance value as the weight of the new data of that attribution factor;

using the optimal attribution phenomenon prediction model to predict the attributed phenomenon for the new data of each attribution factor to obtain attributed phenomenon prediction data, and using the attributed phenomenon prediction data as the label data corresponding to the new data of each attribution factor;

training the target linear regression model with the label data and the weighted new data of each attribution factor based on a preset model interpretation algorithm.

8. An intelligent attribution analysis device, wherein the device comprises:

a standard data set acquisition module, configured to obtain an initial data set and perform data replacement processing on outliers in the initial data set to obtain a standard data set;

a standard data set division module, configured to use a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

a model prediction success rate calculation module, configured to import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculate the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

an optimal attribution phenomenon prediction model determination module, configured to determine an optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rate;

an attribution factor data contribution calculation module, configured to select a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and use the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data.

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the intelligent attribution analysis method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the intelligent attribution analysis method according to any one of claims 1 to 7 is implemented.