WO2021114820A1 - Method and device for multi-party joint risk identification - Google Patents

Method and device for multi-party joint risk identification

Info

Publication number
WO2021114820A1
WO2021114820A1 PCT/CN2020/118006 CN2020118006W WO2021114820A1 WO 2021114820 A1 WO2021114820 A1 WO 2021114820A1 CN 2020118006 W CN2020118006 W CN 2020118006W WO 2021114820 A1 WO2021114820 A1 WO 2021114820A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
site
feature
risk
sub
Prior art date
Application number
PCT/CN2020/118006
Other languages
English (en)
French (fr)
Inventor
宋博文
陈帅
顾曦
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of WO2021114820A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Definitions

  • One or more embodiments of this specification relate to the computer field, and in particular to methods and devices for joint risk identification by multiple parties.
  • risk identification is often required.
  • in a risk control system, the risk of user behavior, such as embezzlement, fraud, and marketing cheating, is judged by analyzing user characteristics.
  • the basic data sources of these user characteristics depend to a large degree on the user's private information.
  • a variable is designed to characterize user transaction frequency: the transaction amount accumulated on the user's current device over the past 7 days.
  • this variable requires the unique identifier (ID) of the device used by the user, and that identifier belongs to the user's private information.
  • One or more embodiments of this specification describe a method and device for multi-party joint risk identification, which can prevent the leakage of user's private information.
  • a method for joint risk identification by multiple parties, where the multiple parties include a first site and a second site, the first site stores feature information in a first feature set of a user, the second site stores feature information in a second feature set of the user, and the feature information relates to the user's private information; the method is applied to the first site and includes: obtaining a first sub-model of a security tree model jointly trained with the second site, the security tree model also having a second sub-model deployed at the second site; obtaining a third sub-model derived from the tree structure corresponding to a preset risk identification strategy, the tree structure also having a fourth sub-model deployed at the second site; when it is determined that a preset risk identification condition is met, obtaining first feature data of each feature in the first feature set of a target user; inputting the first feature data into the first sub-model to obtain a first prediction score, and into the third sub-model to obtain a third prediction score; providing the first prediction score and the third prediction score by means of multi-party computing
  • obtaining the first sub-model of the security tree model jointly trained with the second site includes: training the security tree model jointly with the second site through MPC to obtain the first sub-model of the security tree model.
  • obtaining the first sub-model of the security tree model jointly trained with the second site includes: receiving a first model file corresponding to the first sub-model, where the first model file is split from the total model file of the security tree model obtained through joint training.
  • the determining that a preset risk identification condition is satisfied includes: receiving an evaluation request, where the evaluation request includes an identifier of the target user.
  • the determining that the preset risk identification condition is satisfied includes: receiving a batch processing request, and the target user is any user in the user set defined by the batch processing request.
  • the MPC includes one of homomorphic encryption and secret sharing.
  • before obtaining the first sub-model of the security tree model jointly trained with the second site, the method further includes: determining the data interaction authority with the second site; and/or determining the feature information in the first feature set and the feature information in the second feature set; and/or determining that an algorithm consensus has been reached with the second site.
  • the method further includes: during joint training with the second site, recording data of interaction with the second site.
  • the first risk includes a supervised risk
  • a supervised risk is one for which, after the user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained
  • the feature information also relates to user behavior information.
  • the first risk includes an unsupervised risk; an unsupervised risk is one for which, after the user performs a second behavior, no label indicating whether the second behavior carries the first risk can be obtained.
  • joint training of the security tree model with the second site includes: obtaining a first sample set for the first risk, where the label of each sample in the first sample set is manually defined or determined from the feature distribution of each feature in a high-risk feature set of the sample; using the first sample set to perform initial joint training of the security tree model with the second site, and re-determining the features contained in the high-risk feature set; using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set; and, based on the updated labels, jointly training the security tree model with the second site again.
  • a device for joint risk identification by multiple parties, where the multiple parties include a first site and a second site, the first site stores feature information in a first feature set of a user, the second site stores feature information in a second feature set of the user, and the feature information relates to the user's private information; the device is applied to the first site and includes: a first obtaining unit, configured to obtain a first sub-model of a security tree model jointly trained with the second site, the security tree model also having a second sub-model deployed at the second site; a second obtaining unit, configured to obtain a third sub-model derived from the tree structure corresponding to a preset risk identification strategy, the tree structure also having a fourth sub-model deployed at the second site; a third obtaining unit, configured to obtain first feature data of each feature in the first feature set of a target user when it is determined that a preset risk identification condition is met; a prediction unit for inputting the first feature data obtained by the third obtaining unit into the first sub-
  • the third sub-model obtained by the obtaining unit obtains the third prediction score; a joint unit, configured to provide the first prediction score and the third prediction score obtained by the prediction unit through secure multi-party computation (MPC), so that they are combined with the second prediction score and the fourth prediction score to comprehensively determine whether the target user has the first risk; wherein the second prediction score is obtained by the second site using the second feature data of each feature in the second feature set of the target user and the second sub-model, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect.
  • a computing device including a memory and a processor, the memory stores executable code, and the processor implements the method of the first aspect when the executable code is executed by the processor.
  • risk identification is performed by multiple parties.
  • the first sub-model of the security tree model jointly trained with the second site is first obtained, the security tree model also having a second sub-model deployed at the second site; then the third sub-model derived from the tree structure corresponding to the preset risk identification strategy is obtained, the tree structure also having a fourth sub-model deployed at the second site;
  • first feature data of each feature in the first feature set of the target user is obtained; the first feature data is then input into the first sub-model to obtain the first prediction score, and into the third sub-model to obtain the third prediction score; finally, the first prediction score and the third prediction score are provided through MPC, so that they are combined with the second prediction score and the fourth prediction score to comprehensively determine whether the target user has the first risk; wherein, the second prediction score is the second feature data and the second sub-item of each feature in the second feature set
  • the prediction results of the sub-models can be combined to obtain the final risk identification result, which ensures that the sites do not have to exchange the user's private information and can therefore prevent its leakage; in addition, not only is the trained model split and deployed, but the preset risk identification strategy is also split and deployed, which further prevents the disclosure of the user's private information and improves the accuracy of risk identification.
  • Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
  • Fig. 2 shows a flow chart of a method for multi-party joint risk identification according to an embodiment.
  • Fig. 3 shows a schematic diagram of the system structure of multi-party joint risk identification according to an embodiment.
  • Fig. 4 shows a schematic diagram of an online deployment link according to an embodiment.
  • Fig. 5 shows a schematic diagram of offline deployment links according to an embodiment.
  • Fig. 6 shows a schematic diagram of a strategy conversion process according to an embodiment.
  • Fig. 7 shows a schematic diagram of a closed-loop multi-party model evolution according to an embodiment.
  • Fig. 8 shows a schematic block diagram of an apparatus for multi-party joint risk identification according to an embodiment.
  • Figure 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
  • This implementation scenario involves multi-party joint risk identification.
  • the multiple parties include a first site 11 and a second site 12; the first site 11 stores feature information in a first feature set of the user, and the second site 12 stores feature information in a second feature set of the user.
  • the first feature set and the second feature set contain different feature information.
  • the first feature set contains feature 1, feature 2, and the second feature set contains feature 3, feature 4, and feature 5.
  • the information involves the user's private information.
  • personally identifiable information (PII) that can be used to locate a user, such as address, email address, name, and identity ID, is particularly important.
  • the embodiments of this specification are based on secure multi-party computation (MPC), in which multiple parties jointly perform risk identification.
  • it involves multi-party deployment of policies and models, and relies on the deployment form based on a tree structure, which can prevent the disclosure of users' private information.
  • Fig. 2 shows a flow chart of a method for joint risk identification by multiple parties according to an embodiment.
  • the method may be based on the implementation scenario shown in Fig. 1.
  • the multiple parties include a first site and a second site; the first site stores feature information in a first feature set of the user, and the second site stores feature information in a second feature set of the user.
  • the method for multi-party joint risk identification in this embodiment includes the following steps: step 21, obtain a first sub-model of a security tree model jointly trained with the second site, the security tree model also having a second sub-model deployed at the second site; step 22, obtain a third sub-model derived from the tree structure corresponding to a preset risk identification strategy, the tree structure also having a fourth sub-model deployed at the second site; step 23, when it is determined that the preset risk identification condition is met, obtain the first feature data of each feature in the first feature set of the target user; step 24, input the first feature data into the first sub-model to obtain a first prediction score, and into the third sub-model to obtain a third prediction score; step 25, provide the first prediction score and the third prediction score through secure multi-party computation (MPC), so that they are combined with the second prediction score and the fourth prediction score to comprehensively determine whether the target user has the first risk; wherein, the second prediction score is the use of each feature in the second feature set of the target user
  • a first sub-model of a security tree model jointly trained with the second site is obtained; the security tree model also has a second sub-model deployed at the second site.
  • the security tree model is a general model, and the model can be split into a first sub-model and a second sub-model, and the first sub-model and the second sub-model are respectively deployed at the first site and the second site.
  • the security tree model is jointly trained with the second site by means of MPC to obtain the first sub-model of the security tree model. It is understandable that the MPC method completes the calculations required for joint training by exchanging process parameters and random numbers, while protecting the privacy and security of the data and ensuring that the data does not leave its own domain.
  • a first model file corresponding to the first sub-model is received, where the first model file is a file split from a total model file of a security tree model obtained through a joint training method.
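  • The splitting of a total model file into per-site files can be sketched as follows. This is only an illustration: the specification does not give a file format, so the dictionary layout, the `site` field, and the function name `split_model` are all assumptions, and the handling of leaf values is simplified.

```python
import copy

# Hypothetical total model file: one tree whose split nodes name the owning site.
TOTAL_MODEL = {
    "nodes": {
        0: {"site": "A", "feature": "x1", "threshold": 3.0, "left": 1, "right": 2},
        1: {"leaf": 0.1},
        2: {"site": "B", "feature": "y3", "threshold": 0.5, "left": 3, "right": 4},
        3: {"leaf": 0.4},
        4: {"leaf": 0.9},
    }
}


def split_model(total_model, site):
    """Return the model file for one site (assumed layout, not the patent's).

    Split conditions owned by other sites are masked, so a site never sees
    which feature or threshold another site uses.
    """
    sub = {"nodes": {}}
    for node_id, node in total_model["nodes"].items():
        node = copy.deepcopy(node)  # leave the total model file untouched
        if "leaf" not in node and node["site"] != site:
            node.pop("feature")
            node.pop("threshold")
        sub["nodes"][node_id] = node
    return sub


first_model_file = split_model(TOTAL_MODEL, "A")
second_model_file = split_model(TOTAL_MODEL, "B")
```

  • In this sketch both files keep the full tree shape, so the sites can still cooperate on traversal, but each file exposes only the split conditions of its own site.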
  • the method further includes: determining the data interaction authority with the second site; and/or determining the feature information in the first feature set and the feature information in the second feature set; and/or determining that an algorithm consensus has been reached with the second site.
  • the method further includes: during joint training with the second site, recording data of interaction with the second site.
  • a third sub-model obtained according to the tree structure corresponding to the preset risk identification strategy is obtained; the tree structure further has a fourth sub-model deployed at the second site.
  • the preset risk identification strategy can be manually defined.
  • for example, the preset risk identification strategy (x1>a or x2>b) and y3>c can be converted into two trees, x1>a and y3>c, and x2>b and y3>c, where each tree corresponds to a sub-model.
  • the preset risk identification strategy is also split into multiple sub-models, which are respectively deployed in multiple sites, which can prevent the disclosure of the user's private information.
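  • The conversion of a strategy into trees amounts to distributing "and" over "or" so that every resulting conjunction becomes one tree. A minimal sketch, assuming the strategy is written as nested tuples (a hypothetical representation, not one given in the specification):

```python
from itertools import product


def to_conjunctions(expr):
    """Expand a nested and/or expression into a list of conjunctions.

    A condition is a plain string; a compound expression is a tuple
    ("and" | "or", operand, operand, ...). Each returned conjunction is a
    list of conditions and corresponds to one tree.
    """
    if isinstance(expr, str):
        return [[expr]]
    op, *args = expr
    parts = [to_conjunctions(arg) for arg in args]
    if op == "or":
        # An "or" simply collects the alternatives of every operand.
        return [conj for part in parts for conj in part]
    if op == "and":
        # An "and" pairs every alternative of one operand with every other.
        return [sum(combo, []) for combo in product(*parts)]
    raise ValueError(f"unknown operator: {op}")


strategy = ("and", ("or", "x1>a", "x2>b"), "y3>c")
trees = to_conjunctions(strategy)
```

  • For the example strategy, `trees` holds the two conjunctions x1>a and y3>c, and x2>b and y3>c.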
  • step 23 when it is determined that the preset risk identification condition is satisfied, first feature data of each feature in the first feature set of the target user is acquired.
  • the preset risk identification condition, that is, the trigger condition, can be met when a request is received or on a timed schedule.
  • the determining that the preset risk identification condition is satisfied includes: receiving an evaluation request, the evaluation request including the identification of the target user.
  • the determining that the preset risk identification condition is satisfied includes: receiving a batch processing request, and the target user is any user in the user set defined by the batch processing request.
  • step 24 input the first feature data into the first sub-model to obtain a first prediction score, and input the third sub-model to obtain a third prediction score.
  • the first feature data is stored on the first site, and the first sub-model and the third sub-model are also deployed on the first site.
  • the first feature data does not need to be transmitted externally, which can prevent the leakage of the user's private information.
  • the first prediction score and the third prediction score are provided through secure multi-party computation (MPC) and combined with the second prediction score and the fourth prediction score to comprehensively determine whether the target user has the first risk; the second prediction score is obtained by the second site using the second feature data of each feature in the second feature set of the target user and the second sub-model, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model. It is understandable that each party uses the feature data of the target user that it stores itself to determine the corresponding prediction scores, and the prediction scores of the parties are then integrated to determine whether the target user is at risk, which can prevent the leakage of the user's private information.
  • the MPC includes one of homomorphic encryption and secret sharing.
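  • As an illustration of the secret sharing option, the sketch below combines the four prediction scores with additive secret sharing. The modulus, the fixed-point scale, the example scores, and the simple sum as the combination rule are all assumptions; the specification does not fix a concrete protocol.

```python
import secrets

MODULUS = 2**61 - 1   # prime modulus for share arithmetic (assumed)
SCALE = 10**6         # fixed-point scale for fractional scores (assumed)


def encode(score):
    """Map a float score to a fixed-point residue."""
    return round(score * SCALE) % MODULUS


def decode(value):
    """Map a residue back to a signed fixed-point number."""
    if value > MODULUS // 2:
        value -= MODULUS
    return value / SCALE


def share(value, n_parties=2):
    """Split value into additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


# Site 1 holds the first and third scores; site 2 holds the second and fourth.
site1_total = (encode(0.30) + encode(1.0)) % MODULUS
site2_total = (encode(0.25) + encode(0.0)) % MODULUS

s1_keep, s1_send = share(site1_total)
s2_keep, s2_send = share(site2_total)

# Each site adds the share it kept to the share it received from the other.
partial1 = (s1_keep + s2_send) % MODULUS
partial2 = (s2_keep + s1_send) % MODULUS

# Only the partial sums are published; their sum is the combined score.
combined = decode((partial1 + partial2) % MODULUS)
```

  • Each individual share is uniformly random, so a share on its own reveals nothing about a site's scores. With only two parties, the revealed total still lets each site infer the other site's subtotal, which is a limitation of this simplified setting.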
  • the first risk includes a supervised risk
  • a supervised risk is one for which, after the user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained; the feature information also involves user behavior information.
  • the first behavior can be a transaction behavior
  • the first risk can be a risk of misappropriation. Usually, this type of risk will be reported by the user after the transaction occurs to obtain the label.
  • the first risk includes an unsupervised risk
  • an unsupervised risk is one for which, after the user performs a second behavior, no label indicating whether the second behavior carries the first risk can be obtained
  • joint training of the security tree model with the second site includes: obtaining a first sample set for the first risk, where the label of each sample in the first sample set is manually defined or determined from the feature distribution of each feature in a high-risk feature set of the sample; using the first sample set to perform initial joint training of the security tree model with the second site, and re-determining the features contained in the high-risk feature set; using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set; and, based on the updated labels, jointly training the security tree model with the second site again.
  • the second behavior can be a transaction behavior
  • the first risk can be a marketing cheating risk or a false transaction risk. Normally, this type of risk will not be reported by users after the transaction occurs, so that the label cannot be obtained.
  • the corresponding label can be determined by manual labeling or feature recognition.
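  • The label-refresh loop for unsupervised risks can be sketched as below. The joint training step is replaced by a purely local stand-in that scores features by the gap between group means, and the cutoff-based labeling rule, the function names, and the sample data are all hypothetical; only the order of the steps (label, train, re-determine high-risk features, relabel, retrain) follows the description.

```python
from statistics import mean


def label_samples(samples, high_risk, cutoffs):
    """Label a sample 1 if any high-risk feature exceeds its cutoff (assumed rule)."""
    return [int(any(s[f] > cutoffs[f] for f in high_risk)) for s in samples]


def train_stand_in(samples, labels, features):
    """Placeholder for joint training: score each feature by how far apart
    the means of the labeled-risky and labeled-normal samples are."""
    scores = {}
    for f in features:
        pos = [s[f] for s, y in zip(samples, labels) if y == 1]
        neg = [s[f] for s, y in zip(samples, labels) if y == 0]
        scores[f] = abs(mean(pos) - mean(neg)) if pos and neg else 0.0
    return scores


def closed_loop(samples, features, high_risk, cutoffs, rounds=2, top_k=2):
    labels = label_samples(samples, high_risk, cutoffs)      # initial labels
    for _ in range(rounds):
        scores = train_stand_in(samples, labels, features)   # (re)train
        ranked = sorted(features, key=lambda f: scores[f], reverse=True)
        high_risk = ranked[:top_k]                            # re-determine features
        labels = label_samples(samples, high_risk, cutoffs)   # update labels
    return high_risk, labels


samples = [
    {"f1": 0.9, "f2": 0.8, "f3": 0.1},
    {"f1": 0.8, "f2": 0.9, "f3": 0.9},
    {"f1": 0.1, "f2": 0.2, "f3": 0.8},
    {"f1": 0.2, "f2": 0.1, "f3": 0.2},
]
cutoffs = {"f1": 0.5, "f2": 0.5, "f3": 0.5}
high_risk, labels = closed_loop(samples, ["f1", "f2", "f3"], ["f1"], cutoffs)
```

  • Starting from f1 alone, the loop in this toy data also picks up f2, which moves together with f1, while f3 stays out of the high-risk feature set.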
  • the method provided in the embodiment of this specification splits the overall model into multiple sub-models and deploys the sub-models on multiple sites, so that the prediction results of the sub-models can be combined to obtain the final risk identification result. This ensures that the sites do not need to exchange the user's private information and can therefore prevent its leakage; in addition, not only is the trained model split and deployed, but the preset risk identification strategy is also split and deployed, which further prevents the disclosure of the user's private information and improves the accuracy of risk identification.
  • the MPC in the embodiment of this specification may also be referred to as federated learning.
  • a secure boost (secureboost) federated learning scheme may be adopted.
  • Fig. 3 shows a schematic diagram of the system structure of multi-party joint risk identification according to an embodiment.
  • the architecture includes a configuration layer, a definition layer, and a deployment layer.
  • the configuration layer mainly consists of three parts. Tenant management provides management functions for data providers and data users, records tenants' operations on data, and synchronizes them across the entire network. Variable management records the source (which tenant it comes from) and the basic definition of each basic variable; the online part connects to the real-time data interface at each end, and the offline part connects to the database at each end. Algorithm authorization provides the algorithm consensus part of federated learning.
  • an algorithm based on the federated learning solution is divided into three steps: first, offline training, which completes model training through the exchange of random numbers and intermediate parameters; second, splitting the resulting model file and deploying the parts to each end node; third, performing real-time or offline batch prediction on the end nodes.
  • the algorithm scheme being run (such as secureboost) must not only meet the security requirements but also obtain the consensus of each end (confirming that the algorithm does not transmit internal information).
  • an algorithm that has reached consensus must be registered with a signature, and end data can only run on algorithm components whose signature matches.
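  • The signature matching described above could be enforced with a check along these lines. The use of HMAC-SHA256, the shared consensus key, and the function names are assumptions for illustration; the specification does not state how signatures are produced or verified.

```python
import hashlib
import hmac


def sign_component(code_bytes, consensus_key):
    """Produce the signature registered when an algorithm reaches consensus."""
    return hmac.new(consensus_key, code_bytes, hashlib.sha256).hexdigest()


def may_run(code_bytes, registered_signature, consensus_key):
    """Allow end data to run only on a component whose signature matches."""
    candidate = sign_component(code_bytes, consensus_key)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(candidate, registered_signature)


key = b"hypothetical-consensus-key"
registered = sign_component(b"secureboost-component-v1", key)
allowed = may_run(b"secureboost-component-v1", registered, key)
blocked = may_run(b"modified-component", registered, key)
```

  • Here `allowed` is true and `blocked` is false, so a component changed after consensus would be refused by the end node.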
  • the definition layer is used to produce algorithm files, including algorithm files obtained from model training and algorithm files for strategy definition.
  • the deployment layer is used to deploy algorithm files in multiple parties to provide prediction services. Including online deployment and offline deployment.
  • a strategy consists of conditions connected by the logical operators "and" and "or".
  • the strategy can be transformed into an ensemble of trees so as to reuse the online and offline deployment links of the model.
  • for example, the strategy (x1>a or x2>b) and y3>c can be converted into two trees: x1>a and y3>c, and x2>b and y3>c.
  • in each tree, when a condition holds, go right (if a further "and" condition follows, continue splitting; otherwise record leaf node 1); when it does not hold, go left and record leaf node 0.
  • the outputs of the two trees are added together. If the sum is greater than 0, the policy is hit; otherwise, the policy is not hit.
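  • The leaf rule and the summation can be sketched as follows. The thresholds a, b, and c are given hypothetical values, and both trees are evaluated in one place for clarity; in the described deployment each site would evaluate only the conditions on its own features.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}


def eval_tree(conditions, features):
    """Walk one chain-shaped tree.

    A condition that holds goes right and continues to the next condition;
    a condition that fails goes left to leaf 0. Passing every condition
    reaches leaf 1.
    """
    for name, op, threshold in conditions:
        if not OPS[op](features[name], threshold):
            return 0
    return 1


def policy_triggered(trees, features):
    """Add the leaf values of all trees; a sum above 0 triggers the policy."""
    return sum(eval_tree(tree, features) for tree in trees) > 0


a, b, c = 10, 5, 0.5  # hypothetical thresholds
trees = [
    [("x1", ">", a), ("y3", ">", c)],  # x1>a and y3>c
    [("x2", ">", b), ("y3", ">", c)],  # x2>b and y3>c
]
```

  • For instance, a user with x1=12, x2=1, y3=0.9 reaches leaf 1 in the first tree, so the sum is 1 and the policy is triggered, while a user with y3=0.1 reaches leaf 0 in both trees.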
  • the deployment link of the model can be used for multi-party scoring and prediction.
  • Fig. 4 shows a schematic diagram of an online deployment link according to an embodiment.
  • the federated learning process of the multi-party model and the online scoring process are shown.
  • a tree model is obtained, and after splitting, it is deployed on the prediction nodes of data domain A and data domain B.
  • the real-time scoring prediction requests the prediction nodes on both sides, and the prediction nodes read the corresponding features from the real-time feature interface.
  • each prediction node computes the sub-results of all the sub-models it owns, and the sub-results are aggregated to obtain the final score.
  • the prediction node returns the final score to the requesting party.
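  • The online scoring flow can be sketched with two in-process prediction nodes. The class, the lambda sub-models, and the dictionary standing in for the real-time feature interface are hypothetical, and the plain sum below ignores the MPC protection that the described method applies when scores are combined.

```python
class PredictionNode:
    """A prediction node in one data domain (hypothetical interface)."""

    def __init__(self, sub_models, feature_store):
        self.sub_models = sub_models        # callables: features -> score
        self.feature_store = feature_store  # stand-in for the real-time feature interface

    def score(self, user_id):
        # Features are read locally and never leave the node.
        features = self.feature_store[user_id]
        return sum(model(features) for model in self.sub_models)


def final_score(nodes, user_id):
    """Aggregate the sub-results of all prediction nodes into one score."""
    return sum(node.score(user_id) for node in nodes)


node_a = PredictionNode(
    sub_models=[lambda f: 0.2 if f["x1"] > 10 else 0.0],
    feature_store={"u1": {"x1": 12}},
)
node_b = PredictionNode(
    sub_models=[lambda f: 0.5 if f["y3"] > 0.5 else 0.0],
    feature_store={"u1": {"y3": 0.9}},
)
result = final_score([node_a, node_b], "u1")
```

  • Only the per-node scores cross the domain boundary in this sketch; the feature values stay inside each node's own store.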
  • Fig. 5 shows a schematic diagram of offline deployment links according to an embodiment.
  • the link for offline batch processing and timed scheduling, used after the trained model is deployed at the end nodes, is shown.
  • this link must be connected to the database at the same end, and batch scoring is performed at regular intervals on the data produced by the database.
  • this function also provides a one-time scoring service to evaluate the effectiveness of strategies and models.
  • Fig. 6 shows a schematic diagram of a strategy conversion process according to an embodiment. Referring to Figure 6, after the strategy is converted into a tree, it will be split into sub-models by splitting the service, and the sub-models are deployed on each end for prediction or offline scheduling and scoring.
  • Fig. 7 shows a schematic diagram of a closed-loop multi-party model evolution according to an embodiment.
  • the function of closed-loop model evolution is further proposed on the basis of federated learning multi-party modeling.
  • the multi-party model system can not only identify labeled, supervised risk targets, but can also identify unsupervised risks such as marketing cheating and false transactions, so as to integrate the identification of supervised and unsupervised risks.
  • the optimized high-risk features can further promote the accuracy of unsupervised risk identification.
  • the safety tree model can be continuously iteratively optimized during the offline training or modeling phase.
  • the risk control system based on federated learning can not only handle multi-party risks that have label feedback, such as embezzlement and fraud, but also prevent and control risks that have no label feedback, such as marketing cheating and false transactions. It supports models and is also compatible with the deployment of strategies, and it provides both real-time prediction and offline scoring.
  • On the model side, there is a complete model optimization process.
  • At the same time, because it is a decentralized system, the center has only a management function and stores no data. This function can be opened to all institutions that join data sharing; by managing each institution's variables and the algorithm functions that each institution can use, different risk control services are provided for different institutions.
  • a device for joint risk identification by multiple parties, where the multiple parties include a first site and a second site, the first site stores feature information in a first feature set of a user, and the second site stores feature information in a second feature set of the user.
  • the feature information relates to the user's private information.
  • the device is applied to the first site and is used to perform the method for multi-party joint risk identification provided by the embodiments of this specification.
  • Fig. 8 shows a schematic block diagram of an apparatus for multi-party joint risk identification according to an embodiment. As shown in FIG. 8, the device 800 includes a first obtaining unit 81, a second obtaining unit 82, a third obtaining unit 83, a prediction unit 84 and a combining unit 85.
  • the first obtaining unit 81 is configured to obtain a first sub-model of the security tree model jointly trained with the second site; the security tree model further has a second sub-model deployed at the second site.
  • the second acquiring unit 82 is configured to acquire a third sub-model obtained according to the tree structure corresponding to the preset risk identification strategy; the tree structure further has a fourth sub-model deployed at the second site.
  • the third acquiring unit 83 is configured to acquire the first feature data of each feature in the first feature set of the target user when it is determined that the preset risk identification condition is satisfied.
  • the prediction unit 84 is configured to input the first feature data acquired by the third acquiring unit 83 into the first sub-model obtained by the first obtaining unit 81 to obtain a first prediction score, and into the third sub-model acquired by the second acquiring unit 82 to obtain a third prediction score.
  • the combining unit 85 is configured to provide the first prediction score and the third prediction score obtained by the prediction unit 84 through secure multi-party computation (MPC), so that they are combined with the second prediction score and the fourth prediction score to comprehensively determine whether the target user has the first risk; wherein the second prediction score is obtained by the second site using the second feature data of each feature in the second feature set of the target user and the second sub-model, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model.
  • the first obtaining unit 81 is specifically configured to jointly train the security tree model with the second site in an MPC manner to obtain the first sub-model of the security tree model.
  • the first obtaining unit 81 is specifically configured to receive a first model file corresponding to the first sub-model, where the first model file is split from the total model file of the security tree model obtained through joint training.
  • the determining that a preset risk identification condition is satisfied includes: receiving an evaluation request, where the evaluation request includes an identifier of the target user.
  • the determining that a preset risk identification condition is satisfied includes: receiving a batch processing request, and the target user is any user in a user set defined by the batch processing request.
  • the MPC includes one of homomorphic encryption and secret sharing.
  • the device further includes: a determining unit, configured to, before the first obtaining unit 81 obtains the first sub-model of the security tree model jointly trained with the second site, determine the data interaction authority with the second site; and/or determine the feature information in the first feature set and the feature information in the second feature set; and/or determine that an algorithm consensus has been reached with the second site.
  • the device further includes: a recording unit, configured to record data interacted with the second site during joint training with the second site.
  • the first risk includes a supervised risk
  • the supervised risk is one for which, after the user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained;
  • the feature information also relates to user behavior information.
  • the first risk includes an unsupervised risk
  • the unsupervised risk is one for which, after the user performs a second behavior, a label indicating whether the second behavior carries the first risk cannot be obtained.
  • jointly training the secure tree model with the second site includes: obtaining a first sample set for the first risk, where the label of each sample in the first sample set is manually defined or is determined based on the feature distribution of each feature in a high-risk feature set of the sample; using the first sample set to preliminarily train the secure tree model jointly with the second site, and re-determining the features contained in the high-risk feature set; using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set; and, based on the updated labels, jointly training the secure tree model with the second site again.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2.
  • a computing device including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.
  • the functions described in the present invention can be implemented by hardware, software, firmware, or any combination thereof.
  • these functions can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.

Abstract

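A method and apparatus for multi-party joint risk identification. The method includes: a first site obtains a first sub-model of a secure tree model jointly trained with a second site, the secure tree model further having a second sub-model deployed at the second site; obtains a third sub-model derived from a tree structure corresponding to a preset risk identification policy, the tree structure further having a fourth sub-model deployed at the second site; when it is determined that a preset risk identification condition is satisfied, obtains first feature data of each feature in a first feature set of a target user; inputs the first feature data into the first sub-model and the third sub-model to obtain a first prediction score and a third prediction score, respectively; and provides the first prediction score and the third prediction score by means of secure multi-party computation, so that they are combined with a second prediction score and a fourth prediction score provided by the second site to jointly determine whether the target user has a first risk. Leakage of users' private information can thereby be prevented.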

Description

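Method and apparatus for multi-party joint risk identification
Technical Field
One or more embodiments of this specification relate to the computer field, and in particular to a method and apparatus for multi-party joint risk identification.
Background
Risk identification is frequently required today. A risk control system analyzes user features to judge the riskiness of user behavior, such as account theft, fraud, and marketing cheating, and the underlying data sources for these user features depend heavily on users' private information. For example, to characterize a user's transaction frequency, a variable such as the cumulative transaction amount of the user on the current device over the past 7 days may be designed. This variable requires the unique identifier (ID) of the device the user is using, and that device identifier is itself private user information.
With the General Data Protection Regulation (GDPR) in force, the controls on users' private data are becoming increasingly strict. Particularly in international scenarios, more and more institutions require parties that collect and use data to keep private data within its domain and to keep user data usable but not visible. For example, in a global payment network (global net, GN) scenario, the card-issuing site and the acquiring site belong to different countries, and how to carry out risk prevention and control for transactions on the network without private data leaving its domain is a current challenge.
Therefore, an improved solution is desired in which multiple parties jointly perform risk identification while preventing leakage of users' private information.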
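Summary
One or more embodiments of this specification describe a method and apparatus for multi-party joint risk identification that can prevent leakage of users' private information.
In a first aspect, a method for multi-party joint risk identification is provided. The multiple parties include a first site and a second site; the first site stores feature information in a first feature set of users, the second site stores feature information in a second feature set of users, and the feature information relates to users' private information. The method is applied to the first site and includes: obtaining a first sub-model of a secure tree model jointly trained with the second site, the secure tree model further having a second sub-model deployed at the second site; obtaining a third sub-model derived from a tree structure corresponding to a preset risk identification policy, the tree structure further having a fourth sub-model deployed at the second site; when it is determined that a preset risk identification condition is satisfied, obtaining first feature data of each feature in the first feature set of a target user; inputting the first feature data into the first sub-model to obtain a first prediction score, and into the third sub-model to obtain a third prediction score; and providing the first prediction score and the third prediction score by means of secure multi-party computation (MPC), so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model.
In a possible implementation, obtaining the first sub-model of the secure tree model jointly trained with the second site includes: jointly training the secure tree model with the second site in an MPC manner to obtain the first sub-model of the secure tree model.
In a possible implementation, obtaining the first sub-model of the secure tree model jointly trained with the second site includes: receiving a first model file corresponding to the first sub-model, where the first model file is a file split from the overall model file of the secure tree model obtained through joint training.
In a possible implementation, determining that the preset risk identification condition is satisfied includes: receiving an evaluation request, where the evaluation request includes an identifier of the target user.
In a possible implementation, determining that the preset risk identification condition is satisfied includes: receiving a batch processing request, where the target user is any user in the user set defined by the batch processing request.
In a possible implementation, the MPC includes one of homomorphic encryption and secret sharing.
In a possible implementation, before obtaining the first sub-model of the secure tree model jointly trained with the second site, the method further includes: determining the data exchange authority with the second site; and/or determining the feature information in the first feature set and the feature information in the second feature set; and/or determining that an algorithm consensus has been reached with the second site.
In a possible implementation, the method further includes: recording the data exchanged with the second site during joint training with the second site.
In a possible implementation, the first risk includes a supervised risk, where a supervised risk is one for which, after a user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained; the feature information also relates to users' behavior information.
In a possible implementation, the first risk includes an unsupervised risk, where an unsupervised risk is one for which, after a user performs a second behavior, a label indicating whether the second behavior carries the first risk cannot be obtained. Jointly training the secure tree model with the second site includes: obtaining a first sample set for the first risk, where the label of each sample in the first sample set is manually defined or is determined based on the feature distribution of each feature in a high-risk feature set of the sample; using the first sample set to preliminarily train the secure tree model jointly with the second site, and re-determining the features contained in the high-risk feature set; using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set; and, based on the updated labels, jointly training the secure tree model with the second site again.
In a second aspect, an apparatus for multi-party joint risk identification is provided. The multiple parties include a first site and a second site; the first site stores feature information in a first feature set of users, the second site stores feature information in a second feature set of users, and the feature information relates to users' private information. The apparatus is applied to the first site and includes: a first obtaining unit, configured to obtain a first sub-model of a secure tree model jointly trained with the second site, the secure tree model further having a second sub-model deployed at the second site; a second obtaining unit, configured to obtain a third sub-model derived from a tree structure corresponding to a preset risk identification policy, the tree structure further having a fourth sub-model deployed at the second site; a third obtaining unit, configured to obtain, when it is determined that a preset risk identification condition is satisfied, first feature data of each feature in the first feature set of a target user; a prediction unit, configured to input the first feature data obtained by the third obtaining unit into the first sub-model obtained by the first obtaining unit to obtain a first prediction score, and into the third sub-model obtained by the second obtaining unit to obtain a third prediction score; and a combining unit, configured to provide, by means of secure multi-party computation (MPC), the first prediction score and the third prediction score obtained by the prediction unit, so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first aspect.
In a fourth aspect, a computing device is provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method of the first aspect is implemented.
With the method and apparatus provided by the embodiments of this specification, multiple parties jointly perform risk identification. The first site among the multiple parties first obtains a first sub-model of a secure tree model jointly trained with the second site, the secure tree model further having a second sub-model deployed at the second site; then obtains a third sub-model derived from a tree structure corresponding to a preset risk identification policy, the tree structure further having a fourth sub-model deployed at the second site; next, when it is determined that a preset risk identification condition is satisfied, obtains first feature data of each feature in the first feature set of a target user; then inputs the first feature data into the first sub-model to obtain a first prediction score, and into the third sub-model to obtain a third prediction score; and finally provides the first prediction score and the third prediction score by means of MPC, so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model. As can be seen from the above, in the embodiments of this specification the overall model is split into multiple sub-models that are deployed at the respective sites, so that the prediction results of the sub-models can be combined to obtain the final risk identification result. The sites therefore do not need to exchange users' private information, which prevents its leakage. In addition, not only is the trained model split and deployed, but the preset risk identification policy is split and deployed in the same way, which further prevents leakage of users' private information and improves the accuracy of risk identification.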
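Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification.
FIG. 2 shows a flowchart of a method for multi-party joint risk identification according to one embodiment.
FIG. 3 shows a schematic diagram of an architecture for multi-party joint risk identification according to one embodiment.
FIG. 4 shows a schematic diagram of an online deployment link according to one embodiment.
FIG. 5 shows a schematic diagram of an offline deployment link according to one embodiment.
FIG. 6 shows a schematic diagram of a policy conversion process according to one embodiment.
FIG. 7 shows a schematic diagram of a closed loop of multi-party model evolution according to one embodiment.
FIG. 8 shows a schematic block diagram of an apparatus for multi-party joint risk identification according to one embodiment.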
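Detailed Description
The solutions provided in this specification are described below with reference to the drawings.
FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification. The scenario involves multi-party joint risk identification. Referring to FIG. 1, the multiple parties include a first site 11 and a second site 12; the first site 11 stores feature information in a first feature set of users, and the second site 12 stores feature information in a second feature set of users. It can be understood that the first and second feature sets contain different feature information; for example, the first feature set contains feature 1 and feature 2, and the second feature set contains feature 3, feature 4, and feature 5. The feature information relates to users' private information, the most important of which is personally identifiable information (PII), such as address, email, name, and identity ID.
In the embodiments of this specification, multiple parties jointly perform risk identification based on secure multi-party computation (MPC). The multi-party deployment of policies and models relies on a tree-structure-based deployment form, which can prevent leakage of users' private information.
It should be noted that the embodiments of this specification are described using two-party joint risk identification only as an example; in practice the multiple parties are not limited to two, and three, four, or more parties may jointly perform risk identification.
FIG. 2 shows a flowchart of a method for multi-party joint risk identification according to one embodiment. The method may be based on the implementation scenario shown in FIG. 1. The multiple parties include a first site and a second site; the first site stores feature information in a first feature set of users, the second site stores feature information in a second feature set of users, and the feature information relates to users' private information. The method is applied to the first site. It can be understood that, in multi-party joint risk identification, the first site is any one of the parties, and the processing at the second site is similar to that at the first site and is not repeated here. As shown in FIG. 2, the method in this embodiment includes the following steps. Step 21: obtain a first sub-model of a secure tree model jointly trained with the second site, the secure tree model further having a second sub-model deployed at the second site. Step 22: obtain a third sub-model derived from a tree structure corresponding to a preset risk identification policy, the tree structure further having a fourth sub-model deployed at the second site. Step 23: when it is determined that a preset risk identification condition is satisfied, obtain first feature data of each feature in the first feature set of a target user. Step 24: input the first feature data into the first sub-model to obtain a first prediction score, and into the third sub-model to obtain a third prediction score. Step 25: provide the first prediction score and the third prediction score by means of secure multi-party computation (MPC), so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model. The specific execution of each step is described below.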
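First, in step 21, a first sub-model of a secure tree model jointly trained with the second site is obtained; the secure tree model further has a second sub-model deployed at the second site. It can be understood that the secure tree model is an overall model that can be split into the first sub-model and the second sub-model, which are deployed at the first site and the second site, respectively.
In one example, the secure tree model is jointly trained with the second site in an MPC manner to obtain the first sub-model of the secure tree model. It can be understood that the MPC manner means completing the computations of joint training by exchanging process parameters and random numbers, while protecting data privacy and keeping data within its domain.
In another example, a first model file corresponding to the first sub-model is received, where the first model file is a file split from the overall model file of the secure tree model obtained through joint training.
In one example, before step 21, the method further includes: determining the data exchange authority with the second site; and/or determining the feature information in the first feature set and the feature information in the second feature set; and/or determining that an algorithm consensus has been reached with the second site.
In one example, the method further includes: recording the data exchanged with the second site during joint training with the second site.
Then, in step 22, a third sub-model derived from a tree structure corresponding to a preset risk identification policy is obtained; the tree structure further has a fourth sub-model deployed at the second site. It can be understood that the preset risk identification policy may be manually defined. For example, the preset risk identification policy (x1>a or x2>b) and y3>c can be converted into two trees, x1>a and y3>c, and x2>b and y3>c, with each tree corresponding to one sub-model.
In the embodiments of this specification, the preset risk identification policy is also split into multiple sub-models that are deployed at multiple sites, which can prevent leakage of users' private information.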
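The conversion of a policy into one tree per conjunction can be sketched as follows. This is only a minimal illustration, assuming the policy is written as nested tuples; the tuple representation and the function name `to_trees` are hypothetical and are not taken from the specification.

```python
from itertools import product

# Assumed representation: a policy is a leaf string such as "x1>a",
# or a tuple ("and", p, q, ...) / ("or", p, q, ...).
def to_trees(policy):
    """Return a list of conjunctions; each conjunction (a list of leaf conditions) is one tree."""
    if isinstance(policy, str):
        return [[policy]]
    op, *args = policy
    branches = [to_trees(arg) for arg in args]
    if op == "or":
        # An "or" simply collects the trees of every alternative.
        return [conj for branch in branches for conj in branch]
    if op == "and":
        # An "and" distributes over the alternatives of each operand.
        return [sum(combo, []) for combo in product(*branches)]
    raise ValueError(f"unknown operator: {op}")

policy = ("and", ("or", "x1>a", "x2>b"), "y3>c")
trees = to_trees(policy)
print(trees)
```

For the example policy this yields two condition lists, one for x1>a and y3>c and one for x2>b and y3>c, matching the two trees named in the text.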
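Next, in step 23, when it is determined that a preset risk identification condition is satisfied, first feature data of each feature in the first feature set of a target user is obtained. It can be understood that the preset risk identification condition is a trigger condition: identification may be triggered on receipt of a request or on a schedule.
In one example, determining that the preset risk identification condition is satisfied includes: receiving an evaluation request, where the evaluation request includes an identifier of the target user.
In another example, determining that the preset risk identification condition is satisfied includes: receiving a batch processing request, where the target user is any user in the user set defined by the batch processing request.
Then, in step 24, the first feature data is input into the first sub-model to obtain a first prediction score, and into the third sub-model to obtain a third prediction score. It can be understood that the first feature data is stored at the first site, and the first sub-model and the third sub-model are also deployed at the first site, so the first feature data does not need to be sent out, which can prevent leakage of users' private information.
Finally, in step 25, the first prediction score and the third prediction score are provided by means of secure multi-party computation (MPC), so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model. It can be understood that each party uses the target user's feature data that it stores to determine its own prediction scores, and the prediction scores of the parties are then combined to determine whether the target user is at risk, which can prevent leakage of users' private information.
In one example, the MPC includes one of homomorphic encryption and secret sharing.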
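As an illustration of the secret-sharing option, the sketch below combines the four prediction scores with two-party additive secret sharing. It rests on assumptions that the specification does not state: the combination rule is taken to be a plain sum, scores are encoded as fixed-point integers, and the names `MODULUS`, `SCALE`, and `joint_score` are hypothetical. Note also that in any two-party sum, a party can infer the other party's total from the revealed result.

```python
import secrets

MODULUS = 2**61 - 1   # assumed ring size for the shares
SCALE = 10**6         # assumed fixed-point scale for real-valued scores

def encode(score):
    """Map a real score to a fixed-point element of the ring."""
    return round(score * SCALE) % MODULUS

def decode(value):
    """Map a ring element back to a signed real score."""
    signed = value - MODULUS if value > MODULUS // 2 else value
    return signed / SCALE

def share(value):
    """Split a ring element into two additive shares that sum to it modulo MODULUS."""
    r = secrets.randbelow(MODULUS)
    return r, (value - r) % MODULUS

def joint_score(site1_scores, site2_scores):
    # Each site sums its own local scores, then splits that sum into two shares.
    keep1, send1 = share(encode(sum(site1_scores)))
    keep2, send2 = share(encode(sum(site2_scores)))
    # Each site adds the share it kept to the share it received from the other site.
    partial1 = (keep1 + send2) % MODULUS
    partial2 = (keep2 + send1) % MODULUS
    # Only the partial sums are revealed; together they give the combined score.
    return decode((partial1 + partial2) % MODULUS)

# First site holds the first and third scores, second site the second and fourth.
total = joint_score([0.31, -0.12], [0.45, 0.08])
print(total)
```

Each individual share is uniformly random, so a share on its own carries no information about the score it was derived from.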
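In one example, the first risk includes a supervised risk, where a supervised risk is one for which, after a user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained; the feature information also relates to users' behavior information. It can be understood that the first behavior may be a transaction and the first risk may be account theft; for this kind of risk, users typically file reports after the transaction occurs, so labels can be obtained.
In another example, the first risk includes an unsupervised risk, where an unsupervised risk is one for which, after a user performs a second behavior, a label indicating whether the second behavior carries the first risk cannot be obtained. Jointly training the secure tree model with the second site includes: obtaining a first sample set for the first risk, where the label of each sample in the first sample set is manually defined or is determined based on the feature distribution of each feature in a high-risk feature set of the sample; using the first sample set to preliminarily train the secure tree model jointly with the second site, and re-determining the features contained in the high-risk feature set; using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set; and, based on the updated labels, jointly training the secure tree model with the second site again.
It can be understood that the second behavior may be a transaction and the first risk may be marketing cheating or fake transactions; for this kind of risk, users typically do not file reports after the transaction occurs, so labels cannot be obtained directly. The corresponding labels can instead be determined by manual annotation or by feature-based identification.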
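The label-refinement loop for unsupervised risks can be sketched as below. This is a single-party toy under loudly stated assumptions: `train_stub` merely ranks features by the gap between class means and stands in for joint training of the secure tree model, labels are derived by simple thresholding, and the feature names `f1`, `f2`, `f3`, the thresholds, and `top_k` are all hypothetical.

```python
from statistics import mean

def label_samples(samples, high_risk, thresholds):
    """Label a sample 1 when any high-risk feature exceeds its threshold, else 0."""
    return [int(any(s[f] > thresholds[f] for f in high_risk)) for s in samples]

def train_stub(samples, labels):
    """Toy stand-in for joint training: score each feature by the gap between class means."""
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    return {f: abs(mean(s[f] for s in pos) - mean(s[f] for s in neg)) for f in samples[0]}

def refine(samples, high_risk, thresholds, top_k=2):
    labels = label_samples(samples, high_risk, thresholds)   # initial labels
    importance = train_stub(samples, labels)                 # preliminary training
    # Re-determine the high-risk feature set from the preliminary model.
    high_risk = sorted(importance, key=importance.get, reverse=True)[:top_k]
    labels = label_samples(samples, high_risk, thresholds)   # update the labels
    model = train_stub(samples, labels)                      # train again
    return high_risk, labels, model

samples = [
    {"f1": 0.9, "f2": 0.8, "f3": 0.10},
    {"f1": 0.8, "f2": 0.9, "f3": 0.20},
    {"f1": 0.1, "f2": 0.7, "f3": 0.15},
    {"f1": 0.2, "f2": 0.1, "f3": 0.10},
]
thresholds = {"f1": 0.5, "f2": 0.5, "f3": 0.5}
high_risk, labels, model = refine(samples, ["f1"], thresholds)
print(high_risk, labels)
```

In this toy data the third sample is unlabeled in the first round and becomes positive once `f2` joins the high-risk set, which is the kind of label update the closed loop is meant to produce.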
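With the method provided by the embodiments of this specification, the overall model is split into multiple sub-models that are deployed at the respective sites, so that the prediction results of the sub-models can be combined to obtain the final risk identification result. The sites therefore do not need to exchange users' private information, which prevents its leakage. In addition, not only is the trained model split and deployed, but the preset risk identification policy is split and deployed in the same way, which further prevents leakage of users' private information and improves the accuracy of risk identification.
The MPC in the embodiments of this specification may also be referred to as federated learning; specifically, a secure tree (secureboost) federated learning scheme may be used.
FIG. 3 shows a schematic diagram of an architecture for multi-party joint risk identification according to one embodiment. Referring to FIG. 3, the architecture includes a configuration layer, a definition layer, and a deployment layer.
The configuration layer consists of three main parts. Tenant management provides management functions for data providers and data users, records tenants' operations on data, and synchronizes those records across the whole network. Variable management provides the source of each basic variable (which tenant it comes from) and its basic definition; online data connects to the real-time data interface on the endpoint, and offline data connects to the database on the endpoint. Algorithm authorization provides the algorithm-consensus part of federated learning. An algorithm based on a federated learning scheme has three steps: first, offline training, in which model training is completed by exchanging random numbers and intermediate parameters; second, splitting the resulting model file and deploying it to the endpoint nodes; third, performing real-time or offline batch prediction on the endpoint nodes. The algorithm scheme that is run (such as secureboost) must not only meet security requirements but also obtain the consensus of every endpoint (each endpoint confirms that it understands the algorithm will not transmit internal information outward). An algorithm that has reached consensus must be signed, and endpoint data can run only on algorithm components whose signature matches.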
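The signature-matching rule can be illustrated with a simple digest check. This is a sketch under assumptions: the "signature" is modelled here as a SHA-256 digest of the algorithm component agreed during consensus, whereas an actual deployment might use public-key signatures; the function names are hypothetical.

```python
import hashlib
import hmac

def component_digest(code_bytes: bytes) -> str:
    """Digest of an algorithm component, recorded when the parties reach consensus."""
    return hashlib.sha256(code_bytes).hexdigest()

def may_run(code_bytes: bytes, agreed_digest: str) -> bool:
    """Allow endpoint data to run only on a component whose digest matches the agreed one."""
    return hmac.compare_digest(component_digest(code_bytes), agreed_digest)

approved = b"def predict(features): ..."
agreed = component_digest(approved)

print(may_run(approved, agreed))                 # component unchanged since consensus
print(may_run(approved + b" # patched", agreed)) # component modified after consensus
```

`hmac.compare_digest` is used for the comparison so that the check takes the same time regardless of where the two digests differ.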
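The definition layer produces algorithm files, including algorithm files obtained from model training and algorithm files defined by policies.
The deployment layer deploys the algorithm files at the multiple parties to provide prediction services, and includes online deployment and offline deployment. A policy is a set of logical operators connected by and and or. By splitting on and and or, the policy can be converted into an ensemble-tree structure, so that the model's online and offline deployment links can be reused. For example, the policy (x1>a or x2>b) and y3>c can be converted into two trees, x1>a and y3>c, and x2>b and y3>c. For each tree, when a condition holds the walk goes right (if a further and condition remains the node splits again, otherwise it is recorded as leaf node 1); when a condition does not hold the walk goes left and is recorded as leaf node 0. The outputs of the two trees are summed: if the final result is greater than 0 the policy is hit, otherwise the policy is not hit. After conversion into the tree structure, the model's deployment link can be reused for multi-party scoring and prediction.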
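The leaf-and-sum rule described above can be sketched as follows. For simplicity the sketch evaluates both trees in one place with all features visible, so it illustrates only the scoring rule, not the split across sites; the threshold values standing in for a, b, and c, and the function names, are hypothetical.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}

def eval_tree(conditions, features):
    """Walk a chain of and-conditions: a failed test goes left to leaf 0, all passing reach leaf 1."""
    for name, op, threshold in conditions:
        if not OPS[op](features[name], threshold):
            return 0
    return 1

def policy_hit(trees, features):
    """Sum the leaf values of all trees; the policy is hit when the sum is greater than 0."""
    return sum(eval_tree(tree, features) for tree in trees) > 0

# (x1>a or x2>b) and y3>c with assumed thresholds a=10, b=5, c=3.
trees = [
    [("x1", ">", 10), ("y3", ">", 3)],
    [("x2", ">", 5), ("y3", ">", 3)],
]

print(policy_hit(trees, {"x1": 12, "x2": 1, "y3": 4}))  # first tree reaches leaf 1
print(policy_hit(trees, {"x1": 12, "x2": 9, "y3": 2}))  # y3 fails in both trees
```

Because every tree outputs 0 or 1, a sum greater than 0 is equivalent to the or of the individual conjunctions, which is why the summed trees reproduce the original policy.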
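FIG. 4 shows a schematic diagram of an online deployment link according to one embodiment. FIG. 4 shows the federated learning process of the multi-party model and the online scoring process. Through the exchange of random numbers and parameters, a tree model is obtained, which is split and then deployed on the prediction nodes of data domain A and data domain B. On the real-time risk control link, a real-time scoring request is sent to the prediction nodes on both sides, and each prediction node reads the corresponding features from its real-time feature interface. Each prediction node obtains sub-results from all the sub-models it holds, and the sub-results are aggregated at a prediction node to obtain the final score. The prediction node returns the final score to the requesting party.
FIG. 5 shows a schematic diagram of an offline deployment link according to one embodiment. FIG. 5 shows the offline batch-run and scheduled-job link used after the trained model is deployed on the endpoint nodes. This link needs to be connected to the endpoint database so that the data produced on schedule inside the database can be scored in batches. This function also provides a one-off scoring service for evaluating the effectiveness of policies and models.
FIG. 6 shows a schematic diagram of a policy conversion process according to one embodiment. Referring to FIG. 6, after a policy is converted into trees, a splitting service splits them into sub-models, which are deployed on each endpoint for prediction or offline scheduled scoring.
FIG. 7 shows a schematic diagram of a closed loop of multi-party model evolution according to one embodiment. Referring to FIG. 7, a model-evolution closed loop is further proposed on top of federated-learning multi-party modeling. On this basis, the multi-party model system can identify not only labeled, supervised risk targets but also unsupervised risks such as marketing cheating and fake transactions, thereby covering the identification of both supervised and unsupervised risks in an integrated way. First, some manually defined high-risk labels, together with unsupervised risks identified by manually defined high-risk features, are used as labels to train a supervised model. The high-risk features are then further optimized according to the supervised model, and manual experience can also be incorporated at this point to adjust the feature distributions of the high-risk features. The optimized high-risk features can in turn improve the precision of unsupervised risk identification. Through this closed-loop structure, the secure tree model can be iteratively optimized during offline training or the modeling stage.
In summary, the risk control system based on federated learning can handle risks for which labels are returned, such as multi-party account theft and fraud, and can also prevent and control risks for which no labels are returned, such as marketing cheating and fake transactions. It supports models and is also compatible with the deployment of policies. It provides both real-time prediction and offline scoring. On the model side, there is a complete model optimization workflow. Moreover, because the system is decentralized, the center has only management functions and stores no data; these functions can be opened to all institutions that join data sharing, to manage institution variables and the algorithm functions each institution may use, and to provide different risk control services to different institutions.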
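According to an embodiment of another aspect, an apparatus for multi-party joint risk identification is also provided. The multiple parties include a first site and a second site; the first site stores feature information in a first feature set of users, the second site stores feature information in a second feature set of users, and the feature information relates to users' private information. The apparatus is applied to the first site and is used to perform the method for multi-party joint risk identification provided by the embodiments of this specification. FIG. 8 shows a schematic block diagram of an apparatus for multi-party joint risk identification according to one embodiment. As shown in FIG. 8, the apparatus 800 includes a first obtaining unit 81, a second obtaining unit 82, a third obtaining unit 83, a prediction unit 84, and a combining unit 85.
The first obtaining unit 81 is configured to obtain a first sub-model of a secure tree model jointly trained with the second site; the secure tree model further has a second sub-model deployed at the second site.
The second obtaining unit 82 is configured to obtain a third sub-model derived from a tree structure corresponding to a preset risk identification policy; the tree structure further has a fourth sub-model deployed at the second site.
The third obtaining unit 83 is configured to obtain, when it is determined that a preset risk identification condition is satisfied, first feature data of each feature in the first feature set of a target user.
The prediction unit 84 is configured to input the first feature data obtained by the third obtaining unit 83 into the first sub-model obtained by the first obtaining unit 81 to obtain a first prediction score, and into the third sub-model obtained by the second obtaining unit 82 to obtain a third prediction score.
The combining unit 85 is configured to provide, by means of secure multi-party computation (MPC), the first prediction score and the third prediction score obtained by the prediction unit 84, so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model.
Optionally, as an embodiment, the first obtaining unit 81 is specifically configured to jointly train the secure tree model with the second site in an MPC manner to obtain the first sub-model of the secure tree model.
Optionally, as an embodiment, the first obtaining unit 81 is specifically configured to receive a first model file corresponding to the first sub-model, where the first model file is a file split from the overall model file of the secure tree model obtained through joint training.
Optionally, as an embodiment, determining that the preset risk identification condition is satisfied includes: receiving an evaluation request, where the evaluation request includes an identifier of the target user.
Optionally, as an embodiment, determining that the preset risk identification condition is satisfied includes: receiving a batch processing request, where the target user is any user in the user set defined by the batch processing request.
Optionally, as an embodiment, the MPC includes one of homomorphic encryption and secret sharing.
Optionally, as an embodiment, the apparatus further includes: a determining unit, configured to, before the first obtaining unit 81 obtains the first sub-model of the secure tree model jointly trained with the second site, determine the data exchange authority with the second site; and/or determine the feature information in the first feature set and the feature information in the second feature set; and/or determine that an algorithm consensus has been reached with the second site.
Optionally, as an embodiment, the apparatus further includes: a recording unit, configured to record the data exchanged with the second site during joint training with the second site.
Optionally, as an embodiment, the first risk includes a supervised risk, where a supervised risk is one for which, after a user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained; the feature information also relates to users' behavior information.
Optionally, as an embodiment, the first risk includes an unsupervised risk, where an unsupervised risk is one for which, after a user performs a second behavior, a label indicating whether the second behavior carries the first risk cannot be obtained. Jointly training the secure tree model with the second site includes: obtaining a first sample set for the first risk, where the label of each sample in the first sample set is manually defined or is determined based on the feature distribution of each feature in a high-risk feature set of the sample; using the first sample set to preliminarily train the secure tree model jointly with the second site, and re-determining the features contained in the high-risk feature set; using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set; and, based on the updated labels, jointly training the secure tree model with the second site again.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with FIG. 2.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with FIG. 2 is implemented.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific implementations of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.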

Claims (22)

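  1. A method for multi-party joint risk identification, wherein the multiple parties include a first site and a second site, the first site stores feature information in a first feature set of users, the second site stores feature information in a second feature set of users, the feature information relates to users' private information, and the method is applied to the first site and comprises:
    obtaining a first sub-model of a secure tree model jointly trained with the second site, the secure tree model further having a second sub-model deployed at the second site;
    obtaining a third sub-model derived from a tree structure corresponding to a preset risk identification policy, the tree structure further having a fourth sub-model deployed at the second site;
    when it is determined that a preset risk identification condition is satisfied, obtaining first feature data of each feature in the first feature set of a target user;
    inputting the first feature data into the first sub-model to obtain a first prediction score, and into the third sub-model to obtain a third prediction score;
    providing the first prediction score and the third prediction score by means of secure multi-party computation (MPC), so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model.
  2. The method according to claim 1, wherein obtaining the first sub-model of the secure tree model jointly trained with the second site comprises:
    jointly training the secure tree model with the second site in an MPC manner to obtain the first sub-model of the secure tree model.
  3. The method according to claim 1, wherein obtaining the first sub-model of the secure tree model jointly trained with the second site comprises:
    receiving a first model file corresponding to the first sub-model, the first model file being a file split from the overall model file of the secure tree model obtained through joint training.
  4. The method according to claim 1, wherein determining that the preset risk identification condition is satisfied comprises:
    receiving an evaluation request, the evaluation request including an identifier of the target user.
  5. The method according to claim 1, wherein determining that the preset risk identification condition is satisfied comprises:
    receiving a batch processing request, the target user being any user in the user set defined by the batch processing request.
  6. The method according to claim 1, wherein the MPC comprises:
    one of homomorphic encryption and secret sharing.
  7. The method according to claim 1, wherein, before obtaining the first sub-model of the secure tree model jointly trained with the second site, the method further comprises:
    determining the data exchange authority with the second site; and/or
    determining the feature information in the first feature set and the feature information in the second feature set; and/or
    determining that an algorithm consensus has been reached with the second site.
  8. The method according to claim 1, wherein the method further comprises:
    recording the data exchanged with the second site during joint training with the second site.
  9. The method according to claim 1, wherein the first risk includes a supervised risk, a supervised risk being one for which, after a user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained; and the feature information also relates to users' behavior information.
  10. The method according to claim 1, wherein the first risk includes an unsupervised risk, an unsupervised risk being one for which, after a user performs a second behavior, a label indicating whether the second behavior carries the first risk cannot be obtained;
    jointly training the secure tree model with the second site comprises:
    obtaining a first sample set for the first risk, the label of each sample in the first sample set being manually defined or determined based on the feature distribution of each feature in a high-risk feature set of the sample;
    using the first sample set to preliminarily train the secure tree model jointly with the second site, and re-determining the features contained in the high-risk feature set;
    using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set;
    based on the updated labels, jointly training the secure tree model with the second site again.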
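  11. An apparatus for multi-party joint risk identification, wherein the multiple parties include a first site and a second site, the first site stores feature information in a first feature set of users, the second site stores feature information in a second feature set of users, the feature information relates to users' private information, and the apparatus is applied to the first site and comprises:
    a first obtaining unit, configured to obtain a first sub-model of a secure tree model jointly trained with the second site, the secure tree model further having a second sub-model deployed at the second site;
    a second obtaining unit, configured to obtain a third sub-model derived from a tree structure corresponding to a preset risk identification policy, the tree structure further having a fourth sub-model deployed at the second site;
    a third obtaining unit, configured to obtain, when it is determined that a preset risk identification condition is satisfied, first feature data of each feature in the first feature set of a target user;
    a prediction unit, configured to input the first feature data obtained by the third obtaining unit into the first sub-model obtained by the first obtaining unit to obtain a first prediction score, and into the third sub-model obtained by the second obtaining unit to obtain a third prediction score;
    a combining unit, configured to provide, by means of secure multi-party computation (MPC), the first prediction score and the third prediction score obtained by the prediction unit, so that they are combined with a second prediction score and a fourth prediction score to jointly determine whether the target user has a first risk; wherein the second prediction score is obtained by the second site using the second sub-model and second feature data of each feature in the second feature set of the target user, and the fourth prediction score is obtained by the second site using the second feature data and the fourth sub-model.
  12. The apparatus according to claim 11, wherein the first obtaining unit is specifically configured to jointly train the secure tree model with the second site in an MPC manner to obtain the first sub-model of the secure tree model.
  13. The apparatus according to claim 11, wherein the first obtaining unit is specifically configured to receive a first model file corresponding to the first sub-model, the first model file being a file split from the overall model file of the secure tree model obtained through joint training.
  14. The apparatus according to claim 11, wherein determining that the preset risk identification condition is satisfied comprises:
    receiving an evaluation request, the evaluation request including an identifier of the target user.
  15. The apparatus according to claim 11, wherein determining that the preset risk identification condition is satisfied comprises:
    receiving a batch processing request, the target user being any user in the user set defined by the batch processing request.
  16. The apparatus according to claim 11, wherein the MPC comprises:
    one of homomorphic encryption and secret sharing.
  17. The apparatus according to claim 11, wherein the apparatus further comprises:
    a determining unit, configured to, before the first obtaining unit obtains the first sub-model of the secure tree model jointly trained with the second site, determine the data exchange authority with the second site; and/or determine the feature information in the first feature set and the feature information in the second feature set; and/or determine that an algorithm consensus has been reached with the second site.
  18. The apparatus according to claim 11, wherein the apparatus further comprises:
    a recording unit, configured to record the data exchanged with the second site during joint training with the second site.
  19. The apparatus according to claim 11, wherein the first risk includes a supervised risk, a supervised risk being one for which, after a user performs a first behavior, a label indicating whether the first behavior carries the first risk can be obtained; and the feature information also relates to users' behavior information.
  20. The apparatus according to claim 11, wherein the first risk includes an unsupervised risk, an unsupervised risk being one for which, after a user performs a second behavior, a label indicating whether the second behavior carries the first risk cannot be obtained;
    jointly training the secure tree model with the second site comprises:
    obtaining a first sample set for the first risk, the label of each sample in the first sample set being manually defined or determined based on the feature distribution of each feature in a high-risk feature set of the sample;
    using the first sample set to preliminarily train the secure tree model jointly with the second site, and re-determining the features contained in the high-risk feature set;
    using the feature distribution of each feature in the re-determined high-risk feature set to update the label of each sample in the first sample set;
    based on the updated labels, jointly training the secure tree model with the second site again.
  21. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1-10.
  22. A computing device, comprising a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method according to any one of claims 1-10 is implemented.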
PCT/CN2020/118006 2019-12-12 2020-09-27 Method and apparatus for multi-party joint risk identification WO2021114820A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911275611.6 2019-12-12
CN201911275611.6A CN111046425B (zh) 2019-12-12 2019-12-12 Method and apparatus for multi-party joint risk identification

Publications (1)

Publication Number Publication Date
WO2021114820A1 true WO2021114820A1 (zh) 2021-06-17

Family

ID=70236623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118006 WO2021114820A1 (zh) 2019-12-12 2020-09-27 Method and apparatus for multi-party joint risk identification

Country Status (3)

Country Link
CN (1) CN111046425B (zh)
TW (1) TWI798550B (zh)
WO (1) WO2021114820A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
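CN111046425B (zh) * 2019-12-12 2021-07-13 支付宝(杭州)信息技术有限公司 Method and apparatus for multi-party joint risk identification
CN112016788A (zh) * 2020-07-14 2020-12-01 北京淇瑀信息科技有限公司 Risk control policy generation and risk control method, apparatus, and electronic device
CN112150279A (zh) * 2020-10-10 2020-12-29 成都数融科技有限公司 Financial risk prediction method and prediction system based on multi-party computation
CN112199706B (zh) * 2020-10-26 2022-11-22 支付宝(杭州)信息技术有限公司 Training method and business prediction method for a tree model based on secure multi-party computation
CN112597379B (zh) * 2020-12-04 2023-09-01 光大科技有限公司 Data identification method and apparatus, storage medium, and electronic apparatus
CN112766977B (zh) * 2021-01-27 2022-06-28 支付宝(杭州)信息技术有限公司 Risk identification method, apparatus, and system
CN112966233A (zh) * 2021-02-23 2021-06-15 杭州安恒信息技术股份有限公司 Method and apparatus for detecting risky user operations, and computer device
CN112948883B (zh) * 2021-03-25 2023-10-31 支付宝(杭州)信息技术有限公司 Method, apparatus, and system for multi-party joint modeling that protects private data
CN114991746B (zh) * 2021-11-23 2024-01-19 中国石油天然气集团有限公司 Intelligent calibration method and system for drilling conditions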

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140279742A1 (en) * 2013-03-15 2014-09-18 Hewlett-Packard Development Company, L.P. Determining an obverse weight
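CN109861816A (zh) * 2019-02-22 2019-06-07 矩阵元技术(深圳)有限公司 Data processing method and apparatus
CN109902611A (zh) * 2019-02-22 2019-06-18 矩阵元技术(深圳)有限公司 Target certificate detection method and apparatus, and terminal device
CN109960936A (zh) * 2019-03-28 2019-07-02 吴道钰 Risk identification method for automated simulated service access by mobile terminals
CN110537191A (zh) * 2017-03-22 2019-12-03 维萨国际服务协会 Privacy-preserving machine learning
CN111046425A (zh) * 2019-12-12 2020-04-21 支付宝(杭州)信息技术有限公司 Method and apparatus for multi-party joint risk identification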

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
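CN103324954B (zh) * 2013-05-31 2017-02-08 中国科学院计算技术研究所 Tree-structure-based image classification method and system
US10764048B2 (en) * 2017-12-20 2020-09-01 Nxp B.V. Privacy-preserving evaluation of decision trees
CN108234463B (zh) * 2017-12-22 2021-02-02 杭州安恒信息技术股份有限公司 User risk assessment and analysis method based on a multi-dimensional behavior model
CN108366045B (zh) * 2018-01-02 2020-09-01 北京奇艺世纪科技有限公司 Method and apparatus for configuring a risk-control scorecard
CN108537269B (zh) * 2018-04-04 2022-03-25 中山大学 Weakly interactive deep learning method and system for object detection
CN109299728B (zh) * 2018-08-10 2023-06-27 深圳前海微众银行股份有限公司 Joint sample prediction method, system, and medium based on building a gradient tree model
CN109189825B (zh) * 2018-08-10 2022-03-15 深圳前海微众银行股份有限公司 Federated learning modeling method with horizontal data partitioning, server, and medium
CN109002861B (zh) * 2018-08-10 2021-11-09 深圳前海微众银行股份有限公司 Federated modeling method, device, and storage medium
CN109255247B (zh) * 2018-08-14 2020-08-14 阿里巴巴集团控股有限公司 Secure multi-party computation method and apparatus, and electronic device
TWM577148U (zh) * 2019-01-03 2019-04-21 兆豐金融控股股份有限公司 Electronic device for assessing financial risk
TWM583089U (zh) * 2019-04-09 2019-09-01 輔仁大學學校財團法人輔仁大學 Intelligent credit risk assessment system
CN110245510B (zh) * 2019-06-19 2021-12-07 北京百度网讯科技有限公司 Method and apparatus for predicting information
CN110309587B (zh) * 2019-06-28 2024-01-16 京东城市(北京)数字科技有限公司 Decision model construction method, decision method, and decision model
CN110427969B (zh) * 2019-07-01 2020-11-27 创新先进技术有限公司 Data processing method and apparatus, and electronic device
CN110378749B (zh) * 2019-07-25 2023-09-26 深圳前海微众银行股份有限公司 Client similarity evaluation method and apparatus, terminal device, and storage medium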

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
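CN113538127A (zh) * 2021-07-16 2021-10-22 四川新网银行股份有限公司 Method, system, device, and medium supporting simultaneous joint risk-control testing by multiple partners
CN113538127B (zh) * 2021-07-16 2023-06-23 四川新网银行股份有限公司 Method, system, device, and medium supporting simultaneous joint risk-control testing by multiple partners
CN114417388A (zh) * 2022-01-25 2022-04-29 云南电网有限责任公司信息中心 Power load forecasting method, system, device, and medium based on vertical federated learning
CN114417388B (zh) * 2022-01-25 2022-08-26 云南电网有限责任公司信息中心 Power load forecasting method, system, device, and medium based on vertical federated learning
CN116151627A (zh) * 2023-04-04 2023-05-23 支付宝(杭州)信息技术有限公司 Business risk control method and apparatus, storage medium, and electronic device
CN116151627B (zh) * 2023-04-04 2023-09-01 支付宝(杭州)信息技术有限公司 Business risk control method and apparatus, storage medium, and electronic device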

Also Published As

Publication number Publication date
TW202123124A (zh) 2021-06-16
CN111046425B (zh) 2021-07-13
TWI798550B (zh) 2023-04-11
CN111046425A (zh) 2020-04-21


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20898516

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20898516

Country of ref document: EP

Kind code of ref document: A1