WO2023024735A1 - Machine learning algorithm script compilation method and compiler for privacy protection - Google Patents

Machine learning algorithm script compilation method and compiler for privacy protection

Info

Publication number
WO2023024735A1
Authority
WO
WIPO (PCT)
Prior art keywords
privacy
algorithm
operator
algorithms
several
Prior art date
Application number
PCT/CN2022/105056
Other languages
English (en)
French (fr)
Inventor
郑龙飞
陈超超
王力
Original Assignee
支付宝(杭州)信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Priority to US18/571,351 (published as US20240281226A1)
Publication of WO2023024735A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. a local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/40: Transformation of program code
    • G06F 8/41: Compilation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/243: Classification techniques relating to the number of classes
    • G06F 18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/30: Creation or generation of source code
    • G06F 8/33: Intelligent editors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/048: Activation functions

Definitions

  • One or more embodiments of this specification relate to the field of machine learning, and in particular to a method for compiling privacy-preserving machine learning algorithm scripts and a corresponding compiler.
  • Machine learning has been applied to a wide variety of technical fields to analyze and predict various kinds of business data.
  • The data needed for machine learning often involves multiple platforms.
  • For example, in a machine-learning-based merchant classification scenario, the electronic payment platform holds the merchants' transaction flow data, the e-commerce platform stores the merchants' sales data, and the banking institution holds the merchants' loan data.
  • Data often exists in silos. Owing to issues such as industry competition, data security, and user privacy, data integration faces great resistance, and it is difficult to pool the data scattered across the various platforms to train machine learning models. There is therefore a need to develop privacy-preserving machine learning algorithms that can jointly train machine learning models, or use trained models for joint business prediction, on the premise that no party's private data is leaked.
  • One or more embodiments of this specification describe a compilation method and a compiler that can compile a description script, which describes the logic of an upper-layer machine learning algorithm, into security-algorithm execution code implementing each security operator with a specific privacy algorithm, so that developers can more easily develop privacy-preserving machine learning algorithms and development efficiency is improved.
  • According to a first aspect, a script compilation method executed by a compiler is provided, the method including: obtaining a description script written in a predetermined format, which defines at least the calculation formulas in a privacy-preserving machine learning algorithm; determining several privacy algorithms for executing the several operators involved in the formulas; obtaining several code modules for executing those privacy algorithms; and generating, based on the code modules, the program code corresponding to the description script.
  • In one embodiment, determining the several privacy algorithms for executing the several operators involved in the calculation formula specifically includes: parsing the calculation formula to determine the several operators; and determining the several privacy algorithms for executing them.
  • In a possible implementation, the description script also defines the privacy protection levels of several parameters involved in the calculation formula, and the several operators include a first operator; in this case, a first privacy algorithm for executing the first operator is determined according to the privacy protection level of the first parameter involved in it.
  • the privacy protection level includes: public parameters, a first privacy level visible only to the holder, and a second privacy level invisible to all participants.
  • Determining the first privacy algorithm may specifically include: determining a first algorithm list usable for executing the first operator; selecting from that list several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter; and selecting the first privacy algorithm from the several candidate algorithms.
  • In a possible implementation, the above method further includes: acquiring performance indicators of the target computing platform that runs the machine learning algorithm; the several operators include a first operator; in this case, a first privacy algorithm for executing the first operator is determined according to the performance indicators.
  • Determining the first privacy algorithm may specifically include: determining a first algorithm list usable for executing the first operator; and selecting from that list, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
  • The first privacy algorithm may also be determined according to both the privacy protection level of the first parameter involved in the first operator and the performance indicators of the target computing platform.
  • In that case, determining the first privacy algorithm may specifically include: determining a first algorithm list usable for executing the first operator; selecting from that list several candidate algorithms whose calculation parameters' privacy protection levels match that of the first parameter; and selecting from those candidates, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
  • In one scenario, the compiler runs on the target computing platform; in this case, the performance indicators can be obtained by reading a configuration file of the target computing platform.
  • In another scenario, the compiler runs on a third-party platform; in this case, the performance indicators sent by the target computing platform can be received.
  • Generating the program code corresponding to the description script may include: combining code segments of the several code modules according to the calculation logic of the calculation formula and incorporating them into the program code.
  • Alternatively, it may include: obtaining interface information of several interfaces formed by encapsulating the several code modules; and generating, according to the interface information, calling code that invokes those interfaces, which is incorporated into the program code.
  • a compiler comprising:
  • the description script acquisition unit is configured to acquire a description script written in a predetermined format, the description script at least defines the calculation formula in the privacy-protected machine learning algorithm;
  • a privacy algorithm determining unit configured to determine several privacy algorithms used to execute several operators involved in the calculation formula
  • a code module acquisition unit configured to acquire a number of code modules for executing the number of privacy algorithms
  • the program code generating unit is configured to generate program code corresponding to the description script based on the plurality of code modules.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed in a computer, the computer is caused to execute the method described in the first aspect.
  • a computing device including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of the first aspect is implemented.
  • a language adaptation layer is introduced between the machine learning algorithm layer and the security operator layer, and the language adaptation layer includes a compiler designed for the domain-specific language DSL.
  • Figure 1 is a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm;
  • Figure 2 is a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm in one embodiment;
  • Fig. 3 shows the flowchart of the compiling method according to one embodiment
  • Fig. 4 shows a schematic structural diagram of a compiler according to an embodiment.
  • Figure 1 is a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm.
  • The top layer is the machine learning algorithm layer, which defines a specific machine learning model and its training and/or inference process.
  • the above-mentioned machine learning model can be selected from, for example, a linear model, a logistic regression model, a decision tree model (such as GBDT), a deep neural network (DNN), a graph convolutional neural network (GCN), and the like.
  • the next layer is the security operator layer.
  • Security operators are the basic operations, abstracted from various machine learning algorithms, that require privacy protection, including secure matrix addition, secure matrix multiplication, secure numerical comparison, private set intersection (PSI), and so on.
  • Various machine learning algorithms can be decomposed into combinations of several security operators. For example, linear models and logistic regression models repeatedly use secure matrix multiplication and secure matrix addition; decision tree models repeatedly use secure numerical comparison; and so on.
  • The bottom layer is the cryptographic primitive layer, which contains the specific basic cryptographic techniques adopted to implement the operations of the security operators, such as secret sharing (SS), homomorphic encryption (HE), garbled circuits (GC), and oblivious transfer (OT).
  • a security operator can be implemented based on many different cryptographic primitives.
  • Secure numerical comparison can be implemented either with garbled circuits (which internally use oblivious transfer (OT) to exchange some data) or with secret sharing.
  • Secure matrix multiplication can be achieved by both secret sharing and homomorphic encryption.
  • Hereinafter, a specific implementation process or computation method that realizes a security operator based on cryptographic primitives is called a privacy algorithm. Since such computation methods generally involve multi-party computation, privacy algorithms are sometimes also called privacy computation protocols between multiple parties.
  • Fig. 2 shows a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm in an embodiment.
  • a language adaptation layer is introduced between the machine learning algorithm layer and the security operator layer, and the language adaptation layer includes a compiler designed for a domain specific language (DSL, Domain Specific Language) .
  • developers can directly use the above DSL to develop privacy-preserving machine learning algorithms, needing only to describe the logic of the machine learning algorithm to form a description script, without having to be aware of the underlying security operators.
  • the description script is compiled into a security algorithm execution code that uses a specific privacy algorithm to realize each security operator.
  • Developers thus do not need to pay attention to specific security operators and privacy algorithms; they only need to design the machine learning algorithm itself to finally obtain the execution code of a privacy-preserving machine learning algorithm, which greatly reduces development difficulty and improves development efficiency.
  • Fig. 3 shows a flowchart of a compiling method according to an embodiment, the compiling method is used for compiling a description script of a privacy-preserving machine learning algorithm.
  • the method is executed by a compiler, which can be deployed in any apparatus, device, platform, or device cluster with computing and processing capabilities.
  • the compiling method includes the following steps: step 31, obtaining a description script written in a predetermined format, which defines at least the calculation formulas in a privacy-preserving machine learning algorithm; step 32, determining several privacy algorithms for executing the several operators involved in the formulas; step 33, obtaining several code modules for executing the several privacy algorithms; and step 34, generating, based on the several code modules, the program code corresponding to the description script.
  • a description script written in a predetermined format is obtained. It can be understood that the description script is a script written by the developer according to the format required by the compiler to describe the privacy-protected machine learning algorithm.
  • the above predetermined format, or the format required by the compiler, forms a DSL in the field of privacy algorithms.
  • the description script of a privacy-preserving machine learning algorithm will at least define the parameters involved in the privacy-preserving machine learning algorithm and the calculation formula based on these parameters.
  • a privacy-preserving machine learning algorithm is currently being developed for joint training of a model between Party A and Party B.
  • For such an algorithm, the description script may define several parameters: X_A denotes the sample (e.g., user) features held by party A; W_A denotes the model parameters used to process X_A; X_B denotes the sample features held by party B; W_B denotes the model parameters used to process X_B; y denotes the predicted value; y' denotes the label value; G_A denotes the gradient for W_A; and G_B denotes the gradient for W_B. Each of these parameters is expressed as a matrix (the predicted value and the label value are generally vectors, which can be regarded as special matrices).
  • Based on these parameters, the script may define calculation formulas such as y = f1(W_A, X_A, W_B, X_B), G_A = f2(y, y', X_A), and G_B = f3(y, y', X_B); for a logistic regression model, f1(W_A, X_A, W_B, X_B) = sigmoid(X_A*W_A + X_B*W_B), and with a likelihood-based loss, f2(y, y', X_A) = (y - y')*X_A and f3(y, y', X_B) = (y - y')*X_B.
  • The gradient formulas above are only examples; the model training process may involve more formulas, such as those for updating parameters according to the gradients, which are not enumerated here one by one.
  • In one possible implementation, a compiler and its corresponding DSL have a preset privacy protection level; for example, it may be preset that all intermediate results and final outputs during algorithm execution are in a privacy-protected form invisible to all parties (such as encrypted ciphertext or secret-shared fragments), or that all intermediate results are privacy-protected while the final output is in plaintext, and so on.
  • the compiler and the corresponding DSL support developers to customize different privacy protection levels for each parameter in the algorithm.
  • developers can set different privacy protection levels for the above parameters X_A, X_B, W_A, W_B, and so on.
  • The privacy protection levels can be divided into the following three levels: Public, meaning public parameters visible to all participants; Private, meaning visible only to the holding party (which can be called the first privacy level); and Secret, meaning invisible to all participants (which can be called the second privacy level).
  • Under this division, developers can, for example, define the following privacy protection levels for the above parameters: Public for lr; Private for X_A and X_B; Secret for W_A and W_B.
  • lr represents the learning rate, which is a hyperparameter in model learning.
  • the privacy protection levels may also be divided in different manners, with more or fewer levels.
  • a third privacy level that is visible to some participants and invisible to some participants can also be added.
  • The compiler is developed by technicians familiar with cryptography and privacy-preserving algorithms. To realize the compilation functions above, the compiler is pre-configured with the correspondence between security operators and privacy algorithms, as well as implementation code for the various privacy algorithms. The compiler then uses this correspondence and implementation code to compile and convert the description script through steps 32 to 34.
  • In step 32, the compiler parses the description script, decomposing its calculation formulas into combinations of several operators; then, for each operator, it determines the privacy algorithm used to execute that operator.
  • For example, the operation X_A*W_A + X_B*W_B can be parsed and split into: computing X_A*W_A and X_B*W_B with the secure matrix multiplication operator, obtaining two result matrices; then computing the sum of those two result matrices with the secure matrix addition operator.
  • each calculation formula in the description script can be parsed and split into a combination of several operators.
  • The compiler is configured with a correspondence between security operators and privacy algorithms, which records the privacy algorithms usable to implement each operator. Based on this correspondence, the compiler can determine the corresponding privacy algorithm for each parsed operator.
  • an operator can be implemented through a variety of specific privacy algorithms.
  • some operators may have multiple corresponding privacy algorithms, forming a list of privacy algorithms.
  • Suppose a certain operator parsed from a certain calculation formula, hereinafter referred to as the first operator (for example, a matrix multiplication operator), has multiple privacy algorithms in the correspondence configured in the compiler.
  • the compiler can select the privacy algorithm that best matches the current requirement from the various privacy algorithms, and use it as the execution algorithm of the first operator, which is hereinafter referred to as the first privacy algorithm.
  • the compiler has a preset privacy protection level, and correspondingly, each pre-configured privacy algorithm has a matching privacy protection capability.
  • one of multiple privacy algorithms capable of realizing the operator may be randomly selected as the first privacy algorithm.
  • Executing the various privacy algorithms consumes different amounts of resources, for example different amounts of communication and computation.
  • Accordingly, for each privacy algorithm, the compiler records the resource requirements for executing it.
  • the privacy algorithm of the first operator can be selected according to the performance of the target computing platform that will run the machine learning algorithm.
  • The performance indicators of the target computing platform running the machine learning algorithm can be obtained; they can include communication performance indicators, such as network bandwidth and network card configuration, and computing performance indicators, such as CPU configuration and memory configuration.
  • The compiler may determine, based on the aforementioned correspondence, a first algorithm list usable for executing the first operator, and then select from that list, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
  • For example, the resource requirements of a certain privacy algorithm may indicate that executing it requires n rounds of two-party communication, m instances of a certain basic operation, and so on. Based on this, the time a computing platform with the given performance indicators would need to execute the privacy algorithm can be estimated. When that time is within a certain range, for example below a threshold, the algorithm's resource requirements are considered to match the performance indicators, and it is determined as the first privacy algorithm.
  • Other matching methods can also be used, for example matching communication performance and computing performance separately and then determining a combined degree of match, and so on.
  • In short, by comparing the target platform's performance indicators with each privacy algorithm's resource requirements, a privacy algorithm that matches in computational performance can be determined.
  • the above-mentioned target computing platform for running the machine learning algorithm may be the same as or different from the platform where the compiler is located.
  • the compiler itself also runs on the target computing platform.
  • the compiler can read the configuration file of the target computing platform, so as to obtain the above performance indicators.
  • the compiler runs on a third-party platform, which may be called a compilation platform.
  • After developers develop a machine learning algorithm for the target computing platform and form a description script, they can send the script together with the platform's performance indicators to the compilation platform. The compiler thus receives the performance indicators sent by the target computing platform and then selects privacy algorithms according to them.
  • the compiler allows developers to customize a different privacy protection level for each parameter in the algorithm.
  • Correspondingly, for each privacy algorithm, the compiler records the privacy protection levels of its calculation parameters.
  • the compiler determines the first privacy algorithm for executing the first operator according to the privacy protection level of the first parameter involved in the first operator.
  • The compiler can determine the first parameter involved in the first operator by parsing the calculation formula and, combined with the parameter privacy levels customized in the description script, determine the privacy protection level of the first parameter.
  • The compiler can determine the first algorithm list usable for executing the first operator, then select from it several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter, and finally select one of these candidates as the first privacy algorithm.
  • Suppose the first operator is the matrix multiplication operator for calculating X_A*W_A, where the first parameters involved include X_A and W_A.
  • Suppose the privacy protection levels are divided into three levels, with X_A at level Private and W_A at level Secret.
  • In one example, the privacy algorithms configured in the compiler for executing the matrix multiplication operator include Algorithms 1 to 5 shown in Table 1, which can serve as an example of the first algorithm list.
  • Table 1: List of algorithms used to compute matrix multiplication
  • Since X_A*W_A is to be computed, and the privacy protection levels of the first parameters X_A and W_A are Private and Secret respectively, the algorithms whose calculation parameters (U and V) have matching privacy protection levels are Algorithm 3 and Algorithm 5, so these two can serve as candidate algorithms.
  • the compiler selects one of the candidate algorithms as the first privacy algorithm for executing the operator.
  • In one embodiment, the compiler picks any one of the candidate algorithms as the first privacy algorithm.
  • In another embodiment, the privacy algorithm of the first operator is selected from the candidate algorithms by further combining the performance indicators of the target computing platform running the machine learning algorithm.
  • In this embodiment, the compiler also obtains the performance indicators of the target computing platform; after the candidate algorithms are determined as described above, it selects from them, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
  • For the content and acquisition of the performance indicators, and for how resource requirements are matched against them, refer to the previous embodiments; details are not repeated here.
  • In this way, an applicable privacy algorithm can be determined for each operator involved in the calculation formulas.
  • Next, in step 33, the code modules for executing the privacy algorithms determined above are acquired.
  • These code modules can be developed in advance by those skilled in cryptography. Therefore, in step 34, the program code corresponding to the description script may be generated based on these code modules.
  • In one embodiment, the code segments of the code modules corresponding to the operators can be combined, following the calculation logic of the formulas, to form the program code.
  • Program code formed in this way contains the implementation body of each operator's code.
  • In another embodiment, each of the above code modules may also be encapsulated in advance to form an interface, also called an interface function.
  • Each interface has corresponding interface information, including, for example, the function name of the interface function, the number of parameters, the type of parameters, and so on.
  • the interface information of the interface corresponding to each operator may be obtained, and a calling code for calling the corresponding interface is generated according to the interface information, and the calling code is included in the generated program code.
  • In that case, the program code need not contain each operator's implementation body; instead, it invokes the corresponding implementation through the interfaces.
  • the program code corresponding to the description script is generated.
  • The generated program code and the pre-developed code modules implementing the various privacy algorithms use the same programming language.
  • The program code can be high-level language code, such as Java or C, or intermediate code between a high-level language and machine language, such as assembly code or bytecode.
  • the code language and code form are not limited here.
  • Unlike a conventional compiler, which compiles high-level language code into lower-level code convenient for machine execution, the compiler in the embodiments of this specification compiles a description script describing the logic of an upper-layer machine learning algorithm into security-algorithm execution code implementing each security operator with a specific privacy algorithm. In this way, developers do not need to pay attention to specific security operators and privacy algorithms, but only need to design the machine learning algorithm to finally obtain the execution code of a privacy-preserving machine learning algorithm, reducing development difficulty and improving development efficiency.
  • a compiler for compiling a script of a privacy-preserving machine learning algorithm.
  • Fig. 4 shows a schematic structural diagram of a compiler according to an embodiment, and the compiler can be deployed in any device, platform or device cluster with data storage, computing and processing capabilities. As shown in Figure 4, the compiler 400 includes:
  • the description script obtaining unit 41 is configured to obtain a description script written in a predetermined format, the description script at least defines the calculation formula in the privacy-protected machine learning algorithm;
  • the privacy algorithm determining unit 42 is configured to determine several privacy algorithms used to execute the several operators involved in the calculation formula
  • a code module obtaining unit 43 configured to obtain several code modules for executing the several privacy algorithms
  • the program code generation unit 44 is configured to generate the program code corresponding to the description script based on the several code modules.
  • the privacy algorithm determining unit 42 includes (not shown):
  • An operator analysis module configured to analyze the calculation formula and determine the plurality of operators
  • the algorithm determining module is configured to determine several privacy algorithms for executing the several operators.
  • According to a possible implementation, the description script also defines the privacy protection levels of several parameters involved in the calculation formula; the several operators include a first operator; in this case, the privacy algorithm determining unit 42 may be configured to determine a first privacy algorithm for executing the first operator according to the privacy protection level of the first parameter involved in it.
  • the privacy protection level may include: public parameters, a first privacy level visible only to the holder, and a second privacy level invisible to all participants.
  • In a specific embodiment, the privacy algorithm determining unit 42 is specifically configured to: determine a first algorithm list usable for executing the first operator; select from it several candidate algorithms whose calculation parameters' privacy protection levels match that of the first parameter; and select the first privacy algorithm from those candidates.
  • In a possible implementation, the above compiler 400 further includes a performance indicator acquisition unit (not shown), configured to acquire the performance indicators of the target computing platform running the machine learning algorithm; the several operators include a first operator; in this case, the privacy algorithm determining unit 42 may be configured to determine, according to the performance indicators, a first privacy algorithm for executing the first operator.
  • Further, the privacy algorithm determining unit 42 may be specifically configured to: determine a first algorithm list usable for executing the first operator; and select from that list, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
  • The privacy algorithm determining unit 42 may also be configured to determine the first privacy algorithm according to the privacy protection level of the first parameter involved in the first operator and the performance indicators of the target computing platform.
  • Specifically, the privacy algorithm determining unit 42 may perform the following steps: determine a first algorithm list usable for executing the first operator; select from it several candidate algorithms whose calculation parameters' privacy protection levels match that of the first parameter; and select from those candidates, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
  • In one implementation scenario, the compiler 400 runs on the target computing platform.
  • In this case, the performance indicator acquisition unit is configured to read the configuration file of the target computing platform to obtain the performance indicators.
  • In another implementation scenario, the compiler 400 runs on a third-party platform; in this case, the performance indicator acquisition unit is configured to receive the performance indicators sent by the target computing platform.
  • In one embodiment, the program code generating unit 44 is configured to: combine the code segments of the several code modules according to the calculation logic of the calculation formula and incorporate them into the program code.
  • In another embodiment, the program code generating unit 44 is configured to: obtain interface information of several interfaces formed by encapsulating the several code modules; and generate, according to the interface information, calling code that invokes those interfaces, incorporating it into the program code.
  • the description script describing the logic of the upper-layer machine learning algorithm can be compiled into a security algorithm execution code that implements each security operator using a specific privacy algorithm, thereby simplifying the developer's development process.
  • a computer-readable storage medium on which a computer program is stored.
  • the computer program is executed in a computer, the computer is instructed to execute the method described in conjunction with FIG. 3 .
  • a computing device including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method described in conjunction with FIG. 3 is implemented.
  • the functions described in the present invention may be implemented by hardware, software, firmware or any combination thereof.
  • the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Bioethics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Devices For Executing Special Programs (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of this specification provide a compilation method and a compiler for compiling privacy-preserving machine learning algorithm scripts. According to the compilation method, the compiler obtains a description script written in a predetermined format, which defines at least the calculation formulas in a privacy-preserving machine learning algorithm. The compiler then determines several privacy algorithms for executing the several operators involved in the calculation formulas; next, it obtains several code modules for executing the several privacy algorithms; and based on the several code modules, it generates the program code corresponding to the description script.

Description

Machine learning algorithm script compilation method and compiler for privacy protection
This application claims priority to Chinese patent application No. 202110984175.0, entitled "Machine learning algorithm script compilation method and compiler for privacy protection", filed with the China National Intellectual Property Administration on August 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
One or more embodiments of this specification relate to the field of machine learning, and in particular to a method for compiling privacy-preserving machine learning algorithm scripts and a corresponding compiler.
Background
With the development of computer technology, machine learning has been applied to a wide variety of technical fields to analyze and predict various kinds of business data. The data needed for machine learning often involves multiple platforms. For example, in a machine-learning-based merchant classification scenario, the electronic payment platform holds merchants' transaction flow data, the e-commerce platform stores their sales data, and banking institutions hold their loan data. Data often exists in silos. Owing to industry competition, data security, and user privacy concerns, data integration faces great resistance, and pooling the data scattered across the platforms to train machine learning models is hard to achieve. Hence the need to develop privacy-preserving machine learning algorithms, which allow the parties to jointly train machine learning models, or to use trained models for joint business prediction, while ensuring that no party's private data is leaked.
To develop a privacy-preserving machine learning algorithm, a developer must both design the upper-layer machine learning algorithm and, at the same time, be familiar with the underlying privacy computation processes of the various operators. This places very high demands on the developer and makes implementation very difficult.
An improved solution is therefore desired that would make it easier for developers to build privacy-preserving machine learning algorithms, and thus easier for the platforms to carry out privacy-preserving joint machine learning.
Summary
One or more embodiments of this specification describe a compilation method and a compiler that can compile a description script, which describes the logic of an upper-layer machine learning algorithm, into security-algorithm execution code implementing each security operator with a specific privacy algorithm, making it easier for developers to build privacy-preserving machine learning algorithms and improving development efficiency.
According to a first aspect, a script compilation method executed by a compiler is provided, the method including:
obtaining a description script written in a predetermined format, the description script defining at least the calculation formulas in a privacy-preserving machine learning algorithm;
determining several privacy algorithms for executing the several operators involved in the calculation formulas;
obtaining several code modules for executing the several privacy algorithms;
generating, based on the several code modules, the program code corresponding to the description script.
In one embodiment, determining the several privacy algorithms for executing the several operators involved in the calculation formulas specifically includes: parsing the calculation formulas to determine the several operators; and determining the several privacy algorithms for executing them.
In a possible implementation, the description script also defines the privacy protection levels of several parameters involved in the calculation formulas; the several operators include a first operator; in this case, a first privacy algorithm for executing the first operator can be determined according to the privacy protection level of the first parameter involved in the first operator.
Further, in one embodiment, the privacy protection levels include: public parameters, a first privacy level visible only to the holding party, and a second privacy level invisible to all participants.
In one embodiment, determining the first privacy algorithm may specifically include: determining a first algorithm list usable for executing the first operator; selecting from the first algorithm list several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter; and selecting the first privacy algorithm from the several candidate algorithms.
In a possible implementation, the above method further includes: acquiring performance indicators of the target computing platform that runs the machine learning algorithm; the several operators include a first operator; in this case, a first privacy algorithm for executing the first operator can be determined according to the performance indicators.
Further, in one embodiment, determining the first privacy algorithm may specifically include: determining a first algorithm list usable for executing the first operator; and selecting from the first algorithm list, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
In a possible embodiment, the first privacy algorithm may also be determined according to the privacy protection level of the first parameter involved in the first operator and the performance indicators of the target computing platform.
Still further, determining the first privacy algorithm may specifically include: determining a first algorithm list usable for executing the first operator; selecting from the first algorithm list several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter; and selecting from the several candidate algorithms, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
In one implementation scenario, the compiler runs on the target computing platform; in this case, the performance indicators can be obtained by reading a configuration file of the target computing platform.
In another implementation scenario, the compiler runs on a third-party platform; in this case, the performance indicators sent by the target computing platform can be received.
In a possible implementation, generating the program code corresponding to the description script may include: combining code segments of the several code modules according to the calculation logic of the calculation formulas and incorporating them into the program code.
In another possible implementation, generating the program code corresponding to the description script may include: obtaining interface information of several interfaces formed by encapsulating the several code modules; and generating, according to the interface information, calling code that invokes the several interfaces, incorporating it into the program code.
According to a second aspect, a compiler is provided, including:
a description script acquisition unit, configured to obtain a description script written in a predetermined format, the description script defining at least the calculation formulas in a privacy-preserving machine learning algorithm;
a privacy algorithm determining unit, configured to determine several privacy algorithms for executing the several operators involved in the calculation formulas;
a code module acquisition unit, configured to obtain several code modules for executing the several privacy algorithms;
a program code generating unit, configured to generate, based on the several code modules, the program code corresponding to the description script.
According to a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method of the first aspect.
According to a fourth aspect, a computing device is provided, including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of the first aspect is implemented.
In the embodiments of this specification, a language adaptation layer is introduced between the machine learning algorithm layer and the security operator layer; the language adaptation layer includes a compiler designed for a domain-specific language (DSL). Developers can thus use the DSL directly to develop privacy-preserving machine learning algorithms, needing only to describe the logic of the machine learning algorithm to form a description script, without any awareness of the underlying security operators. The compiler then compiles the description script into security-algorithm execution code that implements each security operator with a specific privacy algorithm. In this way, developers need not pay attention to specific security operators and privacy algorithms; they only need to design the machine learning algorithm itself to finally obtain the execution code of a privacy-preserving machine learning algorithm, which greatly reduces development difficulty and improves development efficiency.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings based on them without creative effort.
Figure 1 is a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm;
Figure 2 is a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm in one embodiment;
Figure 3 shows a flowchart of a compilation method according to one embodiment;
Figure 4 shows a schematic structural diagram of a compiler according to one embodiment.
Detailed Description
The solutions provided in this specification are described below with reference to the drawings.
Figure 1 is a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm. The top layer is the machine learning algorithm layer, which defines a specific machine learning model and its training and/or inference process. The machine learning model may be selected from, for example, a linear model, a logistic regression model, a decision tree model (such as GBDT), a deep neural network (DNN), a graph convolutional network (GCN), and so on.
The next layer down is the security operator layer. Security operators are the basic operations, abstracted from the various machine learning algorithms, that require privacy protection, including secure matrix addition, secure matrix multiplication, secure numerical comparison, private set intersection (PSI), and so on. The various machine learning algorithms can be decomposed into combinations of several security operators. For example, linear models and logistic regression models repeatedly use secure matrix multiplication and secure matrix addition; decision tree models repeatedly use secure numerical comparison; and so on.
The bottom layer is the cryptographic primitive layer, which contains the specific basic cryptographic techniques adopted to implement the operations of the security operators, including, for example, secret sharing (SS), homomorphic encryption (HE), garbled circuits (GC), and oblivious transfer (OT).
It should be understood that one security operator can be implemented based on several different cryptographic primitives. For example, secure numerical comparison can be implemented with garbled circuits (which internally use oblivious transfer (OT) to exchange some data) or with secret sharing. Secure matrix multiplication can be implemented with either secret sharing or homomorphic encryption. Even when a security operator is implemented on the same cryptographic primitive, there may be several different concrete implementation processes. For example, when implementing secure matrix addition based on secret sharing, the two parties holding the matrices to be added may operate on matrix shares directly, or have a trusted third party operate on the shares, and in the end obtain the sum matrix in plaintext, or each obtain one share of the sum matrix, and so on.
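As a concrete illustration of the last example, the following is a minimal sketch of secure matrix addition via additive secret sharing. It is written for this text rather than taken from the patent; the two-party setting, the ring modulus, and all names are assumptions.

    import numpy as np

    MOD = 2 ** 32  # shares live in a finite ring; this modulus is an assumption

    def share(x, rng):
        """Split a matrix into two additive shares with x = (s0 + s1) mod MOD."""
        s0 = rng.integers(0, MOD, size=x.shape, dtype=np.uint64)
        s1 = (x.astype(np.uint64) - s0) % MOD
        return s0, s1

    def add_shares(a_share, b_share):
        """Each party adds its shares of A and B locally; nothing is exchanged."""
        return (a_share + b_share) % MOD

    # Toy run: party 0 holds A, party 1 holds B; neither sees the other's matrix.
    rng = np.random.default_rng(0)
    A = np.arange(4).reshape(2, 2)
    B = np.ones((2, 2), dtype=np.int64)
    a0, a1 = share(A, rng)      # A's shares, one per party
    b0, b1 = share(B, rng)      # B's shares, one per party
    c0 = add_shares(a0, b0)     # computed locally by party 0
    c1 = add_shares(a1, b1)     # computed locally by party 1
    assert np.array_equal((c0 + c1) % MOD, ((A + B) % MOD).astype(np.uint64))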
Hereinafter, a specific implementation process or concrete computation method that realizes a security operator based on cryptographic primitives is called a privacy algorithm. Since such computation methods generally involve multi-party computation, privacy algorithms are sometimes also called privacy computation protocols between multiple parties.
Given the implementation levels of privacy-preserving machine learning in Figure 1, one can see that a developer who wants to build a privacy-preserving machine learning algorithm for a particular technical scenario must not only understand the various machine learning algorithms, so as to design a suitable upper-layer algorithm for the scenario, but also be familiar with the various lower-layer privacy algorithms implementing the security operators, so as to develop, from top to bottom, a complete body of code logic realizing privacy-preserving machine learning for the current algorithm requirements. Because specific security operators and privacy algorithms must be understood, and these depend on highly specialized cryptographic techniques, developing privacy-preserving machine learning algorithms is extremely difficult and inefficient.
To this end, the embodiments of this specification propose a solution that introduces a new compiler and compilation method to make it easier for developers to build privacy-preserving machine learning algorithms.
Figure 2 is a schematic diagram of the implementation levels of a privacy-preserving machine learning algorithm in one embodiment. Compared with Figure 1, a language adaptation layer is introduced between the machine learning algorithm layer and the security operator layer; it includes a compiler designed for a domain-specific language (DSL). Developers can thus use the DSL directly to develop privacy-preserving machine learning algorithms, needing only to describe the logic of the machine learning algorithm to form a description script, without any awareness of the underlying security operators. The compiler then compiles the description script into security-algorithm execution code that implements each security operator with a specific privacy algorithm. In this way, developers need not pay attention to specific security operators and privacy algorithms; they only need to design the machine learning algorithm itself to finally obtain the execution code of a privacy-preserving machine learning algorithm, which greatly reduces development difficulty and improves development efficiency.
The compilation method realizing the above functions, and the compiler thus realized, are described in detail below.
Figure 3 shows a flowchart of a compilation method according to one embodiment; the method is used to compile the description script of a privacy-preserving machine learning algorithm. The method is executed by a compiler, which can be deployed in any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in Figure 3, the compilation method includes the following steps: step 31, obtaining a description script written in a predetermined format, which defines at least the calculation formulas in a privacy-preserving machine learning algorithm; step 32, determining several privacy algorithms for executing the several operators involved in the formulas; step 33, obtaining several code modules for executing the several privacy algorithms; and step 34, generating, based on the several code modules, the program code corresponding to the description script. The specific execution of each step is described in detail below.
First, in step 31, a description script written in a predetermined format is obtained. It can be understood that the description script is a script written by the developer, in the format required by the compiler, to describe the privacy-preserving machine learning algorithm. This predetermined format, that is, the format required by the compiler, forms a DSL for the privacy-algorithm domain.
In general, the description script of a privacy-preserving machine learning algorithm will define at least the parameters involved in the privacy-preserving machine learning algorithm and the calculation formulas computed over those parameters.
For example, in one embodiment, the privacy-preserving machine learning algorithm to be developed is used to jointly train a model between party A and party B. For such a machine learning algorithm, the description script may define several parameters, for example: X_A denotes the sample (e.g., user) features held by party A; W_A denotes the model parameters used to process X_A; X_B denotes the sample features held by party B; W_B denotes the model parameters used to process X_B; y denotes the predicted value; y' denotes the label value; G_A denotes the gradient for W_A; G_B denotes the gradient for W_B; and so on. Each of these parameters is expressed as a matrix (the predicted value and the label value are generally vectors, which can be regarded as special matrices).
Based on these parameters, the description script can define the following calculation formulas:
y = f1(W_A, X_A, W_B, X_B)     (1)
G_A = f2(y, y', X_A)     (2)
G_B = f3(y, y', X_B)     (3)
More specifically, when the above model is a logistic regression model, the function f1 in formula (1) is specifically:
f1(W_A, X_A, W_B, X_B) = sigmoid(X_A*W_A + X_B*W_B)     (4)
When the developer adopts a likelihood-based form of loss function, the gradient formulas can be specialized as:
f2(y, y', X_A) = (y - y')*X_A     (5)
f3(y, y', X_B) = (y - y')*X_B     (6)
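To make formulas (1) to (6) concrete, here is a small plaintext check of one gradient step on random toy data. It is written for this text only; in the privacy-preserving system these values would never appear in the clear, and the transposition in the gradient lines is an assumption made so that the matrix shapes work out.

    import numpy as np

    rng = np.random.default_rng(42)
    n, dA, dB = 8, 3, 2                        # 8 samples; A holds 3 features, B holds 2
    X_A = rng.normal(size=(n, dA))             # sample features held by party A
    X_B = rng.normal(size=(n, dB))             # sample features held by party B
    W_A = rng.normal(size=(dA, 1))             # model parameters for X_A
    W_B = rng.normal(size=(dB, 1))             # model parameters for X_B
    y_label = rng.integers(0, 2, size=(n, 1))  # label values y'

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    y = sigmoid(X_A @ W_A + X_B @ W_B)         # formula (4): joint prediction
    G_A = X_A.T @ (y - y_label)                # formula (5): gradient for W_A
    G_B = X_B.T @ (y - y_label)                # formula (6): gradient for W_B
    lr = 0.1                                   # the Public hyperparameter lr
    W_A = W_A - lr * G_A                       # parameter update mentioned in the text
    W_B = W_B - lr * G_B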
It can be understood that the forms of the formulas above are only examples. When other models are used, such as linear models or tree models, the training process will employ other forms of formulas. Further, only the gradient formulas are shown above as examples; the model training process may involve more formulas, such as those for updating the parameters according to the gradients, which are not enumerated one by one here.
In one possible implementation, a compiler and its corresponding DSL have a preset privacy protection level. For example, it may be preset that all intermediate results and final outputs during algorithm execution are in a privacy-protected form invisible to every party (such as encrypted ciphertext or secret-shared fragments); or that all intermediate results are in privacy-protected form while the final output is in plaintext; and so on. Developers can then choose the compiler corresponding to the privacy protection level their machine learning algorithm requires.
In another possible implementation, the compiler and its corresponding DSL allow developers to customize a different privacy protection level for each parameter in the algorithm. Continuing the example of the joint model training algorithm between party A and party B, the developer can set different privacy protection levels for the parameters X_A, X_B, W_A, W_B, and so on.
In one embodiment, the privacy protection levels can be divided into the following three levels: Public, meaning public parameters visible to all participants; Private, meaning visible only to the holding party (which can be called the first privacy level); and Secret, meaning invisible to all participants (which can be called the second privacy level). Under this division, the developer may, for example, define the following privacy protection levels for the parameters above:
Public: lr
Private: X_A, X_B
Secret: W_A, W_B
where lr denotes the learning rate, a hyperparameter in model learning.
In other embodiments, the privacy protection levels may also be divided differently, with more or fewer levels. For example, in addition to the three levels above, a third privacy level, visible to some participants and invisible to others, may be added.
From the above it can be understood that the developer only needs to describe, through the description script, the algorithm logic of the machine learning algorithm: which parameters are involved (parameter definitions) and what operations are performed on them (calculation formulas); optionally, the privacy protection level of each parameter can also be defined. The developer needs no cryptographic expertise and need not care how this algorithm logic is realized with the various cryptographic primitives; the description script is simply fed into the compiler, which converts it into a concrete privacy-algorithm implementation.
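For illustration only, the sketch below shows what such a description script could look like for the two-party logistic regression example, rendered as a small self-contained Python structure. The patent fixes no concrete DSL syntax, so the Param class, the level constants, and the layout are invented here; only the parameters, levels, and formulas come from the example above.

    from dataclasses import dataclass
    from typing import Optional

    PUBLIC, PRIVATE, SECRET = "Public", "Private", "Secret"

    @dataclass
    class Param:
        name: str
        level: str              # privacy protection level
        holder: Optional[str]   # which party holds the value, if any

    # Parameter definitions with customized privacy protection levels:
    params = [
        Param("lr",  PUBLIC,  None),  # learning rate, a plaintext hyperparameter
        Param("X_A", PRIVATE, "A"),   # sample features held by party A
        Param("X_B", PRIVATE, "B"),
        Param("W_A", SECRET,  "A"),   # model parameters, invisible to all parties
        Param("W_B", SECRET,  "B"),
    ]

    # Calculation formulas, stated as text the compiler would parse:
    formulas = [
        "y   = sigmoid(X_A * W_A + X_B * W_B)",  # formula (4)
        "G_A = (y - y_label) * X_A",             # formula (5)
        "G_B = (y - y_label) * X_B",             # formula (6)
        "W_A = W_A - lr * G_A",                  # update formulas mentioned in the text
        "W_B = W_B - lr * G_B",
    ]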
The compiler is developed by technicians familiar with cryptographic techniques and privacy-preserving algorithms. To realize the compilation functions above, the compiler is pre-configured with a correspondence between security operators and privacy algorithms, as well as implementation code for the various privacy algorithms. The compiler then uses this correspondence and implementation code to compile and convert the description script through steps 32 to 34.
Specifically, after receiving the DSL description script written by the developer of the machine learning algorithm, in step 32 the compiler parses the description script, decomposing its calculation formulas into combinations of several operators; then, for each operator, it determines the privacy algorithm used to execute that operator.
For example, for the formula shown in equation (4), the operation X_A*W_A + X_B*W_B can be parsed and split into: computing X_A*W_A and X_B*W_B with the secure matrix multiplication operator, obtaining two result matrices; then computing the sum of those two result matrices with the secure matrix addition operator. In this way, every calculation formula in the description script can be parsed and split into a combination of several operators.
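The sketch below illustrates this kind of decomposition using Python's ast module; the operator names and the temporary-naming scheme are assumptions made for this text.

    import ast

    # Map binary operators in the formula onto security operator names (our naming).
    OP_NAME = {ast.Mult: "secure_matmul", ast.Add: "secure_matadd", ast.Sub: "secure_matsub"}

    def to_operators(node, ops):
        """Post-order walk: children first, so a formula becomes a flat operator list."""
        if isinstance(node, ast.BinOp):
            left = to_operators(node.left, ops)
            right = to_operators(node.right, ops)
            result = f"t{len(ops)}"               # fresh temporary for this node
            ops.append((OP_NAME[type(node.op)], left, right, result))
            return result
        if isinstance(node, ast.Name):
            return node.id
        raise ValueError(f"unsupported expression node: {node!r}")

    ops = []
    expr = ast.parse("X_A * W_A + X_B * W_B", mode="eval").body
    to_operators(expr, ops)
    # ops is now:
    #   ('secure_matmul', 'X_A', 'W_A', 't0')
    #   ('secure_matmul', 'X_B', 'W_B', 't1')
    #   ('secure_matadd', 't0', 't1', 't2')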
It can be understood that every kind of operator can be implemented with cryptographic primitives through certain privacy algorithms or privacy computation protocols. To this end, the compiler is configured with a correspondence between security operators and privacy algorithms, which records the privacy algorithms usable to implement each operator. Based on this correspondence, the compiler can determine the corresponding privacy algorithm for each parsed operator.
As mentioned above, one kind of operator can be implemented through several specific privacy algorithms. Accordingly, in the configured correspondence, some operators may have multiple corresponding privacy algorithms, forming a privacy algorithm list. Suppose that a certain operator parsed from a certain formula, hereinafter called the first operator (for example, a matrix multiplication operator), has multiple privacy algorithms in the correspondence configured in the compiler. In that case, the compiler can select, from these privacy algorithms, the one that best matches the current requirements as the execution algorithm of the first operator, hereinafter called the first privacy algorithm.
According to one possible implementation, the compiler has a preset privacy protection level, and correspondingly each pre-configured privacy algorithm has a matching privacy protection capability. In that case, in one embodiment, for the first operator, one of the multiple privacy algorithms capable of realizing it may be selected at random as the first privacy algorithm.
In another embodiment, executing the various privacy algorithms consumes different amounts of resources, for example different amounts of communication and computation. Accordingly, for each privacy algorithm the compiler records the resource requirements for executing it. In that case, the privacy algorithm of the first operator can be selected according to the performance of the target computing platform that will run the machine learning algorithm. Specifically, performance indicators of the target computing platform running the machine learning algorithm can be obtained; these can include communication performance indicators, such as network bandwidth and network card configuration, and computing performance indicators, such as CPU configuration and memory configuration. The first privacy algorithm for executing the first operator is then determined according to these performance indicators. Specifically, the compiler can determine, based on the aforementioned correspondence, a first algorithm list usable for executing the first operator, and then select from that list, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators.
For example, the resource requirements of a certain privacy algorithm may indicate that executing it requires n rounds of two-party communication, m instances of a certain basic operation, and so on. Based on this, the time a computing platform with the given performance indicators would need to execute the privacy algorithm can be estimated. When that time is within a certain range, for example below a threshold, the algorithm's resource requirements are considered to match the performance indicators, and the algorithm is determined as the first privacy algorithm. Of course, other matching methods can also be used, for example matching communication performance and computing performance separately and then determining a combined degree of match, and so on. In short, by comparing the target computing platform's performance indicators with each privacy algorithm's resource requirements, a privacy algorithm that matches in computational performance can be determined.
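A minimal sketch of this time-estimate matching follows; the patent specifies no concrete cost model, so the fields, the estimation formula, and the numbers are all invented.

    from dataclasses import dataclass

    @dataclass
    class ResourceNeeds:
        comm_rounds: int    # n rounds of two-party communication
        comm_bytes: int     # payload per round
        basic_ops: int      # m instances of some basic operation

    @dataclass
    class PlatformPerf:
        bandwidth_bps: float  # from the network bandwidth / network card configuration
        latency_s: float
        ops_per_s: float      # from the CPU configuration

    def estimated_time(alg, perf):
        """Rough wall-clock estimate: communication time plus computation time."""
        comm = alg.comm_rounds * (perf.latency_s + alg.comm_bytes / perf.bandwidth_bps)
        comp = alg.basic_ops / perf.ops_per_s
        return comm + comp

    def matches(alg, perf, threshold_s):
        """An algorithm matches when its estimated running time is below the threshold."""
        return estimated_time(alg, perf) < threshold_s

    candidates = {"alg_a": ResourceNeeds(4, 1 << 20, 10 ** 7),
                  "alg_b": ResourceNeeds(1, 1 << 26, 10 ** 9)}
    perf = PlatformPerf(bandwidth_bps=1e9, latency_s=0.01, ops_per_s=1e9)
    chosen = next((n for n, a in candidates.items() if matches(a, perf, 0.5)), None)
    # chosen == "alg_a" under these invented numbers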
In different implementation scenarios, the target computing platform that runs the machine learning algorithm may or may not be the same platform the compiler is on. Specifically, in one scenario, the compiler itself also runs on the target computing platform. In that case, the compiler can read the configuration file of the target computing platform to obtain the performance indicators. In another scenario, the compiler runs on a third-party platform, which may be called the compilation platform. After developers develop a machine learning algorithm for the target computing platform and form a description script, they can send the script, together with the performance indicators of the target computing platform, to the compilation platform. The compiler thus receives the performance indicators sent by the target computing platform and then selects privacy algorithms according to them.
According to one possible implementation, as mentioned above, the compiler allows developers to customize a different privacy protection level for each parameter in the algorithm. Correspondingly, for each privacy algorithm the compiler records the privacy protection levels of its calculation parameters. In that case, for any first operator, the compiler determines the first privacy algorithm for executing it according to the privacy protection level of the first parameter involved in it.
Specifically, in one embodiment, by parsing the calculation formula the compiler can determine the first parameter involved in the first operator and, combined with the parameter privacy levels customized in the description script, determine the privacy protection level of the first parameter. On the other hand, the compiler can determine the first algorithm list usable for executing the first operator, then select from the first algorithm list several candidate algorithms whose calculation parameters' privacy protection levels match that of the first parameter, and then select one of these candidate algorithms as the first privacy algorithm.
For example, continuing the example of the formula in equation (4), suppose the first operator is the matrix multiplication operator for computing X_A*W_A, whose first parameters include X_A and W_A. Combining this with the example of customized parameter privacy levels, suppose the privacy protection levels are divided into three levels, with X_A at level Private and W_A at level Secret.
On the other hand, in one example, the privacy algorithms configured in the compiler for executing the matrix multiplication operator include Algorithms 1 to 5 shown in Table 1 below, which can serve as an example of the first algorithm list.
Table 1: Algorithm list for computing matrix multiplication

Algorithm for computing U*V | Privacy level of U | Privacy level of V
Algorithm 1                 | Private            | Private
Algorithm 2                 | Public             | Private
Algorithm 3                 | Private            | Secret
Algorithm 4                 | Secret             | Public
Algorithm 5                 | Private            | Secret
Since X_A*W_A is to be computed and the privacy protection levels of the first parameters X_A and W_A are Private and Secret respectively, the algorithms above whose calculation parameters (U and V) have privacy protection levels matching those of the first parameters are Algorithm 3 and Algorithm 5, so Algorithm 3 and Algorithm 5 can serve as candidate algorithms. The compiler then selects one of the candidate algorithms as the first privacy algorithm for executing the operator.
In one embodiment, the compiler picks any one of the candidate algorithms as the first privacy algorithm.
In another embodiment, the privacy algorithm of the first operator is selected from the candidate algorithms by further combining the performance indicators of the target computing platform that runs the machine learning algorithm. In this embodiment, the compiler likewise obtains the performance indicators of the target computing platform; after the candidate algorithms are determined as described above, it selects from them, as the first privacy algorithm, an algorithm whose resource requirements match the performance indicators. For the content and acquisition of the performance indicators, and for how resource requirements are matched against them, refer to the earlier embodiments; details are not repeated here.
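Putting the two filters together, the following sketch reproduces the selection logic for the Table 1 example; the dictionaries mirror the table above, while the per-algorithm time estimates are invented.

    # First algorithm list for the matrix multiplication operator, following Table 1:
    # each algorithm records the privacy protection levels of its parameters (U, V).
    MATMUL_ALGS = {
        "Algorithm 1": ("Private", "Private"),
        "Algorithm 2": ("Public",  "Private"),
        "Algorithm 3": ("Private", "Secret"),
        "Algorithm 4": ("Secret",  "Public"),
        "Algorithm 5": ("Private", "Secret"),
    }

    def candidates_by_level(first_param_levels):
        """Keep algorithms whose parameter levels match the first parameters' levels."""
        return [name for name, levels in MATMUL_ALGS.items()
                if levels == first_param_levels]

    # X_A is Private and W_A is Secret, so the candidates are Algorithms 3 and 5.
    cands = candidates_by_level(("Private", "Secret"))

    # Second pass: among the candidates, keep the one whose estimated running time
    # on the target platform is smallest (cf. the matching sketch above).
    EST_TIME_S = {"Algorithm 3": 0.12, "Algorithm 5": 0.31}  # invented estimates
    first_privacy_algorithm = min(cands, key=EST_TIME_S.get)
    # first_privacy_algorithm == "Algorithm 3"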
Thus, through the various approaches above, an applicable privacy algorithm can be determined for each operator involved in the calculation formulas.
Next, in step 33, the code modules for executing the privacy algorithms above are obtained. As mentioned earlier, these code modules can be developed in advance by technicians skilled in cryptography. Then, in step 34, the program code corresponding to the description script can be generated based on these code modules.
In one embodiment, the code segments of the code modules corresponding to the operators can be combined according to the calculation logic of the formulas in the description script, forming the program code from these code segments. Program code formed in this way contains the implementation body of each operator's code.
In another embodiment, each of the code modules may also be encapsulated in advance to form interfaces, also called interface functions. Each interface has corresponding interface information, including, for example, the function name of the interface function, the number of parameters, the parameter types, and so on. Correspondingly, in step 33, the interface information of the interface corresponding to each operator may be obtained, and calling code invoking the corresponding interface is generated according to the interface information and included in the generated program code. In this embodiment, the program code need not contain each operator's implementation body; instead, it invokes the corresponding implementation bodies through the interfaces.
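As an illustration of the interface-based mode, the sketch below turns an operator list like the one from the parsing sketch into calling code; the interface records and the emitted function names are all invented.

    from dataclasses import dataclass

    @dataclass
    class InterfaceInfo:
        func_name: str   # function name of the encapsulated interface function
        n_params: int    # number of parameters, per the interface information

    # Hypothetical interface information for the privacy algorithms chosen earlier.
    INTERFACES = {
        "secure_matmul": InterfaceInfo("ss_matmul_alg3", 2),
        "secure_matadd": InterfaceInfo("ss_matadd_alg1", 2),
    }

    def emit_calls(ops):
        """Generate calling code for each operator; the implementation bodies stay
        behind the interfaces, so the emitted program contains only calls."""
        lines = []
        for op_name, left, right, result in ops:
            info = INTERFACES[op_name]
            assert info.n_params == 2, "interface info must agree with the call site"
            lines.append(f"{result} = {info.func_name}({left}, {right})")
        return "\n".join(lines)

    ops = [("secure_matmul", "X_A", "W_A", "t0"),
           ("secure_matmul", "X_B", "W_B", "t1"),
           ("secure_matadd", "t0", "t1", "t2")]
    code = emit_calls(ops)
    # code:
    #   t0 = ss_matmul_alg3(X_A, W_A)
    #   t1 = ss_matmul_alg3(X_B, W_B)
    #   t2 = ss_matadd_alg1(t0, t1)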
In this way, through the various approaches above, the program code corresponding to the description script is generated. Typically, the generated program code uses the same programming language as the pre-developed code modules implementing the various privacy algorithms. In general, the program code may be high-level language code, such as Java or C, or intermediate code between a high-level language and machine language, such as assembly code or bytecode. The code language and code form are not limited here.
It can thus be seen that, unlike a conventional compiler, which compiles high-level language code into lower-level code convenient for machine execution, the compiler in the embodiments of this specification compiles a description script describing the logic of an upper-layer machine learning algorithm into security-algorithm execution code implementing each security operator with a specific privacy algorithm. In this way, developers need not pay attention to specific security operators and privacy algorithms; designing the machine learning algorithm suffices to finally obtain the execution code of a privacy-preserving machine learning algorithm, reducing development difficulty and improving development efficiency.
According to an embodiment of another aspect, a compiler is provided for compiling the scripts of privacy-preserving machine learning algorithms. Figure 4 shows a schematic structural diagram of a compiler according to one embodiment; the compiler can be deployed in any device, platform, or device cluster with data storage, computing, and processing capabilities. As shown in Figure 4, the compiler 400 includes:
a description script acquisition unit 41, configured to obtain a description script written in a predetermined format, the description script defining at least the calculation formulas in a privacy-preserving machine learning algorithm;
a privacy algorithm determining unit 42, configured to determine several privacy algorithms for executing the several operators involved in the calculation formulas;
a code module acquisition unit 43, configured to obtain several code modules for executing the several privacy algorithms;
a program code generating unit 44, configured to generate, based on the several code modules, the program code corresponding to the description script.
According to one embodiment, the privacy algorithm determining unit 42 includes (not shown):
an operator parsing module, configured to parse the calculation formulas and determine the several operators;
an algorithm determining module, configured to determine several privacy algorithms for executing the several operators.
In a possible implementation, the description script also defines the privacy protection levels of several parameters involved in the calculation formulas; the several operators include a first operator; in this case, the privacy algorithm determining unit 42 may be configured to determine, according to the privacy protection level of the first parameter involved in the first operator, a first privacy algorithm for executing the first operator.
Further, in one embodiment, the privacy protection levels may include: public parameters, a first privacy level visible only to the holding party, and a second privacy level invisible to all participants.
In a specific embodiment, the privacy algorithm determining unit 42 is specifically configured to:
determine a first algorithm list usable for executing the first operator;
select, from the first algorithm list, several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter;
select the first privacy algorithm from the several candidate algorithms.
In a possible implementation, the compiler 400 further includes a performance indicator acquisition unit (not shown), configured to acquire performance indicators of the target computing platform running the machine learning algorithm; the several operators include a first operator; in this case, the privacy algorithm determining unit 42 may be configured to determine, according to the performance indicators, a first privacy algorithm for executing the first operator.
Further, in one embodiment, the privacy algorithm determining unit 42 may be specifically configured to:
determine a first algorithm list usable for executing the first operator;
select, from the first algorithm list, an algorithm whose resource requirements match the performance indicators as the first privacy algorithm.
In a specific embodiment, the privacy algorithm determining unit 42 may also be configured to determine the first privacy algorithm according to the privacy protection level of the first parameter involved in the first operator and the performance indicators of the target computing platform.
Specifically, in one example, the privacy algorithm determining unit 42 may perform the following steps:
determine a first algorithm list usable for executing the first operator;
select, from the first algorithm list, several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter;
select, from the several candidate algorithms, an algorithm whose resource requirements match the performance indicators as the first privacy algorithm.
In one implementation scenario, the compiler 400 runs on the target computing platform. In this case, the performance indicator acquisition unit is configured to read the configuration file of the target computing platform to obtain the performance indicators.
In another implementation scenario, the compiler 400 runs on a third-party platform; in this case, the performance indicator acquisition unit is configured to receive the performance indicators sent by the target computing platform.
In one embodiment, the program code generating unit 44 is configured to: combine code segments of the several code modules according to the calculation logic of the calculation formulas and incorporate them into the program code.
In another embodiment, the program code generating unit 44 is configured to:
obtain interface information of several interfaces formed by encapsulating the several code modules;
generate, according to the interface information, calling code that invokes the several interfaces, incorporating it into the program code.
With the above compiler, a description script describing the logic of an upper-layer machine learning algorithm can be compiled into security-algorithm execution code implementing each security operator with a specific privacy algorithm, thereby simplifying the developer's development process.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described in conjunction with Figure 3.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, wherein executable code is stored in the memory; when the processor executes the executable code, the method described in conjunction with Figure 3 is implemented.
Those skilled in the art should be aware that, in one or more of the examples above, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention; any modification, equivalent replacement, improvement, and the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (26)

  1. A script compilation method, executed by a compiler, the method comprising:
    obtaining a description script written in a predetermined format, the description script defining at least the calculation formulas in a privacy-preserving machine learning algorithm;
    determining several privacy algorithms for executing the several operators involved in the calculation formulas;
    obtaining several code modules for executing the several privacy algorithms;
    generating, based on the several code modules, the program code corresponding to the description script.
  2. The method according to claim 1, wherein determining several privacy algorithms for executing the several operators involved in the calculation formulas comprises:
    parsing the calculation formulas to determine the several operators;
    determining several privacy algorithms for executing the several operators.
  3. The method according to claim 1, wherein the description script further defines privacy protection levels of several parameters involved in the calculation formulas; the several operators include a first operator;
    determining several privacy algorithms for executing the several operators involved in the calculation formulas comprises:
    determining, according to the privacy protection level of a first parameter involved in the first operator, a first privacy algorithm for executing the first operator.
  4. The method according to claim 3, wherein the privacy protection levels include: public parameters, a first privacy level visible only to the holding party, and a second privacy level invisible to all participants.
  5. The method according to claim 3, wherein determining the first privacy algorithm for executing the first operator comprises:
    determining a first algorithm list usable for executing the first operator;
    selecting, from the first algorithm list, several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter;
    selecting the first privacy algorithm from the several candidate algorithms.
  6. The method according to claim 1, further comprising: acquiring performance indicators of a target computing platform running the machine learning algorithm; the several operators include a first operator;
    determining several privacy algorithms for executing the several operators involved in the calculation formulas comprises:
    determining, according to the performance indicators, a first privacy algorithm for executing the first operator.
  7. The method according to claim 6, wherein determining the first privacy algorithm for executing the first operator comprises:
    determining a first algorithm list usable for executing the first operator;
    selecting, from the first algorithm list, an algorithm whose resource requirements match the performance indicators as the first privacy algorithm.
  8. The method according to claim 3, further comprising: acquiring performance indicators of a target computing platform running the machine learning algorithm;
    determining the first privacy algorithm for executing the first operator comprises:
    determining the first privacy algorithm according to the privacy protection level of the first parameter involved in the first operator and the performance indicators of the target computing platform.
  9. The method according to claim 8, wherein determining the first privacy algorithm comprises:
    determining a first algorithm list usable for executing the first operator;
    selecting, from the first algorithm list, several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter;
    selecting, from the several candidate algorithms, an algorithm whose resource requirements match the performance indicators as the first privacy algorithm.
  10. The method according to claim 6 or 8, wherein the compiler runs on the target computing platform;
    acquiring the performance indicators of the target computing platform running the machine learning algorithm comprises: reading a configuration file of the target computing platform to obtain the performance indicators.
  11. The method according to claim 6 or 8, wherein the compiler runs on a third-party platform;
    acquiring the performance indicators of the target computing platform running the machine learning algorithm comprises: receiving the performance indicators sent by the target computing platform.
  12. The method according to claim 1, wherein generating, based on the several code modules, the program code corresponding to the description script comprises:
    combining code segments of the several code modules according to the calculation logic of the calculation formulas and incorporating them into the program code.
  13. The method according to claim 1, wherein generating, based on the several code modules, the program code corresponding to the description script comprises:
    obtaining interface information of several interfaces formed by encapsulating the several code modules;
    generating, according to the interface information, calling code that invokes the several interfaces, and incorporating it into the program code.
  14. A compiler, comprising:
    a description script acquisition unit, configured to obtain a description script written in a predetermined format, the description script defining at least the calculation formulas in a privacy-preserving machine learning algorithm;
    a privacy algorithm determining unit, configured to determine several privacy algorithms for executing the several operators involved in the calculation formulas;
    a code module acquisition unit, configured to obtain several code modules for executing the several privacy algorithms;
    a program code generating unit, configured to generate, based on the several code modules, the program code corresponding to the description script.
  15. The compiler according to claim 14, wherein the privacy algorithm determining unit comprises:
    an operator parsing module, configured to parse the calculation formulas and determine the several operators;
    an algorithm determining module, configured to determine several privacy algorithms for executing the several operators.
  16. The compiler according to claim 14, wherein the description script further defines privacy protection levels of several parameters involved in the calculation formulas; the several operators include a first operator;
    the privacy algorithm determining unit is configured to determine, according to the privacy protection level of a first parameter involved in the first operator, a first privacy algorithm for executing the first operator.
  17. The compiler according to claim 16, wherein the privacy protection levels include: public parameters, a first privacy level visible only to the holding party, and a second privacy level invisible to all participants.
  18. The compiler according to claim 16, wherein the privacy algorithm determining unit is specifically configured to:
    determine a first algorithm list usable for executing the first operator;
    select, from the first algorithm list, several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter;
    select the first privacy algorithm from the several candidate algorithms.
  19. The compiler according to claim 14, further comprising a performance indicator acquisition unit, configured to acquire performance indicators of a target computing platform running the machine learning algorithm;
    the several operators include a first operator;
    the privacy algorithm determining unit is configured to determine, according to the performance indicators, a first privacy algorithm for executing the first operator.
  20. The compiler according to claim 19, wherein the privacy algorithm determining unit is specifically configured to:
    determine a first algorithm list usable for executing the first operator;
    select, from the first algorithm list, an algorithm whose resource requirements match the performance indicators as the first privacy algorithm.
  21. The compiler according to claim 16, further comprising a performance indicator acquisition unit, configured to acquire performance indicators of a target computing platform running the machine learning algorithm;
    the privacy algorithm determining unit is configured to determine the first privacy algorithm according to the privacy protection level of the first parameter involved in the first operator and the performance indicators of the target computing platform.
  22. The compiler according to claim 21, wherein the privacy algorithm determining unit is specifically configured to:
    determine a first algorithm list usable for executing the first operator;
    select, from the first algorithm list, several candidate algorithms whose calculation parameters' privacy protection levels match the privacy protection level of the first parameter;
    select, from the several candidate algorithms, an algorithm whose resource requirements match the performance indicators as the first privacy algorithm.
  23. The compiler according to claim 19 or 21, wherein:
    the compiler runs on the target computing platform, and the performance indicator acquisition unit is configured to read a configuration file of the target computing platform to obtain the performance indicators; or,
    the compiler runs on a third-party platform, and the performance indicator acquisition unit is configured to receive the performance indicators sent by the target computing platform.
  24. The compiler according to claim 14, wherein the program code generating unit is configured to: combine code segments of the several code modules according to the calculation logic of the calculation formulas and incorporate them into the program code.
  25. The compiler according to claim 14, wherein the program code generating unit is configured to:
    obtain interface information of several interfaces formed by encapsulating the several code modules;
    generate, according to the interface information, calling code that invokes the several interfaces, incorporating it into the program code.
  26. A computing device, comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of any one of claims 1 to 13 is implemented.
PCT/CN2022/105056 2021-08-25 2022-07-12 Machine learning algorithm script compilation method and compiler for privacy protection WO2023024735A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/571,351 US20240281226A1 (en) 2021-08-25 2022-07-12 Script compilation method and compiler for privacy-preserving machine learning algorithm

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110984175.0A 2021-08-25 Machine learning algorithm script compilation method and compiler for privacy protection
CN202110984175.0 2021-08-25

Publications (1)

Publication Number Publication Date
WO2023024735A1 true WO2023024735A1 (zh) 2023-03-02

Family

ID=78546332

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105056 WO2023024735A1 (zh) 2021-08-25 2022-07-12 用于隐私保护的机器学习算法脚本编译方法和编译器

Country Status (3)

Country Link
US (1) US20240281226A1 (zh)
CN (1) CN113672985B (zh)
WO (1) WO2023024735A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118151906A (zh) * 2024-05-11 2024-06-07 上海燧原智能科技有限公司 Automatic operator generation method, apparatus, device and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672985B (zh) * 2021-08-25 2023-11-14 支付宝(杭州)信息技术有限公司 Machine learning algorithm script compilation method and compiler for privacy protection
CN114327486B (zh) * 2021-12-31 2024-01-23 北京瑞莱智慧科技有限公司 Method, apparatus and medium for implementing secure multi-party computation based on a domain-specific language
CN116257303B (zh) * 2023-05-04 2023-08-15 支付宝(杭州)信息技术有限公司 Data security processing method, apparatus, storage medium and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130074052A1 (en) * 2011-09-16 2013-03-21 Keith Adams Run time incremental compilation of script code
KR20180099044A (ko) * 2017-02-28 2018-09-05 엘에스산전 주식회사 SCADA system and script language compilation method
CN111415013A (zh) * 2020-03-20 2020-07-14 矩阵元技术(深圳)有限公司 Privacy machine learning model generation and training method, apparatus and electronic device
CN111414646A (zh) * 2020-03-20 2020-07-14 矩阵元技术(深圳)有限公司 Data processing method and apparatus for privacy protection
CN111783124A (zh) * 2020-07-07 2020-10-16 矩阵元技术(深圳)有限公司 Privacy-protection-based data processing method, apparatus and server
CN113672985A (zh) * 2021-08-25 2021-11-19 支付宝(杭州)信息技术有限公司 Machine learning algorithm script compilation method and compiler for privacy protection

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101483860B (zh) * 2009-01-23 2010-09-01 清华大学 Negotiation control method based on SIP security policy levels in an IMS network
US10936750B2 (en) * 2018-03-01 2021-03-02 International Business Machines Corporation Data de-identification across different data sources using a common data model
US11431470B2 (en) * 2019-08-19 2022-08-30 The Board Of Regents Of The University Of Texas System Performing computations on sensitive data while guaranteeing privacy
CN111428880A (zh) * 2020-03-20 2020-07-17 矩阵元技术(深圳)有限公司 Privacy machine learning implementation method, apparatus, device and storage medium
CN111859267B (zh) * 2020-06-22 2024-04-26 复旦大学 Operation method for privacy-preserving machine learning activation functions based on the BGW protocol
CN112883408B (zh) * 2021-04-29 2021-07-16 深圳致星科技有限公司 Encryption/decryption system and chip for privacy computing
CN113158252A (zh) * 2021-05-10 2021-07-23 浙江工商大学 Big data privacy protection method based on deep learning


Also Published As

Publication number Publication date
CN113672985A (zh) 2021-11-19
CN113672985B (zh) 2023-11-14
US20240281226A1 (en) 2024-08-22

Similar Documents

Publication Publication Date Title
Viand et al. SoK: Fully homomorphic encryption compilers
WO2023024735A1 (zh) Machine learning algorithm script compilation method and compiler for privacy protection
Kosba et al. xJsnark: A framework for efficient verifiable computation
Riazi et al. Chameleon: A hybrid secure computation framework for machine learning applications
Steffen et al. Zeestar: Private smart contracts by homomorphic encryption and zero-knowledge proofs
Li et al. Privacy-preserving feature selection with secure multiparty computation
CN111428887B (zh) Model training control method, apparatus and system based on multiple computing nodes
US8370621B2 (en) Counting delegation using hidden vector encryption
WO2019091016A1 (zh) Data collection toolkit customization method, apparatus, terminal and storage medium
Mouris et al. Zilch: A framework for deploying transparent zero-knowledge proofs
CN111783124A (zh) Privacy-protection-based data processing method, apparatus and server
US20220121775A1 (en) System and method for converting machine learning algorithm, and electronic device
WO2021184347A1 (zh) Data processing method and apparatus for privacy protection
Almousa et al. Alice and Bob: Reconciling formal models and implementation
Fang et al. CostCO: An automatic cost modeling framework for secure multi-party computation
Gouert et al. HELM: Navigating Homomorphic Encryption through Gates and Lookup Tables
EP3907616B1 (en) Generation of optimal program variation
US20140298455A1 (en) Cryptographic mechanisms to provide information privacy and integrity
Pentyala et al. Privfair: a library for privacy-preserving fairness auditing
US20170302437A1 (en) Nondecreasing sequence determining device, method and program
US11394668B1 (en) System and method for executing operations in a performance engineering environment
CN110569659B (zh) Data processing method, apparatus and electronic device
CN114327486B (zh) Method, apparatus and medium for implementing secure multi-party computation based on a domain-specific language
Günther et al. HElium: A Language and Compiler for Fully Homomorphic Encryption with Support for Proxy Re-Encryption
Mood et al. PAL: A pseudo assembly language for optimizing secure function evaluation in mobile devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22860081

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18571351

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22860081

Country of ref document: EP

Kind code of ref document: A1