WO2022109861A1 - Method, apparatus, and device for preparing training data for encrypted machine learning - Google Patents


Info

Publication number
WO2022109861A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
privacy
target
owner node
input
Prior art date
Application number
PCT/CN2020/131449
Other languages
English (en)
French (fr)
Inventor
黄高峰
谢翔
陈元丰
晏意林
史俊杰
李升林
孙立林
Original Assignee
上海阵方科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海阵方科技有限公司
Priority to PCT/CN2020/131449
Publication of WO2022109861A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The embodiments of this specification relate to the technical field of machine learning, and in particular to a method, apparatus, and device for preparing training data for encrypted machine learning.
  • Current privacy-preserving machine learning platforms built on TensorFlow and secure multi-party computation protocols include LatticeX-Rosetta.
  • A secure multi-party computation protocol requires all computing participants to perform the same computing operations and to compute on secret-shared or encrypted values.
  • For values that have already been secret-shared or encrypted, LatticeX-Rosetta can use the tf.data API directly to construct an input stream, so that all computing participants can continuously obtain input data from the data source and feed it to private machine learning training.
  • When the data is the original value (private data), however, only the private data owner has the data source. A non-data owner has neither the data source nor information about the data structure, so it cannot construct the input stream and cannot obtain valid data.
  • The embodiments of this specification provide a method, apparatus, and device for preparing training data for encrypted machine learning, so as to solve the problem in the prior art that a privacy-preserving machine learning platform must first secret-share or encrypt the private data in the private data source, which lowers efficiency.
  • An embodiment of this specification provides a training data preparation method for encrypted machine learning, including: a data owner node constructs an input stream based on a private data source; the data owner node obtains target private data from the private data source by using the input stream; the data owner node sends target feature information of the private data in the private data source to multiple non-data owner nodes, where the target feature information represents the structure of the private data in the private data source; each non-data owner node constructs invalid private input data according to the target feature information; the data owner node and the non-data owner nodes convert the target private data into a target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain target encrypted data; and the multiple non-data owner nodes and the data owner node use the target encrypted data and the secure multi-party computation protocol to perform collaborative machine learning training.
  • The embodiments of this specification also provide an apparatus for preparing training data for encrypted machine learning, including: an input stream building module, used by the data owner node to construct an input stream based on a private data source; an acquisition module, used by the data owner node to obtain target private data from the private data source by using the input stream; a sending module, used by the data owner node to send the target feature information of the private data in the private data source to multiple non-data owner nodes, where the target feature information characterizes the structure of the private data in the private data source; a determination module, used by each non-data owner node to construct invalid private input data according to the target feature information; a conversion module, used by the data owner node and the non-data owner nodes to convert the target private data into a target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain target encrypted data; and a training module, used by the multiple non-data owner nodes and the data owner node to perform collaborative machine learning training using the target encrypted data and a secure multi-party computation protocol.
  • Embodiments of the present specification also provide a training data preparation device for encrypted machine learning, including a processor and a memory for storing instructions executable by the processor. When the processor executes the instructions, the steps of the training data preparation method for encrypted machine learning are implemented.
  • Embodiments of the present specification also provide a computer-readable storage medium on which computer instructions are stored. When the instructions are executed, the steps of the training data preparation method for encrypted machine learning are implemented.
  • The embodiments of this specification provide a training data preparation method for encrypted machine learning in which a data owner node constructs an input stream based on a private data source and uses the constructed input stream to obtain target private data from that source. Because a non-data owner node cannot directly obtain the target private data, and so that the data obtained by every computing node has the same shape, the data owner node can send the target feature information of the private data in the private data source to multiple non-data owner nodes. The target feature information represents the structure of the private data in the private data source, and each non-data owner node can determine its invalid private input data from it.
  • Based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, the data owner node and the non-data owner nodes can cooperatively convert the target private data into the target encrypted data type and obtain the target encrypted data.
  • The multiple non-data owner nodes and the data owner node can then use the target encrypted data for collaborative machine learning training, so that multiple participants can jointly carry out encrypted machine learning training while data privacy is ensured.
  • FIG. 1 is a schematic diagram of steps of a training data preparation method for encrypted machine learning provided according to an embodiment of the present specification
  • FIG. 2 is a schematic diagram of an interactive execution relationship between multiple participants provided according to an embodiment of the present specification
  • FIG. 3 is a schematic diagram of a process of jointly performing machine learning training or inference by multiple participants according to an embodiment of the present specification
  • FIG. 4 is a schematic structural diagram of an apparatus for preparing training data for encrypted machine learning provided according to an embodiment of the present specification
  • FIG. 5 is a schematic structural diagram of a training data preparation device for encrypted machine learning provided according to an embodiment of the present specification.
  • this embodiment can provide a training data preparation method for encrypted machine learning.
  • the training data preparation method for encrypted machine learning can be used to prepare training data for encrypted machine learning in conjunction with multiple participants under the premise of ensuring data privacy.
  • the above-mentioned method for preparing training data for encrypted machine learning may include the following steps.
  • S101: The data owner node constructs an input stream based on the private data source.
  • The data used for training may be provided by multiple participants who, for reasons of competitive advantage, privacy concerns, regulations, and issues of data sovereignty and jurisdiction, cannot openly share their data. Data therefore has an ownership attribute, and the participants can be divided into data owners and non-data owners.
  • a stream is an abstract concept, which is an abstraction of input and output devices.
  • An input stream can be regarded as an input channel: external data enters a program through an input stream. Since the data owner node can directly use the tf.data API to construct the input stream, the data owner node can construct the input stream based on the private data source.
  • The tf.data API (Application Programming Interface) is used to build data reading pipelines.
  • The tf.data API builds a data pipeline and relies mainly on two interfaces, tf.data.Dataset and tf.data.Iterator: tf.data.Dataset reads in and preprocesses data, while the actual reading of data goes through the tf.data.Iterator interface.
  • the above-mentioned Dataset is an independent data set that does not depend on the database. Even if the data link is disconnected or the database is closed, the Dataset is still available.
  • the above Iterator is used to provide a way to sequentially access the elements of an aggregate object without exposing the object's internal representation.
  • the Iterator pattern is a pattern applied to aggregate objects. By using this pattern, each element in an aggregate object can be accessed in a certain order (methods provided by Iterator) without knowing the internal representation of the object.
  • LatticeX-Rosetta is a privacy-preserving machine learning platform built on TensorFlow and a secure multi-party computation protocol. During private machine learning training its computing nodes are relatively independent, unlike the computing nodes of a TensorFlow distributed cluster, and each participating node executes exactly the same computation graph and computation logic. Because data has an ownership attribute, for values that have already been secret-shared or encrypted, the input stream can be constructed directly with the tf.data API for private machine learning training.
  • A Dataset object for the private data source of tf.data can be designed to handle the reading of private data from external devices (disk, network IO, databases, etc.).
  • The get_next function returns the next object in the collection. If there is no next object (for example, because the iterator is already positioned at the end of the collection), it raises an exception.
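  • The pull-style access described above can be sketched in plain Python. This is a minimal illustration and not TensorFlow code: `LineStream` and `EndOfStream` are hypothetical names, and `io.StringIO` stands in for a disk or network source.

```python
import io


class EndOfStream(Exception):
    """Raised when get_next is called after the last record (hypothetical name)."""


class LineStream:
    """Pull-style input stream over a text source, one record per line."""

    def __init__(self, source):
        self._source = source  # any file-like object opened in text mode

    def get_next(self):
        line = self._source.readline()
        if line == "":  # readline returns "" only at end of input
            raise EndOfStream("no more records")
        return line.rstrip("\n")


stream = LineStream(io.StringIO("1.0,2.0\n3.0,4.0\n"))
first = stream.get_next()
second = stream.get_next()
try:
    stream.get_next()
    exhausted = False
except EndOfStream:
    exhausted = True
```

  • The caller never sees how the records are stored; it only asks for the next one and handles the end-of-data exception.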
  • the above network IO is network input/output.
  • To ensure that the data obtained by each computing node from the private data pipeline has the same shape, it can be specified that, when constructing the input stream, the data owner node not only completes the input stream construction but also reads the feature information of the private data and sends it to the non-data owners. The non-data owners thereby obtain the structure information of the private data, so that the data pipeline for multi-party computation can be constructed.
  • TensorFlow is a system that feeds complex data structures into artificial neural networks for analysis and processing. Tensor means an N-dimensional array, and Flow means computation based on a data flow graph, so TensorFlow describes tensors flowing from one end of the flow graph to the other. Execution goes from the inputs into the computation graph, which then executes its computation nodes to produce the outputs.
  • Because the computation graph consists of a series of nodes (operators), the private-data-source Dataset object actually runs as a TensorFlow operator during execution.
  • The private data owned by the data owner node can be stored in a corresponding external device (disk, network IO, database, etc.). The original private data is plaintext data that the data owner can obtain and that is stored in the external device; the collection of private data in the device is the private data source.
  • S102: The data owner node obtains the target private data from the private data source by using the input stream.
  • The data owner node can obtain the target private data from the private data source by using the constructed input stream.
  • The data owner node can obtain the target private data from the private data source by using the get_next operation of the Iterator. The target private data can be a preset unit of data taken from the private data source either randomly or in a preset order.
  • The preset order may be defined in the Iterator, and the preset unit may be 32 elements, one line, and so on; it can be determined according to the actual situation and is not limited in this specification.
  • The target private data may include identification data of the data owner node, which may be the identifier of that node. For example, if the computing node to which the data owner belongs is P0, the identification data of the data owner node may be P0.
  • The identification data is not limited to the above example. Those skilled in the art may make other changes in light of the technical essence of the embodiments of this specification; as long as the functions and effects achieved are the same as or similar to those of the embodiments, such changes shall be covered by the protection scope of the embodiments of this specification.
  • S103: The data owner node sends the target feature information of the private data in the private data source to a plurality of non-data owner nodes, where the target feature information represents the structure of the private data in the private data source.
  • To ensure that, in privacy computing, the data obtained by each computing node from the private data pipeline has a consistent shape, the data owner node can, when it first uses the get_next function to obtain target private data from the private data source, extract the target feature information of the private data required for this training and send it to the multiple non-data owners, so that the non-data owners obtain the structure information of the private data in the private data source.
  • The target feature information may be header information of the private data in the private data source, and the header information may include summary information such as the number of rows, the number of columns, and the separator.
  • The header information may also include other data; the specific content may be determined according to the actual situation and is not limited in the embodiments of this specification.
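  • As a rough sketch of what such header information could look like, the data owner might derive the row count, column count, and separator from a CSV source. The function name `extract_feature_info` and the dictionary layout are assumptions made for illustration; the specification does not prescribe a format.

```python
import csv
import io


def extract_feature_info(text, delimiter=","):
    """Return structural metadata only; no cell values are included."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return {
        "rows": len(rows),
        "columns": len(rows[0]) if rows else 0,
        "delimiter": delimiter,
    }


# Stand-in for the owner's private CSV content.
private_csv = "0.5,1.5,2.5\n3.5,4.5,5.5\n"
feature_info = extract_feature_info(private_csv)
```

  • Only `feature_info` would be sent to the non-data owners; the values in `private_csv` stay with the data owner.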
  • S104: Each non-data owner node constructs invalid private input data according to the target feature information.
  • Each non-data owner node can determine its invalid private input data according to the acquired target feature information.
  • The invalid private input data only ensures that TensorFlow execution proceeds correctly; it is invalid input data and is not used in the secure multi-party computation.
  • Secure multi-party computation means that, in the absence of a trusted third party, multiple parties jointly compute an agreed function while ensuring that each party obtains only its own computation result and cannot infer the input or output data of any other party from the data exchanged during the computation.
  • Secret sharing (SS) refers to splitting data into multiple meaningless numbers and distributing these numbers to multiple participants. Each participant receives only a part of the original data. One participant, or a few, cannot restore the original data; the real data can be restored only when the shares of all participants are put together.
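  • The splitting and recombination described above can be illustrated with additive secret sharing over a finite ring. This is a generic textbook sketch, not the scheme of any particular framework; the ring size `MODULUS` and the function names are assumptions.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size; real protocols fix their own


def share(value, n_parties):
    """Split an integer into n additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all shares; fewer than all of them reveal nothing."""
    return sum(shares) % MODULUS


secret = 42
parts = share(secret, 3)
recovered = reconstruct(parts)
```

  • Each element of `parts` looks like a random number on its own, and only the sum of all three restores the original value.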
  • Secret sharing can be used for secret distribution. Each non-data owner can fill its input with 0 (that is, no numerical contribution) according to the target feature information and use it as its invalid private input data. The data field value of every non-data owner node is therefore 0 (fake data), while the data owner node holds the actual private value. This achieves the purpose of encryption by secret sharing and also ensures that the data pipeline executes correctly.
  • The filling according to the target feature information is not limited to 0; any other data value of the same type may be used, as determined by the actual situation, which is not limited in the embodiments of this specification.
  • The interactive execution relationship between multiple participants is shown in FIG. 2. Assume three participants P0, P1, and P2, where P0 is the data owner and its target private data value is a. P0's secret sharing value is then still a, and the secret sharing values of the non-data owners P1 and P2 are filled with 0 (fake data) according to the target feature information broadcast by P0.
  • Because P0 owns the data, it retains its original data value a; P1 and P2 do not have the data, and their values can be set directly to 0 (no numerical contribution). The sum of the data values of all participants P0, P1, and P2 is therefore still a, which achieves the purpose of secret sharing, and P0, P1, and P2 can use their secret sharing values for joint computation.
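  • The P0, P1, P2 example can be written out as a small sketch. The helper `zero_placeholder` and the `feature_info` layout are hypothetical; the point is only that the non-owners build zeros of the right shape and that the element-wise sum over all parties equals P0's data.

```python
def zero_placeholder(feature_info, batch_rows):
    """Build a non-owner's invalid input: zeros with the owner's batch shape."""
    return [[0.0] * feature_info["columns"] for _ in range(batch_rows)]


feature_info = {"rows": 2, "columns": 3, "delimiter": ","}  # broadcast by P0
p0 = [[0.5, 1.5, 2.5], [3.5, 4.5, 5.5]]  # P0's real values a
p1 = zero_placeholder(feature_info, batch_rows=len(p0))
p2 = zero_placeholder(feature_info, batch_rows=len(p0))

# Element-wise sum over the three parties equals P0's data.
total = [
    [a + b + c for a, b, c in zip(r0, r1, r2)]
    for r0, r1, r2 in zip(p0, p1, p2)
]
```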
  • S105: The data owner node and the non-data owner nodes convert the target private data into the target encrypted data type based on the invalid private input data and the secret input operator of the secure multi-party computation protocol to obtain the target encrypted data.
  • The data owner node can, together with the invalid private input data of the non-data owner nodes, convert its target private data into the target encrypted data type, thereby obtaining the target encrypted data.
  • The secure multi-party computation may use techniques such as additive secret sharing, oblivious transfer, and garbled circuits, which are not limited in this specification.
  • The target encrypted data type may be the encrypted data type of the secure multi-party computation protocol used with the LatticeX-Rosetta framework. Externally input data needs to be converted into the encrypted data type of the framework's corresponding protocol. A PrivateInput operator can therefore be added at the end of the data preprocessing function to convert the target private data of the data owner node into the target encrypted data type. The PrivateInput operator calls the private input (PrivateInput) interface of the secure multi-party computation protocol to complete its function.
  • the PrivateInput operator can be called in the final output of text decoding to convert the private data into an encrypted data type corresponding to the secure multi-party computation protocol.
  • the PrivateInput operator includes a data_owner to identify the data owner.
  • tf.data provides the map interface.
  • By adding conversion functions, the map interface can attach flexible and varied preprocessing operations to the output of the data pipeline. For example, map can parse each line of a csv file (a text-line file whose field separator is ",") and convert it to a tf.float32 vector. A batch transformation can then be added so that each execution yields the next 32 elements, and the iterator's get_next finally provides the data pipeline capability.
  • the map interface stores a set of key-value objects, and provides a mapping from key (key) to value (value).
  • the keys in the map are not required to be ordered and are not allowed to be repeated.
  • The tf.float32 above specifies the data type, and batch builds batched collections of the data.
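  • The parse-then-batch pipeline described for csv text lines can be mimicked in plain Python. This is an analogy using the built-in `map` and a hand-written `batched` generator, not the tf.data API itself; the 70 sample lines are made up for illustration.

```python
from itertools import islice


def parse_line(line, delimiter=","):
    """Conversion function: one CSV text line -> list of floats."""
    return [float(field) for field in line.split(delimiter)]


def batched(records, batch_size):
    """Group an iterator of records into lists of at most batch_size."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


lines = ["%d.0,%d.5" % (i, i) for i in range(70)]  # stand-in for a csv file
pipeline = batched(map(parse_line, lines), batch_size=32)
batch_sizes = [len(batch) for batch in pipeline]
first_row = parse_line(lines[0])
```

  • Because both stages are lazy, each request for the next batch parses only the 32 lines it needs.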
  • a PrivateTextLineDataset class can be designed to implement the function of the privacy source data Dataset operator, wherein the constructor of PrivateTextLineDataset can add an attribute field data_owner to identify the data owner.
  • the privacy data pipeline can be implemented by minimally changing the use of the tf.data API and cooperating with minimal changes in the user layer.
  • other types of source Datasets for example: tf.data.TFRecordDataset, tf.data.FixedLengthRecordDataset or Dataset contributed by third parties
  • Converting the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, may include: determining the valid private data from the invalid private input data of each non-data owner node and the target private data of the data owner node, where the valid private data is the target private data; and then converting the target private data into the target encrypted data type based on the invalid private input data and the secret input operator to obtain the target encrypted data.
  • Although each non-data owner node holds invalid private input data, that data is fake rather than real, so in effect the non-data owner nodes have no data. When the secret input operator of the secure multi-party computation protocol is used for encryption, the valid private data among the participants can therefore be determined first.
  • The valid private data is the target private data of the data owner, so the data owner node and the non-data owner nodes can jointly convert it into the target encrypted data type by using the secure multi-party computation protocol and thereby obtain the target encrypted data.
  • S106: The plurality of non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and a secure multi-party computation protocol.
  • Secret sharing refers to splitting data into multiple meaningless numbers and distributing these numbers to multiple participants, where each participant receives only a part of the original data. One participant, or a few, cannot restore the original data; the real data can be restored only when the shares of all participants are put together. Secret sharing can thus be regarded as the target encryption method for the data, and the multiple non-data owner nodes and the data owner node can use the target encrypted data for collaborative machine learning training.
  • Performing collaborative machine learning training with the target encrypted data and a secure multi-party computation protocol may include: using the target encrypted data as training data, with the multiple non-data owner nodes and the data owner node performing collaborative machine learning training based on that training data.
  • The target encrypted data can be used as training data. Because the training data is encrypted, the non-data owners cannot obtain the data owner's plaintext data during training, so multiple participants can jointly perform private machine learning training while data privacy is preserved.
  • the way of determining the training data is not limited to the above examples, and those skilled in the art may make other changes under the inspiration of the technical essence of the embodiments of this specification, but as long as the functions and effects achieved are the same or similar to the embodiments of this specification , shall be covered within the protection scope of the embodiments of this specification.
  • A PrivateTextLineDatasetOp class can be designed to implement the functions of the private-source Dataset operator, following the way TensorFlow defines Dataset operators.
  • The functions of the private-source Dataset operator mainly include constructing an input stream and providing a get_next operation for obtaining the next unit of private data.
  • A PrivateTextLineDatasetV1 class can be created with the export alias PrivateTextLineDataset, and the attribute field data_owner is added to its constructor to identify the data owner. Users only need a few lines of Python code to create the private-source Dataset operator.
  • The PrivateTextLineDatasetV1 class is a Python-layer class that is used directly by user code;
  • The PrivateTextLineDatasetOp class is a low-level C++ class that the TensorFlow engine calls when executing the computation graph.
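  • A hypothetical, pure-Python stand-in for such a Dataset class might look like the following. It is not the LatticeX-Rosetta implementation: the class name `SketchPrivateTextLineDataset`, the `party_id` argument, and the behavior on non-owner parties are all assumptions chosen to show how a `data_owner` field can decide which party reads the real file.

```python
import os
import tempfile


class SketchPrivateTextLineDataset:
    """Hypothetical stand-in: a text-line source tagged with its owner."""

    def __init__(self, path, data_owner, party_id):
        self.data_owner = data_owner  # which party holds the real file
        self.party_id = party_id      # the party running this code
        self._path = path

    def is_owner(self):
        return self.party_id == self.data_owner

    def lines(self):
        """Yield real lines on the owner; yield nothing elsewhere."""
        if not self.is_owner():
            return
        with open(self._path, encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "private.csv")
    with open(csv_path, "w", encoding="utf-8") as handle:
        handle.write("1,2\n3,4\n")
    owner_lines = list(SketchPrivateTextLineDataset(csv_path, 0, 0).lines())
    other_lines = list(SketchPrivateTextLineDataset(csv_path, 0, 1).lines())
```

  • Every party runs the same constructor call with the same `data_owner`, which matches the requirement that all participants execute identical computation logic; only the owner touches the file.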
  • Because the connection between the Dataset object and the private data source is short-lived, it disconnects from the data source immediately after acquiring data and does not reconnect until the next time data in the source is to be operated on. Providing the data pipeline through the Dataset constructed in this embodiment is therefore preferable to reading all private data into memory at once: data is fetched dynamically while the computation graph executes, on demand each time, which effectively reduces the memory required for training.
  • The method may further include: after the get_next function of the iterator has been called for the first time to obtain the target private data, the data owner node can call get_next again to obtain, through the input stream, the next private data after the target private data from the private data source. The data owner node and each non-data owner node can then cooperatively construct data of the target encrypted data type based on the private input operator of the secure multi-party computation protocol, obtain the encrypted data of that next private data, and use it with the secure multi-party computation protocol for iterative collaborative machine learning training until every item of private data in the private data source has been traversed.
  • The get_next operation is used to obtain the next unit of private data.
  • Each non-data owner node also calls the iterator's get_next function and uses the target feature information to construct invalid private input data. Based on that invalid private input data and the private input operator (PrivateInput operator) of the secure multi-party computation protocol, the data owner node and the non-data owner nodes convert the data owner's next unit of private data into the target encrypted data type and obtain its encrypted data, until every item of private data in the private data source has been traversed.
  • The get_next function of the Iterator returns the next unit of private data (a record) in the collection. If there is no next unit (for example, because the iterator is positioned at the end of the collection), it raises an exception.
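  • The lockstep iteration can be sketched as two generators, one per role, that advance together until the source is exhausted. The function names and the idea of deriving the final partial batch size from the broadcast row count are assumptions for illustration.

```python
def owner_batches(rows, batch_size):
    """Data owner: yield successive units of real private rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def placeholder_batches(n_rows, n_columns, batch_size):
    """Non-owner: yield zero batches sized from the feature info alone."""
    for start in range(0, n_rows, batch_size):
        size = min(batch_size, n_rows - start)
        yield [[0.0] * n_columns for _ in range(size)]


private_rows = [[float(i), float(i) + 0.5] for i in range(5)]
feature_info = {"rows": len(private_rows), "columns": 2}

steps = 0
shapes_match = True
for real, fake in zip(
    owner_batches(private_rows, batch_size=2),
    placeholder_batches(feature_info["rows"], feature_info["columns"], 2),
):
    # Both roles must see the same batch shape at every step.
    shapes_match = shapes_match and (
        len(real) == len(fake) and len(real[0]) == len(fake[0])
    )
    steps += 1
```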
  • The method may further include: adding a unary private input operator at the end of the data preprocessing function, where the operator contains the identification data of the data owner node and is used to convert the target private data of the data owner node into the target encrypted data type.
  • The unary private input operator can be the PrivateInput operator.
  • The PrivateInput operator is constructed in the same way as a standard TensorFlow operator.
  • The PrivateInput operator includes a data_owner attribute to identify the data owner and can be constructed with Python code.
  • A PrivateInput operator can be added at the end of the data preprocessing function passed to map to convert the target private data of the data owner node into the target encrypted data type.
  • The PrivateInput operator calls the private input interface of the secure multi-party computation protocol to complete its function.
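  • The effect of a private input step with a `data_owner` attribute can be simulated in a single process. This is a sketch under stated assumptions, not the PrivateInput operator itself: the function `private_input`, the ring size, and the use of additive sharing are all chosen for illustration, and a real protocol would exchange shares over the network.

```python
import secrets

MODULUS = 2 ** 64  # assumed ring size


def private_input(local_inputs, data_owner):
    """Simulate a private-input step for all parties in one process.

    local_inputs[i] is what party i fed in; only the data owner's entry
    is used. Returns one additive share per party.
    """
    value = local_inputs[data_owner] % MODULUS
    n_parties = len(local_inputs)
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


# P0 owns the value 7; P1 and P2 feed zero placeholders.
shares = private_input([7, 0, 0], data_owner=0)
# Changing the placeholders does not change the shared secret.
shares_other = private_input([7, 123, 456], data_owner=0)
opened = sum(shares) % MODULUS
opened_other = sum(shares_other) % MODULUS
```

  • This mirrors the statement that the placeholders may be 0 or any other value of the same type: they keep the graph executable but never contribute to the shared secret.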
  • the process of jointly performing machine learning training or inference by multiple participants may be as shown in FIG. 3 .
  • multiple participants can be divided into data owner nodes and non-data owner nodes.
  • The data owner node can construct an input stream based on private data sources, and the private data input stream of the data owner node is associated with valid private data (private source data that must not be leaked).
  • The private data input stream constructed by a non-data owner node has no associated valid private data, so its get_next needs to provide fake data.
  • the data owner node can extract the feature information of the private data in the private data source and send it to the non-data owner node.
  • the data owner node can obtain valid privacy input data from the privacy data source through the get_next function of the iterator, and the corresponding non-data owner node can construct invalid privacy input data according to the feature information of the valid privacy input data.
  • the data owner node and the non-data owner node can respectively perform operations such as input data preprocessing transformation (batch, repeat, etc.), input data map preprocessing transformation, and the like.
  • Based on the private input operator (PrivateInput operator) of the secure multi-party computation protocol, the data owner node and the non-data owner nodes can convert the valid private input data after map preprocessing into the encrypted data type corresponding to the secure multi-party computation protocol and obtain encrypted data, which can be used as training data for collaborative machine learning training or inference.
  • Primary input operator PrincipalInput operator
  • an input stream can be constructed by the data owner node based on a private data source, and the constructed input stream can obtain target private data from the private data source. Since the non-data owner nodes cannot directly obtain the target private data, the data owner node can, so that every computing node obtains data of the same shape, send the target feature information of the private data in the private data source to multiple non-data owner nodes. The target feature information represents the structure of the private data in the private data source, and the invalid private input data of each non-data owner node can be determined from it.
  • the data owner node and the non-data owner nodes can cooperatively convert the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data.
  • Multiple non-data owner nodes and data owner nodes can use target encrypted data for collaborative machine learning training, so that multiple participants can be jointly trained for encrypted machine learning on the premise of ensuring data privacy.
  • the embodiments of this specification also provide an apparatus for preparing training data for encrypted machine learning, such as the following embodiments. Since the principle of solving the problem of the training data preparation device for encrypted machine learning is similar to that of the training data preparation method for encrypted machine learning, the implementation of the training data preparation device for encrypted machine learning can refer to the implementation of the training data preparation method for encrypted machine learning. It is not repeated here.
  • the term "unit” or “module” may be a combination of software and/or hardware that implements a predetermined function.
  • although the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
  • FIG. 4 is a structural block diagram of an apparatus for preparing training data for encrypted machine learning according to an embodiment of the present specification. As shown in FIG. 4, it may include an input stream construction module 401, an acquisition module 402, a sending module 403, a determination module 404, a conversion module 405, and a training module 406. The structure is described below.
  • the input stream construction module 401 can be used for the data owner node to construct an input stream based on the private data source.
  • the obtaining module 402 can be used for the data owner node to obtain the target privacy data from the privacy data source by using the input stream.
  • the sending module 403 can be used by the data owner node to send the target feature information of the private data in the private data source to a plurality of non-data owner nodes; wherein the target feature information is used to represent the structure of the private data in the private data source.
  • the determination module 404 can be used for each non-data owner node to construct invalid privacy input data according to the target feature information.
  • the conversion module 405 can be used for the data owner node and the non-data owner node to convert the target private data into the target encrypted data type based on the invalid private input data and the secret input operator of the secure multi-party computation protocol to obtain the target encrypted data.
  • the training module 406 can be used for multiple non-data owner nodes and data owner nodes to perform collaborative machine learning training using target encrypted data and a secure multi-party computing protocol.
  • the above determination module 404 may include: a filling unit, used for each non-data owner to fill in 0 according to the target feature information; and a processing unit, used for taking 0 as the invalid private input data of each non-data owner.
  • the filling according to the target feature information is not limited to filling 0; any other value of the same data type may be used. It can be determined according to the actual situation and is not limited in the embodiments of this specification.
  • the above-mentioned apparatus for preparing training data for encrypted machine learning may further include: an obtaining unit, used for the data owner node, after the get_next function of the iterator has been called for the first time to obtain the target private data, to call the get_next function of the iterator and use the input stream to obtain the next private data after the target private data from the private data source;
  • a construction unit, used for the data owner node and each non-data owner node to cooperatively construct data of the target encrypted data type based on the private input operator of the secure multi-party computation protocol, obtaining the encrypted data of that next private data;
  • and an iterative training unit, used for the plurality of non-data owner nodes and the data owner node to perform collaborative iterative machine learning training using the encrypted data of that next private data and the secure multi-party computation protocol, until every item of private data in the private data source has been traversed.
  • the above-mentioned training module 406 may include: a determining unit for using the target encrypted data as training data; a machine learning training unit for a plurality of non-data owner nodes and data owner nodes according to the training data Conduct collaborative machine learning training.
  • the embodiment of the present specification also provides an electronic device.
  • the electronic device may specifically include an input device 51, a processor 52, and a memory 53.
  • the input device 51 may be used to input the address of the private data source.
  • the processor 52 can be specifically configured so that: the data owner node constructs an input stream based on the private data source; the data owner node obtains the target private data from the private data source by using the input stream; the data owner node sends the target feature information of the private data in the private data source to multiple non-data owner nodes, where the target feature information represents the structure of the private data in the private data source;
  • each non-data owner node constructs invalid private input data according to the target feature information; the data owner node and the non-data owner nodes convert the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data; and the multiple non-data owner nodes and the data owner node use the target encrypted data and the secure multi-party computation protocol for collaborative machine learning training.
  • the memory 53 can specifically be used to store parameters such as invalid privacy input data and target encrypted data.
  • the input device may specifically be one of the main devices for information exchange between the user and the computer system.
  • input devices may include keyboards, mice, cameras, scanners, light pens, handwriting tablets, voice input devices, and so on; they are used to input raw data, and the programs that process that data, into the computer.
  • the input device can also acquire and receive data transmitted by other modules, units and devices.
  • a processor may be implemented in any suitable manner.
  • a processor may take the form of, for example, a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g. software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller.
  • the memory may specifically be a memory device used for storing information in modern information technology.
  • the memory can include multiple levels. In a digital system, anything that can store binary data can be a memory; in an integrated circuit, a circuit with a storage function but no physical form of its own is also called a memory, such as RAM or a FIFO; in a system, a storage device with a physical form is also called a memory, such as a memory module or a TF card.
  • the embodiments of the present specification also provide a computer storage medium for a training data preparation method based on encrypted machine learning.
  • the computer storage medium stores computer program instructions which, when executed, implement the following: the data owner node constructs an input stream based on the private data source;
  • the data owner node uses the input stream to obtain the target private data from the private data source; the data owner node sends the target feature information of the private data in the private data source to multiple non-data owner nodes, where the target feature information represents the structure of the private data in the private data source; each non-data owner node constructs invalid private input data according to the target feature information;
  • the data owner node and the non-data owner nodes convert the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data; and the multiple non-data owner nodes and the data owner node use the target encrypted data and the secure multi-party computation protocol to perform collaborative machine learning training.
  • the above-mentioned storage medium includes but is not limited to random access memory (RAM), read-only memory (ROM), cache, hard disk drive (HDD), or memory card.
  • the memory may be used to store computer program instructions.
  • the network communication unit may be an interface for performing network connection communication, which is set according to a standard specified by a communication protocol.
  • each module or step of the above embodiments of this specification can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described can be performed in an order different from the one given here, or they can be fabricated separately as individual integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. As such, embodiments of this specification are not limited to any particular combination of hardware and software.
  • although the embodiments of the present specification provide the operation steps of the method as described in the above embodiments or flowcharts, the method may include more or fewer operation steps based on routine work that requires no creative effort. For steps that have no logically necessary causal relationship, the execution order is not limited to the order provided by the embodiments of the present specification.
  • when the described method is executed in an actual device or terminal product, it can be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the method shown in the embodiments or the accompanying drawings.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Storage Device Security (AREA)

Abstract

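A method, apparatus, and device for preparing training data for encrypted machine learning. The method includes: a data owner node constructs an input stream based on a private data source, uses the input stream to obtain target private data from the private data source, and sends target feature information of the private data in the private data source to non-data owner nodes; the non-data owner nodes construct invalid private input data according to the target feature information; the target private data is converted into a target encrypted data type to obtain target encrypted data; and the non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and a secure multi-party computation protocol. During training the target private data of the data owner node is encrypted and the non-data owner nodes cannot obtain its values, so multiple participants can jointly perform machine learning training while privacy is preserved.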

Description

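Method, apparatus, and device for preparing training data for encrypted machine learning
Technical Field
The embodiments of this specification relate to the technical field of machine learning, and in particular to a method, apparatus, and device for preparing training data for encrypted machine learning.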
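Background
In model training, a reasonably good model can only be trained with sufficiently large or sufficiently diverse data. Because of competitive advantage, privacy concerns, regulations, and issues of data sovereignty and jurisdiction, many organizations cannot openly share their data. As data volume and diversity grow and attention to data privacy increases, it is critical to combine data from different sources and different enterprises while protecting data privacy. Privacy-preserving machine learning based on secure multi-party computation (MPC) offers a promising solution: it lets different entities train various models on their joint data without leaking any information whose disclosure they cannot afford.
Current privacy-preserving machine learning platforms built on TensorFlow and secure multi-party computation protocols include LatticeX-Rosetta. A secure multi-party computation protocol requires the computing participants to perform the same computing operations and to compute on secret-shared or encrypted values. For values that have already been secret-shared or encrypted, LatticeX-Rosetta can directly use the tf.data API to construct an input stream, so that every computing participant can continuously obtain input data from the data source and feed it to privacy-preserving machine learning training. However, when the data is in its original (private) state, only the private data owner has the data source (the private data). A non-data owner has neither the data source nor its structure information, so it cannot construct an input stream or obtain valid elements, and therefore cannot perform iterative training on encrypted data as the secure multi-party computation protocol requires. Existing privacy-preserving machine learning platforms thus require the private data in the private data source to be secret-shared or encrypted beforehand, which is inefficient.
No effective solution to the above problem has yet been proposed.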
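Summary of the Invention
The embodiments of this specification provide a method, apparatus, and device for preparing training data for encrypted machine learning, to solve the problem that existing privacy-preserving machine learning platforms must first secret-share or encrypt the private data in the private data source, which is inefficient.
An embodiment of this specification provides a method for preparing training data for encrypted machine learning, including: a data owner node constructs an input stream based on a private data source; the data owner node uses the input stream to obtain target private data from the private data source; the data owner node sends target feature information of the private data in the private data source to a plurality of non-data owner nodes, where the target feature information represents the structure of the private data in the private data source; each non-data owner node constructs invalid private input data according to the target feature information; the data owner node and the non-data owner nodes convert the target private data into a target encrypted data type, based on the invalid private input data and the secret input operator of a secure multi-party computation protocol, to obtain target encrypted data; and the plurality of non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol.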
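An embodiment of this specification further provides an apparatus for preparing training data for encrypted machine learning, including: an input stream construction module, used for the data owner node to construct an input stream based on a private data source; an acquisition module, used for the data owner node to obtain target private data from the private data source using the input stream; a sending module, used for the data owner node to send target feature information of the private data in the private data source to a plurality of non-data owner nodes, where the target feature information represents the structure of the private data in the private data source; a determination module, used for each non-data owner node to construct invalid private input data according to the target feature information; a conversion module, used for the data owner node and the non-data owner nodes to convert the target private data into a target encrypted data type, based on the invalid private input data and the secret input operator of a secure multi-party computation protocol, to obtain target encrypted data; and a training module, used for the plurality of non-data owner nodes and the data owner node to perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol.
An embodiment of this specification further provides a device for preparing training data for encrypted machine learning, including a processor and a memory for storing processor-executable instructions, where the processor implements the steps of the training data preparation method when executing the instructions.
An embodiment of this specification further provides a computer-readable storage medium storing computer instructions which, when executed, implement the steps of the training data preparation method.
An embodiment of this specification provides a method for preparing training data for encrypted machine learning in which the data owner node constructs an input stream based on a private data source and uses the constructed input stream to obtain target private data from the private data source. Since the non-data owner nodes cannot directly obtain the target private data, the data owner node can, so that every computing node obtains data of the same shape, send the target feature information of the private data in the private data source to a plurality of non-data owner nodes. The target feature information represents the structure of the private data in the private data source, and the invalid private input data of each non-data owner node can be determined from it. Further, because the private data is plaintext, the data owner node and the non-data owner nodes can, to ensure that the target private data is not leaked, cooperatively convert it into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data. The non-data owner nodes and the data owner node can then use the target encrypted data for collaborative machine learning training, so that multiple participants can jointly perform encrypted machine learning training while data privacy is preserved.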
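Brief Description of the Drawings
The drawings described here provide a further understanding of the embodiments of this specification and form a part of them; they do not limit the embodiments. In the drawings:
FIG. 1 is a schematic diagram of the steps of the method for preparing training data for encrypted machine learning according to an embodiment of this specification;
FIG. 2 is a schematic diagram of the interaction among multiple participants according to an embodiment of this specification;
FIG. 3 is a schematic diagram of the flow in which multiple participants jointly perform machine learning training or inference according to an embodiment of this specification;
FIG. 4 is a schematic structural diagram of the apparatus for preparing training data for encrypted machine learning according to an embodiment of this specification;
FIG. 5 is a schematic structural diagram of the device for preparing training data for encrypted machine learning according to an embodiment of this specification.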
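Detailed Description
The principle and spirit of the embodiments of this specification are described below with reference to several exemplary implementations. It should be understood that these implementations are given only to enable those skilled in the art to better understand and implement the embodiments, and not to limit their scope in any way. Rather, they are provided so that this disclosure is thorough and complete and fully conveys its scope to those skilled in the art.
Those skilled in the art know that the implementations of the embodiments of this specification may be realized as a system, an apparatus or device, a method, or a computer program product. The disclosure may therefore take the form of entirely hardware, entirely software (including firmware, resident software, microcode, and so on), or a combination of hardware and software.
Although the flows described below include multiple operations in a particular order, it should be clearly understood that these processes may include more or fewer operations, which may be executed sequentially or in parallel (for example, using parallel processors or a multi-threaded environment).
Referring to FIG. 1, this implementation provides a method for preparing training data for encrypted machine learning. The method can be used to prepare training data for encrypted machine learning jointly among multiple participants while preserving data privacy. It may include the following steps.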
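S101: The data owner node constructs an input stream based on a private data source.
In this implementation, encrypted machine learning training in the LatticeX-Rosetta framework involves multiple participants, and the training data may be provided separately by several of them. Because of competitive advantage, privacy concerns, regulations, and issues of data sovereignty and jurisdiction, the participants cannot openly share their data. Data therefore has an ownership attribute, and the participants can be divided into data owners and non-data owners.
In this implementation, a stream is an abstraction of input and output devices. An input stream can be regarded as an input channel through which external data is passed to a program. Because the data owner node can directly use the tf.data API to construct an input stream, it can construct the input stream based on the private data source. tf.data is an API (Application Programming Interface) for building data-reading pipelines. Building a data pipeline with it relies mainly on two APIs, tf.data.Dataset and tf.data.Iterator: tf.data.Dataset reads in the data and performs preprocessing, while fetching the data relies on the tf.data.Iterator interface.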
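In this implementation, the Dataset is a self-contained collection of data that does not depend on a database: it remains usable even if the data link is disconnected or the database is closed. The Iterator provides a way to access the elements of an aggregate object sequentially without exposing the object's internal representation. The Iterator pattern applies to aggregate objects; with it, the elements of an aggregate can be visited in a defined order (through the methods the Iterator provides) without knowledge of the object's internal representation.
In this implementation, LatticeX-Rosetta is a privacy-preserving machine learning platform built on TensorFlow and secure multi-party computation protocols. In privacy-preserving machine learning training its computing nodes are relatively independent rather than being nodes of a TensorFlow distributed cluster, and every participating node executes exactly the same computation graph and computation logic. Because data has an ownership attribute, for values that have already been secret-shared or encrypted the tf.data API can be used directly to construct the input stream, so that all computing participants can continuously obtain input data from the data source and feed it to privacy-preserving machine learning training.
However, when the data is in its original (private) state, a computing node that owns the data can still use the tf.data API to construct the input stream in full and can read the next batch of private data from it again and again by calling the get_next function, whereas a non-owner computing node has neither the data source nor its structure information and so cannot construct an input stream. Therefore, to reuse the tf.data API for building a private data pipeline while changing how that API is used as little as possible, a private data source Dataset object for tf.data can be designed to handle reading private data from external devices (disk, network IO, databases, and so on). The get_next function returns the next object in the collection; if there is no next object (for example, the position is already at the end of the collection), an exception is thrown. Network IO means network input/output.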
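In this implementation, to ensure that in privacy-preserving computation every computing node obtains data of the same shape from the private data pipeline, the data owner node can be defined so that, when the input stream is constructed, it not only completes the construction but also sends the feature information of the private data it reads to the non-data owners. The non-data owners thereby obtain the structure information of the private data, and the data pipeline for multi-party computation can be completed.
In this implementation, TensorFlow is a system that passes complex data structures into artificial neural networks for analysis and processing. Tensor means an N-dimensional array, and Flow means computation based on a dataflow graph; TensorFlow is the process of tensors flowing from one end of the graph to the other. Execution is in effect a process of feeding input into the computation graph, which then executes its computing nodes to produce output. A computation graph consists of a series of nodes (operators), so the private data source Dataset object actually runs TensorFlow operators during execution.
In this implementation, the private data owned by the data owner node can be stored in a corresponding external device (disk, network IO, database, and so on). The original private data is plaintext data accessible to the data owner, and the collection of private data stored in the external device is the private data source.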
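S102: The data owner node uses the input stream to obtain target private data from the private data source.
In this implementation, the private data source may store many items of private data, not all of which are used in a single round of machine learning training. The data owner node can therefore use the constructed input stream to obtain the target private data from the private data source.
In this implementation, the data owner node can use the get_next operation of the Iterator to obtain the target private data from the private data source. The target private data may be a preset unit of data obtained from the private data source either at random or in a preset order. The preset order may be defined in the Iterator, and the preset unit may be, for example, 32 elements or one row; it can be determined according to the actual situation and is not limited in this specification.
In one implementation, to identify the data owner node (data_owner), the target private data may contain identification data of the data owner node. The identification data may be an identifier of the data owner node; for example, if the computing node to which the data owner belongs is P0, the identification data of the data owner node may be P0. The identification data is of course not limited to this example. Those skilled in the art may make other changes in light of the technical essence of the embodiments of this specification, and any such change whose function and effect are the same as or similar to those of the embodiments shall fall within their scope of protection.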
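S103: The data owner node sends target feature information of the private data in the private data source to a plurality of non-data owner nodes, where the target feature information represents the structure of the private data in the private data source.
In this implementation, to ensure that in privacy-preserving computation every computing node obtains data of the same shape from the private data pipeline, the data owner node can, when it first uses the get_next function to obtain the target private data from the private data source, also obtain the target feature information of the private data needed for this training and send it to the non-data owners, so that they obtain the structure information of the private data in the private data source.
In this implementation, the target feature information represents the structure of the private data in the private data source. In some embodiments it may be the header information of the private data in the private data source, which may include summary fields such as the number of rows, the number of columns, and the delimiter. The header information may of course contain other data as well; this can be determined according to the actual situation and is not limited in the embodiments of this specification.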
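As a hedged illustration of what such header information might look like, the sketch below derives a row count, a column count, and a delimiter from delimited text. The function name `extract_header_info` and the dictionary keys are hypothetical and are not part of LatticeX-Rosetta; the specification only states that the header may include the number of rows, the number of columns, and the delimiter.

```python
def extract_header_info(text, delimiter=","):
    """Summarise the structure of delimited text without exposing any values.

    Hypothetical helper: returns only row count, column count and delimiter.
    """
    # Ignore blank lines so that a trailing newline does not count as a row.
    lines = [line for line in text.splitlines() if line.strip()]
    rows = len(lines)
    # Assumption: every row has the same number of fields as the first row.
    cols = len(lines[0].split(delimiter)) if lines else 0
    return {"rows": rows, "cols": cols, "delimiter": delimiter}


private_csv = "1.0,2.0,3.0\n4.0,5.0,6.0\n"
header = extract_header_info(private_csv)
```

Only `header` would leave the data owner in this sketch; the field values themselves stay local.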
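S104: Each non-data owner node constructs invalid private input data according to the target feature information.
In this implementation, given the characteristics of data in secure multi-party computation protocols, each non-data owner node can determine its invalid private input data from the target feature information it has received. The invalid private input data exists only so that TensorFlow execution can proceed correctly; it is invalid input and is not used in the secure multi-party computation.
In this implementation, secure multi-party computation means that, without a trusted third party, multiple participants jointly compute an agreed function while each party obtains only its own result and cannot infer any other party's input or output from the data exchanged during the computation. Secret sharing (SS) means splitting data into several meaningless numbers and distributing them to multiple participants. Each participant holds only a part of the original data; one or a few participants cannot recover it, and the real data can be recovered only when every participant's part is put together.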
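The general idea of secret sharing described above can be sketched with additive shares over a fixed ring. This is a minimal illustration under stated assumptions, not the scheme of any particular protocol in LatticeX-Rosetta: the modulus `2 ** 64` and the function names `share` and `reconstruct` are choices made for this example only.

```python
import secrets

MODULUS = 2 ** 64  # illustrative ring size, an assumption of this sketch


def share(value, parties):
    """Split an integer into `parties` additive shares modulo MODULUS."""
    # All but the last share are drawn uniformly at random.
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recover the value by summing every party's share."""
    return sum(shares) % MODULUS


shares = share(42, parties=3)
recovered = reconstruct(shares)
```

Any proper subset of the shares is uniformly distributed and reveals nothing about the value; only the sum of all of them does.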
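In one implementation, secret sharing can be used for secret distribution. Each non-data owner fills in 0 according to the target feature information (that is, it makes no numerical contribution) and uses 0 as its invalid private input data. The data fields of all non-data owner nodes therefore take the value 0 (fabricated fake data), while the data owner node holds the actual private values. This both achieves encryption by secret sharing and ensures that the data pipeline executes correctly.
In this implementation, because the invalid private input data of the non-data owner nodes is invalid data, the fill value is not limited to 0 and may be any other value of the same data type. It can be determined according to the actual situation and is not limited in the embodiments of this specification.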
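A non-data owner's construction of placeholder input might look like the following sketch. It assumes the feature information arrives as a dictionary with a `cols` entry; that layout, the function name `make_fake_batch`, and the `fill` parameter are hypothetical and serve only to show that the shape alone is enough to build the fake data.

```python
def make_fake_batch(header, batch_size, fill=0.0):
    """Build placeholder rows whose shape matches the owner's real batch."""
    # Only the column count from the shared header is needed for the shape.
    return [[fill] * header["cols"] for _ in range(batch_size)]


# Hypothetical feature information as received from the data owner.
header = {"rows": 100, "cols": 3, "delimiter": ","}
fake = make_fake_batch(header, batch_size=2)
```

Because the values are never used in the computation, `fill` could be any value of the same type, as the text notes.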
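In one scenario example, the interaction among multiple participants is shown in FIG. 2. Suppose there are three participants, P0, P1, and P2, where P0 is the data owner and the target private data value is a. P0's secret share is then still a, and the secret shares of the non-data owners P1 and P2 are filled with 0 (fake data) according to the target feature information broadcast by P0. Since P0 owns the data, it keeps the original value a; P1 and P2 do not have the data, so their values can be set directly to 0 (no numerical contribution). The values of the non-owning participants P1 and P2 thus sum to 0, and the values of all participants P0, P1, and P2 sum to a. This achieves the purpose of secret sharing, and P0, P1, and P2 can compute jointly on their secret shares.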
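The arithmetic of this scenario can be checked with a short sketch. It only mirrors the local values described for P0, P1, and P2 at the input stage; the function name `input_shares` and the concrete value 7 are illustrative assumptions, and the later conversion into a protocol's encrypted data type is not modelled here.

```python
def input_shares(value, owner, parties):
    """Local value held by each party at the input stage.

    The owner keeps its private value; every other party holds 0.
    """
    return {p: (value if p == owner else 0) for p in parties}


parties = ["P0", "P1", "P2"]
a = 7  # P0's private value, chosen arbitrarily for illustration
shares = input_shares(a, owner="P0", parties=parties)

non_owner_sum = shares["P1"] + shares["P2"]  # contribution of P1 and P2
total = sum(shares.values())                 # sum over all three parties
```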
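S105: The data owner node and the non-data owner nodes convert the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data.
In this implementation, encrypted machine learning training requires encrypted data. The data owner node can therefore, together with the invalid private input data of the non-data owner nodes, convert its target private data into the target encrypted data type and so obtain the target encrypted data.
In this implementation, secure multi-party computation may use techniques such as additive secret sharing, oblivious transfer, and garbled circuits; this specification does not limit the specific technique.
In this implementation, machine learning training is implemented on the LatticeX-Rosetta framework. Because the framework supports multiple cryptographic protocols and custom types, the target encrypted data type may be the encrypted data type of the target secure multi-party computation protocol used with the framework. Externally supplied data must be converted into the encrypted data type of the protocol in use, so a PrivateInput operator can be added at the end of the data preprocessing function to convert the target private data of the data owner node into the target encrypted data type. For each secure multi-party computation protocol, the PrivateInput operator calls that protocol's secret input (PrivateInput) interface to complete its function.
In one implementation, the final output of the text decode step can be passed to the PrivateInput operator, which converts the private data into the encrypted data type of the secure multi-party computation protocol; the PrivateInput operator carries a data_owner attribute that identifies the data owner. tf.data provides a map interface through which transformation functions can be added to apply flexible and varied preprocessing to the output of the data pipeline. map can be used to parse each line of a csv file (a text-line file whose field delimiter is ",") and convert it into a tf.float32 vector; a batch transformation is then added, so that each time execution reaches batch the next 32 elements are fetched; finally the Iterator's get_next exposes the data pipeline. Note that tf.data's map applies the supplied function to each element of the dataset; it is distinct from the key-value map container, which stores key-value pairs whose keys are unordered and unique. tf.float32 specifies the data type, and batch (batch processing) is generally understood as building a collection of data batches.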
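The parse-then-batch flow described above can be mimicked in plain Python to make the data movement concrete. This sketch deliberately does not use TensorFlow: `parse_line` and `batched` are hypothetical stand-ins for the map and batch transformations, Python floats stand in for tf.float32 values, and `next` plays the role of get_next.

```python
from itertools import islice


def parse_line(line, delimiter=","):
    """Parse one text line into a list of floats (stand-in for a float32 vector)."""
    return [float(field) for field in line.split(delimiter)]


def batched(rows, batch_size):
    """Yield successive lists of up to batch_size parsed rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


lines = ["1,2", "3,4", "5,6"]
pipeline = batched(map(parse_line, lines), batch_size=2)
first = next(pipeline)   # plays the role of get_next
second = next(pipeline)  # the final, shorter batch
```

The pipeline is lazy: lines are parsed only when a batch is requested, which matches the on-demand behaviour the text attributes to the Dataset-based pipeline.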
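In this implementation, a PrivateTextLineDataset class can be designed to implement the private source data Dataset operator; its constructor can add an attribute field data_owner that identifies the data owner.
In this implementation, the private data pipeline can be realized with minimal changes to how the tf.data API is used, together with very small changes at the user layer. Following the same approach, other types of source Dataset (for example tf.data.TFRecordDataset, tf.data.FixedLengthRecordDataset, or Datasets contributed by third parties) can also be turned into private data pipelines. This improves the ease of use and generality of private data pipelines and lowers the barrier to privacy-preserving machine learning.
In one implementation, the conversion by the data owner node and the non-data owner nodes of the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, may include: determining the valid private data from the invalid private input data of each non-data owner node and the target private data of the data owner node, where the valid private data is the target private data. The target private data can then be converted into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data.
In this implementation, what each non-data owner node holds is invalid private input data. Since this is fabricated fake data rather than real data, the non-data owner nodes in fact have no data. Therefore, when the secret input operator of the secure multi-party computation protocol is used for encryption, the valid private data among the participants can be determined first; it is the target private data of the data owner. The data owner node and the non-data owner nodes can then jointly use the secure multi-party computation protocol to convert the target private data into the target encrypted data type and obtain the target encrypted data.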
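S106: The plurality of non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol.
In this implementation, secret sharing is used. Secret sharing means splitting data into several meaningless numbers and distributing them to multiple participants; each participant holds only a part of the original data, one or a few participants cannot recover it, and the real data can be recovered only when every participant's part is put together. Secret sharing can thus be regarded as the target encryption method for the data. The non-data owner nodes and the data owner node can therefore use the target encrypted data for collaborative machine learning training.
In one implementation, the collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol may include: taking the target encrypted data as training data, and having the non-data owner nodes and the data owner node perform collaborative machine learning training on that training data.
In this implementation, the target encrypted data can be used as training data. Because the training data is encrypted, the non-data owners cannot obtain the data owner's plaintext during training, so multiple participants can jointly perform privacy-preserving machine learning training while data privacy is preserved. The way the training data is determined is of course not limited to this example. Those skilled in the art may make other changes in light of the technical essence of the embodiments of this specification, and any such change whose function and effect are the same as or similar to those of the embodiments shall fall within their scope of protection.
In one implementation, following the way TensorFlow defines Dataset operators, a PrivateTextLineDatasetOp class can be designed to implement the private source data Dataset operator. Its functions mainly include constructing the input stream and providing a get_next operation to obtain the next unit of private data. At the user layer, a PrivateTextLineDatasetV1 class can be created and exported under the alias PrivateTextLineDataset; its constructor adds an attribute field data_owner that identifies the data owner. The user only needs the following Python code to create the private source data Dataset operator.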
import latticex.rosetta as rtt
# data_file is the path to the private text file; data_owner=0 marks party P0 as the data owner
dataset = rtt.PrivateTextLineDataset(data_file, data_owner=0)
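In this implementation, the PrivateTextLineDatasetV1 class is a Python-layer class used directly by user code; the PrivateTextLineDatasetOp class is a low-level C++ class called by the TensorFlow engine when it executes the computation graph.
In this implementation, the connection between the Dataset object and the private data source is short-lived: it is closed as soon as the data has been fetched and is re-established only the next time data in the source is to be operated on. Supplying data as a stream through the Dataset built in this implementation is therefore preferable to reading all private data into memory at once. Because data is fetched dynamically while the computation graph executes, it can be obtained on demand each time, which reduces the memory required for training.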
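In one implementation, after the non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol, the method may further include: after the get_next function of the iterator has been called for the first time to obtain the target private data, the data owner node calls the get_next function of the iterator again and uses the input stream to obtain the next private data after the target private data from the private data source. The data owner node and each non-data owner node can then cooperatively construct data of the target encrypted data type based on the private input operator of the secure multi-party computation protocol, obtaining the encrypted data of that next private data, and use it together with the secure multi-party computation protocol for collaborative iterative machine learning training, until every item of private data in the private data source has been traversed.
In this implementation, only a preset unit of data is obtained each time. To carry out iterative machine learning training, the get_next operation can therefore be used to obtain the next unit of private data, and the operations of steps S102 to S106 are repeated on it: the non-data owner nodes call the get_next function of the iterator and construct invalid private input data from the target feature information, and the data owner node and the non-data owner nodes, based on the invalid private input data and the private input operator (PrivateInput operator) of the secure multi-party computation protocol, convert the data owner's next unit of private data into the target encrypted data type to obtain its encrypted data. This continues until every item of private data in the private data source has been traversed.
In this implementation, the get_next function of the Iterator returns the next unit of private data (a sub-collection of records) in the collection. If there is no next unit (for example, the position is already at the end of the collection), an exception is thrown.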
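The lockstep iteration described above, in which the owner reads real data while the other parties produce placeholders of the same shape until the source is exhausted, can be sketched in plain Python. The generator names and the use of `StopIteration` are assumptions of this sketch; with a TensorFlow 1.x iterator in graph mode, exhaustion would instead surface as `tf.errors.OutOfRangeError`.

```python
def owner_stream(rows):
    """Data owner: yields real rows read from its private source."""
    yield from rows


def non_owner_stream(num_rows, num_cols):
    """Non-owner: yields zero-filled rows of the same shape and count."""
    for _ in range(num_rows):
        yield [0.0] * num_cols


real_rows = [[1.0, 2.0], [3.0, 4.0]]
owner = owner_stream(real_rows)
other = non_owner_stream(num_rows=2, num_cols=2)

steps = []
while True:
    try:
        # Each party advances its own iterator once per training step.
        real, fake = next(owner), next(other)
    except StopIteration:
        # End of the private data source: stop iterating.
        break
    steps.append((real, fake))
```

The non-owner needs the row count and column count to stop at the same step as the owner, which is one reason the structural information has to be shared beforehand.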
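In one implementation, before the target private data of the data owner node is converted into the target encrypted data type, the method may further include: adding a unary private input operator at the end of the data preprocessing function, where the unary private input operator contains the identification data of the data owner node and is used to convert the target private data of the data owner node into the target encrypted data type.
In this implementation, the unary private input operator may be the PrivateInput operator. It is built in the same way as a standard TensorFlow operator and carries a data_owner attribute that identifies the data owner. The following Python code can be used to build the PrivateInput operator: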
import latticex.rosetta as rtt
# inputs is the preprocessed tensor; data_owner=0 marks party P0 as the data owner
dataset = rtt.PrivateInput(inputs, data_owner=0)
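In this implementation, a PrivateInput operator can be added at the end of the data preprocessing function passed to map, to convert the target private data of the data owner node into the target encrypted data type. For each secure multi-party computation protocol, the PrivateInput operator calls that protocol's secure input interface to complete its function.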
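Placing the private input step at the tail of the preprocessing function can be pictured with the mock below. It is a sketch under loud assumptions: `private_input` here is a plain-Python stand-in written for this example, not the LatticeX-Rosetta operator, and it only reproduces the "owner contributes its values, others contribute zeros" convention rather than any real protocol conversion. The `party_id` parameter and `make_preprocess` helper are likewise hypothetical.

```python
def private_input(values, data_owner, party_id):
    """Mock private input step: the owner contributes values, others zeros.

    A real operator would call the protocol's secure input interface.
    """
    if party_id == data_owner:
        return list(values)
    return [0.0] * len(values)


def make_preprocess(data_owner, party_id, delimiter=","):
    """Build a map function whose last step is the private input conversion."""
    def preprocess(line):
        decoded = [float(field) for field in line.split(delimiter)]
        # The private input step is appended at the tail of preprocessing.
        return private_input(decoded, data_owner, party_id)
    return preprocess


owner_fn = make_preprocess(data_owner=0, party_id=0)
other_fn = make_preprocess(data_owner=0, party_id=1)
owner_out = owner_fn("1.5,2.5")  # owner decodes its real line
other_out = other_fn("0,0")      # non-owner decodes a placeholder line
```

Keeping the conversion as the final step means every earlier transformation runs identically on all parties, so the computation graphs stay the same across nodes.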
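In one scenario example, the flow in which multiple participants jointly perform machine learning training or inference may be as shown in FIG. 3. The participants can be divided into data owner nodes and non-data owner nodes. A data owner node can construct an input stream based on the private data source, and its private data input stream is associated with valid private data (private source data that must not be leaked). The private data input stream constructed by a non-data owner node has no associated valid private data, so get_next must supply fake data.
Further, the data owner node can extract the feature information of the private data in the private data source and send it to the non-data owner nodes. The data owner node can obtain valid private input data from the private data source through the iterator's get_next function, and the corresponding non-data owner nodes can construct invalid private input data according to the feature information of the valid private input data. The data owner node and the non-data owner nodes can then each perform operations such as input data preprocessing transformations (batch, repeat, and so on) and map preprocessing transformations. Based on the private input operator (PrivateInput operator) of the secure multi-party computation protocol, they can convert the valid private input data after map preprocessing into the encrypted data type of the protocol, obtaining encrypted data that can be used as training data for collaborative machine learning training or inference.
From the above description it can be seen that the embodiments of this specification achieve the following technical effects. The data owner node can construct an input stream based on a private data source and use the constructed input stream to obtain target private data from it. Since the non-data owner nodes cannot directly obtain the target private data, the data owner node can, so that every computing node obtains data of the same shape, send the target feature information of the private data in the private data source to a plurality of non-data owner nodes. The target feature information represents the structure of the private data in the private data source, and the invalid private input data of each non-data owner node can be determined from it. Further, because the private data is plaintext, the data owner node and the non-data owner nodes can, to ensure that the target private data is not leaked, cooperatively convert it into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data. The non-data owner nodes and the data owner node can then use the target encrypted data for collaborative machine learning training, so that multiple participants can jointly perform encrypted machine learning training while data privacy is preserved.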
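Based on the same inventive concept, the embodiments of this specification also provide an apparatus for preparing training data for encrypted machine learning, as in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the method, its implementation can refer to the implementation of the method, and repeated content is omitted. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated. FIG. 4 is a structural block diagram of the apparatus for preparing training data for encrypted machine learning according to an embodiment of this specification. As shown in FIG. 4, it may include an input stream construction module 401, an acquisition module 402, a sending module 403, a determination module 404, a conversion module 405, and a training module 406. The structure is described below.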
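The input stream construction module 401 can be used for the data owner node to construct an input stream based on the private data source.
The acquisition module 402 can be used for the data owner node to obtain the target private data from the private data source using the input stream.
The sending module 403 can be used for the data owner node to send the target feature information of the private data in the private data source to a plurality of non-data owner nodes, where the target feature information represents the structure of the private data in the private data source.
The determination module 404 can be used for each non-data owner node to construct invalid private input data according to the target feature information.
The conversion module 405 can be used for the data owner node and the non-data owner nodes to convert the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data.
The training module 406 can be used for the plurality of non-data owner nodes and the data owner node to perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol.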
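In one implementation, the determination module 404 may include: a filling unit, used for each non-data owner to fill in 0 according to the target feature information; and a processing unit, used for taking 0 as the invalid private input data of each non-data owner.
In this implementation, because the invalid private input data of the non-data owner nodes is invalid data, the fill value is not limited to 0 and may be any other value of the same data type. It can be determined according to the actual situation and is not limited in the embodiments of this specification.
In one implementation, the apparatus may further include: an obtaining unit, used for the data owner node, after the get_next function of the iterator has been called for the first time to obtain the target private data, to call the get_next function of the iterator and use the input stream to obtain the next private data after the target private data from the private data source; a construction unit, used for the data owner node and each non-data owner node to cooperatively construct data of the target encrypted data type based on the private input operator of the secure multi-party computation protocol, obtaining the encrypted data of that next private data; and an iterative training unit, used for the plurality of non-data owner nodes and the data owner node to perform collaborative iterative machine learning training using the encrypted data of that next private data and the secure multi-party computation protocol, until every item of private data in the private data source has been traversed.
In one implementation, the training module 406 may include: a determining unit, used for taking the target encrypted data as training data; and a machine learning training unit, used for the plurality of non-data owner nodes and the data owner node to perform collaborative machine learning training according to the training data.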
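The embodiments of this specification also provide an electronic device. Refer to FIG. 5, a schematic structural diagram of an electronic device based on the training data preparation method for encrypted machine learning provided by the embodiments of this specification. The electronic device may include an input device 51, a processor 52, and a memory 53. The input device 51 can be used to input the address of the private data source. The processor 52 can be used so that: the data owner node constructs an input stream based on the private data source; the data owner node uses the input stream to obtain the target private data from the private data source; the data owner node sends the target feature information of the private data in the private data source to a plurality of non-data owner nodes, where the target feature information represents the structure of the private data in the private data source; each non-data owner node constructs invalid private input data according to the target feature information; the data owner node and the non-data owner nodes convert the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data; and the plurality of non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol. The memory 53 can be used to store parameters such as the invalid private input data and the target encrypted data.
In this implementation, the input device may be one of the main devices for exchanging information between the user and the computer system. Input devices may include keyboards, mice, cameras, scanners, light pens, handwriting tablets, voice input devices, and so on; they are used to input raw data, and the programs that process that data, into the computer. The input device can also obtain and receive data transmitted by other modules, units, and devices. The processor may be implemented in any suitable manner. For example, it may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. The memory may be a memory device used for storing information in modern information technology. The memory can include multiple levels: in a digital system, anything that can store binary data can be a memory; in an integrated circuit, a circuit with a storage function but no physical form of its own is also called a memory, such as RAM or a FIFO; in a system, a storage device with a physical form is also called a memory, such as a memory module or a TF card.
In this implementation, the functions and effects specifically realized by the electronic device can be explained by comparison with the other implementations and are not repeated here.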
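The embodiments of this specification also provide a computer storage medium based on the training data preparation method for encrypted machine learning. The computer storage medium stores computer program instructions which, when executed, implement the following: the data owner node constructs an input stream based on the private data source; the data owner node uses the input stream to obtain the target private data from the private data source; the data owner node sends the target feature information of the private data in the private data source to a plurality of non-data owner nodes, where the target feature information represents the structure of the private data in the private data source; each non-data owner node constructs invalid private input data according to the target feature information; the data owner node and the non-data owner nodes convert the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data; and the plurality of non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol.
In this implementation, the storage medium includes but is not limited to random access memory (RAM), read-only memory (ROM), cache, hard disk drive (HDD), or memory card. The memory can be used to store computer program instructions. The network communication unit may be an interface for network connection and communication, set up according to a standard specified by a communication protocol.
In this implementation, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be explained by comparison with the other implementations and are not repeated here.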
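Obviously, those skilled in the art should understand that each module or step of the above embodiments of this specification can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described can be performed in an order different from the one given here, or they can be fabricated separately as individual integrated circuit modules, or several of the modules or steps can be fabricated as a single integrated circuit module. The embodiments of this specification are thus not limited to any particular combination of hardware and software.
Although the embodiments of this specification provide the method operation steps described in the above embodiments or flowcharts, the method may include more or fewer operation steps based on routine work that requires no creative effort. For steps that have no logically necessary causal relationship, the execution order is not limited to the order provided by the embodiments of this specification. When the method is executed in an actual apparatus or terminal product, it can be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the method shown in the embodiments or the drawings.
It should be understood that the above description is intended to be illustrative and not restrictive. Many implementations and applications beyond the examples provided will be apparent to those skilled in the art on reading the above description. The scope of the embodiments of this specification should therefore be determined not with reference to the above description but with reference to the claims, together with the full scope of equivalents to which those claims are entitled.
The above are only preferred embodiments of this specification and are not intended to limit it. For those skilled in the art, the embodiments of this specification may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the embodiments of this specification shall be included in their scope of protection.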

Claims (15)

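  1. A method for preparing training data for encrypted machine learning, characterized by comprising:
    a data owner node constructing an input stream based on a private data source;
    the data owner node using the input stream to obtain target private data from the private data source;
    the data owner node sending target feature information of the private data in the private data source to a plurality of non-data owner nodes, wherein the target feature information represents the structure of the private data in the private data source;
    each non-data owner node constructing invalid private input data according to the target feature information;
    the data owner node and the non-data owner nodes converting the target private data into a target encrypted data type, based on the invalid private input data and a secret input operator of a secure multi-party computation protocol, to obtain target encrypted data;
    the plurality of non-data owner nodes and the data owner node performing collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol.
  2. The method according to claim 1, characterized in that the target feature information is header information of the target private data, and the target feature information comprises the number of rows, the number of columns, and the delimiter of the target private data.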
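  3. The method according to claim 1, characterized in that, after the plurality of non-data owner nodes and the data owner node perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol, the method further comprises:
    after the get_next function of the iterator has been called for the first time to obtain the target private data, the data owner node calling the get_next function of the iterator and using the input stream to obtain the next private data after the target private data from the private data source;
    the data owner node and each non-data owner node cooperatively constructing data of the target encrypted data type based on the private input operator of the secure multi-party computation protocol, to obtain encrypted data of the next private data after the target private data;
    the plurality of non-data owner nodes and the data owner node performing collaborative iterative machine learning training using the encrypted data of the next private data after the target private data and the secure multi-party computation protocol, until every item of private data in the private data source has been traversed.
  4. The method according to claim 3, characterized in that the data owner node and each non-data owner node cooperatively constructing data of the target encrypted data type based on the private input operator of the secure multi-party computation protocol comprises:
    the non-data owner nodes calling the get_next function of the iterator and constructing invalid private input data using the target feature information;
    the data owner node and the non-data owner nodes converting the next private data after the target private data into the target encrypted data type, based on the invalid private input data and the private input operator of the secure multi-party computation protocol.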
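  5. The method according to claim 1, characterized in that the data owner node using the input stream to obtain target private data from the private data source comprises: the data owner node using the input stream to obtain the target private data from the private data source in a preset order defined by the iterator.
  6. The method according to claim 5, characterized in that the target private data is a preset unit of private data.
  7. The method according to claim 1, characterized in that the plurality of non-data owner nodes and the data owner node performing collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol comprises:
    taking the target encrypted data as training data;
    the plurality of non-data owner nodes and the data owner node each performing collaborative machine learning training according to the training data.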
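  8. The method according to claim 7, characterized in that each non-data owner node constructing invalid private input data according to the target feature information comprises:
    each non-data owner filling in 0 according to the target feature information;
    taking 0 as the invalid private input data of each non-data owner.
  9. The method according to claim 8, characterized in that the data owner node and the non-data owner nodes converting the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data comprises:
    determining valid private data from the invalid private input data of each non-data owner node and the target private data of the data owner node, wherein the valid private data is the target private data;
    converting the target private data into the target encrypted data type, based on the invalid private input data and the secret input operator of the secure multi-party computation protocol, to obtain the target encrypted data.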
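  10. The method according to claim 1, characterized in that the machine learning training is implemented on the Rosetta framework, and the target encrypted data type is the encrypted data type of the target secure multi-party computation protocol corresponding to the Rosetta framework.
  11. The method according to claim 1, characterized in that, before the target private data of the data owner node is converted into the target encrypted data type, the method further comprises: adding a unary private input operator at the end of the data preprocessing function, wherein the unary private input operator contains identification data of the data owner node and is used to convert the target private data of the data owner node into the target encrypted data type.
  12. The method according to claim 1, characterized in that the identification data of the data owner node is specified when the target private data is constructed.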
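  13. An apparatus for preparing training data for encrypted machine learning, characterized by comprising:
    an input stream construction module, used for a data owner node to construct an input stream based on a private data source;
    an acquisition module, used for the data owner node to obtain target private data from the private data source using the input stream;
    a sending module, used for the data owner node to send target feature information of the private data in the private data source to a plurality of non-data owner nodes, wherein the target feature information represents the structure of the private data in the private data source;
    a determination module, used for each non-data owner node to construct invalid private input data according to the target feature information;
    a conversion module, used for the data owner node and the non-data owner nodes to convert the target private data into a target encrypted data type, based on the invalid private input data and a secret input operator of a secure multi-party computation protocol, to obtain target encrypted data;
    a training module, used for the plurality of non-data owner nodes and the data owner node to perform collaborative machine learning training using the target encrypted data and the secure multi-party computation protocol.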
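  14. A device for preparing training data for encrypted machine learning, characterized by comprising a processor and a memory for storing processor-executable instructions, wherein the processor implements the steps of the method of any one of claims 1 to 12 when executing the instructions.
  15. A computer-readable storage medium, characterized in that computer instructions are stored thereon which, when executed, implement the steps of the method of any one of claims 1 to 12.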
PCT/CN2020/131449 2020-11-25 2020-11-25 一种加密机器学习的训练数据准备方法、装置和设备 WO2022109861A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/131449 WO2022109861A1 (zh) 2020-11-25 2020-11-25 一种加密机器学习的训练数据准备方法、装置和设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/131449 WO2022109861A1 (zh) 2020-11-25 2020-11-25 一种加密机器学习的训练数据准备方法、装置和设备

Publications (1)

Publication Number Publication Date
WO2022109861A1 true WO2022109861A1 (zh) 2022-06-02

Family

ID=81755001

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/131449 WO2022109861A1 (zh) 2020-11-25 2020-11-25 一种加密机器学习的训练数据准备方法、装置和设备

Country Status (1)

Country Link
WO (1) WO2022109861A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684855A (zh) * 2018-12-17 2019-04-26 电子科技大学 一种基于隐私保护技术的联合深度学习训练方法
CN111079939A (zh) * 2019-11-28 2020-04-28 支付宝(杭州)信息技术有限公司 基于数据隐私保护的机器学习模型特征筛选方法及装置
US20200242466A1 (en) * 2017-03-22 2020-07-30 Visa International Service Association Privacy-preserving machine learning
CN111783124A (zh) * 2020-07-07 2020-10-16 矩阵元技术(深圳)有限公司 基于隐私保护的数据处理方法、装置和服务器

Similar Documents

Publication Publication Date Title
Koti et al. SWIFT: Super-fast and robust privacy-preserving machine learning
US20230109352A1 (en) Node group-based data processing method and system, device, and medium
Zhao et al. Secure multi-party computation: theory, practice and applications
WO2021068445A1 (zh) Data processing method and apparatus, computer device, and storage medium
Ulukus et al. Private retrieval, computing, and learning: Recent progress and future challenges
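CN110719159A (zh) Multi-party private set intersection method resistant to malicious adversaries
WO2022142366A1 (zh) Method and apparatus for updating a machine learning model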
Zheng et al. Securely and efficiently outsourcing decision tree inference
CN113761563B (zh) Data intersection computation method and apparatus, and electronic device
Song et al. Privacy-preserving unsupervised domain adaptation in federated setting
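CN112270415B (zh) Training data preparation method, apparatus, and device for encrypted machine learning
WO2020199785A1 (zh) Private data processing method, computation method, and applicable devices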
Hazay et al. Concretely efficient large-scale MPC with active security (or, TinyKeys for TinyOT)
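WO2023020216A1 (zh) Method, apparatus, device, and storage medium for securely determining an extremum among multiple parties
CN116681141A (zh) Privacy-preserving federated learning method, terminal, and storage medium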
Zhao et al. SMSS: Secure member selection strategy in federated learning
Zhou et al. VDFChain: Secure and verifiable decentralized federated learning via committee-based blockchain
Yang et al. AdaSTopk: Adaptive federated shuffle model based on differential privacy
WO2022109861A1 (zh) Training data preparation method, apparatus, and device for encrypted machine learning
Liu et al. ESA-FedGNN: Efficient secure aggregation for federated graph neural networks
Hegde et al. Attaining GOD beyond honest majority with friends and foes
Kurniawan et al. A privacy-preserving sensor aggregation model based deep learning in large scale internet of things applications
Ge et al. Practical two-party privacy-preserving neural network based on secret sharing
CN115310120A (zh) Robust federated learning aggregation method based on double-trapdoor homomorphic encryption
Corrigan-Gibbs Protecting Privacy by Splitting Trust

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20962752

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.10.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20962752

Country of ref document: EP

Kind code of ref document: A1