WO2021139467A1 - Federated learning method, system, computer device, and storage medium - Google Patents

Federated learning method, system, computer device, and storage medium

Info

Publication number
WO2021139467A1
WO2021139467A1 (application PCT/CN2020/134837)
Authority
WO
WIPO (PCT)
Prior art keywords
data
intersection
sample data
federated
model
Prior art date
Application number
PCT/CN2020/134837
Other languages
English (en)
French (fr)
Inventor
周学立
陈玉
孙召元
杜均
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021139467A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the embodiments of the present application relate to the field of big data, and in particular, to a federated learning method, system, computer equipment, and computer-readable storage medium.
  • an embodiment of the present application provides a federated learning method.
  • the method steps include: sending corresponding ID intersection requests to multiple data providers; receiving the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data; judging whether each sample data has a corresponding federated model; if the sample data does not have a corresponding federated model, sending the sample data to a target federated model for training; and if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
  • an embodiment of the present application also provides a federated learning system, including: a sending module, used to send corresponding ID intersection requests to multiple data providers; a receiving module, used to receive the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data, where each sample data carries a corresponding target parameter; a judgment module, used to judge whether each sample data has a corresponding federated model; and a training module, used to send the sample data to a target federated model for training if the sample data does not have a corresponding federated model, and to send the sample data to the corresponding federated model for training if it does.
  • an embodiment of the present application also provides a computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when executed by the processor, the computer program implements the following method: sending corresponding ID intersection requests to multiple data providers; receiving the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data; judging whether each sample data has a corresponding federated model; if the sample data does not have a corresponding federated model, sending the sample data to a target federated model for training; and if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
  • an embodiment of the present application also provides a computer-readable storage medium storing a computer program executable by at least one processor, so that the at least one processor executes the following method: sending corresponding ID intersection requests to multiple data providers; receiving the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data; judging whether each sample data has a corresponding federated model; if the sample data does not have a corresponding federated model, sending the sample data to a target federated model for training; and if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
  • by configuring a corresponding federated model for sample data, and by judging whether each sample data has a corresponding federated model to determine which model it should be trained with, the embodiments of this application solve the problem that the single-model training approach degrades the federated learning model, effectively improving the accuracy and business effect of the federated learning model.
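The dispatch logic summarized above (judge whether a sample has its own federated model, otherwise fall back to the target model) can be sketched as follows; all names (`route_samples`, `target_param`, the model registry) are illustrative assumptions, not the patent's API.

```python
# Hypothetical sketch of the claimed flow: each sample carries a target
# parameter; samples with a registered federated model go to that model,
# the rest go to the shared target federated model for training.
def route_samples(sample_batches, models, target_model):
    assignments = {}
    for sample_id, sample in sample_batches.items():
        model_key = sample.get("target_param")   # carried with the sample
        if model_key in models:                  # a corresponding model exists
            assignments[sample_id] = models[model_key]
        else:                                    # fall back to the target model
            assignments[sample_id] = target_model
    return assignments
```

A caller would then invoke each assigned model's training routine on its batch; that step is omitted here.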
  • Fig. 1 is a schematic flowchart of a federated learning method according to an embodiment of this application.
  • Figure 2 is a schematic diagram of the program modules of the second embodiment of the federated learning system of this application.
  • FIG. 3 is a schematic diagram of the hardware structure of the third embodiment of the computer equipment of this application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, blockchain and/or big data technology.
  • the data involved in this application, such as sample data and/or feature data, can be stored in a database, or can be stored in a blockchain, for example through distributed blockchain storage, which is not limited by this application.
  • the computer device 2 will be used as the execution subject for exemplary description.
  • referring to FIG. 1, there is shown a flowchart of the steps of the federated learning method in an embodiment of the present application. It can be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed.
  • the following exemplary description takes the computer device 2 as the execution subject. The details are as follows.
  • Step S100: send corresponding ID intersection requests to multiple data providers.
  • the ID intersection request is used to instruct the data provider to return a plurality of sample data for training a federated model according to the ID intersection request.
  • the data provider may perform an encryption operation on the returned data.
  • the data requester is the initiator of the service request; it has the function of sending requests to the data providers (requests for cooperation in providing data support) and can train the federated model based on the data the data providers return.
  • the data provider may be another fully independent entity with its own computing capability, which can respond to the ID intersection request sent by the data requester and cooperate with the data requester to complete the federated training of the model.
  • each ID intersection request carries multiple pieces of user ID information; the step S100 may further include: sending a corresponding ID intersection request to each data provider, so that each data provider returns corresponding first encrypted data according to the user ID information carried in its ID intersection request.
  • the data requesting end may send a corresponding ID intersection request to each data providing end.
  • the ID intersection request may be parsed to obtain user ID information corresponding to the ID intersection request.
  • the data provider may also obtain, according to the user ID information, the target user information corresponding to that user ID information from a database associated with the data provider; the target user information is the information that the corresponding user holds at the data provider.
  • An encryption operation is performed on the target user information through the first encryption algorithm to obtain the first encrypted data.
  • the data provider may generate a key corresponding to the first encryption algorithm.
  • the first encrypted data may be sent to the data requesting end.
  • the first encryption algorithm may be an asymmetric encryption method or a homomorphic encryption method.
  • the asymmetric encryption method requires two keys: a public key and a private key, which form a pair; data encrypted with the public key can only be decrypted with the corresponding private key. Because encryption and decryption use two different keys, this kind of algorithm is called an asymmetric encryption algorithm; it may be the RSA, ElGamal, Knapsack, Rabin, D-H, ECC (elliptic curve cryptography), or SM2 algorithm, among others.
  • homomorphic encryption means that performing ring addition and multiplication on the plaintext and then encrypting is equivalent to encrypting first and then performing the corresponding operations on the ciphertext.
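As a concrete (toy, insecure) illustration of the homomorphic property just described: unpadded textbook RSA is multiplicatively homomorphic, so multiplying two ciphertexts yields the encryption of the product of the plaintexts. The tiny primes below are for demonstration only and must never be used in practice.

```python
# Toy RSA, no padding, tiny primes: demonstration of the homomorphic
# property E(a) * E(b) mod n == E(a * b mod n).
p, q = 61, 53
n = p * q          # modulus, 3233
e = 17             # public exponent

def encrypt(m):
    return pow(m, e, n)

a, b = 42, 7
assert (encrypt(a) * encrypt(b)) % n == encrypt((a * b) % n)
```

Real homomorphic schemes used for federated learning (e.g. Paillier for additive homomorphism) follow the same idea with proper security guarantees.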
  • Step S102: receive the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data.
  • after the data requester sends the corresponding ID intersection requests to the multiple data providers, it may receive the corresponding sample data returned by each data provider according to its ID intersection request.
  • the step S102 may further include: step S102a1, receiving the first encrypted data returned by each data provider; step S102a2, encrypting each first encrypted data to obtain multiple second encrypted data; and step S102a3, sending each second encrypted data to the corresponding data provider.
  • the data requester may perform an encryption operation on the first encrypted data by using a second encryption algorithm to obtain second encrypted data.
  • the second encryption algorithm may be an asymmetric encryption method or a homomorphic encryption method.
  • the asymmetric and homomorphic encryption methods available for the second encryption algorithm are the same as those described above for the first encryption algorithm.
  • each sample data includes multiple intersection data and multiple virtual feature data; the step S102 may further include: step S102b1, obtaining the local user information corresponding to each user ID information, and generating a corresponding target parameter according to the local user information, the target parameter being used to determine the corresponding federated model; step S102b2, inserting the target parameter into the corresponding local user information to obtain multiple target local user information; step S102b3, encrypting each target local user information to obtain multiple third encrypted data; and step S102b4, sending each third encrypted data to the corresponding data provider, so that each data provider returns the corresponding multiple intersection data and multiple virtual feature data according to the corresponding second and third encrypted data.
  • the data requester may obtain the local user information corresponding to each user ID information; the local user information is the user information that the target user holds at the data requester.
  • a corresponding target parameter is generated according to the local user information, and the target parameter is used to determine the corresponding federated model; the target parameter may be a parameter pre-configured according to the corresponding federated model, so that the corresponding federated model can be determined from it.
  • the target parameter may be data in JSON format.
  • the data requester may also insert the target parameter into the corresponding local user information to obtain multiple target local user information, and encrypt each target local user information with a third encryption algorithm to obtain multiple third encrypted data.
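A minimal sketch of the parameter-insertion step, assuming the target parameter is a JSON string and the user record is a flat dictionary; the field names (`target_param`, `model`) are hypothetical, not taken from the patent.

```python
import json

# Illustrative only: build a JSON target parameter and insert it into
# the local user record before the record is encrypted and sent out.
def attach_target_param(local_user_info, model_key):
    target_param = json.dumps({"model": model_key})
    record = dict(local_user_info)        # do not mutate the caller's dict
    record["target_param"] = target_param
    return record
```

The subsequent encryption of the record is omitted; any of the asymmetric or homomorphic schemes discussed above could be applied to the serialized record.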
  • the third encryption algorithm may be an asymmetric encryption method or a homomorphic encryption method.
  • the data requester may also send the second encrypted data and the third encrypted data to a pre-configured intersection model; the intersection model decrypts the second encrypted data to obtain a decryption result and judges whether the decryption result is the same as the first encrypted data. If they are the same, it performs intersection processing on the first encrypted data and the third encrypted data to obtain the intersection data set and the non-intersection data set of the first and third encrypted data.
  • the data provider may perform feature-labeling processing on each non-intersection datum in the non-intersection data set to generate multiple virtual features.
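The double-encryption intersection and virtual-feature steps can be sketched with an assumed commutative cipher (Diffie-Hellman-style exponentiation of hashed IDs); the patent does not prescribe this particular construction, and the prime, the hash, and the placeholder virtual-feature format below are all illustrative.

```python
import hashlib

# Commutative "encryption": exponentiation of a hashed ID modulo a prime.
# (h^k1)^k2 == (h^k2)^k1 mod P, so doubly-encrypted IDs match iff equal.
P = 2**127 - 1   # a Mersenne prime, chosen only for illustration

def h(uid):
    return int.from_bytes(hashlib.sha256(uid.encode()).digest(), "big") % P

def enc(value, key):
    return pow(value, key, P)

def intersect(ids_a, ids_b, key_a, key_b):
    # Party A encrypts its IDs with key_a, party B re-encrypts with key_b
    # (and symmetrically for B's IDs), so neither side sees raw IDs.
    double_a = {enc(enc(h(u), key_a), key_b): u for u in ids_a}
    double_b = {enc(enc(h(u), key_b), key_a) for u in ids_b}
    hits = {u for d, u in double_a.items() if d in double_b}
    # Non-intersection samples get placeholder "virtual" features.
    virtual = {u: {"virtual": True} for u in ids_a - hits}
    return hits, virtual
```

In a real deployment the two exponentiations happen on different machines; here both run locally purely to demonstrate the matching property.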
  • the step S102 may further include: uploading the multiple sample data to the blockchain.
  • uploading the multiple sample data to the blockchain can ensure its security, fairness and transparency.
  • the blockchain referred to in this example is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, used to verify the validity (anti-tampering) of the information and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
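As a rough illustration of the chained-block structure described above (not the patent's design; the field names and the use of SHA-256 are assumptions), each block can store the hash of its predecessor so that tampering breaks the chain:

```python
import hashlib, json

# Build a block whose hash covers its data and the previous block's hash.
def make_block(data, prev_hash):
    body = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return {"data": data, "prev": prev_hash,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

# Check only the linkage: each block must reference its predecessor's hash.
def valid_chain(chain):
    return all(chain[i]["prev"] == chain[i - 1]["hash"]
               for i in range(1, len(chain)))
```

Uploaded sample data would be carried in the `data` field; consensus, signatures, and distribution are out of scope for this sketch.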
  • Step S104: judge whether each sample data has a corresponding federated model.
  • the data requester may decide, according to whether each sample data has a corresponding federated model, whether to send the sample data to the target federated model, so as to train the target federated model.
  • the data requester can improve the business effect of the overall model by aggregating multiple model tasks. For example, an ensemble of multiple models can be used, with one task configured per model so that each task corresponds to one sample data, yielding multiple decoupled unit tasks whose execution does not interfere with one another.
  • a unit task is a training task of the federated learning model within the ensemble engine.
  • the data requesting end may determine the unit task corresponding to the sample data according to whether each sample data has a corresponding federated model.
  • the step S104 may further include: step S104a, parsing each sample data to obtain the corresponding target parameter; and step S104b, judging, according to the target parameter, whether the sample data has a corresponding federated model.
  • the data requester may also parse each sample data to obtain the corresponding target parameter, which is used to determine the corresponding federated model. After obtaining the target parameter, the data requester can judge according to it whether the sample data has a corresponding federated model.
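The judgment of steps S104a/S104b might look like the following, assuming a JSON-encoded target parameter inside each sample and a registry of configured federated models (both field names and the registry are assumptions):

```python
import json

# Step S104a: parse the sample's JSON target parameter.
# Step S104b: check whether a corresponding federated model is registered.
def has_model(sample, registry):
    params = json.loads(sample["target_param"])
    return params.get("model") in registry
```

Samples for which `has_model` returns `False` would be routed to the target federated model instead.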
  • Step S106: if the sample data does not have a corresponding federated model, send the sample data to the target federated model for training; and if the sample data has a corresponding federated model, send the sample data to the corresponding federated model for training.
  • the sample data is sent to the target federated model for training, so as to obtain a trained target federated model.
  • the data requesting terminal may pre-select a training model, and the training model may include LR, XGB, DNN models, and so on.
  • the data requesting end may analyze the sample data to obtain multiple intersection data and multiple virtual feature data.
  • the intersection data and the multiple virtual features in the sample data are used as the federated training samples of the federated model, and the target federated model is trained with these samples to obtain a trained federated model.
  • this embodiment not only completes the task without losing the information of the samples outside the intersection, but also performs better model training on the intersection data, finally obtaining a trained target federated model.
  • the step S106 may further include: step S106a, parsing the sample data to obtain multiple intersection data and multiple virtual feature data; step S106b, generating a corresponding operator task according to each intersection datum, to obtain multiple operator tasks; step S106c, assigning a corresponding resource to start each operator task, so that the operator task processes the corresponding intersection data to obtain the corresponding multiple intersection feature data; and step S106d, training the federated model with the multiple intersection feature data and the multiple virtual feature data.
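Steps S106b and S106c (one operator task per intersection datum, each started with its own resource) can be sketched with a thread pool; the feature-extraction body is a stand-in, since the patent does not specify the per-datum computation.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in operator task: turn one intersection datum into feature data.
def extract_features(datum):
    return {"id": datum["id"], "feature": len(datum["payload"])}

# Step S106b/S106c sketch: one operator task per intersection datum,
# each executed on its own worker; results keep input order.
def run_operator_tasks(intersection_data, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract_features, intersection_data))
```

Step S106d would then feed the returned intersection feature data, together with the virtual feature data, into model training.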
  • if the sample data has a corresponding federated model, the data requester may parse the sample data to obtain multiple intersection data, multiple virtual feature data, and the corresponding target parameter (data in JSON format). After extracting the target parameter, the data requester may generate a corresponding operator task according to it, obtaining multiple operator tasks.
  • the data requester communicates the related task requirements to the data provider; the data provider then asks the cluster for resources to execute the received task requirements and cooperates with the data requester to complete the tasks. Each operator task processes the corresponding intersection data, and the resulting intersection feature data, together with the multiple virtual feature data, are used for training.
  • the training results can be sorted and stored according to the different ensemble methods and output in a format usable by the scoring engine. Compared with traditional single-model results, the results obtained here have more complex expressions and place higher requirements on the scoring model.
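A minimal sketch of collecting per-model results into a scoring-engine-friendly format; simple averaging is an assumed ensemble method here, not one the patent prescribes, and the output schema is illustrative.

```python
# Aggregate per-model scores into one ensemble output. Each entry of
# scores_by_model maps sample IDs to that model's score.
def aggregate(scores_by_model):
    n = len(scores_by_model)
    totals = {}
    for scores in scores_by_model.values():
        for sample_id, s in scores.items():
            totals[sample_id] = totals.get(sample_id, 0.0) + s / n
    return {"method": "mean", "scores": totals}
```

More elaborate ensembles (stacking, weighted voting) would replace the mean but keep the same sorted, serializable output shape for the scoring engine.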
  • FIG. 2 is a schematic diagram of the program modules of the second embodiment of the federated learning system of this application.
  • the federated learning system 20 may include or be divided into one or more program modules.
  • the one or more program modules are stored in a storage medium and executed by one or more processors to implement the federated learning method described above.
  • the program modules referred to in the embodiments of the present application are a series of computer program instruction segments capable of completing specific functions, and are more suitable than the program itself for describing the execution process of the federated learning system 20 in the storage medium. The following description introduces the function of each program module of this embodiment.
  • the sending module 200 is configured to send corresponding multiple ID intersection requests to multiple data providers.
  • the ID intersection request carries multiple pieces of user ID information; the sending module 200 is further configured to send a corresponding ID intersection request to each data provider, so that each data provider returns corresponding first encrypted data according to the user ID information carried in its ID intersection request.
  • the receiving module 202 is configured to receive corresponding sample data returned by each data provider according to the corresponding ID intersection request to obtain multiple sample data.
  • the receiving module 202 is further configured to: receive the first encrypted data returned by each data provider; perform encryption processing on each first encrypted data to obtain a plurality of second encrypted data; and Send each of the second encrypted data to the corresponding data provider.
  • each sample data includes multiple intersection data and multiple virtual feature data; the receiving module 202 is also used to: obtain the local user information corresponding to each user ID information, and generate a corresponding target parameter according to the local user information, the target parameter being used to determine the corresponding federated model; insert the target parameter into the corresponding local user information to obtain multiple target local user information; encrypt each target local user information to obtain multiple third encrypted data; and send each third encrypted data to the corresponding data provider, so that each data provider returns the corresponding multiple intersection data and multiple virtual feature data according to the corresponding second and third encrypted data.
  • the judging module 204 is used to judge whether each sample data has a corresponding federated model.
  • the judgment module 204 is further configured to: parse each sample data to obtain a corresponding target parameter; and determine whether the sample data has a corresponding federated model according to the target parameter.
  • the training module 206 is configured to send the sample data to the target federated model for training if the sample data does not have a corresponding federated model, and to send the sample data to the corresponding federated model for training if it does.
  • the training module 206 is further configured to: parse the sample data to obtain multiple intersection data and multiple virtual feature data; generate a corresponding operator task according to each intersection data to obtain Multiple operator tasks; each operator task is assigned a corresponding resource to start, so as to perform corresponding intersection data processing through the operator task to obtain corresponding multiple intersection feature data; through the multiple intersection feature data and The plurality of virtual feature data trains the federated model.
  • the federated learning system 20 further includes an upload module, and the upload module is configured to upload the multiple sample data to the blockchain.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • the computer device may include a memory, a processor, and a computer program that is stored on the memory and can run on the processor, and the computer program implements part or all of the steps in the above method when the computer program is executed by the processor.
  • the computer equipment may also include a network interface and/or a federated learning system.
  • the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and a federated learning system 20 that can communicate with each other through a system bus.
  • the memory 21 includes at least one type of computer-readable storage medium.
  • the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2.
  • the memory 21 may also be an external storage device of the computer device 2, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, etc.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 2, for example, the program code of the federated learning system 20 in the second embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 22 is generally used to control the overall operation of the computer device 2.
  • the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the federated learning system 20 to implement the federated learning method of the first embodiment.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer device 2 and other electronic devices.
  • the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
  • the network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 3 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • the federated learning system 20 stored in the memory 21 can also be divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the application.
  • FIG. 2 shows a schematic diagram of program modules for implementing the federated learning system 20 according to the second embodiment of the present application.
  • the federated learning system 20 can be divided into a sending module 200, a receiving module 202, a judgment module 204, and a training module 206.
  • the program module referred to in this application refers to a series of computer program instruction segments that can complete specific functions, and is more suitable than a program to describe the execution process of the federated learning system 20 in the computer device 2.
  • the specific functions of the program modules 200-206 have been described in detail in the second embodiment, and will not be repeated here.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, etc., on which a computer program is stored; the program implements the corresponding functions when executed by a processor.
  • the computer-readable storage medium of this embodiment is used in the federated learning system 20, and when executed by a processor, the federated learning method of the first embodiment is implemented.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.


Abstract

A federated learning method, relating to the field of big data, the method comprising: sending corresponding ID intersection requests to multiple data providers (S100); receiving the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data, and uploading the multiple sample data to a blockchain (S102); judging whether each sample data has a corresponding federated model (S104); if the sample data does not have a corresponding federated model, sending the sample data to a target federated model for training; and if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training (S106). The method effectively improves the accuracy and business effect of the federated learning model.

Description

Federated learning method, system, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on August 7, 2020, with application number 202010786546.X and the invention title "Federated learning method, system, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of big data, and in particular to a federated learning method, system, computer device, and computer-readable storage medium.
Background
With the arrival of the big data era, the problem of data silos in the Internet field has become increasingly prominent. The emergence of federated learning has, to a certain extent, played a crucial role in solving the data-silo problem in the Internet field. However, the inventors realized that current federated learning is mostly carried out on the basis of single-model training; although this can alleviate the data-silo problem to a degree, the single-model training approach leaves the model's accuracy and business effect low.
Therefore, how to solve the problem that single-model federated learning yields low model accuracy and business effect has become one of the technical problems urgently needing a solution.
Technical Problem
In view of the above, it is necessary to provide a federated learning method, system, computer device, and computer-readable storage medium, to solve the technical problem that the current single-model training approach leaves federated learning models with low accuracy and poor business effect.
Technical Solution
To achieve the above objective, an embodiment of this application provides a federated learning method, the steps of which include: sending corresponding ID intersection requests to multiple data providers; receiving the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data; judging whether each sample data has a corresponding federated model; if the sample data does not have a corresponding federated model, sending the sample data to a target federated model for training; and if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
To achieve the above objective, an embodiment of this application also provides a federated learning system, including: a sending module, used to send corresponding ID intersection requests to multiple data providers; a receiving module, used to receive the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data, where each sample data carries a corresponding target parameter; a judgment module, used to judge whether each sample data has a corresponding federated model; and a training module, used to send the sample data to a target federated model for training if the sample data does not have a corresponding federated model, and to send the sample data to the corresponding federated model for training if it does.
To achieve the above objective, an embodiment of this application also provides a computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when executed by the processor, the computer program implements the following method: sending corresponding ID intersection requests to multiple data providers; receiving the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data; judging whether each sample data has a corresponding federated model; if the sample data does not have a corresponding federated model, sending the sample data to a target federated model for training; and if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
To achieve the above objective, an embodiment of this application also provides a computer-readable storage medium storing a computer program executable by at least one processor, so that the at least one processor executes the following method: sending corresponding ID intersection requests to multiple data providers; receiving the corresponding sample data returned by each data provider according to its ID intersection request, to obtain multiple sample data; judging whether each sample data has a corresponding federated model; if the sample data does not have a corresponding federated model, sending the sample data to a target federated model for training; and if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
有益效果
本申请实施例通过为样本数据配置对应的联邦模型,并通过判断每个样本数据是否存在对应的联邦模型确定样本数据配置对应的联邦模型,解决了单模型训练方式会使得联邦学习模型的问题,有效的提升联邦学习模型的精准度和业务效果。
Brief description of the drawings
Fig. 1 is a schematic flowchart of the federated learning method according to an embodiment of this application.
Fig. 2 is a schematic diagram of the program modules of Embodiment 2 of the federated learning system of this application.
Fig. 3 is a schematic diagram of the hardware structure of Embodiment 3 of the computer device of this application.
Embodiments of the invention
In order to make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with one another, but only on the basis that they can be realized by those of ordinary skill in the art; when a combination of technical solutions is contradictory or unrealizable, it should be considered that such a combination does not exist and is not within the protection scope claimed by this application.
The technical solution of this application can be applied to the fields of artificial intelligence, blockchain, and/or big data technology. Optionally, the data involved in this application, such as sample data and/or feature data, may be stored in a database or in a blockchain, for example through blockchain distributed storage, which is not limited by this application.
In the following embodiments, the computer device 2 will be used as the execution subject for exemplary description.
Embodiment 1.
Referring to Fig. 1, a flowchart of the steps of the federated learning method according to an embodiment of this application is shown. It should be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed. The following is an exemplary description with the computer device 2 as the execution subject. The details are as follows.
Step S100: send corresponding ID intersection requests to multiple data providers.
The ID intersection request instructs the data provider to return, according to the request, multiple pieces of sample data used to train federated models. To improve the security of data transmission and avoid leaking user information, the data provider may encrypt the returned data.
In some embodiments, the data requester is the initiator of the service request; it can send requests to data providers (requesting data support) and train federated models with the data the providers return. A data provider may be another fully independent entity with its own computing power, which can respond to the ID intersection request sent by the data requester and cooperate with the data requester to complete federated training of the model.
Exemplarily, each ID intersection request carries multiple pieces of user ID information. Step S100 may further include: sending a corresponding ID intersection request to each data provider, so that each data provider returns corresponding first encrypted data according to the user ID information carried in its corresponding ID intersection request.
The data requester may send a corresponding ID intersection request to each data provider. After receiving the ID intersection request, the data provider may parse it to obtain the user ID information carried in the request. In some embodiments, the data provider may further retrieve, from a database associated with the data provider, the target user information corresponding to the user ID information, i.e. the information that the corresponding user holds at the data provider, and encrypt the target user information with a first encryption algorithm to obtain the first encrypted data. After receiving the ID intersection request, the data provider may generate the key required by the first encryption algorithm. After obtaining the first encrypted data, the data provider may send it to the data requester. The first encryption algorithm may be an asymmetric encryption method or a homomorphic encryption method. An asymmetric encryption method requires two keys: a public key and a private key, which form a pair; data encrypted with the public key can only be decrypted with the corresponding private key. Because encryption and decryption use two different keys, such algorithms are called asymmetric encryption algorithms; examples include RSA, ElGamal, knapsack algorithms, Rabin, D-H, ECC (elliptic curve cryptography), and SM2. Homomorphic encryption means that performing ring addition and multiplication on plaintexts and then encrypting is equivalent to encrypting first and then performing the corresponding operations on the ciphertexts.
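To make the public-key/private-key relationship described above concrete, here is a minimal textbook-RSA sketch. The tiny primes are for readability only and are not secure; a real deployment would use a vetted cryptographic library with proper padding, and nothing here is taken from the application itself.

```python
# Toy textbook RSA: illustrates "encrypt with the public key,
# decrypt only with the matching private key".
# Tiny primes for readability -- NOT secure.

def egcd(a, b):
    # extended Euclid: returns (g, x, y) with a*x + b*y == g
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def make_keys(p=61, q=53, e=17):
    n = p * q                      # public modulus
    phi = (p - 1) * (q - 1)
    g, d, _ = egcd(e, phi)         # d is the modular inverse of e
    assert g == 1
    d %= phi                       # private exponent
    return (n, e), (n, d)

def encrypt(pub, m):
    n, e = pub
    return pow(m, e, n)

def decrypt(priv, c):
    n, d = priv
    return pow(c, d, n)

pub, priv = make_keys()
c = encrypt(pub, 42)
assert c != 42                     # ciphertext differs from plaintext
assert decrypt(priv, c) == 42      # only the private key recovers m
```

Because the two exponents differ, a data provider can publish the encryption key while keeping decryption to itself, which is exactly the property the first encryption step relies on.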
Step S102: receive the sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data.
After sending the corresponding ID intersection requests to the multiple data providers, the data requester may receive the sample data each data provider returns according to its corresponding ID intersection request.
Exemplarily, step S102 may further include: step S102a1, receiving the first encrypted data returned by each data provider; step S102a2, encrypting each piece of first encrypted data to obtain multiple pieces of second encrypted data; and step S102a3, sending each piece of second encrypted data to the corresponding data provider.
After receiving the first encrypted data returned by each data provider, the data requester may encrypt the first encrypted data with a second encryption algorithm to obtain the second encrypted data. Like the first encryption algorithm described above, the second encryption algorithm may be an asymmetric encryption method (such as RSA, ElGamal, a knapsack algorithm, Rabin, D-H, ECC, or SM2) or a homomorphic encryption method.
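The exchange in steps S102a1-S102a3, where each side encrypts the other's already-encrypted data, resembles a commutative-encryption private set intersection. The following is a sketch under stated assumptions: it uses modular exponentiation over a small Mersenne prime as the stand-in commutative cipher, and the IDs and parameter sizes are illustrative only, not taken from the application.

```python
import hashlib
import secrets

P = 2**127 - 1   # Mersenne prime; real systems would use a much larger group

def h(uid: str) -> int:
    # hash a user ID into the group
    return int.from_bytes(hashlib.sha256(uid.encode()).digest(), "big") % P

# Each side holds a secret exponent; E_k(x) = x^k mod P commutes:
# E_a(E_b(x)) == E_b(E_a(x)) == x^(a*b) mod P.
a = secrets.randbelow(P - 2) + 1   # requester's key
b = secrets.randbelow(P - 2) + 1   # provider's key

provider_ids = ["u1", "u2", "u3"]
requester_ids = ["u2", "u3", "u4"]

# provider -> requester: "first encrypted data"
first_enc = {uid: pow(h(uid), b, P) for uid in provider_ids}
# requester re-encrypts it: "second encrypted data"
second_enc = {uid: pow(c, a, P) for uid, c in first_enc.items()}
# requester's own IDs, encrypted by both parties in the other order
both_enc = {pow(pow(h(uid), a, P), b, P) for uid in requester_ids}

# IDs whose double encryptions match are in the intersection,
# without either side revealing its raw ID list.
intersection = [uid for uid, c in second_enc.items() if c in both_enc]
assert sorted(intersection) == ["u2", "u3"]
```

The commutativity is what makes the double encryption useful: both parties end up with the same opaque value for a shared ID, so equality can be tested on ciphertexts alone.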
Exemplarily, each piece of sample data includes multiple pieces of intersection data and multiple pieces of virtual feature data. Step S102 may further include: step S102b1, obtaining the local user information corresponding to each piece of user ID information and generating a corresponding target parameter from the local user information, where the target parameter is used to determine the corresponding federated model; step S102b2, inserting the target parameter into the corresponding local user information to obtain multiple pieces of target local user information; step S102b3, encrypting each piece of target local user information to obtain multiple pieces of third encrypted data; and step S102b4, sending each piece of third encrypted data to the corresponding data provider, so that each data provider returns the corresponding multiple pieces of intersection data and multiple pieces of virtual feature data according to its corresponding second encrypted data and third encrypted data.
The data requester may obtain the local user information corresponding to each piece of user ID information; the local user information is the user information that the target user holds at the data requester. A corresponding target parameter is generated from the local user information and is used to determine the corresponding federated model. The target parameter may be a parameter preconfigured according to the corresponding federated model, through which that federated model can be identified. For example, the target parameter may be data in JSON format.
The data requester may further insert the target parameter into the corresponding local user information to obtain multiple pieces of target local user information, and encrypt each piece of target local user information with a third encryption algorithm to obtain multiple pieces of third encrypted data. The third encryption algorithm may be an asymmetric or homomorphic encryption method.
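A possible shape for such a JSON target parameter and its insertion into a local user record is sketched below. All field and model names are invented for illustration; the application only specifies that the target parameter is JSON-formatted data identifying a preconfigured federated model.

```python
import json

# Hypothetical local user record held by the data requester.
local_user = {"user_id": "u2", "age": 34, "label": 1}

# Target parameter in JSON form, identifying the preconfigured
# federated model this sample should be routed to.
target_param = {"model_id": "credit_lr_v2", "model_type": "LR"}

# Insert the target parameter to form the "target local user information".
target_local_user = dict(local_user, target_param=target_param)

payload = json.dumps(target_local_user)   # this payload would be encrypted next
restored = json.loads(payload)
assert restored["target_param"]["model_id"] == "credit_lr_v2"
```

The round-trip through `json.dumps`/`json.loads` mirrors how the parameter survives transmission and can later be parsed back out in step S104.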
In some embodiments, the data requester may further send the second encrypted data and the third encrypted data to a preconfigured intersection model, which decrypts the second encrypted data to obtain a decryption result and checks whether the decryption result is identical to the first encrypted data; if they are identical, it intersects the first encrypted data with the third encrypted data to obtain the intersection data set and the non-intersection data set of the two. To protect users' data security across different applications, the data provider may apply feature labeling to each piece of non-intersection data in the non-intersection data set to generate multiple virtual features.
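One way to read the feature-labeling step is as replacing each non-intersection record with an anonymized label. The sketch below uses a hashing trick purely as a stand-in; the application does not specify the labeling scheme, so treat every detail here as an assumption.

```python
import hashlib

def virtual_features(non_intersection_ids, n_buckets=8):
    # Map each non-intersection record to an opaque bucket label so
    # that no raw user data leaves the provider, while the record can
    # still contribute a feature to training.
    feats = {}
    for uid in non_intersection_ids:
        digest = hashlib.sha256(uid.encode()).digest()
        feats[uid] = f"vf_{digest[0] % n_buckets}"
    return feats

feats = virtual_features(["u1", "u9"])
assert all(v.startswith("vf_") for v in feats.values())
```

The point of the sketch is the shape of the output, not the hashing itself: non-intersection samples still enter training, but only through derived labels.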
Exemplarily, step S102 may further include: uploading the multiple pieces of sample data to a blockchain.
Exemplarily, uploading the multiple pieces of sample data to a blockchain guarantees their security, fairness, and transparency. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity (tamper resistance) of its information and to generate the next block. A blockchain may comprise an underlying blockchain platform, a platform product service layer, an application service layer, and so on.
Step S104: determine whether each piece of sample data has a corresponding federated model.
Exemplarily, the data requester may use whether each piece of sample data has a corresponding federated model to decide whether to send the sample data to the target federated model, so as to train the target federated model.
In some embodiments, the data requester may improve the business performance of the overall model by aggregating multiple model tasks. For example, multiple models may be ensembled, with one task configured per model so that each task corresponds to one piece of sample data. This yields multiple decoupled unit tasks whose computations do not interfere with one another, where a unit task is a task in federated model training and in the ensemble engine. The data requester may determine the unit task corresponding to a piece of sample data by whether that sample data has a corresponding federated model.
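The one-task-per-model arrangement can be sketched as follows. The model names, training stubs, and the simple averaging combiner are illustrative assumptions; the application does not fix a particular ensemble rule, only that per-model tasks run decoupled and their results are aggregated.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-model training stub; in the real system each call
# would run a federated training round with the data providers.
def train(model_id, samples):
    return {"model": model_id, "score": sum(samples) / len(samples)}

# One decoupled unit task per (model, sample-data) pair.
tasks = {
    "lr_model":  [0.2, 0.4, 0.6],
    "xgb_model": [0.1, 0.5, 0.9],
}

with ThreadPoolExecutor() as pool:
    # map preserves input order, so results line up with tasks
    results = list(pool.map(lambda kv: train(*kv), tasks.items()))

# Ensemble: combine the per-model outputs (simple average here).
ensemble_score = sum(r["score"] for r in results) / len(results)
assert len(results) == 2
```

Because each unit task touches only its own sample data, the tasks can run concurrently without coordination, which is the decoupling the text describes.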
Exemplarily, step S104 may further include: step S104a, parsing each piece of sample data to obtain the corresponding target parameter; and step S104b, determining from the target parameter whether the sample data has a corresponding federated model.
The data requester may parse each piece of sample data to obtain the corresponding target parameter, which is used to determine the corresponding federated model. After obtaining the target parameter, the data requester may use it to determine whether the sample data has a corresponding federated model.
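The parse-then-route decision of steps S104a/S104b can be sketched as a small dispatcher. The registry contents and field names are invented for illustration; only the logic (parse the target parameter, fall back to the target federated model when no match exists) comes from the text.

```python
import json

# Hypothetical registry of preconfigured federated models.
MODEL_REGISTRY = {"credit_lr_v2": "LR", "risk_xgb_v1": "XGB"}

def route(sample_json):
    # Parse the sample data to recover its JSON target parameter,
    # then check the registry for a matching federated model.
    sample = json.loads(sample_json)
    model_id = sample.get("target_param", {}).get("model_id")
    if model_id in MODEL_REGISTRY:
        return model_id            # train the existing federated model
    return "target_model"          # otherwise train the target federated model

assert route('{"target_param": {"model_id": "risk_xgb_v1"}}') == "risk_xgb_v1"
assert route('{"x": 1}') == "target_model"
```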
Step S106: if the sample data has no corresponding federated model, send the sample data to the target federated model for training; and if the sample data has a corresponding federated model, send the sample data to the corresponding federated model for training.
If no corresponding federated model exists, the sample data is sent to the target federated model for training, so as to obtain a trained target federated model. In some embodiments, the data requester may preselect the training models, which may include LR, XGB, DNN models, and so on.
In some embodiments, after obtaining the data providers' sample data, the data requester may parse the sample data to obtain multiple pieces of intersection data and multiple pieces of virtual feature data, use the intersection data set and the multiple virtual features in the sample data as federated training samples for the pretrained federated model, and train the target federated model with these federated training samples to obtain a trained target federated model. This embodiment both completes the task for the intersection samples without information loss and trains the model better on the non-intersection data, finally yielding a trained target federated model.
Exemplarily, step S106 may further include: step S106a, parsing the sample data to obtain multiple pieces of intersection data and multiple pieces of virtual feature data; step S106b, generating a corresponding operator task for each piece of intersection data to obtain multiple operator tasks; step S106c, allocating a corresponding resource launch to each operator task, so that each operator task processes its corresponding intersection data to obtain multiple corresponding pieces of intersection feature data; and step S106d, training the federated model with the multiple pieces of intersection feature data and the multiple pieces of virtual feature data.
In some embodiments, if the sample data has a corresponding federated model, the data requester has preconfigured a federated model for that sample data. That is, after obtaining a data provider's sample data, the data requester may parse it to obtain multiple pieces of intersection data, multiple pieces of virtual feature data, and the corresponding target parameter (data in JSON format). After extracting the target parameter, the data requester may generate a corresponding operator task from it, thereby obtaining multiple operator tasks. To ensure that the parties cooperate to complete the task, when the data requester obtains the target parameter and begins executing the corresponding operator task, it communicates the related task requirements to the data provider, so that the data provider requests resources from the data requester's cluster to execute the received task requirements and cooperates to complete the task. In some embodiments, one operator task processes the corresponding intersection feature data and the multiple pieces of virtual feature data.
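Steps S106a-S106c, one operator task per piece of intersection data with its own resource allocation, can be sketched with a thread pool standing in for the cluster scheduler. The trivial scaling inside `operator_task` is a placeholder for whatever feature pipeline the parties actually run; it is not from the application.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical operator task: turn one piece of intersection data into
# intersection feature data (a trivial scaling stands in for the real
# feature pipeline).
def operator_task(intersection_record):
    return [x * 10 for x in intersection_record]

intersection_data = [[1, 2], [3, 4], [5, 6]]

# One operator task per piece of intersection data; the executor plays
# the role of the per-task resource allocation.
with ThreadPoolExecutor(max_workers=3) as pool:
    intersection_features = list(pool.map(operator_task, intersection_data))

assert intersection_features == [[10, 20], [30, 40], [50, 60]]
# intersection_features would then be combined with the virtual feature
# data to train the corresponding federated model (step S106d).
```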
In some embodiments, after federated model training completes, the training results may be organized and stored according to the particular ensemble method used, and output in a format that a scoring engine can consume. The results obtained here have a more complex representation than a traditional single-model result and place correspondingly higher demands on the scoring model.
Embodiment 2.
Fig. 2 is a schematic diagram of the program modules of Embodiment 2 of the federated learning system of this application. The federated learning system 20 may comprise, or be divided into, one or more program modules, which are stored in a storage medium and executed by one or more processors to complete this application and implement the federated learning method described above. A program module in the embodiments of this application refers to a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself to describing the execution of the federated learning system 20 in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment.
Sending module 200: sends corresponding ID intersection requests to multiple data providers.
Exemplarily, the ID intersection request carries multiple pieces of user ID information; the sending module 200 is further configured to send a corresponding ID intersection request to each data provider, so that each data provider returns corresponding first encrypted data according to the user ID information carried in its corresponding ID intersection request.
Receiving module 202: receives the sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data.
Exemplarily, the receiving module 202 is further configured to: receive the first encrypted data returned by each data provider; encrypt each piece of first encrypted data to obtain multiple pieces of second encrypted data; and send each piece of second encrypted data to the corresponding data provider.
Exemplarily, each piece of sample data includes multiple pieces of intersection data and multiple pieces of virtual feature data; the receiving module 202 is further configured to: obtain the local user information corresponding to each piece of user ID information and generate a corresponding target parameter from the local user information, the target parameter being used to determine the corresponding federated model; insert the target parameter into the corresponding local user information to obtain multiple pieces of target local user information; encrypt each piece of target local user information to obtain multiple pieces of third encrypted data; and send each piece of third encrypted data to the corresponding data provider, so that each data provider returns the corresponding multiple pieces of intersection data and multiple pieces of virtual feature data according to its corresponding second encrypted data and third encrypted data.
Judging module 204: determines whether each piece of sample data has a corresponding federated model.
Exemplarily, the judging module 204 is further configured to: parse each piece of sample data to obtain the corresponding target parameter; and determine from the target parameter whether the sample data has a corresponding federated model.
Training module 206: sends the sample data to a target federated model for training if the sample data has no corresponding federated model, and sends the sample data to the corresponding federated model for training if the sample data has a corresponding federated model.
Exemplarily, the training module 206 is further configured to: parse the sample data to obtain multiple pieces of intersection data and multiple pieces of virtual feature data; generate a corresponding operator task for each piece of intersection data to obtain multiple operator tasks; allocate a corresponding resource launch to each operator task, so that each operator task processes its corresponding intersection data to obtain multiple corresponding pieces of intersection feature data; and train the federated model with the multiple pieces of intersection feature data and the multiple pieces of virtual feature data.
Exemplarily, the federated learning system 20 further includes an upload module configured to upload the multiple pieces of sample data to a blockchain.
Embodiment 3.
Referring to Fig. 3, which is a schematic diagram of the hardware architecture of the computer device of Embodiment 3 of this application. In this embodiment, the computer device 2 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a cluster composed of multiple servers). The computer device may include a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements some or all of the steps of the above method. Optionally, the computer device may further include a network interface and/or a federated learning system. For example, as shown in the figure, the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and a federated learning system 20, which can be communicatively connected to one another through a system bus.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as the hard disk or internal memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit and the external storage device of the computer device 2. In this embodiment, the memory 21 is typically used to store the operating system and various application software installed on the computer device 2, such as the program code of the federated learning system 20 of Embodiment 2. In addition, the memory 21 may also be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the federated learning system 20, so as to implement the federated learning method of Embodiment 1.
The network interface 23 may include a wireless network interface or a wired network interface, and is typically used to establish communication connections between the computer device 2 and other electronic apparatuses. For example, the network interface 23 is used to connect the computer device 2 with an external terminal through a network and to establish data transmission channels and communication connections between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be noted that Fig. 3 only shows the computer device 2 with components 20-23, but it should be understood that implementing all of the shown components is not required; more or fewer components may be implemented instead.
In this embodiment, the federated learning system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete this application.
For example, Fig. 2 shows a schematic diagram of the program modules implementing the federated learning system 20 according to Embodiment 2 of this application. In that embodiment, the federated learning system 20 may be divided into a sending module 200, a receiving module 202, a judging module 204, and a training module 206. A program module referred to in this application is a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself to describing the execution of the federated learning system 20 in the computer device 2. The specific functions of the program modules 200-206 have been described in detail in Embodiment 2 and are not repeated here.
Embodiment 4.
This embodiment further provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, server, or app store, on which a computer program is stored that implements the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used for the federated learning system 20 and, when executed by a processor, implements the federated learning method of Embodiment 1.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
The serial numbers of the above embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not thereby limit the patent scope of this application. Any equivalent structural or process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

  1. A federated learning method, wherein the method comprises:
    sending corresponding ID intersection requests to multiple data providers;
    receiving sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data;
    determining whether each piece of sample data has a corresponding federated model;
    if the sample data has no corresponding federated model, sending the sample data to a target federated model for training; and
    if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
  2. The federated learning method of claim 1, wherein the ID intersection request carries multiple pieces of user ID information;
    the sending of corresponding ID intersection requests to multiple data providers, wherein each ID intersection request carries a corresponding target parameter, comprises:
    sending a corresponding ID intersection request to each data provider, so that each data provider returns corresponding first encrypted data according to the user ID information carried in its corresponding ID intersection request.
  3. The federated learning method of claim 2, wherein the receiving of the sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data, comprises:
    receiving the first encrypted data returned by each data provider;
    encrypting each piece of first encrypted data to obtain multiple pieces of second encrypted data; and
    sending each piece of second encrypted data to the corresponding data provider.
  4. The federated learning method of claim 3, wherein each piece of sample data comprises multiple pieces of intersection data and multiple pieces of virtual feature data;
    the receiving of the sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data, comprises:
    obtaining local user information corresponding to each piece of user ID information, and generating a corresponding target parameter from the local user information, the target parameter being used to determine the corresponding federated model;
    inserting the target parameter into the corresponding local user information to obtain multiple pieces of target local user information;
    encrypting each piece of target local user information to obtain multiple pieces of third encrypted data; and
    sending each piece of third encrypted data to the corresponding data provider, so that each data provider returns the corresponding multiple pieces of intersection data and multiple pieces of virtual feature data according to its corresponding second encrypted data and third encrypted data.
  5. The federated learning method of claim 1, wherein the determining of whether each piece of sample data has a corresponding federated model comprises:
    parsing each piece of sample data to obtain the corresponding target parameter; and
    determining from the target parameter whether the sample data has a corresponding federated model.
  6. The federated learning method of claim 1, wherein the sending of the sample data to the corresponding federated model for training comprises:
    parsing the sample data to obtain multiple pieces of intersection data and multiple pieces of virtual feature data;
    generating a corresponding operator task for each piece of intersection data to obtain multiple operator tasks;
    allocating a corresponding resource launch to each operator task, so that each operator task processes its corresponding intersection data to obtain multiple corresponding pieces of intersection feature data;
    training the federated model with the multiple pieces of intersection feature data and the multiple pieces of virtual feature data.
  7. The federated learning method of claim 1, further comprising: uploading the multiple pieces of sample data to a blockchain.
  8. A federated learning system, comprising:
    a sending module for sending corresponding ID intersection requests to multiple data providers;
    a receiving module for receiving sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data, wherein each piece of sample data carries a corresponding target parameter;
    a judging module for determining whether each piece of sample data has a corresponding federated model;
    a training module for sending the sample data to a target federated model for training if the sample data has no corresponding federated model, and sending the sample data to the corresponding federated model for training if the sample data has a corresponding federated model.
  9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the following method:
    sending corresponding ID intersection requests to multiple data providers;
    receiving sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data;
    determining whether each piece of sample data has a corresponding federated model;
    if the sample data has no corresponding federated model, sending the sample data to a target federated model for training; and
    if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
  10. The computer device of claim 9, wherein the ID intersection request carries multiple pieces of user ID information;
    when the corresponding ID intersection requests are sent to the multiple data providers, wherein each ID intersection request carries a corresponding target parameter, the following is specifically implemented:
    sending a corresponding ID intersection request to each data provider, so that each data provider returns corresponding first encrypted data according to the user ID information carried in its corresponding ID intersection request.
  11. The computer device of claim 10, wherein when the sample data that each data provider returns according to its corresponding ID intersection request is received to obtain the multiple pieces of sample data, the following is specifically implemented:
    receiving the first encrypted data returned by each data provider;
    encrypting each piece of first encrypted data to obtain multiple pieces of second encrypted data; and
    sending each piece of second encrypted data to the corresponding data provider.
  12. The computer device of claim 11, wherein each piece of sample data comprises multiple pieces of intersection data and multiple pieces of virtual feature data;
    when the sample data that each data provider returns according to its corresponding ID intersection request is received to obtain the multiple pieces of sample data, the following is specifically implemented:
    obtaining local user information corresponding to each piece of user ID information, and generating a corresponding target parameter from the local user information, the target parameter being used to determine the corresponding federated model;
    inserting the target parameter into the corresponding local user information to obtain multiple pieces of target local user information;
    encrypting each piece of target local user information to obtain multiple pieces of third encrypted data; and
    sending each piece of third encrypted data to the corresponding data provider, so that each data provider returns the corresponding multiple pieces of intersection data and multiple pieces of virtual feature data according to its corresponding second encrypted data and third encrypted data.
  13. The computer device of claim 9, wherein when determining whether each piece of sample data has a corresponding federated model, the following is specifically implemented:
    parsing each piece of sample data to obtain the corresponding target parameter; and
    determining from the target parameter whether the sample data has a corresponding federated model.
  14. The computer device of claim 9, wherein when the sample data is sent to the corresponding federated model for training, the following is specifically implemented:
    parsing the sample data to obtain multiple pieces of intersection data and multiple pieces of virtual feature data;
    generating a corresponding operator task for each piece of intersection data to obtain multiple operator tasks;
    allocating a corresponding resource launch to each operator task, so that each operator task processes its corresponding intersection data to obtain multiple corresponding pieces of intersection feature data;
    training the federated model with the multiple pieces of intersection feature data and the multiple pieces of virtual feature data.
  15. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, the computer program being executable by at least one processor to cause the at least one processor to perform the following method:
    sending corresponding ID intersection requests to multiple data providers;
    receiving sample data that each data provider returns according to its corresponding ID intersection request, so as to obtain multiple pieces of sample data;
    determining whether each piece of sample data has a corresponding federated model;
    if the sample data has no corresponding federated model, sending the sample data to a target federated model for training; and
    if the sample data has a corresponding federated model, sending the sample data to the corresponding federated model for training.
  16. The computer-readable storage medium of claim 15, wherein the ID intersection request carries multiple pieces of user ID information;
    when the corresponding ID intersection requests are sent to the multiple data providers, wherein each ID intersection request carries a corresponding target parameter, the following is specifically performed:
    sending a corresponding ID intersection request to each data provider, so that each data provider returns corresponding first encrypted data according to the user ID information carried in its corresponding ID intersection request.
  17. The computer-readable storage medium of claim 16, wherein when the sample data that each data provider returns according to its corresponding ID intersection request is received to obtain the multiple pieces of sample data, the following is specifically performed:
    receiving the first encrypted data returned by each data provider;
    encrypting each piece of first encrypted data to obtain multiple pieces of second encrypted data; and
    sending each piece of second encrypted data to the corresponding data provider.
  18. The computer-readable storage medium of claim 17, wherein each piece of sample data comprises multiple pieces of intersection data and multiple pieces of virtual feature data;
    when the sample data that each data provider returns according to its corresponding ID intersection request is received to obtain the multiple pieces of sample data, the following is specifically performed:
    obtaining local user information corresponding to each piece of user ID information, and generating a corresponding target parameter from the local user information, the target parameter being used to determine the corresponding federated model;
    inserting the target parameter into the corresponding local user information to obtain multiple pieces of target local user information;
    encrypting each piece of target local user information to obtain multiple pieces of third encrypted data; and
    sending each piece of third encrypted data to the corresponding data provider, so that each data provider returns the corresponding multiple pieces of intersection data and multiple pieces of virtual feature data according to its corresponding second encrypted data and third encrypted data.
  19. The computer-readable storage medium of claim 15, wherein when determining whether each piece of sample data has a corresponding federated model, the following is specifically performed:
    parsing each piece of sample data to obtain the corresponding target parameter; and
    determining from the target parameter whether the sample data has a corresponding federated model.
  20. The computer-readable storage medium of claim 15, wherein when the sample data is sent to the corresponding federated model for training, the following is specifically performed:
    parsing the sample data to obtain multiple pieces of intersection data and multiple pieces of virtual feature data;
    generating a corresponding operator task for each piece of intersection data to obtain multiple operator tasks;
    allocating a corresponding resource launch to each operator task, so that each operator task processes its corresponding intersection data to obtain multiple corresponding pieces of intersection feature data;
    training the federated model with the multiple pieces of intersection feature data and the multiple pieces of virtual feature data.
PCT/CN2020/134837 2020-08-07 2020-12-09 Federated learning method, system, computer device and storage medium WO2021139467A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010786546.XA CN111915019B (zh) 2020-08-07 2020-08-07 Federated learning method, system, computer device and storage medium
CN202010786546.X 2020-08-07

Publications (1)

Publication Number Publication Date
WO2021139467A1 true WO2021139467A1 (zh) 2021-07-15

Family

ID=73287620

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134837 WO2021139467A1 (zh) 2020-08-07 2020-12-09 联邦学习方法、系统、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN111915019B (zh)
WO (1) WO2021139467A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836559A (zh) * 2021-09-28 2021-12-24 中国银联股份有限公司 Sample alignment method, apparatus, device and storage medium in federated learning
CN114358311A (zh) * 2021-12-31 2022-04-15 中国电信股份有限公司 Vertical federated data processing method and apparatus
CN114648130A (zh) * 2022-02-07 2022-06-21 北京航空航天大学 Vertical federated learning method and apparatus, electronic device and storage medium
CN117034328A (zh) * 2023-10-09 2023-11-10 国网信息通信产业集团有限公司 Improved abnormal electricity-consumption detection system and method based on federated learning

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN111915019B (zh) 2020-08-07 2023-06-20 平安科技(深圳)有限公司 Federated learning method, system, computer device and storage medium
CN112381000A (zh) * 2020-11-16 2021-02-19 深圳前海微众银行股份有限公司 Face recognition method, apparatus, device and storage medium based on federated learning
CN113222169B (zh) * 2021-03-18 2023-06-23 中国地质大学(北京) Federated machine combination service method and system combining big data analysis feedback

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108021986A (zh) * 2017-10-27 2018-05-11 平安科技(深圳)有限公司 Electronic device, multi-model sample training method and computer-readable storage medium
CN109492420A (zh) * 2018-12-28 2019-03-19 深圳前海微众银行股份有限公司 Model parameter training method, terminal, system and medium based on federated learning
US20200019867A1 (en) * 2018-07-11 2020-01-16 International Business Machines Corporation Learning and inferring insights from encrypted data
CN111178538A (zh) * 2019-12-17 2020-05-19 杭州睿信数据科技有限公司 Federated learning method and apparatus for vertical data
CN111915019A (zh) * 2020-08-07 2020-11-10 平安科技(深圳)有限公司 Federated learning method, system, computer device and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN109165683B (zh) * 2018-08-10 2023-09-12 深圳前海微众银行股份有限公司 Sample prediction method, apparatus and storage medium based on federated training
CN109165515A (zh) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Model parameter obtaining method and system based on federated learning, and readable storage medium
CN109886417B (zh) * 2019-03-01 2024-05-03 深圳前海微众银行股份有限公司 Model parameter training method, apparatus, device and medium based on federated learning
CN111402095A (zh) * 2020-03-23 2020-07-10 温州医科大学 Method for detecting student behavior and psychology based on homomorphic-encryption federated learning


Cited By (7)

Publication number Priority date Publication date Assignee Title
CN113836559A (zh) * 2021-09-28 2021-12-24 中国银联股份有限公司 Sample alignment method, apparatus, device and storage medium in federated learning
CN114358311A (zh) * 2021-12-31 2022-04-15 中国电信股份有限公司 Vertical federated data processing method and apparatus
CN114358311B (zh) 2021-12-31 2023-11-07 中国电信股份有限公司 Vertical federated data processing method and apparatus
CN114648130A (zh) * 2022-02-07 2022-06-21 北京航空航天大学 Vertical federated learning method and apparatus, electronic device and storage medium
CN114648130B (zh) 2022-02-07 2024-04-16 北京航空航天大学 Vertical federated learning method and apparatus, electronic device and storage medium
CN117034328A (zh) * 2023-10-09 2023-11-10 国网信息通信产业集团有限公司 Improved abnormal electricity-consumption detection system and method based on federated learning
CN117034328B (zh) 2023-10-09 2024-03-19 国网信息通信产业集团有限公司 Improved abnormal electricity-consumption detection system and method based on federated learning

Also Published As

Publication number Publication date
CN111915019A (zh) 2020-11-10
CN111915019B (zh) 2023-06-20

Similar Documents

Publication Publication Date Title
WO2021139467A1 (zh) Federated learning method, system, computer device and storage medium
WO2021204040A1 (zh) Federated learning data processing method, apparatus, device and storage medium
US10067810B2 (en) Performing transactions between application containers
EP3484125B1 (en) Method and device for scheduling interface of hybrid cloud
CN109547477B (zh) Data processing method and apparatus, medium and terminal therefor
US20190334700A1 (en) Method and system for managing decentralized data access permissions through a blockchain
WO2022142038A1 (zh) Data transmission method and related device
CN111986764B (zh) Blockchain-based medical data sharing method, apparatus, terminal and storage medium
US10121021B1 (en) System and method for automatically securing sensitive data in public cloud using a serverless architecture
CN113157648A (zh) Blockchain-based distributed data storage method, apparatus, node and system
WO2020207024A1 (zh) Permission management method and related product
WO2020042798A1 (zh) Cryptographic operation and working-key creation method, cryptographic service platform and device
WO2021139476A1 (zh) Intersection data generation method and federated model training method based on intersection data
CN111753324B (zh) Private data processing method, computing method, and applicable device
CN111767144A (zh) Transaction routing determination method, apparatus, device and system for transaction data
TWI812366B (zh) Data sharing method, apparatus, device and storage medium
US11418342B2 (en) System and methods for data exchange using a distributed ledger
CN110990790A (zh) Data processing method and device
CN113434906B (zh) Data query method, apparatus, computer device and storage medium
CN110585727B (zh) Resource acquisition method and apparatus
CN113094735B (zh) Privacy model training method
CN112286703B (zh) User classification method, apparatus, client device and readable storage medium
CN111464542B (zh) Accounting method and apparatus for a blockchain network
CN112799744A (zh) Industrial app invoking method and apparatus, computer-readable medium and electronic device
CN114519191A (zh) Medical data management method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911669

Country of ref document: EP

Kind code of ref document: A1