WO2021139476A1 - Method for generating intersection data and federated model training method based on intersection data - Google Patents

Method for generating intersection data and federated model training method based on intersection data

Info

Publication number
WO2021139476A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
intersection
user
encrypted data
Prior art date
Application number
PCT/CN2020/135269
Other languages
English (en)
French (fr)
Inventor
周学立
张茜
凌海挺
蔡满天
刘丽扬
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139476A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The embodiments of the present application relate to the field of data transmission, and in particular to a method for generating intersection data and to a federated model training method, system, computer device, and computer-readable storage medium based on intersection data.
  • Federated learning mainly relies on intersection matching of user IDs: after matching succeeds, federated learning is completed using the users whose IDs fall in the intersection, which solves the problem of data islands.
  • However, this process can easily leak information about users in the non-intersecting part, which poses security risks. How to perform federated learning safely and reliably without leaking user information has therefore become one of the technical problems to be solved.
  • It is therefore necessary to provide a method for generating intersection data and a federated model training method, system, computer device, and computer-readable storage medium based on intersection data, to solve current technical problems such as the user information leakage that federated learning can easily cause.
  • An embodiment of the present application provides a method for generating intersection data.
  • The method includes: receiving an ID intersection request sent by a data requesting terminal, where the ID intersection request carries at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
  • An embodiment of the present application also provides a federated model training method based on intersection data, for use by a data requesting terminal.
  • The method includes: sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to the user ID information carried in the ID intersection request; receiving the first encrypted data; encrypting the first encrypted data to obtain second encrypted data; obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data; sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured, pre-trained federated model to obtain a target federated model.
  • An embodiment of the present application also provides a system for generating intersection data, including: a receiving request module, configured to receive an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; a response request module, configured to respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; a data receiving module, configured to receive the second encrypted data and the third encrypted data; an intersection processing module, configured to input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; a label processing module, configured to perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and a data sending module, configured to send the intersection data set and the multiple virtual features to the data requesting terminal.
  • An embodiment of the present application also provides a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor. When the computer program is executed by the processor, the following steps are implemented: receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
  • An embodiment of the present application also provides a computer device that includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When the computer program is executed by the processor, the following steps are implemented: sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to the user ID information carried in the ID intersection request; receiving the first encrypted data; encrypting the first encrypted data to obtain second encrypted data; obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data; sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured, pre-trained federated model to obtain a target federated model.
  • An embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored. The computer program can be executed by at least one processor to cause the at least one processor to execute the following steps: receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
  • An embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored. The computer program can be executed by at least one processor to cause the at least one processor to execute the following steps: sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to the user ID information carried in the ID intersection request; receiving the first encrypted data; encrypting the first encrypted data to obtain second encrypted data; obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data; sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured, pre-trained federated model to obtain a target federated model.
  • FIG. 1 is a schematic flowchart of a method for generating intersection data in Embodiment 1 of this application.
  • FIG. 2 is a schematic flowchart of a training method for a federated model based on intersection data in Embodiment 2 of this application.
  • FIG. 3 is a schematic diagram of program modules of Embodiment 3 of the system for generating intersection data according to this application.
  • FIG. 4 is a schematic diagram of program modules of Embodiment 4 of a federated model training system based on intersection data in this application.
  • FIG. 5 is a schematic diagram of the hardware structure of the third embodiment of the computer equipment of this application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, blockchain and/or big data technology to improve data security.
  • the data involved in this application can be stored in a database, or can be stored in a blockchain, such as distributed storage through a blockchain, which is not limited in this application.
  • FIG. 1 shows a flowchart of the steps of a method for generating intersection data in an embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order of execution of the steps.
  • the following is an exemplary description with a data providing terminal as an execution subject.
  • The data providing terminal supplies data and can perform encryption operations on that data. The details are as follows.
  • Step S100: Receive an ID intersection request sent by a data requesting terminal, where the ID intersection request carries at least one piece of user ID information.
  • The data providing terminal may receive an ID intersection request sent by the data requesting terminal, where the ID intersection request carries at least one piece of user ID information.
  • The data requesting terminal is the initiator of the service request and can send requests to the data providing terminal.
  • The data providing terminal may be another independent and complete entity with its own computing capabilities.
  • The data requesting terminal and the data providing terminal can communicate with each other.
  • A request generally takes the form of sending data information, receiving data information, transmitting status commands, and so on.
  • The data providing terminal and the data requesting terminal may each be a computer, a computing cluster, a tablet personal computer, a laptop computer, or another device with a data transmission function.
  • Step S102: Respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data.
  • The data providing terminal may generate a key for a first encryption algorithm, use that key to encrypt the data corresponding to the user ID information to obtain the first encrypted data, and send the first encrypted data to the data requesting terminal. The data requesting terminal then encrypts the first encrypted data with a second encryption algorithm to obtain the second encrypted data, obtains the local user information corresponding to the user ID information, and encrypts the local user information with the second encryption algorithm to obtain the third encrypted data.
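  • The source does not name the first or second encryption algorithm. As a minimal sketch, assuming both are a commutative cipher (Pohlig-Hellman style modular exponentiation, a swapped-in technique that is not taken from the source), the following shows how the three encrypted data sets could be produced and why doubly encrypted values can be matched without exposing plaintext IDs. The modulus, the keys, the sample IDs, and every function name are hypothetical, the parameters are toy-sized and not secure, and the final matching rule is an assumption rather than the source's intersection model.

```python
import hashlib
from math import gcd

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus chosen only for illustration


def make_key(seed: int) -> int:
    """Return the first exponent >= seed that is coprime to P - 1."""
    k = seed
    while gcd(k, P - 1) != 1:
        k += 1
    return k


def hash_to_group(value: str) -> int:
    """Map a string to a nonzero residue modulo P."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(x: int, key: int) -> int:
    """Commutative encryption: x**key mod P."""
    return pow(x, key, P)


provider_key = make_key(65537)       # hypothetical key of the first encryption algorithm
requester_key = make_key(2**61 - 1)  # hypothetical key of the second encryption algorithm

provider_ids = ["u1", "u5", "u7", "u8"]   # held by the data providing terminal
requester_ids = ["u1", "u2", "u7", "u8"]  # held by the data requesting terminal

# Data providing terminal: first encrypted data.
first = [encrypt(hash_to_group(i), provider_key) for i in provider_ids]
# Data requesting terminal: second encrypted data (first, encrypted again)
# and third encrypted data (its own local data, encrypted once).
second = [encrypt(c, requester_key) for c in first]
third = [encrypt(hash_to_group(i), requester_key) for i in requester_ids]
# Data providing terminal: add its own layer to the third encrypted data.
# Because the cipher commutes, equal plaintexts now yield equal ciphertexts.
third_double = {encrypt(c, provider_key) for c in third}
matched = [pid for pid, c in zip(provider_ids, second) if c in third_double]
print(matched)  # ['u1', 'u7', 'u8']
```

  • In this sketch neither terminal ever sees the other's plaintext IDs; only values that carry both encryption layers are compared.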
  • Step S102 may further include steps S102a to S102b: step S102a, obtaining target user information corresponding to the user ID information according to the user ID information; and step S102b, encrypting the target user information to obtain the first encrypted data.
  • The target user information is the user information of the target user in the data providing terminal.
  • The data providing terminal may obtain the target user information corresponding to the user ID information from the data providing terminal according to the user ID information, where the target user information is the information that the user corresponding to the user ID information has in the data providing terminal. It should be noted that the same user can register an account both on the application associated with the data providing terminal and on the application associated with the data requesting terminal. Because the information belongs to the same user, the data providing terminal can use the user ID information to obtain the target user information that the corresponding target user has on the data providing terminal.
  • After obtaining the target user information, the data providing terminal may encrypt it to obtain the first encrypted data.
  • The user ID information includes first ID information; step S102a may further include steps S102a1 to S102a2: step S102a1, performing format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and step S102a2, obtaining the target user information corresponding to the user ID information according to the second ID information.
  • the target user may perform information registration at the data providing terminal to obtain the first ID information, and may also perform information registration at the data request terminal to obtain the second ID information.
  • For example, the first ID information may be "X123" and the second ID information may be "XX123".
  • The data providing terminal may generate the second ID information corresponding to the first ID information according to the first ID information carried in the user ID information and the format conversion rule, and then use the second ID information to obtain the target user information corresponding to the user ID information from a database associated with the data providing terminal.
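  • Steps S102a1 and S102a2 can be sketched as follows. This is only an illustration: it assumes the format conversion rule is "prepend a fixed prefix" (consistent with the "X123" and "XX123" example IDs), and the function names, the in-memory dictionary standing in for the database, and the stored fields are all hypothetical.

```python
from typing import Optional


def convert_id(first_id: str, prefix: str = "X") -> str:
    """Apply the assumed format conversion rule: prepend a fixed prefix."""
    return prefix + first_id


# Hypothetical stand-in for the database associated with the data providing terminal,
# keyed by second ID information.
provider_db = {
    "XX123": {"age": 30, "city": "Shenzhen"},
}


def get_target_user_info(first_id: str, db: dict) -> Optional[dict]:
    """Convert the first ID to the second ID, then look up the target user information."""
    second_id = convert_id(first_id)
    return db.get(second_id)  # None when the user is unknown to the provider


print(get_target_user_info("X123", provider_db))
```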
  • Step S104: Receive the second encrypted data and the third encrypted data.
  • The data requesting terminal may encrypt the first encrypted data to obtain the second encrypted data. It may also obtain the local user information corresponding to the user ID information and encrypt the local user information to obtain the third encrypted data.
  • The local user information is the user information of the target user at the data requesting terminal.
  • the data requesting terminal may encrypt the first encrypted data by using a second encryption algorithm to obtain the second encrypted data.
  • the local user information is encrypted by the second encryption algorithm to obtain the third encrypted data.
  • Step S106: Input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set.
  • The intersection model may decrypt the second encrypted data to obtain a decryption result and determine whether the decryption result is the same as the first encrypted data. If they are the same, intersection processing is performed on the first encrypted data and the third encrypted data to obtain the intersection data set and the non-intersection data set of the first encrypted data and the third encrypted data.
  • The intersection model is a model for calculating the intersection of two sets of data. For example, if the first encrypted data is [1, 5, 7, 6, 8, 9] and the third encrypted data is [1, 2, 7, 8], then the intersection data set is [1, 7, 8] and the non-intersection data set is [2, 5, 6, 9].
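  • The worked example above can be reproduced with plain set operations. This is a minimal sketch, assuming the non-intersection data set is the symmetric difference of the two inputs (which matches the numbers in the example); the function name is hypothetical and the sketch omits the decryption check described for the intersection model.

```python
def intersect(first, third):
    """Split two collections into (intersection, non-intersection)."""
    first_set, third_set = set(first), set(third)
    # Keep the order in which shared items appear in the first collection.
    intersection = [x for x in first if x in third_set]
    # Items held by exactly one side, sorted for a stable result.
    non_intersection = sorted(first_set ^ third_set)
    return intersection, non_intersection


first_encrypted = [1, 5, 7, 6, 8, 9]
third_encrypted = [1, 2, 7, 8]
print(intersect(first_encrypted, third_encrypted))  # ([1, 7, 8], [2, 5, 6, 9])
```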
  • Step S108: Perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features.
  • The data providing terminal may perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features.
  • For example, the non-intersection data set [2, 5, 6, 9] is converted into multiple virtual features: null, null, tag, tag.
  • Step S110: Send the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
  • The intersection data set and the multiple virtual features may be sent to the data requesting terminal, so that the data requesting terminal trains the federated model according to the intersection data set and the multiple virtual features.
  • The method for generating the intersection data may further include steps S112a to S112c of configuring the format conversion rule: step S112a, obtaining in advance a plurality of pieces of first ID information provided by the data requesting terminal, where each piece of user ID information carries the user identity information of the user; step S112b, determining the second ID information corresponding to each piece of first ID information according to the user identity information; and step S112c, configuring the format conversion rule according to each piece of first ID information and its corresponding second ID information.
  • each user can register an account in a different application to obtain corresponding account information.
  • the target user may perform information registration at the data providing terminal to obtain the first ID information, and may also perform information registration at the data request terminal to obtain the second ID information.
  • For example, the first ID information may be "X123" and the second ID information may be "XX123". Since the first ID information and the second ID information correspond to the same user (the target user), both the data providing terminal and the data requesting terminal hold the real identity information of the target user. The second ID information corresponding to the first ID information can therefore be determined from the real identity information of the target user, and the format conversion rule can be configured according to the first ID information and the second ID information.
  • With the first ID information "X123" and the second ID information "XX123", the conversion rule can be to add an "X" before "X123" to obtain "XX123".
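  • Step S112c can be illustrated by inferring the rule from known ID pairs. The sketch below assumes the rule is always a fixed prefix, which is the only kind of rule the example shows; the function name and the second sample pair are hypothetical.

```python
def infer_prefix_rule(pairs):
    """Infer one prefix such that second_id == prefix + first_id for every pair."""
    prefixes = set()
    for first_id, second_id in pairs:
        if not second_id.endswith(first_id):
            raise ValueError(f"{second_id!r} is not a prefixed form of {first_id!r}")
        prefixes.add(second_id[: len(second_id) - len(first_id)])
    if len(prefixes) != 1:
        raise ValueError("the pairs do not share a single prefix rule")
    return prefixes.pop()


# ("X123", "XX123") comes from the example; the second pair is made up.
known_pairs = [("X123", "XX123"), ("X456", "XX456")]
rule_prefix = infer_prefix_rule(known_pairs)
print(rule_prefix + "X789")  # XX789
```

  • Raising an error when the pairs disagree keeps an inconsistent mapping from being silently configured as a rule.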
  • the method for generating the intersection data may further include: uploading the intersection data set and a plurality of virtual features to a blockchain.
  • uploading the intersection data set and the multiple virtual features to the blockchain can ensure its security, fairness and transparency.
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
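  • The idea of blocks linked by cryptographic methods can be sketched with a simple hash chain. This is a toy illustration only, not the blockchain platform the source has in mind: it has no consensus mechanism, networking, or signatures, and the block layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def make_block(payload, prev_hash):
    """Create a block whose hash covers its payload and the previous block's hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"payload": payload, "prev_hash": prev_hash, "hash": digest}


def verify_chain(chain):
    """Check that every block's hash is correct and links to its predecessor."""
    prev_hash = GENESIS_PREV
    for block in chain:
        expected = make_block(block["payload"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


block_1 = make_block({"intersection": [1, 7, 8]}, GENESIS_PREV)
block_2 = make_block({"virtual_features": ["null", "null", "tag", "tag"]}, block_1["hash"])
chain = [block_1, block_2]
print(verify_chain(chain))  # True
```

  • Changing the payload of an earlier block changes its hash, so the link stored in the next block no longer matches and verification fails.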
  • FIG. 2 shows a flow chart of the steps of a method for training a federated model based on intersection data in an embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order of execution of the steps.
  • the following is an exemplary description with the data requesting terminal as the execution subject.
  • The data requesting terminal may send request information to the data providing terminal so that the data providing terminal returns the corresponding data. The details are as follows.
  • Step S200: Send an ID intersection request to the data providing terminal, so that the data providing terminal returns the first encrypted data according to the user ID information carried in the ID intersection request.
  • the ID intersection request is used to instruct the data providing terminal to return corresponding encrypted data according to the ID intersection request.
  • the data requesting terminal may send an ID intersection request to the data providing terminal.
  • the ID intersection request carries user ID information of the target user.
  • The data providing terminal may obtain the target user information corresponding to the user ID information from the data providing terminal according to the user ID information, where the target user information is the information that the user corresponding to the user ID information has in the data providing terminal.
  • The target user information is encrypted with the first encryption algorithm to obtain the first encrypted data.
  • the data providing terminal may generate a key required by the first encryption algorithm.
  • the first encrypted data may be sent to the data requesting terminal.
  • The data requesting terminal is the initiator of the service request; it can send requests (for cooperation and supporting data) to the data providing terminal, and can train the federated model according to the data returned by the data providing terminal.
  • the data providing terminal may be another independent and complete entity with its own computing capability, and can respond to the ID intersection request sent by the data requesting terminal, and cooperate with the data requesting terminal to complete the federated training of the model.
  • Step S202: Receive the first encrypted data.
  • Step S204: Encrypt the first encrypted data to obtain second encrypted data.
  • Step S206: Obtain local user information corresponding to the user ID information, and encrypt the local user information to obtain third encrypted data.
  • The data requesting terminal may encrypt the first encrypted data to obtain the second encrypted data. It may also obtain the local user information corresponding to the user ID information and encrypt the local user information to obtain the third encrypted data.
  • The local user information is the user information of the target user at the data requesting terminal.
  • the data requesting terminal may encrypt the first encrypted data by using a second encryption algorithm to obtain the second encrypted data.
  • the local user information is encrypted by the second encryption algorithm to obtain the third encrypted data.
  • Step S208: Send the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features.
  • The second encrypted data and the third encrypted data may be sent to the data providing terminal.
  • The data providing terminal may input the first encrypted data, the second encrypted data, and the third encrypted data into the pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set.
  • The intersection model may decrypt the second encrypted data to obtain a decryption result and determine whether the decryption result is the same as the first encrypted data. If they are the same, intersection processing is performed on the first encrypted data and the third encrypted data to obtain the intersection data set and the non-intersection data set of the first encrypted data and the third encrypted data.
  • The intersection model is a model for calculating the intersection of two sets of data. For example, if the first encrypted data is [1, 5, 7, 6, 8, 9] and the third encrypted data is [1, 2, 7, 8], then the intersection data set is [1, 7, 8] and the non-intersection data set is [2, 5, 6, 9].
  • The data providing terminal may perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features.
  • For example, the non-intersection data set [2, 5, 6, 9] is converted into multiple virtual features: null, null, tag, tag.
  • Step S210: Use the intersection data set and the multiple virtual features as federated training samples, and train a pre-configured, pre-trained federated model to obtain a target federated model.
  • The data requesting terminal may obtain the federated model to be trained in advance and pre-train it on local user data, where the federated model to be trained may be logistic regression (LR), XGBoost (XGB), a deep neural network (DNN), and so on.
  • The intersection data set and the multiple virtual features can be used as the federated training samples of the pre-trained federated model, and the pre-trained federated model is trained on these federated training samples to obtain the target federated model.
  • This embodiment not only allows the task to be completed without losing information from the intersection part of the samples, but also trains the model better on the data in the intersection part, finally yielding a trained target federated model.
  • The data providing terminal can keep its real data safe and local (the data does not leave the terminal) while cooperating with the data requesting terminal to complete model training.
  • the intermediate data can be transferred.
  • the intermediate data includes plain text (unencrypted key, etc.), and also includes encrypted (usually homomorphic encryption) model and data information.
  • the method for training a federated model based on intersection data may further include: uploading the intersection data set and a plurality of virtual features to a blockchain.
  • uploading the intersection data set and the multiple virtual features to the blockchain can ensure its security, fairness and transparency.
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • FIG. 3 is a schematic diagram of program modules of Embodiment 3 of the system for generating intersection data according to this application.
  • The intersection data generation system 30 may include or be divided into one or more program modules. The one or more program modules are stored in a storage medium and executed by one or more processors to complete this application and to implement the above method for generating intersection data.
  • the program module referred to in the embodiments of the present application refers to a series of computer program instruction segments capable of completing specific functions, and is more suitable for describing the execution process of the intersection data generation system 30 in the storage medium than the program itself. The following description will specifically introduce the function of each program module of this embodiment.
  • the receiving request module 300 is configured to receive an ID intersection request sent by a data requesting terminal, where the ID intersection request carries at least one user ID information.
  • The response request module 302 is configured to respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data.
  • The response request module 302 is further configured to: obtain target user information corresponding to the user ID information according to the user ID information; and encrypt the target user information to obtain the first encrypted data.
  • The response request module 302 is further configured to: perform format conversion on the first ID information according to a preset format conversion rule to obtain the second ID information corresponding to the first ID information; and obtain the target user information corresponding to the user ID information according to the second ID information.
  • the data receiving module 304 is configured to receive the second encrypted data and the third encrypted data.
  • the intersection processing module 306 is configured to input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set.
  • the label processing module 308 is configured to perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features.
  • the data sending module 310 is configured to send the intersection data set and the multiple virtual features to the data requesting terminal.
  • the system for generating intersection data may further include a configuration module configured to: obtain in advance multiple pieces of first ID information provided by the data requesting terminal, wherein each piece of user ID information carries the user identity information of the user; determine the second ID information corresponding to each piece of first ID information according to the user identity information; and configure the format conversion rule according to each piece of first ID information and its corresponding second ID information.
  • the system for generating intersection data may further include an upload module configured to upload the intersection data set and the multiple virtual features to the blockchain.
  • FIG. 4 is a schematic diagram of program modules of Embodiment 4 of a federated model training system based on intersection data in this application.
  • the federated model training system 40 based on intersection data may include or be divided into one or more program modules; the one or more program modules are stored in a storage medium and executed by one or more processors to carry out this application and implement the above-mentioned federated model training method based on intersection data.
  • the program module referred to in the embodiments of the present application refers to a series of computer program instruction segments capable of completing specific functions, and is more suitable than the program itself to describe the execution process of the federated model training system 40 based on the intersection data in the storage medium. The following description will specifically introduce the function of each program module of this embodiment.
  • the sending request module 400 is configured to send an ID intersection request to a data providing terminal, so that the data providing terminal returns the first encrypted data according to the user ID information carried in the ID intersection request.
  • the receiving response module 402 is configured to receive the first encrypted data.
  • the data encryption module 404 is configured to perform encryption processing on the first encrypted data to obtain second encrypted data.
  • the obtaining information module 406 is configured to obtain local user information corresponding to the user ID information, and perform encryption processing on the local user information to obtain third encrypted data.
  • the data receiving module 408 is configured to send the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features.
  • the model training module 410 is configured to use the intersection data set and the multiple virtual features as federated training samples to train in a pre-configured pre-trained federated model to obtain a target federated model.
  • the federated model training system based on intersection data may further include an upload module configured to upload the intersection data set and the multiple virtual features to the blockchain.
  • the computer device 3 is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • the computer device 3 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • the computer device 3 at least includes, but is not limited to, a memory, a processor, and a computer program stored on the memory and capable of running on the processor.
  • when executed by the processor, the computer program implements some or all of the steps of the foregoing methods.
  • the computer device may also include a network interface, an intersection data generation system, and/or a federated model training system based on intersection data.
  • the memory 31, the processor 32, the network interface 33, and the generation system 30 of intersection data or the federated model training system 40 based on the intersection data may be connected to each other in communication through a system bus.
  • the memory 31 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 31 may be an internal storage unit of the computer device 3, such as a hard disk or a memory of the computer device 3.
  • the memory 31 may also be an external storage device of the computer device 3, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, flash card (Flash Card), etc.
  • the memory 31 may also include both the internal storage unit of the computer device 3 and its external storage device.
  • the memory 31 is generally used to store the operating system and various application software installed in the computer device 3, such as the program code of the intersection data generation system 30 of the third embodiment or of the intersection data-based federated model training system 40 of the fourth embodiment.
  • the memory 31 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 32 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 32 is generally used to control the overall operation of the computer device 3.
  • the processor 32 is used to run the program code or process the data stored in the memory 31, for example, to run the intersection data generation system 30 or the intersection data-based federated model training system 40, so as to implement the method for generating intersection data of the first embodiment or the federated model training method based on intersection data of the second embodiment.
  • the network interface 33 may include a wireless network interface or a wired network interface, and the network interface 33 is generally used to establish a communication connection between the computer device 3 and other electronic devices.
  • the network interface 33 is used to connect the computer device 3 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 3 and the external terminal.
  • the network may be an intranet (Intranet), the Internet, a Global System of Mobile communication (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 5 only shows the computer device 3 with components 30-33, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • the system 30 for generating intersection data stored in the memory 31 can also be divided into one or more program modules.
  • the one or more program modules are stored in the memory 31 and are executed by one or more processors (in this embodiment, the processor 32) to carry out this application.
  • FIG. 3 shows a schematic diagram of program modules of the system 30 for generating intersection data according to the third embodiment of the present application.
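  • the system 30 for generating intersection data can be divided into a receiving request module 300, a response request module 302, a data receiving module 304, an intersection processing module 306, a label processing module 308, and a data sending module 310.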
  • the program module referred to in the present application refers to a series of computer program instruction segments that can complete specific functions, and is more suitable than a program to describe the execution process of the intersection data generation system 30 in the computer device 3.
  • the specific functions of the program modules 300-310 have been described in detail in the third embodiment, and will not be repeated here.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, etc., on which computer programs are stored; the corresponding function is realized when the program is executed by the processor.
  • the computer-readable storage medium of this embodiment is used for the intersection data generation system 30 or the federated model training system 40 based on intersection data; when executed by a processor, it can implement the method for generating intersection data of the first embodiment or the federated model training method based on intersection data of the second embodiment.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Storage Device Security (AREA)

Abstract

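Provided is a method for generating intersection data. The method includes: receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training, and uploading the intersection data set and the multiple virtual features to a blockchain. The method addresses the problem that federated learning is prone to leaking user information and improves the security of user data.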

Description

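Method for generating intersection data and federated model training method based on intersection data
This application claims priority to Chinese patent application No. 202010786660.2, entitled "Method for generating intersection data and federated model training method based on intersection data", filed with the China Patent Office on August 7, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of data transmission, and in particular to a method for generating intersection data, and to a federated model training method, system, computer device, and computer-readable storage medium based on intersection data.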
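Background
With the arrival of the big data era, the problem of data silos in the Internet field has become increasingly prominent. The emergence of federated learning has, to a certain extent, played a crucial role in solving it. The inventors realized that, at present, federated learning mainly works by intersection matching of user IDs and, once matching succeeds, completing federated learning with the users whose IDs fall in the intersection, thereby solving the data silo problem. However, this easily leaks the non-intersection part of users' information and poses a security risk. How to carry out federated learning safely and reliably without leaking user information has therefore become one of the technical problems to be solved.
Technical Problem
In view of this, it is necessary to provide a method for generating intersection data, and a federated model training method, system, computer device, and computer-readable storage medium based on intersection data, to solve technical problems such as current federated learning being prone to leaking user information.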
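Technical Solution
To achieve the above objective, an embodiment of this application provides a method for generating intersection data, the method including: receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
To achieve the above objective, an embodiment of this application further provides a federated model training method based on intersection data, used in a data requesting terminal, the method including: sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried in the ID intersection request; receiving the first encrypted data; encrypting the first encrypted data to obtain second encrypted data; obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data; sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.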
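To achieve the above objective, an embodiment of this application further provides a system for generating intersection data, including: a receiving request module, configured to receive an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; a response request module, configured to respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; a data receiving module, configured to receive the second encrypted data and the third encrypted data; an intersection processing module, configured to input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; a label processing module, configured to perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and a data sending module, configured to send the intersection data set and the multiple virtual features to the data requesting terminal.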
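To achieve the above objective, an embodiment of this application further provides a computer device, the computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the following steps: receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
An embodiment of this application further provides a computer device, the computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the following steps: sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried in the ID intersection request; receiving the first encrypted data; encrypting the first encrypted data to obtain second encrypted data; obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data; sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.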
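To achieve the above objective, an embodiment of this application further provides a computer-readable storage medium storing a computer program, the computer program being executable by at least one processor to cause the at least one processor to perform the following steps: receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
An embodiment of this application further provides a computer-readable storage medium storing a computer program, the computer program being executable by at least one processor to cause the at least one processor to perform the following steps: sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried in the ID intersection request; receiving the first encrypted data; encrypting the first encrypted data to obtain second encrypted data; obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data; sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.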
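Beneficial Effects
By performing feature labeling processing on the non-intersection data of user information, the embodiments of this application address the tendency of federated learning to leak user information and improve the security of user data.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the method for generating intersection data in Embodiment 1 of this application.
FIG. 2 is a schematic flowchart of the federated model training method based on intersection data in Embodiment 2 of this application.
FIG. 3 is a schematic diagram of the program modules of Embodiment 3, the system for generating intersection data of this application.
FIG. 4 is a schematic diagram of the program modules of Embodiment 4, the federated model training system based on intersection data of this application.
FIG. 5 is a schematic diagram of the hardware structure of Embodiment 5, the computer device of this application.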
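Embodiments of the Invention
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain this application and do not limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and must not be understood as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and falls outside the protection scope claimed by this application.
The technical solution of this application can be applied to the fields of artificial intelligence, blockchain, and/or big data to improve data security. Optionally, the data involved in this application, such as data sets and/or features, may be stored in a database or in a blockchain, for example through distributed blockchain storage; this application imposes no limitation.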
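Embodiment 1.
Referring to FIG. 1, a flowchart of the steps of the method for generating intersection data according to an embodiment of this application is shown. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed. The following is an exemplary description with the data providing terminal as the executing entity; the data providing terminal is the data provider and can perform encryption operations on data. The details are as follows.
Step S100: receive an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information.
The data providing terminal may receive the ID intersection request sent by the data requesting terminal, where the ID intersection request carries at least one piece of user ID information.
The data requesting terminal is the initiator of the service request and has the function of sending requests to the data providing terminal. The data providing terminal may be another fully independent entity with its own computing capability.
The data requesting terminal and the data providing terminal can communicate with each other. A request generally takes the form of sending data, receiving data, passing status commands, and so on. The data providing terminal and the data requesting terminal may each be a computer, a computing cluster, a tablet personal computer, a laptop computer, or another device with data transmission capability.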
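Step S102: respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data.
After receiving the ID intersection request, the data providing terminal may generate the key required by a first encryption algorithm, encrypt the data corresponding to the user ID information with that key to obtain the first encrypted data, and send the first encrypted data to the data requesting terminal. The data requesting terminal then encrypts the first encrypted data with a second encryption algorithm to obtain the second encrypted data. It also obtains the local user information corresponding to the user ID information and encrypts it with the second encryption algorithm to obtain the third encrypted data.
In an exemplary embodiment, step S102 may further include steps S102a to S102b, where: step S102a, obtain target user information corresponding to the user ID information according to the user ID information; and step S102b, encrypt the target user information to obtain the first encrypted data.
The target user information is the target user's information held at the data providing terminal. The data providing terminal may use the user ID information to obtain the corresponding target user information locally; the target user information is the information that the user identified by the user ID information has at the data providing terminal. Note that the same user may register accounts both with the application associated with the data providing terminal and with the application associated with the data requesting terminal. Because the information belongs to the same user, the data providing terminal can use the user ID information to obtain the target user information that this user has on the data providing terminal.
It is easy to see that the information of different users differs across applications. To keep the target user's information secure, the data providing terminal may, after obtaining the target user information, encrypt it to obtain the first encrypted data.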
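In an exemplary embodiment, the user ID information includes first ID information, and step S102a may further include steps S102a1 to S102a2, where: step S102a1, perform format conversion on the first ID information according to a preset format conversion rule to obtain the second ID information corresponding to the first ID information; and step S102a2, obtain the target user information corresponding to the user ID information according to the second ID information.
In an exemplary embodiment, the target user may register at the data providing terminal to obtain first ID information, and may also register at the data requesting terminal to obtain second ID information. For example, the first ID information may be "X123" and the second ID information may be "XX123". After obtaining the user ID information, the data providing terminal may generate the second ID information corresponding to the first ID information from the first ID information carried in the user ID information and the format conversion rule, and then use the second ID information to obtain the target user information corresponding to the user ID information from the database associated with the data providing terminal.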
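The lookup in steps S102a1 and S102a2 can be sketched as follows. This is a minimal illustration, assuming the rule is the prefix rule from the "X123" to "XX123" example; the names `convert_id`, `PROVIDER_DB`, and `get_target_user_info`, and the stored record, are hypothetical and do not come from the source.

```python
# Hypothetical names and data; the source only gives the "X123" -> "XX123" example.
def convert_id(first_id: str, prefix: str = "X") -> str:
    """Apply a prefix-style format conversion rule to a first ID."""
    return prefix + first_id

# Stand-in for the database associated with the data providing terminal.
PROVIDER_DB = {"XX123": {"age": 30, "city": "Shenzhen"}}

def get_target_user_info(first_id: str):
    """Convert the first ID to the second ID, then look up the user record."""
    second_id = convert_id(first_id)
    return PROVIDER_DB.get(second_id)  # None when the user is unknown

info = get_target_user_info("X123")
```

Returning `None` for an unknown ID is a design choice of this sketch; the source does not say how a missing user is handled.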
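Step S104: receive the second encrypted data and the third encrypted data.
After receiving the first encrypted data provided by the data providing terminal, the data requesting terminal may encrypt the first encrypted data to obtain the second encrypted data. It also obtains the local user information corresponding to the user ID information and encrypts it to obtain the third encrypted data. The local user information is the target user's information held at the data requesting terminal. In some embodiments, the data requesting terminal may encrypt the first encrypted data with a second encryption algorithm to obtain the second encrypted data, and encrypt the local user information with the same second encryption algorithm to obtain the third encrypted data.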
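The source does not name the first and second encryption algorithms. One common way to make doubly encrypted values comparable is a commutative cipher based on modular exponentiation, as used in Diffie-Hellman-style private set intersection. The sketch below swaps in that technique purely for illustration; the modulus, the hash-to-group step, and all function names are assumptions, and the parameters are toy values rather than a vetted cryptographic group.

```python
import hashlib
import secrets

# Toy modulus (a Mersenne prime); NOT a vetted cryptographic group.
P = 2**127 - 1

def hash_to_group(value: str) -> int:
    """Map a string to an integer in [2, P - 1]."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2

def keygen() -> int:
    """Pick a random secret exponent in [2, P - 2]."""
    return secrets.randbelow(P - 3) + 2

def encrypt(element: int, key: int) -> int:
    """Commutative encryption: (x^a)^b == (x^b)^a (mod P)."""
    return pow(element, key, P)

provider_key, requester_key = keygen(), keygen()

# Provider side: first encrypted data.
first = encrypt(hash_to_group("alice"), provider_key)
# Requester side: second encrypted data (re-encrypts the first) and
# third encrypted data (encrypts its own local record).
second = encrypt(first, requester_key)
third = encrypt(hash_to_group("alice"), requester_key)

# Provider side: applying its own key to the third encrypted data yields the
# same value as the second encrypted data exactly when the records match.
matches = encrypt(third, provider_key) == second
```

Because exponentiation commutes, neither side needs the other's key to test equality, which is why this family of ciphers is popular for ID alignment.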
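Step S106: input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set.
In some embodiments, the intersection model may decrypt the second encrypted data to obtain a decryption result and determine whether the decryption result is identical to the first encrypted data; if it is, the model performs intersection processing on the first encrypted data and the third encrypted data to obtain their intersection data set and non-intersection data set. The intersection model is a model for computing the intersection of two sets of data. For example, if the first encrypted data is [1, 5, 7, 6, 8, 9] and the third encrypted data is [1, 2, 7, 8], the intersection data set is [1, 7, 8] and the non-intersection data set is [2, 5, 6, 9].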
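The numeric example above can be reproduced with plain set operations. This is a minimal sketch that treats the two inputs as lists of comparable values and reads "non-intersection" as the symmetric difference, which is the reading that matches the example; the function name `intersect` is hypothetical.

```python
def intersect(first, third):
    """Return (intersection, non_intersection) of two value lists, sorted."""
    first_set, third_set = set(first), set(third)
    intersection = sorted(first_set & third_set)
    # Items that appear on exactly one side (symmetric difference).
    non_intersection = sorted(first_set ^ third_set)
    return intersection, non_intersection

inter, non_inter = intersect([1, 5, 7, 6, 8, 9], [1, 2, 7, 8])
# inter is [1, 7, 8]; non_inter is [2, 5, 6, 9]
```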
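Step S108: perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features.
To keep users' data secure across different applications, the data providing terminal may perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features. For example, the non-intersection data set [2, 5, 6, 9] is converted into the virtual features null, null, tag, tag.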
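The source does not state which rule decides whether an item becomes `null` or `tag`. The sketch below therefore takes the rule as a parameter; `threshold_rule` is a purely hypothetical rule, chosen only because it reproduces the example in the text. The point it illustrates is that raw non-intersection values are replaced by coarse labels before anything leaves the data providing terminal.

```python
from typing import Callable, Iterable, List

def label_features(non_intersection: Iterable[int],
                   rule: Callable[[int], str]) -> List[str]:
    """Replace every non-intersection item with a virtual feature label."""
    return [rule(item) for item in non_intersection]

# Hypothetical rule: chosen only because it reproduces the example in the text.
def threshold_rule(item: int) -> str:
    return "null" if item < 6 else "tag"

virtual_features = label_features([2, 5, 6, 9], threshold_rule)
```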
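Step S110: send the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
After obtaining the intersection data set and the multiple virtual features, the data providing terminal may send them to the data requesting terminal, so that the data requesting terminal trains the federated model on the intersection data set and the multiple virtual features.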
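In an exemplary embodiment, the method for generating intersection data may further include steps S112a to S112c for configuring the format conversion rule, where: step S112a, obtain in advance multiple pieces of first ID information provided by the data requesting terminal, where each piece of user ID information carries the user identity information of the user; step S112b, determine the second ID information corresponding to each piece of first ID information according to the user identity information; and step S112c, configure the format conversion rule according to each piece of first ID information and its corresponding second ID information.
In an exemplary embodiment, each user may register accounts in different applications to obtain corresponding account information. For example, the target user may register at the data providing terminal to obtain first ID information, and may also register at the data requesting terminal to obtain second ID information, where the first ID information may be "X123" and the second ID information may be "XX123". Because the first ID information and the second ID information belong to the same user (the target user), both the data providing terminal and the data requesting terminal hold the target user's real identity information. The second ID information corresponding to the first ID information can therefore be determined from that real identity information, and the format conversion rule can be configured from the first ID information and the second ID information. For example, the rule that converts the first ID information "X123" into the second ID information "XX123" may be to prepend an "X" to "X123" to obtain "XX123".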
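Step S112c can be sketched as inferring a rule from matched ID pairs. This is a minimal illustration, assuming every rule is a fixed prefix as in the "X123" to "XX123" example; the function name `configure_rule` and the second ID pair are hypothetical, and real ID formats may need a richer rule language.

```python
def configure_rule(id_pairs):
    """Infer the single prefix that turns each first ID into its second ID."""
    prefixes = set()
    for first_id, second_id in id_pairs:
        if not second_id.endswith(first_id):
            raise ValueError(f"{second_id!r} is not a prefixed form of {first_id!r}")
        prefixes.add(second_id[: len(second_id) - len(first_id)])
    if len(prefixes) != 1:
        raise ValueError("ID pairs do not share a single prefix rule")
    prefix = prefixes.pop()
    # The configured rule is a function from first ID to second ID.
    return lambda first_id: prefix + first_id

rule = configure_rule([("X123", "XX123"), ("X456", "XX456")])
converted = rule("X789")
```

Raising an error when the pairs disagree keeps a misconfigured rule from silently mapping users to the wrong records.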
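In an exemplary embodiment, the method for generating intersection data may further include: uploading the intersection data set and the multiple virtual features to a blockchain.
For example, uploading the intersection data set and the multiple virtual features to a blockchain helps ensure their security as well as fairness and transparency. The blockchain referred to in this example is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.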
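The idea of data blocks "linked by cryptographic methods" can be illustrated with a hash chain. This is a minimal local sketch, not a blockchain client: there is no network, consensus, or signing, and the block layout and the names `make_block` and `verify_chain` are assumptions.

```python
import hashlib
import json

def _digest(payload: dict, prev_hash: str) -> str:
    """Hash a block body deterministically."""
    body = {"payload": payload, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def make_block(payload: dict, prev_hash: str) -> dict:
    """Create a block whose hash covers its payload and its predecessor's hash."""
    return {"payload": payload, "prev_hash": prev_hash,
            "hash": _digest(payload, prev_hash)}

def verify_chain(chain) -> bool:
    """Check every block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != _digest(block["payload"], block["prev_hash"]):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block({"note": "genesis"}, prev_hash="0" * 64)
record = make_block(
    {"intersection": [1, 7, 8], "virtual_features": ["null", "null", "tag", "tag"]},
    prev_hash=genesis["hash"],
)
chain = [genesis, record]
chain_ok = verify_chain(chain)
```

Changing any stored value after the fact changes the recomputed hash, so `verify_chain` would return `False` for a tampered chain.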
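Embodiment 2.
Referring to FIG. 2, a flowchart of the steps of the federated model training method based on intersection data according to an embodiment of this application is shown. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed. The following is an exemplary description with the data requesting terminal as the executing entity; the data requesting terminal can send request information to the data providing terminal so that the data providing terminal returns the corresponding data. The details are as follows.
Step S200: send an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to the user ID information carried in the ID intersection request.
The ID intersection request instructs the data providing terminal to return the corresponding encrypted data.
The data requesting terminal may send the ID intersection request to the data providing terminal, where the ID intersection request carries the user ID information of the target user.
The data providing terminal may use the user ID information to obtain the corresponding target user information locally; the target user information is the information that the user identified by the user ID information has at the data providing terminal. It then encrypts the target user information with a first encryption algorithm to obtain the first encrypted data. After receiving the ID intersection request, the data providing terminal may generate the key required by the first encryption algorithm. After obtaining the first encrypted data, the data providing terminal may send it to the data requesting terminal.
The data requesting terminal is the initiator of the service request; it has the function of sending requests to the data providing terminal (requesting cooperation and data support) and can train the federated model on the data returned by the data providing terminal. The data providing terminal may be another fully independent entity with its own computing capability; it can respond to the ID intersection request sent by the data requesting terminal and cooperate with the data requesting terminal to complete the federated training of the model.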
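Step S202: receive the first encrypted data.
Step S204: encrypt the first encrypted data to obtain second encrypted data.
Step S206: obtain local user information corresponding to the user ID information, and encrypt the local user information to obtain third encrypted data.
After receiving the first encrypted data provided by the data providing terminal, the data requesting terminal may encrypt the first encrypted data to obtain the second encrypted data. It also obtains the local user information corresponding to the user ID information and encrypts it to obtain the third encrypted data. The local user information is the target user's information held at the data requesting terminal. In some embodiments, the data requesting terminal may encrypt the first encrypted data with a second encryption algorithm to obtain the second encrypted data, and encrypt the local user information with the same second encryption algorithm to obtain the third encrypted data.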
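Step S208: send the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features.
After obtaining the second encrypted data and the third encrypted data, the data requesting terminal may send them to the data providing terminal. After receiving them, the data providing terminal may input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set. In some embodiments, the intersection model may decrypt the second encrypted data to obtain a decryption result and determine whether the decryption result is identical to the first encrypted data; if it is, the model performs intersection processing on the first encrypted data and the third encrypted data to obtain their intersection data set and non-intersection data set. The intersection model is a model for computing the intersection of two sets of data. For example, if the first encrypted data is [1, 5, 7, 6, 8, 9] and the third encrypted data is [1, 2, 7, 8], the intersection data set is [1, 7, 8] and the non-intersection data set is [2, 5, 6, 9]. To keep users' data secure across different applications, the data providing terminal may perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features. For example, the non-intersection data set [2, 5, 6, 9] is converted into the virtual features null, null, tag, tag. After obtaining the intersection data set and the multiple virtual features, the data providing terminal may send them to the data requesting terminal.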
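Step S210: use the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.
In an exemplary embodiment, the data requesting terminal may obtain in advance a federated model to be trained and pre-train it on local user data, where the federated model to be trained may be LR, XGB, DNN, or the like. After obtaining the intersection data set and the multiple virtual features from the data providing terminal, it may use them as the federated training samples of the pre-trained federated model and train the pre-trained federated model on those samples to obtain the target federated model. This embodiment completes the task on the intersection samples without loss of information and also improves model training on the non-intersection data, finally yielding a trained target federated model.
In this embodiment, the data providing terminal can cooperate with the data requesting terminal to complete model training while ensuring that the real data stays secure and never leaves the local environment. When cooperating with the data requesting terminal, the data providing terminal may pass intermediate data. The intermediate data includes plaintext (for example, unencrypted keys) as well as encrypted (usually homomorphically encrypted) model and data information.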
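The two training phases, pre-training on local data and then continuing on the federated training samples, can be sketched with a small logistic regression (the "LR" option named above). This is a minimal single-process illustration in plain Python; the sample values are made up, and how the intersection data set and virtual features are encoded as numeric rows is an assumption the source does not specify.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(weights, samples, epochs=200, lr=0.5):
    """Stochastic gradient descent for logistic regression; returns new weights."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in samples:
            x = [1.0] + list(features)  # prepend the bias input
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                w[j] -= lr * (pred - label) * xj
    return w

def predict(w, features) -> int:
    x = [1.0] + list(features)
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5)

# Hypothetical data: one numeric feature per sample, binary label.
local_samples = [((0.0,), 0), ((1.0,), 1)]
federated_samples = [((0.1,), 0), ((0.2,), 0), ((0.8,), 1), ((0.9,), 1)]

pretrained = train([0.0, 0.0], local_samples)       # pre-trained federated model
target_model = train(pretrained, federated_samples)  # target federated model
```

Starting the second phase from `pretrained` rather than from zeros is what distinguishes continued training from training a fresh model.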
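The source mentions homomorphic encryption for intermediate data without naming a scheme. As one illustration, the toy Paillier sketch below shows the additive property that makes such schemes useful: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The choice of Paillier, the tiny primes, and all function names are assumptions for illustration only; it needs Python 3.8 or later for `pow(x, -1, n)` and is not secure at this key size.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n: int, message: int) -> int:
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, private_key, ciphertext: int) -> int:
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)

n, private_key = keygen(1009, 1013)
total = decrypt(n, private_key, add_encrypted(n, encrypt(n, 20), encrypt(n, 22)))
```

Here `total` equals the sum of the two plaintexts, computed without either plaintext being visible to the party that combined the ciphertexts.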
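In an exemplary embodiment, the federated model training method based on intersection data may further include: uploading the intersection data set and the multiple virtual features to a blockchain.
For example, uploading the intersection data set and the multiple virtual features to a blockchain helps ensure their security as well as fairness and transparency. The blockchain referred to in this example is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.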
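Embodiment 3.
FIG. 3 is a schematic diagram of the program modules of Embodiment 3, the system for generating intersection data of this application. The system 30 for generating intersection data may include or be divided into one or more program modules; the one or more program modules are stored in a storage medium and executed by one or more processors to carry out this application and implement the above method for generating intersection data. A program module in the embodiments of this application is a series of computer program instruction segments capable of completing a specific function, and is better suited than the program itself to describing how the system 30 for generating intersection data executes in the storage medium. The following description introduces the function of each program module of this embodiment.
The receiving request module 300 is configured to receive an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information.
The response request module 302 is configured to respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data.
For example, the response request module 302 is further configured to: obtain target user information corresponding to the user ID information according to the user ID information; and encrypt the target user information to obtain the first encrypted data.
For example, the response request module 302 is further configured to: perform format conversion on the first ID information according to a preset format conversion rule to obtain the second ID information corresponding to the first ID information; and obtain the target user information corresponding to the user ID information according to the second ID information.
The data receiving module 304 is configured to receive the second encrypted data and the third encrypted data.
The intersection processing module 306 is configured to input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set.
The label processing module 308 is configured to perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features.
The data sending module 310 is configured to send the intersection data set and the multiple virtual features to the data requesting terminal.
For example, the system for generating intersection data may further include a configuration module configured to: obtain in advance multiple pieces of first ID information provided by the data requesting terminal, where each piece of user ID information carries the user identity information of the user; determine the second ID information corresponding to each piece of first ID information according to the user identity information; and configure the format conversion rule according to each piece of first ID information and its corresponding second ID information.
For example, the system for generating intersection data may further include an upload module configured to upload the intersection data set and the multiple virtual features to a blockchain.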
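Embodiment 4.
FIG. 4 is a schematic diagram of the program modules of Embodiment 4, the federated model training system based on intersection data of this application. The federated model training system 40 based on intersection data may include or be divided into one or more program modules; the one or more program modules are stored in a storage medium and executed by one or more processors to carry out this application and implement the above federated model training method based on intersection data. A program module in the embodiments of this application is a series of computer program instruction segments capable of completing a specific function, and is better suited than the program itself to describing how the federated model training system 40 based on intersection data executes in the storage medium. The following description introduces the function of each program module of this embodiment.
The sending request module 400 is configured to send an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to the user ID information carried in the ID intersection request.
The receiving response module 402 is configured to receive the first encrypted data.
The data encryption module 404 is configured to encrypt the first encrypted data to obtain second encrypted data.
The obtaining information module 406 is configured to obtain local user information corresponding to the user ID information, and encrypt the local user information to obtain third encrypted data.
The data receiving module 408 is configured to send the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features.
The model training module 410 is configured to use the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.
For example, the federated model training system based on intersection data may further include an upload module configured to upload the intersection data set and the multiple virtual features to a blockchain.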
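Embodiment 5.
Referring to FIG. 5, a schematic diagram of the hardware architecture of the computer device of Embodiment 5 of this application is shown. In this embodiment, the computer device 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. The computer device 3 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers). As shown in the figure, the computer device 3 includes at least, but is not limited to, a memory, a processor, and a computer program stored in the memory and executable on the processor; when executed by the processor, the computer program implements some or all of the steps of the above methods. Optionally, the computer device may further include a network interface, the system for generating intersection data, and/or the federated model training system based on intersection data. For example, the memory 31, the processor 32, the network interface 33, and the system 30 for generating intersection data or the federated model training system 40 based on intersection data may be communicatively connected to one another through a system bus.
In this embodiment, the memory 31 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 31 may be an internal storage unit of the computer device 3, such as the hard disk or internal memory of the computer device 3. In other embodiments, the memory 31 may be an external storage device of the computer device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 3. The memory 31 may also include both the internal storage unit of the computer device 3 and its external storage device. In this embodiment, the memory 31 is generally used to store the operating system and the various application software installed on the computer device 3, such as the program code of the system 30 for generating intersection data of Embodiment 3 or of the federated model training system 40 based on intersection data of Embodiment 4. In addition, the memory 31 may be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 32 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 32 is generally used to control the overall operation of the computer device 3. In this embodiment, the processor 32 is used to run the program code or process the data stored in the memory 31, for example to run the system 30 for generating intersection data or the federated model training system 40 based on intersection data, so as to implement the method for generating intersection data of Embodiment 1 or the federated model training method based on intersection data of Embodiment 2.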
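The network interface 33 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 3 and other electronic devices. For example, the network interface 33 is used to connect the computer device 3 to an external terminal through a network and to establish a data transmission channel and a communication connection between the computer device 3 and the external terminal. The network may be an intranet, the Internet, a Global System of Mobile communication (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It should be pointed out that FIG. 5 shows only the computer device 3 with components 30 to 33, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
In this embodiment, the system 30 for generating intersection data stored in the memory 31 may also be divided into one or more program modules, which are stored in the memory 31 and executed by one or more processors (in this embodiment, the processor 32) to carry out this application.
For example, FIG. 3 shows a schematic diagram of the program modules of the system 30 for generating intersection data of Embodiment 3 of this application. In that embodiment, the system 30 for generating intersection data may be divided into the receiving request module 300, the response request module 302, the data receiving module 304, the intersection processing module 306, the label processing module 308, and the data sending module 310. A program module in this application is a series of computer program instruction segments capable of completing a specific function, and is better suited than a program to describing how the system 30 for generating intersection data executes in the computer device 3. The specific functions of the program modules 300 to 310 have been described in detail in Embodiment 3 and are not repeated here.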
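Embodiment 6.
This embodiment further provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, app store, and the like, on which a computer program is stored; the corresponding function is realized when the program is executed by a processor. The computer-readable storage medium of this embodiment is used for the system 30 for generating intersection data or the federated model training system 40 based on intersection data; when executed by a processor, it can implement the method for generating intersection data of Embodiment 1 or the federated model training method based on intersection data of Embodiment 2.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of this application.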

Claims (20)

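  1. A method for generating intersection data, comprising:
    receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information;
    responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data;
    receiving the second encrypted data and the third encrypted data;
    inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
    performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and
    sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
  2. The method for generating intersection data according to claim 1, wherein returning the first encrypted data according to the user ID information comprises:
    obtaining target user information corresponding to the user ID information according to the user ID information; and
    encrypting the target user information to obtain the first encrypted data.
  3. The method for generating intersection data according to claim 2, wherein the user ID information includes first ID information;
    obtaining the target user information corresponding to the user ID information according to the user ID information comprises:
    performing format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and
    obtaining the target user information corresponding to the user ID information according to the second ID information.
  4. The method for generating intersection data according to claim 3, further comprising the steps of configuring the format conversion rule:
    obtaining in advance multiple pieces of first ID information provided by the data requesting terminal, wherein each piece of user ID information carries the user identity information of the user;
    determining the second ID information corresponding to each piece of first ID information according to the user identity information;
    configuring the format conversion rule according to each piece of first ID information and the second ID information corresponding to that first ID information.
  5. The method for generating intersection data according to claim 1, further comprising: uploading the intersection data set and the multiple virtual features to a blockchain.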
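  6. A federated model training method based on intersection data, used in a data requesting terminal, the method comprising:
    sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried in the ID intersection request;
    receiving the first encrypted data;
    encrypting the first encrypted data to obtain second encrypted data;
    obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data;
    sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and
    using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.
  7. The method for generating intersection data according to claim 6, further comprising:
    uploading the intersection data set and the multiple virtual features to a blockchain.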
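  8. A system for generating intersection data, comprising:
    a receiving request module, configured to receive an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information;
    a response request module, configured to respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data;
    a data receiving module, configured to receive the second encrypted data and the third encrypted data;
    an intersection processing module, configured to input the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
    a label processing module, configured to perform feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and
    a data sending module, configured to send the intersection data set and the multiple virtual features to the data requesting terminal.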
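  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the following steps:
    receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information;
    responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data;
    receiving the second encrypted data and the third encrypted data;
    inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
    performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and
    sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
  10. The computer device according to claim 9, wherein returning the first encrypted data according to the user ID information specifically implements:
    obtaining target user information corresponding to the user ID information according to the user ID information; and
    encrypting the target user information to obtain the first encrypted data.
  11. The computer device according to claim 10, wherein the user ID information includes first ID information;
    obtaining the target user information corresponding to the user ID information according to the user ID information specifically implements:
    performing format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and
    obtaining the target user information corresponding to the user ID information according to the second ID information.
  12. The computer device according to claim 11, wherein the computer program, when executed by the processor, further implements the steps of configuring the format conversion rule:
    obtaining in advance multiple pieces of first ID information provided by the data requesting terminal, wherein each piece of user ID information carries the user identity information of the user;
    determining the second ID information corresponding to each piece of first ID information according to the user identity information;
    configuring the format conversion rule according to each piece of first ID information and the second ID information corresponding to that first ID information.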
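  13. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the following steps:
    sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried in the ID intersection request;
    receiving the first encrypted data;
    encrypting the first encrypted data to obtain second encrypted data;
    obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data;
    sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and
    using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.
  14. The computer device according to claim 13, wherein the computer program, when executed by the processor, further implements:
    uploading the intersection data set and the multiple virtual features to a blockchain.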
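  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by at least one processor to cause the at least one processor to perform the following steps:
    receiving an ID intersection request sent by a data requesting terminal, the ID intersection request carrying at least one piece of user ID information;
    responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data requesting terminal returns second encrypted data and third encrypted data according to the first encrypted data;
    receiving the second encrypted data and the third encrypted data;
    inputting the first encrypted data, the second encrypted data, and the third encrypted data into a pre-configured intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
    performing feature labeling processing on each piece of non-intersection data in the non-intersection data set to generate multiple virtual features; and
    sending the intersection data set and the multiple virtual features to the data requesting terminal for federated training.
  16. The computer-readable storage medium according to claim 15, wherein returning the first encrypted data according to the user ID information specifically performs:
    obtaining target user information corresponding to the user ID information according to the user ID information; and
    encrypting the target user information to obtain the first encrypted data.
  17. The computer-readable storage medium according to claim 16, wherein the user ID information includes first ID information;
    obtaining the target user information corresponding to the user ID information according to the user ID information specifically performs:
    performing format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and
    obtaining the target user information corresponding to the user ID information according to the second ID information.
  18. The computer-readable storage medium according to claim 17, wherein the computer program is executed by at least one processor and further causes the at least one processor to perform the steps of configuring the format conversion rule:
    obtaining in advance multiple pieces of first ID information provided by the data requesting terminal, wherein each piece of user ID information carries the user identity information of the user;
    determining the second ID information corresponding to each piece of first ID information according to the user identity information;
    configuring the format conversion rule according to each piece of first ID information and the second ID information corresponding to that first ID information.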
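  19. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by at least one processor to cause the at least one processor to perform the following steps:
    sending an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried in the ID intersection request;
    receiving the first encrypted data;
    encrypting the first encrypted data to obtain second encrypted data;
    obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data;
    sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns a corresponding intersection data set and multiple virtual features; and
    using the intersection data set and the multiple virtual features as federated training samples to train a pre-configured pre-trained federated model to obtain a target federated model.
  20. The computer-readable storage medium according to claim 19, wherein the computer program is executed by at least one processor and further causes the at least one processor to perform:
    uploading the intersection data set and the multiple virtual features to a blockchain.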
PCT/CN2020/135269 2020-08-07 2020-12-10 交集数据的生成方法和基于交集数据的联邦模型训练方法 WO2021139476A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010786660.2A CN111914277B (zh) 2020-08-07 2020-08-07 交集数据的生成方法和基于交集数据的联邦模型训练方法
CN202010786660.2 2020-08-07

Publications (1)

Publication Number Publication Date
WO2021139476A1 true WO2021139476A1 (zh) 2021-07-15

Family

ID=73287637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135269 WO2021139476A1 (zh) 2020-08-07 2020-12-10 交集数据的生成方法和基于交集数据的联邦模型训练方法

Country Status (2)

Country Link
CN (1) CN111914277B (zh)
WO (1) WO2021139476A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807415A (zh) * 2021-08-30 2021-12-17 中国再保险(集团)股份有限公司 联邦特征选择方法、装置、计算机设备和存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914277B (zh) * 2020-08-07 2023-09-01 平安科技(深圳)有限公司 交集数据的生成方法和基于交集数据的联邦模型训练方法
CN114764707A (zh) * 2021-01-04 2022-07-19 中国移动通信有限公司研究院 联邦学习模型训练方法和系统
CN113032840B (zh) * 2021-05-26 2021-07-30 腾讯科技(深圳)有限公司 一种数据处理方法、装置、设备及计算机可读存储介质
CN116582341B (zh) * 2023-05-30 2024-06-04 连连银通电子支付有限公司 异常检测方法、装置、设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399741A (zh) * 2019-07-29 2019-11-01 深圳前海微众银行股份有限公司 数据对齐方法、设备及计算机可读存储介质
CN110443067A (zh) * 2019-07-30 2019-11-12 卓尔智联(武汉)研究院有限公司 基于隐私保护的联邦建模装置、方法及可读存储介质
CN110796267A (zh) * 2019-11-12 2020-02-14 支付宝(杭州)信息技术有限公司 数据共享的机器学习方法和机器学习装置
CN110942154A (zh) * 2019-11-22 2020-03-31 深圳前海微众银行股份有限公司 基于联邦学习的数据处理方法、装置、设备及存储介质
CN111177762A (zh) * 2019-12-30 2020-05-19 北京同邦卓益科技有限公司 一种数据处理方法、装置、服务器及联邦学习系统
CN111914277A (zh) * 2020-08-07 2020-11-10 平安科技(深圳)有限公司 交集数据的生成方法和基于交集数据的联邦模型训练方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165683B (zh) * 2018-08-10 2023-09-12 深圳前海微众银行股份有限公司 基于联邦训练的样本预测方法、装置及存储介质
CN109492420B (zh) * 2018-12-28 2021-07-20 深圳前海微众银行股份有限公司 基于联邦学习的模型参数训练方法、终端、系统及介质
CN110955907B (zh) * 2019-12-13 2022-03-25 支付宝(杭州)信息技术有限公司 一种基于联邦学习的模型训练方法
CN111259443B (zh) * 2020-01-16 2022-07-01 百融云创科技股份有限公司 一种基于psi技术保护联邦学习预测阶段隐私的方法
CN111402095A (zh) * 2020-03-23 2020-07-10 温州医科大学 一种基于同态加密联邦学习来检测学生行为与心理的方法

Also Published As

Publication number Publication date
CN111914277B (zh) 2023-09-01
CN111914277A (zh) 2020-11-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912531

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20912531

Country of ref document: EP

Kind code of ref document: A1