CN111914277A - Intersection data generation method and federal model training method based on intersection data - Google Patents

Info

Publication number
CN111914277A
Authority
CN
China
Prior art keywords
data
intersection
information
user
encrypted data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010786660.2A
Other languages
Chinese (zh)
Other versions
CN111914277B (en)
Inventor
周学立
张茜
凌海挺
蔡满天
刘丽扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010786660.2A priority Critical patent/CN111914277B/en
Publication of CN111914277A publication Critical patent/CN111914277A/en
Priority to PCT/CN2020/135269 priority patent/WO2021139476A1/en
Application granted granted Critical
Publication of CN111914277B publication Critical patent/CN111914277B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention relates to the field of big data, and provides an intersection data generation method, which comprises the following steps: receiving an ID intersection request sent by a data request terminal, wherein the ID intersection request carries at least one piece of user ID information; responding to the ID intersection request and returning first encrypted data according to the user ID information, so that the data request terminal returns second encrypted data and third encrypted data according to the first encrypted data; receiving the second encrypted data and the third encrypted data; inputting the first encrypted data, the second encrypted data and the third encrypted data into a preset intersection model for intersection processing to obtain an intersection data set and a non-intersection data set; performing feature tagging on each non-intersection data in the non-intersection data set to generate a plurality of virtual features; and sending the intersection data set and the plurality of virtual features to the data request terminal for federated training, and uploading the intersection data set and the plurality of virtual features to a blockchain. The invention solves the problem that user information is easily leaked in federated learning and improves the data security of the user.

Description

Intersection data generation method and federal model training method based on intersection data
Technical Field
The embodiment of the invention relates to the field of data transmission, in particular to an intersection data generation method, a federated model training method and system based on intersection data, computer equipment and a computer readable storage medium.
Background
With the advent of the big data era, the problem of data islands in the internet field has become increasingly prominent. The emergence of federated learning plays a crucial role in solving this problem to a certain extent. Currently, federated learning mainly addresses the data island problem by performing intersection matching on user IDs and, after successful matching, completing federated learning for the ID users in the intersection part. However, this easily causes leakage of the information of users in the non-intersection part, which poses a potential security hazard. Therefore, how to carry out federated learning safely and reliably while ensuring that user information is not leaked has become one of the technical problems to be solved.
Disclosure of Invention
In view of this, it is necessary to provide an intersection data generation method, a federated model training method based on intersection data, a system, a computer device, and a computer-readable storage medium, so as to solve the technical problem that user information is easily leaked in current federated learning.
In order to achieve the above object, an embodiment of the present invention provides a method for generating intersection data, where the method includes:
receiving an ID intersection request sent by a data request terminal, wherein the ID intersection request carries at least one user ID information;
responding to the ID intersection request, and returning first encrypted data according to the user ID information, so that the data request terminal returns second encrypted data and third encrypted data according to the first encrypted data;
receiving the second encrypted data and the third encrypted data;
inputting the first encrypted data, the second encrypted data and the third encrypted data into a preset intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
performing feature tagging on each non-intersecting data in the non-intersecting data set to generate a plurality of virtual features; and
sending the intersection data set and the plurality of virtual features to the data request terminal for federated training.
Illustratively, the returning of the first encrypted data according to the user ID information includes:
acquiring target user information corresponding to the user ID information according to the user ID information; and
encrypting the target user information to obtain the first encrypted data.
Illustratively, the user ID information includes first ID information;
the acquiring of target user information corresponding to the user ID information according to the user ID information includes:
carrying out format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and
acquiring the target user information corresponding to the user ID information according to the second ID information.
Exemplarily, the method further comprises the step of configuring the format conversion rule:
acquiring a plurality of first ID information provided by the data request terminal in advance, wherein each user ID information carries user identity information of the user;
determining second ID information corresponding to each first ID information according to the user identity information;
and configuring the format conversion rule according to each piece of first ID information and second ID information corresponding to the first ID information.
Illustratively, the method further includes: uploading the intersection data set and the plurality of virtual features into a blockchain.
In order to achieve the above object, an embodiment of the present invention further provides a federated model training method based on intersection data, which is used for a data request terminal, and the method includes:
sending an ID intersection request to a data providing terminal so that the data providing terminal returns first encrypted data according to user ID information carried by the ID intersection request;
receiving the first encrypted data;
encrypting the first encrypted data to obtain second encrypted data;
acquiring local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data;
sending the second encrypted data and the third encrypted data to the data providing terminal so that the data providing terminal returns a corresponding intersection data set and a plurality of virtual features; and
taking the intersection data set and the virtual features as federated training samples, and training a pre-configured, pre-trained federated model to obtain a target federated model.
Illustratively, the method further includes: uploading the intersection data set and the plurality of virtual features into a blockchain.
In order to achieve the above object, an embodiment of the present invention further provides a system for generating intersection data, including:
a receiving request module, configured to receive an ID intersection request sent by a data request terminal, wherein the ID intersection request carries at least one piece of user ID information;
the response request module is used for responding to the ID intersection request and returning first encrypted data according to the user ID information so that the data request terminal returns second encrypted data and third encrypted data according to the first encrypted data;
a data receiving module, configured to receive the second encrypted data and the third encrypted data;
the intersection processing module is used for inputting the first encrypted data, the second encrypted data and the third encrypted data into a preset intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
the tag processing module is used for performing feature tagging processing on each non-intersection data in the non-intersection data set to generate a plurality of virtual features; and
the data sending module is used for sending the intersection data set and the plurality of virtual features to the data request terminal.
To achieve the above object, an embodiment of the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored on the memory and operable on the processor, and when executed by the processor, the computer program implements the steps of the intersection data generation method or the federated model training method based on intersection data as described above.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, where the computer program is executable by at least one processor, so as to cause the at least one processor to execute the steps of the intersection data generation method or the intersection data-based federated model training method as described above.
According to the intersection data generation method, the federated model training method and system based on intersection data, the computer equipment and the computer readable storage medium, feature tagging is performed on the non-intersection data of the user information, which solves the problem that user information is easily leaked in federated learning and improves the data security of the user.
Drawings
Fig. 1 is a flowchart illustrating a method for generating intersection data according to an embodiment of the present invention.
Fig. 2 is a schematic flowchart of a federated model training method based on intersection data according to a second embodiment of the present invention.
Fig. 3 is a schematic diagram of program modules of a system for generating intersection data according to a third embodiment of the present invention.
Fig. 4 is a schematic diagram of program modules of a federated model training system based on intersection data according to a fourth embodiment of the present invention.
Fig. 5 is a schematic diagram of a hardware structure of a third embodiment of the computer apparatus according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Example one
Referring to Fig. 1, a flowchart illustrating steps of a method for generating intersection data according to an embodiment of the present invention is shown. It is to be understood that the flowcharts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description takes a data providing terminal as the execution subject, where the data providing terminal is a terminal that provides data and can perform encryption operations on the data. The details are as follows.
Step S100, an ID intersection request sent by a data request terminal is received, and the ID intersection request carries at least one user ID information.
The data providing terminal may receive an ID intersection request sent by the data requesting terminal, where the ID intersection request carries at least one user ID information.
The data request terminal is the initiator of a service request and can send requests to the data providing terminal. The data providing terminal can be another independent, complete entity with its own computing capability.
The data requesting terminal and the data providing terminal may communicate with each other. The communication typically takes the form of sending a data message, receiving a data message, communicating a status command, and so forth. The data providing terminal and the data requesting terminal may be computers, computing clusters, tablet computers, laptop computers, and other devices having a data transmission function.
Step S102, responding to the ID intersection request, and returning first encrypted data according to the user ID information, so that the data request terminal returns second encrypted data and third encrypted data according to the first encrypted data.
After receiving the ID intersection request, the data providing terminal may generate a key required by a first encryption algorithm, encrypt the data corresponding to the user ID information with that key to obtain the first encrypted data, and send the first encrypted data to the data requesting terminal. The data request terminal encrypts the first encrypted data with a second encryption algorithm to obtain the second encrypted data. It also acquires the local user information corresponding to the user ID information and encrypts it with the second encryption algorithm to obtain the third encrypted data.
In an exemplary embodiment, the step S102 may further include steps S102a to S102b, wherein: step S102a, acquiring target user information corresponding to the user ID information according to the user ID information; and step S102b, encrypting the target user information to obtain first encrypted data.
The target user information is the user information of the target user in the data providing terminal. The data providing terminal may obtain, from the data providing terminal, target user information corresponding to the user ID information according to the user ID information, where the target user information is information of a user corresponding to the user ID information at the data providing terminal. It should be noted that the same user may register an account on the application associated with the data providing terminal and the application associated with the data requesting terminal, respectively. Since the information is of the same user, the data providing terminal may obtain the target user information of the target user on the data providing terminal corresponding to the user ID information according to the user ID information.
It is understood that information of different users in different applications may differ, and in order to ensure information security of the target user, the data providing terminal may encrypt the target user information after obtaining the target user information, so as to obtain first encrypted data.
In an exemplary embodiment, the user ID information includes first ID information; the step S102a may further include steps S102a1 to S102a2, wherein: step S102a1, performing format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and step S102a2, obtaining the target user information corresponding to the user ID information according to the second ID information.
In an exemplary embodiment, the target user may perform information registration at the data providing terminal to obtain the first ID information, and may also perform information registration at the data requesting terminal to obtain the second ID information. For example, the first ID information may be "X123", and the second ID information may be "XX123". After the data providing terminal obtains the user ID information, it may generate the second ID information from the first ID information carried by the user ID information according to the format conversion rule, and then obtain the target user information corresponding to the user ID information from a database associated with the data providing terminal according to the second ID information.
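The conversion-then-lookup flow of steps S102a1 and S102a2 can be sketched as follows. This is a minimal illustration only: the prefix rule mirrors the "X123" to "XX123" example, while `ID_PREFIX`, `PROVIDER_DB`, `convert_id`, `get_target_user_info` and the stored record are hypothetical names and data that do not appear in the source.

```python
# Hypothetical rule: the second ID is the first ID with a fixed prefix added.
ID_PREFIX = "X"

# Hypothetical stand-in for the database associated with the data providing terminal.
PROVIDER_DB = {"XX123": {"age": 30, "city": "Shenzhen"}}


def convert_id(first_id: str, prefix: str = ID_PREFIX) -> str:
    """Apply the preset format conversion rule to a first ID."""
    return prefix + first_id


def get_target_user_info(first_id: str):
    """Convert the first ID, then look up the target user information."""
    second_id = convert_id(first_id)
    # Returns None when the provider holds no record for this user.
    return PROVIDER_DB.get(second_id)


print(convert_id("X123"))
print(get_target_user_info("X123"))
```

A real conversion rule could be any mapping agreed between the two terminals; a fixed prefix is used here only because it is the example the description gives.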
Step S104, receiving the second encrypted data and the third encrypted data.
After receiving the first encrypted data provided by the data providing terminal, the data request terminal may encrypt the first encrypted data to obtain the second encrypted data. It also acquires the local user information corresponding to the user ID information and encrypts the local user information to obtain the third encrypted data. The local user information is the user information of the target user at the data request terminal. In some embodiments, the data request terminal may encrypt both the first encrypted data and the local user information with a second encryption algorithm to obtain the second encrypted data and the third encrypted data, respectively.
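The source does not name the first and second encryption algorithms. As one possible reading, the exchange of the first, second and third encrypted data resembles a commutative-encryption matching scheme, which can be sketched as below. Everything here is an assumption made for illustration: the modulus `P`, the helper names, the use of modular exponentiation, and the step in which the provider strips its own key. It is a toy and not a vetted cryptographic construction.

```python
import hashlib
import math
import secrets

# Toy parameter for illustration only (assumption, not from the source).
P = 2**127 - 1  # a Mersenne prime used as the modulus


def make_key() -> int:
    """Pick a random exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(user_id: str) -> int:
    """Hash an ID to a nonzero residue modulo P."""
    h = int.from_bytes(hashlib.sha256(user_id.encode()).digest(), "big")
    return h % (P - 1) + 1


def encrypt(value: int, key: int) -> int:
    """Commutative 'encryption': exponentiation modulo P."""
    return pow(value, key, P)


def strip_key(value: int, key: int) -> int:
    """Remove one exponent layer using the inverse of key modulo P - 1."""
    return pow(value, pow(key, -1, P - 1), P)


def run_exchange(provider_ids, requester_ids):
    key_a = make_key()  # held by the data providing terminal
    key_b = make_key()  # held by the data request terminal
    first = [encrypt(to_group(i), key_a) for i in provider_ids]    # provider -> requester
    second = [encrypt(c, key_b) for c in first]                    # requester -> provider
    third = [encrypt(to_group(i), key_b) for i in requester_ids]   # requester -> provider
    # The provider removes its own layer, leaving values under key_b only,
    # which are directly comparable with the third encrypted data.
    under_b = [strip_key(c, key_a) for c in second]
    shared = set(under_b) & set(third)
    return [i for i, c in zip(provider_ids, under_b) if c in shared]


print(run_exchange(["u1", "u2", "u3"], ["u2", "u3", "u4"]))
```

In this sketch neither side sees the other's raw IDs, and the provider learns only which of its own entries are matched. Whether the patented method works this way is not stated in the source.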
Step S106, inputting the first encrypted data, the second encrypted data and the third encrypted data into a preset intersection model for intersection processing to obtain an intersection data set and a non-intersection data set.
In some embodiments, the intersection model may decrypt the second encrypted data to obtain a decrypted result and determine whether the decrypted result is the same as the first encrypted data; if so, it intersects the first encrypted data and the third encrypted data to obtain the intersection data set and the non-intersection data set of the first encrypted data and the third encrypted data. The intersection model is a model for calculating the intersection of two sets of data. For example, if the first encrypted data is [1, 5, 7, 6, 8, 9] and the third encrypted data is [1, 2, 7, 8], then the intersection data set is [1, 7, 8] and the non-intersection data set is [2, 5, 6, 9].
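The numeric example can be reproduced with ordinary set operations. The function name `split_intersection` is illustrative, and treating the non-intersection data set as the symmetric difference of the two inputs is inferred from the example values rather than stated in the source.

```python
def split_intersection(first, third):
    """Return (intersection, non_intersection) of two data lists, sorted."""
    a, b = set(first), set(third)
    intersection = sorted(a & b)      # items present in both inputs
    non_intersection = sorted(a ^ b)  # items present in exactly one input
    return intersection, non_intersection


print(split_intersection([1, 5, 7, 6, 8, 9], [1, 2, 7, 8]))
```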
Step S108, performing feature tagging processing on each non-intersection data in the non-intersection data set to generate a plurality of virtual features.
In order to ensure data security of users in different applications, the data providing terminal may perform feature tagging on each non-intersecting data in the non-intersecting data set to generate a plurality of virtual features. For example, the non-intersecting dataset [2, 5, 6, 9] is converted into a plurality of virtual features: null, tag.
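The source gives only "null, tag" as the outcome of feature tagging, so the following is one speculative interpretation: each non-intersecting item is replaced by a placeholder record whose value is empty and which carries a marker tag, so that the original item is never sent. The record layout and the name `tag_non_intersection` are assumptions.

```python
def tag_non_intersection(non_intersection):
    """Replace each non-intersecting item with a placeholder virtual feature."""
    # The original values are deliberately dropped; only the count survives.
    return [{"value": None, "tag": "virtual"} for _ in non_intersection]


virtual_features = tag_non_intersection([2, 5, 6, 9])
print(len(virtual_features), virtual_features[0])
```

Under this reading the data request terminal still receives one entry per non-intersecting sample, but cannot recover which user each entry came from.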
Step S110, sending the intersection data set and the plurality of virtual features to the data request terminal for federated training.
After the data providing terminal obtains the intersection data set and the plurality of virtual features, the intersection data set and the plurality of virtual features may be sent to the data requesting terminal, so that the data requesting terminal trains the federated model according to the intersection data set and the plurality of virtual features.
In an exemplary embodiment, the intersection data generating method may further include steps S112a to S112c of configuring the format conversion rule, wherein: step S112a, obtaining in advance a plurality of first ID information provided by the data request terminal, where each user ID information carries user identity information of the user; step S112b, determining second ID information corresponding to each piece of first ID information according to the user identity information; and step S112c, configuring the format conversion rule according to each first ID information and the second ID information corresponding to the first ID information.
In an exemplary embodiment, each user may register an account in different applications and obtain corresponding account information. For example, the target user may perform information registration at the data providing terminal to obtain the first ID information, and may also perform information registration at the data requesting terminal to obtain the second ID information, where the first ID information may be "X123" and the second ID information may be "XX123". Since the first ID information and the second ID information correspond to the same user (the target user), and both the data providing terminal and the data requesting terminal hold the real identity information of the target user, the second ID information corresponding to each first ID information can be determined according to the real identity information of the target user, and the format conversion rule can be configured according to the first ID information and the second ID information. For example, if the first ID information is "X123" and the second ID information is "XX123", the conversion rule may be to add "X" in front of "X123" to obtain "XX123".
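Configuring the rule from known ID pairs (steps S112a to S112c) could look like the sketch below, assuming the rule is always a fixed prefix as in the "X123" and "XX123" example. The function name `infer_prefix_rule` and the second sample pair are hypothetical.

```python
def infer_prefix_rule(pairs):
    """Infer a single prefix such that second_id == prefix + first_id for all pairs."""
    prefixes = set()
    for first_id, second_id in pairs:
        if not second_id.endswith(first_id):
            raise ValueError(f"{second_id!r} is not a prefixed form of {first_id!r}")
        prefixes.add(second_id[: len(second_id) - len(first_id)])
    if len(prefixes) != 1:
        raise ValueError("pairs do not agree on a single prefix rule")
    return prefixes.pop()


prefix = infer_prefix_rule([("X123", "XX123"), ("X456", "XX456")])
print(repr(prefix))
```

Rejecting inconsistent pairs early keeps a misconfigured rule from silently mapping users to the wrong records.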
In an exemplary embodiment, the method for generating intersection data may further include: uploading the intersection dataset and the plurality of virtual features into a blockchain.
For example, uploading the intersection dataset and the plurality of virtual features to a blockchain may ensure security and fair transparency. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm, and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
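The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash chain. This is a local toy, not a blockchain client: there is no network, consensus, or platform layer, and the block layout and function names are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def block_hash(payload, prev_hash):
    """Hash the block body deterministically."""
    body = {"payload": payload, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_block(payload, prev_hash):
    """Create a block that commits to its payload and to the previous block."""
    return {"payload": payload, "prev_hash": prev_hash,
            "hash": block_hash(payload, prev_hash)}


def verify_chain(chain):
    """Check every link and every stored hash in order."""
    prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(block["payload"], block["prev_hash"]):
            return False
        prev = block["hash"]
    return True


b1 = make_block({"intersection": [1, 7, 8]}, GENESIS_HASH)
b2 = make_block({"virtual_features": [None, None, None, None]}, b1["hash"])
print(verify_chain([b1, b2]))
```

Changing the payload of an earlier block invalidates its stored hash and every later link, which is what makes tampering detectable.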
Example two
Referring to fig. 2, a flowchart illustrating steps of a federated model training method based on intersection data according to an embodiment of the present invention is shown. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description is exemplarily described with a data request terminal as an execution subject, where the data request terminal may send request information to a data providing terminal, so that the data providing terminal returns corresponding data. The details are as follows.
Step S200, an ID intersection request is sent to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried by the ID intersection request.
And the ID intersection request is used for indicating the data providing terminal to return corresponding encrypted data according to the ID intersection request.
The data requesting terminal may send an ID intersection request to the data providing terminal. And the ID intersection request carries user ID information of a target user.
The data providing terminal may obtain, from the data providing terminal, target user information corresponding to the user ID information according to the user ID information, where the target user information is information of a user corresponding to the user ID information at the data providing terminal. And carrying out encryption operation on the target user information through a first encryption algorithm to obtain first encrypted data. Wherein the data providing terminal may generate a key required for the first encryption algorithm after receiving the ID intersection request. After the data providing terminal obtains the first encrypted data, the first encrypted data may be sent to the data requesting terminal.
The data request terminal is the initiator of the service request, can send requests (for example, requesting matching support and data support) to the data providing terminal, and can train the federated model according to the data returned by the data providing terminal. The data providing terminal can be another independent, complete entity with its own computing capability; it can respond to the ID intersection request sent by the data request terminal and cooperate with the data request terminal to complete the federated training of the model.
Step S202, receiving the first encrypted data.
Step S204, performing encryption processing on the first encrypted data to obtain second encrypted data.
Step S206, obtaining local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data.
After receiving the first encrypted data provided by the data providing terminal, the data request terminal may encrypt the first encrypted data to obtain the second encrypted data. It also acquires the local user information corresponding to the user ID information and encrypts the local user information to obtain the third encrypted data. The local user information is the user information of the target user at the data request terminal. In some embodiments, the data request terminal may encrypt both the first encrypted data and the local user information with a second encryption algorithm to obtain the second encrypted data and the third encrypted data, respectively.
Step S208, sending the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns the corresponding intersection data set and the plurality of virtual features.
After the data request terminal obtains the second encrypted data and the third encrypted data, it may transmit them to the data providing terminal. After receiving the second encrypted data and the third encrypted data, the data providing terminal may input the first encrypted data, the second encrypted data, and the third encrypted data to a pre-configured intersection model for intersection processing, so as to obtain an intersection data set and a non-intersection data set.
In some embodiments, the intersection model may decrypt the second encrypted data to obtain a decrypted result and determine whether the decrypted result is the same as the first encrypted data; if so, it intersects the first encrypted data and the third encrypted data to obtain the intersection data set and the non-intersection data set of the first encrypted data and the third encrypted data. The intersection model is a model for calculating the intersection of two sets of data. For example, if the first encrypted data is [1, 5, 7, 6, 8, 9] and the third encrypted data is [1, 2, 7, 8], then the intersection data set is [1, 7, 8] and the non-intersection data set is [2, 5, 6, 9].
In order to ensure data security of users in different applications, the data providing terminal may perform feature tagging on each non-intersecting data in the non-intersecting data set to generate a plurality of virtual features. For example, the non-intersecting dataset [2, 5, 6, 9] is converted into a plurality of virtual features: null, tag. After the data providing terminal obtains the intersection data set and the plurality of virtual features, the intersection data set and the plurality of virtual features may be sent to the data requesting terminal.
Step S210, taking the intersection data set and the virtual features as federated training samples, and training a pre-configured, pre-trained federated model to obtain a target federated model.
In an exemplary embodiment, the data requesting terminal may obtain a federated model to be trained in advance and pre-train it on local user data, where the federated model to be trained may be LR, XGB, DNN, or the like. After the intersection data set and the plurality of virtual features are obtained from the data providing terminal, they may be used as federated training samples for the pre-trained federated model, and the pre-trained federated model is trained on these samples to obtain a target federated model. In this way, the samples in the intersection part can complete the task without information loss, and the model is trained better on the intersection data, finally yielding the trained target federated model.
In this embodiment, the data providing terminal can complete model training in cooperation with the data request terminal while ensuring that the real data remains secure and does not leave the local environment. The data providing terminal may transmit intermediate data when cooperating with the data request terminal. The intermediate data may include plaintext (an unencrypted key, etc.) as well as encrypted (usually homomorphically encrypted) model and data information.
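The two-stage schedule described here (pre-train on local data, then continue training on the federated samples) can be sketched with a plain logistic regression, LR being one of the model types the description lists. This is a local, unencrypted toy: the sample values, hyperparameters, and function names are all made up for illustration, and the joint, encrypted computation of a real federated setup is not modeled.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x) -> float:
    """Probability of label 1 for feature vector x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_lr(samples, weights, bias=0.0, lr=0.5, epochs=200):
    """Stochastic gradient descent on (features, label) pairs from given parameters."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in samples:
            err = predict(w, bias, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias


# Stage 1: pre-train on hypothetical local user data.
local_samples = [([0.0], 0), ([1.0], 1)]
w, b = train_lr(local_samples, weights=[0.0])

# Stage 2: continue from the pre-trained parameters on hypothetical federated samples.
federated_samples = [([0.2], 0), ([0.9], 1)]
w, b = train_lr(federated_samples, weights=w, bias=b)

print(predict(w, b, [0.9]) > 0.5, predict(w, b, [0.1]) < 0.5)
```

Starting stage 2 from the stage 1 parameters, rather than from zero, is what distinguishes continued training of a pre-trained model from training a fresh one.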
In an exemplary embodiment, the federated model training method based on intersection data may further include: uploading the intersection dataset and the plurality of virtual features into a blockchain.
For example, uploading the intersection dataset and the plurality of virtual features to a blockchain may ensure security and fair transparency. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm, and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Example three
FIG. 3 is a schematic diagram of program modules of a system for generating intersection data according to a third embodiment of the present invention. The intersection data generation system 30 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present invention and implement the above intersection data generation method. The program module referred to in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable for describing the execution process of the intersection data generation system 30 in the storage medium than the program itself. The following description will specifically describe the functions of the program modules of the present embodiment:
A receiving request module 300, configured to receive an ID intersection request sent by a data request terminal, where the ID intersection request carries at least one piece of user ID information.
A response request module 302, configured to respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data request terminal returns second encrypted data and third encrypted data according to the first encrypted data.
Illustratively, the response request module 302 is further configured to: acquire target user information corresponding to the user ID information according to the user ID information; and encrypt the target user information to obtain the first encrypted data.
Illustratively, the response request module 302 is further configured to: perform format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and acquire the target user information corresponding to the user ID information according to the second ID information.
A data receiving module 304, configured to receive the second encrypted data and the third encrypted data.
An intersection processing module 306, configured to input the first encrypted data, the second encrypted data, and the third encrypted data into a preconfigured intersection model for intersection processing, so as to obtain an intersection data set and a non-intersection data set.
A tag processing module 308, configured to perform feature tagging on each non-intersection data in the non-intersection data set to generate a plurality of virtual features.
A data sending module 310, configured to send the intersection data set and the plurality of virtual features to a data requesting terminal.
Illustratively, the intersection data generation system may further include a configuration module, configured to: acquire a plurality of pieces of first ID information provided by the data request terminal in advance, wherein each piece of user ID information carries user identity information of the user; determine the second ID information corresponding to each piece of first ID information according to the user identity information; and configure the format conversion rule according to each piece of first ID information and its corresponding second ID information.
For example, the intersection data generation system may further include an upload module configured to upload the intersection dataset and the plurality of virtual features to a blockchain.
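The format conversion rule handled by the configuration module and the response request module can be pictured as a mapping from each piece of first ID information to its second ID information. The sketch below takes the simplest reading, a lookup table built from the pairs supplied in advance; the source does not specify the rule's form, so the table approach, the ID strings, and the names `configure_rule` and `lookup_target_user` are all hypothetical.

```python
def configure_rule(id_pairs):
    """Build the conversion rule from (first_id, second_id) pairs agreed in advance."""
    rule = {}
    for first_id, second_id in id_pairs:
        rule[first_id] = second_id
    return rule

def lookup_target_user(rule, user_table, first_id):
    """Convert a first ID to its second ID, then fetch the provider's user record."""
    second_id = rule.get(first_id)
    if second_id is None:
        return None  # no rule configured for this ID
    return user_table.get(second_id)

# Hypothetical data: the requester's IDs and the provider's internal IDs.
rule = configure_rule([("REQ-001", "P1001"), ("REQ-002", "P1002")])
provider_users = {
    "P1001": {"age_band": "30-39"},
    "P1002": {"age_band": "40-49"},
}
record = lookup_target_user(rule, provider_users, "REQ-002")
```

An ID with no configured rule returns `None` here; how the real system treats such IDs is not stated in the source.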
Example four
FIG. 4 is a schematic diagram of program modules of a fourth embodiment of the federated model training system based on intersection data. Intersection data-based federated model training system 40 may include or be partitioned into one or more program modules that are stored in a storage medium and executed by one or more processors to implement the present invention and implement the intersection data-based federated model training methods described above. The program module referred to in the embodiment of the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable for describing the execution process of the federated model training system 40 based on intersection data in a storage medium than the program itself. The following description will specifically describe the functions of the program modules of the present embodiment:
A sending request module 400, configured to send an ID intersection request to a data providing terminal, so that the data providing terminal returns first encrypted data according to user ID information carried in the ID intersection request.
A receive response module 402 configured to receive the first encrypted data.
A data encryption module 404, configured to encrypt the first encrypted data to obtain second encrypted data.
An obtaining information module 406, configured to obtain local user information corresponding to the user ID information, and encrypt the local user information to obtain third encrypted data.
A data receiving module 408, configured to send the second encrypted data and the third encrypted data to the data providing terminal, so that the data providing terminal returns the corresponding intersection data set and the plurality of virtual features.
A model training module 410, configured to use the intersection data set and the plurality of virtual features as federated training samples to train a pre-configured, pre-trained federated model, so as to obtain a target federated model.
For example, the intersection data-based federated model training system may further include an upload module configured to upload the intersection dataset and the plurality of virtual features to a blockchain.
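The exchange of first, second, and third encrypted data described by the modules above resembles a private set intersection built on commutative encryption. The sketch below runs both parties in one process to show why the doubly encrypted values can be compared. It is an assumption about how the preconfigured intersection model might work, not the source's stated algorithm; it encrypts IDs rather than full user information, and the modulus, keys, and function names are illustrative and not secure.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, far too small for real use

def hash_to_group(identifier):
    # Map an ID string to an integer in [1, P - 1].
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)

def new_key():
    # Exponents coprime to P - 1 make x -> x^k mod P a bijection (no false matches).
    while True:
        k = secrets.randbelow(P - 2) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def encrypt(value, key):
    # Commutative: encrypt(encrypt(v, a), b) == encrypt(encrypt(v, b), a).
    return pow(value, key, P)

def intersect(provider_ids, requester_ids):
    a = new_key()  # provider's secret key
    b = new_key()  # requester's secret key

    # Provider -> requester: first encrypted data.
    first = [encrypt(hash_to_group(x), a) for x in provider_ids]
    # Requester encrypts the first encrypted data again: second encrypted data.
    second = [encrypt(c, b) for c in first]
    # Requester encrypts its own local data: third encrypted data.
    third = [encrypt(hash_to_group(y), b) for y in requester_ids]

    # Provider applies its key to the third data; both sides are now under a and b.
    third_twice = {encrypt(c, a) for c in third}
    intersection = [x for x, c in zip(provider_ids, second) if c in third_twice]
    non_intersection = [x for x, c in zip(provider_ids, second) if c not in third_twice]
    return intersection, non_intersection

common, provider_only = intersect(["u1", "u2", "u3"], ["u2", "u3", "u9"])
```

The sketch relies on the requester returning the second encrypted data in the same order as the first, so the provider can map matches back to its own records; a production protocol would need to address ordering, key sizes, and leakage of the intersection size.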
Example five
FIG. 5 is a schematic diagram of a hardware architecture of a computer device according to a fifth embodiment of the present invention. In the present embodiment, the computer device 3 is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or stored instructions. The computer device 3 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of a plurality of servers), and the like. As shown in the figure, the computer device 3 includes, but is not limited to, at least a memory 31, a processor 32, a network interface 33, and an intersection data generation system 30 or an intersection data-based federated model training system 40, which may be communicatively coupled to each other via a system bus.
In this embodiment, the memory 31 includes at least one type of computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 31 may be an internal storage unit of the computer device 3, such as a hard disk or internal memory of the computer device 3. In other embodiments, the memory 31 may also be an external storage device of the computer device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like provided on the computer device 3. Of course, the memory 31 may also comprise both an internal storage unit of the computer device 3 and an external storage device thereof. In this embodiment, the memory 31 is generally used to store an operating system and various types of application software installed on the computer device 3, such as the program code of the intersection data generation system 30 in the third embodiment or the intersection data-based federated model training system 40 in the fourth embodiment. Further, the memory 31 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 32 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 32 is typically used to control the overall operation of the computer device 3. In this embodiment, the processor 32 is configured to run program codes stored in the memory 31 or process data, for example, run the intersection data generation system 30 or the intersection data-based federal model training system 40, so as to implement the intersection data generation method in the first embodiment or the intersection data-based federal model training method in the second embodiment.
The network interface 33 may comprise a wireless network interface or a wired network interface, and is typically used for establishing a communication connection between the computer device 3 and other electronic devices. For example, the network interface 33 is used to connect the computer device 3 to an external terminal through a network and to establish a data transmission channel and a communication connection between the computer device 3 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.
It is noted that FIG. 5 only shows the computer device 3 with components 30-33, but it is to be understood that not all of the shown components are required, and that more or fewer components may be implemented instead.
In this embodiment, the intersection data generation system 30 stored in the memory 31 may be further divided into one or more program modules, and the one or more program modules are stored in the memory 31 and executed by one or more processors (in this embodiment, the processor 32) to complete the present invention.
For example, FIG. 3 is a schematic diagram of the program modules of the intersection data generation system 30 according to the third embodiment of the present invention, in which the intersection data generation system 30 may be divided into a receiving request module 300, a response request module 302, a data receiving module 304, an intersection processing module 306, a tag processing module 308, and a data sending module 310. The program module referred to in the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable than a program for describing the execution process of the intersection data generation system 30 in the computer device 3. The specific functions of the program modules 300 to 310 have been described in detail in the third embodiment and are not repeated here.
Example six
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an app store, etc., on which a computer program is stored that, when executed by a processor, implements the corresponding functions. The computer-readable storage medium of this embodiment is used to store the intersection data generation system 30 or the intersection data-based federated model training system 40, which, when executed by a processor, implements the intersection data generation method of the first embodiment or the intersection data-based federated model training method of the second embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for generating intersection data is characterized by comprising the following steps:
receiving an ID intersection request sent by a data request terminal, wherein the ID intersection request carries at least one user ID information;
responding to the ID intersection request, and returning first encrypted data according to the user ID information, so that the data request terminal returns second encrypted data and third encrypted data according to the first encrypted data;
receiving the second encrypted data and the third encrypted data;
inputting the first encrypted data, the second encrypted data and the third encrypted data into a preset intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
performing feature tagging on each piece of non-intersection data in the non-intersection data set to generate a plurality of virtual features; and
sending the intersection data set and the plurality of virtual features to the data request terminal for federated training.
2. The intersection data generation method according to claim 1, wherein said returning first encrypted data based on said user ID information includes:
acquiring target user information corresponding to the user ID information according to the user ID information; and
encrypting the target user information to obtain the first encrypted data.
3. The intersection data generation method of claim 2, wherein the user ID information includes first ID information;
the acquiring target user information corresponding to the user ID information according to the user ID information comprises:
performing format conversion on the first ID information according to a preset format conversion rule to obtain second ID information corresponding to the first ID information; and
acquiring the target user information corresponding to the user ID information according to the second ID information.
4. The intersection data generation method according to claim 3, further comprising the step of configuring the format conversion rule:
acquiring a plurality of first ID information provided by the data request terminal in advance, wherein each user ID information carries user identity information of the user;
determining second ID information corresponding to each first ID information according to the user identity information;
and configuring the format conversion rule according to each piece of first ID information and second ID information corresponding to the first ID information.
5. The intersection data generation method of claim 1, further comprising: uploading the plurality of time series data into a blockchain.
6. A federated model training method based on intersection data is characterized in that the method is used for a data request terminal, and comprises the following steps:
sending an ID intersection request to a data providing terminal so that the data providing terminal returns first encrypted data according to user ID information carried by the ID intersection request;
receiving the first encrypted data;
encrypting the first encrypted data to obtain second encrypted data;
acquiring local user information corresponding to the user ID information, and encrypting the local user information to obtain third encrypted data;
sending the second encrypted data and the third encrypted data to the data providing terminal so that the data providing terminal returns a corresponding intersection data set and a plurality of virtual features; and
taking the intersection data set and the plurality of virtual features as federated training samples, and training a pre-configured pre-trained federated model to obtain a target federated model.
7. The intersection data-based federated model training method of claim 6, further comprising:
uploading the intersection dataset and the plurality of virtual features into a blockchain.
8. A system for generating intersection data, comprising:
a receiving request module, configured to receive an ID intersection request sent by a data request terminal, wherein the ID intersection request carries at least one piece of user ID information;
a response request module, configured to respond to the ID intersection request and return first encrypted data according to the user ID information, so that the data request terminal returns second encrypted data and third encrypted data according to the first encrypted data;
a data receiving module, configured to receive the second encrypted data and the third encrypted data;
an intersection processing module, configured to input the first encrypted data, the second encrypted data and the third encrypted data into a preset intersection model for intersection processing to obtain an intersection data set and a non-intersection data set;
a tag processing module, configured to perform feature tagging on each piece of non-intersection data in the non-intersection data set to generate a plurality of virtual features; and
a data sending module, configured to send the intersection data set and the plurality of virtual features to the data request terminal.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the intersection data based federated model training method of any one of claims 1 to 7.
10. A computer-readable storage medium, having stored therein a computer program executable by at least one processor to cause the at least one processor to perform the steps of the intersection data-based federated model training method of any one of claims 1-7.
CN202010786660.2A 2020-08-07 2020-08-07 Intersection data generation method and federal model training method based on intersection data Active CN111914277B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010786660.2A CN111914277B (en) 2020-08-07 2020-08-07 Intersection data generation method and federal model training method based on intersection data
PCT/CN2020/135269 WO2021139476A1 (en) 2020-08-07 2020-12-10 Intersection data generation method, and federated model training method based on intersection data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010786660.2A CN111914277B (en) 2020-08-07 2020-08-07 Intersection data generation method and federal model training method based on intersection data

Publications (2)

Publication Number Publication Date
CN111914277A true CN111914277A (en) 2020-11-10
CN111914277B CN111914277B (en) 2023-09-01

Family

ID=73287637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010786660.2A Active CN111914277B (en) 2020-08-07 2020-08-07 Intersection data generation method and federal model training method based on intersection data

Country Status (2)

Country Link
CN (1) CN111914277B (en)
WO (1) WO2021139476A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032840A (en) * 2021-05-26 2021-06-25 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer readable storage medium
WO2021139476A1 (en) * 2020-08-07 2021-07-15 平安科技(深圳)有限公司 Intersection data generation method, and federated model training method based on intersection data
CN116582341A (en) * 2023-05-30 2023-08-11 连连银通电子支付有限公司 Abnormality detection method, abnormality detection device, abnormality detection apparatus, and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807415A (en) * 2021-08-30 2021-12-17 中国再保险(集团)股份有限公司 Federal feature selection method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492420A (en) * 2018-12-28 2019-03-19 深圳前海微众银行股份有限公司 Model parameter training method, terminal, system and medium based on federation's study
WO2020029590A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Sample prediction method and device based on federated training, and storage medium
CN110955907A (en) * 2019-12-13 2020-04-03 支付宝(杭州)信息技术有限公司 Model training method based on federal learning
CN111259443A (en) * 2020-01-16 2020-06-09 百融云创科技股份有限公司 PSI (program specific information) technology-based method for protecting privacy of federal learning prediction stage
CN111402095A (en) * 2020-03-23 2020-07-10 温州医科大学 Method for detecting student behaviors and psychology based on homomorphic encrypted federated learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399741A (en) * 2019-07-29 2019-11-01 深圳前海微众银行股份有限公司 Data alignment method, equipment and computer readable storage medium
CN110443067B (en) * 2019-07-30 2021-03-16 卓尔智联(武汉)研究院有限公司 Federal modeling device and method based on privacy protection and readable storage medium
CN110796267A (en) * 2019-11-12 2020-02-14 支付宝(杭州)信息技术有限公司 Machine learning method and machine learning device for data sharing
CN110942154B (en) * 2019-11-22 2021-07-06 深圳前海微众银行股份有限公司 Data processing method, device, equipment and storage medium based on federal learning
CN111177762B (en) * 2019-12-30 2022-11-08 北京同邦卓益科技有限公司 Data processing method, device, server and federal learning system
CN111914277B (en) * 2020-08-07 2023-09-01 平安科技(深圳)有限公司 Intersection data generation method and federal model training method based on intersection data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139476A1 (en) * 2020-08-07 2021-07-15 平安科技(深圳)有限公司 Intersection data generation method, and federated model training method based on intersection data
CN113032840A (en) * 2021-05-26 2021-06-25 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer readable storage medium
CN113032840B (en) * 2021-05-26 2021-07-30 腾讯科技(深圳)有限公司 Data processing method, device, equipment and computer readable storage medium
WO2022247576A1 (en) * 2021-05-26 2022-12-01 腾讯科技(深圳)有限公司 Data processing method and apparatus, device, and computer-readable storage medium
CN116582341A (en) * 2023-05-30 2023-08-11 连连银通电子支付有限公司 Abnormality detection method, abnormality detection device, abnormality detection apparatus, and storage medium

Also Published As

Publication number Publication date
CN111914277B (en) 2023-09-01
WO2021139476A1 (en) 2021-07-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant