CN113542228A  Data transmission method and device based on federal learning and readable storage medium  Google Patents
 Publication number
 CN113542228A (application number CN202110680161.XA)
 Authority
 CN
 China
 Prior art keywords
 ciphertext
 plaintext
 participating node
 data
 limit value
 Prior art date
 Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 Granted
Classifications

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L63/00—Network architectures or network communication protocols for network security
 H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
 H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
 H04L63/0442—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
 G06F16/90—Details of database functions independent of the retrieved data types
 G06F16/906—Clustering; Classification

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
 G06F21/60—Protecting data
 G06F21/602—Providing cryptographic facilities or services

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
 G06F21/60—Protecting data
 G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
 G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
 G06F21/6245—Protecting personal data, e.g. for financial or medical purposes

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N20/00—Machine learning

 H—ELECTRICITY
 H04—ELECTRIC COMMUNICATION TECHNIQUE
 H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
 H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
 H04L69/04—Protocols for data compression, e.g. ROHC
Abstract
The present application discloses a data transmission method and device based on federal learning and a readable storage medium, wherein the method comprises the following steps: a first participating node acquires a first ciphertext, performs a privacy operation on the first ciphertext according to service data belonging to the first participating node, and generates a second ciphertext, the first ciphertext referring to data obtained by the second participating node encrypting an initial plaintext; numerical limit estimation is performed on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext, the plaintext prediction limit value being used for representing the numerical range of the plaintext corresponding to the second ciphertext; addition offset processing is performed on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext; and a target compressed ciphertext in a polynomial format is generated according to the plaintext prediction limit value and the third ciphertext and sent to the second participating node. By adopting the method and device of the present application, the communication overhead in a federal learning task can be effectively reduced, and the running efficiency of the federal learning task improved.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data transmission method and apparatus based on federal learning, and a readable storage medium.
Background
In the artificial intelligence era, obtaining machine learning models, especially deep learning models, requires a large amount of training data as a prerequisite. In many business scenarios, the training data for a model is often scattered across different business teams, departments, or even different companies, and because of user privacy the data cannot be used directly, forming so-called data islands. In recent years, federated learning (Federated Learning) has developed rapidly, providing a new solution for cross-team data cooperation and for breaking data islands, and it has entered the stage of advancing from theoretical research to large-scale application.
One of the core differences between federated learning and an ordinary machine learning task is that the training participants change from one party to two or even more parties. A core problem is therefore how to coordinate two or more parties to complete a model training task together while protecting the data security of all participants and preventing any party from learning the other party's data. In the prior art, homomorphic encryption is one of the most common security techniques; it enables a participant to complete a specific numerical operation without knowing the other party's data. However, homomorphic encryption has a significant disadvantage: it causes severe data expansion. For example, when the key is 3072 bits, a 32-bit floating-point number becomes a 6144-bit large integer after homomorphic encryption, an expansion of 192 times, and this data expansion causes huge communication overhead between the participants.
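The additively homomorphic property, and the ciphertext expansion described above, can be illustrated with a toy Paillier sketch (demo-sized primes chosen purely for illustration; not secure, and not the patent's specific implementation):

```python
import math
import random

# Toy Paillier cryptosystem -- tiny demo primes, NOT secure.
p, q = 1000003, 1000033
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # modular inverse (Python 3.8+)

def enc(m):
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
c_sum = (enc(17) * enc(25)) % n2
# Ciphertexts live modulo n^2: with a 3072-bit key, a 32-bit value
# becomes a 6144-bit integer -- the 192x expansion noted above.
```

Here `dec(c_sum)` recovers 17 + 25 = 42 without either ciphertext ever being decrypted individually, which is exactly the property that lets a participant compute on the other party's encrypted data.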
Disclosure of Invention
The embodiment of the application provides a data transmission method and device based on federal learning and a readable storage medium, which can effectively reduce communication overhead in a federal learning task and improve the running efficiency of the federal learning task.
An embodiment of the present application provides a data transmission method based on federal learning, including:
the first participating node acquires a first ciphertext, performs privacy operation on the first ciphertext according to the service data belonging to the first participating node, and generates a second ciphertext; the first ciphertext refers to data obtained by encrypting the initial plaintext by the second participating node; the second participating node is a node performing federated learning with the first participating node;
performing numerical limit estimation on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext; the plaintext prediction limit value is used for representing the numerical range of the plaintext corresponding to the second ciphertext;
performing addition offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext;
and generating a target compressed ciphertext in a polynomial format according to the plaintext prediction limit value and the third ciphertext, and sending the target compressed ciphertext to the second participating node.
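The offset and packing steps above can be sketched on plaintext integers (an illustration under assumptions: each value lies in [-upper, upper] per the estimated limit, and values are packed as base-2**b digits; in the patent the same shift-and-add is applied homomorphically to ciphertexts):

```python
# Sender-side sketch, on plaintexts for clarity (assumed packing layout).
def offset_and_pack(values, upper, b):
    packed = 0
    for i, v in enumerate(values):
        u = v + upper              # addition offset -> nonnegative
        assert 0 <= u < 1 << b     # guaranteed by the estimated limit value
        packed += u << (i * b)     # "scalar multiplication" by 2**(i*b), then add
    return packed
```

For example, packing the two values -3 and 4 with `upper=10` and 8 bits per slot yields the single integer 3591, so one large ciphertext can carry many values at once.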
An embodiment of the present application provides a data transmission method based on federal learning, including:
the second participating node receives the target compressed ciphertext sent by the first participating node, and decrypts the target compressed ciphertext by adopting a private key to obtain a first compressed plaintext; the first participating node is a node performing federated learning with the second participating node; the target compressed ciphertext is a ciphertext in a polynomial format generated by the first participating node according to the plaintext prediction limit value and the third ciphertext; the third ciphertext is a ciphertext obtained by the first participating node performing addition offset processing on the second ciphertext based on the plaintext prediction limit value; the plaintext prediction limit value is obtained by the first participating node performing numerical limit estimation on the second ciphertext, and is used for representing the numerical range of the plaintext corresponding to the second ciphertext; the second ciphertext refers to a ciphertext generated by the first participating node performing a privacy operation on the first ciphertext according to the service data belonging to the first participating node, and the first ciphertext refers to data obtained by the second participating node encrypting the initial plaintext;
performing a bit operation on the first compressed plaintext in the polynomial format to obtain a second plaintext;
and acquiring the plaintext prediction limit value, and performing subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain a target plaintext.
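The decrypt-side bit operation and subtraction restoration can be sketched on plaintext integers (an assumed layout: k values, b bits each, each previously offset by the upper limit value):

```python
# Receiver-side sketch of unpack-and-restore (assumed little-endian digits).
def unpack_and_restore(packed, b, k, upper):
    mask = (1 << b) - 1
    offset_vals = [(packed >> (i * b)) & mask for i in range(k)]
    # subtraction restoration: undo the additive offset v + upper
    return [u - upper for u in offset_vals]
```

With `b=8`, `k=2`, and `upper=10`, the packed integer 3591 unpacks to the offset digits 7 and 14, which restore to the original values -3 and 4.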
An aspect of an embodiment of the present application provides a data transmission device based on federal learning, including:
the operation module is used for the first participating node to acquire a first ciphertext, and performing privacy operation on the first ciphertext according to the service data belonging to the first participating node to generate a second ciphertext; the first ciphertext refers to data obtained by encrypting the initial plaintext by the second participating node; the second participating node is a node performing federated learning with the first participating node;
the prediction module is used for performing numerical limit estimation on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext; the plaintext prediction limit value is used for representing the numerical range of the plaintext corresponding to the second ciphertext;
the offset module is used for performing addition offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext;
and the compression module is used for generating a target compressed ciphertext in a polynomial format according to the plaintext prediction limit value and the third ciphertext, and sending the target compressed ciphertext to the second participating node.
Wherein, the abovementioned operation module includes:
the first operation unit is used for acquiring the service data belonging to the first participating node and performing linear algebraic operation on the service data and the first ciphertext to obtain a second ciphertext; the linear algebraic operation comprises one or more of a scalar multiplication operation, a homomorphic addition operation, and a scalar addition operation.
Wherein, the estimation module comprises:
the first obtaining unit is used for obtaining a first numerical range corresponding to the initial plaintext and a second numerical range corresponding to the service data; acquiring a first data dimension of an initial plaintext and a second data dimension of service data, and determining a target data dimension in the first data dimension and the second data dimension according to the operation type of linear algebra operation;
and the first prediction unit is used for generating a plaintext prediction limit value corresponding to the second ciphertext based on the operation type of the linear algebraic operation, the first numerical range, the second numerical range, and the target data dimension.
Wherein, the initial plaintext and the service data are both matrixes;
the first obtaining unit is specifically configured to obtain a first matrix width and a first matrix height of the initial plaintext, and determine the first matrix width and the first matrix height as a first data dimension of the initial plaintext; acquiring a second matrix width and a second matrix height of the service data, and determining the second matrix width and the second matrix height as a second data dimension of the service data; when the linear algebra operation is scalar multiplication operation and the first matrix height is equal to the second matrix width, determining the first matrix height as a target data dimension; when the linear algebraic operation is a scalar multiplication operation and the second matrix height is equal to the first matrix width, the second matrix height is determined as the target data dimension.
The first ciphertext comprises at least two subciphertexts;
the abovementioned operation module includes:
the second operation unit is used for acquiring data characteristics corresponding to the business data belonging to the first participating node, and clustering at least two subciphertexts based on the data characteristics to obtain one or more clustering intervals; and respectively carrying out homomorphic addition operation on the subciphertexts in one or more clustering intervals to obtain a clustering subcipher text corresponding to each clustering interval, and determining one or more clustering subcipher texts as second cipher texts.
Wherein, the estimation module comprises:
the second acquisition unit is used for acquiring a numerical range and a data dimension corresponding to the initial plaintext;
and the second prediction unit is used for generating a plaintext prediction limit value corresponding to the second ciphertext based on the operation type, the numerical range and the data dimension of homomorphic addition operation.
The offset module is specifically configured to obtain an upper limit value from the plaintext prediction limit value, and perform a scalar addition operation on the upper limit value and the second ciphertext to obtain the third ciphertext; the plaintext corresponding to the third ciphertext is a nonnegative number.
The compression module is specifically configured to obtain a shift parameter representing the plaintext prediction limit value, and perform a scalar multiplication operation on the third ciphertext based on the shift parameter to obtain at least two ciphertext monomials; and perform a homomorphic addition operation on the at least two ciphertext monomials to obtain the target compressed ciphertext in a polynomial format.
An aspect of an embodiment of the present application provides a data transmission device based on federal learning, including:
the decryption module is used for the second participating node to receive the target compressed ciphertext sent by the first participating node and decrypt the target compressed ciphertext by adopting a private key to obtain a first compressed plaintext; the first participating node is a node performing federated learning with the second participating node; the target compressed ciphertext is a ciphertext in a polynomial format generated by the first participating node according to the plaintext prediction limit value and the third ciphertext; the third ciphertext is a ciphertext obtained by the first participating node performing addition offset processing on the second ciphertext based on the plaintext prediction limit value; the plaintext prediction limit value is obtained by the first participating node performing numerical limit estimation on the second ciphertext, and is used for representing the numerical range of the plaintext corresponding to the second ciphertext; the second ciphertext refers to a ciphertext generated by the first participating node performing a privacy operation on the first ciphertext according to the service data belonging to the first participating node, and the first ciphertext refers to data obtained by the second participating node encrypting the initial plaintext;
the decompression module is used for carrying out bit operation on the first compressed plaintext with the polynomial format to obtain a second plaintext;
and the restoration module is used for acquiring the plaintext prediction limit value and carrying out subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain the target plaintext.
Wherein, the decompression module comprises:
the decompression unit is used for performing a bit operation on the first compressed plaintext in the polynomial format to obtain at least two decompressed plaintexts in integer format;
the decoding unit is used for acquiring a scaling factor and the exponent term parameters respectively corresponding to the at least two decompressed plaintexts, decoding each decompressed plaintext according to the scaling factor and the at least two exponent term parameters to obtain at least two sub-plaintexts in floating-point format, and determining the at least two sub-plaintexts as the second plaintext.
The restoration module is specifically configured to obtain an upper limit value from the plaintext prediction limit value, subtract the upper limit value from each of the at least two sub-plaintexts to obtain at least two restored sub-plaintexts, and determine the at least two restored sub-plaintexts as the target plaintext.
An aspect of an embodiment of the present application provides a computer device, including: a processor, a memory, a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program to execute the method in the embodiment of the present application.
An aspect of the present embodiment provides a computer-readable storage medium in which a computer program is stored, the computer program being adapted to be loaded by a processor and to execute the method in the embodiments of the present application.
An aspect of the embodiments of the present application provides a computer program product or a computer program, where the computer program product or the computer program includes computer instructions, the computer instructions are stored in a computerreadable storage medium, and a processor of a computer device reads the computer instructions from the computerreadable storage medium, and executes the computer instructions, so that the computer device executes the method in the embodiments of the present application.
The embodiment of the present application supports a first participating node in obtaining a first ciphertext sent by a second participating node and performing a privacy operation on the first ciphertext according to the service data belonging to the first participating node, thereby generating a second ciphertext. Numerical limit estimation may then be performed on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext; addition offset processing may be performed on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext; a target compressed ciphertext in a polynomial format may then be generated according to the plaintext prediction limit value and the third ciphertext; and finally the target compressed ciphertext may be sent to the second participating node. Therefore, in the process of federated learning between the first participating node and the second participating node, the first participating node packs and compresses multiple ciphertexts together for transmission before sending them to the second participating node, so that the communication overhead caused by sending ciphertexts can be greatly reduced, the running efficiency of the federated learning task can be significantly improved, and the availability of federated learning is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a system architecture diagram according to an embodiment of the present application;
 figs. 2a-2b are schematic diagrams of a scenario of data transmission based on federal learning according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a data transmission method based on federal learning according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a data transmission method based on federal learning according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a data transmission method based on federal learning according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a data transmission method based on federal learning according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a data transmission apparatus based on federal learning according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a data transmission device based on federal learning according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. The basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation, and the like.
Federated Learning, also known as federated machine learning, joint learning, or alliance learning, is a machine learning framework that can effectively help multiple organizations use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulation. According to the distribution characteristics of the data, federated learning can be divided into three categories: horizontal federated learning, vertical federated learning, and federated transfer learning. As a distributed machine learning paradigm, federated learning can effectively solve the data island problem, enabling participants to jointly build models without sharing data, thereby technically breaking data islands and realizing AI cooperation.
The scheme provided by the embodiment of the application relates to technologies such as artificial intelligence federal learning and deep learning, and the specific process is explained by the following embodiment.
Please refer to fig. 1, which is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in fig. 1, the system architecture may be a network formed by a plurality of participating nodes, and the plurality of participating nodes may include a participating node 100, a participating node 200a, participating nodes 200b, …, and a participating node 200n, where there may be a communication connection between any two participating nodes, for example, a communication connection between the participating node 100 and the participating node 200a, a communication connection between the participating node 100 and the participating node 200b, and a communication connection between the participating node 200a and the participating node 200 b. It should be understood that the communication connection is not limited to the connection manner, and may be directly or indirectly connected through a wired communication manner, may be directly or indirectly connected through a wireless communication manner, and may also be connected through other manners, and the present application is not limited herein.
It is understood that the participating nodes may be distributed in different teams, departments, or companies; that is, each participating node may store data related to its own users with different data characteristics, and the data may be stored in a respective local cluster. For example, the participating node 100 may have basic user profiles, mainly some static labels (such as age, gender, educational background, etc.), the participating node 200a may have information on the user's purchasing behavior, interests, browsing history, etc., and the participating node 200b may have information on the user's payment behavior, credit rating, etc. Further, when two or more participating nodes participate in a model training task together, their respective local data are required as training data, and the participating nodes can complete the model training task by adopting a federated learning scheme on the premise that the data does not leave its owner and data privacy is protected.
For ease of understanding, take the participating node 100 and the participating node 200a as an example. Assume that, in two-party vertical federated learning, the participating node 100 (e.g., belonging to an advertising company) and the participating node 200a (e.g., belonging to a social networking platform) cooperate to jointly train one or more deep-learning-based personalized recommendation models. The participating node 100 possesses some data features, e.g., (X_1, X_2, …, X_40), 40-dimensional data features in total (such as age, gender, etc.), while the participating node 200a possesses another portion of the data features, e.g., (X_41, X_42, …, X_100), 60-dimensional data features in total (such as interests, browsing history, etc.). When the participating node 100 and the participating node 200a are joined, they have more data features: (X_1, X_2, …, X_40) combined with (X_41, X_42, …, X_100) gives 100-dimensional data features, which, as can be appreciated, greatly extends the feature dimensions of the training data. For supervised deep learning, the participating node 100 and/or the participating node 200a may also possess the label information of the training data. After receiving a model-training instruction, the participating node 100 and the participating node 200a may each build a model locally and train it with local training data. It should be noted that, during training, the two parties continuously exchange intermediate calculation results (such as partial derivatives or gradients) under encryption protection; meanwhile, the owner of the label information may calculate the prediction error (i.e., the loss function) from the labels it owns, and the parties may continuously update their respective model parameters according to the interaction results. When the model converges, the training task ends; at this point the participating node 100 and the participating node 200a each hold the model parameters related to their own data features, and may then use the finally obtained model to provide corresponding services together. For example, with a trained personalized recommendation model, personalized ad push may be provided for platform users on a social networking platform.
It will be appreciated that the process by which the participating nodes exchange model information is carefully designed and fully encrypted, so that no participating node can infer the private data of any other, yet the goal of joint modeling can still be achieved; that is, apart from the model parameters ultimately produced, the process should reveal neither any party's data nor intermediate results from which that data could be reversed. In addition, the system architecture provided by the embodiment of the present application can be a decentralized architecture: after the data related to model training is encrypted by the encryption algorithm, the participating nodes communicate directly with one another without relying on a third party for forwarding, thus achieving decentralization; the whole training process only needs to coordinate the progress of all participating nodes, which enhances security in practical applications.
It should be noted that the encryption method (e.g., homomorphic encryption) adopted in existing federal learning tasks often causes severe data expansion and hence huge communication overhead. The embodiment of the present application therefore provides a data transmission method based on federal learning that can effectively reduce the communication overhead in a federal learning task; for ease of understanding and description, the participating node 100 and the participating node 200a are again taken as examples. Assume that the participating node 100 owns plaintext X and the participating node 200a owns plaintext Y, where X and Y may include data generated during model training (such as gradients), and both sides want to obtain a desired interaction result by performing some operation on X and Y. Taking the participating node 100 as the "initiator" of the operation, the participating node 100 may encrypt its local plaintext X with a public key to obtain ciphertext X' and send X' to the participating node 200a. After receiving X', the participating node 200a may perform the desired privacy operation on X' according to its local plaintext Y, obtaining ciphertext Z_1. However, because encryption causes data expansion, the participating node 200a does not send Z_1 directly to the participating node 100, but first packs and compresses it. Specifically, the participating node 200a performs numerical limit estimation on Z_1 to obtain a plaintext prediction threshold W corresponding to Z_1 (i.e., the value range of the plaintext corresponding to Z_1); then, based on W, performs additive offset processing on Z_1 to obtain ciphertext Z_2; then, according to W and Z_2, generates a compressed ciphertext Z_3 having a polynomial format; and finally transmits Z_3 back to the participating node 100. Correspondingly, after receiving the compressed ciphertext Z_3, the participating node 100 may decrypt it with the private key (also referred to as the secret key) and perform the corresponding decompression and subtraction restoration to obtain the real operation result. It can be understood that the method provided in the embodiment of the present application can be smoothly extended to application scenarios with more than two participating nodes, whose interaction process is consistent with the two-node case and is not repeated here.
In the process of joint learning by multiple (including two) participating nodes, the above interaction may occur continuously. It can be understood that the participating node initiating the interaction (which may be any one of the participating nodes 100, 200a, 200b, and 200n; e.g., the participating node 100 in the example above) possesses the private key for decryption, while the other participating nodes have no knowledge of the private key. The algorithm used for encryption may be a homomorphic encryption algorithm, which has the following characteristic: if data encrypted under the same scheme is operated on in the encrypted domain and the result is then decrypted, the outcome is the same as performing the same operation on the unencrypted original plaintext. Using this property, each participant can homomorphically encrypt the model data that needs to be exchanged and send it directly to the other relevant participating nodes; the other party completes the computation required by training on the ciphertext and returns the result, and the receiver obtains the computed result after decryption without ever acquiring the original data, thereby ensuring data security.
Compared with the prior art, the method provided by the embodiment of the application can pack and compress a plurality of ciphertexts to be transmitted together before sending the ciphertexts, so that the communication overhead caused by the transmission of the ciphertexts in the federal learning task can be effectively reduced.
The participating nodes 100, 200a, 200b, …, and 200n in fig. 1 may include a tablet computer, a notebook computer, a palmtop computer, a mobile phone, a smart speaker, a Mobile Internet Device (MID), a point-of-sale (POS) machine, a wearable device (e.g., a smart watch, a smart bracelet, etc.), and the like.
It is understood that the data transmission method provided by the embodiment of the present application may be executed by a computer device, which includes but is not limited to a participating node (which may be a terminal device or a server). The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud database, cloud services, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN, and big data and artificial intelligence platforms. The terminal device may be a tablet computer, a notebook computer, a desktop computer, a palmtop computer, a smart phone, a Mobile Internet Device (MID), a wearable device (e.g., a smart watch, a smart bracelet, etc.), a smart computer, a smart vehicle-mounted terminal, and the like. The terminal device and the server may be directly or indirectly connected in a wired or wireless manner, which the embodiment of the present application does not limit.
It should be noted that a participating node may also be a node on a blockchain network. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms; it is mainly used to order data chronologically and encrypt it into a ledger, so that the data cannot be tampered with or forged, while still being verifiable, storable, and updatable. It is understood that a block is a data packet carrying transaction data (i.e., transaction traffic) over a blockchain network; it is a data structure tagged with a timestamp and the hash value of the previous block, and the transactions in the block are verified and confirmed via the consensus mechanism of the network. One or more smart contracts may be included in the blockchain system; these smart contracts are code that nodes of the blockchain (including common nodes) can understand and execute, and they may execute arbitrary logic and obtain a result. A blockchain node system may include a plurality of nodes and may correspond to a blockchain network (including but not limited to a blockchain network corresponding to a consortium chain); it may specifically include the participating node 100, the participating node 200a, the participating nodes 200b, …, and the participating node 200n described above.
Further, please refer to fig. 2a and fig. 2b, which are schematic views of a data transmission scenario based on federal learning according to an embodiment of the present application. The computer devices implementing the data transmission scenario may be a plurality of participating nodes (any two or more of the participating node 100, the participating node 200a, the participating nodes 200b, …, and the participating node 200n) as shown in fig. 1; in the embodiment of the present application, only the case where the participating node 100 and the participating node 200a act together is described. As shown in fig. 2a, the participating node 100 owns local data 300a related to its service users, the participating node 200a owns local data 400a related to its service users, and the local data 300a and the local data 400a may be data sets with different feature dimensions. During the training process in which the participating node 100 and the participating node 200a jointly model (i.e., both parties participate in the same federal learning task), each party may independently deploy a local framework based on federal learning and use it to create its own model locally; for example, the participating node 100 may create the initial model 300b and the participating node 200a may create the initial model 400b.
Further, assume that in this scenario the local data 300a and the local data 400a have large user overlap but small user-feature overlap. The two data sets may then be partitioned longitudinally (i.e., along the feature dimension), and the portion of data belonging to users common to both parties but with non-identical user features is extracted for training, while users who do not overlap are not exposed; this is also referred to as longitudinal (vertical) federal learning. For example, assume that the participating node 100 and the participating node 200a belong to two different organizations: the participating node 100 belongs to a bank in a certain place, and the participating node 200a belongs to an e-commerce company in the same place. The user groups of both parties are likely to include most residents of that place, so the intersection of users is large; however, since the bank mainly records information such as the users' income and expenditure behavior and credit ratings, while the e-commerce company keeps information such as the users' browsing and purchase histories, the intersection of user features is small. Longitudinal federal learning is federal learning in which different features are aggregated in an encrypted state to enhance model capability; at present, machine learning models such as logistic regression and decision trees have been established under the longitudinal federal learning framework. The data extraction process may employ an encryption-based user sample alignment technique (e.g., based on RSA, an encryption algorithm), which is not expanded upon here. Based on this, the participating node 100 may extract corresponding data from the local data 300a to train the initial model 300b, and similarly, the participating node 200a may extract corresponding data from the local data 400a to train the initial model 400b.
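The RSA-based sample alignment mentioned above is not expanded in the text. As a much-simplified stand-in, the intent can be illustrated with salted hashing of user identifiers (note: this is only a sketch of the "find common users" step; unlike true blind-RSA private set intersection, salted hashing alone does not hide the non-overlapping users from a party that knows the salt):

```python
import hashlib

def blind(ids, salt=b"demo-salt"):
    """Map each user id to an opaque token; `salt` is a placeholder for
    the cryptographic blinding used by real RSA-based alignment."""
    return {hashlib.sha256(salt + i.encode()).hexdigest() for i in ids}

bank = ["alice", "bob", "carol"]   # hypothetical bank-side user ids
shop = ["bob", "carol", "dave"]    # hypothetical e-commerce-side user ids

# Both sides exchange tokens and intersect them to find common users.
common = blind(bank) & blind(shop)
assert len(common) == 2            # "bob" and "carol" overlap
```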
Further, participating node 100 and participating node 200a may each invoke an encryption module in the local framework to encrypt data related to their model training, and then communicate directly between the two parties.
It is understood that the respective raw data (e.g., local data 300a and local data 400a) of the participating node 100 and the participating node 200a never leave their local environments throughout the model training process.
It should be noted that homomorphic encryption can be divided into semi-homomorphic encryption and fully homomorphic encryption. If a cryptographic algorithm satisfies only multiplicative homomorphism or only additive homomorphism, it is called semi-homomorphic encryption, i.e., SHE (Somewhat Homomorphic Encryption) or PHE (Partially Homomorphic Encryption); a cryptographic algorithm is called Fully Homomorphic Encryption (FHE) if it satisfies both multiplicative and additive homomorphism. In the embodiment of the present application, a semi-homomorphic encryption technique satisfying the addition property (i.e., additive homomorphism, described further in step S101 of the embodiment corresponding to fig. 3) may be used to perform the privacy operation, such as the Paillier homomorphic encryption scheme or the Okamoto–Uchiyama homomorphic encryption scheme.
Further, with reference to the properties of the semi-homomorphic encryption technique, refer to fig. 2b. Assume that the participating node 100 obtains a plaintext 300e (which may include multiple plaintexts, for example, a set of partial derivatives) at some point in the training process, and the two parties now need to interact based on the plaintext 300e. The participating node 100 may generate a public key 300c and a private key 300d in an initialization stage and send the public key 300c to the participating node 200a; that is, the public key 300c may be published, the private key 300d is not published, and only the participating node 100 possesses the private key 300d, as shown in fig. 2b. The participating node 100 may then encrypt the plaintext 300e using the public key 300c to obtain a ciphertext 300f and send the ciphertext 300f to the participating node 200a. After receiving the ciphertext 300f, the participating node 200a may perform a privacy operation on it according to the local service data 400c to obtain a ciphertext 400d. The service data 400c may be partial derivatives (or gradients) generated by the participating node 200a during training, or a subset of the local data 400a; it may be in ciphertext or plaintext form, as determined by actual needs, which the embodiment of the present application does not limit. The privacy operation may include one or more of a scalar multiplication operation, a homomorphic addition operation, and a scalar addition operation.
For example, assume that the service data 400c consists of some partial derivatives (in plaintext), and the participating node 200a and the participating node 100 wish to perform a scalar addition operation on the ciphertext 300f and the service data 400c, that is, to calculate the result of "ciphertext 300f + service data 400c". The participating node 200a may then encrypt the service data 400c with the public key 300c to obtain an intermediate ciphertext, and perform a homomorphic addition operation on the intermediate ciphertext and the ciphertext 300f to obtain the intermediate result of the current operation (i.e., the ciphertext 400d).
It should be noted that, because homomorphic encryption often causes severe data expansion, the participating node 200a may pack and compress the ciphertext before sending it to the participating node 100. Specifically, the participating node 200a may first perform numerical limit estimation on the ciphertext 400d to obtain a plaintext prediction threshold 400e corresponding to the ciphertext 400d; the plaintext prediction threshold 400e represents the value range of the plaintext corresponding to the ciphertext 400d and, it may be understood, includes an upper limit and a lower limit. The participating node 200a may further perform additive offset processing on the ciphertext 400d based on the plaintext prediction threshold 400e to obtain a ciphertext 400f; then, according to the plaintext prediction threshold 400e and the ciphertext 400f, it may generate a ciphertext 400g (a compressed ciphertext) having a polynomial format using the packing compression algorithm provided in the embodiment of the present application, and send the ciphertext 400g to the participating node 100. The specific operation of the packing compression algorithm is described in step S104 of the embodiment corresponding to fig. 3 below. It should be noted that the purpose of the additive offset processing is to ensure that the ciphertext to be compressed meets the requirement of the packing compression algorithm; specifically, the plaintext corresponding to the ciphertext 400f obtained by applying the additive offset to the ciphertext 400d must be a nonnegative number.
Further, after receiving the ciphertext 400g sent by the participating node 200a, the participating node 100 needs to perform decryption, decompression, and related processing to recover the real calculation result. The specific process may be as follows: first, the participating node 100 performs a single decryption on the compressed ciphertext 400g using the private key 300d to obtain the plaintext 300g. Since the plaintext 300g is still in a compressed state at this point, the participating node 100 further performs decompression: it applies bit operations to the plaintext 300g according to the decompression algorithm provided in the embodiment of the present application to obtain the plaintext 300h, then obtains the plaintext prediction threshold 400e and performs subtraction restoration (i.e., the inverse of the additive offset processing) on the plaintext 300h based on it, finally recovering the plaintext 300i, i.e., the true result of the privacy operation. The specific operation of the decompression algorithm is described in step S402 of the embodiment corresponding to fig. 6 below.
The embodiment of the present application does not limit the specific form (such as a matrix, an array, and the like) of any ciphertext and plaintext.
During the training process of the joint modeling by the participating node 100 and the participating node 200a, the two parties may communicate continuously through the process described in fig. 2b. It can be understood that in this process either party may become the party holding the private key (i.e., the initiator of the privacy operation), which is not limited in this application; the embodiment corresponding to fig. 2b is described taking the case where the participating node 100 holds the private key 300d.
Finally, when the initial model 300b and the initial model 400b converge, the model 500 can be obtained as the finally trained model according to the final model parameter combination calculated by the two parties. When the subsequent participating node 100 and the participating node 200a use the model 500 together to provide corresponding services, the communication process between the two parties is consistent with the communication process in the training process, and details are not repeated here.
As can be seen from the above, the ciphertext packing, compression, decryption, and decompression method for federal learning can support any federal learning algorithm that uses a semi-homomorphic encryption algorithm satisfying the additive property, such as the longitudinal federal LR algorithm, the longitudinal federal GBDT algorithm (also called the longitudinal federal XGB algorithm), and longitudinal federal neural network algorithms. It can be integrated into a federal learning platform as the model training and model inference module of a federal learning task, provide federal learning services externally on a public or private cloud, and be applied to various scenarios such as financial risk control, advertisement recommendation, user profiling, and information query. By adopting the method provided by the embodiment of the present application, the communication overhead and the decryption overhead can be effectively reduced, the usability of the federal learning platform (or system) is improved, and the effect of the federal learning model is further improved.
Referring to fig. 3, fig. 3 is a schematic flowchart of a data transmission method based on federal learning according to an embodiment of the present application. The data transmission method may be performed by any number of participating nodes (e.g., any two or more of participating nodes 100, 200a, 200b, …, and 200n in fig. 1). The data transmission method may include at least the following steps S101 to S104:
step S101, a first participating node acquires a first ciphertext, and performs a privacy operation on the first ciphertext according to business data belonging to the first participating node to generate a second ciphertext; the first ciphertext refers to data obtained by the second participating node encrypting an initial plaintext; the second participating node is a node performing federal learning with the first participating node;
during the training or inference process of the federal learning task, the first participating node and the second participating node can continuously perform data interaction under encryption protection. Specifically, the embodiment of the present application adopts a semi-homomorphic encryption technique satisfying the addition property. Such techniques generally require the input plaintext to be a large integer (also called a high-precision integer, i.e., an integer whose precision cannot be stored in a basic data type, such as a 2048-bit large integer), whereas the plaintext used in machine learning is usually a floating-point number (such as a 32-bit float). Therefore, to support floating-point numbers, the second participating node first performs an encoding operation on the initial plaintext in floating-point format to obtain an encoded initial plaintext in large-integer format, then encrypts the encoded initial plaintext with the public key to obtain the first ciphertext, and sends the first ciphertext to the first participating node. The initial plaintext refers to data generated by the second participating node during training, used for data interaction with the first participating node. The encoding method is not limited in the embodiment of the present application; in one embodiment, the formula for encoding a floating-point number x into a 2048-bit large integer X is as follows:

X = round(x × B^e)    (1)

where B is the scaling factor (a common setting is B = 16), e is the exponent term parameter, and round denotes a rounding function. It is understood that the encoding formulas for large integers of other lengths are similar to formula (1) and are not repeated here. After encoding is completed, the large integer X is encrypted with the encryption algorithm to obtain the ciphertext [X]. In the embodiment of the present application, x = <e, X> and [x] = <e, [X]>; that is, after the encoding operation, the ciphertext [x] of the floating-point number x consists of two parts: the exponent term parameter e and the large-integer ciphertext [X].
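A minimal sketch of the encoding in formula (1) and its inverse; the exponent value e = 8 is an assumed example setting, not prescribed by the text:

```python
B = 16   # scaling factor, the common setting named in the text
E = 8    # exponent term parameter (assumed example value)

def encode(x, b=B, e=E):
    """Encode float x as the pair <e, X> with X = round(x * b**e), per formula (1)."""
    return e, round(x * b ** e)

def decode(e, X, b=B):
    """Invert the encoding (exact up to a rounding error of 0.5 / b**e)."""
    return X / b ** e

# 2.5 * 16**8 = 2.5 * 2**32 is exactly representable, so the round trip is exact.
e, X = encode(2.5)
assert decode(e, X) == 2.5
```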
Further, the first participating node receives a first ciphertext (including a ciphertext corresponding to the encoded initial plaintext and a corresponding exponential parameter) sent by the second participating node, and then performs privacy operation on the first ciphertext according to the service data belonging to the first participating node, so as to generate a second ciphertext. The traffic data is plaintext, and may include, but is not limited to, local data used for model training, a gradient (or partial derivative) generated during training, and the like.
It should be noted that the embodiments of the present application may use a semi-homomorphic encryption technique satisfying the addition property to perform the privacy operation; such a technique supports the following operations:
(a) Initialization: in an initialization stage, a public key PK and a private key SK are generated; the public key PK may be published, while the private key SK must not be published.
(b) Encryption: given a value (plaintext) V, encryption with the public key PK yields a ciphertext [V], which may be expressed as Enc(V, PK) → [V].
(c) Decryption: given a ciphertext [V], decryption with the private key SK recovers the value (plaintext) V, which may be expressed as Dec([V], SK) → V.
(d) Homomorphic addition: given two ciphertexts [U] and [V], a new ciphertext [W] is obtained by the homomorphic addition operation [W] = [U] ⊕ [V], satisfying Dec([W], SK) → W and W = U + V.
(e) Scalar addition: given a ciphertext [U] and a plaintext V, a new ciphertext [W] is obtained by the scalar addition operation [W] = [U] ⊕ V, satisfying Dec([W], SK) → W and W = U + V; it is noted that, in most cases, scalar addition is completed by one encryption operation Enc(V, PK) → [V] followed by one homomorphic addition operation [W] = [U] ⊕ [V].
(f) Scalar multiplication: given a ciphertext [U] and a plaintext V, a new ciphertext [W] is obtained by the scalar multiplication operation [W] = V ⊗ [U], satisfying Dec([W], SK) → W and W = U × V.
In federal learning, the semi-homomorphic encryption techniques described above are commonly used in two forms, namely linear algebra operations and histogram aggregation operations, both built from the properties above; that is, the privacy operations in the embodiments of the present application may include linear algebra operations and histogram aggregation operations, each of which may involve one or more of a homomorphic addition operation, a scalar addition operation, and a scalar multiplication operation.
It can be understood that, in this embodiment of the present application, the second participating node refers to a node that owns both a private key and a public key, and the first participating node refers to a node that owns only the public key and has no knowledge about the private key, that is, after initializing the public key and the private key, the second participating node sends the public key to the first participating node, so that the first participating node may also use the public key to perform encryption processing. It can be understood that the application supports any two or more nodes to participate in the same federal learning task, but in the same privacy operation process, only one node possesses a private key for decryption, that is, the number of the second participating nodes is only one, but the number of the first participating nodes may be one or more, and the application does not limit the number of the first participating nodes.
step S102, performing numerical limit estimation on the second ciphertext to obtain a plaintext prediction threshold corresponding to the second ciphertext; the plaintext prediction threshold is used for representing the numerical range of the plaintext corresponding to the second ciphertext;
specifically, in a machine learning task, the value tends to fall within a relatively small range, for example, when a logistic loss (loss function used in logistic regression) is used as a loss function, the calculation formula of the partial derivative is δ ═ sigmoid (y ') y, where sigmoid is an activation function, the output is between 0 and 1, y' is a model prediction value, y is a sample label, and y ∈ {0,1}, and thus it can be seen that the range of δ is between1 and 1, that is, δ has upper and lower bounds. Based on this, it can be understood that, in the federal learning process, the intermediate calculation result (i.e., the second ciphertext generated after the privacy operation) also has upper and lower bounds, and the upper and lower bounds are predictable. Therefore, the first participating node can obtain the numerical range and the data dimension corresponding to the initial plaintext and the numerical range and the data dimension corresponding to the service data, and further can generate the plaintext prediction limit value corresponding to the second ciphertext based on the operation type of the privacy operation, the corresponding numerical range and the data dimension.
The plaintext prediction threshold may include an upper limit value (i.e., the maximum of the plaintext prediction threshold) and a lower limit value (i.e., the minimum of the plaintext prediction threshold), and is used to represent the value range of the plaintext corresponding to the second ciphertext. Numerical limit estimation serves as the premise of the subsequent packing and compression steps in the embodiment of the present application, and can also be used in other scenarios where the upper and lower bounds of a value need to be estimated; for example, when some operation result may exceed a preset maximum upper limit, performing numerical limit estimation on it can provide a degree of protection and prevent the operation result from overflowing.
It will be appreciated that in practical applications, the above mentioned value ranges and data dimensions are public information, i.e. all participating nodes know the information, and therefore, the value limit estimation has no possibility of privacy disclosure. The numerical limit estimation in two forms of linear algebra operation and histogram aggregation will be discussed separately later, and will not be expanded first.
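A minimal sketch of how such limit estimation might look in code, using hypothetical helper names; the point is that only public metadata (value ranges, data dimensions, operation type) is consumed, which is why the estimation leaks no private data:

```python
def bound_add(a, b):
    """|x + y| <= a + b whenever |x| <= a and |y| <= b."""
    return a + b

def bound_dot(a, b, dim):
    """Linear-algebra case: |sum_i x_i * y_i| <= dim * a * b
    for dim-dimensional operands bounded by a and b."""
    return dim * a * b

def bound_hist_sum(a, count):
    """Histogram-aggregation case: summing `count` values each bounded by a."""
    return count * a

# The logistic partial derivative delta = sigmoid(y') - y lies in (-1, 1),
# so a 100-dimensional dot product with features bounded by 1 is bounded by 100.
phi = bound_dot(1, 1, 100)
assert phi == 100
```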
Step S103, based on the plaintext prediction threshold value, performing addition offset processing on the second ciphertext to obtain a third ciphertext;
in connection with step S102, the true number corresponding to the ciphertext (i.e. the plaintext corresponding to the ciphertext) falls within a smaller range, so that, for the large integer obtained after the encoding operation, although the highest number can support 2048 or 3072 bits, only a smaller space is actually used, for example, in connection with formula (1) in step S101, for the floating point y, when y is greater than or equal to 0 and less than or equal to 2^{32}When B is 16 and e is 8, the large integer Y with length of 2048 bits obtained by encoding corresponds to the following numerical range: 0. ltoreq. Y ═ round (Y × B)^{e})≤2^{64}That is, although the effective space of the encryption algorithm is as high as 2048 bits, the actual value of each ciphertext in federal learning is only 64 bits at the maximum. It should be noted that although the encoded large integer reaches only 64 bits at most, the encrypted large integer is larger than 64 bits because the encryption operation involves complicated modular exponentiation, which does not affect the feasibility of the scheme. Based on this, the embodiments of the present application provide a polynomialbased packing compression algorithm for ciphertext, where the algorithm requires that a plaintext corresponding to a ciphertext to be compressed is a nonnegative number, and it is assumed that a second ciphertext is defined as [ v [ ]_{i}]＝<e_{i},[V_{i}]>I1, 2, …, M, i.e. 
the second ciphertext contains M ciphertexts (M is a positive integer greater than 1), and the packing compression algorithm requires v_i ≥ 0. Therefore, this step performs additive offset processing on the second ciphertext before the packing compression. Specifically, assume that the plaintext prediction limit value corresponding to the second ciphertext obtained in step S102 is [-φ, φ], where the upper limit value is φ and the lower limit value is -φ. The first participating node can obtain the upper limit value φ in the plaintext prediction limit value, and then perform a scalar addition operation between the upper limit value φ and each ciphertext in the second ciphertext, obtaining the third ciphertext [u_i] = <e_i, [U_i]>, i = 1, 2, …, M, whose calculation formula is as follows:
wherein the above calculation process involves an encryption operation and a homomorphic addition operation; the third ciphertext obtained at this time also contains M ciphertexts, and it is guaranteed that the plaintext u_i corresponding to each ciphertext [u_i] is a non-negative number.
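As a concrete illustration of the encoding of formula (1) and the additive offset of step S103, the following sketch works on plaintext values; in the actual scheme the offset would be applied homomorphically via scalar addition on ciphertexts, and B, e, and the limit φ here are illustrative values taken from the text.

```python
# Sketch of the encoding Y = round(y * B^e) from formula (1) and the offset
# of step S103 that makes every plaintext non-negative before packing.
B, e = 16, 8          # scaling base and exponent from formula (1)

def encode(y: float) -> int:
    """Encode a float into the large-integer plaintext Y = round(y * B**e)."""
    return round(y * B**e)

# For 0 <= y <= 2**32, the encoded integer needs at most 64 bits,
# even though the plaintext space of a 2048-bit key is far larger.
assert encode(2.0**32) == 2**64

# Additive offset: plaintexts v_i in [-phi, phi] become u_i = v_i + phi >= 0,
# as the packing compression algorithm requires (phi is illustrative).
phi = 100.0
v = [-87.5, 0.0, 42.25, 99.0]
u = [encode(v_i + phi) for v_i in v]
assert all(u_i >= 0 for u_i in u)
```

The non-negativity guarantee is exactly what lets the packed limbs of step S104 be separated again by bit operations on the receiving side.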
Step S104, generating a target compressed ciphertext with a polynomial format according to the plaintext prediction limit value and the third ciphertext, and sending the target compressed ciphertext to the second participating node.
Specifically, the first participating node may first obtain a shift parameter representing the prediction limit value, then perform scalar multiplication operations on the third ciphertext based on the shift parameter to obtain at least two ciphertext monomials, and further perform homomorphic addition operations on the at least two ciphertext monomials, so that a target compressed ciphertext with a polynomial format is obtained; finally the target compressed ciphertext may be sent to the second participating node.
In connection with step S103 above, for a given third ciphertext [u_i] = <e_i, [U_i]>, i = 1, 2, …, M, containing M ciphertexts, the packing compression algorithm provided by the embodiments of the present application can use scalar multiplication operations and homomorphic addition operations to perform the following operation:
wherein the shift parameter T is used to represent the prediction limit value [-φ, φ]; that is, T can be determined from the value range of the large-integer plaintext corresponding to the second ciphertext. For example, if the real value corresponding to the ciphertext [V_i] in step S103 occupies at most 64 bits, T may be set to 64. The compressed ciphertext [U_pack] is composed of the M monomials [U_1], [U_2], …, [U_M]; after performing homomorphic addition on the M monomials and rearranging, the final target compressed ciphertext [u_pack] = <e_1, e_2, …, e_M, [U_pack]> is obtained. The exponent term parameters (e_1, e_2, …, e_M) do not participate in the above operation, and since the memory they require is much smaller than that required by the ciphertext, the memory required by the target compressed ciphertext [u_pack] is close to 1/M of the memory originally required by the M ciphertexts. That is, through the above packing compression processing, the M ciphertexts [u_1], [u_2], …, [u_M] are compressed into the single ciphertext [u_pack]. Finally, the first participating node sends the target compressed ciphertext [u_pack] to the second participating node, so that the communication overhead can be reduced to 1/M of the original. The upper limit of M is related to the key length; in practical applications the key length is typically 2048 bits or 3072 bits, in which case M may be 64 or 128.
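The packing of formula (3) can be sketched at the plaintext level: on ciphertexts the same shape is obtained with scalar multiplication by powers of 2^T followed by homomorphic addition, but here plain integers stand in for the encrypted values, and the sample values are illustrative.

```python
# Plaintext-level sketch of the polynomial packing:
# U_pack = U_1 + 2^T * (U_2 + 2^T * (U_3 + ...)).
T = 64                               # shift parameter: each plaintext uses <= 64 bits
U = [123, 2**40 + 7, 0, 2**63 - 1]   # M = 4 non-negative plaintexts, each < 2**T

U_pack = 0
for U_i in reversed(U):              # Horner evaluation of the polynomial in 2**T
    U_pack = (U_pack << T) + U_i

# One packed integer replaces M separate values: memory ~ 1/M of the original.
assert U_pack.bit_length() <= T * len(U)

# Round trip: extracting T-bit limbs recovers the original plaintexts.
mask = (1 << T) - 1
assert [(U_pack >> (T * i)) & mask for i in range(len(U))] == U
```

The Horner loop mirrors the nested form U_1 + 2^T × (U_2 + 2^T × (U_3 + …)) used later in step S402, which is why simple shifts and masks suffice for decompression.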
It can be understood that in many scenarios with high security requirements, the key length may take 3072 bits (the corresponding ciphertext length is 6144 bits), in which case the large integer obtained by encoding may reach 3072 bits, but the method provided in the embodiment of the present application is still applicable, and at this time, the relevant parameters in formula (1) and formula (3) need to be adjusted accordingly.
To sum up, the embodiments of the present application provide a polynomial-based ciphertext packing compression technology. A first participating node obtains a first ciphertext sent by a second participating node and performs a privacy operation on the first ciphertext according to service data belonging to the first participating node, generating a second ciphertext. It then performs numerical limit prediction on the second ciphertext to obtain the plaintext prediction limit value corresponding to the second ciphertext, performs additive offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext, generates a target compressed ciphertext with a polynomial format according to the plaintext prediction limit value and the third ciphertext, and finally sends the target compressed ciphertext to the second participating node. Therefore, in the process of federated learning between the first and second participating nodes, before the first participating node sends ciphertexts to the second participating node, multiple ciphertexts are packed and compressed together for transmission by the polynomial-based packing compression algorithm provided by the embodiments of the present application. This greatly reduces the communication overhead caused by sending ciphertexts, significantly improves the running efficiency of the federated learning task, and improves the availability of federated learning.
Further, please refer to fig. 4, where fig. 4 is a schematic flowchart of a data transmission method based on federal learning according to an embodiment of the present application. As shown in fig. 4, the process of the data transmission method includes the following steps S201 to S203, and the steps S201 to S203 are a specific embodiment of the steps S101 to S102 in the embodiment corresponding to fig. 3, and the data transmission process may include the following steps for a scenario of linear algebraic operation:
step S201, acquiring service data belonging to a first participating node, and performing linear algebraic operation on the service data and a first ciphertext to obtain a second ciphertext;
specifically, after acquiring a first ciphertext sent by a second participating node, a first participating node may acquire service data stored locally, and may further perform linear algebraic operation on the service data and the first ciphertext to obtain a second ciphertext, where the linear algebraic operation includes one or more of scalar multiplication operation, homomorphic addition operation, and scalar addition operation.
It should be noted that linear algebra operations based on homomorphic encryption appear in many federated learning algorithms, such as the Logistic Regression (LR) algorithm and neural network algorithms. In one embodiment, assume that the initial plaintext and the service data are both matrices: the initial plaintext owned by the second participating node is the matrix m_1, and the service data owned by the first participating node comprises the matrices m_2 and m_3, where m_1 is of size IN × H, m_2 is of size H × OUT, and m_3 is of size IN × OUT. Suppose the participating parties wish to compute m_4 = m_1 × m_2 + m_3 (a linear algebra operation) while each protecting its own matrices from being known by the other party. The flow may be as follows:
(a) the second participating node initializes a public key PK and a private key SK and sends the public key PK to the first participating node;
(b) the second participating node encrypts the matrix m_1 using the public key PK to obtain the matrix [m_1], and sends the matrix [m_1] to the first participating node, where the matrix [m_1] refers to a matrix composed of a plurality of ciphertexts, each ciphertext being the ciphertext corresponding to the value of m_1 at one position, i.e. the ciphertext at row i and column j of [m_1] is [m_1]^(ij) = Enc(m_1^(ij), PK);
(c) after receiving the matrix [m_1], the first participating node calculates [m_4] = [m_1] × m_2 + m_3, where the ciphertext at row i and column j of [m_4] (i.e. the second ciphertext) may be calculated as [m_4]^(ij) = Σ_{k=1}^{H} [m_1]^(ik) × m_2^(kj) + m_3^(ij); this process involves scalar multiplication operations, homomorphic addition operations, and scalar addition operations. After the calculation, if the first participating node directly sends the matrix [m_4] to the second participating node, the second participating node may use the private key SK to decrypt the matrix [m_4], obtaining the matrix m_4; obviously, m_4 = m_1 × m_2 + m_3.
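The flow (a) to (c) can be sketched with a textbook Paillier scheme, which has the additive homomorphic properties the patent relies on. The primes, random factor, and scalar values below are tiny and illustrative (completely insecure, for exposition only), and scalars stand in for the matrices: the same three operations cover each matrix entry.

```python
# Toy run of flow (a)-(c): node 2 holds m1; node 1 computes m4 = m1*m2 + m3
# on the ciphertext without learning m1 (textbook Paillier, insecure parameters).
from math import gcd

p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)     # lcm(p-1, q-1)
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)

def enc(m, r=7):                      # (b) encryption under public key (n, g)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):                           # private-key decryption with (lam, mu)
    return (L(pow(c, lam, n2)) * mu) % n

m1, m2, m3 = 5, 6, 7                  # m1 held by node 2; m2, m3 by node 1
c1 = enc(m1)                          # (b) node 2 sends [m1]
# (c) node 1: scalar multiplication by m2, then scalar addition of m3
c4 = (pow(c1, m2, n2) * pow(g, m3, n2)) % n2
assert dec(c4) == m1 * m2 + m3        # node 2 decrypts and obtains m4 = 37
```

Scalar multiplication is ciphertext exponentiation and homomorphic addition is ciphertext multiplication, which is why the sum over k in [m_4]^(ij) costs H such operations per entry.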
Step S202, a first numerical range corresponding to the initial plaintext and a second numerical range corresponding to the service data are obtained; acquiring a first data dimension of an initial plaintext and a second data dimension of service data, and determining a target data dimension in the first data dimension and the second data dimension according to the operation type of linear algebra operation;
specifically, the first participating node may obtain a first numerical range corresponding to the initial plaintext and a second numerical range corresponding to the service data, and may obtain a first data dimension of the initial plaintext and a second data dimension of the service data, so as to determine a target data dimension in the first data dimension and the second data dimension according to the operation type of the linear algebra operation. Optionally, in one implementation, when both the initial plaintext and the service data are matrices, the first participating node may obtain a first matrix width and a first matrix height of the initial plaintext and determine them as the first data dimension of the initial plaintext; similarly, it may obtain a second matrix width and a second matrix height of the service data and determine them as the second data dimension of the service data. Further, when the linear algebra operation is a scalar multiplication operation and the first matrix height is equal to the second matrix width, the first matrix height may be determined as the target data dimension; or, when the linear algebra operation is a scalar multiplication operation and the second matrix height is equal to the first matrix width, the second matrix height may be determined as the target data dimension. It will be appreciated that the target data dimension depends on the specific operational procedure of the linear algebra operation.
In conjunction with step S201 above, the matrix m_1 is the initial plaintext, and the matrices m_2 and m_3 are the service data. Assume all elements of m_1 fall within the numerical range [-φ_1, φ_1], all elements of m_2 fall within [-φ_2, φ_2], and all elements of m_3 fall within [-φ_3, φ_3]. The first numerical range obtained by the first participating node is then [-φ_1, φ_1], and the second numerical range includes [-φ_2, φ_2] and [-φ_3, φ_3]. Accordingly, the first data dimension includes the first matrix width IN and the first matrix height H, and the second data dimension includes the width H and height OUT of matrix m_2 and the width IN and height OUT of matrix m_3, where φ_1, φ_2, φ_3, IN, H, and OUT are all publicly known data. Since the linear algebra operation in step S201 is m_4 = m_1 × m_2 + m_3, the second matrix width H (the inner dimension shared by m_1 and m_2 in the multiplication) can thus be determined as the target data dimension.
Step S203, based on the operation type of the linear algebra operation, the first numerical range, the second numerical range, and the target data dimension, a plaintext prediction threshold corresponding to the second ciphertext is generated.
Specifically, in combination with steps S201 to S202, taking the calculated matrix [m_4] as the second ciphertext, the plaintext corresponding to [m_4] is the matrix m_4, and all elements of m_4 fall within the numerical range [-φ, φ]; that is, the plaintext prediction limit value corresponding to [m_4] is [-φ, φ], where φ = H × φ_1 × φ_2 + φ_3.
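The bound φ = H × φ_1 × φ_2 + φ_3 follows because each element of m_1 × m_2 is a sum of H products, each bounded by φ_1 × φ_2. A quick numerical check with small random matrices (dimensions and ranges illustrative) confirms it:

```python
# Check the plaintext prediction limit phi = H * phi1 * phi2 + phi3 of step S203
# on random matrices standing in for the parties' private data.
import random

IN, H, OUT = 3, 4, 2
phi1, phi2, phi3 = 10.0, 5.0, 8.0
rnd = lambda lim: random.uniform(-lim, lim)

m1 = [[rnd(phi1) for _ in range(H)] for _ in range(IN)]     # IN x H
m2 = [[rnd(phi2) for _ in range(OUT)] for _ in range(H)]    # H x OUT
m3 = [[rnd(phi3) for _ in range(OUT)] for _ in range(IN)]   # IN x OUT

# m4 = m1 x m2 + m3: each element is a sum of H products plus one offset term,
# so |m4[i][j]| <= H * phi1 * phi2 + phi3.
phi = H * phi1 * phi2 + phi3
m4 = [[sum(m1[i][k] * m2[k][j] for k in range(H)) + m3[i][j]
       for j in range(OUT)] for i in range(IN)]
assert all(abs(x) <= phi for row in m4 for x in row)
```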
The embodiments of the present application support a first participating node in obtaining a first ciphertext sent by a second participating node and performing linear algebra operations on the first ciphertext according to service data belonging to the first participating node, generating a second ciphertext. A plaintext prediction limit value corresponding to the second ciphertext can then be generated based on the operation type of the linear algebra operation, the first numerical range corresponding to the initial plaintext, the second numerical range corresponding to the service data, and the target data dimension. Additive offset processing can then be performed on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext, a target compressed ciphertext with a polynomial format can be generated according to the plaintext prediction limit value and the third ciphertext, and finally the target compressed ciphertext can be sent to the second participating node. Therefore, in the process of federated learning between the first and second participating nodes, the first participating node predicts the plaintext prediction limit value corresponding to the second ciphertext before sending ciphertexts to the second participating node, and packs and compresses multiple ciphertexts together for transmission based on that limit value. This greatly reduces the communication overhead caused by sending ciphertexts, significantly improves the running efficiency of the federated learning task, and improves the availability of federated learning.
Further, please refer to fig. 5, where fig. 5 is a schematic flow chart of a data transmission method based on federal learning according to an embodiment of the present application. As shown in fig. 5, the process of the data transmission method includes the following steps S301 to S303, where the steps S301 to S303 are a specific embodiment of the steps S101 to S102 in the embodiment corresponding to fig. 3, and the data transmission process may include the following steps for a scene of histogram aggregation:
step S301, acquiring data characteristics corresponding to the business data belonging to the first participating node, and clustering at least two subciphertexts based on the data characteristics to obtain one or more clustering intervals; performing homomorphic addition operation on the subciphertexts in one or more clustering intervals respectively to obtain clustering subciphertexts corresponding to each clustering interval, and determining one or more clustering subciphertexts as second ciphertexts;
specifically, after the first participating node obtains the first ciphertext (assuming that the first ciphertext includes at least two subciphertexts) sent by the second participating node, the first participating node may obtain data characteristics corresponding to the locally stored service data, and may further cluster the at least two subciphertexts based on the data characteristics, so as to obtain one or more clustering sections. Further, homomorphic addition operation can be performed on the subciphertexts in one or more clustering intervals respectively to obtain clustering subciphertexts corresponding to each clustering interval, and finally one or more clustering subciphertexts can be determined as second ciphertexts.
It should be noted that another common privacy operation in machine learning is aggregating gradients (or partial derivatives) for subsequent computation flows; many algorithms, such as the Decision Tree algorithm and the Gradient Boosting Decision Tree (GBDT, also called XGB) algorithm, include histogram aggregation operations. Assume there are N training samples (N is an integer greater than 1); the second participating node possesses the partial derivatives t_1, t_2, …, t_N of these samples (i.e. the initial plaintext), and the first participating node possesses the data characteristics of the samples (i.e. the data characteristics corresponding to the service data, such as age, gender, etc.). Assume the second participating node wants to aggregate the partial derivatives based on the data characteristics of the first participating node, for example dividing age into four clustering intervals (18 years old or younger, 18 to 35 years old, 35 to 60 years old, and over 60 years old) and aggregating the partial derivatives of the samples in each clustering interval, while each party wants to protect its own data from being known by the other party. The flow may be:
(a) the second participating node initializes a public key PK and a private key SK and sends the public key PK to the first participating node;
(b) the second participating node uses the public key PK to encrypt the partial derivatives t_1, t_2, …, t_N, obtaining the partial derivative ciphertexts [t_1], [t_2], …, [t_N] (namely a first ciphertext consisting of N subciphertexts), and sends the first ciphertext to the first participating node;
(c) the first participating node performs homomorphic addition operations on the partial derivative ciphertexts of the samples falling into the same clustering interval based on its own data characteristics, obtaining the ciphertexts of the histogram [s_1], [s_2], …, [s_q] (i.e. a second ciphertext comprising q clustering subciphertexts), where q is the number of clustering intervals, q is a positive integer, and q ≤ N; for example, the ciphertext [s_1] is the sum of the partial derivative ciphertexts of all samples aged 18 or younger. After the calculation, if the first participating node directly sends the ciphertexts [s_1], [s_2], …, [s_q] to the second participating node, the second participating node may use the private key SK to decrypt the ciphertexts [s_1], [s_2], …, [s_q], obtaining the histogram aggregation result, i.e. s_1, s_2, …, s_q.
Step S302, obtaining a numerical range and a data dimension corresponding to an initial plaintext;
specifically, in conjunction with step S301, assume that the partial derivatives t_1, t_2, …, t_N owned by the second participating node all fall within the numerical range [-φ', φ']. The numerical range obtained by the first participating node is then [-φ', φ'], and the data dimension is N (i.e. the number of samples), where both φ' and N are publicly known data.
Step S303, a plaintext prediction limit value corresponding to the second ciphertext is generated based on the operation type of the homomorphic addition operation, the numerical range, and the data dimension.
Specifically, in combination with steps S301 to S302, taking the ciphertexts [s_1], [s_2], …, [s_q] obtained by histogram aggregation of the partial derivatives t_1, t_2, …, t_N as the second ciphertext, the corresponding plaintexts (including s_1, s_2, …, s_q) all fall within the numerical range [-Φ, Φ]; that is, the plaintext prediction limit value corresponding to the second ciphertext is [-Φ, Φ], where Φ = N × φ'.
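The aggregation of steps S301 to S303 can be sketched on plaintext values (in the real flow the per-interval sums are homomorphic additions on ciphertexts): partial derivatives are clustered by an age feature and summed per interval, and each aggregate sums at most N terms, giving the limit Φ = N × φ'. The sample data and age cut-offs below are illustrative.

```python
# Plaintext sketch of histogram aggregation and its prediction limit.
ages = [12, 25, 40, 70, 33, 55, 17, 61]          # features held by node 1
t = [0.5, -1.2, 0.9, -0.4, 1.1, -0.8, 0.3, 0.7]  # partial derivatives of node 2
phi_prime = 2.0                                  # |t_i| <= phi' for all samples
N = len(t)

bins = [18, 35, 60]                              # intervals: <=18, 18-35, 35-60, >60
def interval(age):
    return sum(age > b for b in bins)            # index of the clustering interval

s = [0.0] * (len(bins) + 1)
for age, t_i in zip(ages, t):                    # per-interval sums; on ciphertexts
    s[interval(age)] += t_i                      # this is homomorphic addition

# Each aggregate sums at most N terms, so |s_j| <= N * phi' = Phi.
Phi = N * phi_prime
assert all(abs(s_j) <= Phi for s_j in s)
```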
To sum up, the embodiments of the present application support a first participating node in obtaining a first ciphertext comprising at least two subciphertexts sent by a second participating node, and in clustering the at least two subciphertexts according to the data characteristics corresponding to the service data belonging to the first participating node, generating a second ciphertext comprising one or more clustering subciphertexts. A plaintext prediction limit value corresponding to the second ciphertext can then be generated based on the operation type in the clustering process, the numerical range corresponding to the initial plaintext, and the data dimension. Additive offset processing can subsequently be performed on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext, a target compressed ciphertext with a polynomial format can be generated according to the plaintext prediction limit value and the third ciphertext, and finally the target compressed ciphertext can be sent to the second participating node. Therefore, in the process of federated learning between the first and second participating nodes, the first participating node predicts the plaintext prediction limit value corresponding to the second ciphertext before sending ciphertexts to the second participating node, and packs and compresses multiple ciphertexts together for transmission based on that limit value. This greatly reduces the communication overhead caused by sending ciphertexts, significantly improves the running efficiency of the federated learning task, and improves the availability of federated learning.
Referring to fig. 6, fig. 6 is a schematic flowchart of a data transmission method based on federal learning according to an embodiment of the present application. The data transmission method may be performed by any number of participating nodes (e.g., any two or more of participating nodes 100, 200a, 200b, …, and 200n in fig. 1). The data transmission method may include at least the following steps S401 to S403:
step S401, a second participating node receives a target compressed ciphertext sent by a first participating node, and decrypts the target compressed ciphertext by using a private key to obtain a first compressed plaintext; the first participating node is a node performing federated learning with the second participating node;
specifically, in the training or reasoning process of the federal learning task, after the second participating node receives the target compressed ciphertext sent by the first participating node, the second participating node may decrypt the target compressed ciphertext by using a private key, so as to obtain a first compressed plaintext, where the first compressed plaintext has a polynomial format, that is, is in a compressed state. The target compressed ciphertext is a ciphertext with a polynomial format generated by the first participating node according to a plaintext prediction threshold value and the third ciphertext; the third ciphertext is a ciphertext obtained by performing addition offset processing on the second ciphertext by the first participating node based on the plaintext prediction limit value; the plaintext prediction limit value is obtained by performing numerical limit prediction on the second ciphertext by the first participating node, and is used for representing the numerical range of the plaintext corresponding to the second ciphertext; the second ciphertext refers to a ciphertext generated by the first participating node performing privacy operation on the first ciphertext according to the service data belonging to the first participating node, and the first ciphertext refers to data obtained by the second participating node performing encryption processing on the initial plaintext.
For example, in conjunction with step S104 in the embodiment corresponding to fig. 3, after the second participating node receives the target compressed ciphertext [u_pack] = <e_1, e_2, …, e_M, [U_pack]>, it needs to decrypt and decompress it to recover the real values v_1, v_2, …, v_M. In this step, the second participating node need only use the private key to decrypt the target compressed ciphertext [u_pack] once, decrypting [u_pack] into the large integer U_pack (i.e. the first compressed plaintext).
Step S402, performing bit operation on the first compressed plaintext with the polynomial format to obtain a second plaintext;
specifically, the second participating node may perform bit operations on the first compressed plaintext having the polynomial format to obtain at least two decompressed plaintexts having an integer format. For example, combining step S401 with step S104 in the embodiment corresponding to fig. 3, the large integer U_pack = U_1 + 2^T × (U_2 + 2^T × (U_3 + …)), so bit operations can be used to obtain the M large integers U_1, U_2, …, U_M (i.e. M decompressed plaintexts). For example, when the shift parameter T in formula (3) is 64, bits 1 to 64 of the large integer U_pack form the large integer U_1, bits 65 to 128 form the large integer U_2, and so on; M large integers can thus be obtained by decompression.
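The bit operations of step S402 amount to extracting T-bit limbs with shifts and masks; the values below are illustrative, with U_pack constructed as it would emerge from a single decryption of [u_pack]:

```python
# Sketch of step S402: splitting the decrypted large integer U_pack back into
# M decompressed plaintexts (T = 64 as in formula (3)).
T, M = 64, 3
mask = (1 << T) - 1

U = [1, 2**50, 2**64 - 1]                        # the original packed plaintexts
U_pack = U[0] + (2**T) * (U[1] + (2**T) * U[2])  # nested form of formula (3)

# Bits 1-64 yield U_1, bits 65-128 yield U_2, and so on.
unpacked = [(U_pack >> (T * i)) & mask for i in range(M)]
assert unpacked == U
```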
It can be understood that, specifically, the second participating node may obtain a scaling factor and the exponent term parameters corresponding to the at least two decompressed plaintexts, and then decode each decompressed plaintext according to the scaling factor and the exponent term parameters, so as to obtain at least two subplaintexts having a floating-point format, and determine the at least two subplaintexts as the second plaintext. For example, in combination with formula (1) in step S101 in the embodiment corresponding to fig. 3, a decoding operation is performed on each of the large integers U_1, U_2, …, U_M, i.e. <e_i, U_i> is decoded into the floating-point number u_i (where i = 1, 2, …, M), finally obtaining the floating-point numbers u_1, u_2, …, u_M (i.e. M subplaintexts); this process requires M decoding operations in total.
Step S403, obtaining a plaintext prediction limit value, and performing subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain a target plaintext.
Specifically, the second participating node may obtain the upper limit value in the plaintext prediction limit value, subtract the upper limit value from each of the at least two subplaintexts to obtain at least two recovered subplaintexts, and finally determine the at least two recovered subplaintexts as the target plaintext (i.e. the plaintext corresponding to the second ciphertext, which is the true result of the privacy operation). For example, with reference to formula (2) in step S103 in the embodiment corresponding to fig. 3, the upper limit value φ in the plaintext prediction limit value is obtained, and v_i = u_i - φ, i = 1, 2, …, M, is calculated, so that the M recovered subplaintexts v_1, v_2, …, v_M can be recovered.
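Steps S402 and S403 together invert the sender's encode-and-offset: each decompressed large integer U_i is decoded back to a float u_i = U_i / B^e (undoing formula (1)), and the offset φ added in step S103 is subtracted to restore v_i = u_i - φ. B, e, and φ below are illustrative values consistent with the earlier description.

```python
# Sketch of decoding plus subtraction restoration on the receiving side.
B, e = 16, 8
phi = 100.0

def decode(U: int) -> float:
    """Decode a large-integer plaintext back to a float, inverting formula (1)."""
    return U / B**e

v_true = [-87.5, 0.0, 42.25, 99.0]                 # the real privacy-operation results
U = [round((v + phi) * B**e) for v in v_true]      # what decompression would yield
v = [decode(U_i) - phi for U_i in U]               # subtraction restoration
assert v == v_true
```

Only plaintext decoding and subtraction remain after the single decryption, which is why the total decryption overhead drops to roughly 1/M.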
As can be seen from the above, the conventional ciphertext decryption process usually involves complex operations (such as the Chinese Remainder Theorem) to ensure the robustness of the ciphertext, which makes a single decryption operation time-consuming; therefore, when the number of ciphertexts to be decrypted is large, the decryption overhead is huge. Compared with directly decrypting M ciphertexts, the decryption decompression algorithm corresponding to the packing compression algorithm provided by the embodiments of the present application only needs to perform the decryption operation once, and the time consumed by the decoding operations and the plaintext addition and subtraction operations is far lower than that of the decryption operation, so the ciphertext decryption overhead can be reduced to 1/M of the original.
In view of the above, the embodiment of the present application provides a cipher text decryption and decompression technology based on a polynomial, which supports a second participating node to receive a target compressed cipher text sent by a first participating node, and decrypts the target compressed cipher text by using a private key to obtain a first compressed plaintext, further, performs bit operation on the first compressed plaintext with the polynomial format to obtain a second plaintext, then obtains a plaintext prediction limit value, and performs subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to finally obtain the target plaintext, that is, a true result of the first participating node performing privacy operation. Therefore, in the process of performing federal learning on the first participating node and the second participating node, at the stage of decrypting the second participating node, the decryption decompression algorithm based on the polynomial provided by the embodiment of the application can decompress a plurality of packed and compressed ciphertexts at one time through onetime decryption operation, so that the decryption overhead caused by ciphertext decryption can be greatly reduced, the running efficiency of a federal learning task can be obviously improved, and the availability of the federal learning is improved.
In summary, the data transmission method based on federated learning provided by the present application can be applied to any federated learning algorithm using a semihomomorphic encryption technology satisfying additive properties, wherein the data transmission method mainly includes numerical value limit estimation in federated learning, packing compression algorithm based on polynomial and decryption decompression algorithm based on polynomial.
Please refer to fig. 7, which is a schematic structural diagram of a data transmission apparatus based on federal learning according to an embodiment of the present application. The data transmission apparatus based on federal learning can be a computer program (including program code) running on a computer device, such as application software; the apparatus can be used for executing the corresponding steps in the data transmission method based on federal learning provided by the embodiments of the present application. As shown in fig. 7, the data transmission apparatus 1 based on federal learning may include: an operation module 11, an estimation module 12, an offset module 13, and a compression module 14;
the operation module 11 is configured to obtain a first ciphertext by a first participating node, perform privacy operation on the first ciphertext according to service data belonging to the first participating node, and generate a second ciphertext; the first ciphertext refers to data obtained by encrypting the initial plaintext by the second participating node; the second participating node is a node performing federated learning with the first participating node;
the estimation module 12 is configured to perform numerical limit estimation on the second ciphertext to obtain a plaintext estimation limit value corresponding to the second ciphertext; the plaintext prediction threshold value is used for representing the numerical range of the plaintext corresponding to the second ciphertext;
the offset module 13 is configured to perform addition offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext;
the offset module 13 is specifically configured to obtain an upper limit value in the plaintext prediction limit value, and perform scalar addition operation on the upper limit value and the second ciphertext to obtain a third ciphertext; the plaintext corresponding to the third ciphertext is a nonnegative number;
the compression module 14 is configured to generate a target compressed ciphertext with a polynomial format according to the plaintext prediction limit value and the third ciphertext, and send the target compressed ciphertext to the second participating node;
the compression module 14 is specifically configured to obtain a shift parameter for representing the plaintext prediction limit value, and perform a scalar multiplication operation on the third ciphertext based on the shift parameter to obtain at least two ciphertext monomials; and perform a homomorphic addition operation on the at least two ciphertext monomials to obtain the target compressed ciphertext with the polynomial format.
The specific functional implementation manner of the operation module 11 may refer to step S101 in the embodiment corresponding to fig. 3, the specific functional implementation manner of the estimation module 12 may refer to step S102 in the embodiment corresponding to fig. 3, the specific functional implementation manner of the offset module 13 may refer to step S103 in the embodiment corresponding to fig. 3, and the specific functional implementation manner of the compression module 14 may refer to step S104 in the embodiment corresponding to fig. 3, which is not described herein again.
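The offset and packing performed by the offset module 13 and the compression module 14 can be sketched as follows. This is an illustrative plain-integer stand-in, not the patented implementation: in a real deployment the values stay encrypted, and the offset, the scalar multiplications by powers of the shift parameter, and the homomorphic additions are carried out on ciphertexts by an additive homomorphic scheme (e.g., Paillier). The names `pack`, `unpack`, and the bound `B` are assumptions for illustration.

```python
B = 100                      # assumed plaintext prediction limit value (upper bound)
slot_bits = 8                # smallest k with 2**k > 2*B, so each offset value fits one slot
shift = 1 << slot_bits       # shift parameter derived from the limit value

def pack(values):
    """Offset each value by +B (making it non-negative), then combine the
    slots into one integer. On ciphertexts, the same steps are a scalar
    addition (+B), a scalar multiplication (by shift**i, giving one
    'ciphertext monomial' per slot), and homomorphic additions."""
    packed = 0
    for i, v in enumerate(values):
        assert -B <= v <= B
        packed += (v + B) * (shift ** i)
    return packed

def unpack(packed, n):
    """Bit operations recover each slot; subtracting B restores the sign."""
    mask = shift - 1
    return [((packed >> (slot_bits * i)) & mask) - B for i in range(n)]
```

Because every slot is bounded, one transmitted value carries several results, which is the source of the communication savings described above.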
Referring to fig. 7, the operation module 11 may include: a first arithmetic unit 111 and a second arithmetic unit 112;
the first arithmetic unit 111 is configured to obtain service data belonging to a first participating node, and perform linear algebraic operation on the service data and the first ciphertext to obtain a second ciphertext; the linear algebraic operation comprises one or more of scalar multiplication operation, homomorphic addition operation and scalar addition operation;
in one embodiment, the first ciphertext includes at least two subciphertexts;
the second operation unit 112 is configured to obtain data characteristics corresponding to the service data belonging to the first participating node, and cluster the at least two subciphertexts based on the data characteristics to obtain one or more clustering intervals; and respectively perform a homomorphic addition operation on the subciphertexts in the one or more clustering intervals to obtain a clustering subciphertext corresponding to each clustering interval, and determine the one or more clustering subciphertexts as second ciphertexts.
The specific implementation of the function of the first operation unit 111 may refer to step S201 in the embodiment corresponding to fig. 4, and the specific implementation of the function of the second operation unit 112 may refer to step S301 in the embodiment corresponding to fig. 5, which is not described herein again.
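The clustering performed by the second operation unit 112 can be illustrated with a toy sketch. The `enc` and `hom_add` functions below are hypothetical identity stand-ins for a real additive homomorphic scheme such as Paillier; only the grouping by data characteristic and the per-cluster homomorphic summation mirror the unit described above.

```python
from collections import defaultdict
from functools import reduce

def enc(m):
    # Stand-in "encryption" for illustration only; a real scheme returns a ciphertext.
    return m

def hom_add(c1, c2):
    # Additive homomorphism: decrypting hom_add(Enc(a), Enc(b)) yields a + b.
    return c1 + c2

def cluster_and_add(sub_ciphertexts, features):
    """Group sub-ciphertexts by data characteristic, then homomorphically
    sum each clustering interval into one clustering sub-ciphertext."""
    clusters = defaultdict(list)
    for c, f in zip(sub_ciphertexts, features):
        clusters[f].append(c)
    return {f: reduce(hom_add, cs) for f, cs in clusters.items()}
```

Summing within clusters before transmission reduces the number of ciphertexts that must be packed and sent.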
Referring to fig. 7, the estimation module 12 may include: a first obtaining unit 121, a first estimating unit 122, a second obtaining unit 123, and a second estimating unit 124;
a first obtaining unit 121, configured to obtain a first numerical range corresponding to the initial plaintext and a second numerical range corresponding to the service data; acquiring a first data dimension of an initial plaintext and a second data dimension of service data, and determining a target data dimension in the first data dimension and the second data dimension according to the operation type of linear algebra operation;
in one embodiment, the initial plaintext and the service data are both matrices;
the first obtaining unit 121 is specifically configured to obtain a first matrix width and a first matrix height of the initial plaintext, and determine the first matrix width and the first matrix height as a first data dimension of the initial plaintext; acquiring a second matrix width and a second matrix height of the service data, and determining the second matrix width and the second matrix height as a second data dimension of the service data; when the linear algebra operation is scalar multiplication operation and the first matrix height is equal to the second matrix width, determining the first matrix height as a target data dimension; when the linear algebra operation is scalar multiplication operation and the height of the second matrix is equal to the width of the first matrix, determining the height of the second matrix as the dimension of target data;
the first estimating unit 122 is configured to generate a plaintext prediction limit value corresponding to the second ciphertext based on the operation type of the linear algebraic operation, the first numerical range, the second numerical range, and the target data dimension;
a second obtaining unit 123, configured to obtain a numerical range and a data dimension corresponding to the initial plaintext;
the second estimating unit 124 is configured to generate a plaintext prediction limit value corresponding to the second ciphertext based on the operation type of the homomorphic addition operation, the numerical range, and the data dimension.
The specific functional implementation manner of the first obtaining unit 121 may refer to step S202 in the embodiment corresponding to fig. 4, the specific functional implementation manner of the first estimating unit 122 may refer to step S203 in the embodiment corresponding to fig. 4, the specific functional implementation manner of the second obtaining unit 123 may refer to step S302 in the embodiment corresponding to fig. 5, and the specific functional implementation manner of the second estimating unit 124 may refer to step S303 in the embodiment corresponding to fig. 5, which is not described herein again.
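The bound produced by the estimating units can be sketched as follows. The formulas are a plausible reading of the description above (each entry of a matrix product is a sum of `inner_dim` products, so the bound scales with the target data dimension); the function name and signature are illustrative assumptions, not taken from the patent.

```python
def estimate_limit(op, range_a, range_b, inner_dim):
    """Return a plaintext prediction limit value for the result of a
    linear algebraic operation on values drawn from range_a and range_b
    (each a (low, high) pair)."""
    max_a = max(abs(range_a[0]), abs(range_a[1]))
    max_b = max(abs(range_b[0]), abs(range_b[1]))
    if op == "scalar_mul":
        # Matrix product: each output entry sums inner_dim products,
        # where inner_dim is the target data dimension.
        return inner_dim * max_a * max_b
    if op == "scalar_add":
        # Element-wise addition: the bounds simply add.
        return max_a + max_b
    raise ValueError(f"unsupported operation type: {op}")
```

The resulting limit value fixes how many bits each packed slot needs, which is why the estimation step precedes the offset and compression steps.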
The embodiment of the application provides a ciphertext packing and compression technology based on a polynomial, which supports the first participating node in obtaining the first ciphertext sent by the second participating node and performing a privacy operation on the first ciphertext according to the service data belonging to the first participating node, so that a second ciphertext can be generated; further, numerical limit estimation can be performed on the obtained second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext; then, addition offset processing can be performed on the second ciphertext according to the plaintext prediction limit value to obtain a third ciphertext; then, a target compressed ciphertext with a polynomial format can be generated according to the plaintext prediction limit value and the third ciphertext; and finally, the target compressed ciphertext can be sent to the second participating node. Therefore, in the process of the first participating node and the second participating node performing federated learning, before the first participating node sends ciphertexts to the second participating node, the polynomial-based packing and compression algorithm provided by the embodiment of the application packs and compresses a plurality of ciphertexts together for transmission, so that the communication overhead caused by sending ciphertexts can be greatly reduced, the running efficiency of the federated learning task can be obviously improved, and the availability of federated learning is improved.
Please refer to fig. 8, which is a schematic structural diagram of a data transmission apparatus based on federal learning according to an embodiment of the present application. The federal learning based data transmission apparatus can be a computer program (including program code) running on a computer device, such as application software; the apparatus can be used for executing the corresponding steps in the data transmission method based on federal learning provided by the embodiments of the present application. As shown in fig. 8, the federal learning based data transmission apparatus 2 may include: a decryption module 21, a decompression module 22 and a restoration module 23;
the decryption module 21 is configured for the second participating node to receive the target compressed ciphertext sent by the first participating node, and decrypt the target compressed ciphertext with a private key to obtain a first compressed plaintext; the first participating node is a node performing federated learning with the second participating node; the target compressed ciphertext is a ciphertext with a polynomial format generated by the first participating node according to the plaintext prediction limit value and the third ciphertext; the third ciphertext is a ciphertext obtained by the first participating node performing addition offset processing on the second ciphertext based on the plaintext prediction limit value; the plaintext prediction limit value is obtained by the first participating node performing numerical limit estimation on the second ciphertext, and is used for representing the numerical range of the plaintext corresponding to the second ciphertext; the second ciphertext refers to a ciphertext generated by the first participating node performing a privacy operation on the first ciphertext according to the service data belonging to the first participating node, and the first ciphertext refers to data obtained by the second participating node performing encryption processing on the initial plaintext;
a decompression module 22, configured to perform a bit operation on the first compressed plaintext with a polynomial format to obtain a second plaintext;
the restoration module 23 is configured to obtain a plaintext prediction limit value, and perform subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain a target plaintext;
the restoration module 23 is specifically configured to obtain an upper limit value in the plaintext prediction limit value, subtract the upper limit value from each of at least two subplaintexts to obtain at least two restored subplaintexts, and determine the at least two restored subplaintexts as target plaintexts.
The specific implementation of the function of the decryption module 21 may refer to step S401 in the embodiment corresponding to fig. 6, the specific implementation of the function of the decompression module 22 may refer to step S402 in the embodiment corresponding to fig. 6, and the specific implementation of the function of the restoration module 23 may refer to step S403 in the embodiment corresponding to fig. 6, which is not described herein again.
Referring to fig. 8, the decompression module 22 may include: a decompression unit 221, a decoding unit 222;
the decompression unit 221 is configured to perform a bit operation on the first compressed plaintext with the polynomial format to obtain at least two decompressed plaintext with the integer format;
the decoding unit 222 is configured to obtain a scaling factor and at least two exponent item parameters respectively corresponding to the decompressed plaintexts, perform a decoding operation on each decompressed plaintext according to the scaling factor and the at least two exponent item parameters to obtain at least two subplaintexts with a floating point number format, and determine the at least two subplaintexts as a second plaintext.
The specific functional implementation manners of the decompression unit 221 and the decoding unit 222 may refer to step S402 in the embodiment corresponding to fig. 6, which is not described herein again.
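The decoding unit 222 and the restoration module 23 can be sketched together. The exact fixed-point encoding is not spelled out in this excerpt; the sketch assumes each decompressed integer carries its own exponent item parameter and decodes as `slot * scaling_factor ** exponent`, with the additive offset removed afterwards. All names are illustrative.

```python
def decode(slots, scaling_factor, exponents):
    """Turn integer-format decompressed plaintexts into floating point
    sub-plaintexts (assumed encoding: value = slot * scaling_factor ** exponent)."""
    return [s * scaling_factor ** e for s, e in zip(slots, exponents)]

def restore(sub_plaintexts, upper_limit):
    """Subtraction restoration: remove the upper limit value that was
    added as an offset before packing, recovering the signed values."""
    return [p - upper_limit for p in sub_plaintexts]
```

Running `restore` after `decode` yields the target plaintext, i.e., the real result of the privacy operation.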
The embodiment of the application provides a ciphertext decryption and decompression technology based on a polynomial, which supports the second participating node in receiving the target compressed ciphertext sent by the first participating node and decrypting the target compressed ciphertext with a private key to obtain a first compressed plaintext; further, a bit operation can be performed on the first compressed plaintext with the polynomial format to obtain a second plaintext; then the plaintext prediction limit value is obtained, and subtraction restoration processing can be performed on the second plaintext based on the plaintext prediction limit value to finally obtain the target plaintext, namely the real result of the privacy operation performed by the first participating node. Therefore, in the process of the first participating node and the second participating node performing federated learning, at the stage where the second participating node decrypts, the polynomial-based decryption and decompression algorithm provided by the embodiment of the application can decompress a plurality of packed and compressed ciphertexts through a single decryption operation, so that the decryption overhead caused by ciphertext decryption can be greatly reduced, the running efficiency of the federated learning task can be obviously improved, and the availability of federated learning is improved.
Fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 9, the computer device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; the computer device 1000 may further include: a user interface 1003, and at least one communication bus 1002. The communication bus 1002 is used to implement connection communication between these components. The user interface 1003 may include a Display screen (Display) and a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WIFI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in fig. 9, the memory 1005, which is a kind of computer-readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 9, the network interface 1004 may provide a network communication function; the user interface 1003 is mainly used to provide an interface for user input; and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement:
the first participating node acquires a first ciphertext, performs privacy operation on the first ciphertext according to the service data belonging to the first participating node, and generates a second ciphertext; the first ciphertext refers to data obtained by encrypting the initial plaintext by the second participating node; the second participating node is a node performing federated learning with the first participating node;
performing numerical limit estimation on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext; the plaintext prediction limit value is used for representing the numerical range of the plaintext corresponding to the second ciphertext;
performing addition offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext;
and generating a target compressed ciphertext with a polynomial format according to the plaintext prediction limit value and the third ciphertext, and sending the target compressed ciphertext to the second participating node.
It should be understood that the computer device 1000 described in this embodiment of the present application may perform the description of the data transmission method based on the federal learning in any of the embodiments corresponding to fig. 3, fig. 4, and fig. 5, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, the computer device 2000 may include: a processor 2001, a network interface 2004 and a memory 2005; the computer device 2000 may further include: a user interface 2003, and at least one communication bus 2002. The communication bus 2002 is used to implement connection communication between these components. The user interface 2003 may include a Display (Display) and a Keyboard (Keyboard), and the optional user interface 2003 may further include a standard wired interface and a standard wireless interface. The network interface 2004 may optionally include a standard wired interface and a wireless interface (e.g., a WIFI interface). The memory 2005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 2005 may optionally also be at least one storage device located remotely from the aforementioned processor 2001. As shown in fig. 10, the memory 2005, which is a type of computer-readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 2000 shown in fig. 10, the network interface 2004 may provide a network communication function; and the user interface 2003 is primarily used to provide an interface for user input; and processor 2001 may be used to invoke the device control application stored in memory 2005 to implement:
the second participating node receives the target compressed ciphertext sent by the first participating node, and decrypts the target compressed ciphertext by adopting a private key to obtain a first compressed plaintext; the first participating node is a node performing federated learning with the second participating node; the target compressed ciphertext is a ciphertext with a polynomial format generated by the first participating node according to the plaintext prediction limit value and the third ciphertext; the third ciphertext is a ciphertext obtained by performing addition offset processing on the second ciphertext by the first participating node based on the plaintext prediction limit value; the plaintext prediction limit value is obtained by performing numerical limit prediction on the second ciphertext by the first participating node, and is used for representing the numerical range of the plaintext corresponding to the second ciphertext; the second ciphertext refers to a ciphertext generated by the first participating node performing privacy operation on the first ciphertext according to the service data belonging to the first participating node, and the first ciphertext refers to data obtained by the second participating node performing encryption processing on the initial plaintext;
performing bit operation on the first compressed plaintext with the polynomial format to obtain a second plaintext;
and acquiring a plaintext prediction limit value, and performing subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain a target plaintext.
It should be understood that the computer device 2000 described in this embodiment of the present application may perform the description of the data transmission method based on the federal learning in the embodiment corresponding to fig. 6, and therefore, the description thereof is omitted here. In addition, the beneficial effects of the same method are not described in detail.
Further, here, it is to be noted that: an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores the computer programs executed by the aforementioned data transmission apparatus 1 based on federal learning and data transmission apparatus 2 based on federal learning, and the computer programs include program instructions; when the processor executes the program instructions, the description of the data transmission method based on federal learning in any one of the embodiments corresponding to fig. 3, fig. 4, fig. 5, and fig. 6 can be executed, and details are therefore not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application.
The computer-readable storage medium may be the data transmission apparatus based on federal learning provided in any of the foregoing embodiments, or an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Further, here, it is to be noted that: embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method provided by any one of the embodiments corresponding to fig. 3, fig. 4, fig. 5, and fig. 6.
The terms "first," "second," and the like in the description, claims, and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or elements is not limited to the listed steps or elements, but may optionally include other steps or elements not listed or inherent to such process, method, apparatus, product, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described generally in terms of their functionality in the foregoing description in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowchart and/or the structural diagram of the method provided by the embodiments of the present application, and each flow and/or block of the flowchart and/or the structural diagram of the method, and the combination of the flow and/or block in the flowchart and/or the block diagram can be specifically implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be stored in a computerreadable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computerreadable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block or blocks.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the claims of the present application; therefore, equivalent variations and modifications made in accordance with the claims of the present application still fall within the scope of the present application.
Claims (15)
1. A data transmission method based on federal learning is characterized by comprising the following steps:
a first participating node acquires a first ciphertext, and performs privacy operation on the first ciphertext according to service data belonging to the first participating node to generate a second ciphertext; the first ciphertext refers to data obtained by encrypting the initial plaintext by the second participating node; the second participating node is a node performing federated learning with the first participating node;
performing numerical limit estimation on the second ciphertext to obtain a plaintext estimation limit value corresponding to the second ciphertext; the plaintext prediction limit value is used for representing the numerical range of the plaintext corresponding to the second ciphertext;
performing addition offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext;
and generating a target compressed ciphertext with a polynomial format according to the plaintext prediction limit value and the third ciphertext, and sending the target compressed ciphertext to the second participating node.
2. The method of claim 1, wherein the performing a privacy operation on the first ciphertext according to the service data belonging to the first participating node to generate a second ciphertext comprises:
acquiring service data belonging to the first participating node, and performing linear algebraic operation on the service data and the first ciphertext to obtain a second ciphertext; the linear algebraic operation comprises one or more of a scalar multiplication operation, a homomorphic addition operation, and a scalar addition operation.
3. The method according to claim 2, wherein the performing numerical limit prediction on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext comprises:
acquiring a first numerical range corresponding to the initial plaintext and a second numerical range corresponding to the service data;
acquiring a first data dimension of the initial plaintext and a second data dimension of the service data, and determining a target data dimension in the first data dimension and the second data dimension according to the operation type of the linear algebra operation;
and generating a plaintext prediction limit value corresponding to the second ciphertext based on the operation type of the linear algebraic operation, the first numerical range, the second numerical range, and the target data dimension.
4. The method of claim 3, wherein the initial plaintext and the traffic data are matrices;
the obtaining a first data dimension of the initial plaintext and a second data dimension of the service data, and determining a target data dimension in the first data dimension and the second data dimension according to an operation type of the linear algebra operation includes:
acquiring a first matrix width and a first matrix height of the initial plaintext, and determining the first matrix width and the first matrix height as a first data dimension of the initial plaintext;
acquiring a second matrix width and a second matrix height of the service data, and determining the second matrix width and the second matrix height as a second data dimension of the service data;
when the linear algebraic operation is a scalar multiplication operation and the first matrix height is equal to the second matrix width, determining the first matrix height as a target data dimension;
when the linear algebraic operation is a scalar multiplication operation and the second matrix height is equal to the first matrix width, determining the second matrix height as a target data dimension.
5. The method of claim 1, wherein the first ciphertext comprises at least two subciphertexts; the performing privacy operation on the first ciphertext according to the service data belonging to the first participating node to generate a second ciphertext includes:
acquiring data characteristics corresponding to the business data belonging to the first participating node, and clustering the at least two subciphertexts based on the data characteristics to obtain one or more clustering intervals;
and performing homomorphic addition operation on the subciphertexts in the one or more clustering intervals respectively to obtain a clustering subcipher text corresponding to each clustering interval, and determining the one or more clustering subcipher texts as second cipher texts.
6. The method according to claim 5, wherein the performing numerical limit prediction on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext comprises:
acquiring a numerical range and a data dimension corresponding to the initial plaintext;
and generating a plaintext prediction limit value corresponding to the second ciphertext based on the operation type of the homomorphic addition operation, the numerical range and the data dimension.
7. The method of claim 1, wherein the performing addition offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext comprises:
obtaining an upper limit value in the plaintext prediction limit value, and performing a scalar addition operation on the upper limit value and the second ciphertext to obtain a third ciphertext; and the plaintext corresponding to the third ciphertext is a non-negative number.
8. The method of claim 1, wherein the generating a target compressed ciphertext having a polynomial format according to the plaintext prediction limit value and the third ciphertext comprises:
obtaining a shift parameter for representing the plaintext prediction limit value, and performing a scalar multiplication operation on the third ciphertext based on the shift parameter to obtain at least two ciphertext monomials;
and carrying out homomorphic addition operation on the at least two ciphertext monomials to obtain the target compressed ciphertext with the polynomial format.
9. A data transmission method based on federal learning is characterized by comprising the following steps:
the second participating node receives the target compressed ciphertext sent by the first participating node, and decrypts the target compressed ciphertext by adopting a private key to obtain a first compressed plaintext; the first participating node is a node performing federated learning with the second participating node; the target compressed ciphertext is a ciphertext with a polynomial format generated by the first participating node according to a plaintext prediction limit value and a third ciphertext; the third ciphertext is a ciphertext obtained by performing addition offset processing on the second ciphertext by the first participating node based on the plaintext prediction limit value; the plaintext prediction limit value is obtained by performing numerical limit prediction on the second ciphertext by the first participating node, and the plaintext prediction limit value is used for representing the numerical range of the plaintext corresponding to the second ciphertext; the second ciphertext is a ciphertext generated by the first participating node performing privacy operation on the first ciphertext according to the service data belonging to the first participating node, and the first ciphertext is data obtained by the second participating node performing encryption processing on an initial plaintext;
performing a bit operation on the first compressed plaintext having the polynomial format to obtain a second plaintext; and
acquiring the plaintext prediction limit value, and performing subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain a target plaintext.
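The receiver-side steps of claim 9 can be sketched as follows, with the private-key decryption assumed to have already produced the packed integer: a bit operation splits the packed plaintext into its slots, and subtracting the additive offset (the upper limit value) restores the original signed values. The slot width and the offset are illustrative assumptions.

```python
# Receiver-side sketch of claim 9's last two steps. The decrypted packed
# plaintext is split by bit operations, then the additive offset applied by
# the sender is subtracted to restore signed values.
SLOT_BITS = 32          # assumed slot width, matching the sender
UPPER_LIMIT = 1 << 16   # assumed plaintext prediction upper limit value

def restore(first_compressed_plaintext, n):
    mask = (1 << SLOT_BITS) - 1
    second_plaintext = [(first_compressed_plaintext >> (i * SLOT_BITS)) & mask
                        for i in range(n)]                 # bit operation
    return [m - UPPER_LIMIT for m in second_plaintext]     # subtraction restoration

# a packed plaintext holding -7 and 42, each offset into a non-negative slot
packed = (-7 + UPPER_LIMIT) | ((42 + UPPER_LIMIT) << SLOT_BITS)
assert restore(packed, 2) == [-7, 42]
```

The offset is what makes negative intermediate results survive the unsigned bit-masking: every slot value is non-negative on the wire and only regains its sign after the subtraction.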
10. The method of claim 9, wherein the performing a bit operation on the first compressed plaintext having the polynomial format to obtain a second plaintext comprises:
performing a bit operation on the first compressed plaintext having the polynomial format to obtain at least two decompressed plaintexts having an integer format; and
acquiring a scaling factor and exponent term parameters respectively corresponding to the at least two decompressed plaintexts, decoding each decompressed plaintext according to the scaling factor and the corresponding exponent term parameter to obtain at least two sub-plaintexts having a floating-point format, and determining the at least two sub-plaintexts as the second plaintext.
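The decoding step can be illustrated with a common fixed-point convention, which is an assumption here rather than the scheme fixed by the claims: each float is assumed to have been encoded as the integer round(x · BASE^exponent) with a per-value exponent, so decoding divides by the same power of the scaling factor.

```python
# Assumed fixed-point encoding/decoding for the integer-format decompressed
# plaintexts of claim 10: x is encoded as round(x * BASE**exponent) and
# decoded back to a float by dividing. BASE and the exponents are assumptions.
BASE = 16  # assumed scaling factor

def encode(x, exponent):
    """Float -> integer-format plaintext."""
    return round(x * BASE ** exponent)

def decode(m, exponent):
    """Integer-format decompressed plaintext -> floating-point sub-plaintext."""
    return m / BASE ** exponent

m = encode(3.14159, 4)            # integer-format decompressed plaintext
assert abs(decode(m, 4) - 3.14159) < 1e-4
assert decode(encode(-2.5, 2), 2) == -2.5  # exactly representable value
```

A larger exponent keeps more fractional precision at the cost of a wider integer, which is why the exponent travels alongside each value as a decoding parameter.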
11. The method of claim 10, wherein the acquiring the plaintext prediction limit value and performing subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain a target plaintext comprises:
acquiring an upper limit value in the plaintext prediction limit value, subtracting the upper limit value from each of the at least two sub-plaintexts to obtain at least two restored sub-plaintexts, and determining the at least two restored sub-plaintexts as the target plaintext.
12. A data transmission apparatus based on federated learning, comprising:
an operation module, used for a first participating node to obtain a first ciphertext and perform a privacy operation on the first ciphertext according to service data belonging to the first participating node to generate a second ciphertext; the first ciphertext is data obtained by a second participating node encrypting an initial plaintext; the second participating node is a node performing federated learning with the first participating node;
a prediction module, used for performing numerical limit prediction on the second ciphertext to obtain a plaintext prediction limit value corresponding to the second ciphertext; the plaintext prediction limit value is used for representing the numerical range of the plaintext corresponding to the second ciphertext;
an offset module, used for performing addition offset processing on the second ciphertext based on the plaintext prediction limit value to obtain a third ciphertext; and
a compression module, used for generating a target compressed ciphertext having a polynomial format according to the plaintext prediction limit value and the third ciphertext, and sending the target compressed ciphertext to the second participating node.
13. A data transmission apparatus based on federated learning, comprising:
a decryption module, used for a second participating node to receive a target compressed ciphertext sent by a first participating node and decrypt the target compressed ciphertext by using a private key to obtain a first compressed plaintext; the first participating node is a node performing federated learning with the second participating node; the target compressed ciphertext is a ciphertext having a polynomial format generated by the first participating node according to a plaintext prediction limit value and a third ciphertext; the third ciphertext is a ciphertext obtained by the first participating node performing addition offset processing on a second ciphertext based on the plaintext prediction limit value; the plaintext prediction limit value is obtained by the first participating node performing numerical limit prediction on the second ciphertext, and is used for representing the numerical range of the plaintext corresponding to the second ciphertext; the second ciphertext is a ciphertext generated by the first participating node performing a privacy operation on a first ciphertext according to service data belonging to the first participating node; and the first ciphertext is data obtained by the second participating node encrypting an initial plaintext;
a decompression module, used for performing a bit operation on the first compressed plaintext having the polynomial format to obtain a second plaintext; and
a restoration module, used for acquiring the plaintext prediction limit value, and performing subtraction restoration processing on the second plaintext based on the plaintext prediction limit value to obtain a target plaintext.
14. A computer device, comprising: a processor, a memory, and a network interface;
the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide data communication functions, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any one of claims 1 to 11.
15. A computer-readable storage medium, in which a computer program is stored which is adapted to be loaded by a processor and to carry out the method of any one of claims 1 to 11.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN202110680161.XA CN113542228B (en)  2021-06-18  2021-06-18  Data transmission method and device based on federal learning and readable storage medium 
Publications (2)
Publication Number  Publication Date 

CN113542228A (en)  2021-10-22 
CN113542228B (en)  2022-08-12 
Family
ID=78125137
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN202110680161.XA Active CN113542228B (en)  2021-06-18  2021-06-18  Data transmission method and device based on federal learning and readable storage medium 
Country Status (1)
Country  Link 

CN (1)  CN113542228B (en) 
Citations (12)
Publication number  Priority date  Publication date  Assignee  Title 

CN109740376A (en) * 2018-12-21 2019-05-10 哈尔滨工业大学（深圳） Location privacy protection method, system, equipment and medium based on NN Query 
CN110601814A (en) * 2019-09-24 2019-12-20 深圳前海微众银行股份有限公司 Federal learning data encryption method, device, equipment and readable storage medium 
CN111507481A (en) * 2020-04-17 2020-08-07 腾讯科技（深圳）有限公司 Federated learning system 
CN111898137A (en) * 2020-06-30 2020-11-06 深圳致星科技有限公司 Private data processing method, equipment and system for federated learning 
US20200358599A1 (en) * 2019-05-07 2020-11-12 International Business Machines Corporation Private and federated learning 
CN111931253A (en) * 2020-09-15 2020-11-13 腾讯科技（深圳）有限公司 Data processing method, system, device and medium based on node group 
CN112183730A (en) * 2020-10-14 2021-01-05 浙江大学 Neural network model training method based on shared learning 
CN112182595A (en) * 2019-07-03 2021-01-05 北京百度网讯科技有限公司 Model training method and device based on federal learning 
CN112583575A (en) * 2020-12-04 2021-03-30 华侨大学 Homomorphic encryption-based federated learning privacy protection method in Internet of vehicles 
CN112668046A (en) * 2020-12-24 2021-04-16 深圳前海微众银行股份有限公司 Feature interleaving method, apparatus, computer-readable storage medium, and program product 
CN112818374A (en) * 2021-03-02 2021-05-18 深圳前海微众银行股份有限公司 Joint training method, device, storage medium and program product of model 
CN112905187A (en) * 2021-02-20 2021-06-04 深圳前海微众银行股份有限公司 Compiling method, compiling device, electronic equipment and storage medium 

NonPatent Citations (1)
Title 

WANG, Jianzong et al.: "A Survey of Federated Learning Algorithms", Big Data * 
Cited By (7)
Publication number  Priority date  Publication date  Assignee  Title 

CN113965314A (en) * 2021-12-22 2022-01-21 深圳市洞见智慧科技有限公司 Homomorphic encryption processing method and related equipment 
CN113987559A (en) * 2021-12-24 2022-01-28 支付宝(杭州)信息技术有限公司 Method and device for jointly processing data by two parties for protecting data privacy 
CN113987559B (en) * 2021-12-24 2022-04-08 支付宝(杭州)信息技术有限公司 Method and device for jointly processing data by two parties for protecting data privacy 
CN114006689A (en) * 2021-12-28 2022-02-01 北京瑞莱智慧科技有限公司 Data processing method, device and medium based on federal learning 
CN114006689B (en) * 2021-12-28 2022-04-12 北京瑞莱智慧科技有限公司 Data processing method, device and medium based on federal learning 
CN115086399A (en) * 2022-07-28 2022-09-20 深圳前海环融联易信息科技服务有限公司 Federal learning method and device based on hyper network and computer equipment 
CN115086399B (en) * 2022-07-28 2022-12-06 深圳前海环融联易信息科技服务有限公司 Federal learning method and device based on hyper network and computer equipment 
Also Published As
Publication number  Publication date 

CN113542228B (en)  2022-08-12 
Similar Documents
Publication  Publication Date  Title 

CN113542228B (en)  Data transmission method and device based on federal learning and readable storage medium  
CN110399742B (en)  Method and device for training and predicting federated migration learning model  
US20210105256A1 (en)  Secure Analytics Using Homomorphic and Injective Format-Preserving Encryption  
US10972251B2 (en)  Secure web browsing via homomorphic encryption  
US11196541B2 (en)  Secure machine learning analytics using homomorphic encryption  
CN110189192B (en)  Information recommendation model generation method and device  
CN113505882A (en)  Data processing method based on federal neural network model, related equipment and medium  
CN111563267A (en)  Method and device for processing federal characteristic engineering data  
CN113055153B (en)  Data encryption method, system and medium based on fully homomorphic encryption algorithm  
CN114401079A (en)  Multiparty joint information value calculation method, related equipment and storage medium  
CN112199697A (en)  Information processing method, device, equipment and medium based on shared root key  
CN112347500B (en)  Machine learning method, device, system, equipment and storage medium of distributed system  
CN111428887A (en)  Model training control method, device and system based on multiple computing nodes  
CN112183759A (en)  Model training method, device and system  
CN112989399B (en)  Data processing system and method  
CN111523556A (en)  Model training method, device and system  
CN110874481A (en)  GBDT modelbased prediction method and device  
CN111061720B (en)  Data screening method and device and electronic equipment  
CN111523673B (en)  Model training method, device and system  
CN114726524B (en)  Target data sorting method and device, electronic equipment and storage medium  
CN114817970B (en)  Data analysis method and system based on data source protection and related equipment  
CN114912146B (en)  Data information defense method and system under vertical federal architecture, electronic equipment and storage medium  
CN113055184A (en)  Data encryption and decryption method and device  
CN114448598A (en)  Ciphertext compression method, ciphertext decompression method, device, equipment and storage medium  
CN114764724A (en)  User attribute prediction method, device, computer equipment and storage medium 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant 