CN117579215B - Longitudinal federal learning differential privacy protection method and system based on tag sharing - Google Patents


Info

Publication number
CN117579215B
CN117579215B (application CN202410068146.3A / CN202410068146A)
Authority
CN
China
Prior art keywords
guest
model
party
data
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410068146.3A
Other languages
Chinese (zh)
Other versions
CN117579215A (en)
Inventor
张亮
曹晓光
刘涛
李娇娇
郝春辉
李艾功
徐建忠
吴志刚
李慧珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shiping Information & Technology Co ltd
Original Assignee
Hangzhou Shiping Information & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Shiping Information & Technology Co ltd filed Critical Hangzhou Shiping Information & Technology Co ltd
Priority to CN202410068146.3A priority Critical patent/CN117579215B/en
Publication of CN117579215A publication Critical patent/CN117579215A/en
Application granted granted Critical
Publication of CN117579215B publication Critical patent/CN117579215B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04KSECRET COMMUNICATION; JAMMING OF COMMUNICATION
    • H04K1/00Secret communication
    • H04K1/02Secret communication by adding a second signal to make the desired signal unintelligible
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L25/00Baseband systems
    • H04L25/02Details ; arrangements for supplying electrical power along data transmission lines
    • H04L25/03Shaping networks in transmitter or receiver, e.g. adaptive shaping networks
    • H04L25/03828Arrangements for spectral shaping; Arrangements for providing signals with specified spectral properties
    • H04L25/03866Arrangements for spectral shaping; Arrangements for providing signals with specified spectral properties using scrambling
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Power Engineering (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A longitudinal federated learning differential privacy protection method and system based on label sharing. The method comprises: applying differential-privacy perturbation to the label data of the master party and sharing the perturbed label data with each guest party; each guest party training local model parameters using the received label data; after local training is completed, each guest party applying differential-privacy noise perturbation to its local model parameters and sending them to the master party; and the master party performing aggregation training that combines the guests' local model parameters with its own model data to obtain the learning model. A longitudinal federated learning differential privacy protection system based on label sharing, an electronic device, and a computer-readable storage medium are also disclosed. By adding noise satisfying differential privacy to the data labels and to the convergence gradients, computation and memory overhead is reduced; noise added to the gradient information during model-parameter aggregation prevents information leakage while also reducing the number of communication rounds and the communication cost.

Description

Longitudinal federal learning differential privacy protection method and system based on tag sharing
Technical Field
The invention belongs to the technical field of data privacy and security, and particularly relates to a longitudinal federal learning differential privacy protection method and system based on label sharing.
Background
Artificial intelligence technology is now developing rapidly and is applied throughout daily life, industrial production, medical care, and many other fields. This development, however, must be driven by enormous amounts of data: the more accurate and capable a model is required to be, the more data its training typically demands, so large-scale data is the foundation of machine learning model training. In practice, data is scattered across mobile devices and different owners; the data held by any single institution is limited and rarely sufficient to train an ideal model, which creates the need for data sharing. Yet as privacy and security receive growing attention and the value of data rises, data owners, especially individuals, are increasingly unwilling to share sensitive and private data; companies cannot easily share private data with one another; and privacy protection is now written into laws and regulations. How to legally train a model together, while guaranteeing the security of private data and respecting the wishes of every participant, therefore becomes the problem to solve.
To address this problem, Google proposed federated learning in 2016. Under the federated learning framework, different data owners cooperate to build a global model without exchanging their data.
In federated learning privacy protection, longitudinal (vertical) federated learning differs from horizontal federated learning. In horizontal federated learning each participant holds the complete feature set of the model, including the data labels, but has too few samples; with complete labels, a complete model can be trained with ordinary machine learning algorithms. In longitudinal federated learning the features are scattered across the participants, only a limited number of participants hold labels, and a participant without labels cannot independently train a complete model, or can only resort to unsupervised learning. Data sharing is therefore usually achieved through encrypted exchange, which requires many communication rounds and incurs high communication cost and high computation and memory overhead. Longitudinal federated learning can also be described as federated learning partitioned by feature: the parties' user populations largely overlap, but the features they hold differ. After aligning the samples, it allows the participants to jointly train a machine learning model on the scattered features without exposing their own raw data.
Although longitudinal federated learning guarantees security to some extent, the parties do not trust one another, so encrypted sharing is chosen when parameters are exchanged in order to protect each party's data. Longitudinal federated learning based on encrypted exchange, however, typically requires many communication rounds, which produces expensive communication costs, and the encryption itself brings high computation costs.
Disclosure of Invention
The invention aims to solve the above problems in the prior art, and provides a longitudinal federated learning differential privacy protection method and system based on label sharing.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, a longitudinal federated learning differential privacy protection method based on label sharing is provided, comprising:
applying differential-privacy perturbation to the label data of the master party to obtain perturbed label data, and sharing the perturbed label data with each guest party;
each guest party training local model parameters using the received label data;
after local training is completed, each guest party applying differential-privacy noise perturbation to its local model parameters and sending them to the master party;
and the master party performing aggregation training that combines the noise-perturbed local model parameters of each guest party with its own model data, to obtain a differential-privacy-protected learning model.
As a preferred scheme, the global data set composed of all the data of the master party and the guest parties is D = {(x_i, y_i)}_{i=1}^n, where n is the number of samples. The features of each sample x_i are assumed to be distributed over m guest parties, i.e. x_i = (x_i^1, x_i^2, …, x_i^m), and only the master party holds the label y_i. The master party and the guest parties together form the participants; each participant j holds a local data set D_j. All guest parties P_1, …, P_m, jointly with the master party P_0, collaboratively train a model f_w, where w denotes the parameters of the trained model.
The training objective is:
min_w (1/n) · Σ_{i=1}^n L(f_w(x_i), y_i) + λ·Ω(w)
where L is the loss function and Ω(w) is a regularization term.
As a preferred scheme, applying differential-privacy perturbation to the label data of the master party to obtain perturbed label data, and sharing the perturbed label data with each guest party, comprises:
bucketing the label data and counting the original label distribution of the master party; perturbing the data labels with kRR (k-ary randomized response) local differential privacy. Assuming the label y has k possible values, for any input the randomized mechanism R responds with the true value with probability
Pr[R(y) = y] = e^ε / (e^ε + k − 1)
and with each of the remaining k − 1 values with probability
Pr[R(y) = y′] = 1 / (e^ε + k − 1), y′ ≠ y.
The final perturbed label shared with guest party j is denoted ỹ.
If the i-th sample of guest party j has the unique identifier id_i, the master party sends the pair (id_i, ỹ_i) to the other guest parties, and guest party j pairs by identifier to obtain its labeled data set {(x_i^j, ỹ_i)}. Here ε is the consumed privacy cost; the degree of privacy protection is controlled by the size of ε, with a smaller ε giving stronger protection.
As a preferred solution, each guest party training local model parameters using the received label data comprises:
computing the loss function and gradient
L_j^t = ‖f_{w_j^t}(x^j) − ỹ‖_F²,  g_j^t = ∇_{w_j} L_j^t
where w_j denotes the parameters of guest party j, f_{w_j^t} is its model function in round t, ỹ is the noisy label data it received, ‖·‖_F² is the squared Frobenius norm of the loss term, and ∇ denotes the gradient operator;
updating the parameters with learning rate η_t:
w_j^{t+1} = w_j^t − η_t · g_j^t
where w_j^{t+1} are the parameters of guest party j in round t + 1, w_j^t those in round t, and η_t the learning rate of round t;
iterating until convergence or until the set maximum number of rounds is reached, obtaining the model M_j = f_{w_j}, where M_j is the model of guest party j and f_{w_j} the model function with parameters w_j.
As a preferred solution, after local training is completed, each guest party applying differential-privacy noise perturbation to its local model parameters and sending them to the master party comprises:
norm-clipping the trained model parameters w_j of each guest party with a clipping threshold C so that ‖w̄_j‖ ≤ C:
w̄_j = w_j / max(1, ‖w_j‖ / C)
where w̄_j are the clipped parameters of guest party j and C bounds the range of the model parameters;
adding noise satisfying the Laplace mechanism:
ŵ_j = w̄_j + Lap(C / ε₂)
where ŵ_j are the noisy parameters of participant j, Lap(·) is the Laplace mechanism, and ε₂ is the privacy cost;
obtaining the model M̂_j = f_{ŵ_j} and sending it to the master party, where M̂_j is the noisy model of guest party j and f_{ŵ_j} the model function with the noisy parameters ŵ_j.
As a preferred scheme, the master party performing aggregation training that combines the noise-perturbed local model parameters of each guest party with its own model data, to obtain the differential-privacy-protected learning model, comprises:
computing the loss function and gradient
L^t = (1/n) · Σ_{i=1}^n L(f_w^t(x_i^0, M̂(x_i)), y_i),  g^t = ∇_w L^t
where L^t is the loss of round t, w are the parameters being trained, x_i^0 is the feature data of the master party, M̂(x_i) denotes the outputs of the guests' locally trained noisy models on sample i, f_w^t is the model function with parameters w in round t, and y_i is the real label held by the master party;
protecting the gradient information with a clipped Gaussian mechanism, where the noise has magnitude σC and η_t is the learning rate:
ĝ^t = g^t / max(1, ‖g^t‖₂ / C)
g̃^t = ĝ^t + N(0, σ²C²·I)
w^{t+1} = w^t − η_t · g̃^t
where ĝ^t is the clipped gradient of round t, ‖g^t‖₂ the 2-norm of the round-t gradient, g̃^t the noisy gradient, N(0, σ²C²·I) Gaussian noise with mean 0 and variance σ²C², I the identity matrix, and w^{t+1} the parameters of round t + 1;
outputting the noisy learning model parameters ŵ_j of each guest party and the noisy learning model parameters ŵ of the master party.
In a second aspect, a longitudinal federated learning differential privacy protection system based on label sharing is provided, comprising:
a label sharing module, used for applying differential-privacy perturbation to the label data of the master party to obtain perturbed label data and sharing the perturbed label data with each guest party;
a guest-side local model training module, used for each guest party to train local model parameters using the received label data;
a guest-side model parameter transmission module, used for sending the local model parameters to the master party after differential-privacy noise perturbation, once local training is completed;
a master-side model aggregation training module, used for aggregation training that combines the noise-perturbed local model parameters of each guest party with the master party's own model data, to obtain a differential-privacy-protected learning model.
In a third aspect, an electronic device is provided, comprising:
a memory storing at least one instruction; and a processor executing the instructions stored in the memory to implement the longitudinal federated learning differential privacy protection method based on label sharing.
In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, the instruction being executed by a processor in an electronic device to implement the longitudinal federated learning differential privacy protection method based on label sharing.
Compared with the prior art, the invention has at least the following beneficial effects:
because the parties do not trust one another, encrypted sharing is normally chosen when parameters are exchanged in order to protect each party's data, but longitudinal federated learning based on encrypted exchange requires many communication rounds, which produces expensive communication costs, and the encryption brings high computation costs. The differential privacy protection method of the invention is suited to longitudinal federated learning, and reduces computation and memory overhead by adding noise satisfying differential privacy to the data labels and to the convergence gradients. To defend against reconstruction attacks during model-parameter aggregation, noise is added to the gradient information to prevent information leakage. By training a complete local model and sharing it with noise-perturbed parameters, the number of communication rounds, and hence the communication cost, is reduced.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario of a longitudinal federal learning differential privacy protection method based on tag sharing in an embodiment of the present invention;
fig. 2 is a flowchart of a longitudinal federal learning differential privacy protection method based on tag sharing in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present invention proposes a longitudinal federated learning differential privacy protection method based on label sharing. In the application scenario, multiple participants hold data with different features, but only one participant holds the data labels, and a model must be trained jointly on the data of all participants. Label sharing, parameter sharing, and parameter aggregation are all protected by differential privacy schemes. For label sharing, the data labels are first bucketed according to their distribution and then randomized with a response mechanism so that the shared result satisfies differential privacy. For parameter sharing, noise is added to the parameters before they are transmitted to the master party. For aggregation training, the gradient is norm-clipped, noise satisfying relaxed differential privacy is added to the clipped gradient, and the parameters are then updated.
For a longitudinal federated learning algorithm, since the participants do not trust one another, encrypted sharing may be chosen to protect each party's data during parameter exchange; but longitudinal federated learning based on encrypted exchange typically requires many communication rounds, producing expensive communication costs, and the encryption brings high computation costs.
To reduce both the communication cost and the computation cost, a learning framework that lowers the number of communication rounds needs to be designed with differential privacy, whose computation cost is far smaller than that of encryption or secure multi-party computation.
The main difficulty in applying differential privacy to longitudinal federated learning is how to share the label data, with user privacy protected, so that local training can be completed, and how to share and update parameters with their security protected. Because labels are missing locally and local model training is therefore impossible, encryption-based data sharing schemes are usually adopted, such as longitudinal federated learning based on homomorphic encryption: a trusted third party generates a homomorphic encryption public and private key pair and sends the public key to the master party and the guest parties; each guest party computes its forward-pass result locally, encrypts it with the public key, and sends it to the master party; the master party receives all the guests' forward results and accumulates them together with its own data to compute the homomorphically encrypted predicted label value. During back-propagation, the master party computes the gradient of the loss function with respect to the predicted value, homomorphically encrypts it, and sends it to each guest party; the guest parties and the master party then update their parameters with the received gradients.
This scheme, however, requires multiple rounds of communication and homomorphic encryption computation, has high communication and computation costs, consumes a large amount of memory, and depends on a trusted third party; differential privacy can avoid all of these problems.
As noted above, in longitudinal federated learning the features are scattered across the participants, only a limited number of participants hold labels, and data sharing through encrypted exchange requires many communication rounds with high communication, computation, and memory overhead. Differential privacy techniques can solve this problem: the commonly used relaxed differential privacy scheme is the Gaussian mechanism, i.e. adding noise drawn from a Gaussian distribution N(0, σ²) to the gradient, and the commonly used multi-class local differential privacy scheme is the randomized response mechanism, which randomly responds for a variable with multiple candidate values.
To solve these technical problems, the longitudinal federated learning differential privacy protection method based on label sharing of the embodiment adopts the following scheme: first, the master party's label data is perturbed with local differential privacy noise and shared with the other parties together with a unique id. Each guest party then trains a neural network on its own data and the received label data, perturbs the trained model parameters with noise, and sends them to the master party. The master party receives the model parameters of each guest party and performs aggregation training combined with its own model data, adding noise satisfying differential privacy to the gradient during training, to finally obtain the learning model.
As shown in fig. 2, the longitudinal federated learning differential privacy protection method based on label sharing of the embodiment comprises:
applying differential-privacy perturbation to the label data of the master party to obtain perturbed label data, and sharing the perturbed label data with each guest party;
each guest party training local model parameters using the received label data;
after local training is completed, each guest party applying differential-privacy noise perturbation to its local model parameters and sending them to the master party;
the master party performing aggregation training that combines the noise-perturbed local model parameters of each guest party with its own model data, to obtain a differential-privacy-protected learning model.
The global data set composed of all the data of the master party and the guest parties is D = {(x_i, y_i)}_{i=1}^n, where n is the number of samples. The features of each sample x_i are assumed to be distributed over m guest parties, i.e. x_i = (x_i^1, x_i^2, …, x_i^m), and only the master party holds the label y_i. The master party and the guest parties form the participants; each participant j holds a local data set D_j. All guest parties P_1, …, P_m, jointly with the master party P_0, collaboratively train a model f_w, where w denotes the parameters of the trained model.
The training objective is:
min_w (1/n) · Σ_{i=1}^n L(f_w(x_i), y_i) + λ·Ω(w)
where L is the loss function and Ω(w) is a regularization term.
In a possible implementation manner, the step of applying differential-privacy perturbation to the label data of the master party to obtain perturbed label data, and sharing the perturbed label data with each guest party, specifically comprises:
bucketing the label data and counting the original label distribution of the master party; perturbing the data labels with kRR (k-ary randomized response) local differential privacy. Assuming the label y has k possible values, for any input the randomized mechanism R responds with the true value with probability
Pr[R(y) = y] = e^ε / (e^ε + k − 1)
and with each of the remaining k − 1 values with probability
Pr[R(y) = y′] = 1 / (e^ε + k − 1), y′ ≠ y.
The final perturbed label shared with guest party j is denoted ỹ.
If the i-th sample of guest party j has the unique identifier id_i, the master party sends the pair (id_i, ỹ_i) to the other guest parties, and guest party j pairs by identifier to obtain its labeled data set {(x_i^j, ỹ_i)}. Here ε is the consumed privacy cost; the degree of privacy protection is controlled by the size of ε, with a smaller ε giving stronger protection.
In a possible implementation manner, the step of each guest party training local model parameters using the received label data specifically comprises:
computing the loss function and gradient
L_j^t = ‖f_{w_j^t}(x^j) − ỹ‖_F²,  g_j^t = ∇_{w_j} L_j^t
where w_j denotes the parameters of guest party j, f_{w_j^t} is its model function in round t, ỹ is the noisy label data it received, ‖·‖_F² is the squared Frobenius norm of the loss term, and ∇ denotes the gradient operator;
updating the parameters with learning rate η_t:
w_j^{t+1} = w_j^t − η_t · g_j^t
where w_j^{t+1} are the parameters of guest party j in round t + 1, w_j^t those in round t, and η_t the learning rate of round t;
iterating until convergence or until the set maximum number of rounds is reached, obtaining the model M_j = f_{w_j}, where M_j is the model of guest party j and f_{w_j} the model function with parameters w_j.
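As a concrete illustration of the per-round update w ← w − η·g, the sketch below trains a linear least-squares model by plain gradient descent. The model choice, the name `local_train`, and all hyperparameters are assumptions for illustration; the patent leaves the guest model unspecified (e.g. a neural network), and the continuous label noise here merely stands in for the kRR-perturbed labels.

```python
import numpy as np

def local_train(X, y_noisy, rounds=200, lr=0.1):
    """Guest-side training sketch: repeat  w <- w - lr * grad(loss)
    for a fixed round budget (a stand-in for 'until convergence or the
    set maximum number of rounds')."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(rounds):
        residual = X @ w - y_noisy        # forward pass on the noisy labels
        grad = X.T @ residual / n         # gradient of 0.5/n * ||Xw - y||^2
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y_noisy = X @ w_true + 0.01 * rng.normal(size=500)    # stand-in noisy labels
w_hat = local_train(X, y_noisy)
```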
In a possible implementation manner, the step in which, after local training is completed, each guest party applies differential-privacy noise perturbation to its local model parameters and sends them to the master party comprises:
norm-clipping the trained model parameters w_j of each guest party with a clipping threshold C so that ‖w̄_j‖ ≤ C:
w̄_j = w_j / max(1, ‖w_j‖ / C)
where w̄_j are the clipped parameters of guest party j and C bounds the range of the model parameters;
adding noise satisfying the Laplace mechanism:
ŵ_j = w̄_j + Lap(C / ε₂)
where ŵ_j are the noisy parameters of participant j, Lap(·) is the Laplace mechanism, and ε₂ is the privacy cost;
obtaining the model M̂_j = f_{ŵ_j} and sending it to the master party, where M̂_j is the noisy model of guest party j and f_{ŵ_j} the model function with the noisy parameters ŵ_j.
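The clip-then-add-Laplace step can be sketched as below. The per-coordinate noise scale C/ε is the standard Laplace-mechanism calibration and is an assumption here, as are the names `clip_and_laplace` and the concrete constants; the patent only states that Laplace noise calibrated to a privacy cost is added to the clipped parameters.

```python
import numpy as np

def clip_and_laplace(w, clip_c, epsilon, rng):
    """Norm-clip the trained parameters so that ||w'|| <= C, then add
    per-coordinate Laplace noise (scale C / epsilon, an illustrative
    calibration) before sending them to the master party."""
    w_clipped = w / max(1.0, np.linalg.norm(w) / clip_c)  # w' = w / max(1, ||w||/C)
    noise = rng.laplace(loc=0.0, scale=clip_c / epsilon, size=w.shape)
    return w_clipped + noise

rng = np.random.default_rng(2)
w_trained = np.array([3.0, 4.0])                          # ||w|| = 5 > C = 1
w_shared = clip_and_laplace(w_trained, clip_c=1.0, epsilon=1.0, rng=rng)
```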
In a possible implementation manner, the step in which the master party performs aggregation training combining the noise-perturbed local model parameters of each guest party with its own model data, to obtain the differential-privacy-protected learning model, specifically comprises:
computing the loss function and gradient
L^t = (1/n) · Σ_{i=1}^n L(f_w^t(x_i^0, M̂(x_i)), y_i),  g^t = ∇_w L^t
where L^t is the loss of round t, w are the parameters being trained, x_i^0 is the feature data of the master party, M̂(x_i) denotes the outputs of the guests' locally trained noisy models on sample i, f_w^t is the model function with parameters w in round t, and y_i is the real label held by the master party;
protecting the gradient information with a clipped Gaussian mechanism, where the noise has magnitude σC and η_t is the learning rate:
ĝ^t = g^t / max(1, ‖g^t‖₂ / C)
g̃^t = ĝ^t + N(0, σ²C²·I)
w^{t+1} = w^t − η_t · g̃^t
where ĝ^t is the clipped gradient of round t, ‖g^t‖₂ the 2-norm of the round-t gradient, g̃^t the noisy gradient, N(0, σ²C²·I) Gaussian noise with mean 0 and variance σ²C², I the identity matrix, and w^{t+1} the parameters of round t + 1;
outputting the noisy learning model parameters ŵ_j of each guest party and the noisy learning model parameters ŵ of the master party.
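One aggregation round on the master side can be sketched as a DP-SGD-style step: clip the gradient to 2-norm C, add Gaussian noise, and take a gradient step. The function name `dp_sgd_step` and the constants are illustrative assumptions; calibrating σ to a target (ε, δ) relaxed-differential-privacy guarantee is left outside the sketch.

```python
import numpy as np

def dp_sgd_step(w, grad, clip_c, sigma, lr, rng):
    """One master-side update with the clipped Gaussian mechanism:
    g_hat   = g / max(1, ||g||_2 / C)          (norm clipping)
    g_tilde = g_hat + N(0, sigma^2 * C^2 * I)  (relaxed-DP noise)
    w_next  = w - lr * g_tilde                 (gradient step)"""
    g_hat = grad / max(1.0, np.linalg.norm(grad) / clip_c)
    g_tilde = g_hat + rng.normal(0.0, sigma * clip_c, size=g_hat.shape)
    return w - lr * g_tilde

rng = np.random.default_rng(3)
w = np.zeros(3)
grad = np.array([10.0, 0.0, 0.0])      # gets clipped down to 2-norm 1
w_next = dp_sgd_step(w, grad, clip_c=1.0, sigma=0.1, lr=0.5, rng=rng)
```

Because the raw gradient has norm 10, clipping rescales it to norm 1 before noise is added, so a single outlying sample cannot dominate the noisy update.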
In the resulting learning model: the model function is determined by the parameters, so the learning model finally obtained refers to the parameters obtained by training the final convergence. Thus, different participants may receive different training parameters, including one for each guestjAll obtain the own learning model parametersThe other is the principle scienceModel parameters of study>
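A minimal sketch of one round of the host-side Gaussian clipping update described above; the function and argument names, and the σC noise standard deviation, are illustrative assumptions:

```python
import numpy as np

def dp_gradient_step(w, grad, clip_c, sigma, lr, seed=None):
    """One noise-protected gradient step: clip, add Gaussian noise, update."""
    rng = np.random.default_rng(seed)
    grad = np.asarray(grad, dtype=float)
    # Norm clipping: bound the gradient's L2 norm by clip_c.
    g_bar = grad / max(1.0, np.linalg.norm(grad) / clip_c)
    # Gaussian mechanism: zero-mean noise with standard deviation sigma * clip_c.
    g_tilde = g_bar + rng.normal(0.0, sigma * clip_c, size=grad.shape)
    # Plain gradient-descent update with learning rate lr.
    return np.asarray(w, dtype=float) - lr * g_tilde
```

Setting sigma = 0 reduces this to ordinary clipped gradient descent, which makes the clipping behavior easy to verify on its own.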
Another embodiment of the present invention further proposes a longitudinal federal learning differential privacy protection system based on tag sharing, including:
the label sharing module, used for performing differential privacy perturbation on the label data of the host party to obtain perturbed label data, and sharing the perturbed label data with each guest party;

the guest local model training module, used for each guest party to perform local model parameter training with the received label data;

the guest model parameter transmission module, used for applying differential privacy noise perturbation to the local model parameters after they are trained and transmitting them to the host party;

the host model aggregation training module, used for performing aggregation training by combining the noise-perturbed local model parameters of each guest party with the host party's own model data, to obtain the learning model after differential privacy noise perturbation.
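The perturbation performed by the label sharing module is the kRR randomized response used by the method. A minimal sketch, under the assumption of integer labels in {0, ..., k-1} (the function name is illustrative):

```python
import numpy as np

def krr_perturb(label, k, epsilon, seed=None):
    """k-ary randomized response: keep the true label with probability
    e^eps / (e^eps + k - 1); otherwise answer a uniformly random other label."""
    rng = np.random.default_rng(seed)
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return label
    others = [v for v in range(k) if v != label]
    return others[rng.integers(len(others))]

# The host party would share krr_perturb(y_i, k, eps) with each guest party.
```

A larger privacy cost ε keeps the true label more often (weaker protection); ε = 0 makes every answer uniformly random over all k labels.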
Another embodiment of the present invention also proposes an electronic device, including: a memory storing at least one instruction; and a processor executing the instructions stored in the memory to implement the above tag-sharing-based longitudinal federal learning differential privacy protection method.

Another embodiment of the present invention also proposes a computer-readable storage medium having stored therein at least one instruction, the instruction being executed by a processor in an electronic device to implement the tag-sharing-based longitudinal federal learning differential privacy protection method.
The instructions stored in the memory may be partitioned into one or more modules/units, which are stored in the computer-readable storage medium and executed by the processor to complete the tag-sharing-based longitudinal federal learning differential privacy protection method of the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specified functions, and these segments describe the execution of the computer program in a server.
The electronic device may be a computing device such as a smartphone, a notebook computer, a palmtop computer, or a cloud server. The electronic device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the electronic device may also include more or fewer components, combine certain components, or use different components; for example, it may also include input and output devices, network access devices, buses, and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may be an internal storage unit of the server, such as a hard disk or memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) provided on the server. Further, the memory may include both an internal storage unit and an external storage device of the server. The memory is used to store the computer-readable instructions and other programs and data required by the server, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above module units is based on the same concept as the method embodiment, specific functions and technical effects thereof may be referred to in the method embodiment section, and details thereof are not repeated herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the system is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (5)

1. A longitudinal federal learning differential privacy protection method based on tag sharing, characterized by comprising the following steps:

performing differential privacy perturbation on the label data of the host party to obtain perturbed label data, and sharing the perturbed label data with each guest party;

each guest party performing local model parameter training with the received label data;

after the local model parameters are trained, each guest party applying differential privacy noise perturbation to the local model parameters and sending them to the host party;

the host party performing aggregation training by combining the noise-perturbed local model parameters of each guest party with its own model data, to obtain the learning model after differential privacy noise perturbation;
wherein performing differential privacy perturbation on the label data of the host party to obtain perturbed label data and sharing the perturbed label data with each guest party comprises:

bucketing the label data and counting the label distribution of the host party's original data; perturbing the data labels with kRR local differential privacy: assuming the label y has k possible values, for an arbitrary input y, responding with the true value with probability p = e^\varepsilon / (e^\varepsilon + k - 1) and with each of the remaining k - 1 values with probability q = 1 / (e^\varepsilon + k - 1); the final perturbation result shared with guest party j is denoted \tilde{y}; wherein:

if the unique identifier of the i-th sample is \mathrm{ID}_i, the host party sends its own sample data (\mathrm{ID}_i, \tilde{y}_i) to the other guest parties, and guest party j obtains (x_i^j, \tilde{y}_i) based on identifier pairing; \varepsilon is the consumed privacy cost, and the degree of privacy protection is controlled by setting the size of the privacy cost \varepsilon, satisfying \varepsilon-local differential privacy;
wherein each guest party performing local model parameter training with the received label data comprises:

calculating the loss function L_j and gradient g_j:

L_j = \| f_j(\theta_j^t) - \tilde{y} \|_F^2, \qquad g_j = \nabla_{\theta_j^t} L_j

where \theta_j^t are the parameters of guest party j, f_j(\theta_j^t) is the model function of guest party j in round t, \tilde{y} is the noised label data received by guest party j, \| \cdot \|_F^2 is the square of the F-norm of the loss function, and \nabla denotes the gradient operator;

updating the parameters, with \eta the learning rate:

\theta_j^{t+1} = \theta_j^t - \eta_t g_j

where \theta_j^{t+1} are the parameters of guest party j in round t+1, \theta_j^t are the parameters of guest party j in round t, and \eta_t is the learning rate of round t;

iterating until convergence or the set maximum number of rounds is reached, to obtain the model M_j = f_j(\theta_j), where M_j is the model representation of guest party j and f_j(\theta_j) is the model function under the parameters \theta_j;
wherein, after the local model parameters are trained, each guest party applying differential privacy noise perturbation to the local model parameters and sending them to the host party comprises:

performing norm clipping on the model parameters \theta_j trained by the guest party, introducing a clipping threshold C so that \|\theta_j\|_2 \le C, namely:

\bar{\theta}_j = \theta_j / \max(1, \|\theta_j\|_2 / C)

where \bar{\theta}_j denotes the parameters of guest party j after norm clipping and \|\theta_j\|_2 is the \ell_2 norm of the model parameters \theta_j;

adding noise satisfying the Laplace mechanism, namely:

\tilde{\theta}_j = \bar{\theta}_j + \mathrm{Lap}(2C/\varepsilon)

where \tilde{\theta}_j denotes the parameters of guest party j after noise addition, \mathrm{Lap}(\cdot) is the Laplace mechanism function with a noise scale determined by the clipping threshold C and the privacy cost \varepsilon;

obtaining the model M_j = f_j(\tilde{\theta}_j) and sending it to the host party, where M_j is the model representation of guest party j and f_j(\tilde{\theta}_j) is the model function under the noised parameters \tilde{\theta}_j;
wherein the host party performing aggregation training by combining the noise-perturbed local model parameters of each guest party with its own model data, to obtain the learning model after differential privacy noise perturbation, comprises:

calculating the loss function L_t and gradient g_t:

L_t = \| f_1(w_t) + \sum_j M_j - y \|_F^2, \qquad g_t = \nabla_{w_t} L_t

where L_t is the loss function of round t, w_t are the model parameters being trained, f(w_t; x_1) — the model function under training parameters w_t on the feature data x_1 of party 1 (the host party) — is written f_1(w_t) for convenience, M_j is the output of the model trained locally by guest party j, and y is the true label data of the host party;

protecting the gradient information with a Gaussian clipping mechanism, where \sigma is the noise magnitude and \eta is the learning rate:

\bar{g}_t = g_t / \max(1, \|g_t\|_2 / C)
\tilde{g}_t = \bar{g}_t + \mathcal{N}(0, \sigma^2 C^2 I)
w_{t+1} = w_t - \eta \tilde{g}_t

where \bar{g}_t is the round-t gradient after norm clipping, \|g_t\|_2 is the \ell_2 norm of the round-t gradient, \tilde{g}_t is the round-t gradient after noise addition, \mathcal{N}(0, \sigma^2 C^2) is a Gaussian with mean 0 and variance \sigma^2 C^2, I is the unit vector, and w_{t+1} are the parameters of round t+1;

outputting the noised learning model parameters \tilde{\theta}_j of each guest party and the noised learning model parameters \tilde{w} of the host party.
2. The longitudinal federal learning differential privacy protection method based on tag sharing according to claim 1, characterized in that the global data set composed of all the data of the host party and all the guest parties is D = \{(x_i, y_i)\}_{i=1}^{n}, where n is the number of samples; the features of a sample x_i are assumed to be distributed among the m participants, i.e. x_i = (x_i^1, \ldots, x_i^m), and only the host party holds the label y_i; when the host party performs aggregation training by combining the noise-perturbed local model parameters of each guest party with its own model data to obtain the learning model after differential privacy noise perturbation, the host party and the guest parties constitute the participants, each participant j holding a local data set D_j; all the guest parties collaboratively train a model f(w) together with the host party, where w refers to the parameters of the trained model;

the training process is expressed as:

\min_{w} \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}(f(w; x_i), y_i) + \lambda \Omega(w)

where \mathcal{L} is the loss function and \Omega(w) is the regularization term.
3. A longitudinal federal learning differential privacy protection system based on tag sharing, characterized by comprising:

the label sharing module, used for performing differential privacy perturbation on the label data of the host party to obtain perturbed label data, and sharing the perturbed label data with each guest party;

the guest local model training module, used for each guest party to perform local model parameter training with the received label data;

the guest model parameter transmission module, used for applying differential privacy noise perturbation to the local model parameters after they are trained and transmitting them to the host party;

the host model aggregation training module, used for performing aggregation training by combining the noise-perturbed local model parameters of each guest party with the host party's own model data, to obtain the learning model after differential privacy noise perturbation;

wherein performing differential privacy perturbation on the label data of the host party to obtain perturbed label data and sharing the perturbed label data with each guest party comprises:

bucketing the label data and counting the label distribution of the host party's original data; perturbing the data labels with kRR local differential privacy: assuming the label y has k possible values, for an arbitrary input y, responding with the true value with probability p = e^\varepsilon / (e^\varepsilon + k - 1) and with each of the remaining k - 1 values with probability q = 1 / (e^\varepsilon + k - 1); the final perturbation result shared with guest party j is denoted \tilde{y}; wherein:

if the unique identifier of the i-th sample is \mathrm{ID}_i, the host party sends its own sample data (\mathrm{ID}_i, \tilde{y}_i) to the other guest parties, and guest party j obtains (x_i^j, \tilde{y}_i) based on identifier pairing; \varepsilon is the consumed privacy cost, and the degree of privacy protection is controlled by setting the size of the privacy cost \varepsilon, satisfying \varepsilon-local differential privacy;
wherein each guest party performing local model parameter training with the received label data comprises:

calculating the loss function L_j and gradient g_j:

L_j = \| f_j(\theta_j^t) - \tilde{y} \|_F^2, \qquad g_j = \nabla_{\theta_j^t} L_j

where \theta_j^t are the parameters of guest party j, f_j(\theta_j^t) is the model function of guest party j in round t, \tilde{y} is the noised label data received by guest party j, \| \cdot \|_F^2 is the square of the F-norm of the loss function, and \nabla denotes the gradient operator;

updating the parameters, with \eta the learning rate:

\theta_j^{t+1} = \theta_j^t - \eta_t g_j

where \theta_j^{t+1} are the parameters of guest party j in round t+1, \theta_j^t are the parameters of guest party j in round t, and \eta_t is the learning rate of round t;

iterating until convergence or the set maximum number of rounds is reached, to obtain the model M_j = f_j(\theta_j), where M_j is the model representation of guest party j and f_j(\theta_j) is the model function under the parameters \theta_j;
wherein, after the local model parameters are trained, each guest party applying differential privacy noise perturbation to the local model parameters and sending them to the host party comprises:

performing norm clipping on the model parameters \theta_j trained by the guest party, introducing a clipping threshold C so that \|\theta_j\|_2 \le C, namely:

\bar{\theta}_j = \theta_j / \max(1, \|\theta_j\|_2 / C)

where \bar{\theta}_j denotes the parameters of guest party j after norm clipping and \|\theta_j\|_2 is the \ell_2 norm of the model parameters \theta_j;

adding noise satisfying the Laplace mechanism, namely:

\tilde{\theta}_j = \bar{\theta}_j + \mathrm{Lap}(2C/\varepsilon)

where \tilde{\theta}_j denotes the parameters of guest party j after noise addition, \mathrm{Lap}(\cdot) is the Laplace mechanism function with a noise scale determined by the clipping threshold C and the privacy cost \varepsilon;

obtaining the model M_j = f_j(\tilde{\theta}_j) and sending it to the host party, where M_j is the model representation of guest party j and f_j(\tilde{\theta}_j) is the model function under the noised parameters \tilde{\theta}_j;
wherein the host party performing aggregation training by combining the noise-perturbed local model parameters of each guest party with its own model data, to obtain the learning model after differential privacy noise perturbation, comprises:

calculating the loss function L_t and gradient g_t:

L_t = \| f_1(w_t) + \sum_j M_j - y \|_F^2, \qquad g_t = \nabla_{w_t} L_t

where L_t is the loss function of round t, w_t are the model parameters being trained, f(w_t; x_1) — the model function under training parameters w_t on the feature data x_1 of party 1 (the host party) — is written f_1(w_t) for convenience, M_j is the output of the model trained locally by guest party j, and y is the true label data of the host party;

protecting the gradient information with a Gaussian clipping mechanism, where \sigma is the noise magnitude and \eta is the learning rate:

\bar{g}_t = g_t / \max(1, \|g_t\|_2 / C)
\tilde{g}_t = \bar{g}_t + \mathcal{N}(0, \sigma^2 C^2 I)
w_{t+1} = w_t - \eta \tilde{g}_t

where \bar{g}_t is the round-t gradient after norm clipping, \|g_t\|_2 is the \ell_2 norm of the round-t gradient, \tilde{g}_t is the round-t gradient after noise addition, \mathcal{N}(0, \sigma^2 C^2) is a Gaussian with mean 0 and variance \sigma^2 C^2, I is the unit vector, and w_{t+1} are the parameters of round t+1;

outputting the noised learning model parameters \tilde{\theta}_j of each guest party and the noised learning model parameters \tilde{w} of the host party.
4. An electronic device, comprising:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the tag-sharing-based longitudinal federal learning differential privacy protection method according to claim 1 or 2.

5. A computer-readable storage medium, characterized in that: the computer-readable storage medium has stored therein at least one instruction, the at least one instruction being executed by a processor in an electronic device to implement the tag-sharing-based longitudinal federal learning differential privacy protection method according to claim 1 or 2.
CN202410068146.3A 2024-01-17 2024-01-17 Longitudinal federal learning differential privacy protection method and system based on tag sharing Active CN117579215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410068146.3A CN117579215B (en) 2024-01-17 2024-01-17 Longitudinal federal learning differential privacy protection method and system based on tag sharing


Publications (2)

Publication Number Publication Date
CN117579215A CN117579215A (en) 2024-02-20
CN117579215B true CN117579215B (en) 2024-03-29

Family

ID=89884904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410068146.3A Active CN117579215B (en) 2024-01-17 2024-01-17 Longitudinal federal learning differential privacy protection method and system based on tag sharing

Country Status (1)

Country Link
CN (1) CN117579215B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866869A (en) * 2020-07-07 2020-10-30 兰州交通大学 Federal learning indoor positioning privacy protection method facing edge calculation
CN114757361A (en) * 2022-03-25 2022-07-15 中国铁道科学研究院集团有限公司 Multi-mode intermodal transportation data sharing method and system based on federal learning
CN115829063A (en) * 2022-12-07 2023-03-21 兰州交通大学 Wireless positioning differential privacy federation learning method based on dynamic privacy budget
WO2023040429A1 (en) * 2021-09-15 2023-03-23 京东科技信息技术有限公司 Data processing method, apparatus, and device for federated feature engineering, and medium
CN116595584A (en) * 2023-05-19 2023-08-15 西安体育学院 Physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568061B2 (en) * 2020-04-29 2023-01-31 Robert Bosch Gmbh Private model utility by minimizing expected loss under noise
CN114186694A (en) * 2021-11-16 2022-03-15 浙江大学 Efficient, safe and low-communication longitudinal federal learning method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Jun; Fang Guoying; Wu Nan. A Survey of Federated Learning Security and Privacy Protection. Journal of Xihua University (Natural Science Edition), 2020, (04), full text. *

Also Published As

Publication number Publication date
CN117579215A (en) 2024-02-20

Similar Documents

Publication Publication Date Title
Dhawan et al. Analysis of various data security techniques of steganography: A survey
Liu et al. Oblivious neural network predictions via minionn transformations
US20210004718A1 (en) Method and device for training a model based on federated learning
Denis et al. Hybrid data encryption model integrating multi-objective adaptive genetic algorithm for secure medical data communication over cloud-based healthcare systems
CN108520181A (en) data model training method and device
CN113221183B (en) Method, device and system for realizing privacy protection of multi-party collaborative update model
US11410081B2 (en) Machine learning with differently masked data in secure multi-party computing
Kalapaaking et al. Blockchain-based federated learning with SMPC model verification against poisoning attack for healthcare systems
CN112199706B (en) Tree model training method and business prediction method based on multi-party safety calculation
CN113051239A (en) Data sharing method, use method of model applying data sharing method and related equipment
Beguier et al. Safer: Sparse secure aggregation for federated learning
Li et al. Ubiquitous intelligent federated learning privacy-preserving scheme under edge computing
CN116561787A (en) Training method and device for visual image classification model and electronic equipment
CN113792890A (en) Model training method based on federal learning and related equipment
CN117579215B (en) Longitudinal federal learning differential privacy protection method and system based on tag sharing
CN116432040B (en) Model training method, device and medium based on federal learning and electronic equipment
CN117349685A (en) Clustering method, system, terminal and medium for communication data
CN116415267A (en) Iterative updating method, device and system for joint learning model and storage medium
CN117478305B (en) Fully homomorphic encryption method, system, terminal and medium based on two-party security cooperation
CN112765898B (en) Multi-task joint training model method, system, electronic equipment and storage medium
CN114912146B (en) Data information defense method and system under vertical federal architecture, electronic equipment and storage medium
US11962562B2 (en) Anonymous message board server verification
CN116451275B (en) Privacy protection method based on federal learning and computing equipment
Lee et al. On Vertically-Drifted First Arrival Position Distribution in Diffusion Channels
Tran et al. Secure Inference via Deep Learning as a Service without Privacy Leakage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant