CN115859367A - Multi-modal federated learning privacy protection method and system

Info

Publication number: CN115859367A (application number CN202310121251.4A)
Authority: CN (China)
Prior art keywords: data, word, text, image, client
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN115859367B (en)
Inventor: 李昕
Current assignee: Guangzhou Youkegu Technology Co ltd
Original assignee: Guangzhou Youkegu Technology Co ltd
Application filed by Guangzhou Youkegu Technology Co ltd
Priority date / filing date: 2023-02-16
Publication of CN115859367A: 2023-03-28; application granted and publication of CN115859367B: 2023-05-16

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Storage Device Security (AREA)
  • Image Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a privacy protection method and system for multi-modal federated learning, which comprises the following steps: for a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server; for a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server; for a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server.

Description

Multi-modal federated learning privacy protection method and system
Technical Field
The invention relates to the technical field of federated learning, and in particular to a multi-modal federated learning privacy protection method and system.
Background
With the advancement of national big data strategies, machine learning techniques built on big data have been widely applied in fields such as the Internet of Things and transportation. Data mining technology, led by deep learning, is continuously upgraded and iterated, making correlation analysis results more accurate and steadily expanding the applicable data types, which has given rise to various fusion analysis techniques typified by multi-modal learning. Academically, each source or form of information is referred to as a modality, including image, audio, text, and sensor data. Multi-modal learning processes and understands multi-source modal information through machine learning methods; the technique exploits the complementarity among multi-modal data to eliminate inter-modal redundancy and thus learn better feature representations. Multi-modal learning has been applied in fields such as autonomous driving, video analysis, and emotion recognition. However, multi-modal learning faces two core issues when promoted in big data applications. First, the traditional multi-modal learning paradigm requires the server to collect users' raw data for centralized training, but the raw data are closely tied to individual users and may directly contain sensitive information such as age and gender; worse, multi-modal learning can correlate the data to infer even more private information. Second, the participants in multi-modal learning are reluctant to share raw data directly, creating a data-silo problem: the central server cannot collect enough data, which hinders the development of multi-modal techniques.
In one prior-art scheme, a multi-modal federated learning model is designed to address the privacy-security and data-silo challenges of multi-modal learning: all modal data undergo modality alignment and modality fusion at the client, which submits the parameters of the multi-modal model to the server; however, this scheme requires the data in every client to be identically distributed and to contain all modalities. In a second prior-art scheme, an alignment, integration and mapping network is designed to realize a multi-modal federated learning framework: visual and textual features extracted from images are converted into fine-grained image representations through an attention mechanism, but the client uploads the image features directly to the server, so privacy and security cannot be guaranteed.
The defects of the prior art are as follows: 1) when the traditional federated learning architecture is applied to multi-modal federated learning, the data in every client are required to be identically distributed and to contain all modalities; this assumption is too strong, and client data with inconsistent modalities cannot participate in the federation; 2) when the server assists clients in aligning and fusing different modalities for multi-modal federated learning, direct data sharing is avoided, but the user's raw data can still be inferred from the uploaded features, so privacy and security cannot be guaranteed.
Disclosure of Invention
The invention provides a multi-modal federated learning privacy protection method, aiming at overcoming the technical defects of the prior art that the condition assumption of the federated learning architecture is too strong and that privacy security cannot be guaranteed.
In order to achieve this purpose, the technical scheme is as follows:
A privacy protection method for multi-modal federated learning comprises the following steps:
S1. The server publishes the training task to each client participating in training, wherein a client contains only image data, only text data, or both image data and text data;
S2. For a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server;
S3. For a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server;
S4. For a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server;
S5. The server uses a feature fusion network to learn the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients, obtaining a multi-modal model;
S6. The server publishes the multi-modal model to each client.
Preferably, for a client containing only image data, processing the image data with the generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data specifically comprises the following steps:
S21. The client generates a random vector R = (r_1, …, r_k) with a random generator, where k denotes the dimension of the random vector R; the random vector is input into the generator neural network of the generative adversarial network to obtain fake data d'_v;
S22. The client's image data d_v and the fake data d'_v are respectively input into the discriminator neural network of the generative adversarial network, which outputs M(d_v) and M(d'_v), where M(d_v) and M(d'_v) denote the results output by the discriminator neural network; if M(d_v) and M(d'_v) satisfy the following condition, the fake data d'_v are output and step S24 is executed; otherwise step S23 is executed:
γ ≤ Pr[y = M(d_v)] / Pr[y = M(d'_v)] ≤ 1/γ
wherein γ is a privacy parameter, and Pr[y = M(d_v)] and Pr[y = M(d'_v)] denote the probabilities that the discriminator neural network outputs the same result y for the real image data d_v and the fake data d'_v respectively;
S23. The discriminator adds (ε, δ)-differential privacy protection to the gradient θ and returns it to the generator neural network of the generative adversarial network; the generator neural network regenerates fake data d'_v, and step S22 is executed again;
S24. The output fake data d'_v are input into a CNN network to obtain the image features F_v of the image data.
Preferably, the discriminator adds (ε, δ)-differential privacy protection to the gradient θ, specifically:
Pr[R(θ) ∈ S] ≤ e^ε · Pr[R(θ') ∈ S] + δ
wherein ε is a first privacy budget and δ is a second privacy budget; R() is a first perturbation function; S denotes a set of perturbation results obtained after the gradient θ is perturbed; Pr[R(θ) ∈ S] denotes the probability that R(θ) falls in S; and θ' denotes any set of gradient parameters within the neighborhood of θ.
Preferably, for a client containing only text data, processing the text data with the sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t specifically comprises the following steps:
S31. The client constructs a sensitive-attribute dictionary D_Attr;
S32. A candidate-word dictionary D_Cand is generated with a synonym lexicon, and the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr is calculated;
S33. All sensitive words in the text data are replaced with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy;
S34. For each word W_i in the text data after sensitive-word replacement, a vector w_i = Embed(W_i) is obtained using word embedding, and the vectors w_i are input into an LSTM network to obtain the text features F_t.
Preferably, calculating the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr specifically comprises:
d(vec_1, vec_2) = sqrt((x_1 - y_1)^2 + … + (x_n - y_n)^2)
wherein vec_1 and vec_2 are the vectors of a word in the candidate-word dictionary D_Cand and a word in the sensitive-attribute dictionary D_Attr respectively, vec_1 = (x_1, …, x_n), vec_2 = (y_1, …, y_n), x_i and y_i are the i-th components of vec_1 and vec_2 respectively, i ∈ [1, n], and n denotes the dimension of the word vectors.
Preferably, replacing all sensitive words in the text data with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy, specifically comprises:
Pr[G(x) = y] ≤ e^ε · Pr[G(x') = y]
wherein G() denotes a second perturbation function; x denotes a sensitive word and x' denotes a candidate word; G(x) = y denotes the result y obtained by passing the sensitive word x through the perturbation function G(); Pr[G(x) = y] denotes the probability that G(x) = y. In the second perturbation function, each input sensitive word x is kept unchanged with probability p and replaced, with probability q, by a word from the candidate-word dictionary D_Cand; the perturbation probability can be described as:
Pr[G(x) = y] = p, if y = x;  Pr[G(x) = y] = q, if y ≠ x and y ∈ D_Cand
where y = x indicates that the word is not replaced, and K denotes the size of each word's candidate dictionary, which consists of the top a candidate words with the shortest Euclidean distance to that word.
Preferably, for the image data and text data, the loss functions of the first autoencoder and the second autoencoder are described as follows:
L = L_v + L_t + λ·L_c,  where L_v = dist(X_v, X'_v),  L_t = dist(X_t, X'_t),  L_c = -tr(U^T f_v(X_v) f_t(X_t)^T V)
wherein λ is a weight parameter; L_v is the loss function of the first autoencoder; L_t is the loss function of the second autoencoder; L_c is the correlation loss function measuring the correlation between the image modality and the text modality; X_v is the image data and X_t is the text data; X'_v and X'_t are obtained after the image data and text data pass through the first autoencoder and second autoencoder respectively; dist() is a distance metric function between the raw data and the generated data; f_v and f_t denote the nonlinear feature extractors of the image modality and the text modality respectively; tr() denotes the matrix trace operation; U is a matrix representation of the latent space of the image modality, and V is a matrix representation of the latent space of the text modality.
Preferably, the server learns the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients with a feature fusion network, specifically expressed as:
F_m = Fusion(F_v, F_t, F'_v, F'_t)
wherein Fusion() is the feature fusion network and F_m is the multi-modal feature.
Meanwhile, the invention also provides a multi-modal federated learning privacy protection system; the specific scheme is as follows:
A multi-modal federated learning privacy protection system comprises a server and a plurality of clients; when the privacy protection system performs privacy protection, it executes the method steps of the above multi-modal federated learning privacy protection method.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the multi-modal federated learning privacy protection method provided by the invention, different multi-modal pre-training schemes are adopted for clients with different modalities, and a client is not required to contain all modal data; this better matches the requirements of real scenarios and makes the method more practical.
(2) In the multi-modal federated learning privacy protection method, the features obtained by client-side training are privacy-protected before being uploaded to the server, which ensures that the server cannot infer the user's private information from the uploaded features and improves the privacy and security of each participant's data.
(3) For a client containing only image data, the invention processes the image data with a differentially private generative adversarial network algorithm (DPGAN algorithm), obtains fake features that are not similar to the original image features, and uploads them to the server for aggregation, which effectively prevents privacy leakage of the original image data while preserving the usability of the image features.
(4) For a client containing only text data, the method processes the text data with a sensitive-word replacement algorithm based on localized differential privacy (UTLDP algorithm), replaces the sensitive words contained in the text, and then uploads the text features obtained by the LSTM network to the server for aggregation, which effectively prevents privacy leakage of the original text data while preserving the usability of the text features.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic diagram of a specific implementation of the multi-modal federated learning privacy protection method.
Fig. 2 is a schematic diagram of the extraction of the image features F'_v and text features F'_t for a client containing both image data and text data.
Fig. 3 is a schematic diagram of an implementation of the generative adversarial network algorithm based on differential privacy.
Fig. 4 is a schematic diagram of an implementation of the sensitive-word replacement algorithm based on localized differential privacy.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Fig. 1 is a schematic implementation diagram of the multi-modal federated learning privacy protection method provided by the present invention. As shown in Fig. 1, the method comprises the following steps (a minimal dispatch sketch in Python follows the step list):
S1. The server publishes the training task to each client participating in training, wherein a client contains only image data, only text data, or both image data and text data;
S2. For a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server;
S3. For a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server;
S4. For a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server;
S5. The server uses a feature fusion network to learn the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients, obtaining a multi-modal model;
S6. The server publishes the multi-modal model to each client.
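As a reading aid, the following minimal Python sketch shows how one training round could dispatch on a client's available modalities; the callables image_fn, text_fn, joint_fn and server_fusion_fn are hypothetical placeholders for the client-side algorithms of steps S2-S4 and the server-side fusion of step S5, and are not part of the patent text.

```python
from typing import Any, Callable, Dict, List

def upload_client_features(
    client: Dict[str, Any],
    image_fn: Callable,   # step S2: DP-GAN based image feature extraction
    text_fn: Callable,    # step S3: LDP sensitive-word replacement + LSTM features
    joint_fn: Callable,   # step S4: autoencoder alignment + epsilon-Laplace noise
) -> Dict[str, Any]:
    """Return the privacy-protected features a single client uploads (steps S2-S4)."""
    has_images = client.get("images") is not None
    has_text = client.get("texts") is not None
    if has_images and not has_text:
        return {"F_v": image_fn(client["images"])}
    if has_text and not has_images:
        return {"F_t": text_fn(client["texts"])}
    F_v_noisy, F_t_noisy = joint_fn(client["images"], client["texts"])
    return {"F_v_noisy": F_v_noisy, "F_t_noisy": F_t_noisy}

def federated_round(clients: List[Dict[str, Any]], server_fusion_fn: Callable,
                    image_fn: Callable, text_fn: Callable, joint_fn: Callable):
    """One round: clients upload protected features (S2-S4); the server fuses them into
    a multi-modal model (S5), which is then published back to the clients (S6)."""
    uploads = [upload_client_features(c, image_fn, text_fn, joint_fn) for c in clients]
    return server_fusion_fn(uploads)
```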
In a specific implementation process, as shown in Fig. 3, for a client containing only image data, the image data are processed with the generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data; this specifically comprises the following steps (a PyTorch sketch of these steps is given after step S24):
S21. The client generates a random vector R = (r_1, …, r_k) with a random generator, where k denotes the dimension of the random vector R; the random vector is input into the generator neural network of the generative adversarial network to obtain fake data d'_v;
S22. The client's image data d_v and the fake data d'_v are respectively input into the discriminator neural network of the generative adversarial network, which outputs M(d_v) and M(d'_v), where M(d_v) and M(d'_v) denote the results output by the discriminator neural network; if M(d_v) and M(d'_v) satisfy the following condition, the fake data d'_v are output and step S24 is executed; otherwise step S23 is executed:
γ ≤ Pr[y = M(d_v)] / Pr[y = M(d'_v)] ≤ 1/γ
wherein γ is a privacy parameter, and Pr[y = M(d_v)] and Pr[y = M(d'_v)] denote the probabilities that the discriminator neural network outputs the same result y for the real image data d_v and the fake data d'_v respectively; the closer γ is to 1, the less the input data can be distinguished from the output results, i.e. the stronger the indistinguishability of the input data;
S23. The discriminator adds (ε, δ)-differential privacy protection to the gradient θ and returns it to the generator neural network of the generative adversarial network; the generator neural network regenerates fake data d'_v, and step S22 is executed again;
S24. The output fake data d'_v are input into a CNN network to obtain the image features F_v of the image data.
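The following PyTorch sketch illustrates steps S21-S24 under several assumptions that are not fixed by the patent: images are treated as flattened 28x28 grayscale vectors, the generator, discriminator and CNN are small illustrative networks, and the γ condition is approximated by comparing mean discriminator scores on real and fake batches. The DP-protected gradient step of S23 is sketched separately after the next passage.

```python
import torch
import torch.nn as nn

D, K_NOISE = 784, 64   # flattened 28x28 image size and random-vector dimension (assumed)

generator = nn.Sequential(nn.Linear(K_NOISE, 256), nn.ReLU(), nn.Linear(256, D), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
feature_cnn = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(8 * 16, 32))

def indistinguishable(d_real: torch.Tensor, d_fake: torch.Tensor, gamma: float) -> bool:
    """Accept the fake data when the discriminator assigns similar scores to real and fake
    inputs; this is a practical stand-in for the gamma condition on output probabilities."""
    p_real, p_fake = discriminator(d_real).mean(), discriminator(d_fake).mean()
    ratio = (p_real / p_fake).item()
    return gamma <= ratio <= 1.0 / gamma

def client_image_features(d_real: torch.Tensor, gamma: float = 0.9, max_rounds: int = 100):
    """S21-S24: generate DP-protected fake data, then extract CNN features F_v from it."""
    for _ in range(max_rounds):
        z = torch.randn(d_real.shape[0], K_NOISE)        # S21: random vector R
        d_fake = generator(z)                            # S21: fake data d'_v
        if indistinguishable(d_real, d_fake, gamma):     # S22: accept, or keep training
            break
        # S23 would update the generator here using a DP-protected discriminator gradient.
    images = d_fake.detach().view(-1, 1, 28, 28)         # assumes 28x28 grayscale images
    return feature_cnn(images)                            # S24: image features F_v
```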
In a specific implementation process, the discriminator adds (ε, δ)-differential privacy protection to the gradient θ, specifically:
Pr[R(θ) ∈ S] ≤ e^ε · Pr[R(θ') ∈ S] + δ
wherein ε is a first privacy budget and δ is a second privacy budget; R() is a first perturbation function; S denotes a set of perturbation results obtained after the gradient θ is perturbed; Pr[R(θ) ∈ S] denotes the probability that R(θ) falls in S; and θ' denotes any set of gradient parameters within the neighborhood of θ. ε controls the privacy protection level: the smaller ε is, the stronger the privacy protection provided. δ denotes the tolerable probability that the privacy budget exceeds ε.
The gradient θ is the derivative of the objective function during neural network training; the gradient values reflect how the input data should change to improve model accuracy and optimize the objective value. Because differential privacy is closed under post-processing, adding differential privacy to the gradient provides theoretically guaranteed differential privacy protection for the input data. Therefore the synthesized fake data can capture the rich semantics of the original data while still satisfying the differential privacy mechanism.
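The patent states only that (ε, δ)-differential privacy is added to the discriminator gradient θ; the sketch below uses the common DP-SGD-style instantiation, clipping the gradient to bound its L2 sensitivity and adding Gaussian noise with sigma = sqrt(2·ln(1.25/δ))·C/ε, which is an assumption rather than the patented mechanism.

```python
import math
import torch

def dp_protect_gradient(grad: torch.Tensor, epsilon: float, delta: float,
                        clip_norm: float = 1.0) -> torch.Tensor:
    """Clip the gradient and add Gaussian noise calibrated to (epsilon, delta).

    The clip-and-add-Gaussian-noise recipe and the sigma formula are the usual DP-SGD
    instantiation, used here as an assumption; the patent does not fix the mechanism.
    """
    grad_norm = grad.norm(p=2).clamp(min=1e-12)
    clipped = grad * (clip_norm / grad_norm).clamp(max=1.0)   # bound the L2 sensitivity
    sigma = clip_norm * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return clipped + torch.randn_like(clipped) * sigma

# Example: protect one parameter's gradient before returning it to the generator (step S23).
theta_grad = torch.randn(256)
protected = dp_protect_gradient(theta_grad, epsilon=1.0, delta=1e-5)
```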
In a specific implementation process, as shown in Fig. 4, for a client containing only text data, the text data are processed with the sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t; this specifically comprises the following steps:
S31. The client constructs a sensitive-attribute dictionary D_Attr, including user names, gender, sensitive locations, sensitive verbs, sensitive nouns, and the like;
S32. A candidate-word dictionary D_Cand is generated with a synonym lexicon, and the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr is calculated;
S33. All sensitive words in the text data are replaced with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy;
S34. For each word W_i in the text data after sensitive-word replacement, a vector w_i = Embed(W_i) is obtained using word embedding, and the vectors w_i are input into an LSTM network to obtain the text features F_t.
Word embedding (Embed) is a vector representation of words: a high-dimensional space whose dimension equals the number of all words is embedded into a low-dimensional continuous vector space, and each word or phrase is mapped to a vector over the real numbers. w_i = Embed(W_i) denotes the low-dimensional vector w_i obtained by embedding W_i into the low-dimensional continuous vector space.
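A minimal PyTorch sketch of step S34 follows; the vocabulary size, embedding dimension and feature dimension are illustrative assumptions, and the mapping from the (replaced) words to integer ids is taken as given.

```python
import torch
import torch.nn as nn

class TextFeatureExtractor(nn.Module):
    """Step S34: embed the perturbed words (w_i = Embed(W_i)) and run them through an LSTM."""

    def __init__(self, vocab_size: int = 10000, embed_dim: int = 100, feat_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word embedding layer
        self.lstm = nn.LSTM(embed_dim, feat_dim, batch_first=True)

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        """word_ids: (batch, sequence_length) integer ids of the replaced words."""
        vectors = self.embed(word_ids)                     # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(vectors)
        return h_n[-1]                                     # text features F_t: (batch, feat_dim)

# Example usage on a toy batch of two sentences of five word ids each.
extractor = TextFeatureExtractor()
F_t = extractor(torch.randint(0, 10000, (2, 5)))
```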
In a specific implementation process, calculating the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr specifically comprises:
d(vec_1, vec_2) = sqrt((x_1 - y_1)^2 + … + (x_n - y_n)^2)
wherein vec_1 and vec_2 are the vectors of a word in the candidate-word dictionary D_Cand and a word in the sensitive-attribute dictionary D_Attr respectively, vec_1 = (x_1, …, x_n), vec_2 = (y_1, …, y_n), x_i and y_i are the i-th components of vec_1 and vec_2 respectively, i ∈ [1, n], and n denotes the dimension of the word vectors.
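The sketch below (NumPy) computes this Euclidean distance and builds the per-word candidate dictionary of the a nearest candidates used in step S33; the toy word vectors are illustrative, and the synonym lexicon supplying D_Cand is assumed to be given.

```python
import numpy as np

def euclidean(vec_1: np.ndarray, vec_2: np.ndarray) -> float:
    """Euclidean distance between two n-dimensional word vectors (the formula above)."""
    return float(np.sqrt(np.sum((vec_1 - vec_2) ** 2)))

def build_candidate_dicts(sensitive_vecs: dict, candidate_vecs: dict, a: int = 5) -> dict:
    """For each sensitive word in D_Attr, keep the `a` candidate words from D_Cand with the
    shortest Euclidean distance (the per-word candidate dictionary of size K used in S33).

    `sensitive_vecs` / `candidate_vecs` map words to their embedding vectors; in practice
    the candidates come from a synonym lexicon, which is assumed to be available here.
    """
    per_word = {}
    for word, wv in sensitive_vecs.items():
        ranked = sorted(candidate_vecs, key=lambda c: euclidean(wv, candidate_vecs[c]))
        per_word[word] = ranked[:a]
    return per_word

# Toy example with 3-dimensional word vectors.
D_Attr = {"alice": np.array([0.1, 0.2, 0.3])}
D_Cand = {"user": np.array([0.1, 0.2, 0.25]), "person": np.array([0.5, 0.1, 0.0]),
          "somebody": np.array([0.0, 0.3, 0.3])}
print(build_candidate_dicts(D_Attr, D_Cand, a=2))
```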
In a specific implementation process, all sensitive words in the text data are replaced with candidate words, and the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy; specifically:
Pr[G(x) = y] ≤ e^ε · Pr[G(x') = y]
wherein G() denotes a second perturbation function; x denotes a sensitive word and x' denotes a candidate word; G(x) = y denotes the result y obtained by passing the sensitive word x through the perturbation function G(); Pr[G(x) = y] denotes the probability that G(x) = y. In the second perturbation function, each input sensitive word x is kept unchanged with probability p and replaced, with probability q, by a word from the candidate-word dictionary D_Cand; the perturbation probability can be described as:
Pr[G(x) = y] = p, if y = x;  Pr[G(x) = y] = q, if y ≠ x and y ∈ D_Cand
wherein y = x indicates that the word is not replaced, and K denotes the size of each word's candidate dictionary, which consists of the top a candidate words with the shortest Euclidean distance to that word. The perturbation function G() replaces the sensitive word x with a certain probability q and leaves it unchanged with probability p. From the word replacement probability q it can be seen that candidate words with smaller Euclidean distance are more likely to be chosen as replacements. Through this perturbation an attacker cannot judge whether a sensitive word has been replaced, while the original text information is preserved to the greatest extent, so the text feature information is better retained.
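The following sketch implements the replacement step with standard K-ary randomized response; the concrete probabilities p = e^ε / (e^ε + K - 1) and q = 1 / (e^ε + K - 1), and the choice to count the word itself in the response domain, are assumed instantiations, since the patent fixes only the p/q structure and the ε-local-differential-privacy constraint.

```python
import math
import random

def replace_sensitive_word(word: str, candidates: list, epsilon: float) -> str:
    """Perturb one sensitive word with K-ary randomized response over {word} plus candidates.

    The standard randomized-response constants p = e^eps / (e^eps + K - 1) and
    q = 1 / (e^eps + K - 1) used here are an assumed instantiation.
    """
    K = len(candidates) + 1                       # the word itself plus its candidates (assumed)
    p = math.exp(epsilon) / (math.exp(epsilon) + K - 1)
    if random.random() < p:
        return word                               # kept unchanged with probability p
    return random.choice(candidates)              # otherwise replaced by a candidate word

def perturb_text(words: list, candidate_dict: dict, epsilon: float) -> list:
    """Apply the replacement to every sensitive word in a token list (step S33)."""
    return [replace_sensitive_word(w, candidate_dict[w], epsilon) if w in candidate_dict else w
            for w in words]

# Toy example reusing a per-word candidate dictionary like the one built above.
print(perturb_text(["alice", "went", "home"], {"alice": ["user", "somebody"]}, epsilon=1.0))
```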
In a specific implementation process, as shown in Fig. 2, for the image data and text data, the loss functions of the first autoencoder and the second autoencoder are described as follows:
L = L_v + L_t + λ·L_c,  where L_v = dist(X_v, X'_v),  L_t = dist(X_t, X'_t),  L_c = -tr(U^T f_v(X_v) f_t(X_t)^T V)
wherein λ is a weight parameter; L_v is the loss function of the first autoencoder; L_t is the loss function of the second autoencoder; L_c is the correlation loss function measuring the correlation between the image modality and the text modality; X_v is the image data and X_t is the text data; X'_v and X'_t are obtained after the image data and text data pass through the first autoencoder and second autoencoder respectively; dist() is a distance metric function between the raw data and the generated data; f_v and f_t denote the nonlinear feature extractors of the image modality and the text modality respectively; tr() denotes the matrix trace operation; U is a matrix representation of the latent space of the image modality, and V is a matrix representation of the latent space of the text modality.
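A compact PyTorch sketch of step S4 follows: two autoencoders produce the intermediate-layer features, ε-Laplace noise is added before upload, and training minimizes L = L_v + L_t + λ·L_c. The layer sizes, the use of MSE for dist(), the unit sensitivity of the Laplace mechanism and the exact trace-based form of L_c are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf
from torch.distributions import Laplace

class AutoEncoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)            # intermediate-layer features (F'_v or F'_t before noise)
        return z, self.decoder(z)

def add_laplace_noise(features: torch.Tensor, epsilon: float, sensitivity: float = 1.0):
    """epsilon-Laplace noise with scale sensitivity / epsilon (sensitivity assumed to be 1)."""
    noise = Laplace(0.0, sensitivity / epsilon).sample(features.shape)
    return features + noise

def joint_loss(X_v, X_t, ae_v, ae_t, U, V, lam: float = 0.1):
    """L = L_v + L_t + lambda * L_c: reconstruction losses plus an assumed trace-based L_c."""
    Z_v, X_v_rec = ae_v(X_v)
    Z_t, X_t_rec = ae_t(X_t)
    L_v = nnf.mse_loss(X_v_rec, X_v)                     # dist(X_v, X'_v)
    L_t = nnf.mse_loss(X_t_rec, X_t)                     # dist(X_t, X'_t)
    L_c = -torch.trace(U.T @ Z_v.T @ Z_t @ V) / X_v.shape[0]   # cross-modal correlation (assumed form)
    return L_v + L_t + lam * L_c, Z_v, Z_t

# Toy usage: 8 samples, 784-dim image vectors, 100-dim text vectors, 32-dim latent space.
ae_v, ae_t = AutoEncoder(784), AutoEncoder(100)
U, V = torch.randn(32, 16), torch.randn(32, 16)
loss, Z_v, Z_t = joint_loss(torch.randn(8, 784), torch.randn(8, 100), ae_v, ae_t, U, V)
F_v_noisy = add_laplace_noise(Z_v.detach(), epsilon=1.0)   # noised features uploaded to the server
F_t_noisy = add_laplace_noise(Z_t.detach(), epsilon=1.0)
```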
In a specific implementation process, the server learns the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients with the feature fusion network, specifically expressed as:
F_m = Fusion(F_v, F_t, F'_v, F'_t)
wherein Fusion() is the feature fusion network and F_m is the multi-modal feature.
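The sketch below shows one way the server-side fusion of step S5 could look: the uploaded feature vectors are concatenated and passed through a small multi-layer perceptron to produce the multi-modal feature F_m. The concatenation-plus-MLP design and the feature dimensions are illustrative assumptions; the patent does not specify the architecture of Fusion().

```python
import torch
import torch.nn as nn

class FusionNetwork(nn.Module):
    """Assumed fusion network: F_m = Fusion(F_v, F_t, F'_v, F'_t) via concatenation + MLP."""

    def __init__(self, dims=(32, 64, 32, 32), out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sum(dims), 128), nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, F_v, F_t, F_v_noisy, F_t_noisy):
        # Concatenate the four uploaded feature types and map them to the multi-modal feature F_m.
        return self.net(torch.cat([F_v, F_t, F_v_noisy, F_t_noisy], dim=1))

# Toy usage with one batch of 8 aggregated client uploads.
fusion = FusionNetwork()
F_m = fusion(torch.randn(8, 32), torch.randn(8, 64), torch.randn(8, 32), torch.randn(8, 32))
```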
Example 2
This embodiment provides a multi-modal federated learning privacy protection system, as shown in Fig. 1; the specific scheme is as follows:
A multi-modal federated learning privacy protection system comprises a server and a plurality of clients; when the privacy protection system performs privacy protection, it executes the method steps of the multi-modal federated learning privacy protection method of Embodiment 1.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A multi-modal federated learning privacy protection method, characterized in that the method comprises the following steps:
S1. The server publishes the training task to each client participating in training, wherein a client contains only image data, only text data, or both image data and text data;
S2. For a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server;
S3. For a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server;
S4. For a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server;
S5. The server uses a feature fusion network to learn the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients, obtaining a multi-modal model;
S6. The server publishes the multi-modal model to each client.
2. The multi-modal federated learning privacy protection method of claim 1, wherein: for a client containing only image data, processing the image data with the generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data specifically comprises the following steps:
S21. The client generates a random vector R = (r_1, …, r_k) with a random generator, where k denotes the dimension of the random vector R; the random vector is input into the generator neural network of the generative adversarial network to obtain fake data d'_v;
S22. The client's image data d_v and the fake data d'_v are respectively input into the discriminator neural network of the generative adversarial network, which outputs M(d_v) and M(d'_v), where M(d_v) and M(d'_v) denote the results output by the discriminator neural network; if M(d_v) and M(d'_v) satisfy the following condition, the fake data d'_v are output and step S24 is executed; otherwise step S23 is executed:
γ ≤ Pr[y = M(d_v)] / Pr[y = M(d'_v)] ≤ 1/γ
wherein γ is a privacy parameter, and Pr[y = M(d_v)] and Pr[y = M(d'_v)] denote the probabilities that the discriminator neural network outputs the same result y for the real image data d_v and the fake data d'_v respectively;
S23. The discriminator adds (ε, δ)-differential privacy protection to the gradient θ and returns it to the generator neural network of the generative adversarial network; the generator neural network regenerates fake data d'_v, and step S22 is executed again;
S24. The output fake data d'_v are input into a CNN network to obtain the image features F_v of the image data.
3. The multi-modal federated learning privacy protection method of claim 2, wherein: the discriminator adds (ε, δ)-differential privacy protection to the gradient θ, specifically:
Pr[R(θ) ∈ S] ≤ e^ε · Pr[R(θ') ∈ S] + δ
wherein ε is a first privacy budget and δ is a second privacy budget; R() is a first perturbation function; S denotes a set of perturbation results obtained after the gradient θ is perturbed; Pr[R(θ) ∈ S] denotes the probability that R(θ) falls in S; and θ' denotes any set of gradient parameters within the neighborhood of θ.
4. The multi-modal federated learning privacy protection method of claim 1, wherein: for a client containing only text data, processing the text data with the sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t specifically comprises the following steps:
S31. The client constructs a sensitive-attribute dictionary D_Attr;
S32. A candidate-word dictionary D_Cand is generated with a synonym lexicon, and the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr is calculated;
S33. All sensitive words in the text data are replaced with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy;
S34. For each word W_i in the text data after sensitive-word replacement, a vector w_i = Embed(W_i) is obtained using word embedding, and the vectors w_i are input into an LSTM network to obtain the text features F_t.
5. The multi-modal federated learning privacy protection method of claim 4, wherein: calculating the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr specifically comprises:
d(vec_1, vec_2) = sqrt((x_1 - y_1)^2 + … + (x_n - y_n)^2)
wherein vec_1 and vec_2 are the vectors of a word in the candidate-word dictionary D_Cand and a word in the sensitive-attribute dictionary D_Attr respectively, vec_1 = (x_1, …, x_n), vec_2 = (y_1, …, y_n), x_i and y_i are the i-th components of vec_1 and vec_2 respectively, i ∈ [1, n], and n denotes the dimension of the word vectors.
6. The multi-modal federated learning privacy protection method of claim 5, wherein: replacing all sensitive words in the text data with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy, specifically comprises:
Pr[G(x) = y] ≤ e^ε · Pr[G(x') = y]
wherein G() denotes a second perturbation function; x denotes a sensitive word and x' denotes a candidate word; G(x) = y denotes the result y obtained by passing the sensitive word x through the perturbation function G(); Pr[G(x) = y] denotes the probability that G(x) = y. In the second perturbation function, each input sensitive word x is kept unchanged with probability p and replaced, with probability q, by a word from the candidate-word dictionary D_Cand; the perturbation probability can be described as:
Pr[G(x) = y] = p, if y = x;  Pr[G(x) = y] = q, if y ≠ x and y ∈ D_Cand
where y = x indicates that the word is not replaced, and K denotes the size of each word's candidate dictionary, which consists of the top a candidate words with the shortest Euclidean distance to that word.
7. The multi-modal federated learning privacy protection method according to any one of claims 1-6, wherein: for the image data and text data, the loss functions of the first autoencoder and the second autoencoder are described as follows:
L = L_v + L_t + λ·L_c,  where L_v = dist(X_v, X'_v),  L_t = dist(X_t, X'_t),  L_c = -tr(U^T f_v(X_v) f_t(X_t)^T V)
wherein λ is a weight parameter; L_v is the loss function of the first autoencoder; L_t is the loss function of the second autoencoder; L_c is the correlation loss function measuring the correlation between the image modality and the text modality; X_v is the image data and X_t is the text data; X'_v and X'_t are obtained after the image data and text data pass through the first autoencoder and second autoencoder respectively; dist() is a distance metric function between the raw data and the generated data; f_v and f_t denote the nonlinear feature extractors of the image modality and the text modality respectively; tr() denotes the matrix trace operation; U is a matrix representation of the latent space of the image modality, and V is a matrix representation of the latent space of the text modality.
8. The multi-modal federated learning privacy protection method of claim 7, wherein: the server learns the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients with a feature fusion network, specifically expressed as:
F_m = Fusion(F_v, F_t, F'_v, F'_t)
wherein Fusion() is the feature fusion network and F_m is the multi-modal feature.
9. A multi-modal federated learning privacy protection system, characterized in that the system comprises a server and a plurality of clients; when the privacy protection system performs privacy protection, it executes the method steps of the multi-modal federated learning privacy protection method of any one of claims 1-8.
CN202310121251.4A 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning Active CN115859367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310121251.4A CN115859367B (en) 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310121251.4A CN115859367B (en) 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning

Publications (2)

Publication Number Publication Date
CN115859367A true CN115859367A (en) 2023-03-28
CN115859367B CN115859367B (en) 2023-05-16

Family

ID=85658185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310121251.4A Active CN115859367B (en) 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning

Country Status (1)

Country Link
CN (1) CN115859367B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595587A (en) * 2023-07-14 2023-08-15 江西通友科技有限公司 Document steganography method and document management method based on secret service
CN118228196A (en) * 2024-05-22 2024-06-21 徐州医科大学 Federal multi-mode data mining method and system based on multi-security policy

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966883A (en) * 2020-08-13 2020-11-20 成都考拉悠然科技有限公司 Zero sample cross-mode retrieval method combining automatic encoder and generation countermeasure network
US20220083911A1 (en) * 2019-01-18 2022-03-17 Huawei Technologies Co., Ltd. Enhanced Privacy Federated Learning System
CN114861817A (en) * 2022-05-26 2022-08-05 中国海洋大学 Multi-source heterogeneous data fusion method based on federal learning
US20220327809A1 (en) * 2021-07-12 2022-10-13 Beijing Baidu Netcom Science Technology Co., Ltd. Method, device and storage medium for training model based on multi-modal data joint learning
US20220398343A1 (en) * 2021-06-10 2022-12-15 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Dynamic differential privacy to federated learning systems
WO2022257730A1 (en) * 2021-06-11 2022-12-15 支付宝(杭州)信息技术有限公司 Methods and apparatus for multiple parties to collaboratively update model while protecting privacy, and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220083911A1 (en) * 2019-01-18 2022-03-17 Huawei Technologies Co., Ltd. Enhanced Privacy Federated Learning System
CN111966883A (en) * 2020-08-13 2020-11-20 成都考拉悠然科技有限公司 Zero sample cross-mode retrieval method combining automatic encoder and generation countermeasure network
US20220398343A1 (en) * 2021-06-10 2022-12-15 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Dynamic differential privacy to federated learning systems
WO2022257730A1 (en) * 2021-06-11 2022-12-15 支付宝(杭州)信息技术有限公司 Methods and apparatus for multiple parties to collaboratively update model while protecting privacy, and system
US20220327809A1 (en) * 2021-07-12 2022-10-13 Beijing Baidu Netcom Science Technology Co., Ltd. Method, device and storage medium for training model based on multi-modal data joint learning
CN114861817A (en) * 2022-05-26 2022-08-05 中国海洋大学 Multi-source heterogeneous data fusion method based on federal learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谭作文 (Tan Zuowen) et al.: "A Survey of Research on Privacy Protection in Machine Learning" (机器学习隐私保护研究综述) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595587A (en) * 2023-07-14 2023-08-15 江西通友科技有限公司 Document steganography method and document management method based on secret service
CN116595587B (en) * 2023-07-14 2023-09-22 江西通友科技有限公司 Document steganography method and document management method based on secret service
CN118228196A (en) * 2024-05-22 2024-06-21 徐州医科大学 Federal multi-mode data mining method and system based on multi-security policy

Also Published As

Publication number Publication date
CN115859367B (en) 2023-05-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant