CN115859367A - Multi-modal federated learning privacy protection method and system

Info

Publication number: CN115859367A (application number CN202310121251.4A)
Authority: CN (China)
Prior art keywords: data, word, text, image, client
Legal status: Granted; currently active
Other languages: Chinese (zh)
Other versions: CN115859367B (en)
Inventor: 李昕
Current assignee: Guangzhou Youkegu Technology Co ltd
Original assignee: Guangzhou Youkegu Technology Co ltd
Application filed by Guangzhou Youkegu Technology Co ltd
Priority date / filing date: 2023-02-16
Publication of CN115859367A: 2023-03-28; application granted and publication of CN115859367B: 2023-05-16

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Storage Device Security (AREA)
  • Image Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a privacy protection method and system for multi-modal federated learning, which comprises the following steps: for a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server; for a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server; for a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server.

Description

Multi-modal federated learning privacy protection method and system
Technical Field
The invention relates to the technical field of federated learning, and in particular to a multi-modal federated learning privacy protection method and system.
Background
With the advancement of national big data strategies, machine learning techniques built on big data have been widely applied in fields such as the Internet of Things and transportation. Data mining technology, led by deep learning, is continuously upgraded and iterated, making correlation analysis results more accurate and steadily expanding the applicable data types, which has given rise to various fusion analysis techniques typified by multi-modal learning. Academically, each source or form of information is referred to as a modality, including image, audio, text, and sensor data. Multi-modal learning processes and understands multi-source modal information through machine learning methods; the technique exploits the complementarity among multi-modal data to eliminate inter-modal redundancy and thus learn better feature representations. Multi-modal learning has been applied in fields such as autonomous driving, video analysis, and emotion recognition. However, multi-modal learning faces two core issues when promoted in big data applications. First, the traditional multi-modal learning paradigm requires the server to collect users' raw data for centralized training, but the raw data are closely tied to individual users and may directly contain sensitive information such as age and gender; worse, multi-modal learning can correlate the data to infer even more private information. Second, the participants in multi-modal learning are reluctant to share raw data directly, creating a data-silo problem: the central server cannot collect enough data, which hinders the development of multi-modal techniques.
In one prior-art scheme, a multi-modal federated learning model is designed to address the privacy-security and data-silo challenges of multi-modal learning: all modal data undergo modality alignment and modality fusion at the client, which submits the parameters of the multi-modal model to the server; however, this scheme requires the data in every client to be identically distributed and to contain all modalities. In a second prior-art scheme, an alignment, integration and mapping network is designed to realize a multi-modal federated learning framework: visual and textual features extracted from images are converted into fine-grained image representations through an attention mechanism, but the client uploads the image features directly to the server, so privacy and security cannot be guaranteed.
The defects of the prior art are as follows: 1) when the traditional federated learning architecture is applied to multi-modal federated learning, the data in every client are required to be identically distributed and to contain all modalities; this assumption is too strong, and client data with inconsistent modalities cannot participate in the federation; 2) when the server assists clients in aligning and fusing different modalities for multi-modal federated learning, direct data sharing is avoided, but the user's raw data can still be inferred from the uploaded features, so privacy and security cannot be guaranteed.
Disclosure of Invention
The invention provides a multi-modal federated learning privacy protection method, aiming at overcoming the technical defects of the prior art that the condition assumption of the federated learning architecture is too strong and that privacy security cannot be guaranteed.
In order to achieve this purpose, the technical scheme is as follows:
A privacy protection method for multi-modal federated learning comprises the following steps:
S1. The server publishes the training task to each client participating in training, wherein a client contains only image data, only text data, or both image data and text data;
S2. For a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server;
S3. For a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server;
S4. For a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server;
S5. The server uses a feature fusion network to learn the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients, obtaining a multi-modal model;
S6. The server publishes the multi-modal model to each client.
Preferably, for a client containing only image data, processing the image data with the generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data specifically comprises the following steps:
S21. The client generates a random vector R = (r_1, …, r_k) with a random generator, where k denotes the dimension of the random vector R; the random vector is input into the generator neural network of the generative adversarial network to obtain fake data d'_v;
S22. The client's image data d_v and the fake data d'_v are respectively input into the discriminator neural network of the generative adversarial network, which outputs M(d_v) and M(d'_v), where M(d_v) and M(d'_v) denote the results output by the discriminator neural network; if M(d_v) and M(d'_v) satisfy the following condition, the fake data d'_v are output and step S24 is executed; otherwise step S23 is executed:
γ ≤ Pr[y = M(d_v)] / Pr[y = M(d'_v)] ≤ 1/γ
wherein γ is a privacy parameter, and Pr[y = M(d_v)] and Pr[y = M(d'_v)] denote the probabilities that the discriminator neural network outputs the same result y for the real image data d_v and the fake data d'_v respectively;
S23. The discriminator adds (ε, δ)-differential privacy protection to the gradient θ and returns it to the generator neural network of the generative adversarial network; the generator neural network regenerates fake data d'_v, and step S22 is executed again;
S24. The output fake data d'_v are input into a CNN network to obtain the image features F_v of the image data.
Preferably, the discriminator adds (ε, δ)-differential privacy protection to the gradient θ, specifically:
Pr[R(θ) ∈ S] ≤ e^ε · Pr[R(θ') ∈ S] + δ
wherein ε is a first privacy budget and δ is a second privacy budget; R() is a first perturbation function; S denotes a set of perturbation results obtained after the gradient θ is perturbed; Pr[R(θ) ∈ S] denotes the probability that R(θ) falls in S; and θ' denotes any set of gradient parameters within the neighborhood of θ.
Preferably, for a client containing only text data, processing the text data with the sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t specifically comprises the following steps:
S31. The client constructs a sensitive-attribute dictionary D_Attr;
S32. A candidate-word dictionary D_Cand is generated with a synonym lexicon, and the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr is calculated;
S33. All sensitive words in the text data are replaced with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy;
S34. For each word W_i in the text data after sensitive-word replacement, a vector w_i = Embed(W_i) is obtained using word embedding, and the vectors w_i are input into an LSTM network to obtain the text features F_t.
Preferably, calculating the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr specifically comprises:
d(vec_1, vec_2) = sqrt((x_1 - y_1)^2 + … + (x_n - y_n)^2)
wherein vec_1 and vec_2 are the vectors of a word in the candidate-word dictionary D_Cand and a word in the sensitive-attribute dictionary D_Attr respectively, vec_1 = (x_1, …, x_n), vec_2 = (y_1, …, y_n), x_i and y_i are the i-th components of vec_1 and vec_2 respectively, i ∈ [1, n], and n denotes the dimension of the word vectors.
Preferably, replacing all sensitive words in the text data with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy, specifically comprises:
Pr[G(x) = y] ≤ e^ε · Pr[G(x') = y]
wherein G() denotes a second perturbation function; x denotes a sensitive word and x' denotes a candidate word; G(x) = y denotes the result y obtained by passing the sensitive word x through the perturbation function G(); Pr[G(x) = y] denotes the probability that G(x) = y. In the second perturbation function, each input sensitive word x is kept unchanged with probability p and replaced, with probability q, by a word from the candidate-word dictionary D_Cand; the perturbation probability can be described as:
Pr[G(x) = y] = p, if y = x;  Pr[G(x) = y] = q, if y ≠ x and y ∈ D_Cand
where y = x indicates that the word is not replaced, and K denotes the size of each word's candidate dictionary, which consists of the top a candidate words with the shortest Euclidean distance to that word.
Preferably, for the image data and text data, the loss functions of the first autoencoder and the second autoencoder are described as follows:
L = L_v + L_t + λ·L_c,  where L_v = dist(X_v, X'_v),  L_t = dist(X_t, X'_t),  L_c = -tr(U^T f_v(X_v) f_t(X_t)^T V)
wherein λ is a weight parameter; L_v is the loss function of the first autoencoder; L_t is the loss function of the second autoencoder; L_c is the correlation loss function measuring the correlation between the image modality and the text modality; X_v is the image data and X_t is the text data; X'_v and X'_t are obtained after the image data and text data pass through the first autoencoder and second autoencoder respectively; dist() is a distance metric function between the raw data and the generated data; f_v and f_t denote the nonlinear feature extractors of the image modality and the text modality respectively; tr() denotes the matrix trace operation; U is a matrix representation of the latent space of the image modality, and V is a matrix representation of the latent space of the text modality.
Preferably, the server learns the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients with a feature fusion network, specifically expressed as:
F_m = Fusion(F_v, F_t, F'_v, F'_t)
wherein Fusion() is the feature fusion network and F_m is the multi-modal feature.
Meanwhile, the invention also provides a multi-modal federated learning privacy protection system; the specific scheme is as follows:
A multi-modal federated learning privacy protection system comprises a server and a plurality of clients; when the privacy protection system performs privacy protection, it executes the method steps of the above multi-modal federated learning privacy protection method.
Compared with the prior art, the invention has the following beneficial effects:
(1) In the multi-modal federated learning privacy protection method provided by the invention, different multi-modal pre-training schemes are adopted for clients with different modalities, and a client is not required to contain all modal data; this better matches the requirements of real scenarios and makes the method more practical.
(2) In the multi-modal federated learning privacy protection method, the features obtained by client-side training are privacy-protected before being uploaded to the server, which ensures that the server cannot infer the user's private information from the uploaded features and improves the privacy and security of each participant's data.
(3) For a client containing only image data, the invention processes the image data with a differentially private generative adversarial network algorithm (DPGAN algorithm), obtains fake features that are not similar to the original image features, and uploads them to the server for aggregation, which effectively prevents privacy leakage of the original image data while preserving the usability of the image features.
(4) For a client containing only text data, the method processes the text data with a sensitive-word replacement algorithm based on localized differential privacy (UTLDP algorithm), replaces the sensitive words contained in the text, and then uploads the text features obtained by the LSTM network to the server for aggregation, which effectively prevents privacy leakage of the original text data while preserving the usability of the text features.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic diagram of a specific implementation of the multi-modal federated learning privacy protection method.
Fig. 2 is a schematic diagram of the extraction of the image features F'_v and text features F'_t for a client containing both image data and text data.
Fig. 3 is a schematic diagram of an implementation of the generative adversarial network algorithm based on differential privacy.
Fig. 4 is a schematic diagram of an implementation of the sensitive-word replacement algorithm based on localized differential privacy.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Fig. 1 is a schematic implementation diagram of the multi-modal federated learning privacy protection method provided by the present invention. As shown in Fig. 1, the method comprises the following steps (a minimal dispatch sketch in Python follows the step list):
S1. The server publishes the training task to each client participating in training, wherein a client contains only image data, only text data, or both image data and text data;
S2. For a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server;
S3. For a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server;
S4. For a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server;
S5. The server uses a feature fusion network to learn the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients, obtaining a multi-modal model;
S6. The server publishes the multi-modal model to each client.
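As a reading aid, the following minimal Python sketch shows how one training round could dispatch on a client's available modalities; the callables image_fn, text_fn, joint_fn and server_fusion_fn are hypothetical placeholders for the client-side algorithms of steps S2-S4 and the server-side fusion of step S5, and are not part of the patent text.

```python
from typing import Any, Callable, Dict, List

def upload_client_features(
    client: Dict[str, Any],
    image_fn: Callable,   # step S2: DP-GAN based image feature extraction
    text_fn: Callable,    # step S3: LDP sensitive-word replacement + LSTM features
    joint_fn: Callable,   # step S4: autoencoder alignment + epsilon-Laplace noise
) -> Dict[str, Any]:
    """Return the privacy-protected features a single client uploads (steps S2-S4)."""
    has_images = client.get("images") is not None
    has_text = client.get("texts") is not None
    if has_images and not has_text:
        return {"F_v": image_fn(client["images"])}
    if has_text and not has_images:
        return {"F_t": text_fn(client["texts"])}
    F_v_noisy, F_t_noisy = joint_fn(client["images"], client["texts"])
    return {"F_v_noisy": F_v_noisy, "F_t_noisy": F_t_noisy}

def federated_round(clients: List[Dict[str, Any]], server_fusion_fn: Callable,
                    image_fn: Callable, text_fn: Callable, joint_fn: Callable):
    """One round: clients upload protected features (S2-S4); the server fuses them into
    a multi-modal model (S5), which is then published back to the clients (S6)."""
    uploads = [upload_client_features(c, image_fn, text_fn, joint_fn) for c in clients]
    return server_fusion_fn(uploads)
```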
In a specific implementation process, as shown in Fig. 3, for a client containing only image data, the image data are processed with the generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data; this specifically comprises the following steps (a PyTorch sketch of these steps is given after step S24):
S21. The client generates a random vector R = (r_1, …, r_k) with a random generator, where k denotes the dimension of the random vector R; the random vector is input into the generator neural network of the generative adversarial network to obtain fake data d'_v;
S22. The client's image data d_v and the fake data d'_v are respectively input into the discriminator neural network of the generative adversarial network, which outputs M(d_v) and M(d'_v), where M(d_v) and M(d'_v) denote the results output by the discriminator neural network; if M(d_v) and M(d'_v) satisfy the following condition, the fake data d'_v are output and step S24 is executed; otherwise step S23 is executed:
γ ≤ Pr[y = M(d_v)] / Pr[y = M(d'_v)] ≤ 1/γ
wherein γ is a privacy parameter, and Pr[y = M(d_v)] and Pr[y = M(d'_v)] denote the probabilities that the discriminator neural network outputs the same result y for the real image data d_v and the fake data d'_v respectively; the closer γ is to 1, the less the input data can be distinguished from the output results, i.e. the stronger the indistinguishability of the input data;
S23. The discriminator adds (ε, δ)-differential privacy protection to the gradient θ and returns it to the generator neural network of the generative adversarial network; the generator neural network regenerates fake data d'_v, and step S22 is executed again;
S24. The output fake data d'_v are input into a CNN network to obtain the image features F_v of the image data.
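The following PyTorch sketch illustrates steps S21-S24 under several assumptions that are not fixed by the patent: images are treated as flattened 28x28 grayscale vectors, the generator, discriminator and CNN are small illustrative networks, and the γ condition is approximated by comparing mean discriminator scores on real and fake batches. The DP-protected gradient step of S23 is sketched separately after the next passage.

```python
import torch
import torch.nn as nn

D, K_NOISE = 784, 64   # flattened 28x28 image size and random-vector dimension (assumed)

generator = nn.Sequential(nn.Linear(K_NOISE, 256), nn.ReLU(), nn.Linear(256, D), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
feature_cnn = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(8 * 16, 32))

def indistinguishable(d_real: torch.Tensor, d_fake: torch.Tensor, gamma: float) -> bool:
    """Accept the fake data when the discriminator assigns similar scores to real and fake
    inputs; this is a practical stand-in for the gamma condition on output probabilities."""
    p_real, p_fake = discriminator(d_real).mean(), discriminator(d_fake).mean()
    ratio = (p_real / p_fake).item()
    return gamma <= ratio <= 1.0 / gamma

def client_image_features(d_real: torch.Tensor, gamma: float = 0.9, max_rounds: int = 100):
    """S21-S24: generate DP-protected fake data, then extract CNN features F_v from it."""
    for _ in range(max_rounds):
        z = torch.randn(d_real.shape[0], K_NOISE)        # S21: random vector R
        d_fake = generator(z)                            # S21: fake data d'_v
        if indistinguishable(d_real, d_fake, gamma):     # S22: accept, or keep training
            break
        # S23 would update the generator here using a DP-protected discriminator gradient.
    images = d_fake.detach().view(-1, 1, 28, 28)         # assumes 28x28 grayscale images
    return feature_cnn(images)                            # S24: image features F_v
```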
In a specific implementation process, the discriminator adds (ε, δ)-differential privacy protection to the gradient θ, specifically:
Pr[R(θ) ∈ S] ≤ e^ε · Pr[R(θ') ∈ S] + δ
wherein ε is a first privacy budget and δ is a second privacy budget; R() is a first perturbation function; S denotes a set of perturbation results obtained after the gradient θ is perturbed; Pr[R(θ) ∈ S] denotes the probability that R(θ) falls in S; and θ' denotes any set of gradient parameters within the neighborhood of θ. ε controls the privacy protection level: the smaller ε is, the stronger the privacy protection provided. δ denotes the tolerable probability that the privacy budget exceeds ε.
The gradient θ is the derivative of the objective function during neural network training; the gradient values reflect how the input data should change to improve model accuracy and optimize the objective value. Because differential privacy is closed under post-processing, adding differential privacy to the gradient provides theoretically guaranteed differential privacy protection for the input data. Therefore the synthesized fake data can capture the rich semantics of the original data while still satisfying the differential privacy mechanism.
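The patent states only that (ε, δ)-differential privacy is added to the discriminator gradient θ; the sketch below uses the common DP-SGD-style instantiation, clipping the gradient to bound its L2 sensitivity and adding Gaussian noise with sigma = sqrt(2·ln(1.25/δ))·C/ε, which is an assumption rather than the patented mechanism.

```python
import math
import torch

def dp_protect_gradient(grad: torch.Tensor, epsilon: float, delta: float,
                        clip_norm: float = 1.0) -> torch.Tensor:
    """Clip the gradient and add Gaussian noise calibrated to (epsilon, delta).

    The clip-and-add-Gaussian-noise recipe and the sigma formula are the usual DP-SGD
    instantiation, used here as an assumption; the patent does not fix the mechanism.
    """
    grad_norm = grad.norm(p=2).clamp(min=1e-12)
    clipped = grad * (clip_norm / grad_norm).clamp(max=1.0)   # bound the L2 sensitivity
    sigma = clip_norm * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return clipped + torch.randn_like(clipped) * sigma

# Example: protect one parameter's gradient before returning it to the generator (step S23).
theta_grad = torch.randn(256)
protected = dp_protect_gradient(theta_grad, epsilon=1.0, delta=1e-5)
```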
In a specific implementation process, as shown in Fig. 4, for a client containing only text data, the text data are processed with the sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t; this specifically comprises the following steps:
S31. The client constructs a sensitive-attribute dictionary D_Attr, including user names, gender, sensitive locations, sensitive verbs, sensitive nouns, and the like;
S32. A candidate-word dictionary D_Cand is generated with a synonym lexicon, and the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr is calculated;
S33. All sensitive words in the text data are replaced with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy;
S34. For each word W_i in the text data after sensitive-word replacement, a vector w_i = Embed(W_i) is obtained using word embedding, and the vectors w_i are input into an LSTM network to obtain the text features F_t.
Word embedding (Embed) is a vector representation of words: a high-dimensional space whose dimension equals the number of all words is embedded into a low-dimensional continuous vector space, and each word or phrase is mapped to a vector over the real numbers. w_i = Embed(W_i) denotes the low-dimensional vector w_i obtained by embedding W_i into the low-dimensional continuous vector space.
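A minimal PyTorch sketch of step S34 follows; the vocabulary size, embedding dimension and feature dimension are illustrative assumptions, and the mapping from the (replaced) words to integer ids is taken as given.

```python
import torch
import torch.nn as nn

class TextFeatureExtractor(nn.Module):
    """Step S34: embed the perturbed words (w_i = Embed(W_i)) and run them through an LSTM."""

    def __init__(self, vocab_size: int = 10000, embed_dim: int = 100, feat_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word embedding layer
        self.lstm = nn.LSTM(embed_dim, feat_dim, batch_first=True)

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        """word_ids: (batch, sequence_length) integer ids of the replaced words."""
        vectors = self.embed(word_ids)                     # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(vectors)
        return h_n[-1]                                     # text features F_t: (batch, feat_dim)

# Example usage on a toy batch of two sentences of five word ids each.
extractor = TextFeatureExtractor()
F_t = extractor(torch.randint(0, 10000, (2, 5)))
```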
In a specific implementation process, calculating the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr specifically comprises:
d(vec_1, vec_2) = sqrt((x_1 - y_1)^2 + … + (x_n - y_n)^2)
wherein vec_1 and vec_2 are the vectors of a word in the candidate-word dictionary D_Cand and a word in the sensitive-attribute dictionary D_Attr respectively, vec_1 = (x_1, …, x_n), vec_2 = (y_1, …, y_n), x_i and y_i are the i-th components of vec_1 and vec_2 respectively, i ∈ [1, n], and n denotes the dimension of the word vectors.
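The sketch below (NumPy) computes this Euclidean distance and builds the per-word candidate dictionary of the a nearest candidates used in step S33; the toy word vectors are illustrative, and the synonym lexicon supplying D_Cand is assumed to be given.

```python
import numpy as np

def euclidean(vec_1: np.ndarray, vec_2: np.ndarray) -> float:
    """Euclidean distance between two n-dimensional word vectors (the formula above)."""
    return float(np.sqrt(np.sum((vec_1 - vec_2) ** 2)))

def build_candidate_dicts(sensitive_vecs: dict, candidate_vecs: dict, a: int = 5) -> dict:
    """For each sensitive word in D_Attr, keep the `a` candidate words from D_Cand with the
    shortest Euclidean distance (the per-word candidate dictionary of size K used in S33).

    `sensitive_vecs` / `candidate_vecs` map words to their embedding vectors; in practice
    the candidates come from a synonym lexicon, which is assumed to be available here.
    """
    per_word = {}
    for word, wv in sensitive_vecs.items():
        ranked = sorted(candidate_vecs, key=lambda c: euclidean(wv, candidate_vecs[c]))
        per_word[word] = ranked[:a]
    return per_word

# Toy example with 3-dimensional word vectors.
D_Attr = {"alice": np.array([0.1, 0.2, 0.3])}
D_Cand = {"user": np.array([0.1, 0.2, 0.25]), "person": np.array([0.5, 0.1, 0.0]),
          "somebody": np.array([0.0, 0.3, 0.3])}
print(build_candidate_dicts(D_Attr, D_Cand, a=2))
```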
In a specific implementation process, all sensitive words in the text data are replaced with candidate words, and the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy; specifically:
Pr[G(x) = y] ≤ e^ε · Pr[G(x') = y]
wherein G() denotes a second perturbation function; x denotes a sensitive word and x' denotes a candidate word; G(x) = y denotes the result y obtained by passing the sensitive word x through the perturbation function G(); Pr[G(x) = y] denotes the probability that G(x) = y. In the second perturbation function, each input sensitive word x is kept unchanged with probability p and replaced, with probability q, by a word from the candidate-word dictionary D_Cand; the perturbation probability can be described as:
Pr[G(x) = y] = p, if y = x;  Pr[G(x) = y] = q, if y ≠ x and y ∈ D_Cand
wherein y = x indicates that the word is not replaced, and K denotes the size of each word's candidate dictionary, which consists of the top a candidate words with the shortest Euclidean distance to that word. The perturbation function G() replaces the sensitive word x with a certain probability q and leaves it unchanged with probability p. From the word replacement probability q it can be seen that candidate words with smaller Euclidean distance are more likely to be chosen as replacements. Through this perturbation an attacker cannot judge whether a sensitive word has been replaced, while the original text information is preserved to the greatest extent, so the text feature information is better retained.
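The following sketch implements the replacement step with standard K-ary randomized response; the concrete probabilities p = e^ε / (e^ε + K - 1) and q = 1 / (e^ε + K - 1), and the choice to count the word itself in the response domain, are assumed instantiations, since the patent fixes only the p/q structure and the ε-local-differential-privacy constraint.

```python
import math
import random

def replace_sensitive_word(word: str, candidates: list, epsilon: float) -> str:
    """Perturb one sensitive word with K-ary randomized response over {word} plus candidates.

    The standard randomized-response constants p = e^eps / (e^eps + K - 1) and
    q = 1 / (e^eps + K - 1) used here are an assumed instantiation.
    """
    K = len(candidates) + 1                       # the word itself plus its candidates (assumed)
    p = math.exp(epsilon) / (math.exp(epsilon) + K - 1)
    if random.random() < p:
        return word                               # kept unchanged with probability p
    return random.choice(candidates)              # otherwise replaced by a candidate word

def perturb_text(words: list, candidate_dict: dict, epsilon: float) -> list:
    """Apply the replacement to every sensitive word in a token list (step S33)."""
    return [replace_sensitive_word(w, candidate_dict[w], epsilon) if w in candidate_dict else w
            for w in words]

# Toy example reusing a per-word candidate dictionary like the one built above.
print(perturb_text(["alice", "went", "home"], {"alice": ["user", "somebody"]}, epsilon=1.0))
```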
In a specific implementation process, as shown in Fig. 2, for the image data and text data, the loss functions of the first autoencoder and the second autoencoder are described as follows:
L = L_v + L_t + λ·L_c,  where L_v = dist(X_v, X'_v),  L_t = dist(X_t, X'_t),  L_c = -tr(U^T f_v(X_v) f_t(X_t)^T V)
wherein λ is a weight parameter; L_v is the loss function of the first autoencoder; L_t is the loss function of the second autoencoder; L_c is the correlation loss function measuring the correlation between the image modality and the text modality; X_v is the image data and X_t is the text data; X'_v and X'_t are obtained after the image data and text data pass through the first autoencoder and second autoencoder respectively; dist() is a distance metric function between the raw data and the generated data; f_v and f_t denote the nonlinear feature extractors of the image modality and the text modality respectively; tr() denotes the matrix trace operation; U is a matrix representation of the latent space of the image modality, and V is a matrix representation of the latent space of the text modality.
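A compact PyTorch sketch of step S4 follows: two autoencoders produce the intermediate-layer features, ε-Laplace noise is added before upload, and training minimizes L = L_v + L_t + λ·L_c. The layer sizes, the use of MSE for dist(), the unit sensitivity of the Laplace mechanism and the exact trace-based form of L_c are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf
from torch.distributions import Laplace

class AutoEncoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)            # intermediate-layer features (F'_v or F'_t before noise)
        return z, self.decoder(z)

def add_laplace_noise(features: torch.Tensor, epsilon: float, sensitivity: float = 1.0):
    """epsilon-Laplace noise with scale sensitivity / epsilon (sensitivity assumed to be 1)."""
    noise = Laplace(0.0, sensitivity / epsilon).sample(features.shape)
    return features + noise

def joint_loss(X_v, X_t, ae_v, ae_t, U, V, lam: float = 0.1):
    """L = L_v + L_t + lambda * L_c: reconstruction losses plus an assumed trace-based L_c."""
    Z_v, X_v_rec = ae_v(X_v)
    Z_t, X_t_rec = ae_t(X_t)
    L_v = nnf.mse_loss(X_v_rec, X_v)                     # dist(X_v, X'_v)
    L_t = nnf.mse_loss(X_t_rec, X_t)                     # dist(X_t, X'_t)
    L_c = -torch.trace(U.T @ Z_v.T @ Z_t @ V) / X_v.shape[0]   # cross-modal correlation (assumed form)
    return L_v + L_t + lam * L_c, Z_v, Z_t

# Toy usage: 8 samples, 784-dim image vectors, 100-dim text vectors, 32-dim latent space.
ae_v, ae_t = AutoEncoder(784), AutoEncoder(100)
U, V = torch.randn(32, 16), torch.randn(32, 16)
loss, Z_v, Z_t = joint_loss(torch.randn(8, 784), torch.randn(8, 100), ae_v, ae_t, U, V)
F_v_noisy = add_laplace_noise(Z_v.detach(), epsilon=1.0)   # noised features uploaded to the server
F_t_noisy = add_laplace_noise(Z_t.detach(), epsilon=1.0)
```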
In a specific implementation process, the server learns the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients with the feature fusion network, specifically expressed as:
F_m = Fusion(F_v, F_t, F'_v, F'_t)
wherein Fusion() is the feature fusion network and F_m is the multi-modal feature.
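The sketch below shows one way the server-side fusion of step S5 could look: the uploaded feature vectors are concatenated and passed through a small multi-layer perceptron to produce the multi-modal feature F_m. The concatenation-plus-MLP design and the feature dimensions are illustrative assumptions; the patent does not specify the architecture of Fusion().

```python
import torch
import torch.nn as nn

class FusionNetwork(nn.Module):
    """Assumed fusion network: F_m = Fusion(F_v, F_t, F'_v, F'_t) via concatenation + MLP."""

    def __init__(self, dims=(32, 64, 32, 32), out_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sum(dims), 128), nn.ReLU(), nn.Linear(128, out_dim))

    def forward(self, F_v, F_t, F_v_noisy, F_t_noisy):
        # Concatenate the four uploaded feature types and map them to the multi-modal feature F_m.
        return self.net(torch.cat([F_v, F_t, F_v_noisy, F_t_noisy], dim=1))

# Toy usage with one batch of 8 aggregated client uploads.
fusion = FusionNetwork()
F_m = fusion(torch.randn(8, 32), torch.randn(8, 64), torch.randn(8, 32), torch.randn(8, 32))
```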
Example 2
This embodiment provides a multi-modal federated learning privacy protection system, as shown in Fig. 1; the specific scheme is as follows:
A multi-modal federated learning privacy protection system comprises a server and a plurality of clients; when the privacy protection system performs privacy protection, it executes the method steps of the multi-modal federated learning privacy protection method of Embodiment 1.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A multi-modal federated learning privacy protection method, characterized in that the method comprises the following steps:
S1. The server publishes the training task to each client participating in training, wherein a client contains only image data, only text data, or both image data and text data;
S2. For a client containing only image data, the image data are processed with a generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data, which are uploaded to the server;
S3. For a client containing only text data, the text data are processed with a sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t, which are uploaded to the server;
S4. For a client containing both image data and text data, the image data and text data are aligned through a first autoencoder and a second autoencoder respectively, ε-Laplace noise for differential privacy protection is added to the image features F'_v and text features F'_t generated at the intermediate layers of the first and second autoencoders, and the noised image features F'_v and text features F'_t are uploaded to the server;
S5. The server uses a feature fusion network to learn the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients, obtaining a multi-modal model;
S6. The server publishes the multi-modal model to each client.
2. The multi-modal federated learning privacy protection method of claim 1, wherein: for a client containing only image data, processing the image data with the generative adversarial network algorithm based on differential privacy to obtain the image features F_v of the image data specifically comprises the following steps:
S21. The client generates a random vector R = (r_1, …, r_k) with a random generator, where k denotes the dimension of the random vector R; the random vector is input into the generator neural network of the generative adversarial network to obtain fake data d'_v;
S22. The client's image data d_v and the fake data d'_v are respectively input into the discriminator neural network of the generative adversarial network, which outputs M(d_v) and M(d'_v), where M(d_v) and M(d'_v) denote the results output by the discriminator neural network; if M(d_v) and M(d'_v) satisfy the following condition, the fake data d'_v are output and step S24 is executed; otherwise step S23 is executed:
γ ≤ Pr[y = M(d_v)] / Pr[y = M(d'_v)] ≤ 1/γ
wherein γ is a privacy parameter, and Pr[y = M(d_v)] and Pr[y = M(d'_v)] denote the probabilities that the discriminator neural network outputs the same result y for the real image data d_v and the fake data d'_v respectively;
S23. The discriminator adds (ε, δ)-differential privacy protection to the gradient θ and returns it to the generator neural network of the generative adversarial network; the generator neural network regenerates fake data d'_v, and step S22 is executed again;
S24. The output fake data d'_v are input into a CNN network to obtain the image features F_v of the image data.
3. The multi-modal federated learning privacy protection method of claim 2, wherein: the discriminator adds (ε, δ)-differential privacy protection to the gradient θ, specifically:
Pr[R(θ) ∈ S] ≤ e^ε · Pr[R(θ') ∈ S] + δ
wherein ε is a first privacy budget and δ is a second privacy budget; R() is a first perturbation function; S denotes a set of perturbation results obtained after the gradient θ is perturbed; Pr[R(θ) ∈ S] denotes the probability that R(θ) falls in S; and θ' denotes any set of gradient parameters within the neighborhood of θ.
4. The multi-modal federated learning privacy protection method of claim 1, wherein: for a client containing only text data, processing the text data with the sensitive-word replacement algorithm based on localized differential privacy to obtain the text features F_t specifically comprises the following steps:
S31. The client constructs a sensitive-attribute dictionary D_Attr;
S32. A candidate-word dictionary D_Cand is generated with a synonym lexicon, and the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr is calculated;
S33. All sensitive words in the text data are replaced with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy;
S34. For each word W_i in the text data after sensitive-word replacement, a vector w_i = Embed(W_i) is obtained using word embedding, and the vectors w_i are input into an LSTM network to obtain the text features F_t.
5. The multi-modal federated learning privacy protection method of claim 4, wherein: calculating the Euclidean distance between each word in the candidate-word dictionary D_Cand and each word in the sensitive-attribute dictionary D_Attr specifically comprises:
d(vec_1, vec_2) = sqrt((x_1 - y_1)^2 + … + (x_n - y_n)^2)
wherein vec_1 and vec_2 are the vectors of a word in the candidate-word dictionary D_Cand and a word in the sensitive-attribute dictionary D_Attr respectively, vec_1 = (x_1, …, x_n), vec_2 = (y_1, …, y_n), x_i and y_i are the i-th components of vec_1 and vec_2 respectively, i ∈ [1, n], and n denotes the dimension of the word vectors.
6. The multi-modal federated learning privacy protection method of claim 5, wherein: replacing all sensitive words in the text data with candidate words, where the replacement probability satisfies the random response probability of the sensitive-word replacement algorithm based on localized differential privacy, specifically comprises:
Pr[G(x) = y] ≤ e^ε · Pr[G(x') = y]
wherein G() denotes a second perturbation function; x denotes a sensitive word and x' denotes a candidate word; G(x) = y denotes the result y obtained by passing the sensitive word x through the perturbation function G(); Pr[G(x) = y] denotes the probability that G(x) = y. In the second perturbation function, each input sensitive word x is kept unchanged with probability p and replaced, with probability q, by a word from the candidate-word dictionary D_Cand; the perturbation probability can be described as:
Pr[G(x) = y] = p, if y = x;  Pr[G(x) = y] = q, if y ≠ x and y ∈ D_Cand
where y = x indicates that the word is not replaced, and K denotes the size of each word's candidate dictionary, which consists of the top a candidate words with the shortest Euclidean distance to that word.
7. The multi-modal federated learning privacy protection method according to any one of claims 1-6, wherein: for the image data and text data, the loss functions of the first autoencoder and the second autoencoder are described as follows:
L = L_v + L_t + λ·L_c,  where L_v = dist(X_v, X'_v),  L_t = dist(X_t, X'_t),  L_c = -tr(U^T f_v(X_v) f_t(X_t)^T V)
wherein λ is a weight parameter; L_v is the loss function of the first autoencoder; L_t is the loss function of the second autoencoder; L_c is the correlation loss function measuring the correlation between the image modality and the text modality; X_v is the image data and X_t is the text data; X'_v and X'_t are obtained after the image data and text data pass through the first autoencoder and second autoencoder respectively; dist() is a distance metric function between the raw data and the generated data; f_v and f_t denote the nonlinear feature extractors of the image modality and the text modality respectively; tr() denotes the matrix trace operation; U is a matrix representation of the latent space of the image modality, and V is a matrix representation of the latent space of the text modality.
8. The multi-modal federated learning privacy protection method of claim 7, wherein: the server learns the inter-modal characteristics of the image features F_v, text features F_t, image features F'_v and text features F'_t uploaded by the clients with a feature fusion network, specifically expressed as:
F_m = Fusion(F_v, F_t, F'_v, F'_t)
wherein Fusion() is the feature fusion network and F_m is the multi-modal feature.
9. A multi-modal federated learning privacy protection system, characterized in that the system comprises a server and a plurality of clients; when the privacy protection system performs privacy protection, it executes the method steps of the multi-modal federated learning privacy protection method of any one of claims 1-8.
CN202310121251.4A 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning Active CN115859367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310121251.4A CN115859367B (en) 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310121251.4A CN115859367B (en) 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning

Publications (2)

Publication Number Publication Date
CN115859367A true CN115859367A (en) 2023-03-28
CN115859367B CN115859367B (en) 2023-05-16

Family

ID=85658185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310121251.4A Active CN115859367B (en) 2023-02-16 2023-02-16 Privacy protection method and system for multi-mode federal learning

Country Status (1)

Country Link
CN (1) CN115859367B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595587A (en) * 2023-07-14 2023-08-15 江西通友科技有限公司 Document steganography method and document management method based on secret service
CN118228196A (en) * 2024-05-22 2024-06-21 徐州医科大学 Federal multi-mode data mining method and system based on multi-security policy

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111966883A (en) * 2020-08-13 2020-11-20 成都考拉悠然科技有限公司 Zero sample cross-mode retrieval method combining automatic encoder and generation countermeasure network
US20220083911A1 (en) * 2019-01-18 2022-03-17 Huawei Technologies Co., Ltd. Enhanced Privacy Federated Learning System
CN114861817A (en) * 2022-05-26 2022-08-05 中国海洋大学 Multi-source heterogeneous data fusion method based on federal learning
US20220327809A1 (en) * 2021-07-12 2022-10-13 Beijing Baidu Netcom Science Technology Co., Ltd. Method, device and storage medium for training model based on multi-modal data joint learning
US20220398343A1 (en) * 2021-06-10 2022-12-15 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Dynamic differential privacy to federated learning systems
WO2022257730A1 (en) * 2021-06-11 2022-12-15 支付宝(杭州)信息技术有限公司 Methods and apparatus for multiple parties to collaboratively update model while protecting privacy, and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220083911A1 (en) * 2019-01-18 2022-03-17 Huawei Technologies Co., Ltd. Enhanced Privacy Federated Learning System
CN111966883A (en) * 2020-08-13 2020-11-20 成都考拉悠然科技有限公司 Zero sample cross-mode retrieval method combining automatic encoder and generation countermeasure network
US20220398343A1 (en) * 2021-06-10 2022-12-15 Hong Kong Applied Science And Technology Research Institute Co., Ltd. Dynamic differential privacy to federated learning systems
WO2022257730A1 (en) * 2021-06-11 2022-12-15 支付宝(杭州)信息技术有限公司 Methods and apparatus for multiple parties to collaboratively update model while protecting privacy, and system
US20220327809A1 (en) * 2021-07-12 2022-10-13 Beijing Baidu Netcom Science Technology Co., Ltd. Method, device and storage medium for training model based on multi-modal data joint learning
CN114861817A (en) * 2022-05-26 2022-08-05 中国海洋大学 Multi-source heterogeneous data fusion method based on federal learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谭作文 (Tan Zuowen) et al.: "A Survey of Research on Privacy Protection in Machine Learning" (机器学习隐私保护研究综述) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116595587A (en) * 2023-07-14 2023-08-15 江西通友科技有限公司 Document steganography method and document management method based on secret service
CN116595587B (en) * 2023-07-14 2023-09-22 江西通友科技有限公司 Document steganography method and document management method based on secret service
CN118228196A (en) * 2024-05-22 2024-06-21 徐州医科大学 Federal multi-mode data mining method and system based on multi-security policy

Also Published As

Publication number Publication date
CN115859367B (en) 2023-05-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant