CN112464269A - Data selection method in federated learning scene - Google Patents
- Publication number
- CN112464269A (Application CN202011464915.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- data
- users
- server
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A data selection method for the federated learning scenario comprises the steps of filtering the users and data relevant to a task, selecting users before training, selecting users and data during training, and training the model. Because server-side log information is used to dynamically select users, and data are selected by a gradient upper-bound value that accounts for the impact of erroneous data on the gradient, the data selection strategy is efficient and accurate.
Description
Technical Field
The invention relates to a data selection method in a federated learning scenario, and belongs to the field of data analysis and data quality evaluation.
Background
How to acquire large, high-quality data sets has become a common bottleneck for many machine learning models and AI applications. This is not only because collecting and labeling large numbers of samples is very expensive, but also because privacy concerns prevent data sharing in many fields (e.g., medicine and economics). The advent of federated learning has made it possible for end users to jointly train network models using local data. In the federated learning process, the quality of users' local data affects the performance of the global model, and low-quality data (e.g., mislabeled data and non-uniformly distributed data) seriously hinder the global model from achieving good performance.
The invention aims to select, in a privacy-preserving manner and under a given budget, a group of high-quality training samples for a given federated learning task, thereby improving model accuracy and accelerating model convergence.
There has been a series of work on data selection in deep learning: 1) quality-index methods propose various indexes such as task relevance and content diversity, score each data sample against them, and select high-scoring data to participate in training; 2) importance-sampling methods dynamically select the training samples most important to the model to compose each data batch and speed up convergence, typically quantifying importance by gradient norm or loss value. Neither line of work can be used directly in federated learning: 1) existing methods require direct access to all training samples, whereas in a federated system data cannot be accessed directly by third parties; 2) directly computing the importance of each sample incurs unacceptable overhead for resource-limited participants; 3) existing methods do not account for the impact of non-IID or erroneous samples on the selection strategy, and may assign more importance to erroneous samples, thereby degrading model performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and select, in a privacy-preserving manner, a group of high-quality training samples for a given federated learning task, thereby improving model accuracy and accelerating model convergence. The method comprises the steps of filtering the users and data relevant to the task, selecting users before training, selecting users and data during training, and training the model.
Preferably, task-related user and data filtering proceeds as follows: when an FL task arrives, the server first computes, for each user C_k, k ∈ [K], the intersection of its tag set Y_k = {y_k | (x_k, y_k) ∈ D_k} with the target tag set Y, i.e. {(x_k, y_k) | y_k ∈ Y_k ∩ Y}, so as to filter out the users holding target-category data. If the number of samples in the intersection exceeds the minimum number v required by the target model, |{(x_k, y_k) | y_k ∈ Y_k ∩ Y}| > v, the user is relevant. To meet the need for privacy protection, a private set intersection (PSI) protocol is used.
Preferably, user selection before training proceeds as follows: the server further selects a set of high-quality users (with user index set Q) from the relevant users using a determinantal point process (DPP) based algorithm, so as to maximize homogeneity and content diversity under a budget constraint B: max V(Q), s.t. Σ_{k∈Q, Q⊆[N′]} b_k ≤ B, where V(Q) is the quality value of the selected users. The server then coordinates the selected users to begin training the model. This module mainly comprises the following steps:
a) User selection based on homogeneity: the server preferentially selects users whose data are uniformly distributed over the categories, with no category missing. When homogeneity is used as the selection index, V_μ(Q) = Σ_{k∈Q} μ_k, where μ_k is defined from the difference between user k's label distribution and the uniform distribution (the closer to uniform, the higher the score). To compute μ_k with privacy protection, the server and each user jointly compute it under the server's public key using an efficient and secure two-party computation protocol based on BGN homomorphic encryption. The server then greedily selects the user with the largest homogeneity score until budget B is exhausted, obtaining the best set of users.
b) User selection based on diversity: the server selects users with diverse data content to participate in model training. When content diversity is used as the selection criterion, V(Q) = ρ(D) with D = ∪_{k∈Q} D_k, where S(v_i, v_j) is a similarity function (e.g. based on Euclidean distance) between the feature vectors of users C_i and C_j. The server greedily selects the next user with the lowest similarity to the currently selected set.
To compute the content diversity of a data set, a feature-vector representation of the data must first be extracted. We extract features with a deep learning model, for example extracting content feature vectors of pictures with a VGG-16 network, and then compute the content diversity over all of the user's data. When the user's data volume M is large and the feature dimension l is high, the cost of computing content diversity, O(M²·l), is large; moreover, existing computing methods need direct access to the raw data. We therefore propose an efficient privacy-preserving content diversity computation, which summarizes each user's data set by low-dimensional vectors based on the Johnson–Lindenstrauss (JL) transform and protects the privacy of each sample with a random response mechanism. It mainly comprises the following steps:
i. Constructing a data set content sketch: user C_k locally generates content feature vectors Φ_k = {φ_{k,i} | i ∈ [U_k]}; the server then selects a mapping matrix w, and each φ_{k,i} is mapped to a low-dimensional sign vector h(φ_{k,i}) = sign(w·φ_{k,i}). The distortion introduced by the mapping reduces the accuracy of the diversity estimate, but protects the privacy of the users' feature vectors to a certain extent; the sketches of all samples together form the content vector sketch of the data set D_k.
ii. Random response mechanism: to further protect the presence privacy of each datum, a random response mechanism perturbs the vector sketch h(φ_{k,i}) into a perturbed vector, each bit being reported truthfully with probability 1−f and randomized otherwise, where f is a parameter determined by the desired degree of privacy. The user generates the perturbed sketches with these perturbed vectors and sends them to the server, and the server uses the perturbed sketch vectors to compute pairwise similarity and thus content diversity. In this way the server reduces the overhead of computing content diversity by several orders of magnitude while protecting the users' data privacy.
c) User selection based on the determinantal point process: when homogeneity and diversity are considered jointly, the user selection problem is transformed into a DPP problem. With user C_i's homogeneity μ_i and the similarity S_ij between users C_i and C_j, we define a positive semi-definite matrix A_{[N′]} = [A_ij], i, j ∈ [N′], with A_ij = μ_i μ_j S_ij; the probability that a set Q of users is selected is then P_A(Q) = det(A_Q). When homogeneity increases and similarity decreases, the determinant increases, so DPP-based selection tends to choose users with evenly distributed categories while avoiding users with highly similar content. With the value function V_d(Q) = det(A_Q), where A_Q = [A_ij]_{i,j∈Q}, we transform the user selection problem into a log-submodular maximization problem and iteratively select the user C_k that maximizes P_A(Q ∪ {k}).
Preferably, user and data selection during training proceeds as follows: given the selected set of high-quality users with index set Q ⊆ [N′], to further improve model performance and reduce training overhead, a fraction ζ of the users is selected in each training iteration, and each selected user locally selects its important data samples to participate in model training. An intuitive way to quantify the importance λ(z_{k,i}, t) of user C_k's sample z_{k,i} in round t is a gradient-based upper bound computed from the input and output of the model's last layer (layer L). However, when the model is complex and the data volume is large, the computation cost O(n·s) is high, where n is the total number of samples and s is the number of model parameters θ ∈ R^s. We therefore propose a policy that dynamically selects users based on server-side log information. Specifically, in round t, the server samples m users, each selected with probability proportional to its recorded impact on the model, so that users with large impact are selected with higher probability. Each selected user C_k then locally computes the importance λ(z_{k,i}, t−1) of each datum and selects data z_{k,i} ∈ D_k with probability proportional to this bounded importance; because the gradient L2 norm of erroneous data is far greater than that of correct data, the upper-bound-based score keeps erroneous samples from dominating the selection.
Preferably, model training proceeds as follows: in each iteration, all selected users train their local models on the selected samples, and the server aggregates the users' model updates to update the global model. The server repeats this process until a globally optimal model θ* is obtained.
The invention designs an efficient data selection method for federated learning that improves model accuracy and accelerates model convergence. Because vector sketches and a random response mechanism are adopted, the user selection strategy is efficient and privacy-preserving; because server-side log information is used to dynamically select users, and data are selected by a gradient upper-bound value that accounts for the impact of erroneous data on the gradient, the data selection strategy is efficient and accurate.
Drawings
FIG. 1 is a flow diagram of an efficient data selection system in a federated learning scenario.
Detailed Description
The invention will be described in detail below with reference to the figures. As shown in FIG. 1, the data selection method in the federated learning scenario proposed by the present invention is mainly divided into the following modules: filtering users and data relevant to the task, selecting users before training, selecting users and data during training, and training the model.
(1) Task-related user and data filtering: when an FL task arrives, the server first computes, for each user C_k, k ∈ [K], the intersection of its tag set Y_k = {y_k | (x_k, y_k) ∈ D_k} with the target tag set Y, i.e. {(x_k, y_k) | y_k ∈ Y_k ∩ Y}, so as to filter out the users holding target-category data. If the number of samples in the intersection exceeds the minimum number v required by the target model, |{(x_k, y_k) | y_k ∈ Y_k ∩ Y}| > v, the user is relevant. To meet the need for privacy protection, a private set intersection (PSI) protocol is used.
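The relevance test of this step can be sketched in plain Python as follows. The function name and data layout are illustrative assumptions; a real deployment would evaluate the intersection size under a PSI protocol rather than on raw labels.

```python
# Illustrative plaintext sketch of the relevance filter: a user is kept only if
# it holds more than `v` samples whose labels fall in the target set Y.
# (A real system computes this under private set intersection.)

def relevant_users(user_datasets, target_labels, v):
    """user_datasets: {user_id: [(x, y), ...]}; returns ids of relevant users."""
    target = set(target_labels)
    relevant = []
    for uid, samples in user_datasets.items():
        # |{(x, y) : y in Y_k ∩ Y}|
        hits = sum(1 for _, y in samples if y in target)
        if hits > v:
            relevant.append(uid)
    return relevant

datasets = {
    "u1": [(0, "cat"), (1, "dog"), (2, "cat")],
    "u2": [(3, "car"), (4, "plane")],
}
print(relevant_users(datasets, {"cat", "dog"}, 1))  # ['u1']
```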
(2) User selection before training: the server further selects a set of high-quality users (with user index set Q) from the relevant users using a determinantal point process (DPP) based algorithm, so as to maximize homogeneity and content diversity under a budget constraint B: max V(Q), s.t. Σ_{k∈Q, Q⊆[N′]} b_k ≤ B, where V(Q) is the quality value of the selected users. The server then coordinates the selected users to begin training the model. This module mainly comprises the following steps:
a) User selection based on homogeneity: the server preferentially selects users whose data are uniformly distributed over the categories, with no category missing. When homogeneity is used as the selection index, V_μ(Q) = Σ_{k∈Q} μ_k, where μ_k is defined from the difference between user k's label distribution and the uniform distribution (the closer to uniform, the higher the score). To compute μ_k with privacy protection, the server and each user jointly compute it under the server's public key using an efficient and secure two-party computation protocol based on BGN homomorphic encryption. The server then greedily selects the user with the largest homogeneity score until budget B is exhausted, obtaining the best set of users.
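The homogeneity index can be illustrated with a toy score. The exact formula for μ_k is not legible in the source, so this sketch simply uses the negative L1 distance between the empirical label distribution and the uniform distribution (more uniform means higher score); the BGN-based secure two-party computation is omitted.

```python
# Toy homogeneity score: negated L1 distance between the user's empirical
# label distribution and the uniform distribution over `num_classes` labels.
from collections import Counter

def homogeneity(labels, num_classes):
    counts = Counter(labels)
    n = len(labels)
    uniform = 1.0 / num_classes
    # smaller distance from uniform -> higher (less negative) score
    dist = sum(abs(counts.get(c, 0) / n - uniform) for c in range(num_classes))
    return -dist

print(homogeneity([0, 1, 2, 0, 1, 2], 3))  # perfectly uniform labels
print(homogeneity([0, 0, 0, 0, 0, 0], 3))  # one class only: lower score
```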
b) User selection based on diversity: the server selects users with diverse data content to participate in model training. When content diversity is used as the selection criterion, V(Q) = ρ(D) with D = ∪_{k∈Q} D_k, where S(v_i, v_j) is a similarity function (e.g. based on Euclidean distance) between the feature vectors of users C_i and C_j. The server greedily selects the next user with the lowest similarity to the currently selected set.
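The greedy diversity step can be sketched as follows, using cosine similarity over toy 2-D feature vectors; the seed choice and similarity function are illustrative assumptions.

```python
# Greedy diversity selection: repeatedly pick the user whose feature vector has
# the lowest worst-case cosine similarity to the users already chosen.
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def greedy_diverse(features, budget):
    chosen = [0]  # seed with the first user for simplicity
    while len(chosen) < budget:
        rest = [i for i in range(len(features)) if i not in chosen]
        # lowest maximum similarity to the current set = most diverse next user
        nxt = min(rest, key=lambda i: max(cos_sim(features[i], features[j])
                                          for j in chosen))
        chosen.append(nxt)
    return chosen

feats = [[1, 0], [0.9, 0.1], [0, 1]]
print(greedy_diverse(feats, 2))  # [0, 2]: user 2 is least similar to user 0
```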
To compute the content diversity of a data set, a feature-vector representation of the data must first be extracted. We extract features with a deep learning model, for example extracting content feature vectors of pictures with a VGG-16 network, and then compute the content diversity over all of the user's data. When the user's data volume M is large and the feature dimension l is high, the cost of computing content diversity, O(M²·l), is large; moreover, existing computing methods need direct access to the raw data. We therefore propose an efficient privacy-preserving content diversity computation, which summarizes each user's data set by low-dimensional vectors based on the Johnson–Lindenstrauss (JL) transform and protects the privacy of each sample with a random response mechanism. It mainly comprises the following steps:
i. Constructing a data set content sketch: user C_k locally generates content feature vectors Φ_k = {φ_{k,i} | i ∈ [U_k]}; the server then selects a mapping matrix w, and each φ_{k,i} is mapped to a low-dimensional sign vector h(φ_{k,i}) = sign(w·φ_{k,i}). The distortion introduced by the mapping reduces the accuracy of the diversity estimate, but protects the privacy of the users' feature vectors to a certain extent; the sketches of all samples together form the content vector sketch of the data set D_k.
ii. Random response mechanism: to further protect the presence privacy of each datum, a random response mechanism perturbs the vector sketch h(φ_{k,i}) into a perturbed vector, each bit being reported truthfully with probability 1−f and randomized otherwise, where f is a parameter determined by the desired degree of privacy. The user generates the perturbed sketches with these perturbed vectors and sends them to the server, and the server uses the perturbed sketch vectors to compute pairwise similarity and thus content diversity. In this way the server reduces the overhead of computing content diversity by several orders of magnitude while protecting the users' data privacy.
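Steps i and ii can be sketched together. The projection matrix, the sketch length, and the bit-flip form of the random response are illustrative assumptions (the source's exact perturbation probabilities are not legible); a Gaussian random projection followed by a sign is a standard JL-style instantiation.

```python
# Step i: sign random projection compresses each feature vector into a bit
# sketch. Step ii: randomized response flips each bit with probability f.
import random

def sign_sketch(phi, w):
    """w: list of projection rows; returns a 0/1 bit per row."""
    return [1 if sum(wi * xi for wi, xi in zip(row, phi)) >= 0 else 0
            for row in w]

def randomized_response(bits, f, rng):
    # report each bit truthfully with probability 1 - f, flip otherwise
    return [b if rng.random() > f else 1 - b for b in bits]

rng = random.Random(0)
# server-chosen mapping: 8-bit sketch of 4-dimensional features
w = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
phi = [0.5, -1.2, 0.3, 0.9]
sketch = sign_sketch(phi, w)
noisy = randomized_response(sketch, f=0.1, rng=rng)
print(len(noisy), all(b in (0, 1) for b in noisy))
```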
(3) User selection based on the determinantal point process: when homogeneity and diversity are considered jointly, the user selection problem is transformed into a DPP problem. With user C_i's homogeneity μ_i and the similarity S_ij between users C_i and C_j, we define a positive semi-definite matrix A_{[N′]} = [A_ij], i, j ∈ [N′], with A_ij = μ_i μ_j S_ij; the probability that a set Q of users is selected is then P_A(Q) = det(A_Q). When homogeneity increases and similarity decreases, the determinant increases, so DPP-based selection tends to choose users with evenly distributed categories while avoiding users with highly similar content. With the value function V_d(Q) = det(A_Q), where A_Q = [A_ij]_{i,j∈Q}, we transform the user selection problem into a log-submodular maximization problem and iteratively select the user C_k that maximizes P_A(Q ∪ {k}).
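The greedy DPP selection above can be sketched without dependencies as follows. The μ values, similarity matrix, and the small determinant helper are illustrative; a real implementation would use a linear-algebra library and the patent's privacy-preserving scores.

```python
# Greedy DPP-style selection: build A_ij = mu_i * mu_j * S_ij and repeatedly
# add the user that maximizes det(A_Q) over the selected submatrix.

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return d

def greedy_dpp(mu, S, budget):
    chosen = []
    for _ in range(budget):
        rest = [i for i in range(len(mu)) if i not in chosen]
        def score(k):
            q = chosen + [k]
            sub = [[mu[a] * mu[b] * S[a][b] for b in q] for a in q]
            return det(sub)
        chosen.append(max(rest, key=score))
    return chosen

mu = [1.0, 0.9, 0.8]
S = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(greedy_dpp(mu, S, 2))  # [0, 2]: the two least-similar high-quality users
```

Note how the determinant rewards exactly the trade-off described in the text: user 1 has high homogeneity but is nearly identical to user 0, so the more diverse user 2 wins.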
(4) User and data selection during training: given the selected set of high-quality users with index set Q ⊆ [N′], to further improve model performance and reduce training overhead, a fraction ζ of the users is selected in each training iteration, and each selected user locally selects its important data samples to participate in model training. An intuitive way to quantify the importance λ(z_{k,i}, t) of user C_k's sample z_{k,i} in round t is a gradient-based upper bound computed from the input and output of the model's last layer (layer L). However, when the model is complex and the data volume is large, the computation cost O(n·s) is high, where n is the total number of samples and s is the number of model parameters θ ∈ R^s. We therefore propose a policy that dynamically selects users based on server-side log information. Specifically, in round t, the server samples m users, each selected with probability proportional to its recorded impact on the model, so that users with large impact are selected with higher probability. Each selected user C_k then locally computes the importance λ(z_{k,i}, t−1) of each datum and selects data z_{k,i} ∈ D_k with probability proportional to this bounded importance; because the gradient L2 norm of erroneous data is far greater than that of correct data, the upper-bound-based score keeps erroneous samples from dominating the selection.
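The two sampling stages of step (4) can be sketched as follows. The impact scores and importance values are toy numbers, not real gradient upper bounds; users are sampled with replacement for simplicity, and the cap stands in for the gradient upper bound that limits the influence of erroneous samples.

```python
# Stage 1: draw users with probability proportional to a server-side impact
# score. Stage 2: each user keeps samples with probability proportional to a
# capped importance score, so exploding scores (erroneous data) cannot dominate.
import random

def sample_users(impact, m, rng):
    ids = list(impact)
    weights = [impact[u] for u in ids]
    return rng.choices(ids, weights=weights, k=m)  # with replacement

def select_data(importance, cap, rng):
    """Keep sample i with probability proportional to min(importance_i, cap)."""
    bounded = [min(s, cap) for s in importance]
    top = max(bounded)
    return [i for i, s in enumerate(bounded) if rng.random() < s / top]

rng = random.Random(42)
users = sample_users({"u1": 5.0, "u2": 1.0, "u3": 0.5}, m=2, rng=rng)
kept = select_data([0.2, 0.9, 50.0], cap=1.0, rng=rng)  # huge score is capped
print(len(users), all(i in (0, 1, 2) for i in kept))
```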
(5) Model training: in each iteration, all selected users train their local models on the selected samples, and the server aggregates the users' model updates to update the global model. The server repeats this process until a globally optimal model θ* is obtained.
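The aggregation in step (5) can be sketched as a FedAvg-style weighted average of flattened model weights; this is an assumption for illustration, since the patent does not fix a particular aggregation rule.

```python
# Weighted average of user model weights, weighted by local sample counts
# (FedAvg-style aggregation over flattened weight vectors).

def fed_avg(updates):
    """updates: list of (weights, n_samples); weights are flat float lists."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

# user 2 holds 3x the data, so its weights dominate the average
print(fed_avg([([1.0, 2.0], 10), ([3.0, 4.0], 30)]))  # [2.5, 3.5]
```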
Claims (5)
1. A data selection method in a federated learning scenario, characterized by comprising the steps of filtering users and data relevant to the task, selecting users before training, selecting users and data during training, and training a model.
2. The method of claim 1, wherein the task-related user and data filtering is such that, when an FL task arrives, the server first computes, for each user C_k, k ∈ [K], the intersection of its tag set Y_k = {y_k | (x_k, y_k) ∈ D_k} with the target tag set Y, i.e. {(x_k, y_k) | y_k ∈ Y_k ∩ Y}, so as to filter out the users holding target-category data; if the number of samples in the intersection exceeds the minimum number v required by the target model, |{(x_k, y_k) | y_k ∈ Y_k ∩ Y}| > v, the user is relevant; to meet the need for privacy protection, a private set intersection (PSI) protocol is used.
3. The method of claim 1, wherein the user selection before training is such that the server further selects a set of high-quality users (with user index set Q) from the relevant users using a determinantal point process (DPP) based algorithm, so as to maximize homogeneity and content diversity under a budget constraint B: max V(Q), s.t. Σ_{k∈Q, Q⊆[N′]} b_k ≤ B, where V(Q) is the quality value of the selected users; the server then coordinates the selected users to begin training the model; this mainly comprises the following steps:
a) User selection based on homogeneity: the server preferentially selects users whose data are uniformly distributed over the categories, with no category missing. When homogeneity is used as the selection index, V_μ(Q) = Σ_{k∈Q} μ_k, where μ_k is defined from the difference between user k's label distribution and the uniform distribution (the closer to uniform, the higher the score). To compute μ_k with privacy protection, the server and each user jointly compute it under the server's public key using an efficient and secure two-party computation protocol based on BGN homomorphic encryption. The server then greedily selects the user with the largest homogeneity score until budget B is exhausted, obtaining the best set of users.
b) User selection based on diversity: the server selects users with diverse data content to participate in model training. When content diversity is used as the selection criterion, V(Q) = ρ(D) with D = ∪_{k∈Q} D_k, where S(v_i, v_j) is a similarity function (e.g. based on Euclidean distance) between the feature vectors of users C_i and C_j. The server greedily selects the next user with the lowest similarity to the currently selected set.
To compute the content diversity of a data set, a feature-vector representation of the data must first be extracted. We extract features with a deep learning model, for example extracting content feature vectors of pictures with a VGG-16 network, and then compute the content diversity over all of the user's data. When the user's data volume M is large and the feature dimension l is high, the cost of computing content diversity, O(M²·l), is large; moreover, existing computing methods need direct access to the raw data. We therefore propose an efficient privacy-preserving content diversity computation, which summarizes each user's data set by low-dimensional vectors based on the Johnson–Lindenstrauss (JL) transform and protects the privacy of each sample with a random response mechanism. It mainly comprises the following steps:
i. Constructing a data set content sketch: user C_k locally generates content feature vectors Φ_k = {φ_{k,i} | i ∈ [U_k]}; the server then selects a mapping matrix w, and each φ_{k,i} is mapped to a low-dimensional sign vector h(φ_{k,i}) = sign(w·φ_{k,i}). The distortion introduced by the mapping reduces the accuracy of the diversity estimate, but protects the privacy of the users' feature vectors to a certain extent; the sketches of all samples together form the content vector sketch of the data set D_k.
ii. Random response mechanism: to further protect the presence privacy of each datum, a random response mechanism perturbs the vector sketch h(φ_{k,i}) into a perturbed vector, each bit being reported truthfully with probability 1−f and randomized otherwise, where f is a user-defined parameter controlling the degree of privacy. The user generates the perturbed sketches with these perturbed vectors and sends them to the server, and the server uses the perturbed sketch vectors to compute pairwise similarity and thus content diversity. In this way the server reduces the overhead of computing content diversity by several orders of magnitude while protecting the users' data privacy.
c) User selection based on the determinantal point process: when homogeneity and diversity are considered jointly, the user selection problem is transformed into a DPP problem. With user C_i's homogeneity μ_i and the similarity S_ij between users C_i and C_j, we define a positive semi-definite matrix A_{[N′]} = [A_ij], i, j ∈ [N′], with A_ij = μ_i μ_j S_ij; the probability that a set Q of users is selected is then P_A(Q) = det(A_Q). When homogeneity increases and similarity decreases, the determinant increases, so DPP-based selection tends to choose users with evenly distributed categories while avoiding users with highly similar content. With the value function V_d(Q) = det(A_Q), where A_Q = [A_ij]_{i,j∈Q}, we transform the user selection problem into a log-submodular maximization problem and iteratively select the user C_k that maximizes P_A(Q ∪ {k}).
4. The method of data selection in a federated learning scenario as in claim 1, wherein the user and data selection during training is such that, given the selected set of high-quality users with index set Q ⊆ [N′], a fraction ζ of the users is selected in each training iteration to further improve model performance and reduce training overhead, while each selected user locally selects its important data samples to participate in model training; the importance λ(z_{k,i}, t) of user C_k's sample z_{k,i} in round t is intuitively quantified by a gradient-based upper bound computed from the input and output of the model's last layer (layer L), but when the model is complex and the data volume is large, the computation cost O(n·s) is high, where n is the total number of samples and s is the number of model parameters θ ∈ R^s; therefore a policy is proposed that dynamically selects users based on server-side log information: in round t, the server samples m users, each selected with probability proportional to its recorded impact on the model, so that users with large impact are selected with higher probability, and each selected user C_k locally computes the importance λ(z_{k,i}, t−1) of each datum and selects data z_{k,i} ∈ D_k with probability proportional to this bounded importance, the upper-bound-based score keeping erroneous samples, whose gradient L2 norm is far greater than that of correct data, from dominating the selection.
5. The method of claim 1, wherein the model training is such that, in each iteration, all selected users train their local models on the selected samples, and the server aggregates the users' model updates to update the global model; the server repeats this process until a globally optimal model θ* is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011464915.XA CN112464269A (en) | 2020-12-14 | 2020-12-14 | Data selection method in federated learning scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011464915.XA CN112464269A (en) | 2020-12-14 | 2020-12-14 | Data selection method in federated learning scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112464269A true CN112464269A (en) | 2021-03-09 |
Family
ID=74804713
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011464915.XA Pending CN112464269A (en) | 2020-12-14 | 2020-12-14 | Data selection method in federated learning scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112464269A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326947A (en) * | 2021-05-28 | 2021-08-31 | 山东师范大学 | Joint learning model training method and system |
CN114189899A (en) * | 2021-12-10 | 2022-03-15 | 东南大学 | User equipment selection method based on random aggregation beam forming |
CN114219147A (en) * | 2021-12-13 | 2022-03-22 | 南京富尔登科技发展有限公司 | Power distribution station fault prediction method based on federal learning |
CN114841016A (en) * | 2022-05-26 | 2022-08-02 | 北京交通大学 | Multi-model federal learning method, system and storage medium |
CN115391734A (en) * | 2022-10-11 | 2022-11-25 | 广州天维信息技术股份有限公司 | Client satisfaction analysis system based on federal learning |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200242466A1 (en) * | 2017-03-22 | 2020-07-30 | Visa International Service Association | Privacy-preserving machine learning |
CN111866954A (en) * | 2020-07-21 | 2020-10-30 | 重庆邮电大学 | User selection and resource allocation method based on federal learning |
- 2020-12-14: CN application CN202011464915.XA, publication CN112464269A (status: pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200242466A1 (en) * | 2017-03-22 | 2020-07-30 | Visa International Service Association | Privacy-preserving machine learning |
CN111866954A (en) * | 2020-07-21 | 2020-10-30 | 重庆邮电大学 | User selection and resource allocation method based on federal learning |
Non-Patent Citations (1)
Title |
---|
TIFFANY TUOR, ET AL.: "Data Selection for Federated Learning with Relevant and Irrelevant Data at Clients", 《ARXIV:2001.08300》, pages 1 - 14 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326947A (en) * | 2021-05-28 | 2021-08-31 | 山东师范大学 | Joint learning model training method and system |
CN113326947B (en) * | 2021-05-28 | 2023-06-16 | 山东师范大学 | Training method and system for joint learning model |
CN114189899A (en) * | 2021-12-10 | 2022-03-15 | 东南大学 | User equipment selection method based on random aggregation beam forming |
CN114219147A (en) * | 2021-12-13 | 2022-03-22 | 南京富尔登科技发展有限公司 | Power distribution station fault prediction method based on federal learning |
CN114219147B (en) * | 2021-12-13 | 2024-06-07 | 南京富尔登科技发展有限公司 | Power distribution station fault prediction method based on federal learning |
CN114841016A (en) * | 2022-05-26 | 2022-08-02 | 北京交通大学 | Multi-model federal learning method, system and storage medium |
CN115391734A (en) * | 2022-10-11 | 2022-11-25 | 广州天维信息技术股份有限公司 | Client satisfaction analysis system based on federal learning |
CN115391734B (en) * | 2022-10-11 | 2023-03-10 | 广州天维信息技术股份有限公司 | Client satisfaction analysis system based on federal learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112464269A (en) | Data selection method in federated learning scene | |
Cao et al. | Multi-marginal wasserstein gan | |
CN112446423B (en) | Fast hybrid high-order attention domain confrontation network method based on transfer learning | |
Xu et al. | Unsupervised domain adaptation via importance sampling | |
CN104008174A (en) | Privacy-protection index generation method for mass image retrieval | |
Liu et al. | Intelligent and secure content-based image retrieval for mobile users | |
CN109726195B (en) | Data enhancement method and device | |
CN112883070B (en) | Generation type countermeasure network recommendation method with differential privacy | |
US11874866B2 (en) | Multiscale quantization for fast similarity search | |
CN113378938A (en) | Edge transform graph neural network-based small sample image classification method and system | |
CN112668482A (en) | Face recognition training method and device, computer equipment and storage medium | |
CN116229552A (en) | Face recognition method for embedded hardware based on YOLOV7 model | |
CN116450877A (en) | Image text matching method based on semantic selection and hierarchical alignment | |
CN114401229A (en) | Encrypted traffic identification method based on Transformer deep learning model | |
Chapel et al. | Partial gromov-wasserstein with applications on positive-unlabeled learning | |
CN116630726B (en) | Multi-mode-based bird classification method and system | |
CN114003744A (en) | Image retrieval method and system based on convolutional neural network and vector homomorphic encryption | |
CN113935396A (en) | Manifold theory-based method and related device for resisting sample attack | |
CN116383470B (en) | Image searching method with privacy protection function | |
CN117456267A (en) | Class increment learning method based on similarity prototype playback | |
CN116796038A (en) | Remote sensing data retrieval method, remote sensing data retrieval device, edge processing equipment and storage medium | |
CN114911967B (en) | Three-dimensional model sketch retrieval method based on self-adaptive domain enhancement | |
CN106529601A (en) | Image classification prediction method based on multi-task learning in sparse subspace | |
CN112906829B (en) | Method and device for constructing digital recognition model based on Mnist data set | |
CN115481415A (en) | Communication cost optimization method, system, device and medium based on longitudinal federal learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||